Android-based mobile educational platform for speech signal processing

Original Article


International Journal of Electrical Engineering Education 2017, Vol. 54(1) 3–16
© The Author(s) 2016
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0020720916639329
ije.sagepub.com

Nan Zhao, Minghu Wu and Jingjing Chen

Abstract
To help students master the course on speech signal processing, this paper proposes a novel Android-based, mobile-assisted educational platform (AEPS). The goal of this work was to design AEPS as an educational signal-processing auxiliary system that simulates signal analysis methods commonly used in speech signal processing and bridges the gap in the transition from undergraduate study to industry practice or academic research. The educational platform is presented through a highly intuitive, easy-to-interpret, and easy-to-operate graphical user interface. It is also portable, affordable, and easy to adopt for application extension and popularization. Through an intuitive user interface, rich visual information, and extensive hands-on experiences, it greatly facilitates authentic, interactive, and creative learning. This paper details a subjective evaluation of AEPS’s effectiveness as an educational tool. The results of the evaluation show that the proposed platform not only promotes the students’ learning interest and practical ability but also consolidates their understanding of theoretical concepts.

Keywords
Android, speech signal processing, mobile educational platform, hands-on experiences

Hubei Collaborative Innovation Center for High-efficient Utilization of Solar Energy, Hubei University of Technology, Wuhan, China
Corresponding author: Nan Zhao, Hubei Collaborative Innovation Center for High-efficient Utilization of Solar Energy, Hubei University of Technology, Wuhan 430068, China. Email: [email protected]

Introduction
Speech signal processing (SSP) is often a required course for undergraduate and postgraduate students majoring in electronic engineering. The course involves signal processing, information theory, linguistics, random processes, and many other concepts [1]. To understand the course content well, students have to spend much more time than in other subject areas, which can lead to low attendance and low teaching efficiency.

With the advancement of mobile and wireless communication technologies, the concept of mobile learning has attracted researchers’ interest. Through mobile devices, students can learn the content at any time and in any place. Additionally, students can interact with the surrounding environment related to the course content [2]. Several researchers have studied the implications of the use of mobile devices for learning and the effects on students’ motivation and study efficiency [3–5].

In Q1 2014, 279.4 million smart phones were shipped worldwide, 81% of which were Android devices. There are more than 600,000 apps available for Android devices [6], turning these devices into powerful general-purpose computing platforms. These smart mobile devices and open platforms have opened many opportunities to provide students with low-cost, interactive educational tools. Furthermore, Android is open source, with the graphical user environment written in Java. There is no need for us to maintain or update Android or any of its development tools [7]. This allows us to focus limited resources on teaching rather than on time-consuming in-house development.

Building on this previous work, this paper presents ongoing work on developing a novel Android-based educational platform for SSP (AEPS) in universities and colleges. The platform is portable, affordable, and easy to adopt, which makes the delivery of course content much more flexible. The tool described here focuses on SSP methods, not only demonstrating the principles of analyzing the human speech signal but also covering a broad range of signal processing basics, such as digital processing of analog signals, sampling, quantization, convolution, stationarity, spectral estimation, filtering, and statistical methods.
Moreover, by utilizing the simple user interface (UI) of Android-based mobile devices, AEPS is designed to provide an authentic and creative signal processing learning environment that lets students experience augmented reality, encourages them to engage in activities, and accelerates their understanding and comprehension of the content. This paper is organized as follows. First, mobile academic tools are briefly discussed, followed by a concise description of AEPS. Next, the demonstrations and evaluation are described. Finally, the limitations and challenges faced by AEPS are explained.

Educational framework in signal processing
Electronic engineering students need to develop and master two main competencies: knowledge and the skills to apply it [8]. With the evolution of affordable, accessible, and smart mobile devices, the concept of mobile education has attracted considerable interest. Mobile learning has the advantages of providing a pervasive learning environment, promoting the understanding of course content, and facilitating the interaction between teachers and students.


Compared with the extensive use of Matlab toolboxes and web-based learning, research on mobile-assisted learning platforms has only just begun [9]. A-JDSP [10] and SEA [11] are the two main recent developments on the Android platform. A-JDSP focuses on demonstrating basic signal processing content, while SEA enables students to understand typical speech enhancement techniques applied to real-world noisy signals. Each of the two tools supports a different aspect of SSP or digital signal processing.

Although AEPS is regularly used as the educational platform for the course "SSP" in the School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, China, the ultimate objective is to design AEPS as an educational signal-processing auxiliary system that simulates signal analysis methods commonly used in SSP and bridges the gap in the transition from undergraduate study to industry practice or academic research. The main features of AEPS include:

1. Promoting students’ authentic learning. With Android mobile devices, students are immersed in an authentic learning environment. AEPS provides students with extensive opportunities for hands-on learning by collecting, analyzing, and processing authentic signals in real-world applications.
2. Promoting students’ interactive learning. AEPS provides a graphical UI that is more intuitive and easier to interpret than formulas and command-line programs. Moreover, with the aid of the file transfer methods available on Android mobile devices, the results of signal analysis and processing can be saved and shared with others.
3. Promoting students’ creative learning. By recording and analyzing their own voice, students are encouraged to interact with their ambient environment. With the various speech processing functions of AEPS, students can learn to think through and solve problems based on what they have learned.

Android-based mobile educational platform
The Android-based mobile educational platform is designed to show students SSP methods by means of visualization and interactivity. The platform is developed in Java with the Android software development kit. Figure 1 shows the main interface with an interactive UI, which enables users to record or load a speech signal to be processed. Users can then choose a processing option to analyze the speech signal.

Figure 1. The main UI and quick edit menu of AEPS.

AEPS offers users four primary speech analysis techniques to deepen their comprehension of the course content:

1. Time domain analysis: the analysis of a speech signal in the time domain for each frame.
2. Frequency domain analysis: the analysis of a speech signal in the frequency domain for each frame, and the spectrogram in both the time and frequency domains.
3. Cepstral analysis: the analysis of a speech signal in the cepstral domain for each frame.
4. Linear predictive coding (LPC) analysis: the calculation of the LPC coefficients and spectrum of a speech signal for each frame.

Since the speech signal is non-stationary in nature, short-term processing is used so that the signal can be treated as approximately stationary within a frame of 10–30 ms [1]. In AEPS, users can configure the basic analysis parameters, such as frame duration, frame shift, window, FFT length, and LPC order, as shown in Figure 2.
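The role of the frame duration, frame shift, and window parameters can be illustrated with a minimal sketch. It is not the AEPS implementation (which is written in Java); the function names `hamming` and `frame_signal`, the 8 kHz sampling rate, and the 200 Hz test tone are assumptions chosen only for illustration.

```python
import math


def hamming(N):
    """Symmetric Hamming window of length N."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (N - 1)) for i in range(N)]


def frame_signal(x, fs, frame_ms=30.0, shift_ms=10.0):
    """Split x into overlapping frames and apply a Hamming window to each."""
    N = int(round(fs * frame_ms / 1000.0))  # frame duration in samples
    T = int(round(fs * shift_ms / 1000.0))  # frame shift in samples
    w = hamming(N)
    frames = []
    for start in range(0, len(x) - N + 1, T):
        frames.append([x[start + i] * w[i] for i in range(N)])
    return frames


fs = 8000  # assumed sampling rate in Hz
x = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(fs)]  # 1 s test tone
frames = frame_signal(x, fs)
print(len(frames), "frames of", len(frames[0]), "samples")
```

With a 30 ms frame and a 10 ms shift, consecutive frames overlap by two thirds, which is why short-term contours computed frame by frame change smoothly over time.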

Short-term time domain speech processing
This section presents examples of AEPS in short-term time domain speech processing.

Short-term energy
For a discrete-time signal $x(n)$, the short-term speech signal (the $n$th frame of speech after framing and windowing) can be represented as [1]:

$$x_n(m) = x(m)\,w(n-m), \qquad n-N+1 \le m \le n \tag{1}$$

where $w(n)$ is the window function, $n = 0, 1T, 2T, \ldots$, $N$ is the frame duration, and $T$ is the frame shift.


Figure 2. Parameter setting of AEPS.

Then, the short-term energy $E_n$ is denoted as:

$$E_n = \sum_{m=n-N+1}^{n} \left[x(m)\,w(n-m)\right]^2 \tag{2}$$

Figure 3 shows the short-term energy contour for the loaded speech signal. Users can intuitively observe that the energy of voiced frames is larger than that of unvoiced frames and that silence frames have the least energy. Therefore, short-term energy is commonly used to classify voiced, unvoiced, and silence frames.
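The ordering of the three frame types can be reproduced with a small sketch of the short-term energy computation. It assumes a rectangular window (w = 1 inside the frame), an 8 kHz sampling rate, and synthetic stand-ins for the three segments: zeros for silence, a 150 Hz tone for voiced speech, and low-level uniform noise for unvoiced speech. The name `short_term_energy` is illustrative only.

```python
import math
import random


def short_term_energy(x, N, T):
    """Energy of successive frames (length N, shift T), rectangular window."""
    return [sum(v * v for v in x[s:s + N]) for s in range(0, len(x) - N + 1, T)]


fs = 8000               # assumed sampling rate in Hz
N, T = 240, 80          # 30 ms frames, 10 ms shift
rng = random.Random(0)  # fixed seed so the run is repeatable

silence = [0.0] * 800
voiced = [0.8 * math.sin(2.0 * math.pi * 150.0 * n / fs) for n in range(800)]
unvoiced = [rng.uniform(-0.1, 0.1) for _ in range(800)]

energy = short_term_energy(silence + voiced + unvoiced, N, T)
# Frames 0, 10 and 20 lie entirely inside the silence, voiced and unvoiced parts.
e_sil, e_voi, e_unv = energy[0], energy[10], energy[20]
print(e_sil, e_voi, e_unv)
```

In this synthetic case a pair of energy thresholds would be enough to separate the three classes; real recordings usually need the thresholds to be tuned to the background noise level.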

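Short-term zero crossing rate
The short-term zero crossing rate (ZCR) of the speech signal is defined as

$$Z_n = \frac{1}{2N}\sum_{m=0}^{N-1}\left|\operatorname{sgn}[x(m)\,w(n-m)] - \operatorname{sgn}[x(m-1)\,w(n-m+1)]\right| \tag{3}$$

where $\operatorname{sgn}[\cdot]$ is the sign function.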

Figure 4 displays the short-term zero crossing rate of the same speech signal. As can be observed, the ZCR is appreciably lower in voiced frames than in unvoiced frames. Hence, the short-term ZCR can also be used to distinguish voiced from unvoiced frames.
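A minimal sketch of a zero crossing rate computation shows the same contrast on synthetic data. It assumes the common sign-difference form of the ZCR with a rectangular window, an 8 kHz sampling rate, a 150 Hz tone as a stand-in for voiced speech, and uniform noise as a stand-in for unvoiced speech; `sgn` and `short_term_zcr` are illustrative names.

```python
import math
import random


def sgn(v):
    """Sign function with sgn(0) taken as +1."""
    return 1 if v >= 0 else -1


def short_term_zcr(x, N, T):
    """Zero crossing rate per frame: sign changes counted, scaled by 1/(2N)."""
    rates = []
    for s in range(0, len(x) - N + 1, T):
        f = x[s:s + N]
        changes = sum(abs(sgn(f[m]) - sgn(f[m - 1])) for m in range(1, N))
        rates.append(changes / (2.0 * N))
    return rates


fs = 8000               # assumed sampling rate in Hz
N, T = 240, 80          # 30 ms frames, 10 ms shift
rng = random.Random(0)  # fixed seed so the run is repeatable

voiced = [math.sin(2.0 * math.pi * 150.0 * n / fs) for n in range(800)]
unvoiced = [rng.uniform(-1.0, 1.0) for _ in range(800)]

zcr = short_term_zcr(voiced + unvoiced, N, T)
z_voi, z_unv = zcr[0], zcr[10]  # frame 0 is voiced, frame 10 is unvoiced
print(z_voi, z_unv)
```

Each sign change contributes 2 to the sum, so the 1/(2N) factor turns the result into crossings per sample.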


Figure 3. Short-term energy contour for the speech signal.

Figure 4. Short-term zero crossing rate of a speech signal.

Short-term autocorrelation
The short-term autocorrelation of a discrete-time signal $x(n)$ is defined as

$$R_n(k) = \sum_{m=0}^{N-1-k} \left[x(m)\,w(n-m)\right]\left[x(m+k)\,w(n-k-m)\right] \tag{4}$$


Figure 5. The segments of speech and autocorrelation sequence: (a) voiced and (b) unvoiced.

In AEPS, the user can select one speech frame by moving a cursor across the bottom waveform, and the corresponding short-term autocorrelation sequence for the selected frame is then presented at the top of the figure. Figure 5 shows voiced and unvoiced frames and the corresponding short-term autocorrelation sequences. The position of the red line in each bottom figure marks the selected speech frame. For voiced frames, the short-term autocorrelation sequence is clearly periodic. For unvoiced frames, the short-term autocorrelation sequence shows no periodicity or distinct peak, much like noise. The short-term autocorrelation thus reveals the difference between voiced and unvoiced frames.
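The periodicity argument can be checked numerically with a short sketch. It assumes a rectangular window, an 8 kHz sampling rate, a 100 Hz tone standing in for a voiced frame (period of 80 samples), and uniform noise standing in for an unvoiced frame. The helper names and the lag search range of 20 to 160 samples are illustrative assumptions.

```python
import math
import random


def autocorr(frame, max_lag):
    """Short-term autocorrelation R(k) for k = 0..max_lag."""
    N = len(frame)
    return [sum(frame[m] * frame[m + k] for m in range(N - k))
            for k in range(max_lag + 1)]


def strongest_lag(frame, min_lag, max_lag):
    """Lag of the largest autocorrelation value and its size relative to R(0)."""
    r = autocorr(frame, max_lag)
    lag = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    return lag, r[lag] / r[0]


fs, N = 8000, 240       # assumed sampling rate and 30 ms frame
rng = random.Random(0)  # fixed seed so the run is repeatable

voiced = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(N)]
unvoiced = [rng.uniform(-1.0, 1.0) for _ in range(N)]

lag_v, peak_v = strongest_lag(voiced, 20, 160)
lag_u, peak_u = strongest_lag(unvoiced, 20, 160)
print("voiced: lag", lag_v, "->", fs / lag_v, "Hz, relative peak", peak_v)
print("unvoiced: relative peak", peak_u)
```

A strong peak at a nonzero lag indicates a periodic frame, and the lag itself gives a pitch estimate; a frame without such a peak behaves like noise.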

Short-term frequency domain speech processing
The discrete-time short-term Fourier transform of the speech signal is given by

$$X_n\!\left(e^{j\omega}\right) = \sum_{m=-\infty}^{\infty} x(m)\,w(n-m)\,e^{-j\omega m} \tag{5}$$

Based on the short-term Fourier analysis conducted with a Hamming window, the spectrogram is obtained using a window size of 30 ms and a frame shift of 4 ms. AEPS allows the user to select a frame by moving the cursor (red line) across the generated spectrogram in the bottom part of the figure, and the corresponding frequency analysis is then presented in the top part of the figure, as shown in Figure 6.
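A spectrogram of this kind is a stack of log-magnitude spectra, one per frame. The following sketch assumes an 8 kHz sampling rate (so 30 ms and 4 ms become 240 and 32 samples), an FFT length of 256, and a 1000 Hz test tone. The small recursive `fft` helper is written out only to keep the example self-contained and is not taken from AEPS.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrogram(x, N, T, nfft):
    """Log-magnitude STFT in dB: one row per frame, bins 0..nfft/2."""
    w = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (N - 1)) for i in range(N)]
    rows = []
    for s in range(0, len(x) - N + 1, T):
        frame = [x[s + i] * w[i] for i in range(N)] + [0.0] * (nfft - N)
        X = fft(frame)
        rows.append([20.0 * math.log10(max(abs(v), 1e-12))
                     for v in X[:nfft // 2 + 1]])
    return rows


fs = 8000                   # assumed sampling rate in Hz
N, T, nfft = 240, 32, 256   # 30 ms window, 4 ms shift, FFT length
x = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(1600)]

spec = spectrogram(x, N, T, nfft)
peak_bins = [row.index(max(row)) for row in spec]
print(len(spec), "frames; first peak at", peak_bins[0] * fs / nfft, "Hz")
```

Selecting one row of `spec` corresponds to the per-frame spectrum shown in the top part of the figure, while the full list of rows corresponds to the spectrogram in the bottom part.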


Figure 6. Frequency domain analysis of AEPS: (top) short-term Fourier analysis and (bottom) spectrogram.

Figure 7 depicts the short-term log magnitude spectrum of a 30 ms unvoiced frame with Rectangular and Hamming windows. The peaks obtained with the Rectangular window are relatively sharper than those obtained with the Hamming window. However, because of the high spectral leakage of the Rectangular window, its spectrum appears noisier than that obtained with the Hamming window. Thus, the Hamming window is preferred for the short-term spectral analysis of speech.
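The trade-off between peak sharpness and leakage can be made concrete with a sketch that analyzes the same frame with both windows. A pure 1000 Hz tone is used here instead of an unvoiced frame so that the peak and the sidelobes are easy to identify. The 8 kHz sampling rate, the FFT length of 256, the 4-bin guard distance, and all function names are assumptions for illustration.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def relative_log_spectrum(frame, nfft):
    """Log-magnitude spectrum in dB relative to its own peak (bins 0..nfft/2)."""
    X = fft(list(frame) + [0.0] * (nfft - len(frame)))
    mag = [max(abs(v), 1e-12) for v in X[:nfft // 2 + 1]]
    peak = max(mag)
    return [20.0 * math.log10(m / peak) for m in mag]


fs, N, nfft = 8000, 240, 256  # assumed rate, 30 ms frame, FFT length
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(N)]
ham = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (N - 1)) for i in range(N)]

rect_db = relative_log_spectrum(tone, nfft)  # Rectangular window: frame as is
ham_db = relative_log_spectrum([t * w for t, w in zip(tone, ham)], nfft)
k0 = rect_db.index(max(rect_db))             # bin of the spectral peak


def worst_sidelobe(db):
    """Highest level found at least 4 bins away from the peak."""
    return max(v for k, v in enumerate(db) if abs(k - k0) >= 4)


print("one bin from peak (dB):", rect_db[k0 + 1], ham_db[k0 + 1])
print("worst sidelobe (dB):", worst_sidelobe(rect_db), worst_sidelobe(ham_db))
```

The Rectangular window drops faster next to the peak (a narrower main lobe), while its sidelobes away from the peak stay much higher than those of the Hamming window, which is the leakage that makes the spectrum look noisy.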

Cepstral analysis
The short-time cepstrum is considered a basis for estimating the parameters of the speech generation model. According to speech production theory, the source-filter model of speech production decomposes the speech signal $s(n)$ into an excitation $e(n)$ and a linear filter $h(n)$. In the frequency domain, the speech signal can be represented as

$$S(\omega) = E(\omega)\,H(\omega) \tag{6}$$

Then, the magnitude spectrum of the speech signal is

$$|S(\omega)| = |E(\omega)| \cdot |H(\omega)| \tag{7}$$

The logarithmic representation of equation (7) can be written as:

$$\log|S(\omega)| = \log|E(\omega)| + \log|H(\omega)| \tag{8}$$


Figure 7. Short-term energy with various window types (30 ms): (a) Rectangular and (b) Hamming.

The slowly varying components of $\log|S(\omega)|$ represent the vocal tract information related to $H(\omega)$, while the rapidly varying components carry the excitation information $E(\omega)$. Hence, by taking the inverse Fourier transform of $\log|S(\omega)|$, which yields the cepstrum, the components of $E(\omega)$ and $H(\omega)$ can be separated. In AEPS, the user can select a speech frame by moving a cursor (red line) across the time waveform and view the corresponding cepstrum at the top of the figure. Figure 8 shows voiced and unvoiced speech frames and the corresponding cepstra. It can be observed that the initial few values (typically 13–15 cepstral values) in the voiced cepstrum represent the vocal tract information (formants). The large peak present after these initial values represents the excitation information (pitch). In the unvoiced cepstrum, the variations in the lower quefrency region carry the vocal tract information, and the fast-varying behavior towards the upper quefrency region represents the excitation characteristics. Thus, the short-time cepstrum can be used to distinguish between voiced and unvoiced speech and is also considered an efficient method to detect the formants and pitch of a speech signal.
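The pitch peak in the voiced cepstrum can be reproduced with a minimal sketch. It assumes an 8 kHz sampling rate, a synthetic voiced signal made of an impulse train with an 80-sample period (100 Hz) passed through a single two-pole resonator near 700 Hz, a Hamming window, an FFT length of 1024, and a quefrency search range of 20 to 160 samples. None of these values or helper names come from AEPS.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_cepstrum(frame, nfft):
    """Inverse DFT of the log-magnitude spectrum of a zero-padded frame."""
    X = fft(list(frame) + [0.0] * (nfft - len(frame)))
    log_mag = [math.log(max(abs(v), 1e-12)) for v in X]
    # log|X| is real and even, so its inverse DFT equals its forward DFT / nfft.
    return [v.real / nfft for v in fft(log_mag)]


fs, N, nfft = 8000, 240, 1024  # assumed rate, 30 ms frame, FFT length
period = 80                    # excitation period in samples (100 Hz pitch)
r, theta = 0.8, 2.0 * math.pi * 700.0 / fs  # hypothetical resonance near 700 Hz

y = []
for n in range(800):
    e = 1.0 if n % period == 0 else 0.0       # impulse-train excitation e(n)
    y1 = y[n - 1] if n >= 1 else 0.0
    y2 = y[n - 2] if n >= 2 else 0.0
    y.append(e + 2.0 * r * math.cos(theta) * y1 - r * r * y2)  # filter h(n)

w = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (N - 1)) for i in range(N)]
frame = [y[400 + i] * w[i] for i in range(N)]

c = real_cepstrum(frame, nfft)
q_peak = max(range(20, 161), key=lambda q: c[q])  # search 50-400 Hz
print("cepstral peak at quefrency", q_peak, "->", fs / q_peak, "Hz")
```

The low-quefrency values of `c` describe the smooth spectral envelope of the resonator, while the isolated peak near the excitation period reflects the pitch, which mirrors the separation of $H(\omega)$ and $E(\omega)$ described above.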

Linear prediction analysis
In linear prediction analysis, the current signal sample $x(n)$ can be estimated from a weighted linear combination of the $p$ previous samples:

$$\hat{x}(n) = \sum_{k=1}^{p} a_k\,x(n-k) \tag{9}$$

where $\hat{x}(n)$ is the predicted value and $a_k$ is the $k$th linear prediction coefficient.


Figure 8. Cepstrum with various speech segments (30 ms): (a) unvoiced and (b) voiced.

Then, the prediction error $e(n)$ is expressed as

$$e(n) = x(n) - \sum_{k=1}^{p} a_k\,x(n-k) \tag{10}$$

To minimize the prediction error $e(n)$, the common approach is to compute the LP coefficients with the least-squares autocorrelation method [1]. To analyze the characteristics in the frequency domain, the magnitude of the power spectrum is given by

$$\hat{P}(\omega) = \frac{G^2}{\left|1 - \sum_{k=1}^{p} a_k\,e^{-jk\omega T}\right|^2} \tag{11}$$

where $G$ is the model gain. In AEPS, once the order of the LPC model in Figure 2 is selected, the LPC coefficients are calculated, and the LPC magnitude spectrum for a specific frame is presented at the top of the figure. In practice, if the LPC order $p$ is large enough, the prediction error can be made arbitrarily small. Figure 9 shows an example of the LPC magnitude spectrum for a voiced frame with various LPC orders $p$. Notice that the larger the LPC order, the more detailed the model frequency response: larger values of $p$ lead to lower prediction errors, and more details of the spectrum are preserved.
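The autocorrelation method and the effect of the LPC order can be sketched with the Levinson-Durbin recursion. The sketch below is not the AEPS code: it assumes a synthetic second-order autoregressive signal with known coefficients 1.3 and -0.6, no windowing, a normalized frequency (the product of $\omega$ and $T$ passed as `omega`), and a gain $G^2$ taken as the residual energy, which is a common choice. All function names are illustrative.

```python
import cmath
import random


def autocorr(x, p):
    """Autocorrelation R(0..p) of the whole sequence x."""
    return [sum(x[m] * x[m + k] for m in range(len(x) - k)) for k in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the autocorrelation normal equations; return (a_1..a_p, error)."""
    a = [0.0] * (p + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= (1.0 - k * k)  # prediction error shrinks at every order
    return a[1:], err


def lpc_spectrum(a, gain_sq, omega):
    """Model power spectrum G^2 / |1 - sum_k a_k e^{-jk omega}|^2."""
    denom = 1.0 - sum(ak * cmath.exp(-1j * (k + 1) * omega)
                      for k, ak in enumerate(a))
    return gain_sq / abs(denom) ** 2


rng = random.Random(0)  # fixed seed so the run is repeatable
x = [0.0, 0.0]
for _ in range(2000):
    # x(n) = 1.3 x(n-1) - 0.6 x(n-2) + noise: the true predictor is (1.3, -0.6)
    x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
x = x[2:]

r = autocorr(x, 16)
a2, e2 = levinson_durbin(r, 2)
errors = {p: levinson_durbin(r, p)[1] for p in (4, 8, 16)}
print("order-2 coefficients:", a2)
print("prediction error by order:", errors)
```

Because each step multiplies the error by a factor no larger than one, the prediction error cannot increase with the order, which matches the trend described for p = 4, 8, and 16.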

Figure 9. LPC magnitude for a voiced frame (30 ms): (a) p = 4, (b) p = 8, and (c) p = 16.

Validation as an educational tool on SSP
This section presents validation results obtained by surveying 20 students and 4 teachers in the speech processing course. An online survey was conducted after the participants had used AEPS in a one-hour laboratory. In line with the content taught in the course, the laboratory tasks covered topics such as speech production, the role of short-term processing, typical speech analysis, and signal processing methods. All the testers were provided with an LG Nexus 5 running the Android 4.4 operating system with AEPS pre-installed. After completing the laboratory, the testers were asked to assess AEPS’s features, acceptance, and usability; the results are shown in Table 1. The survey consisted of 17 questions, each scored from 1 (strongly disagree) to 5 (strongly agree).

With respect to teaching and studying SSP, all the testers reported gaining knowledge by using this platform, particularly the students (Q1 versus Q16). The platform appears to stimulate the testers’ motivation and interest (Q2). AEPS is regarded as suitable for university teaching (Q3), and it allows students to acquire and consolidate theoretical concepts, connect them to practice, and pursue further research (Q4–Q7). In terms of usability (Q8–Q10), testers highlighted that AEPS has a friendly and interactive UI with abundant simulation capability and clear graphic illustration. Regarding AEPS’s technical and theoretical aspects, testers gave generally good ratings (Q11–Q15). Overall, testers gave a positive assessment of AEPS (Q17). AEPS contributed to a clear enhancement of learning interest and consolidation of theoretical concepts.


Table 1. Survey results of AEPS on a scale of 1 (strongly disagree) to 5 (strongly agree).

| No. | Requested features and capabilities | Students | Deviation | Teachers | Deviation |
|---|---|---|---|---|---|
| 1 | Previous knowledge level on DSP | 2.68 | 0.85 | 3.82 | 0.97 |
| 2 | The use of a mobile platform promotes motivation and interest | 3.95 | 0.73 | 4.81 | 0.43 |
| 3 | This platform is feasible for implementation in the university situation | 3.12 | 0.81 | 4.6 | 0.56 |
| 4 | This platform allows new theoretical concepts to be acquired | 3.01 | 0.87 | 4.2 | 0.63 |
| 5 | This platform enables the consolidation of theoretical concepts | 3.24 | 0.57 | 4.63 | 0.53 |
| 6 | This platform bridges theory to practice | 4.01 | 0.61 | 4.83 | 0.52 |
| 7 | This platform encourages further academic research | 3.56 | 0.74 | 4.52 | 0.68 |
| 8 | This platform presents a clear and intuitive structure | 3.3 | 0.73 | 4.2 | 0.78 |
| 9 | This platform was user friendly, interactive, and attractive | 3.28 | 0.65 | 4.1 | 0.57 |
| 10 | The graphs and settings of this platform were logical, clear, and useful | 3.54 | 0.53 | 4.72 | 0.45 |
| 11 | This speech analysis simulation platform was useful | 3.42 | 0.68 | 4.82 | 0.58 |
| 12 | This platform allows the study of typical speech analysis methods | 3.47 | 0.54 | 4.79 | 0.75 |
| 13 | This platform allows the study of real signal processing methods | 3.45 | 0.67 | 4.63 | 0.58 |
| 14 | I would use this platform in a professional environment | 3.75 | 0.48 | 3.98 | 0.83 |
| 15 | I would recommend this tool to help in the related teaching process | 3.65 | 0.68 | 4.1 | 0.73 |
| 16 | The knowledge level of signal processing improved after using this platform | 3.74 | 0.73 | 4.4 | 0.56 |
| 17 | My general assessment of this platform is positive | 3.56 | 1 | 4 | 0 |


Conclusion
Smart mobile devices are becoming readily available to university students, and their mobility, computational ability, and interactive capabilities can be capitalized on to increase the efficiency of teaching. This work presents a hands-on Android mobile educational platform, AEPS, devoted to assisting SSP education in both lectures and laboratories. By promoting students’ interactivity, creativity, and hands-on learning, AEPS provides considerable flexibility in delivering course content and supports a shift away from conventional learning behaviors. A student–teacher evaluation survey indicated that AEPS was an effective complement to teaching, offering analysis visualization and hands-on interaction, and that it had a positive impact on student learning and comprehension.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Technology Research Program of Hubei Provincial Department of Education (No. D20141406), the PhD research startup foundation of Hubei University of Technology (No. BSQD13029), and the Teaching and Research Project of Hubei University of Technology (No. 2015014).

References
1. Rabiner L and Schafer R. Theory and applications of digital speech processing. Upper Saddle River, NJ: Prentice Hall, 2010.
2. Scott K and Benlamri R. Context-aware services for smart learning spaces. IEEE Trans Learn Technol 2010; 3: 214–227.
3. Swan K, van’t Hooft M, Kratcoski A, et al. Uses and effects of mobile computing devices in K–8 classrooms. J Res Technol Educ 2005; 38: 99–113.
4. Finkelstein J, Wood J and Cha E. Introducing a blackberry e-learning platform for interactive hypertension education. In: Second international conference on mobile, hybrid, and on-line learning, Saint Maarten, 10–16 February 2010, pp.77–81. New York: IEEE.
5. A third of smart phones shipped in Q1 had 5"-plus displays, http://www.canalys.com/newsroom/third-smart-phones-shipped-q1-had-5-plus-displays (accessed 10 September 2014).
6. Andrus J and Nieh J. Teaching operating systems using android. In: Proceedings of the 43rd ACM technical symposium on computer science education, Raleigh, NC, 29 February–3 March 2012, pp.613–618. New York: ACM.
7. Rugarcia A, Felder RM, Woods DR, et al. The future of engineering education I: a vision for a new century. Chem Eng Educ 2000; 34: 16–25.
8. Syukur E and Loke SW. MHS learning services for pervasive campus environments. In: Proceedings of the 5th IEEE percom workshops, White Plains, NY, 19–23 March 2007, pp.204–210. New York: IEEE.
9. Potts J, Moore N and Sukittanon S. Developing mobile learning applications for electrical engineering courses. In: Proceedings of IEEE southeastcon, Nashville, TN, 17–20 March 2011, pp.293–296. New York: IEEE.
10. Ranganath S, Thiagarajan JJ, Ramamurthy KN, et al. Undergraduate signal processing laboratories for the android operating system. In: ASEE annual conference, San Antonio, Texas, 10–13 June 2012.
11. Chappel R and Paliwal K. An educational platform to demonstrate speech processing techniques on Android based smart phones and tablets. Speech Comm 2014; 57: 13–38.
