Proceedings of 26th Kerala Science Congress Pookode, Wayanad: 28-31 Jan. 2014
07-05
Malayalam Speech Controlled Multipurpose Robotic Arm
Lajish V.L., Vivek P. and R.K. Sunil Kumar
Department of Computer Science, University of Calicut, Kerala – 673 635
E-mail:
[email protected], {vivekuoc, seuron74}@gmail.com
INTRODUCTION

Speech is the most widely used mode of communication among people. We are born with the ability to speak, learn it easily during early childhood, and communicate with each other through speech throughout our lives. With the development of communication technologies in recent decades, speech has become an important interface for many systems [Kamm, C.A., Walker, M., and Rabiner, L.R. (1997)]. Instead of a variety of complex interfaces, speech offers an easier way to communicate with computers. Humans are accustomed to interacting in Natural Language (NL) in a social context. This idea leads roboticists to build an NL interface through speech for a Multipurpose Robotic Arm (MRA) [Tejima, N. (2001)]. In this work, we take a step towards a Malayalam Speech Controlled Multipurpose Robotic Arm (SCMRA) by considering the acoustic features of the spoken instruction. The robot is able to recognize spoken commands, and its motions can be controlled by the user through specific voice commands. The speech recognition software running on a PC is capable of identifying the eight voice commands given in Table 1, issued by any user. After processing the speech, the necessary motion instructions are given to the robotic arm via a motor driver.

Sl. No.   Instruction in Malayalam   Meaning
1         -                          Left
2         -                          Right
3         -                          Pick
4         -                          Place
5         -                          Elbow Down
6         -                          Elbow Up
7         -                          Shoulder Down
8         -                          Shoulder Up

Table 1. List of Malayalam instructions and their meanings
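The instruction set in Table 1 can be viewed as a lookup table from the code emitted by the recognizer to the arm motion to perform. The sketch below is a hypothetical illustration, not the authors' implementation; the names `INSTRUCTION_SET` and `decode_instruction` are assumptions made here.

```python
# Hypothetical lookup table for the eight instructions in Table 1.
# Keys are the instruction numbers the SR unit would emit; values
# name the arm motion the controller must perform.
INSTRUCTION_SET = {
    1: "left",
    2: "right",
    3: "pick",
    4: "place",
    5: "elbow_down",
    6: "elbow_up",
    7: "shoulder_down",
    8: "shoulder_up",
}

def decode_instruction(code):
    """Map a recognized instruction code to an arm action, or None."""
    return INSTRUCTION_SET.get(code)
```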
The computer recognizes the command using the speech recognition system, then converts the voice command into a predefined direction command that the robot can interpret. When the robot receives the direction command, it moves according to the spoken command.

MATERIALS AND METHODS

Automatic Speech Recognition
Automatic Speech Recognition (ASR) is the process by which a computer (or other type of machine) identifies spoken words. The steps carried out in this work for speech recognition are explained below.

Speech endpoint trimming
As a pre-processing stage, silence removal and segmentation of the audio streams that contain speech are performed. The method is based on two simple audio features: signal energy and spectral centroid. Once the feature sequences are extracted, a thresholding approach is applied to them in order to detect the speech segments [Ganapathiraju, A., Webster, L. et al. (1996)].

Feature extraction (MFCC)
Extracting the best parametric representation of the acoustic signal is an important task in producing good recognition performance. The efficiency of this phase is important for the next phase, since it affects its behavior. MFCC is based on human auditory perception, which does not resolve frequencies linearly above about 1 kHz. In other words, MFCC is based on the known variation of the human ear's critical bandwidth with frequency [Zheng, F., Zhang, G., and Song, Z. (2001)]. A subjective pitch is represented on the mel frequency scale to capture the important phonetic characteristics of speech. The mel frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. Therefore we can use the following approximate formula to compute the mel value for a given frequency f in Hz:

    mel(f) = 2595 * log10(1 + f / 700)    (1)

Feature matching (DTW)
The DTW algorithm is based on dynamic programming and is used to measure the similarity between two time series which may vary in time or speed.
This technique is also used to find the optimal alignment between two time series when one of them may be "warped" non-linearly by stretching or shrinking it along its time axis. This warping can then be used to find corresponding regions between the two time series or to determine their similarity [Bala, A., Kumar, A., and Birla, N. (2010)]. To align two sequences using DTW, an n-by-m matrix is constructed in which the (i, j)-th element contains the distance d(q_i, c_j) between the two points q_i and c_j. The absolute distance between the values of the two sequences is calculated using the Euclidean distance computation:

    d(q_i, c_j) = sqrt((q_i - c_j)^2)    (2)
Each matrix element (i, j) corresponds to the alignment between the points q_i and c_j. The accumulated distance D(i, j) is then measured by equation (3):

    D(i, j) = d(q_i, c_j) + min{ D(i-1, j-1), D(i-1, j), D(i, j-1) }    (3)
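The mel mapping of equation (1) and the DTW recurrence of equations (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration under the formulas above, not the authors' implementation:

```python
import math

def hz_to_mel(f_hz):
    """Equation (1): map a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def dtw_distance(q, c):
    """Accumulated DTW distance between 1-D sequences q and c.

    The local cost d(q_i, c_j) follows equation (2); the accumulated
    distance D(i, j) follows the recurrence in equation (3).
    """
    n, m = len(q), len(c)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.sqrt((q[i - 1] - c[j - 1]) ** 2)   # equation (2)
            D[i][j] = d + min(D[i - 1][j - 1],          # match
                              D[i - 1][j],              # insertion
                              D[i][j - 1])              # deletion
    return D[n][m]
```

Note that hz_to_mel(1000) is approximately 1000, reflecting the linear-to-logarithmic transition of the mel scale around 1 kHz, and that two sequences differing only by a time-axis warp yield a DTW distance of zero.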
Hardware
This project uses three main electronic components, properly integrated to build the speech controlled MRA.

OWI-535 Robotic Arm
The OWI-535 Robotic Arm Edge builds on the award-winning OWI Arm Trainer Robot Kit. The main functionalities of the Robotic Arm Edge include opening and closing of the gripper, wrist motion of 120 degrees, an elbow range of 300 degrees, base rotation of 270 degrees, base motion of 180 degrees, a vertical reach of 15 inches, a horizontal reach of 12.6 inches and a lifting capacity of 100 g. Added features include a searchlight on the gripper, and a safety-gear audible indicator on all five gearboxes to prevent potential damage or gear breakage during operation. Total command and visual manipulation are achieved using the five-switch wired controller, five motors, and five joints. Night-time use is possible, and the extended gearbox life prolongs predictable control of the robot arm's behavior.
Figure 1: Speech Controlled Multipurpose Robotic Arm (SCMRA)

AT89C52 Microcontroller
The AT89C52 is a low-power, high-performance CMOS 8-bit microcomputer with 8K bytes of Flash programmable and erasable read-only memory (PEROM). The device is manufactured using Atmel's high-density nonvolatile memory technology and is compatible with the industry-standard 80C51 and 80C52 instruction set and pinout. The on-chip Flash allows the program memory to be reprogrammed in-system or by a conventional nonvolatile memory programmer. By combining a versatile 8-bit CPU with Flash on a monolithic chip, the Atmel AT89C52 is a powerful microcomputer which provides a highly flexible and cost-effective solution for many embedded control applications [Meshram, U., Bande, P., and Harkare, R.R. (2009)].
L293D Motor Driver IC
The L293D H-bridge driver is the most commonly used driver for bidirectional motor driving applications. The L293D has four half H-bridge drivers, which can be used to drive two DC motors bidirectionally. Here we drive a single DC motor using half-bridges 1 and 2. The DC motor is connected between the OUT1 and OUT2 pins; pin IN1 is connected to the microcontroller's PWM output and pin IN2 to a microcontroller I/O port. Clockwise rotation: to rotate the motor in the clockwise direction, the IN2 pin is made LOW and a PWM signal is generated on the IN1 pin. Anti-clockwise rotation: to rotate the motor in the anti-clockwise direction, the IN2 pin is made HIGH and a PWM signal is generated on the IN1 pin. To simplify use as two bridges, each pair of channels is equipped with an enable input. A separate supply input is provided for the logic, allowing operation at a lower voltage, and internal clamp diodes are included. The device is suitable for switching applications at frequencies up to 5 kHz. The L293D is assembled in a 16-lead plastic package in which the four center pins are connected together and used for heat sinking. The L293DD is assembled in a 20-lead surface-mount package in which the eight center pins are connected together and used for heat sinking.
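The direction logic described above reduces to a small truth table: IN1 carries the PWM signal and IN2 selects the rotation direction. The sketch below models this symbolically as a hypothetical helper (the function name and string encodings are assumptions made here; on real hardware IN1 would be a PWM peripheral output and IN2 a GPIO level):

```python
def l293d_pins(direction):
    """Return the (IN1, IN2) drive pattern for the requested rotation,
    per the L293D wiring described in the text (motor on OUT1/OUT2)."""
    if direction == "clockwise":
        return ("PWM", "LOW")    # PWM on IN1, IN2 held low
    if direction == "anticlockwise":
        return ("PWM", "HIGH")   # PWM on IN1, IN2 held high
    return ("LOW", "LOW")        # stop: both inputs low, motor coasts
```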
Figure 2: (a) Pin diagram of AT89C52 (b) Pin diagram of L293D IC
RESULTS AND DISCUSSION
We have experimentally verified the effectiveness of the proposed SCMRA system. A training speech database consisting of eight Malayalam speech instructions uttered by 20 different speakers (20 utterances of each instruction from every speaker) was constructed. The simulation experiments were conducted by integrating the SR unit and the hardware units in a laboratory environment.
[Block diagram components: Speech Input → SR Unit → AT89C52 Microcontroller → L293D Motor Driver IC → DC Motors → Robotic Arm Response; Power Supply]
Figure 3: Block diagram of the Speech Controlled Multipurpose Robotic Arm

The word units are sampled at 8 kHz and quantized with 16 bits, then processed at a 10 ms frame rate with an overlapping Hamming window of 25 ms. The speech units are parameterized with 12 Mel Frequency Cepstral Coefficients (MFCC) and normalized log energy, as well as their first and second order differences, yielding a total of 39 components. In the next phase, the non-linear sequence alignment known as Dynamic Time Warping (DTW), introduced by Sakoe and Chiba, is used as the feature matching technique. This algorithm measures the similarity between two time series which may vary in time or speed, and also finds the optimal alignment between two time series when one of them may be "warped" non-linearly by stretching or shrinking it along its time axis. The code for the spoken instruction recognized by the SR unit is sent to the microcontroller, which triggers the corresponding arm action. In response to the received instruction, the motor driver IC interfaced with the microcontroller operates the motors of the SCMRA. The results obtained in the simulation experiments conducted using the methods described above are tabulated in Table 2.

Recognition Accuracy (%)
Instruction        Speaker dependent   Speaker independent
Left               93.33               80.00
Right              86.67               67.27
Pick               80.00               74.54
Place              75.18               69.09
Elbow Down         96.66               76.36
Elbow Up           83.33               78.18
Shoulder Down      90.00               72.72
Shoulder Up        83.33               69.09
Average            86.06               73.41

Table 2. Performance score of the proposed model
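As a sanity check on the front-end parameters stated above (8 kHz sampling, 25 ms Hamming window, 10 ms frame shift, 39 feature components), the arithmetic works out as follows; this is an illustrative check written for this text, not code from the paper:

```python
FS = 8000                      # sampling rate in Hz
WIN_MS, HOP_MS = 25, 10        # Hamming window length and frame shift (ms)

win_samples = FS * WIN_MS // 1000   # samples per analysis window
hop_samples = FS * HOP_MS // 1000   # samples per frame shift

# 12 MFCCs plus normalized log energy, each with first and second
# order differences appended
n_components = (12 + 1) * 3
```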
CONCLUSION AND FUTURE WORK
The popularity of service robots has given researchers greater interest in speech interfaces that make robots more user friendly in a social context. In this work we designed a robotic arm controlled by Malayalam speech commands. The features of the commands are extracted with the MFCC algorithm, and the commands are recognized using the DTW technique. The recognized command is converted into a form the robot can interpret; this final form is sent to the robot, and the robot moves accordingly. The system was tested with different command sets and the results are quite satisfactory, with an average recognition accuracy of 86.06% for the speaker dependent system and 73.41% for the speaker independent system. Our future work will focus on introducing more complex activities and sentences to the system, and on implementing Text-To-Speech functionality to build a dialogue system combined with the robotic arm.

REFERENCES
Bala, A., Kumar, A., and Birla, N. (2010). Voice command recognition system based on MFCC and DTW. International Journal of Engineering Science and Technology, 2(12), 7335-7342.
Ganapathiraju, A., Webster, L., Trimble, J., Bush, K., and Kornman, P. (1996). Comparison of energy-based endpoint detectors for speech signal processing. In Proceedings of IEEE Southeastcon '96: Bringing Together Education, Science and Technology, pp. 500-503. IEEE.
Kamm, C.A., Walker, M., and Rabiner, L.R. (1997). The role of speech processing in human-computer intelligent communication. In Proc. HCI Workshop, Washington, DC, pp. 169-190.
Meshram, U., Bande, P., and Harkare, R.R. (2009). Hardware and software co-design for robot arm position control using VHDL and FPGA. In Advances in Recent Technologies in Communication and Computing (ARTCom '09), pp. 780-782. IEEE.
Tejima, N. (2001). Rehabilitation robotics: a review. Advanced Robotics, 14(7), 551-564.
Zheng, F., Zhang, G., and Song, Z. (2001). Comparison of different implementations of MFCC. Journal of Computer Science and Technology, 16(6), 582-589.