Natural Language Human-Robot Interface Using Evolvable Fuzzy Neural Networks for Mobile Technology

Wojciech Kacalak and Maciej Majewski
Koszalin University of Technology, Faculty of Mechanical Engineering
Raclawicka 15-17, 75-620 Koszalin, Poland
[email protected], [email protected]
Abstract. In this paper, a human-robot speech interface for mobile technology is described which consists of intelligent mechanisms of human identification, speech recognition, word and command recognition, command meaning and effect analysis, command safety assessment, process supervision, as well as human reaction assessment. A review of selected issues is carried out with regard to the recognition and evaluation of speech commands in natural language using hybrid neural networks. The paper presents experimental results of automatic recognition and evaluation of spoken commands of a manufacturing robot model simulating the execution of laser processing tasks in a virtual production process.

Keywords: Human-robot interaction, Voice communication, Speech interface, Artificial intelligence, Hybrid neural networks, Mobile technology.
1 Introduction
The advantages of intelligent human-robot speech communication for mobile technology in a production process include the following:

– More robustness against human errors and more efficient supervision of the process at the chosen level of supervision automation [1].
– Improved co-operation between a human and a robot with respect to the richness of communication [2].
– A higher level of organization of a technological process equipped with an intelligent two-way speech communication system, which is relevant for its efficiency and for the humanization of production.
– Technological decision and optimization systems can be remote elements with regard to the technological process, so there is no need for a human to be present at the work station of the manufacturing robot.

Commands produced in continuous speech are processed to text with a Tablet PC after identification and authentication of the human by the mobile system, shown in abbreviated form in Fig. 1. The recognized text is analyzed by the meaning analysis subsystem, which performs word and command recognition. This output is sent to the effect analysis subsystem, which analyzes the status corresponding to the hypothetical command execution and consecutively estimates technical safety and the process state. The command is also sent to the safety assessment subsystem, which assigns the command to the correct category and makes corrections. The command execution subsystem assesses the reactions of the human and defines new parameters for the process. The manufacturing robot subsystem is composed of a laser processing robot as well as modules for technological process supervision and for signaling the process state and the causes of inaccuracy. The subsystem for speech communication produces voice messages to the human.

D.-S. Huang et al. (Eds.): ICIC 2009, LNCS 5754, pp. 480–489, 2009. © Springer-Verlag Berlin Heidelberg 2009

Fig. 1. Architecture of an intelligent human-robot speech communication system for mobile technology in a production process (diagram omitted; its main blocks are speech communication with biometric human identification and authentication, meaning analysis with letter string, word and command analysis and recognition modules using evolvable fuzzy neural networks, effect analysis with technical safety and process state estimation using neural networks, safety assessment with command correctness assessment using fuzzy logic and command correction, command execution with human reaction assessment, and a laser processing robot with technological process supervision using neural networks)
2 Automatic Command Recognition
In the automatic recognition process of commands in natural language, the speech signal is processed to text and numeric values with the module for processing voice commands to text on a Tablet PC.
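Before classification, the text is coded into binary images of letter strings (cf. Fig. 3, where the word networks take n = 26·a inputs with xi ∈ {−1, +1}). A minimal sketch of such a coding, assuming a fixed number of rows and bipolar −1/+1 values (the row count and function name are illustrative, not the authors' implementation):

```python
import numpy as np

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def letter_string_image(word: str, rows: int = 14) -> np.ndarray:
    """Code a letter string as a rows x 26 bipolar matrix:
    row i has +1 in the column of the i-th letter, -1 elsewhere."""
    img = -np.ones((rows, len(ALPHABET)), dtype=int)
    for i, ch in enumerate(word.upper()[:rows]):
        col = ALPHABET.find(ch)
        if col >= 0:                      # ignore non-letter characters
            img[i, col] = 1
    return img

img = letter_string_image("HORIZONTAL")
```

Each row of the resulting matrix marks exactly one letter position, so two letter strings can be compared bit by bit with a Hamming distance.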
Fig. 2. Illustration of a command recognition cycle (diagram omitted; it traces the spoken command 'move the laser processing head of the machine in a horizontal direction along a line by a position' through the letter string analysis, word analysis, word recognition, command syntax analysis, command segment analysis and command recognition modules, matching the command class 'move laser processing tool horizontally along linear path by position' at Hamming distance DH = 2, against DH = 11 for the class 'move laser processing tool forward to next position')
Fig. 3. Evolvable fuzzy neural networks for word and command recognition (diagram omitted; both networks are three-layer Hamming networks with a binary distance layer, a MAXNET recursive layer and an output layer classifying to p pattern classes C1...Cp, fed with binary images of commands (n = a·b inputs) or words (n = 26·a inputs) with xi ∈ {−1, +1}; the input pattern z and the weight matrix W are encoded as chromosomes in the genotype G = {Chz, ChW} and optimized evolutionarily with a minimum Hamming distance fitness function F(G) = DHmin(z, wi) over a fuzzy connectionist structure)
Fig. 4. Selected strategies for evolution of the neural network for word recognition (diagram omitted; it shows the speech recognition output 'start emitting beam...' divided into letter segments, e.g. 'emitting' into segments 'emit'/'ting' against the stored segments 'emis'/'sion', binary coding of letter strings as input patterns, learning patterns comprising 26 letters, 104 word segments and 70 words as command components, a chromosome chz representing the presence of letters in a word, a minimum Hamming distance fitness function based on DHmin(z, wi), and an evolutionary algorithm indicating binary image rows to remove, with the rule: FOR analysis IF (zj = 0 AND wij = 1) THEN zj = 1)
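The evolutionary learning illustrated in Figs. 3 and 4 optimizes a chromosome-coded input pattern against the stored patterns using a minimum-Hamming-distance fitness. A toy sketch, assuming a simple (1+1) evolution strategy and a 1/(1 + DHmin) fitness (the paper only states that the fitness is based on DHmin(z, wi); the mutation scheme and all names here are illustrative):

```python
import random

def hamming(z, w):
    """Number of disagreeing positions between two bipolar vectors."""
    return sum(zi != wi for zi, wi in zip(z, w))

def fitness(z, patterns):
    """Higher when z is closer to its nearest stored pattern;
    1/(1+d) is used here to avoid division by zero."""
    return 1.0 / (1 + min(hamming(z, w) for w in patterns))

def evolve_input_pattern(z, patterns, generations=200, seed=0):
    """(1+1)-ES on the input chromosome: flip one bit, keep if no worse."""
    rng = random.Random(seed)
    best, best_f = list(z), fitness(z, patterns)
    for _ in range(generations):
        child = list(best)
        j = rng.randrange(len(child))
        child[j] = -child[j]              # single-bit mutation
        f = fitness(child, patterns)
        if f >= best_f:
            best, best_f = child, f
    return best
```

Starting from a noisy input, the strategy only accepts mutations that do not increase the distance to the nearest stored pattern, so the pattern is gradually repaired toward its class representative.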
The speech recognition engine is a continuous density mixture Gaussian Hidden Markov Model system. The speech signal is transmitted from a PDA to the Tablet PC and after a successful utterance recognition, a text command in natural language is sent to the PDA. Individual words treated here as isolated components of the text are processed with the letter string analysis module.
Fig. 5. Selected strategies for neural network evolution for command recognition (diagram omitted; it shows the command 'transmit emission beam through fiber for laser processing head' divided into five meaningful segments seg.1–seg.5, binary-coded over the vocabulary of 70 words as command components, analyzed with Hamming neural networks using command segment patterns, compared against 108 learning patterns of meaningful commands with a minimum Hamming distance fitness function, and classified to the meaningful command class 'transmit light beam through optical fiber to laser processing tool')
The letters grouped in segments are then processed by the word analysis module. The analyzed word segments are the inputs of the evolvable fuzzy neural network for word recognition. The network uses a training file containing word patterns and is trained to recognize words as command components, with each word represented by an output neuron. In the next stage, the recognized words are transferred to the command syntax analysis module, which uses command segment patterns: it analyzes commands, divides them into segments with regard to meaning, and codes them as vectors. These are sent to the command segment analysis module, which uses Hamming neural networks equipped with command segment patterns. The commands then become inputs of the command recognition module. This module uses a three-layer evolvable fuzzy Hamming neural network that either recognizes the command and finds its meaning or fails to recognize it (Fig. 2). Its neural network uses a training file containing patterns of all possible meaningful commands. The word and command recognition modules contain Hamming neural networks (Fig. 3) with evolvable architectures and evolvable learning. Selected strategies for evolving the weight matrix W of the neural network and the input pattern z, shown in abbreviated form in Figs. 4 and 5, increase the computational effectiveness of the recognition problems.
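The recognition principle shared by these modules, minimum Hamming distance classification with a MAXNET winner-take-all layer, can be sketched as follows (the tiny vocabulary, the bag-of-words coding and the inhibition constant are illustrative assumptions, not the system's actual patterns):

```python
import numpy as np

def hamming_net_classify(z, W):
    """Classify bipolar input z against stored pattern matrix W
    (one pattern per row) by minimum Hamming distance.
    Distance layer: similarity = number of agreeing bits;
    MAXNET: iterative lateral inhibition leaves one winner."""
    n = W.shape[1]
    a = (W @ z + n) / 2.0                 # agreeing bits per pattern
    eps = 1.0 / (2 * len(a))              # inhibition strength
    while np.count_nonzero(a > 0) > 1:    # MAXNET recursion
        a = np.maximum(0.0, a - eps * (a.sum() - a))
    return int(np.argmax(a))

# illustrative command patterns over a tiny word vocabulary
vocab = ["move", "head", "horizontally", "forward", "stop"]
def code(words):                          # bag-of-words bipolar coding
    return np.array([1 if w in words else -1 for w in vocab])

W = np.vstack([code({"move", "head", "horizontally"}),
               code({"move", "head", "forward"}),
               code({"stop"})])
winner = hamming_net_classify(code({"move", "horizontally"}), W)
```

Even with the word "head" missing from the input, the incomplete command is still classified to its nearest meaningful pattern, which is the behavior the paper relies on for commands with similar meanings but different wording.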
3 Command Effect Analysis and Safety Assessment
The effect analysis and safety assessment subsystems (Fig. 6) analyze the recognized human command. The technical safety of the manufacturing robot is checked by analyzing the state of execution of the commands that were required to have been completed, as well as the commands to be executed in subsequent decisions. The process parameters to be modified by executing the command are checked, and the allowable changes of the parameter values are determined. Both are input signals of the neural networks of the technical safety and process state estimation modules. The neural network classifiers, trained with a model of the manufacturing robot's operation, a model of the technological process and a model of process diagnostics, allow for estimating the level of safety of the recognized command as well as analyzing the status corresponding to hypothetical command execution. The new values of the process parameters are the input signals of these neural networks. An algorithm was created for assessing the technological safety of commands. The lines in Fig. 6 represent the dependence of laser power on the processing parameters for particular types of material. Based on the specified criteria, the laser processing power limit is determined for each type of material, and from the power limit the laser processing speed limit is assigned. For a human's command, if the requested increase in speed keeps the laser processing speed smaller than the speed limit determined from the power limit, then the command is safe to be executed.
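The safety rule above can be sketched as a simplified, crisp check (the actual system uses fuzzy membership functions of the laser processing power; the linear power-speed model, the limit values and all names below are illustrative assumptions):

```python
def speed_limit(material, power_limits, k):
    """Speed limit derived from the material's laser power limit,
    assuming (illustratively) power proportional to processing speed."""
    return power_limits[material] / k[material]

def command_is_safe(material, requested_speed, power_limits, k):
    """A command increasing the speed is safe while the resulting
    laser processing speed stays below the material's speed limit."""
    return requested_speed < speed_limit(material, power_limits, k)

# illustrative limits for two material types
P_limit = {"M1": 900.0, "M2": 650.0}      # power limits [W]
k = {"M1": 45.0, "M2": 50.0}              # [W per mm/s], assumed linear model
ok = command_is_safe("M1", 15.0, P_limit, k)   # 15 mm/s vs 20 mm/s limit
```

The same requested speed can therefore be safe for one material type and unsafe for another, which is why the limit curves in Fig. 6 are drawn per material.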
Fig. 6. Command effect analysis and safety assessment (diagram omitted; it shows laser processing power limits Plimit1 and Plimit2 for material types nr 1 and nr 2, with thresholds 3% above Plimit1 and Plimit2, the derived speed limits vlimit1 = v + Δv and vlimit2 for material types M1–M5, membership functions fa(P) of the laser processing power for the fuzzy sets of low, allowable and maximum power, membership functions of process parameter values as input values for the ANN, evolution of fuzzy logic membership functions and of fuzzy logic rules by iterative rule learning, e.g. 'IF x2 is medium AND x3 is low THEN y is low', with chromosome-coded rules and a fitness function F(ch) based on patterns, and a winner-takes-all (WTA) neural classifier)
Fig. 7. Implementation of the intelligent human-robot speech interface for mobile technology: A) designed 3D manufacturing robot model executing tasks; B) applications for a PDA and a Tablet PC written in the C++ programming language (screenshots omitted)
Fig. 8. Experimental results (plots omitted): A) speech recognition rate R [%] as a function of the number of learning patterns P, with the fitted curve R = 40·(2 − exp(−0.14·P)) + 12·(1 − exp(−2.35·P)) rising from the recognition level of about 40% without the learning patterns; B) speech recognition rate Rn [%] at noise power N from 58 to 82 dB, with the fitted curve Rn = 2604.6·N^(−0.08)
4 Experimental Results
The experimental dataset consisted of the learning files containing 70 training words and 108 meaningful training commands. The developed system (Fig. 7) was executed on Windows Mobile™ on a PDA (Pocket PC) and on a Tablet PC running Windows. The first test measured the performance of the speaker-independent speech recognition module implemented on a PC. As shown in Fig. 8A, the speech recognition module recognizes 85–90% of the speaker's words correctly; with more training of the module, accuracy rises to around 95%. For the research on command recognition at different noise power, the speaker used the PDA's internal microphone. As shown in Fig. 8B, the recognition performance is sensitive to background noise: the recognition rate is about 86% at 70 dB and 71% at 80 dB. Therefore, background noise must be limited while giving the commands.
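The fitted curve reported in Fig. 8A can be evaluated directly; a short sketch (the coefficients are taken from the figure as printed and should be treated as approximate):

```python
import math

def recognition_rate(p: float) -> float:
    """Fitted speech recognition rate R [%] as a function of the
    number of learning patterns P (coefficients from Fig. 8A)."""
    return 40.0 * (2.0 - math.exp(-0.14 * p)) + 12.0 * (1.0 - math.exp(-2.35 * p))

r0, r90 = recognition_rate(0), recognition_rate(90)
```

The curve starts at 40% with no learning patterns and saturates near 92% for large P, consistent with the reported 85–90% rate rising toward 95% with further training.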
5 Conclusions and Perspectives
The application of evolvable fuzzy neural networks allows for the recognition of commands in natural language with similar meanings but different lexico-grammatical patterns, and natural language will undoubtedly become the most important way of communication between humans and robots. The developed method is flexible and can be easily extended to further applications. The experimental results of the presented command recognition method show promising performance, and the method can be used for further development and experiments. A condition for the effectiveness of the presented system is equipping it with mechanisms of command meaning and effect analysis, as well as safety assessment. In automated production processes, the condition for safe communication between humans and robots is analyzing the state of the manufacturing robot and the process before the command is given, and using artificial intelligence for the technological effect analysis and safety assessment of the command.
References

1. Bosch, L., Kirchhoff, K.: Bridging the gap between human and automatic speech recognition. Speech Communication 49(5), 331–335 (2007)
2. O'Shaughnessy, D.: Speech Communications: Human and Machine. IEEE Computer Society Press, New York (2000)