Natural Language Human-Robot Interface Using Evolvable Fuzzy Neural Networks for Mobile Technology

Wojciech Kacalak and Maciej Majewski

Koszalin University of Technology, Faculty of Mechanical Engineering, Raclawicka 15-17, 75-620 Koszalin, Poland
[email protected], [email protected]

Abstract. In this paper, a human-robot speech interface for mobile technology is described which consists of intelligent mechanisms for human identification, speech recognition, word and command recognition, command meaning and effect analysis, command safety assessment, process supervision, and human reaction assessment. A review of selected issues is carried out with regard to the recognition and evaluation of speech commands in natural language using hybrid neural networks. The paper presents experimental results of automatic recognition and evaluation of spoken commands of a manufacturing robot model simulating the execution of laser processing tasks in a virtual production process.

Keywords: Human-robot interaction, Voice communication, Speech interface, Artificial intelligence, Hybrid neural networks, Mobile technology.

1 Introduction

The advantages of intelligent human-robot speech communication for mobile technology in a production process include the following:

– More robustness against human errors and more efficient supervision of the process at the chosen level of supervision automation [1].
– Improvement of the co-operation between a human and a robot with respect to the richness of communication [2].
– Achievement of a higher level of organization of a technological process equipped with an intelligent two-way speech communication system, which is relevant for its efficiency and for the humanization of production.
– Technological decision and optimization systems can be elements remote from the technological process; there is no need for a human to be present at the work station of the manufacturing robot.

Commands produced in continuous speech are processed to text on a Tablet PC after identification and authentication of the human by the mobile system, shown in abbreviated form in Fig. 1. The recognized text is analyzed by the meaning analysis subsystem, which performs word and command recognition. This

D.-S. Huang et al. (Eds.): ICIC 2009, LNCS 5754, pp. 480–489, 2009. © Springer-Verlag Berlin Heidelberg 2009

[Figure 1 diagram: the architecture comprises modules for human authentication and biometric identification, voice-to-text processing, letter string, word, and command analysis, word and command recognition using evolvable fuzzy neural networks, command syntax analysis, command correctness and safety assessment using fuzzy logic, technical safety and process state estimation using neural networks, human reaction assessment, command execution, technological process supervision, and text-to-voice messaging, connected in a human-robot communication cycle between the Tablet PC, the PDA, and the laser processing robot.]

Fig. 1. Architecture of an intelligent human-robot speech communication system for mobile technology in a production process


is sent to the effect analysis subsystem, which analyzes the status corresponding to the hypothetical command execution and subsequently estimates technical safety and the process state. The command is also sent to the safety assessment subsystem, which assigns the command to the correct category and makes corrections. The command execution subsystem assesses the reactions of the human and defines new parameters for the process. The manufacturing robot subsystem is composed of a laser processing robot, as well as modules for technological process supervision and for signaling the process state and the causes of inaccuracy. The speech communication subsystem produces voice messages to the human.

2 Automatic Command Recognition

In the automatic recognition process of commands in natural language, the speech signal is processed to text and numeric values with the module for processing voice commands to text on a Tablet PC.

[Figure 2 diagram: an example command recognition cycle. The speech recognition output 'move the laser processing head of the machine in a horizontal direction along a line by a position' is reduced by the letter string and word analysis modules, recognized word by word, divided into segments by the command syntax analysis module, and classified by the command recognition module; the Hamming distance is DH = 2 to the class representing the command 'move laser processing tool horizontally along linear path by position' and DH = 11 to the class representing the command 'move laser processing tool forward to next position'.]

Fig. 2. Illustration of a command recognition cycle

[Figure 3 diagram: the word and command recognition networks are 3-layer Hamming networks with a binary distance layer, a MAXNET recursive layer, and an output layer classifying to word or command classes. Words are coded as binary images with 26 columns (letters A-Z) and a rows; commands are coded as binary images of words arranged in segments, giving n = a·b network inputs with xi ∈ {−1, +1}. Both the architecture and the learning are evolvable: the genotype G = {Chz, ChW} encodes the input pattern z and the weight matrix W as chromosomes, and the fitness function F(G) = DHmin(z, wi) is the minimum Hamming distance between the input and the p class patterns.]

Fig. 3. Evolvable fuzzy neural networks for word and command recognition


[Figure 4 diagram: example of word recognition for the speech recognition output 'start emitting beam'. The word 'emitting' is divided by the letter string analysis module into segments 'emi' and 'ing', binary coded as an input pattern, and compared segment by segment against word patterns such as 'emission' by the Hamming networks of the word analysis module. Learning patterns: 26 letters, 104 word segments, 70 words as command components. The chromosome chz represents the presence of the same letters in a word, e.g. chz = [11111100000000000000000000]; the evolutionary algorithm indicates binary image rows to remove, with the minimum Hamming distance fitness function F(chz) = DHmin(z, wi) and the rule: FOR analysis IF (zj = 0 AND wij = 1) THEN zj = 1, followed by the initialization (implementation of evolutionary ANN weights) and classification phases.]

Fig. 4. Selected strategies for evolution of the neural network for word recognition

The speech recognition engine is a continuous-density mixture-Gaussian hidden Markov model system. The speech signal is transmitted from a PDA to the Tablet PC, and after a successful utterance recognition, a text command in natural language is sent back to the PDA. Individual words, treated here as isolated components of the text, are processed with the letter string analysis module.
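As Figs. 3 and 4 indicate, each word handled by the letter string analysis module is coded as a binary image with one row per letter position and 26 columns (A-Z), a single 1 marking the letter in each row. A minimal Python sketch of this coding (the function name is illustrative, not from the paper):

```python
import string

ALPHABET = string.ascii_uppercase  # columns A..Z, as in Figs. 3 and 4

def word_to_binary_image(word):
    """Code a word as a binary image: one row per letter position,
    26 columns (A-Z), with a single 1 in the letter's column."""
    image = []
    for letter in word.upper():
        row = [0] * len(ALPHABET)
        row[ALPHABET.index(letter)] = 1
        image.append(row)
    return image

img = word_to_binary_image("EMIT")
# Row 0 codes 'E': its single 1 sits in column index 4.
```

Flattening such an image row by row yields the n = 26·a input vector of the Hamming network (with 0 mapped to −1 for the ANN, as noted in Fig. 3).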


[Figure 5 diagram: example of command recognition for 'transmit emission beam through fiber for laser processing head'. The command syntax analysis module divides the command into five segments, which the command segment analysis module codes and compares against command segment patterns using Hamming neural networks. Learning patterns: 108 meaningful commands, 70 words as command components. The chromosome chz is an individual representing the presence of words in a command as a 70-bit vector; the evolutionary algorithm indicates binary image rows to remove, with the minimum Hamming distance fitness function F(chz) = DHmin(z, wi) and the rule: FOR analysis IF (zj = 0 AND wij = 1) THEN zj = 1. After the initialization and classification phases, the command is classified to the meaningful command class, here 'transmit light beam through optical fiber to laser processing tool'.]

Fig. 5. Selected strategies for neural network evolution for command recognition


The letters grouped in segments are then processed by the word analysis module. The analyzed word segments are the inputs of the evolvable fuzzy neural network for recognizing words. The network uses a training file that also contains the words themselves and is trained to recognize words as command components, with each word represented by an output neuron. In the next stage, the recognized words are transferred to the command syntax analysis module, which uses command segment patterns. It analyzes commands, divides them into segments with regard to meaning, and codes the commands as vectors. These are sent to the command segment analysis module, which uses Hamming neural networks equipped with command segment patterns. The commands then become the inputs of the command recognition module. This module uses a 3-layer evolvable fuzzy Hamming neural network that either recognizes a command and finds its meaning or fails to recognize it (fig. 2). The neural network of this module uses a training file containing patterns of the possible meaningful commands. The word and command recognition modules contain Hamming neural networks (fig. 3) which feature evolvable architectures and learning. Selected strategies for the evolution of the matrix of neural network weights W and the input pattern z, shown in abbreviated form in Figs. 4 and 5, increase the computational effectiveness of the recognition problems.
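The core operation of these Hamming networks is selecting the stored pattern with the minimum Hamming distance to the input, as in the DH = 2 example of Fig. 2. A plain nearest-pattern search can stand in for the network's binary distance layer and MAXNET winner-takes-all layer; this is a sketch with illustrative names and toy 5-bit patterns, not the paper's 70-component coding:

```python
def hamming_distance(z, w):
    """Number of positions in which two binary patterns differ."""
    return sum(zi != wi for zi, wi in zip(z, w))

def classify(z, patterns):
    """Return the class whose stored pattern has the minimum
    Hamming distance to the input z (winner takes all)."""
    return min(patterns, key=lambda name: hamming_distance(z, patterns[name]))

# Toy pattern set standing in for the command classes:
patterns = {
    "move":     [1, 0, 0, 1, 0],
    "transmit": [0, 1, 1, 0, 1],
}
z = [1, 0, 0, 0, 0]  # DH = 1 to "move", DH = 4 to "transmit"
assert classify(z, patterns) == "move"
```

The evolutionary strategies of Figs. 4 and 5 reduce the cost of this search by pruning rows of the binary images before the distances are computed.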

3 Command Effect Analysis and Safety Assessment

The effect analysis and safety assessment subsystems (fig. 6) analyze the recognized human command. The technical safety of the manufacturing robot is checked by analyzing the state of execution of the commands that were required to have been completed, as well as of the commands to be executed in subsequent decisions. The process parameters that would be modified by executing the command are checked, and the allowable changes of the parameter values are determined. Both are input signals of the neural networks of the technical safety and process state estimation modules. The neural network classifiers, trained with a model of the manufacturing robot's work, a model of the technological process, and a model of process diagnostics, allow estimation of the level of safety of the recognized command as well as analysis of the status corresponding to hypothetical command execution. The new values of the process parameters are the input signals of the neural networks. An algorithm was created for assessing the technological safety of commands. In Fig. 6, the lines represent the dependence of laser power on the processing parameters for particular types of material. Based on the specified criteria, the laser processing power limit is determined for each type of material, and from the power limit the laser processing speed limit is assigned. If the increase in speed requested in the human's command keeps the laser processing speed below the speed limit determined from the power limit, then the command is safe to be executed.
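The speed-limit check at the end of this algorithm can be sketched as follows. The per-material power limit, the power-speed relation, and all numeric values here are hypothetical placeholders; in the paper these are derived from the material curves and fuzzy membership functions of Fig. 6:

```python
def speed_limit(power_limit, k):
    """Hypothetical linear power-speed relation vlimit = Plimit / k for a
    material type; the paper reads vlimit off the per-material curve."""
    return power_limit / k

def command_is_safe(current_speed, requested_increase, power_limit, k):
    """A command is safe if the resulting speed v + dv stays below the
    speed limit determined by the laser processing power limit."""
    return current_speed + requested_increase < speed_limit(power_limit, k)

# Placeholder material: power limit 800 W, k = 100 W per (mm/s), so vlimit = 8 mm/s.
assert command_is_safe(5.0, 2.0, 800, 100)       # 5 + 2 = 7 < 8: safe
assert not command_is_safe(5.0, 4.0, 800, 100)   # 5 + 4 = 9 >= 8: rejected
```

A rejected command would be passed to the human error signaling and command correction modules rather than executed.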

[Figure 6 diagram: laser processing power P [W] versus processing speed v [mm/s] for material types M1-M5, with power limits Plimit1 and Plimit2 (and 3% margins above them) for material types nr 1 and nr 2, and the corresponding speed limits vlimit1 and vlimit2; for a commanded increase Δv, the resulting speed is checked against vlimit1 = v + Δv. Fuzzy membership functions fa(P) of the laser processing power distinguish low, allowable, and maximum power. Membership functions of process parameter values of fuzzy sets serve as input values for the ANN with a WTA (Winner Takes All) layer; evolution of the fuzzy logic membership functions and iterative rule learning produce the ultimate rule set, with rules of the form: IF x2 is medium AND x3 is low THEN y is low, encoded in chromosomes with the fitness function F(ch) based on patterns.]

Fig. 6. Command effect analysis and safety assessment

Fig. 7. Implementation of the intelligent human-robot speech interface for mobile technology: A) designed 3D manufacturing robot model executing tasks, B) written applications for a PDA and Tablet PC in the C++ programming language

[Figure 8 plots: A) speech recognition rate R [%] versus the number of learning patterns P (curves 1-8), with the fitted curve R = 40·(2 − exp(−0.14·P)) + 12·(1 − exp(−2.35·P)) and a lower recognition level without using the learning patterns; B) recognition rate Rn [%] versus noise power N [dB] over the range 58-82 dB (curves 1-5), with the fitted curve Rn = 2604.6·N^(−0.08) and a 95% reference level.]

Fig. 8. Experimental results: A) Speech recognition rate; B) Speech recognition rate at different noise power
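The fitted curve printed in Fig. 8A can be evaluated directly; the following sketch is a reading of that printed fit (decimal commas converted to points), not code from the paper:

```python
import math

def recognition_rate(p):
    """Fitted speech recognition rate R [%] from Fig. 8A as a
    function of the number of learning patterns P."""
    return 40 * (2 - math.exp(-0.14 * p)) + 12 * (1 - math.exp(-2.35 * p))

# Both exponentials vanish for large P, so the fitted curve
# saturates at 40*2 + 12 = 92%.
for p in (10, 30, 90):
    print(p, recognition_rate(p))
```

The saturating form reflects the behavior reported in Section 4: accuracy improves with additional learning patterns and then levels off.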

4 Experimental Results

The experimental dataset consisted of the learning files containing 70 training words and 108 meaningful training commands. The developed system (fig. 7) was executed under Windows Mobile on a PDA (Pocket PC) and under Windows on a Tablet PC. The first test measured the performance of the speaker-independent speech recognition with the module implemented on a PC. As shown in Fig. 8A, the speech recognition module recognizes 85-90% of the speaker's words correctly; as the module receives more training, accuracy rises to around 95%. For the research on command recognition at different noise powers, the microphone used by the speaker was the PDA's internal microphone. As shown in Fig. 8B, the recognition performance is sensitive to background noise: the recognition rate is about 86% at 70 dB and 71% at 80 dB. Therefore, background noise must be limited while the commands are given.

5 Conclusions and Perspectives

The application of evolvable fuzzy neural networks allows the recognition of natural language commands with similar meanings but different lexico-grammatical patterns, which will undoubtedly be the most important way of communication between humans and robots. The developed method is flexible and can easily be extended to further applications. The experimental results of the presented command recognition method show promising performance and can serve as a basis for further development and experiments. A condition for the effectiveness of the presented system is equipping it with mechanisms for command meaning and effect analysis, as well as safety assessment. In automated production processes, the condition for safe communication between humans and robots is analyzing the state of the manufacturing robot and the process before the command is given, and using artificial intelligence for technological effect analysis and safety assessment of the command.

References

1. Bosch, L., Kirchhoff, K.: Bridging the gap between human and automatic speech recognition. Speech Communication 49(5), 331–335 (2007)
2. O'Shaughnessy, D.: Speech Communications: Human and Machine. IEEE Computer Society Press, New York (2000)
