
High-dimensional neural-network artificial intelligence capable of quick learning to recognize a new smell, and gradually expanding the database

Ivanov A.I.
Laboratory of biometric and neural network technologies, Penza Research Electrotechnical Institute (PNIEI), Penza, Russia, [email protected]

Kulagin V.P., Kuznetsov Y.M., Chulkova C.M.
National Research University Higher School of Economics, Moscow, Russia, [email protected]

Ivannikov A.D.
The Institute for Design Problems in Microelectronics of the Russian Academy of Sciences, Moscow, Russia, [email protected]

Abstract- We demonstrate that classical quadratic forms are not able to solve the problem of recognizing high-dimensional images. The "deep" Galushkin-Hinton neural networks can solve the problem of high-dimensional image recognition, but their training has exponential computational complexity, so it is technically impossible to train and retrain a "deep" neural network rapidly. For mobile "artificial nose" systems we propose to employ a number of "wide" neural networks trained in accordance with GOST R 52633.5-2011. This standardized learning algorithm has linear computational complexity: for each new smell image, about 0.3 seconds is sufficient for creating and training a new neural network with 2048 inputs and 256 outputs. This makes possible the rapid training of the "artificial nose" artificial intelligence and a gradual expansion of its database to 10 000 or more trained artificial neural networks.

Keywords- architecture of neuron networks, recognizing high-dimensional images, exponential computational complexity, mobile "artificial nose" systems

Fig. 1. Examples of distribution functions of a "bad" parameter p(ν₁) and a "good" parameter p(ν₂) against the distribution function of "strangers" p(ξ).

I. THE PROBLEM OF HIGH-DIMENSIONAL PROCESSING OF A BIG AMOUNT OF "BAD" DATA INSTEAD OF A SMALL AMOUNT OF EXTREMELY "GOOD" DATA

The main task of natural and artificial intelligence is to reduce the entropy of the input data to an acceptable level. Natural images of objects to be identified usually have quite a few parameters. One distinguishes "bad" parameters with high entropy and "good" parameters with low entropy. Referring to fig. 1, p(ν₁) is the distribution function of a "bad" parameter, and p(ν₂) is the distribution function of a "good" parameter. The entropy of these parameters can be computed once a primitive decision rule is developed for each of them. Then we can calculate the probability of second-order errors (mistaking a "stranger" for a "familiar"), and from it the informative value of the parameter:

$I(\nu_1) = -\log_2\left(P_2(p(\nu_1))\right)$  (1)

and a "good" parameter

p(�) . P, (p(vJ)

=

p, . j

=

(p(v2)) 1

O'(�)

v ) p( 1) against the distribution function of "strangers"

k2

.J21[ f exp 21f

{- (E(0 -vy } 2· 0' 2 (�)

kl

(2),

dVj

where k₁ and k₂ are the lower and upper thresholds of the rule, E(ξ) is the expectation value of the "strangers" parameter, and σ(ξ) is its standard deviation. The higher the informative value of a parameter, the better. For an ordinary person, an acceptable informative value is approximately 7 bits (P₂ ≈ 0.01). For the situation depicted in fig. 1, I(ν₁) = 0.27 bits and I(ν₂) = 3.6 bits, i.e. the two parameters differ in informative value by roughly a factor of 10. This means that to reach an acceptable level of informative value one needs two or more "good" parameters, or 25 or more "bad" parameters. Formally, we can always use 25 "bad" parameters instead of 2 "good" parameters. It is even better to use all the parameters: the "good" ones, which are few, and the "bad" ones, which are hundreds of times as many.
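To make equations (1) and (2) concrete, the following minimal Python sketch computes the informative value of a single parameter under a Gaussian "strangers" distribution. The thresholds and moments are illustrative assumptions, not values taken from fig. 1.

```python
# A minimal sketch of equations (1)-(2): the informative value of one parameter,
# assuming a Gaussian "strangers" distribution. Thresholds k1, k2 and the
# moments e_xi, s_xi are illustrative, not taken from the paper.
import math

def stranger_hit_probability(k1: float, k2: float, e_xi: float, s_xi: float) -> float:
    """P2 of eq. (2): probability that a 'stranger' falls inside [k1, k2]."""
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - e_xi) / (s_xi * math.sqrt(2.0))))
    return cdf(k2) - cdf(k1)

def informative_value(p2: float) -> float:
    """I of eq. (1): informative value in bits, I = -log2(P2)."""
    return -math.log2(p2)

# A "bad" parameter: the rule's corridor covers most of the stranger mass.
p2_bad = stranger_hit_probability(k1=-1.5, k2=1.5, e_xi=0.0, s_xi=1.0)
# A "good" parameter: the corridor sits far out in the stranger's tail.
p2_good = stranger_hit_probability(k1=2.0, k2=4.0, e_xi=0.0, s_xi=1.0)

print(f"I(bad)  = {informative_value(p2_bad):.2f} bits")   # a fraction of a bit
print(f"I(good) = {informative_value(p2_good):.2f} bits")  # several bits
```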


II. PROBLEMS OF CLASSICAL MULTIDIMENSIONAL STATISTICS

If the relevant parameters are independent, their overall informative value is easy to determine: it is sufficient to sum the individual informative values. If, however, the parameters are not independent (if they correlate), then one has to use quadratic forms. In general, a quadratic form is written as follows:

$\varepsilon^2 = (E(\nu) - \nu)^T \cdot [R]^{-1} \cdot (E(\nu) - \nu)$  (3)

where E(ν) is the vector of expectation values of a few hundred parameters of the sample, ν is the vector of parameters of the sample to be identified, normalized to unit standard deviation σ(ν) = 1, and [R]⁻¹ is the inverse correlation matrix of a few hundred (or thousand) controlled parameters of the sample to be recognized. The higher the level of correlation of the data, the lower their overall informative value. For biometric data one typically has a correlation level of about 0.3 [1, 2]. Such a level of correlation leads to about a 10-fold reduction of the overall informative value:

$I(\nu_1, \nu_2, \nu_3, \ldots, \nu_n) \approx \frac{n}{10} \cdot E(I(\nu_i))$  (4)

The smell of a human being is unique, so it can be taken as a biometric image. This means that to achieve an acceptable level of informative value, instead of 25 independent "bad" biometric parameters of the smell, one will have to use 250 correlated "bad" biometric parameters of smell. Computing a 250×250 correlation matrix is not difficult technically. It is difficult, however, to invert it for use in equation (3). For biometric data, for example vocal data, it is possible to invert a 10×10 correlation matrix (with an LPC10 vocoder) and a 12×12 correlation matrix (with an LPC12 vocoder). There are no data on correct inversion of higher-order matrices. Thus, it is impossible to speak about correct inversion of 250×250 matrices.
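The difficulty of inverting [R] in equation (3) is easy to reproduce numerically. The sketch below, with an assumed sample of only 20 examples of 250 parameters (our illustrative choice), shows that the empirical correlation matrix is rank-deficient, so [R]⁻¹ cannot be computed reliably.

```python
# A sketch of the inversion problem behind equation (3): a 250x250 correlation
# matrix estimated from only 20 examples (an assumed, realistically small
# sample) has rank far below 250, so it is numerically singular and any
# computed [R]^-1 is unreliable.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_params = 20, 250

samples = rng.normal(size=(n_samples, n_params))  # 20 examples of 250 parameters
R = np.corrcoef(samples, rowvar=False)            # 250x250 empirical correlation matrix

print("rank(R) =", np.linalg.matrix_rank(R))      # about 19..20, far below 250
print("cond(R) =", np.linalg.cond(R))             # astronomically large: R is singular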

III. USE OF DEEP NEURAL NETWORKS WITH EXPONENTIAL COMPUTATIONAL COMPLEXITY OF THE TRAINING ALGORITHM

Fig. 2. Two types of architecture of neuron networks: a "deep" Galushkin-Hinton network of neurons (left) and a "wide" neuron network (right).

The use of large artificial neuron networks is the obvious alternative to the limited capabilities of linear algebra (multidimensional mathematical statistics). In particular, one can use multilayer ("deep") neuron networks, first proposed by our compatriot A.I. Galushkin in 1974 [3]. The architecture of a "deep" neuron network is given in the left panel of fig. 2. For quite a long time scientists did not know how to train "deep" neuron networks; a breakthrough was made in the early 2000s by Geoffrey Hinton [4]. Nowadays, networks containing up to 10 layers of neurons are used to identify people's faces in digital photographs. The same "deep" networks recognize voice commands, classify pictures on the Internet, track viruses and read car registration numbers. The main problem of using the "deep" Galushkin-Hinton neuron networks is the high computational complexity of their training. Hinton trained his "deep" networks using a few tens of servers with a database of about 100 000 example images. Typically, "deep" neuron networks have up to 1 000 000 connections, whose states can be found within several hours of collective computation. If a single regular machine is used, training takes about 40 days of continuous operation. As a result, it is impossible to create the artificial intelligence of an artificial nose on the basis of "deep" Galushkin-Hinton neuron networks. If this path is taken, it is impossible to rapidly teach an artificial nose a new smell in a new environment of smells. It is only possible to realize a mobile system with a fixed database, for which ageing (partial or complete failure of even a single sensitive element) of the solid-state gas-sensitive matrix is a catastrophe.

IV. USE OF WIDE NEURAL NETWORKS WITH LINEAR COMPUTATIONAL COMPLEXITY OF TRAINING

A solution to the problem of the divergent characteristics of the sensitive elements of solid-state gas-sensitive matrices and their degradation is the use of "wide" neuron networks of large dimensions. An example of such a network is well known: it is used in the modeling environment "BioNeuroAutograph" [5]. The "wide" network of this product has 416 inputs and 256 outputs, and each neuron has 24 inputs randomly connected to the 416 input biometric parameters. The network is trained with the algorithm of GOST R 52633.5-2011 [6], which has linear computational complexity, as sketched below. The input is implemented with a mouse or any graphical pad operating in the mouse regime. The data are processed using the coordinates x(t) and y(t) of the tip of the pen while a symbol or a group of symbols is being drawn; the controlled parameters are produced through a Fourier transform of x(t) and y(t).
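A minimal sketch of single-pass training of one such "wide" network follows. It uses the dimensions of the "BioNeuroAutograph" example (416 inputs, 256 neurons, 24 random connections per neuron), but the statistics-based weight formula and the zero-threshold neuron outputs are simplifying assumptions in the spirit of GOST R 52633.5-2011, not the literal standardized procedure.

```python
# A simplified single-pass training sketch in the spirit of GOST R 52633.5-2011
# (illustrative, not the literal standardized procedure). Dimensions follow the
# "BioNeuroAutograph" example: 416 inputs, 256 neurons, 24 random connections each.
import numpy as np

N_INPUTS, N_NEURONS, N_CONNECTIONS = 416, 256, 24

def train_wide_network(familiar, strangers, rng):
    """Weights come from sample statistics in one pass over the data,
    with no iterative error backpropagation (hence linear complexity)."""
    links = np.stack([rng.choice(N_INPUTS, size=N_CONNECTIONS, replace=False)
                      for _ in range(N_NEURONS)])      # random input wiring
    mu_f = familiar.mean(axis=0)                       # E(v | "Familiar")
    mu_s = strangers.mean(axis=0)                      # E(v | "Stranger")
    sd_s = strangers.std(axis=0) + 1e-12               # sigma(v | "Stranger")
    weights = (mu_f - mu_s)[links] / sd_s[links]       # per-connection weights
    return links, weights, mu_s

def respond(v, links, weights, mu_s):
    """256-bit output code: each neuron thresholds its weighted sum at zero."""
    sums = ((v - mu_s)[links] * weights).sum(axis=1)
    return (sums > 0).astype(np.uint8)

rng = np.random.default_rng(1)
strangers = rng.normal(0.0, 1.0, size=(256, N_INPUTS))  # ~256 "Stranger" examples
familiar = rng.normal(0.5, 0.3, size=(12, N_INPUTS))    # 8..20 "Familiar" examples

links, weights, mu_s = train_wide_network(familiar, strangers, rng)
codes = np.array([respond(f, links, weights, mu_s) for f in familiar])
print("stable code on 'Familiar':", bool((codes == codes[0]).all()))  # True here
```

Because every weight is obtained from sample means and deviations in a single pass, the cost grows linearly with the number of connections, which is what allows a new network to be created and trained for every new smell in fractions of a second.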


Geometrically, each of the 256 neurons of the "wide" network forms a dividing hyperplane in a 24-dimensional space. Any pair of biometric parameters will produce an elliptical distribution of the "Familiar" data, and the hyperplane will become a line. The training algorithm is organized so that none of the hyperplanes intersects the hyperellipse of the "Familiar" data. This situation is shown in fig. 3: getting inside the hyperellipse "Familiar" always leads to the same output code "c" of the neuro-network.

Getting inside the hyperellipse "Familiar" leads to a Hamming distance h = 0; any other distance indicates probable detection of the image "Stranger". The greater the Hamming distance, the higher the probability that the image "Stranger" has been detected. In other words, the neuro-network completely removes the entropy of the "Familiar" data, making it almost zero:

$H(\nu) \gg H(c) \approx 0.1 \text{ bits}$  (5)

The entropy of the data of the image "Familiar" is comparable to that of "Stranger":

$H(\nu) \approx H(\xi)$  (6)

However, the entropies of the code "Familiar" and of the code "Stranger" are not:

$H(c) \ll H(x) \approx 0.1 \cdot 256 = 25.6 \text{ bits}$  (7)

In fact, the biometric data of the image "Stranger" are mixed. In fig. 3 this situation is depicted as multiple intersections of the ellipse "Stranger" with the straight lines. Every section of the ellipse "Stranger" corresponds to its own value of the output code of the neuro-network. Thus, the "wide" neuron network can be viewed as an emulator of a multidimensional quadratic form in the space of Hamming distances:

$h = \sum_{i=1}^{256} c_i \oplus x_i$  (8)

where cᵢ is the i-th code position of the neuro-network trained to recognize the smell "Familiar", xᵢ is the i-th code position of the same network when it is affected by the smell "Stranger", and ⊕ is addition modulo 2.
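A small sketch of equation (8): the response code is compared with the reference code "c" by addition modulo 2 (XOR), and the number of mismatching positions is the Hamming distance. The codes here are synthetic.

```python
# A sketch of equation (8): Hamming distance between the reference code "c"
# and the code "x" produced under a foreign smell, via XOR over 256 positions.
import numpy as np

def hamming_distance(code_c: np.ndarray, code_x: np.ndarray) -> int:
    return int(np.bitwise_xor(code_c, code_x).sum())

rng = np.random.default_rng(2)
code_c = rng.integers(0, 2, size=256, dtype=np.uint8)  # code of the trained smell
code_x = code_c.copy()
print(hamming_distance(code_c, code_x))                # 0: inside the "Familiar" hyperellipse

flip = rng.choice(256, size=80, replace=False)         # a "Stranger" scrambles many bits
code_x[flip] ^= 1
print(hamming_distance(code_c, code_x))                # 80: probable "Stranger"
```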

V. OBTAINING A LARGE VOLUME OF DATA FROM A LIMITED NUMBER OF TRANSITION PROCESSES IN THE SENSORS OF THE SOLID-STATE GAS-SENSITIVE MATRIX

All currently existing solid-state gas-sensitive matrices have a small number of sensitive elements. Consider a matrix with 9 sensitive elements. In the analysis of the gases, at the outputs of the 9 sensitive elements one has a sequence of 9 transition processes:

$A(t), B(t), C(t), D(t), E(t), I(t), K(t), L(t), M(t)$  (9)

Each of the transition processes (9) is digitized as a sequence of 32 counts. To increase the number of counts, the transition processes are concatenated:

$x(t) = A(t) \,\|\, B(t) \,\|\, C(t) \,\|\, D(t)$
$y(t) = D(t) \,\|\, E(t) \,\|\, I(t) \,\|\, K(t)$  (10)
$z(t) = K(t) \,\|\, L(t) \,\|\, M(t) \,\|\, A(t)$


As a result we have three quite long transition processes x(t), y(t), z(t), each consisting of 128 counts. These processes can be Fourier-transformed to produce 64³ = 262 144 coefficients. Such a significant number of coefficients is redundant: for the subsequent neuro-network analysis it suffices to use the 2048 most significant coefficients of the 3-dimensional Fourier transform, as in the sketch below.

Fig. 3. A schematic of the neuro-network emulator of quadratic forms.
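The following sketch reproduces the bookkeeping of equations (9) and (10) on synthetic sensor readings. The paper does not specify how the 3-dimensional Fourier transform is assembled from x(t), y(t), z(t); the sketch assumes the 64³ coefficients are the outer product of the three one-sided 64-bin spectra and keeps the 2048 largest-magnitude ones.

```python
# A sketch of equations (9)-(10) and the spectral expansion. Sensor readings are
# synthetic placeholders; the outer-product construction of the 64^3 coefficient
# tensor is our assumption (the paper only states "3-dimensional Fourier transform").
import numpy as np

rng = np.random.default_rng(3)
# Nine transition processes A(t)..M(t), each digitized as 32 counts.
A, B, C, D, E, I, K, L, M = (rng.normal(size=32) for _ in range(9))

# Eq. (10): concatenation into three 128-count processes.
x = np.concatenate([A, B, C, D])
y = np.concatenate([D, E, I, K])
z = np.concatenate([K, L, M, A])

# 64 one-sided spectral bins per 128-count real signal.
Fx, Fy, Fz = (np.fft.rfft(s)[:64] for s in (x, y, z))

# 64^3 = 262 144 combined coefficients; keep the 2048 most significant.
coeffs = np.einsum('i,j,k->ijk', Fx, Fy, Fz).ravel()
top2048 = coeffs[np.argsort(np.abs(coeffs))[-2048:]]
print(coeffs.size, top2048.size)   # 262144 2048
```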

VI. SOLUTION OF THE PROBLEM OF CREATING THE DATABASE "STRANGER" BY CLONING THE FIRST IMAGE OF THE SMELL KNOWN TO THE SYSTEM

In order to use the training algorithm of GOST R 52633.5-2011 [6], it is necessary to have several examples of the image of the smell "Familiar" (8 to 20 examples) and about 256 examples of the image of the smell "Stranger". At the beginning of the process of training to recognize the first image, the embryo of the artificial intelligence must already have 256 examples of the image "Stranger" loaded; if this condition is not satisfied, the training is impossible. If the 256 examples of the image "Stranger" are not available to the embryo of the artificial intelligence, they can be obtained by "cloning" the first (and only) known image. Such "cloning" is implemented by shifting the distribution of the only image by ±ΔE(νᵢ), as shown in fig. 4. Fig. 4 shows that the standard deviation of a single biometric parameter is three times as small as the standard deviation of all biometric parameters. In order to clone the first biometric parameter of the image "Familiar", the shift of its expectation value by ΔE(ν₁) is used. To calculate the shift, it is necessary to know the probabilities of first- and second-order errors for adjacent images. If one sets P_EE = 0.01, then for a Gaussian distribution law:

$\Delta E(\nu_1) = 2.33 \cdot \sigma(\nu_1)$  (11)


Fig. 4. An example of "cloning" (red) of the image "Familiar" (dark blue) by shifting the data to the left and right by ΔE(ν₁) in a one-dimensional space.

It is clear that images cloned by the shift (11) are linearly distinguishable. In particular, one-dimensional versions of such images can be distinguished with a multilevel quantizer, whose steps are described as follows:

$k_j(\nu_1) = E(\nu_1) \pm j \cdot 2.33 \cdot \sigma(\nu_1), \quad j = 1, 2$  (12)

The sign in (12) is chosen according to whether the right or the left clones of the biometric data are being distinguished. For a system that controls 2048 parameters, the above method of cloning will produce 4²⁰⁴⁸ variants. This is a very large number, and using all possible variants is not reasonable; it is sufficient to make a random choice of 256 images "Stranger", as in the sketch below.
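A minimal sketch of the cloning rules (11) and (12): every parameter of the single known "Familiar" image is shifted by ±j · 2.33 · σ(νᵢ) with j = 1, 2, i.e. four possible clone positions per parameter, and 256 of the 4²⁰⁴⁸ possible variants are drawn at random. The moments used here are synthetic placeholders.

```python
# A sketch of the cloning rule (11)-(12): each "Stranger" example is synthesized
# by shifting the expectation of every parameter of the single known "Familiar"
# image by +/- j * 2.33 * sigma (j = 1, 2), i.e. 4 clone positions per parameter;
# 256 of the 4^2048 variants are drawn at random. Moments are synthetic.
import numpy as np

rng = np.random.default_rng(4)
n_params, n_clones = 2048, 256

e_v = rng.normal(size=n_params)        # E(v_i) of the "Familiar" image (placeholder)
s_v = np.full(n_params, 0.1)           # sigma(v_i): a single parameter's spread,
                                       # smaller than the all-image spread (fig. 4)

# Random choice among the four shift levels -2, -1, +1, +2 (times 2.33 * sigma).
levels = np.array([-2, -1, 1, 2])
shifts = rng.choice(levels, size=(n_clones, n_params))
strangers = e_v + shifts * 2.33 * s_v  # 256 cloned "Stranger" centres

print(strangers.shape)                 # (256, 2048)
```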

CONCLUSION

From what was said above it follows that replacing the "deep" Galushkin-Hinton neuron networks with "wide" networks allows rapid training of the artificial intelligence to recognize smells. Such a system has its own neuron network for each smell: during the training process, to each new smell there corresponds a new neuron network. Preliminary estimates show that the suggested technical solutions will allow recognition of 10 000 or more smells and their mixtures. When such large databases are used, one faces the problem of ordering them so as to increase the speed of searching for smells similar to the smell being recognized.

ACKNOWLEDGMENT

This work was conducted with the financial support of the Russian Federation represented by the Ministry of Education and Science of the Russian Federation (unique identifier RFMEFI60815X0003).

REFERENCES

[1] Iazov Iu.K. (ed.), Volchihin V.I., Ivanov A.I., Funtikov V.A., Nazarov I.G. Neirosetevaia zashchita personal'nykh biometricheskikh dannykh [Neural-network protection of personal biometric data]. Moscow: Radiotekhnika, 2012. 157 p. ISBN 978-5-88070-044-8.
[2] Akhmetov B.S., Ivanov A.I., Funtikov V.A., Beziaev A.V., Malygina E.A. Tekhnologiia ispol'zovaniia bol'shikh neironnykh setei dlia preobrazovaniia nechetkikh biometricheskikh dannykh v kod kliucha dostupa [Technology of using large neural networks to convert fuzzy biometric data into an access key code]. Monograph. Almaty, Kazakhstan: LEM, 2014. 144 p. Available at: http://portal.kazntu.kz/files/publicate/2014-06-27-11940.pdf
[3] Galushkin A.I. Sintez mnogosloinykh sistem raspoznavaniia obrazov [Synthesis of multilayer pattern recognition systems]. Moscow: Energiia, 1974.
[4] Hinton G.E. Training Products of Experts by Minimizing Contrastive Divergence. Gatsby Computational Neuroscience Unit, University College London, 2002.
[5] Ivanov A.I., Zakharov O.S. The modeling environment "BioNeiroAvtograf", a Russian-language software product created by the Laboratory of Biometric and Neural Network Technologies; available at the AO "PNIEI" site (http://пниэи.рф/activity/science/noc.htm) since 2009 for free use by universities of Russia, Belarus and Kazakhstan.
[6] GOST R 52633.5-2011. Information technology. Information security. Automatic training of neural network converters "biometrics to access code".