Machine learning methods in data fusion systems
Robert Nowak
Rafał Biedrzycki
Jacek Misiurewicz
Institute of Electronic Systems, Warsaw University of Technology, Warsaw, Poland
Email: [email protected]
Abstract—In heterogeneous, multisensor and multitarget data fusion systems the notion of “levels” is used to divide the complex problem of discovering relationships between objects into parts that are easier to understand. In this paper we consider classifiers as general feature generators: these algorithms are able to connect data from different sensors and different observations. The classifier raises the level of data abstraction, which simplifies the architecture of the following components in the data fusion chain. A data fusion engine named DAFNE uses the presented paradigm in its classifier module. The module was implemented in Python and C++; Naïve Bayesian and decision tree classifiers were used. Tests on simulated data show an improvement of data quality through fusion. The system design made it possible to attain real-time processing with a limited data volume.
Keywords: data fusion, machine learning, classifier, applications

I. INTRODUCTION

In a heterogeneous multisensor and multitarget environment data fusion algorithms are required, because a human operator is unable to handle the huge amount of raw data. Different types of sensors produce different sets of features of the observed object. These features are determined by the sensor type and by the local context (e.g. sensor location). The general objective of the fusion process is to gain knowledge about threats emerging in the observed environment. Unfortunately, producing such knowledge from data is hard to achieve using computational methods; there is no general artificial intelligence algorithm available. To make the problem easier, several models were proposed that divide the problem into smaller ones. One of the most popular, the revised Joint Directors of Laboratories (JDL) model [1], [2], incorporates five processes or levels: level 0 for preprocessing, level 1 for object refinement, level 2 for situation refinement, level 3 for threat refinement and level 4 for process refinement. Each process requires understanding of only some sensor data and gives results at a higher level of abstraction. Such decomposition helps to create systems that work correctly in a limited number of similar cases. In particular, track-level fusion (object refinement) algorithms process sensor data to interpret only object location (object position). As described in [3], tracking can be divided into four subprocesses: data alignment, data association, object estimation and object identity. If the position data can be identified in the sensor output and then transformed from different sensors to a common coordinate system, the objects
are extracted and tracked with high probability. Additional features (e.g. license plate numbers from plate readers) make it possible to increase the accuracy of tracking, but are not essential. Classifiers and pattern recognition algorithms known from the machine learning domain are used in fusion systems in the object identity process and in situation refinement and/or threat refinement [1], [4]. One possible approach to learning or adapting classifiers in a data fusion engine was presented in [5]. In this paper we consider classifiers as general feature generators. The classifiers are employed to link data from different sensors and different observations, when the feature values can be interpreted and converted to a common set of values. Similarly to tracking, feature fusion with a classifier requires data preprocessing (i.e. converting to a common set of values) and data association (using tracker associations and/or additional grouping algorithms). The classifier outputs simplify other processes by increasing the level of data abstraction. Additionally, classifiers are able to create models automatically from a set of examples, so they can discover previously unknown rules. Such models can be very complex and accurate.

II. MACHINE LEARNING METHODS

A typical classifier in the machine learning domain works on a given vector of discrete attributes that describe the object of interest. In a data fusion system not every vector element is available all the time; moreover, many sensors produce output that is hard to use directly in a classifier algorithm. Therefore, additional data processing is required. The classifiers are able to connect data from different sensors and different observations when the feature values can be interpreted and converted to a common set of values. The data interpretation and conversion, called feature fusion data alignment, consists of two processes: synonym finding and domain conversion. Synonym finding retrieves values with the same meaning from different sensor types; it requires semantic processing of the features used by the classifier, similarly to tracking, which requires knowledge of the formats of location features. Domain conversion reduces the number of different attribute values, as depicted in Fig. 1. This process is needed especially in data fusion systems, because real sensors often produce values in a continuous domain, while a typical classifier requires values from a finite set. In the presented system we use discretizers for this conversion; a discretizer is based
Figure 1. Discretization in feature fusion data alignment. The continuous signal is converted to discrete time (sampling) by the sensor, then to a finite set of values V = {v_1, v_2, v_3} (discretization) by the discretizer.
on a collection of disjoint intervals and returns the index of the interval containing a given value. In the presented approach the discretizer can be constructed by a human expert or automatically, using a collection of values from the given domain. The following methods of calculating the intervals from a set of training values are available:
∙ partition into K equal-length intervals;
∙ partition into K intervals based on the K-means clustering algorithm [6].
The data association process for feature fusion uses tracker associations to select the features describing a given object. At that step a preliminary fusion occurs – similar values from different sources (sensors) and values observed in a close period of time are grouped. Areas with different sensor coverage (e.g. in an urban scenario) make the collection of features very variable: some features at a given time-space point are available in many instances, others are not available at all. In the presented approach a preprocessing step was therefore introduced. At that step a sliding time window is used to group feature values and to calculate their min/max/median, as depicted in Fig. 2. This technique aggregates a number of values of a given feature into one value. On the other hand, it can reduce the number of missing attribute values. The introduced memory of historical values makes it possible to find and use (or eliminate) values which deviate from the mean. For example, we can calculate the median of the speed in the selected time window, which can prevent classifying a car stopped at a crossroads as a person when only the speed attribute is available.
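To make the interval-based discretization and the sliding-window aggregation described above concrete, the sketch below shows one possible Python formulation (an illustrative sketch only, not the DAFNE implementation; names such as IntervalDiscretizer and window_aggregate are hypothetical):

from statistics import median

class IntervalDiscretizer:
    """Illustrative interval-based discretizer; returns the index of the interval containing a value."""
    def __init__(self, boundaries):
        # boundaries = [b_0, b_1, ..., b_K]; interval i is (b_i, b_{i+1}]
        self.boundaries = sorted(boundaries)

    @classmethod
    def equal_length(cls, values, k):
        # Partition the observed value range into K equal-length intervals.
        lo, hi = min(values), max(values)
        step = (hi - lo) / k
        return cls([lo + i * step for i in range(k + 1)])

    def __call__(self, value):
        for i in range(1, len(self.boundaries)):
            if value <= self.boundaries[i]:
                return i - 1
        return len(self.boundaries) - 2  # clamp values above the last boundary

def window_aggregate(samples, t_now, w, agg=median):
    # samples: iterable of (time, value) pairs; keep t_now - w <= t <= t_now and aggregate.
    vals = [v for (t, v) in samples if t_now - w <= t <= t_now]
    return agg(vals) if vals else None

A K-means-based discretizer would differ only in how the interval boundaries are computed from the training values.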
The classifiers are able to produce information based on two types of associations: associations between observations, created by the tracking level of data fusion, and associations between different features in a given observation. The classifier algorithms carry out feature-level fusion using values from different sources and with different meanings, including values not used by the tracking module (e.g. width) and features calculated by the tracker (e.g. estimated speed or acceleration). This broad range of features is combined to produce a label. In the presented approach the Naïve Bayes (NB) and decision tree (DT) classifiers are used.

A Naïve Bayes classifier calculates the probability of each category using Bayes' theorem and the law of total probability for the attribute values, assuming attribute independence, as shown in Eq. (1), where C = {c_1, ..., c_m} is the finite set of discrete categories, x_j is the j-th attribute, and X_j is the finite set of the j-th attribute's values, called the j-th attribute domain:

P(c_i \mid x_1, \ldots, x_n) = \frac{P(c_i) \prod_{j=1}^{n} P(x_j \mid c_i)}{\sum_{c \in C} P(c) \prod_{j=1}^{n} P(x_j \mid c)}    (1)

where c_i ∈ C, x_1 ∈ X_1, ..., x_n ∈ X_n. The result of classification is the probability of each category, P(c | x_1, ..., x_n), or, in the simplest case, the category with the maximum probability, i.e. arg max_{c ∈ C} P(c | x_1, ..., x_n).
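A minimal Python sketch of Eq. (1) is given below (our own illustration, not the DAFNE C++ code; the dictionary-based data structures are assumptions). Missing attributes are skipped, which corresponds to using a conditional probability of 1, as discussed later:

def nb_classify(prior, cond, x):
    # prior: {category: P(c)}
    # cond:  {(attribute, value, category): P(x_j | c)}
    # x:     {attribute: value}; a value of None marks a missing attribute
    scores = {}
    for c, p_c in prior.items():
        p = p_c
        for attr, val in x.items():
            if val is None:
                continue  # missing attribute: factor of 1 in the product
            p *= cond.get((attr, val, c), 0.0)
        scores[c] = p
    total = sum(scores.values())
    if total > 0:
        scores = {c: p / total for c, p in scores.items()}  # normalization, as in Eq. (1)
    return scores  # P(c | x_1, ..., x_n); arg max over categories gives the single label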
Figure 2. Data association process for feature fusion. The tracker associations and a sliding time window are used to reduce the number of missing attribute values. For each feature used by the classifier the values are collected (e.g. {val_11, val_12, ...} for feature 1), then the values with observation time t such that t_c − w ≤ t ≤ t_c are selected, where t_c is the current time and w is the time window length. The selected values are used to calculate the median, minimum or maximum value, which is finally used as input for the classifier.
A decision tree classifier uses a tree structure, as depicted in Fig. 3. Each node v stores the probability of each category for the objects associated with v.
Figure 3. Decision tree data structure. Each node n contains a collection of probabilities for each category c ∈ C (called cat), cat_n = {(c_1, P_n(c_1)), (c_2, P_n(c_2)), ..., (c_m, P_n(c_m))}, for the objects that entered n. All internal nodes also contain a test that is performed on an example's attribute.
Therefore, the root node describes all objects and is the least precise description; the leaf nodes are the most precise descriptions, each describing very similar objects. In the presented approach a binary equality test is associated with each internal node. The decision tree structure is traversed from the root to a leaf using Alg. 1, where the next node is chosen based on the result of the test on the selected attribute of the given object e. Both types of classifiers can work with missing attribute values. The Naïve Bayesian approach assumes a conditional probability of 1 for such an attribute, i.e. the corresponding factor is not included in the product in Eq. (1). The decision tree classifier assumes a false test result (see Alg. 1) for a missing attribute value. The classifier rules can be created by an expert (without training) using the provided GUI. Additionally, training algorithms are available; in this case the expert only has to create a labeled set of examples, and the classifier rules are created automatically from this set.
Algorithm 1 Decision tree classification
procedure CLASSIFY(T, e)            ⊳ decision tree T, example e
    v ← root(T)
    while ¬v.isLeaf() do
        if v.test(e) then
            v ← v.leftChild
        else
            v ← v.rightChild
        end if
    end while
    return v.category
end procedure
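For illustration, Alg. 1 translates almost directly into Python (a sketch under assumed node fields; the actual module is implemented in C++):

class Node:
    # Assumed node layout: internal nodes hold a test and two children, leaves hold a category.
    def __init__(self, test=None, left=None, right=None, category=None):
        self.test = test          # callable(example) -> bool; None for a leaf
        self.left = left
        self.right = right
        self.category = category  # major category (or the full cat_n distribution)

    def is_leaf(self):
        return self.test is None

def classify(tree, example):
    v = tree
    while not v.is_leaf():
        # A test on a missing attribute returns False, as described in the text.
        v = v.left if v.test(example) else v.right
    return v.category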
The training helps to create models when no prior rules are known and/or to construct very complex and accurate models.

The training of the NB classifier is performed by an algorithm that estimates the probabilities in Eq. (1), i.e. the conditional probabilities P(x|c) and the category probabilities P(c) for each category c and attribute value x. These probabilities are estimated from frequencies counted on a given data set; additionally, Laplace smoothing is applied. In the presented approach P(x_j | c_i) ≈ (A_ij + 1) / (A_i + |X_j|), where x_j is the observed value, c_i is the given category, A_ij is the number of examples in the training set with value x_j and category c_i, A_i is the number of examples with category c_i, and |X_j| is the number of values of attribute j (the number of elements in the given attribute domain). The probability of a given category is estimated similarly, P(c_i) ≈ (A_i + 1) / (N + |C|), where N is the number of training examples and |C| is the number of categories.

The algorithm used to generate the decision tree, inspired by Iterative Dichotomiser 3 [7], is depicted in Alg. 2.

Algorithm 2 Building a decision tree using a set of labeled examples E (training examples)
procedure BUILDTREE(E)
    T ← generateAllTests(E)
    root ← BUILDNODE(T, E)
    return root
end procedure
procedure BUILDNODE(T, E)
    ĉ ← arg max_{c ∈ C} |{e ∈ E : e_c = c}|
    if |E| < E_m ∨ ∀_{c ∈ C : c ≠ ĉ} |{e ∈ E : e_c = c}| < E_m then
        return LeafNode(E)
    end if
    t ← selectBestTest(T, E)
    E_p ← {e ∈ E : t(e) = true}
    E_n ← {e ∈ E : t(e) = false}
    v ← InternalNode(t, E)
    v.leftChild ← BUILDNODE(T ∖ {t}, E_p)
    v.rightChild ← BUILDNODE(T ∖ {t}, E_n)
    return v
end procedure

In the presented solution it is a recursive procedure that builds the tree using a set of training examples E. If the number of examples from E supporting a given node is small (the E_m parameter), or only a few examples do not belong to the majority category, a leaf is created; otherwise an inner node is constructed. The inner node creation involves searching for the best test. In the presented method the test giving the best information gain is chosen. The information gain g is calculated as shown in Eq. (2),

g = \frac{en(E_p) + en(E_n) - en(E)}{\frac{|E_p|}{|E|}\log\frac{|E_p|}{|E|} + \frac{|E_n|}{|E|}\log\frac{|E_n|}{|E|}}    (2)

where E_p = {e ∈ E : t(e) = true}, E_n = {e ∈ E : t(e) = false},

en(E) = \sum_{c \in C} \frac{|\{e \in E : e_c = c\}|}{|E|} \log\frac{|\{e \in E : e_c = c\}|}{|E|},

t is the binary test, E is the example set supporting the given node, and c ∈ C is a category. The inner node test splits the examples E into two subsets E_p and E_n, which are used to recursively construct the left and right child nodes.
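As an illustration of the splitting criterion, the sketch below implements en(E) and the gain g as printed in Eq. (2) above (our own code with hypothetical helper names; a textbook ID3 would instead use a weighted sum of child entropies):

import math
from collections import Counter

def en(examples, category_of):
    # en(E) = sum over c of (|{e in E : e_c = c}| / |E|) * log(|{e in E : e_c = c}| / |E|)
    n = len(examples)
    counts = Counter(category_of(e) for e in examples)
    return sum((k / n) * math.log(k / n) for k in counts.values())

def gain(examples, test, category_of):
    # Information gain g of Eq. (2), taken literally from the equation as printed.
    e_p = [e for e in examples if test(e)]
    e_n = [e for e in examples if not test(e)]
    if not e_p or not e_n:
        return float("-inf")  # a test that does not split the set is useless
    n = len(examples)
    split = sum((len(s) / n) * math.log(len(s) / n) for s in (e_p, e_n))
    return (en(e_p, category_of) + en(e_n, category_of) - en(examples, category_of)) / split

def select_best_test(tests, examples, category_of):
    # selectBestTest from Alg. 2: keep the test with the best information gain.
    return max(tests, key=lambda t: gain(examples, t, category_of))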
After building the decision tree a pruning process is performed. Pruning reduces overfitting by reducing the size of the tree. A tree reduction is performed only if it does not increase the classifier error calculated on a set of labeled examples not used for training.

In the presented application the classifier outputs can be merged using the simplified Dempster–Shafer rule [8], as depicted in Eq. (3),

P(c_i) = \frac{\prod_{k=1}^{K} P_k(c_i)}{\sum_{i=1}^{|C|} \prod_{k=1}^{K} P_k(c_i)}    (3)

where P_k(c_i) is the result of the k-th classifier for category c_i, and K is the number of classifiers.

III. DAFNE SYSTEM

The DAFNE project was undertaken in an effort to create a platform for experimenting with fusion algorithms for a multisensor urban environment. DAFNE stands for “Distributed and Adaptive multisensor FusioN Engine”. The engine is accompanied by a realistic sensor data simulator [9] and an evaluation module [10], as depicted in Fig. 4. Thus, the DAFNE system as a whole constitutes a good testing platform which makes it possible to create almost-real data sets. The engine itself consists of database and tracking modules [11], a resource management module [12], a classifier module and a decision support module. The flexible design of the system, using a thin client and the HTTP protocol [13], allows modules to be exchanged without disturbing the system integrity. This allows for easy experimenting with different algorithms. The classifier, as part of DAFNE, uses the features included in the observations produced by the sensor simulator and the associations delivered by the tracker. The classifier module consists of a web-based user interface, an application programming web interface, a classifier server module, a persistence layer (database) and the classifier core, as depicted in Fig. 5. The main algorithms (NB classifier, DT classifier) are implemented in C++ due to its high performance.
Figure 4. Modules in the DAFNE system: user interface, fusion engine (simulator SIM, tracker TRK, classifier CLA, decision support DEC, database) and performance evaluation EVAL.
Figure 5. Classifier module structure (part of the DAFNE project): HTTP server and API, classifier server (Python) with scheduler and thread pool, classifier core (C++) with classifier handles for the Naive Bayesian and Decision Tree classifiers (commands, results), and a database (Postgres) storing classifier rules, results and training examples.
The shared library, built with the active object design pattern, allows time-consuming tasks (e.g. classifier learning) to be executed in a separate thread of execution. This library provides a Python application interface (including classifier handles, commands, results, etc.). The data alignment and data association processes are implemented in Python due to its flexibility. Classifier rules, created by experts or calculated from the example data, are stored in the database; the input and output data are stored there as well. The DAFNE system is time-triggered: the tracker periodically produces estimates of objects' features (position, speed), and each data update from the tracker undergoes the classification process to produce the features required by the decision support module.
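The role of the active object pattern can be pictured with a short sketch (purely illustrative; the real classifier core exposes its own API, which is not reproduced here):

from concurrent.futures import ThreadPoolExecutor

class ClassifierHandle:
    # Illustrative active-object-style wrapper: each command is submitted to a worker
    # thread and returns a future, so long-running training does not block the caller.
    def __init__(self, classifier, workers=1):
        self._classifier = classifier  # assumed to provide train() and classify()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def train(self, examples):
        return self._pool.submit(self._classifier.train, examples)

    def classify(self, observation):
        return self._pool.submit(self._classifier.classify, observation)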
IV. RESULTS

The system, including the classifier module, was tested in a multisensor urban scenario on simulated data. In the presented work an embassy protection scenario was used; a threat should be signaled when a suspected vehicle stops near the embassy. The scenario runs for 180 seconds, and 42 objects of different characteristics (vehicles, pedestrians) were observed by 100 sensors of different types, characteristics and locations (optical cameras, infrared cameras, radars, plate readers, hyperspectral imaging sensors), measuring position, temperature, color, size, license plate numbers and material (organic, metallic). This scenario was simulated by the DAFNE simulator; the number of readings from the sensors was 2958, of which 61.53% were generated for pedestrians. The classifier task was to add the label “vehicle” or “pedestrian” to each observation. The data used in the experiments is available at http://staff.elka.pw.edu.pl/~rbiedrzy/fusion.html.

The classifier error, in comparison to ground truth data, was measured. Unfortunately, the classifier error includes the tracker's bad associations, because in the presented solution the tracker associations are used to connect observations at the classifier input (data association process). In the case of the presented dataset the tracker gave 27.37% bad associations and 10.04% object id changes, where by a bad association we mean using an observation of plot (a real object seen by a sensor) x to update a track (an object created by the tracker to explain its input data) t generated for plot y, where x ≠ y. By an id change we mean a situation where track t for plot y is at some point continued for plot x.

All classifiers use the same discretizers for the “width” feature and for the “speed” feature; both were created by a human. The discretizer for the “width” feature is based on the fact that the maximum shoulder breadth for 95% of 20-year-old males in the USA is 0.45 m [14]. We added 20 cm to that value to compensate for clothing and measurement errors. Therefore, the discretizer intervals are defined as follows: (0, 0.65⟩ → 0; (0.65, 200⟩ → 1. The discretizer for the “speed” feature has 4 sections. It is known [15] that the average walking pedestrian has a speed of about 1.25 m/s (older) up to 1.51 m/s (younger). We decided to use two intervals that should fit pedestrians: (0, 1.43⟩ → 0; (1.43, 6⟩ → 1. The second interval should also suffice for a running person. The next interval, (6, 12.1⟩ → 2, can hold slow cars or very fast runners (the fastest 100 m run corresponds to a speed of about 12.1 m/s). The last interval, (12.1, 85⟩ → 3, corresponds to speeds that are attainable only by vehicles. The classifiers used in the following experiments were created by a human. The DT structure is depicted in Fig. 6 and the NB classifier probabilities are presented in Tab. I.
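The two hand-made discretizers described above can be written down directly from the listed interval boundaries (an illustrative sketch; the helper below is our own):

import bisect

def make_discretizer(boundaries):
    # Returns a function mapping a value to the index of the interval (b_i, b_{i+1}] containing it.
    def disc(value):
        return max(0, min(bisect.bisect_left(boundaries, value) - 1, len(boundaries) - 2))
    return disc

width_disc = make_discretizer([0.0, 0.65, 200.0])            # (0, 0.65] -> 0, (0.65, 200] -> 1
speed_disc = make_discretizer([0.0, 1.43, 6.0, 12.1, 85.0])  # walking, running, slow car, vehicle only

assert width_disc(0.5) == 0 and speed_disc(1.2) == 0 and speed_disc(20.0) == 3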
Figure 6. Human-made DT. The root node is at the top. Each internal node contains a test description. Each leaf node contains the name and probability of the major category. The first test (root node test) checks whether the object speed is 3 (very fast). If this is true, a leaf node follows and the classifier generates the feature “class” with value “vehicle” and probability 99%. If the first test returns “false”, the next test checks whether the speed is 2. If this is not true, additional checks on the material attribute are performed.
Table I
The category probabilities P(c) and conditional probabilities P(c|x) for the human-made NB classifier.

                     c = pedestrian   c = vehicle
P(c)                     0.61             0.39
P(c|width=0)             0.95             0.05
P(c|width=1)             0.05             0.95
P(c|material=0)          0.01             0.01
P(c|material=1)          0.01             0.98
P(c|material=2)          0.98             0.01
P(c|speed=0)             0.70             0.25
P(c|speed=1)             0.20             0.25
P(c|speed=2)             0.099            0.25
P(c|speed=3)             0.001            0.25
Table II
Confusion matrix and error rate (in %) of the NB classifier for different data association time window lengths.

time-window w[s]   p. as p.   v. as v.   v. as p.   p. as v.   error[%]
no data assoc.       1788       662        476         32        17.17
1                    1788       662        476         32        17.17
3                    1784       705        433         36        15.86
5                    1779       736        402         41        14.98
11                   1765       774        364         55        14.16
13                   1760       778        360         60        14.20
20                   1749       779        359         71        14.54
100                  1695       779        359        125        16.36
In the following tables with the results of the experiments we provide the classifier error rate and the confusion matrix, where the “p. as p.” column shows how many pedestrians were classified as pedestrians, “v. as v.” shows how many vehicles were classified as vehicles, etc. In the first experiment the influence of the feature fusion data association time window length (i.e. w in Fig. 2) on classifier accuracy was investigated. For this experiment we used the human-made NB classifier (shown in Tab. I) working only on the speed feature. As the aggregation function in the preprocessing step we selected the maximum, because this function is fast to compute and the differences between vehicle and pedestrian speeds are large, so a reasonable amount of noise does not change the classification result. The median was also used here; it is better if larger errors in the speed measurement are simulated. The results for selected lengths of the time window are presented in Tab. II. From that table we can see that enabling data association and setting the time window to more than 3 seconds improves the classifier accuracy. The best result was achieved for an 11-second time window. Unfortunately, the longer the time window, the more data there is to process, so a value of 5 seconds is recommended; it gives good classifier accuracy and good performance. This value is used in the other experiments. Table II shows that with the increase of the time window length the number of correctly classified pedestrians decreases.
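As a quick check of how the error rate in these tables follows from the confusion matrix (our own arithmetic): for the w = 5 s row of Tab. II, error = (v. as p. + p. as v.) / total = (402 + 41) / 2958 ≈ 14.98%, which matches the reported value.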
Table III
Confusion matrix and error rate (in %) of the NB classifier for different sets of attributes and different points of fusion.

input attributes     p. as p.   v. as v.   v. as p.   p. as v.   error[%]
speed                  1779       736        402         41        14.98
material               1802       440        698         18        24.21
width                  1196       545        593        624        41.14
speed, mat., width     1775       840        298         45        11.60
DS fusion              1800       448        690         20        24.00
tuned DS               1769       818        320         51        12.54
This is reasonable, because increasing the time-window length also increases the probability of observing noisy data. On the other hand, with the increase of the time-window length the number of correctly detected vehicles increases. This is connected with the simulated scenario, in which some vehicles stop after moving.

In the second experiment we compare the results of the human-made NB classifier (Tab. I) for different sets of attributes and different points of fusion, explained below. The maximum aggregation function was used for the “speed” feature and the median was used for “material” and “width”. The attribute-level fusion is performed by the NB classifier that uses three attributes (speed, material, width). The decision-level fusion was performed on the results of single-attribute classifiers by the Dempster–Shafer algorithm (DS fusion). The results are presented in Tab. III. In that table we can see that “speed” is the most informative attribute and the second best is “material”. The attribute-level fusion is better than the best single-attribute classifier (here based on the “speed” feature) and it is better than the decision-level fusion. An attempt to fuse the results of the individual classifiers at the decision level (DS fusion) was not successful at first, but after tuning the individual classifiers (tuned DS) the result was nearly as good as the attribute-level fusion. The tuning process consists of changing the probabilities in the single-attribute classifiers to force the classifier working on a less informative attribute to be less sure of its decisions than the classifier working on a more informative attribute. The attribute-level fusion outperforms the decision-level fusion not only in accuracy; it is also faster, and it is easier to create the fusing classifier by hand or to train it. Therefore, we decided to perform attribute-level fusion in our system.

In the third experiment a comparison between the human-made NB classifier and decision tree (DT) and their trained versions was performed. The training and testing processes were performed on the same data, because the aim was not to estimate the trained classifier's accuracy on unseen data, but to find out how good the human-made classifier is (common-sense knowledge was used). The results are presented in Tab. IV. It can be observed (Tab. IV) that the NB classifier created by a human (human NBC) has an error identical to its trained version. This shows that it is not easy to improve a common-sense-based classifier. On the other hand, training can be reasonable for problems where common sense is not enough and an expert is not available. It can also be observed that the human-made DT is better than its trained version and better than both NB classifiers.
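For reference, the decision-level (DS) fusion used in the second experiment applies the simplified Dempster–Shafer rule of Eq. (3) to the per-category outputs of the single-attribute classifiers. A minimal sketch, assuming each classifier returns a {category: probability} dictionary:

def ds_fuse(distributions):
    # Simplified Dempster-Shafer combination, Eq. (3): multiply the per-category
    # probabilities returned by the K classifiers and renormalize.
    categories = list(distributions[0].keys())
    fused = {c: 1.0 for c in categories}
    for dist in distributions:
        for c in categories:
            fused[c] *= dist[c]
    total = sum(fused.values())
    return {c: (p / total if total > 0 else 1.0 / len(fused)) for c, p in fused.items()}

# e.g. ds_fuse([{"pedestrian": 0.7, "vehicle": 0.3}, {"pedestrian": 0.2, "vehicle": 0.8}])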
Table IV
Confusion matrix and error rate (in %) for the hand-made DT and NBC and their trained versions.

classifier ver.   p. as p.   v. as v.   v. as p.   p. as v.   error[%]
human NBC           1775       840        298         45        11.60
trained NBC         1773       842        296         47        11.60
human DT            1765       891        247         55        10.21
trained DT          1771       835        303         49        11.90
Table V
Confusion matrix and error rate (in %) for a classifier that fuses information from intelligent sensors.

classifier ver.     p. as p.   v. as v.   v. as p.   p. as v.   error[%]
NBC ped. default      1820       1032       106          0        3.58
NBC veh. default      1616       1138         0        204        6.90
DS fused NB+DT        1785       1072        66         35        3.41
The success of the DT means that the categories are rectangularly separable in the considered attribute space. We believe that the success of the human-made DT is due to the human readability of the underlying model: it is much easier to transfer common-sense knowledge into a decision tree structure than into a set of probabilities for the NBC. The width attribute is used by neither the trained nor the human-made DT. The human-made DT uses only 5 binary tests; the trained one uses 6 binary tests.

In the next experiment the ability to fuse classes generated by intelligent sensors was checked. Let us assume that we have two kinds of intelligent sensors: one of them is able to detect pedestrians, the other is able to detect vehicles. One sensor therefore classifies observations into 2 classes (pedestrian, unknown) and the other into 2 classes (vehicle, unknown). In the considered scenario each type of sensor flags about 45% of the observations, so both types of sensors together flag about 90% of the observations (the number of pedestrians is similar to the number of vehicles). The remaining 10% of observations could be classified into one of the 2 possible categories, so we expect the resulting classifier error to be much less than 10%. The fusion of the two kinds of intelligent sensors is performed by a human-made NB classifier. The results are provided in Tab. V. From that table we can observe that using the majority category (pedestrians) for observations that were left unclassified by the sensors gives much better results than leaving them unclassified or classifying them into the minority category. The achieved result can be further improved by combining the sensors' NB classifier results with the human-made DT results using decision-level fusion.

V. CONCLUSIONS

An application of classification algorithms to heterogeneous multisensor data fusion in an experimental fusion engine was presented. Experiments with a realistic simulator of urban scenario data were performed. The results show an improvement of the fused classification results with respect to the data given by the sensors at the system input. The results also show that trained models are nearly as good as human-made ones, so
the training process could be used to build models for more complicated scenarios. Typically, the less accurate decision-level fusion is used instead of the more accurate feature-level fusion because of the computation cost. In this case attribute-level fusion is both faster and gives better results. In the presented approach, the data fusion for extracting an identity declaration uses the associations of objects obtained from the tracker. Therefore, the classifier works after the tracker and uses its output. The possibility of interleaving tracker and classifier processing, so that the classifier output serves as an additional input for the tracker, may be the subject of further research. Such an approach is much more complicated, but it is expected to increase the tracker accuracy.

ACKNOWLEDGEMENT

This work was performed as a part of the DAFNE project, sponsored by the European Defence Agency under grant number A-0380-RT-GC. The DAFNE project as a whole is a joint effort of IDS Sp.A., Udine University, Fraunhofer IOSB, TNO, VOP-026 and Warsaw University of Technology. The authors would like to thank all the partners for the fruitful cooperation.

REFERENCES

[1] S. Das, High-level data fusion. Artech House Publishers, 2008.
[2] J. Llinas, “Revisiting the JDL data fusion model II,” DTIC Document, Tech. Rep., 2004.
[3] D. Hall and J. Llinas, “An introduction to multisensor data fusion,” Proceedings of the IEEE, vol. 85, no. 1, pp. 6–23, Jan. 1997.
[4] H. Mitchell, Multi-sensor data fusion: an introduction. Springer, 2007.
[5] R. Nowak, J. Misiurewicz, and R. Biedrzycki, “Automatic adaptation in classification algorithms fusing data from heterogeneous sensors,” in Information Fusion (FUSION), 2011 Proceedings of the 14th International Conference on, 2011, pp. 179–184.
[6] J. MacQueen et al., “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, California, USA, 1967, p. 14.
[7] J. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[8] G. Shafer, A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ, 1976, vol. 1.
[9] P. Pohanka, J. Hrabovsky, and M. Fiedler, “Sensors simulation environment for sensor data fusion,” in Information Fusion (FUSION), 2011 Proceedings of the 14th International Conference on. IEEE, 2011, pp. 1–8.
[10] T. Sartor, N. Negenborn, E. Michaelsen, and K. Jager, “Assessment procedure with specific ROC curves for comparison of fusion engines,” in Information Fusion (FUSION), 2011 Proceedings of the 14th International Conference on. IEEE, 2011, pp. 1–7.
[11] M. Ditzel, S. van den Broek, P. Hanckmann, and M. van Iersel, “DAFNE – a distributed and adaptive fusion engine,” Hybrid Artificial Intelligent Systems, pp. 100–109, 2011.
[12] I. Visentini and L. Snidaro, “Integration of contextual information for tracking refinement,” in Information Fusion (FUSION), 2011 Proceedings of the 14th International Conference on. IEEE, 2011, pp. 1–8.
[13] R. Biedrzycki and R. Nowak, “Internet application architecture for data fusion,” Warsaw University of Technology, Tech. Rep., 2010.
[14] M. McDowell, C. Fryar, C. Ogden, et al., Anthropometric Reference Data for Children and Adults: United States, 1988–1994, ser. DHHS publication. National Center for Health Statistics, 2009.
[15] R. Knoblauch, M. Pietrucha, and M. Nitzburg, “Field studies of pedestrian walking speed and start-up time,” Transportation Research Record, vol. 1538, pp. 27–38, 1996.