
0968-090X/93 $6.00 + .00 © 1993 Pergamon Press Ltd.

Transpn. Res.-C, Vol. 1, No. 3, pp. 235-247, 1993. Printed in Great Britain.

A NEURAL NETWORK FOR IMAGE-BASED VEHICLE DETECTION*

DARCY BULLOCK
Department of Civil Engineering, Louisiana State University, Baton Rouge, LA 70803, U.S.A.

JAMES GARRETT, JR.
Department of Civil Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A.

CHRIS HENDRICKSON
Department of Civil Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A.

(Received 17 April 1992; in revised form 17 December 1992)

Abstract-Vehicle detection on roadways is useful for a variety of traffic engineering applications, from intersection signal control to transportation planning. Traditional detection methods have relied on mechanical or electrical devices placed on top of, or embedded in, pavements. These systems are relatively expensive to install, tend to be unreliable over time, and are limited in their capabilities. Considerable research has been conducted in the area of machine vision for Wide Area Vehicle Detection Systems (WADS). These systems have typically employed conventional image processing and pattern matching algorithms, and many installations have been sensitive to varying lighting conditions, camera perspective, and shadows. In addition, these systems have often required large amounts of computing resources. This paper reports on the development of a new image-based vehicle detection system that is based on a simple backpropagation/feedforward neural network for tracking vehicles. Application of this concept in a field system is discussed and preliminary results are presented. These results suggest that the neural network vehicle tracking model can be used to reliably detect vehicles. In addition, the training capability of the neural network detection model permits the system to adapt to variations in lighting and camera placement. This should lead to simplified installation and maintenance of WADS.

1. INTRODUCTION

Over the past several years, vision-based wide area traffic detection systems (WADS) have received considerable attention. These systems can collect data on many different lanes of traffic and can be rapidly installed. This interest is spurred by the decrease in cost and improved reliability of video cameras and related processing hardware. Also, video-based systems may have additional uses, such as measuring queue length or identifying traffic-disrupting incidents, that are not practical with loop detectors. A comprehensive review of existing video image processing systems for roadway applications appears in Hockaday (1991) and Hockaday, Chatziioanou, Nodder and Kuhtenschmidt (1992). This study evaluated eight different video systems on 28 different traffic data sets. Costs of typical commercial systems were expected to be less than $10,000 for quantity orders. Hockaday et al. referred to the detection techniques used in the evaluated systems as either "tracking" or "tripwire" models. The tripwire models were the simplest and detected vehicles by watching for "significant" deviation in light intensity along a narrow band transverse to the direction of travel (Fig. 1). Under good lighting conditions this technique works quite well. However, shadows from vehicles in adjacent lanes can cause false positive detections, particularly during morning and afternoon periods. Different tripwire thresholding techniques are used to detect vehicles during night and day operations, but during dawn and dusk, the transition between these two techniques tends to degrade performance. Estimates of vehicle speed can be derived by constructing a speed trap from two adjacent tripwires. Detection systems using tracking models attempt to locate a vehicle in a detection zone by correlating generalized vehicle templates to digital images of the detection zone (Fig. 2).
A time series of template locations can be used to count and estimate the speed of vehicles passing through the detection zone. This technology is based upon classical image processing techniques and can be extremely computationally intensive. These systems generally estimate

*An earlier draft of this paper was presented at the "Application of Artificial Intelligence in Transportation Engineering" Conference, Engineering Foundation, San Buenaventura, CA, June 1992.


Fig. 1. Tripwire detection model.

Fig. 2. Tracking detection model.

speeds more accurately than speed traps constructed using adjacent tripwires. However, this technique frequently fails to detect and count vehicles that deviate from the generalized vehicle templates. This paper describes a new tracking model for detecting vehicles that is based on a neural network. This technique does not rely on the development or calibration of rigid templates. Instead, it learns to recognize vehicle shapes by watching a human locate several example vehicles. This provides a significant benefit in comparison to classical image processing, since the neural network can adapt to different camera perspectives, lighting conditions, and so on, if given examples of these variations (Bullock, Garrett and Hendrickson, 1991). Use of neural networks for this case is motivated in part by the problem of autonomous vehicle navigation in unstructured conditions. This domain has similar machine vision problems, and neural networks have demonstrated better performance than classical machine vision techniques. In a recent demonstration, a world record was set when a truck was steered along a multi-lane highway at 50 mph for over 20 miles using a neural network (Pomerleau, 1992). In this demonstration, the network demonstrated robust performance under varying lighting conditions, changes in pavement textures, variations in lane stripings, and occlusions due to adjacent vehicles. Some of the techniques developed in the autonomous navigation work have been adapted to the vehicle detection system presented in this paper. In the following section, a neural network model for video-based vehicle detection is outlined. Subsequent sections report on the basic system architecture, the neural network image processing model, the training process, and validation experiments on the proposed model. 2. DETECTOR MODEL

2.1. Architecture

The architecture of the neural network video detection system is shown in Fig. 3, and resembles the typical hardware configuration used in the systems evaluated by Hockaday (1991). In this system, a surveillance camera is used to observe a traffic scene. The camera produces an analog video signal that is converted to a digitized image suitable for computer-based manipulation. This digitized image is called a pixel map, since it discretizes the video image into a finite number of picture elements (called pixels) and assigns each one of 256 numerical values in proportion to its gray-scale intensity (black = 0, white = 255). A detection zone within a traffic lane is specified within the video image by an operator. Multiple detection zones can be accommodated by developing separate neural networks for each detection zone. The detector model is applied to the image within the detection zone to yield (a) a presence output indicating that a vehicle is within the detection zone; and (b) a pulse output indicating that a vehicle has passed through the detection zone. A presence output is commonly used for detecting queue formation, and a pulse output is commonly used for incrementing a vehicle counter.


Fig. 3. Image-based vehicle detection system architecture.

2.2. Neural network tracking model

The general system architecture shown in Fig. 3 is common to nearly all image-based vehicle detection systems. The distinguishing characteristic of the various detection systems is the detector model that "observes" the detection zone and produces the presence and pulse outputs. The unique detection model proposed in this paper is diagrammed in Fig. 4. The input to the detector model is a pixel map of the zone. An example pixel map representing the detection zone is illustrated in Fig. 5. The resolution of this pixel map is much finer than required for the purpose of vehicle detection. For use by the neural network, a coarser pixel map is built by combining individual pixel values into area "tiles," as illustrated in Fig. 6. Each tile is represented by one value indicating the average gray-scale intensity of a corresponding area in the detection zone. For example, the value corresponding to Tile 1 is computed by calculating the average of the a × b tile in the lower right corner of the detection zone, where a and b determine the dimensions of each tile (Fig. 4). In practice, the average gray-scale value for a tile can be estimated using a small random sample of the pixels in the a × b region. The results of this tiling operation can be visually illustrated by comparing the fine-resolution pixel map shown in Fig. 5 with the tiled map shown in Fig. 6. The tiled image clearly depicts the

Fig. 4. Open-loop operation of detector model.


vehicle, but the number of pixels required to represent such an image is reduced by a factor of (a × b). For example, a and b were both 5 for the tiled image (Fig. 6) constructed from the original image (Fig. 5). The image in Fig. 5 required approximately 7250 pixels; the image in Fig. 6 required approximately 290 pixels. The tiled image is represented by a vector of length n (where n is the total number of tiles in the tiled image) and input into the neural network tracking model. This neural network tracking model is the new concept described and illustrated in this paper. In contrast to conventional image processing techniques that are more complex and difficult to tune, the neural network provides the capability of learning to track vehicles by first observing a human specifying the location of a few vehicles in sample images. The details of this training process are given in the next section. From the neural network, an output vector for every tiled input image is computed, and that output vector will contain a piecewise approximation of a bell-shaped curve (Fig. 4). The range of the output values in this vector will be [0, 1]. Furthermore, the peak of the bell-shaped curve will roughly correspond to the centroid of a vehicle observed in the tiled detection zone. Of course, if no vehicle is observed within the detection zone, all the values in the output vector would be zeros. During a typical detection sequence, the vehicle will enter the detection zone and first appear at the bottom of the extracted detection zone image (Fig. 4). When the vehicle is in that location, the neural network would compute an output vector with the peak of the bell-shaped curve centered about the first output unit. An example time series of output vectors is shown in Table 1. As the vehicle progresses through the detection zone, the peak of the bell-shaped curve moves from unit 1 to unit 9.
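The tiling operation described above can be sketched as follows; the zone dimensions and the use of NumPy are illustrative assumptions chosen to match the approximate pixel counts quoted in the text:

```python
import numpy as np

# Sketch of the tiling step: average each a x b block of the detection-zone
# pixel map into one gray-scale value per tile (a = b = 5, as in the example).
def tile_image(zone, a=5, b=5):
    h, w = zone.shape
    h, w = h - h % a, w - w % b          # crop so the zone divides evenly
    blocks = zone[:h, :w].reshape(h // a, a, w // b, b)
    return blocks.mean(axis=(1, 3))      # one average intensity per a x b tile

zone = np.random.randint(0, 256, (145, 50))   # ~7250-pixel zone (assumed shape)
tiles = tile_image(zone)
print(tiles.size)                             # 290 tiles, an (a*b)-fold reduction
```

In a field implementation, the block mean could be replaced by the small random pixel sample mentioned in the text, at the cost of a noisier tile value.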
An important decision in developing this model was designing a method for the neural network to indicate the location of a vehicle in the detection zone. Several alternatives were considered. First, a single analog value could be used to indicate the relative position of a vehicle in the detection zone. As the vehicle enters the zone, the analog value might increase from 0.0 to 0.1. As the vehicle moves toward the top of the detection zone, the single analog value might approach 0.9 or so. Finally, when the vehicle exits the detection zone, the analog output would have to return to zero. The problem with this technique is that the boundary case of a vehicle exiting the detection zone requires a step change of 1.0 to 0.0 on the output unit. Furthermore, there is no provision for handling congested scenes, when multiple vehicles might appear within the detection zone. Another technique considered was the use of several sequential binary outputs indicating whether a vehicle is or is not in a particular segment of the image. However, investigations of data representations have reported that training neural networks to produce these binary outputs is difficult in the presence of noise (Hancock, 1989). Consequently, a redundant representation was used to enhance the sequential binary output representation. This representation depicts a discretized bell-shaped curve, where the peak of the curve indicates the location of a vehicle. The tracking model shown in Fig. 4 shows these m outputs. In the implementation used for this research, there was no limit on how large m could be, but m could be no smaller than three. In practice, there is little reason to choose an m much larger than 20. The evaluation section of this paper discusses the issues in selecting an optimal or near optimal value for m. The neural network mathematical model used in this research is based on a particular class of networks referred to as feedforward/backpropagation networks.
The mathematical model for this network is well documented in Rumelhart and McClelland (1986).

Table 1. Example of neural network tracking model output

         t1    t2    t3    t4    t5    t6    t7    t8
Unit 1   0.4   0.7   0.4   0.0   0.0   0.0   0.0   0.0
Unit 2   0.2   0.6   0.9   0.2   0.0   0.0   0.0   0.0
Unit 3   0.1   0.1   0.5   0.8   0.1   0.0   0.0   0.0
Unit 4   0.0   0.0   0.0   0.6   0.7   0.1   0.0   0.0
Unit 5   0.0   0.0   0.0   0.1   0.7   0.6   0.0   0.0
Unit 6   0.0   0.0   0.0   0.0   0.1   0.8   0.2   0.0
Unit 7   0.0   0.0   0.0   0.0   0.0   0.3   0.9   0.0
Unit 8   0.0   0.0   0.0   0.0   0.0   0.0   0.8   0.4
Unit 9   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.8

A network is constructed

Fig. 5. Fine-resolution detection zone image.

Fig. 6. Tiled detection zone image.

by assembling a directed graph of simple processing units (Fig. 7). A diagram of a three-layer neural network is shown in Fig. 4 as the "neural network tracking model." The network is composed of n input processing units, p hidden processing units, and m output processing units. Each processing unit can receive inputs, transmit outputs, and perform calculations. The input processing units are the receptors for the tiled detection zone pixel map. The tiled input image is represented as a vector of tile values Tile_i in the range [0 ... 255]. Each input processing unit emits a normalized output value:

t_i = (Tile_i - 128)/128,   i = 1, ..., n,   (1)

Fig. 7. Neural network tracking model.


so that t_i fully covers the range [-1 ... 1]. Each output of the input processing units is connected to an input receptor on all the hidden processing units. These processing units also emit a normalized output value:

h_k = f( Σ_{j=1}^{n} w_{j,k} t_j + β_k ),   k = 1, ..., p,   (2)

where each w_{j,k} is a gain assigned to the output t_j passing through the connection from input processing unit j to hidden processing unit k, β_k is a bias assigned to hidden processing unit k, and f(x) is the hyperbolic tangent function:

f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)).

Other output filter functions could be used for this normalization, such as the sigmoid function. However, the tanh function has been found to be effective in similar image processing applications, since the output is symmetric about zero (Pomerleau, 1992). Similarly, the output from each of the hidden processing units is transmitted via a connection to an input receptor on all the output processing units. These processing units also emit a normalized output value:

o_l = f( Σ_{k=1}^{p} w_{k,l} h_k + β_l ),   l = 1, ..., m,

where each w_{k,l} is a gain assigned to the output h_k passing through the connection from hidden processing unit k to output processing unit l, β_l is a bias assigned to output processing unit l, and f(x) is the hyperbolic tangent function. For the network to be useful, a proper set of connection weights (w_{j,k}, w_{k,l}) and biases (β_k, β_l) must be estimated. These weights and biases will provide the desired mapping between a vehicle location described by t_1, ..., t_n and the Gaussian-shaped output described by o_1, ..., o_m. The development of these connection weights and biases is accomplished by a procedure called "network training," described in the next section. Once the network has been trained, it provides a mapping from a tiled pixel map to a Gaussian surface similar to that shown in Fig. 8.

2.3. Neural network output filter

The vector received by the output filter from the neural network output processing units contains m values. The values in the vector represent a rough approximation to a discretized Gaussian-shaped curve (plus some noise). The output filter is responsible for producing two signals: (a) a presence output indicating that a vehicle is within the detection zone; and (b) a pulse output indicating that a vehicle has passed through the detection zone. The presence output (Fig. 3) is intended to correspond to the presence contact on an ordinary loop detector. Whenever a vehicle is within the zone it should be "on." If no vehicle is within the zone, it should be "off." In the output filter (Fig. 4) developed in this work, the presence output is active whenever at least one of the output units has an activation exceeding a threshold value, typically 0.6. Similarly, the pulse output (Fig. 3) is intended to emulate the pulse contact of a conventional loop detector. This requires the output filter to watch a time series of neural network output vectors (Table 1) to determine when the peak has completely traversed the detection zone.
Once that criterion has been met, the output filter turns the pulse output on for a period of 125 milliseconds and then turns it back off. The implementation of the presence logic is straightforward. However, the pulse output requires the output filter to monitor a time series of output unit activations and perform a matched filtering operation to detect when a vehicle has passed through the detection zone. The matched filtering operation performed by the output filter must recognize an infinite family of

Fig. 8. Idealized Gaussian-shaped surface.

Gaussian-shaped curves (acceleration = 0) and skewed Gaussian-shaped curves (acceleration ≠ 0). To avoid the overhead associated with traditional signal-processing methods, a heuristic method is used by the output filter to process the time series of neural network outputs and to activate the pulse output after a particular pattern has been recognized. The distinctive pattern searched for is the sequential movement of a peak associated with the movement of a car through the video image. Thus, the pattern the output filter is looking for is a sequence of output vectors in which the peaks of individual output vectors occur at output units in sequential order. In other words, the first output vector has no peaks (over a threshold of, say, 0.6), the second output vector has a peak at unit 1, the third output vector has a peak at unit 2, the fourth output vector has a peak at unit 3, and so on. An example of this progression was shown in Table 1. When the filter observes the sequential propagation of the peak from the first output unit to the last output unit, the filter produces a pulse to signify the passage of a vehicle. The output filter is simply a message-passing system of the form shown in Fig. 9.

Fig. 9. Message passing model.

As long as the peaks continue to move from left to right, the filter will consider this sequence of output vectors as a

vehicle passing through the image. Timeout logic must also be incorporated to recover from sudden lane changes into or out of the detection zone.
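The sequential-peak heuristic can be sketched as follows. The 0.6 threshold follows the text; the reset logic is an assumed stand-in for the authors' message-passing and timeout implementation, not a reproduction of it:

```python
# A pulse fires only after the peak of the output vector is seen to march
# monotonically from the first output unit to the last (cf. Table 1).
def detect_pulse(output_vectors, threshold=0.6):
    expected = 0                          # next unit at which a peak is expected
    for vec in output_vectors:
        if max(vec) < threshold:
            continue                      # no vehicle visible in this frame
        peak = vec.index(max(vec))
        if peak in (expected, expected + 1):
            expected = peak + 1           # peak advanced in sequential order
        elif peak < expected - 1:
            expected = 0                  # lost track; reset (stand-in timeout)
        if expected >= len(vec):
            return True                   # peak traversed the whole zone: pulse
    return False

# Time series shaped like Table 1: the peak moves from unit 1 to unit 9.
series = [[0.0] * i + [0.8] + [0.0] * (8 - i) for i in range(9)]
print(detect_pulse(series))               # True
```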

3. TRAINING THE DETECTOR MODEL

The capability of the neural network to learn to track vehicles by first having a human show it several example images and corresponding location vectors is an extremely important feature of this approach. This training process consists of two distinct steps: constructing the training set, and adjusting the connection weights and biases so that the neural network learns to map tiled images into location vectors.

3.1. Training set construction

The training set is constructed by collecting several dozen detection zone snapshots and using a graphical window that permits the operator to identify those snapshots having vehicles in them and specify where the vehicles are located in those images. Figure 11 shows an example of several images used in a training session. The tool used to specify the location of vehicles in these images is shown in Fig. 10. This tool simplifies the task of creating a training set by permitting the operator to view an image and position a slider bar along the right side of the tool, whose position roughly corresponds to the center of the vehicle. Once the position of the car is located, the operator clicks on the "OK" button to save the location and view the next training image. The slider bar position produced for each training image is stored as a value between 0 and 100. A 0 corresponds to a vehicle just entering the detection zone, and a 100 corresponds to a vehicle just exiting the detection zone. A location vector of the proper length required to train the neural network can be calculated using the following equation for a modified Gaussian-shaped curve, centered at C:

O_i = exp( -((100/m)i - C)^2 / 100 ),   i = 1, ..., m   (3)
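Eq. (3) can be sketched directly; the slider value C and the choice m = 9 below are illustrative:

```python
import math

# Build the m-element location vector for a vehicle whose slider position is
# C (0 = just entering, 100 = just exiting the detection zone), per eq. (3).
def location_vector(C, m=9):
    return [math.exp(-((100.0 / m) * i - C) ** 2 / 100.0) for i in range(1, m + 1)]

target = location_vector(C=50, m=9)
peak_unit = target.index(max(target)) + 1   # peak lands near the middle unit
```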

Once the location of the vehicle has been specified in all the example training images, a training data set can be assembled. This data set is composed of both the tiled image vector (Fig. 4, Neural Network Tracking Model Input) and a location vector (Fig. 4, Neural Network Tracking Model Output). This data is used to teach the network how to map the tiled image

Fig. 10. Training tool.

Fig. 11. Example training images.

vector to the location vector. In order for this training to generalize well for open loop operation, a relatively large data set is required for the training process. However, manually classifying more than a few hundred images would require a considerable amount of time and be very tedious. To provide more training data without increasing the operator's burden, a technique was developed to leverage additional training data from a single training sample. This was done by over-sampling the detection zone (Fig. 12). The operator is still only shown one example image (the area inside the dotted lines in Fig. 12), but after the operator specifies the vehicle location, an algorithm shifts the detection window around the oversampled image in small increments. This provides tiled image vectors representative of vehicles straying from the center of the detection zone and at other longitudinal locations (Fig. 13). Since these various window locations are just linear transformations of the original image, new location vectors can be computed quite easily by applying the same linear transformation to the original location vector. For example, if a detection zone was oversampled by 10 pixels on each side and 50 pixels on each end, and the training window was shifted in increments of 5 pixels, a single sampled image could be used to generate 105 [(10 + 1 + 10) × (2 + 1 + 2)] example training sets. Using this technique not only reduces the time required to construct training sets, but also improves the breadth of samples used to train the network. For example, training cases for vehicles driving near the shoulder were easily incorporated in the training set, but would have been difficult to find in random samples.

3.2. Neural network training method

The training technique used to establish the connection weights (w_{j,k}, w_{k,l}) and biases (β_k, β_l) is a closed-loop, gradient descent weight modification procedure known as the Generalized Delta Rule (GDR) (Rumelhart and McClelland, 1986).
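A minimal sketch of the closed-loop GDR procedure on a toy one-hidden-layer tanh network follows; the layer sizes, random stand-in data, and learning rate are assumptions for illustration, not the paper's 300-input configuration:

```python
import numpy as np

# Forward pass per eqs. (1)-(2) and delta-rule updates dw = eta * output * delta.
rng = np.random.default_rng(0)
n, p, m = 12, 8, 3                        # input, hidden, and output unit counts
W1, b1 = rng.normal(0, 0.3, (n, p)), np.zeros(p)
W2, b2 = rng.normal(0, 0.3, (p, m)), np.zeros(m)
X = rng.uniform(-1, 1, (20, n))           # stand-in for normalized tile vectors t
T = rng.uniform(0.0, 0.9, (20, m))        # stand-in for Gaussian location vectors

eta, errors = 0.5, []
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)              # hidden activations
    o = np.tanh(h @ W2 + b2)              # output activations
    errors.append(float(np.mean((T - o) ** 2)))
    d_o = (T - o) * (1 - o ** 2)          # backpropagated error at the outputs
    d_h = (d_o @ W2.T) * (1 - h ** 2)     # backpropagated error at hidden units
    W2 += eta * h.T @ d_o / len(X); b2 += eta * d_o.mean(0)
    W1 += eta * X.T @ d_h / len(X); b1 += eta * d_h.mean(0)

# the closed-loop error falls as the network learns the mapping
```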
The weight acquisition method (training) provides a technique for iteratively modifying the gains and biases so that the neural network learns the general mapping between tiled image vectors and location vectors. This

Fig. 12. Sampled image.

Fig. 13. Sample transformations.

closed-loop training procedure (GDR) (Rumelhart and McClelland, 1986) is performed using hundreds of classified pixel maps to teach the neural network to produce the Gaussian-shaped curve for varying lighting conditions, vehicle shapes, and vehicle locations. This training process is performed by first using the neural network to compute an output vector o given a tiled detection zone pixel map as input. Then an error is computed by comparing the neural network-generated vector o to the operator-generated output vector o′. The computed error is then backpropagated through the network, where the backpropagated error is used to modify the connection weights by adding the value of Δw_{i,j} to w_{i,j}. The following function, called the delta rule (Rumelhart and McClelland, 1986), is used to compute Δw_{i,j}:

Δw_{i,j} = η o_i δ_j,

where η is a learning parameter, o_i is the output of processing unit i, and δ_j is the backpropagated error assigned to processing unit j. This training process continues until the network can reliably predict the proper location of a vehicle in the example images. Once this training is complete, the system is ready to be used in an open-loop manner on live data.

4. EXPERIMENTAL VALIDATION

Model validation experiments have been conducted off line by sampling and digitizing images from a videotape of traffic scenes. From this tape, two different data sets were assembled: one set for training and one set for testing. The training tool (Fig. 10) was used to identify vehicle locations, and a commercial neural network simulation program by Neuralware was used to perform neural network training and recall operations.

4.1. Criteria for performance evaluation

Several quantitative measures have been examined to determine when a network is adequately trained. These measurements include training epochs, root mean squared error, sum squared error, and a shift error. In general, the number of training epochs required for past training sets is an unreliable predictor of how many training epochs a new data set and neural network will require to perform satisfactorily. Instead, a closed-loop measure of how well the neural network is capable of reproducing the training outputs is of more use. Two measures commonly used for this purpose are the root mean squared error and the sum of squared error. These error measures indicate how precisely the neural network reproduces the output vectors in the training data. For example, Fig. 14 shows a network perfectly reproducing the training

Fig. 14. Output vector with no error.

Fig. 15. Typical output vector.

data. However, variations in vehicle size and contrast are likely to introduce a certain amount of variability in where a network predicts a vehicle to be. For example, Fig. 15 shows a network producing a noisy Gaussian curve shifted slightly to the right of where the test data set located the vehicle. This relatively small shift is not that significant, but would produce a relatively large sum squared or root mean squared error. A less extreme example of problems with these error measurements can be illustrated by imagining a certain amount of noise superimposed on the neural network outputs in Fig. 14, but with the peak still located properly. Under those conditions, the neural network would precisely identify the location of the vehicle, but the error measurements would not reflect this (Hancock, 1989). In view of these shortcomings, a more suitable error measure called a "shift error" was devised to determine if a network is suitably trained. This shift error is measured as the average absolute difference between where the manually specified (Fig. 10) Gaussian peak is located (Fig. 15, arrow pointing down) and where the unit with the largest activation is located. This measure can be used to evaluate a trained network. Typically this measure should be less than some set value (say within one output unit). This shift error is important to measure, since the tracking model outputs should not "chatter" as a vehicle moves through the detection zone.

4.2. Network architecture

For detection zones roughly five vehicle lengths long and one lane wide, the network architecture that has been found to work quite well has 300 input units and 9 output units. These parameters were chosen as a result of a limited parameter study. In general, the ability of the neural network tracking model has not been significantly affected by variations in the number of input units.
However, increasing the number of outputs from four to nine has substantially improved performance, since the peak of the bell-shaped curve tended to disappear during transitions between output units on networks having only four outputs. With nine output units, the outputs are more densely packed and provide a finer representation of the bell-shaped curve. An example taken from real test data (images the network was not trained on) illustrates how the Gaussian-shaped hill propagates across the outputs as the vehicle moves through the detection zone (Fig. 16). The optimal number of hidden units per layer is still under investigation. Simply increasing the number of hidden units in one layer has not been as effective as adding additional layers. The data shown in Fig. 16 were produced by a network with 300 inputs, 100 units in hidden layer 1, 50 units in hidden layer 2, 10 units in hidden layer 3, and 9 output units (300-100-50-10-9). Using only a single hidden layer, the neural network correctly located the peak of the bell-shaped curve for each of the six images, but the amplitude was approximately 50% shorter, regardless of how long it was trained.

4.3. Tracking accuracy

To date, the performance of the neural network tracking model has only been evaluated under moderately favorable lighting conditions. Several evaluations of the discrete samples have shown that the neural network consistently locates vehicles to within one output unit 90% of the time. However, a time series like that shown in Fig. 16 is a much better evaluation of the proposed system than is a series of random images. Due to the large data storage requirements associated with manipulating digital images, only one limited time series test has been performed. In that test, 200 random images and 200 sequential images were extracted from different segments of traffic scene videotape. From the first 200 random images, a 3200 image

Fig. 16. Example detection sequence.

training set was constructed using the technique depicted in Fig. 13. After training the network for 100,000 cycles, the root mean squared error was 0.35 and the average shift error was half the distance between output units. After the training was complete, the 200 sequential images were shown to the network to determine how accurately the neural network model could track vehicles it had never seen before. The root mean squared error was 0.78 and the average shift error was slightly more than half the distance between output units for this new data. The neural network tracking model correctly identified each of the 13 vehicles that passed through the detection zone in those 200 frames. This data is summarized in Table 2. The sequence shown in Fig. 16 was taken from that test data.

Table 2. Training and testing data

                                       Training Images   Testing Images
RMS error                              0.35              0.78
Average absolute shift error (units)   0.51              0.53
Data set size (images)                 3200              200
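The shift-error measure reported in Table 2 can be sketched as follows; the vectors below are toy values for illustration:

```python
# Average absolute distance, in output units, between the operator-specified
# peak and the unit with the largest network activation, over a data set.
def shift_error(predicted, targets):
    shifts = [abs(max(range(len(p)), key=p.__getitem__) -
                  max(range(len(t)), key=t.__getitem__))
              for p, t in zip(predicted, targets)]
    return sum(shifts) / len(shifts)

pred = [[0.1, 0.2, 0.9, 0.3], [0.0, 0.8, 0.4, 0.1]]
true = [[0.0, 0.3, 0.8, 0.2], [0.7, 0.3, 0.1, 0.0]]
print(shift_error(pred, true))   # (0 + 1) / 2 = 0.5
```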

5. CONCLUSIONS AND RECOMMENDATIONS

This paper has described a new technology for constructing image-based vehicle detection systems and presents some preliminary results that demonstrate the viability of this method. In comparison to other image-based vehicle detection systems, the system offers the following benefits:

• The detection model can be calibrated by an operator "showing" the system several example images containing vehicles and identifying their locations. This eliminates the need for a field operator to have a deep understanding of image processing.
• Neural networks may perform better than traditional image processing techniques in the unstructured outdoor lighting conditions typically encountered by image-based vehicle detection systems.

However, additional research is still required to evaluate extended field operations quantitatively, reduce the training effort, and improve the reliability of the neural network tracking model. Some possible research issues include: (a) development of guidelines for aggregating or tiling images, (b) using the network to integrate multiple sensors, (c) adding additional output layers to permit the classification of vehicles by size, and (d) implementing the neural network processing and matched filtering on a single board computer for field testing and comparison with other image-based vehicle detection systems.

Acknowledgements-The simulator used in this work was the Professional II/Plus Environment from Neuralware. Some of the testing of this approach was performed by participants in the Carnegie Mellon CAST program sponsored by the National Science Foundation. This group included: Nenad Ivezic, Annie Pearce, Adam Matter, Unhyi Choi, Ronald Thompson, Fei-Wen Chuang, Jeremy Cohen, Peter Shiner, and Suresh Bhavnani.

REFERENCES

Bullock D., Garrett J., and Hendrickson C. (1991) A prototype neural network for vehicle detection. In C. Dagli (Ed.), Applications of Neural Networks in Engineering, A.S.M.E.
Hancock P. J. B. (1989) Data representation in neural networks: An empirical study. In D. Touretzky, G. Hinton and T. Sejnowski (Eds.), Proceedings of the 1988 Connectionist Models Summer School, M. Kaufmann, San Mateo, CA, pp. 11-20.
Hockaday S. (1991) Evaluation of image processing technology for application in highway operations. Technical Report TR 91-2, California Polytechnic State University, San Luis Obispo, CA.
Hockaday S., Chatziioanou A., Nodder R., and Kuhtenschmidt S. (1992) Evaluation and comparison of video image processing systems for traffic detection. Transportation Research Board Preprint #920744.
Pomerleau D. A. (1992) Neural network perception for mobile robot guidance. Technical Report CMU-CS-92-115, Carnegie Mellon University, School of Computer Science.
Rumelhart D. and McClelland J. (1986) Parallel Distributed Processing, Vol. 1, The MIT Press, Cambridge, MA.
