Security Applications using Computer Vision

Sreela Sasi, Ph.D.
Professor, Department of Computer and Information Science
Gannon University, Erie, PA 16541, USA

ABSTRACT

Computer vision plays a significant role in a wide range of "Homeland Security Applications". These include port security (cargo inspection); facility security (embassies, power plants, banks); and surveillance (military or civilian). Video surveillance cameras are placed in offices, hospitals, banks, ports, parking lots, parks, stadiums, malls, train stations, airports, etc. The challenge is not in acquiring surveillance data from these video cameras, but in identifying what is valuable, what can be ignored, and what demands immediate attention. Computer vision systems attempt to construct meaningful and explicit descriptions of the environment or scene captured in an image. A few computer vision-based security applications are presented here: securing a building facility, securing railroads (detecting objects on the track and red signals), and ensuring road safety.
I. INTRODUCTION
1. Securing Building Facility

Homeland security functions focus on intelligence and warning, protecting critical infrastructure, and domestic counterterrorism. Biometrics is a reliable way to authenticate the identity of a living person based on physiological or behavioral characteristics. Gait is a non-invasive biometric that can be used for recognition at a greater distance without the knowledge or cooperation of the person being recognized. Body weight, limb length, habitual posture, bone structure, and age influence the gait of a person; these factors give each person a distinctive gait, which can be used as a biometric. Gait recognition has applications in visual surveillance, aware-spaces, and intelligent human-computer interfaces. The nonlinear characteristics associated with gait pose a major challenge for research in this area. In this research, three methods are devised and evaluated for recognizing static postures in gait: the Hidden Markov Model combined with the Visual Hull technique (Gomatam & Sasi, 2004), stereovision with 3D template matching (Gomatam & Sasi, 2004), and isoluminance lines with 3D template matching (Gomatam & Sasi, 2005). These methods were tested on silhouettes of different persons extracted from Carnegie Mellon University's Motion of Body (MoBo) database (2004), and their performances were compared.
2. Rail Road and Road Safety

(a) Securing Rail Road

Rail accidents pose a major threat in terms of lives and cost. Concerns for the nation's railroads have grown since the attacks of September 11, 2001. While the Federal Government has implemented extensive safety and security measures in the aviation industry, it has left railroad security entirely up to the rail corporations. Statistics collected by Operation Lifesaver show that fifty percent of rail accidents occur at crossings equipped with flashing lights, barrier gates, and warning bells. Though railroad crashes are rare, when they do occur they can be massively destructive and deadly. According to the National Transportation Safety Board, 60% of all crossing fatalities occur at unprotected crossings, and approximately 80% of all public railroad crossings are not protected by lights and safety gates. Collisions with other trains, derailments, and collisions with passenger vehicles are the common types of railroad accidents. Here are some statistics from CNN News (2005) regarding railroad accidents:

• Every 90 minutes there is a train collision or derailment
• A train carrying hazardous material goes off the tracks approximately every 2 weeks in the United States
• More than 50% of all railroad accidents occur at unprotected crossings
These kinds of massive destruction can occur for various reasons. Train accidents are caused by mechanical failures, communication failures, railroad crossings littered with debris, or a simple human error by an individual employee such as a locomotive engineer, driver, train conductor, rail inspector, or railroad maintenance mechanic. The people responsible for the smooth operation of the current rail system are listed below:

• Locomotive Engineer: Controls the locomotive
• Driver: Assists the locomotive engineer
• Rail Inspector: Inspects signals and track wiring
• Train Conductor: Deals with emergency situations
• Railroad Maintenance Mechanic: Repairs damaged tracks
There is a horn that can be honked to warn any vehicles on the rail track, but any common human error can still lead to a train crash. Current locomotive engine cabins have no automated warning signal to alert the locomotive engineer about a possible threatening object or vehicle on the track. Whenever a train accident occurs, there are serious personal injuries and extensive economic loss, due both to damaged property and to the huge compensation paid to the victims. Lately, locomotives have been equipped with a camera and a microphone as investigation aids. These are mounted in the locomotive engine cabin, and live video and audio are recorded continuously from the perspective of the locomotive engineer while the train is moving. Apart from this, they can aid in recording gate-crossing incidents, near misses, or other operating incidents. Digital video recordings provide clear and detailed evidence that is more reliable than an eyewitness's account in case of an accident, and digital recordings are legally admissible in court if needed. These video recordings can also be used for monitoring obstacles on the railroad using digital image processing techniques such as path tracking, edge detection, object recognition, and red signal detection.

(i) Object Detection on Rail Roads
An automated computer "Vision-based Real-time Smart system to Prevent Railroad Accidents (VRSPRA)" that analyzes individual frames in the video stream and generates a warning signal for the locomotive driver is presented. Abilash Sanam, a graduate student at Gannon University in 2004, carried out this research and devised an application for the locomotive industry to save human lives as well as prevent the major economic losses caused by rail accidents.
(ii) Tracking Red Signal Lights near Rail Roads
The color of signal lights and the position of signal poles are major concerns when designing an automatic signal detection system. The investigation of accidents depends entirely on the unadulterated information gathered at an accident zone. The limited information available from the accident zone, such as eyewitness accounts and physical evidence, causes several problems for investigations, and these problems indirectly affect the organizations that depend on them. Generally, this kind of accident occurs at rail-road crossings, road intersections, and highway crossings. According to investigators from the National Transportation Safety Board, an accident between an Amtrak train and a tractor-trailer outside of Chicago caused the death of at least 11 people; the locomotive engineer and the truck driver gave totally different accounts of what had happened prior to the crash (Train Accident Report, 2004). According to the US Department of Transportation, human fault causes 70 to 80 percent of transportation accidents. Human fatigue also plays an important role in these accidents, a factor that is not captured by normal investigations. Such accidents cause great losses to the organizations involved, both economically and in reputation. Examples of losses caused by human fatigue, according to the Rail Accident Report (2004) and the Rail Employee Fatigue (Amtrak, 2004) report, include:

• A $52 million judgment was awarded against Conrail in an accident involving a railroad worker who was crushed by a train controlled by a sleep-deprived driver working double shifts.
• A $4 million judgment was awarded against Nabors Drilling for an accident in which an employee driving home after working long hours fell asleep behind the wheel.

A survey conducted by the Farmers Insurance Group of Companies showed that more than 36% of motorists admitted to running a red light in the past year, which is one of the leading causes of crashes in urban areas. Statistics gathered by the Insurance Institute for Highway Safety (IIHS) show that red light running crashes cause nearly 1,000 deaths and more than 200,000 injuries each year. The main cause is the automobile driver failing to stop at the red signal and running into other road users. This happens when the driver fails to see that the signal is red, whether through negligence or drunkenness, or when the signal is invisible because of bad climatic conditions. A Vehicle Mounted Recording System (VMRS) can be used to provide evidence for organizations such as law enforcement, insurance agencies, and transportation authorities. This recording system continuously monitors and records all events; in the case of an accident, the recorded videos are used as evidence for the investigation, providing complete and unaltered information about the accident and its causes. The insurance industry will benefit from vehicle-mounted video recording systems since they provide accurate evidence against fraudulent claim losses, as mentioned by Trax (2004) and the National Transportation Safety Board (2004). Though the VMRS continuously records events such as the speed, time, location, transmission position, and heading direction of the vehicle, the position of signal lights and the color associated with these lights are not automatically detected by the current system. A computer vision-based system is designed and implemented to track the position and color of signal lights.
The position and the color of the signal are written to a log file that can be used as concrete evidence while investigating the causes of accidents. The "Color-based Signal light Tracking in Real-time Video (CSTRV)" system is an intelligent system using the La*b* color model in combination with contour tracking (Yelal et al., 2006). This method analyzes each frame of the video sequentially, and detects the presence of signal lights and their color.
(b) Ensuring Road Safety

Red light running is a leading cause of urban crashes and often results in injury and death. Total road deaths in the USA for the year 2004 were 42,636. A survey conducted during 1999-2000 revealed that 20% of vehicles involved in road accidents did not obey the signal. Each year 'red' light running causes nearly 200,000 accidents resulting in more than 800 deaths and 180,000 injuries, according to Drive and Stay Alive, Inc. (2006) and the Department for Transport - UK (2004). Signal lights at road intersections control traffic, but some people do not abide by the traffic rules and cross the intersection when the signal light is 'red'. In order to reduce the accident rate at intersections, busy and accident-prone intersections should be monitored. The authorities cannot monitor all intersections continuously around the clock on all days, which demands a cost-effective, automated system for continuously monitoring all intersections and penalizing the people who violate the traffic rules. Automatic License Plate Recognition (ALPR) systems have been developed and discussed in Wikipedia for Number Plate (2005), Motorola Solutions for Government (2005), CCTV Info - UK (2005), and License Plate Recognition (2005). In ALPR systems for monitoring intersections, a still camera is placed adjacent to the signal lights to capture the license plate of the car at the intersection, and sensors located on the road detect the presence of a vehicle. When the signal light is 'red' and the sensors are active, a still photograph is taken, which the law enforcement authorities use for issuing a penalty ticket. The camera is equipped with a bright flash for capturing a quality image and for cautioning the driver about the violation. The ALPR system is not foolproof: the license plate can be tampered with, the plate might be stolen from another car, the plate may not be visible in bad weather conditions, or the sensors on the road might be tampered with.

In this research, an expert system is presented that captures the Vehicle Identification Number (VIN) and the license plate of a vehicle crossing the intersection on a 'red' signal. Using the VIN, it is possible to find the owner of the car, insurance details, and the car facts report. No sensors are needed for this Vision-based Monitoring System for Detecting Red signal crossing (VMSDR), which captures and recognizes both the VIN and the license plate of a vehicle running a 'red' signal light at an intersection (Sharma & Sasi, 2007). The VMSDR system needs two video cameras: one placed on the sidewalk and the other placed on the pole above the intersection adjacent to the signal lights. The camera on the sidewalk captures the license plate, and the camera on the pole, alongside the signal light, captures the VIN. This research is intended to provide a support system for law enforcement agencies to proactively ensure that intersections are engineered to discourage red light running.
II. ARCHITECTURE AND SIMULATION RESULTS
1. Securing Building Facility

As a newly emergent biometric, gait recognition aims at discriminating individuals by the way they walk. Gait has the advantages of being non-invasive and difficult to conceal, and is also the only biometric perceivable at a distance. The need for automated person identification is growing in many applications such as surveillance, access control, and smart interfaces, and it is well known that biometrics is a powerful tool for reliable, automated person identification. In this study of gait, recognition is performed with a technique called an activity-specific static biometric. The advantage of measuring a static property is that it is amenable to being done from multiple viewpoints. In this research, recognition of gait characteristics is performed using static postures. Three different techniques are used to identify the gait characteristics from the given static postures: "Enhanced Gait Recognition Using HMM and Visual Hull Techniques," "Multimodal Gait Recognition based on Stereo Vision and 3D Template Matching," and "Gait Recognition Based on Isoluminance Line and 3D Template Matching."
(i) Gait Recognition using Hidden Markov Model and Visual Hull Technique
A combination of the Hidden Markov Model (HMM) and Visual Hull (VH) techniques is used in this method to recognize gait characteristics from static postures. Initially, static pictures of various individuals are taken from the right and left sides using a camera. The silhouette is extracted from each picture because the 3D information provided by the cameras is not enough for a precise representation, and also to provide a limit: the outer contour for the fitting process, so that the model does not overpass the silhouette. Body silhouette extraction is achieved by a simple background subtraction and thresholding, followed by a 3x3 median filter to suppress isolated pixels. The concept of the visual hull is to characterize the best geometric approximation that can be achieved using a shape-from-silhouette reconstruction method. This method exploits the fact that the silhouette of an object in an arbitrary view re-projects onto a generalized polyhedral visual hull. After the background subtraction is performed, the system approximates the visual hull in the form of a polyhedral volume. Without loss of generality it is presumed that the XZ-plane of the coordinate system is the ground plane, and the Y-axis is the normal to the ground. The four gait characteristics identified in the obtained polyhedral volume are the head section, the pelvis or centroid, the foot section, and the two foot regions. The location of the centroid of the subject is estimated by taking the center of gravity of the VH. For the polyhedral VH, this is the centroid of the polyhedral model, which can be computed while building the model. For the sampled silhouettes, the VH is estimated by integrating the volume enclosed within the endpoints of the ray intervals. The HMM is applied to the obtained polyhedral volume, and all the points corresponding to the gait characteristics are identified. The sets of points are then stored in a gait database, which contains values of several gait postures from different samples. A gait sample that needs to be identified and authenticated is subjected to the above-mentioned procedure, and the sets of points obtained are compared against each corresponding set of points in the database. An architecture using these techniques for authentication purposes is presented in Figure 1.1.

[Figure 1.1. Architecture for Enhanced Gait Recognition using HMM and Visual Hull Technique. Identification system: Gait Sequence → Silhouette Extraction → Visual Hull Encapsulation → HMM Evaluation → Gait Database. Authentication system: Test Gait → Silhouette Extraction → Visual Hull Encapsulation → HMM Evaluation → Authentication against the Gait Database.]
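As an illustration of the silhouette-extraction step described above, the following Python sketch applies background subtraction, thresholding, and a 3x3 median filter. It is a minimal reconstruction, not the original MATLAB code; the file names and the threshold value of 40 are assumptions.

```python
# Minimal sketch of silhouette extraction: background subtraction,
# thresholding, and 3x3 median filtering, as described in the text.
import cv2

def extract_silhouette(frame_path, background_path, thresh=40):
    frame = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
    background = cv2.imread(background_path, cv2.IMREAD_GRAYSCALE)
    # Simple background subtraction
    diff = cv2.absdiff(frame, background)
    # Threshold the difference image to a binary silhouette
    _, silhouette = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    # 3x3 median filter to suppress isolated pixels
    return cv2.medianBlur(silhouette, 3)

if __name__ == "__main__":
    sil = extract_silhouette("subject_frame.png", "background.png")
    cv2.imwrite("silhouette.png", sil)
```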
(ii) Gait Recognition based on Stereo Vision and 3D Template Matching

In this method, the stereovision technique is combined with 3D template matching for effective identification and authentication of a person. An HMM technique is also implemented in parallel to ensure accuracy of recognition, as shown in Figure 1.2. The silhouettes are extracted from the given pictures using a simple background subtraction and thresholding, followed by a 3x3 median filter to suppress isolated pixels. The visual hull technique is used as an approximate geometric model of the objects in the scene; the system approximates the visual hull in the form of a polyhedral volume, which is used by both subsystems. The polyhedral volume is then subjected to the stereovision technique for reconstruction. Here, a structure-based technique is used for reconstructing the 3D image. The image is converted into a boundary representation, and correspondence candidates are found from the epipolar condition, intensity, and shape. The connectivity of the segments is evaluated according to short distance, same intensity, and same angle. Based on this similarity, the correspondence between the left and right segments is found, and the 3D information is reconstructed. From the 3D image thus obtained, a 3D template is extracted using a segmentation-based technique. This is used to find the precise position and orientation of the target object from depth data by projecting the corresponding 3D model. The extracted 3D template is stored in a database. A sample gait that needs to be identified is subjected to the above-mentioned procedure and is compared with the templates in the database. Parallel to this process, values obtained from the silhouettes using HMM analysis are also used to identify and authenticate the person.

[Figure 1.2. Architecture for Multimodal Gait Recognition Based on Stereovision and 3D Template Matching: Gait Sequence → Silhouette Extraction → Polyhedral Visual Hull Rendering → Silhouette Segmentation → Stereo Based Reconstruction → 3D Template Extraction → Gait Database → Identification & Authentication, with HMM Analysis performed in parallel.]
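The template-comparison step can be sketched as follows. This hedged Python example matches a probe template against stored templates using normalized cross-correlation over depth maps; the depth-map representation, the scoring function, and the 0.87 acceptance threshold (from the 87% criterion quoted in the simulation section) are assumptions, since the original system projects a full 3D model onto depth data.

```python
# Sketch of comparing an extracted 3D template (here, a depth map)
# against a database of stored templates.
import numpy as np

def normalized_cross_correlation(a: np.ndarray, b: np.ndarray) -> float:
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.mean(a * b))

def match_template(probe: np.ndarray, database: dict, threshold=0.87):
    """Return (identity, score) for the best match, or (None, score)
    if the best score falls below the acceptance threshold."""
    best_id, best_score = None, -1.0
    for identity, template in database.items():
        score = normalized_cross_correlation(probe, template)
        if score > best_score:
            best_id, best_score = identity, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)
```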
(iii) Gait Recognition Based on Isoluminance Line and 3D Template Matching

In this method, isoluminance lines for stereovision are combined with 3D template matching for effective identification and authentication. It is similar to the second method except that isoluminance-line-based stereovision is used to reconstruct the image and to find depth information. Initially, the visual hull technique is used as an approximate geometric model of the objects in the scene. After background subtraction and thresholding, the system approximates the visual hull in the form of a polyhedral volume. The isoluminance-line-based stereovision technique is then applied to the polyhedral volume. The volume is segmented into various levels of black and white areas by setting a threshold, and black areas smaller than a certain threshold are deleted. A 3D partial image is thus reconstructed from the isoluminance lines, and the partial images are merged into one final image, which gives the depth information. From this 3D image, a 3D template is extracted using the segment-based 3D template matching technique. This is used to find the precise position and orientation of the target object from depth data by projecting the corresponding 3D model. The extracted 3D template is stored in a database. A sample gait that needs to be identified is subjected to the above-mentioned procedure and is compared with the templates in the database. The architecture is shown in Figure 1.3.
[Figure 1.3. Architecture of Gait Recognition System: Gait Sequence → Silhouette Extraction → Stereo Images → Isoluminance 3D Reconstruction → Depth Information from 3D Pose → Gait Database → Identification & Authentication.]
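A rough Python sketch of the isoluminance-line segmentation described above is given below: the image is quantized into luminance levels, the boundary of each level set is traced, and areas smaller than a cutoff are deleted. The number of levels and the minimum-area cutoff are illustrative values, not taken from the original system.

```python
# Sketch of extracting isoluminance lines by multi-level thresholding
# and contour tracing, deleting small areas as described in the text.
import cv2

def isoluminance_lines(gray, levels=8, min_area=50):
    contours_per_level = []
    step = 256 // levels
    for t in range(step, 256, step):
        # Binary level set at this luminance threshold
        _, mask = cv2.threshold(gray, t, 255, cv2.THRESH_BINARY)
        # OpenCV 4.x signature: returns (contours, hierarchy)
        contours, _ = cv2.findContours(mask, cv2.RETR_LIST,
                                       cv2.CHAIN_APPROX_SIMPLE)
        # Delete areas smaller than the cutoff
        kept = [c for c in contours if cv2.contourArea(c) >= min_area]
        contours_per_level.append((t, kept))
    return contours_per_level
```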
The architectures shown in Figures 1.1, 1.2, and 1.3, corresponding to Gait Recognition Using HMM and Visual Hull Technique, Gait Recognition based on Stereo Vision and 3D Template Matching, and Gait Recognition Based on Isoluminance Line and 3D Template Matching respectively, were simulated using 40 test samples of silhouettes of different persons. The methods were simulated using the Image Processing Toolbox in MATLAB. The silhouettes were extracted from the photo samples stored in the Carnegie Mellon University database.
(iv.1) Simulation - Gait Recognition Using HMM and Visual Hull Technique

The silhouettes are extracted from static images and VH encapsulation is applied to them. HMM is used to find the maximum likelihood detection using the values obtained from the VH encapsulation for identification purposes, and these values are stored in a sample database. Testing is conducted with a new set of test samples against the ones stored in the database. The simulation results are shown below using two sample test cases. In the first case, the samples are from two different persons with the same posture; the pictures used in this research were obtained in a controlled environment, i.e., a laboratory. In the second test case, the samples are of the same person but with different postures. Initially, using the VH technique, both cases are simulated to extract the features from the silhouettes. Then four points are identified using the HMM technique, and the four values corresponding to the centroid, the distance between the centroid and the shoulders, and the distance between the centroid and the feet are recorded. The values measured for case 1 and case 2 are given in Tables 1 and 2. The values in Table 1 clearly demonstrate that the two samples are not identical, in compliance with the actual data. Table 2 depicts a match greater than 86%, from which an authentication can be concluded.

Table 1: Samples from different persons with the same posture

    Person 1    Person 2
    490.00      201.00
    317.00      251.50
    307.00      240.00
    220.00      263.65

Table 2: Test samples from the same person with different postures

    Posture 1    Posture 2
    561.50       560.45
    507.45       507.35
    480.12       481.54
    390.56       390.56
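One plausible reading of the percent-match criterion, using the four feature values in Tables 1 and 2, is sketched below in Python. The per-feature agreement measure is an assumption of this sketch; only the table values and the 86% threshold come from the text.

```python
# Sketch of the authentication decision over the four gait features.
def percent_match(sample, reference):
    """Mean per-feature agreement, expressed as a percentage (assumed metric)."""
    scores = [1 - abs(s - r) / max(s, r) for s, r in zip(sample, reference)]
    return 100 * sum(scores) / len(scores)

def authenticate(sample, reference, threshold=86.0):
    return percent_match(sample, reference) > threshold

posture1 = [561.50, 507.45, 480.12, 390.56]   # Table 2, Posture 1
posture2 = [560.45, 507.35, 481.54, 390.56]   # Table 2, Posture 2
print(percent_match(posture1, posture2))      # ~99.9 -> authenticated

person1 = [490.00, 317.00, 307.00, 220.00]    # Table 1, Person 1
person2 = [201.00, 251.50, 240.00, 263.65]    # Table 1, Person 2
print(percent_match(person1, person2))        # ~70 -> rejected
```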
(iv.2) Simulation - Gait Recognition based on Stereo Vision and 3D Template Matching

The silhouettes are extracted from static images, and then VH encapsulation is applied. The depth of the image is obtained using the structural analysis method of stereovision; the combined application of visual hull and stereovision increases the accuracy of the depth information. The depth information of the pre-identified points corresponding to the centroid, the shoulder points, and the leg points is obtained for this structure. These values are stored in a database and a 3D model is constructed, from which 3D templates are extracted and compared with the templates stored in the database for authentication. An HMM analysis is performed in parallel to locate the centroid of the silhouette, which is stored along with four different values corresponding to the regions identified. Validation is conducted with a characteristic feature of a new person against the ones stored in the database. Authentication is valid if the feature matches more than 87% of the values of the corresponding image in the database. The silhouettes for this research were extracted from the photo samples stored in the Carnegie Mellon University database.
The algorithm supports a variation of 45 degrees from the front posture on both sides under a medium intensity of light. The algorithm was tested on 40 different samples and achieved an identification and authentication rate of 89%.
(iv.3) Simulation - Gait Recognition Based on Isoluminance Line and 3D Template Matching (3DTM)

The silhouettes are extracted from static images and VH encapsulation is applied. Once the visual hull is built, isoluminance lines for stereovision are applied. With the combination of visual hull and isoluminance lines for stereovision, the depth information of the image is obtained accurately. The depth information is obtained for the pre-identified points of the structure. From this model, 3D templates are extracted and compared with the templates stored in the database for authentication. Testing is conducted with a characteristic feature of a new person against the ones stored in the database. The simulation results for the architecture using isoluminance lines and 3DTM are shown in Figures 1.4 and 1.5. Authentication is valid if the feature matches most of the values in the database for a given test sample. The simulation results using this method are compared with the ones using the HMM technique in Figure 1.6. The silhouettes were extracted from the photo samples stored in the Carnegie Mellon University database.
Figure 1.4. VH Simulation
Figure 1.5. Template Matching
[Figure 1.6. Comparison graph of the two methods: recognition percent (y-axis, 60-100%) versus number of samples (x-axis, 1-6).]
2. Rail Road and Road Safety

(a) Securing Rail Road

(i) Object Detection on Rail Roads - "Vision-based Real-time Smart system to Prevent Railroad Accidents (VRSPRA)"

The VRSPRA system detects obstacles on the railway track that may cause a possible derailment. Video and audio are recorded continuously, from the perspective of the locomotive engineer while the train is moving, using a loco cam mounted in the locomotive engine cabin. The image frames are analyzed by applying edge detection, path tracking, and object identification techniques. Initially, frames are extracted from the video data and processed to monitor the railway track for gaps and for any obstacles. If there is any sudden variation in the number of edges in the current region of interest, a warning signal is generated for the locomotive engineer. The VRSPRA system architecture is shown in Figure 2(a).1.
[Figure 2(a).1. VRSPRA System Architecture: Real-time Video → Pre-Processing → Image Frame Extraction → Edge Detection → Path Tracking → Alarm Generation.]
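A minimal Python sketch of the alarm rule described above (a sudden variation in the edge count within the region of interest triggers a warning) follows. The Canny thresholds, the rolling baseline, and the 20% jump factor are assumptions; the exact edge detector and decision rule of the original system are not specified.

```python
# Sketch of edge counting in the track ROI and a jump-based alarm rule.
import cv2
import numpy as np

def edge_count(roi_gray):
    edges = cv2.Canny(roi_gray, 100, 200)
    # OpenCV 4.x signature: returns (contours, hierarchy)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return len(contours)

def check_frame(roi_gray, baseline_counts, jump=1.2):
    """Raise an alarm when the edge count jumps above the recent baseline."""
    count = edge_count(roi_gray)
    baseline = np.mean(baseline_counts) if baseline_counts else count
    alarm = count > jump * baseline
    baseline_counts.append(count)
    return alarm, count
```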
A customized sample video stream collected during normal operation of a locomotive from GE Transportation Systems in Erie, Pennsylvania is used for testing the VRSPRA system. Initially, during preprocessing, audio-video interleaved (AVI) data is extracted from the customized data stream. This AVI file is encoded in the MPEG-4 compression format, so the 'DivX' codec is used to extract the sample I-frames. Obstacles of different sizes are introduced into these frames, which are combined to form a new video sequence. This video sequence is used to identify the objects as possible threats and to generate an alarm signal. The image frames are analyzed by applying edge detection, path tracking, and object identification techniques. The video is recorded at a speed of 30 frames per second. According to research carried out at the Defense Advanced Research Projects Agency (DARPA) in 2004, the normal speed of a locomotive is less than 100 MPH. The distance traveled in one frame time is approximately 1.5 yards. The visibility range of the camera used is 1.5 miles. The distance covered before activating the alarm signal is 0.012 miles. The braking distance for the train with a usual force of 40 Newton is less than a mile. Hence, there is enough time for the driver to take proper action after receiving the warning alarm. The timing computation for alarm signal generation is given below:

    Train speed: 100 MPH
    Frame rate: 30 FPS
    Distance covered in one frame time: ~1.5 yards
    Visibility range of the camera used: 1.5 miles
    Distance covered before alerting: 0.012 miles
    Braking distance of the train with a usual force of 40 Newton: < 1 mile

Hence it is possible to apply the brakes and stop the train in case any threatening object is identified on the tracks.
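The timing figures above can be verified with a short computation; the Python sketch below reproduces the text's approximations (the "~1.5 yards per frame" figure comes out to about 1.6 yards at exactly 100 MPH).

```python
# Worked version of the timing computation for alarm signal generation.
MPH_TO_YPS = 1760 / 3600          # yards per second, per mile per hour

speed_mph = 100                   # assumed maximum locomotive speed
frame_rate = 30                   # frames per second

yards_per_frame = speed_mph * MPH_TO_YPS / frame_rate
print(f"Distance per frame: {yards_per_frame:.2f} yards")  # ~1.63 (~1.5 in text)

alert_distance_miles = 0.012
alert_time = alert_distance_miles * 1760 / (speed_mph * MPH_TO_YPS)
print(f"Time to issue alert: {alert_time:.2f} s")           # ~0.43 s

# With camera visibility of 1.5 miles and a braking distance under 1 mile,
# the alert leaves roughly half a mile of margin at full speed.
```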
Image frames with a sport utility vehicle (SUV), a bus, a locomotive cabin, and a log of wood were inserted into the video stream collected from the locomotive cabin. The number of edges with and without the objects parked on the railway track was counted and is given in Table 3.

Table 3: List of objects and respective edge counts

    Object               No. of Edges    Alarm
    SUV                  225             Yes
    Bus                  269             Yes
    Locomotive Cabin     311             Yes
    Log of wood          160             No

(ii) Color-based Signal light Tracking in Real-time Video (CSTRV)
The CSTRV system uses the luminous values of the glowing points combined with the color values present in the image frames of a real-time video; this color-based prediction finds application in transportation for detecting the color of signal lights. The La*b* color model is used to extract the luminous values in each image frame, and contour tracking is used for shape detection of the signal lights. The architecture of the CSTRV system is shown in Figure 2(a).ii.1.
The image frames extracted from the real-time video are subjected to region segmentation to extract the Region of Interest (ROI). The ROI is extracted by analyzing the position of the signal lights in an image frame based on road tracking. This road tracking technique uses the edges of the road as reference lines, based on which the image frame is segmented into different regions. The ROI is selected in such a way that the probability of the presence of signal lights is maximal. The system then needs to detect the presence of signal lights and their color. This is done by converting the ROI image to La*b* color space and applying contour tracking for shape recognition of the signal lights.
[Figure 2(a).ii.1. CSTRV System Architecture: Video Frames → ROI → RGB to La*b* Conversion → Intensity (R/G – a*, Y/B – b*) → Grouping Algorithm → Edge Detection → Contour Tracking → Log File.]
The Commission Internationale de l'Eclairage (CIE) La*b* is the most complete color model used conventionally to describe all the colors visible to the human eye. CIELAB specifies color perceptions in terms of a three-dimensional space in which the L-axis represents lightness, extending from black to white; the a* axis represents the red-green dimension; and the b* axis represents the yellow-blue dimension. L, a*, and b* are calculated from the tristimulus values using the following equations:

    L  = 116 (Y/Yn)^(1/3) − 16
    a* = 500 [(X/Xn)^(1/3) − (Y/Yn)^(1/3)]
    b* = 200 [(Y/Yn)^(1/3) − (Z/Zn)^(1/3)]

where Xn, Yn, and Zn are the values of X, Y, and Z for the illuminant used in the calculation of X, Y, and Z of the sample, and the quotients X/Xn, Y/Yn, and Z/Zn are all greater than 0.008856.

The simulation is done using MATLAB version 7.0.4. For the simulation, several videos are used, some with signal lights and others without. The videos used for the simulation are taken from the computer-vision class database of Gannon University, which consists of videos taken from an automobile moving at speeds of 25, 30, and 35 miles per hour; sample videos at each of these speeds are tested with the algorithm. Region segmentation is performed on the binary image transformed from the original RGB image extracted from the input video. The binary image is a matrix of 0's and 1's, in which the 1's represent sharp or blunt edges, or other structures containing sharp or blunt edges. The tracing of the sharp edges is done using threshold values: edges that lie below the threshold are eliminated, and those that lie above are stored in a dataset. The dataset is subjected to a line tracing function to check the parallelism and continuity of the edges in the binary image. The line tracing function uses the adjacent-pixel-value method, in which it looks for the positions (coordinates) of 1's in all directions and stores them in a database. The coordinates of the edges found in line tracing are used as reference coordinates for tracing the signal lights. The ColorValueSet then needs to be tracked against the shape of the pole, i.e., whether the ColorValueSet lies inside the signal light pole or not. Tracing the shape of the signal light around the ColorValueSet leads to the detection of the exact signal light and its color using this contour or shape tracing. The shape tracing function takes the center point [X, Y] of the ColorValueSet as the reference point, with width +X and height +Y as boundary points. It checks for the presence of horizontal and vertical lines of the pole based on these reference and boundary points; the presence of the horizontal and vertical lines is treated as the presence of a signal light pole. The simulation produced effective results in detecting the color associated with signal lights. A log file is written with the frame number and the presence or absence of color associated with signal lights, as shown in Table 4. The execution times for various numbers of frames with and without signal lights are shown in Table 5.

Table 4: The log file of signal lights

    Frame Numbers    Presence of Signal Lights    Color of Signal Light
    1-30             No                           No color
    31-60            No                           No color
    61-90            Yes                          Red
    91-120           No                           No color

Table 5: The execution times of frames

    Frames                  Execution time (seconds)
    1                       0.1406
    10                      1.2438
    100 (with lights)       11.999
    100 (without lights)    10.799
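For reference, a small Python sketch of the XYZ-to-La*b* conversion defined by the equations above is shown below, implementing only the cube-root branch (quotients greater than 0.008856). The D65 white point is an assumption of this sketch; in practice a library conversion such as OpenCV's cv2.cvtColor with COLOR_BGR2LAB performs the full RGB-to-La*b* mapping.

```python
# Sketch of the CIE La*b* computation from XYZ tristimulus values,
# following the equations in the text (cube-root branch only).
WHITE_D65 = (95.047, 100.0, 108.883)   # assumed Xn, Yn, Zn (D65 illuminant)

def xyz_to_lab(x, y, z, white=WHITE_D65):
    xn, yn, zn = white
    fx = (x / xn) ** (1 / 3)
    fy = (y / yn) ** (1 / 3)
    fz = (z / zn) ** (1 / 3)
    L = 116 * fy - 16          # lightness, black to white
    a = 500 * (fx - fy)        # red-green dimension
    b = 200 * (fy - fz)        # yellow-blue dimension
    return L, a, b

print(xyz_to_lab(41.24, 21.26, 1.93))  # pure sRGB red, roughly (53, 80, 67)
```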
Figure 2(a).ii.2 shows the original frame extracted from the video in which the signal light is present, and Figure 2(a).ii.3 shows the image with the ROI used to extract the ColorValueSet, i.e., the group of glowing pixels (LEPs) associated with either red, yellow, or green colors. Figure 2(a).ii.4 shows the image with the ColorValueSet surrounded by the contour that is used to predict the shape of the pole.
Figure 2(a).ii.2. The image frame from a video
Figure 2(a).ii.3. The image frame with ROI
Figure 2(a).ii.4. The ColorValueSet associated with contour
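The shape-tracing check described above (horizontal and vertical pole lines around the ColorValueSet center) might be sketched as follows in Python; the minimum pixel count used to declare a "line" is an assumed parameter.

```python
# Sketch of the pole-shape check around a ColorValueSet center point.
import numpy as np

def has_pole_shape(edges: np.ndarray, cx: int, cy: int,
                   width: int, height: int, min_pixels=10) -> bool:
    """edges: binary edge map (0/1). Returns True if the box around
    (cx, cy) contains both a horizontal and a vertical edge line."""
    box = edges[max(cy - height, 0): cy + height,
                max(cx - width, 0): cx + width]
    # A row (column) with at least min_pixels edge pixels is treated
    # as a horizontal (vertical) line of the signal pole.
    horizontal = any(row.sum() >= min_pixels for row in box)
    vertical = any(col.sum() >= min_pixels for col in box.T)
    return horizontal and vertical
```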
(b) Ensuring Road Safety – Architecture and Simulation

A high-level architecture for the VMSDR system is presented in Figure 2(b).1.

[Figure 2(b).1. High Level VMSDR Architecture: Video 1 and Video 2 → Processing Unit → Municipal Corporation → Penalty Ticket.]
A detailed diagram is shown in Figure 2(b).2.

[Figure 2(b).2. Detailed VMSDR Architecture. Processing unit: Input frames from videos → Find region of interest (ROI) → De-interlacing → Character segmentation → Neural network recognition system → Database containing vehicle details → E-mail / mail penalty tickets.]
For simulating the VMSDR architecture, one video camera was placed at a height of 10 feet above the ground on a fixed pole in the parking lot at Gannon University, and another camera on the sidewalk. The timer of the processing unit is used for synchronizing the videos from both cameras. The video captures 25 frames per second. These frames are used to identify the vehicle identification number (VIN) and the license plate number. As an example, a single frame is used to explain the steps for detecting the VIN. Initially, the region of interest (ROI) that includes the VIN is cropped from the video frame, as shown in Figure 2(b).3.
Figure 2(b).3. Region of interest that includes the VIN on the metallic plate

The ROI is further narrowed down to only the VIN and is used for further processing: a fine cropping is performed that includes only the VIN, of size 24x135. MATLAB 7.0.4 is used for simulating the algorithms; this code can be embedded in the processing unit attached to the video camera unit. The ROI consisting of only the VIN is shown in Figure 2(b).4 inside a MATLAB window.
Figure 2(b).4. Region of interest (ROI)

Since the vehicles may be moving fast, the ROI is likely to have interlacing artifacts. Interlacing (Wikipedia, 2007) can cause artifacts such as combing, mice teeth, saw-tooth edge distortion, interlaced lines, ghost images, and blurring. In order to remove these artifacts, a de-interlacing technique using linear interpolation is used. This refines the characters in the VIN, which are used for character recognition. The ROI obtained after de-interlacing is used for character recognition: the characters are segmented into 7x5 matrices and stored in arrays. Figure 2(b).5 shows some of these character images.
Figure 2(b).5. Character images sent to neural network
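A hedged Python sketch of the linear-interpolation de-interlacing step mentioned above follows: the even field is kept and each odd line is rebuilt as the average of its neighbors. This is one common reading of "linear interpolation" de-interlacing, not necessarily the exact filter used in the original simulation.

```python
# Sketch of de-interlacing by linear interpolation: keep even lines,
# rebuild each odd line as the mean of the lines above and below it.
import numpy as np

def deinterlace_linear(frame: np.ndarray) -> np.ndarray:
    out = frame.astype(np.float32).copy()
    for r in range(1, frame.shape[0] - 1, 2):
        out[r] = (out[r - 1] + out[r + 1]) / 2.0
    return out.astype(frame.dtype)
```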
Each character array is fed to a neural network, which recognizes the characters of the VIN. The neural network was trained using the method given on the University of Florida website for Neural Network Training (2007), for the 26 letters and 10 digits, both with and without noise, using the backpropagation algorithm to recognize each array. Figure 2(b).6 shows the result of feeding the image of "1" to the neural network.
Figure 2(b).6. Array of the image "1" recognized by the neural network algorithm
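The character recognizer can be sketched as a small one-hidden-layer network over the flattened 7x5 arrays, trained by backpropagation as the text describes. The Python sketch below is illustrative: the hidden-layer size, learning rate, and squared-error loss are assumptions, not details from the University of Florida training method.

```python
# Sketch of a 35-input (7x5) backpropagation network with 36 output
# classes (26 letters + 10 digits) for VIN character recognition.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (35, 25))    # 7x5 input -> 25 hidden units (assumed)
W2 = rng.normal(0, 0.1, (25, 36))    # hidden -> 36 character classes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    h = sigmoid(x @ W1)
    return h, sigmoid(h @ W2)

def train_step(x, target, lr=0.5):
    """One backpropagation update for a single (input, one-hot target) pair."""
    global W1, W2
    h, y = forward(x)
    # Output and hidden deltas for squared error with sigmoid units
    d_out = (y - target) * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * np.outer(h, d_out)
    W1 -= lr * np.outer(x, d_hid)

def recognize(x, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"):
    _, y = forward(x)
    return alphabet[int(np.argmax(y))]
```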
All the images are fed to the neural network one after another, and the complete VIN is recognized. The procedure is repeated for at least 10 frames for each vehicle. The license plate is recognized using the algorithms given in Parker (1994) and Lotufo, Morgan & Johnson (1990). This is sent, along with the recognized VIN and the 10th frame from the video camera as proof of the vehicle being in the middle of the intersection on a red light, to a test database management system containing data for 20 people. The VIN and license plate numbers are verified against this database, which contains the vehicle's details such as the VIN, license plate number, owner's name and address, insurance details, and tickets issued in the current year. Using the address of the owner, a penalty ticket is issued based on his/her previous driving record. A time period is given for the owner to contest the ticket in case someone else was driving at the ticketed time; in this way the driver is penalized instead of the owner. Instead of the test database management system used for the simulation, a full-fledged database management system can be used on the municipal corporation side, and penalty tickets can be issued automatically through e-mail or ordinary mail. In case more details are needed, the car facts of the vehicle can be checked using the VIN.
III. CONCLUSION
The first method for gait recognition, which uses the Hidden Markov Model, a stochastic process, is quite efficient in identifying gait characteristics from static postures. The images considered for this method are all 2D, which makes the process robust and less time consuming; however, since the processing is done on 2D images, the recognition rate is not as high as that of the methods using stereovision and isoluminance lines with 3D template matching. The two methods using 3DTM have the distinct advantage of a higher recognition rate since they use 3D images, but their drawback is the higher computing time, which makes the process slower; the luminous intensity of the pictures also plays an important role.

An automated computer "Vision-based Real-time Smart system to Prevent Railroad Accidents (VRSPRA)" was presented that analyzes individual frames in the video stream and generates a warning signal so that the locomotive driver can stop the train. This system was simulated and found to be successful.

The "Color-based Signal light Tracking in Real-time Video (CSTRV)" system is an intelligent system using the La*b* color model in combination with contour tracking. It analyzes each frame of the video sequentially, and detects the presence of signal lights and their color. This finds application in tracking the color of signal lights on railroads, which can be used in railroad accident investigation.

The Vision-based Monitoring System for Detecting Red signal crossing (VMSDR), which captures and recognizes both the VIN and the license plate of a vehicle running a 'red' signal light at an intersection, was also presented. This research is intended to provide a support system for law enforcement agencies to proactively ensure that intersections are engineered to discourage red light running. The VMSDR system can be extended to include the details of drivers who rent vehicles from in-state or out-of-state rental services. Future work will focus on gaining access to the VIN details of vehicles in all 50 states, so that the system can monitor tickets issued to the same driver in another state. The limitations of this system are extreme weather conditions: if there is snow on the windscreen right above the VIN, or a sticker that obstructs the VIN as seen from the camera, the VIN cannot be extracted and recognized.
References

Amtrak (2004). Retrieved from http://www.ntsb.gov/events/2002/bourbonnais/amtrak59_anim.htm

CCTV Info – UK (2005). Retrieved in 2005 from http://www.cctv-information.co.uk/constant3/anpr.html

CNN News: "Train collision near Los Angeles kills 11" (2005). Retrieved April 29, 2005 from http://www.cnn.com/2005/US/01/26/train.derailment/

Defense Advanced Research Projects Agency (DARPA), DARPA Grand Challenge (2005). Accessed March 14, 2005 from http://www.darpa.mil/grandchallenge

Department for Transport – UK (2004). Retrieved in 2004 from http://www.dft.gov.uk/stellent/groups/dft_control/documents/homepage/dft_home_page.hcsp

Drive and Stay Alive, Inc. (2006). Retrieved in 2006 from http://www.driveandstayalive.com/info%20section/statistics/stats-usa.htm

Gomatam, A. M., & Sasi, S. (2004). "Enhanced gait recognition using HMM and VH techniques". IEEE International Workshop on Imaging Systems and Techniques, pp. 144-147, May 14, 2004, Stresa – Lago Maggiore, Italy. DOI: 10.1109/IST.2004.1397302

Gomatam, A. M., & Sasi, S. (2004). "Multimodal Gait Recognition Based on Stereo Vision and 3D Template Matching". Proceedings of the International Conference on Imaging Science, Systems and Technology (CISST'04), pp. 405-410, Las Vegas, Nevada, USA, June 21-24, 2004, CSREA Press

Gomatam, A. M., & Sasi, S. (2005). "Gait Recognition based on Isoluminance Line and 3D Template Matching". International Conference on Intelligent Sensing and Information Processing (ICISIP'05), pp. 156-160, January 4-7, 2005, IIT Chennai, India. DOI: 10.1109/ICISIP.2005.1529440

Gomatam, A. M. (2004). "Non-invasive Multimodal Biometric Recognition Techniques". Unpublished MS thesis, Gannon University, Erie, PA, USA

Interlacing – Wikipedia (2007). Retrieved in 2007 from http://en.wikipedia.org/wiki/Interlacing

License Plate Recognition (2005). Retrieved in 2005 from http://www.oletc.org/oletctoday/0415_licplate.pdf#search=%22automatic%20license%20plate%20recognition%22

Lotufo, R. A., Morgan, A. D., & Johnson, A. S. (1990). "Automatic Number Plate Recognition". IEE Colloquium on Image Analysis for Transport Applications, February 1990, London. INSPEC Accession Number: 3649590

MoBo (2004). Retrieved May 2004 from http://www.ri.cmu.edu/publication_view.html?pub_id=3904

Motorola Solutions for Government (2005). Retrieved in 2004 from http://www.motorola.com/governmentandenterprise/northamerica/enus/solution.aspx?navigationpath=id_801i/id_826i/id_2694i/id_2695i

National Transportation Safety Board (2004). Retrieved in 2004 from http://www.ntsb.gov/Events/symp_rec/proceedings/authors/scaman.htm

Neural Network Training (2007). Retrieved in 2007 from the University of Florida website: http://www.math.ufl.edu/help/matlab/ReferenceTOC.html

Parker, J. R. (1994). Practical Computer Vision Using C. Wiley, New York, USA

Rail Employee Fatigue (2004). Retrieved in 2004 from http://www.circadian.com/expert/fatigue_inattention.html

Sharma, R., & Sasi, S. (2007). "Vision-based Monitoring System for Detecting Red Signal Crossing". Innovations and Advanced Techniques in Computer and Information Sciences and Engineering, pp. 29-33, Springer, 2007. ISBN: 978-1-4020-6267-4

Train Accident Report (2004). Retrieved in 2004 from http://www.visualexpert.com/Resoures/trainaccidents.html

Trax (2004). Retrieved in 2004 from http://www.avtangeltrax.com/digital.htm

Wikipedia for Number Plate (2005). Retrieved in 2004 from http://en.wikipedia.org/wiki/Automatic_number_plate_recognition

Yelal, M. R., Sasi, S., Shaffer, G. R., & Kumar, A. K. (2006). "Color-based Signal light Tracking in Real-time Video". IEEE International Conference on Advanced Video and Signal Based Surveillance, November 22-24, 2006, Sydney, Australia. DOI: 10.1109/AVSS.2006.34