External Visual Positioning System for Enclosed Carparks

Jens Einsiedler∗‡, Daniel Becker∗, Ilja Radusch∗†

∗ Fraunhofer Institute for Open Communication Technologies (FOKUS), Kaiserin-Augusta-Allee 31, 10589 Berlin, Germany, [email protected]
† Daimler Center for Automotive Information Technology Innovations (DCAITI), Ernst-Reuter-Platz 7, 10587 Berlin, Germany, [email protected]
‡ Fraunhofer Application Center Wireless Sensor Systems, Am Hofbräuhaus 1, 96450 Coburg, Germany, [email protected]

Abstract—In the last decade, Global Navigation Satellite Systems (GNSS) have taken a key role in vehicular applications. However, GNSS-based systems are inoperable in enclosed areas like carparks. To overcome this problem, we developed an infrastructure-based positioning system which utilizes off-the-shelf monocular surveillance cameras to determine the positions of vehicles within a carpark. The position information is also provided via car-to-infrastructure communication to the appropriate vehicle in order to substitute for the in-vehicle positioning system. In this paper, we focus exclusively on the visual detection and positioning component of this system for detecting and locating moving objects in the carpark. A detailed evaluation demonstrates that the proposed system is able to meet the requirements of common vehicular use cases such as navigation, obstacle warning or autonomous driving.

I. INTRODUCTION

In an ever-increasing variety of applications (e.g. navigation systems, intelligent driver assistance systems, self-driving vehicles), positioning systems play a key role in modern vehicles. Global Navigation Satellite Systems (GNSS) such as the Global Positioning System (GPS) represent the predominant technology for these applications in outdoor areas. However, these systems have the notable disadvantage that they require a line-of-sight to the satellites to operate correctly and reliably. Several publications (e.g. [1], [2], [3]) have pointed out that the accuracy of both GPS and A-GPS decreases significantly in urban and densely built-up areas or even on tree-lined roads. In enclosed areas in particular, where a line-of-sight to the satellites is unavailable, GNSS-based systems cease to work or operate only at severely degraded performance. Examples are tunnels, carparks and urban canyons. In these scenarios, and in other fields of application for which GPS does not meet the requirements, alternative positioning systems are necessary.

Vehicular indoor positioning represents a specific field of application and has been addressed in several works. Generally, the complexity of the proposed approaches grows with the requirements, which range from navigation solutions (e.g. [4], [5]) to autonomous driving (e.g. [6], [7]). However, these approaches are usually based on complex sensors or require extensive infrastructure modifications. Also, in-vehicle sensors are limited to line-of-sight, i.e. it is impossible to detect obstructed objects. A system utilizing global knowledge from an infrastructure perspective can overcome these disadvantages.

In this work, we present an approach for infrastructure-based (i.e. external) positioning of objects with surveillance cameras in enclosed spaces such as underground carparks. We utilize off-the-shelf monocular network cameras to monitor the lanes within the carpark. In our system, dedicated software agents process the image streams of the cameras, detect vehicles and pedestrians, determine their positions and provide this information to the positioning clients (e.g. C2X in-vehicle client, smartphone, etc.). This paper focuses exclusively on the detection and positioning component of this system and its evaluation in our realistic carpark testbed.

The paper is organized as follows. Related work is discussed in Section 2. In Section 3 we present an overview of the requirements of selected applications. Our system architecture is described in Section 4. Section 5 explains our visual positioning approach and Section 6 provides a detailed evaluation. Finally, this work closes with a summary and outlook in Section 7.

II. RELATED WORK

The ability to detect and classify objects and their actions with common video sensors, together with the need for intelligent visual surveillance in security, marketing and military applications, has made intelligent visual surveillance systems one of the most active computer vision research areas during the last decade [8]. Vision-based systems have also claimed a significant role in several fields of traffic and vehicular applications due to their availability, high detection rate, low cost and sensor range [9]. The publications [10] and [11] show approaches to track vehicles in tunnels; however, the vehicles are not localized in these works. [12], in turn, is concerned with the 3D localization of moving objects (e.g. forklifts and people) with monocular cameras. In [13], an agent-based multi-camera person detection and tracking framework is presented which utilizes color appearance features to differentiate objects. [14] introduces both a single- and a multi-camera person localization system for environments with high occlusions.

III. REQUIREMENTS AND USE CASES

When designing and implementing an indoor localization system, it is important to be aware of the different classes of requirements which need to be met by such a system [15], [16]. The most obvious one is accuracy, which is usually defined as the median positioning error, i.e. the center of a normally distributed error when considering the probability density function (abbr. pdf) of the error. Precision, on the other hand, describes the reproducibility of the positioning error, i.e. it relates to the standard deviation of the pdf. Two other related requirements are latency and detection rate. Latency describes the time period between position determination and delivery to the positioning client, which is most important if the position is determined externally and then transmitted to the client. The detection rate relates to the number of position detections per time period. Reliability and availability are also related. Reliability refers to the share of operating time during which the system does not behave according to the specified accuracy and precision, e.g. the percentage of operating time where the positioning errors are significantly higher than expected due to unforeseeable environmental conditions. In contrast, availability relates to the downtime of the positioning system, i.e. the percentage of time during which no position can be provided at all. Other requirements include scalability, defining the potential for upgrading a system from smaller to larger scales, and cost, relating to the available financial budget for a system. Overall, the requirements cannot be considered independently because they influence each other. Moreover, they heavily depend on the intended application.

In the following, three different vehicular applications and their requirements on the localization system are introduced: navigation, obstacle warning and autonomous driving. Navigation systems provide a map view with the current position and directions to the destination. Obstacle warning systems indicate if the road is blocked (e.g. by a pedestrian) in order to prevent a collision. Lastly, autonomous driving systems direct the vehicle to a programmed destination without any human interaction at all.
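Returning to the accuracy and precision definitions above, the following C++ fragment makes them concrete: accuracy as the median and precision as the standard deviation of a set of positioning errors. This is our own sketch, not part of the original system, and the error values are made up.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Accuracy: median of the positioning errors (in meters).
double accuracy(std::vector<double> errors) {
    std::sort(errors.begin(), errors.end());
    size_t n = errors.size();
    return (n % 2 == 1) ? errors[n / 2]
                        : 0.5 * (errors[n / 2 - 1] + errors[n / 2]);
}

// Precision: standard deviation of the positioning errors.
double precision(const std::vector<double>& errors) {
    double mean = 0.0;
    for (double e : errors) mean += e;
    mean /= errors.size();
    double var = 0.0;
    for (double e : errors) var += (e - mean) * (e - mean);
    return std::sqrt(var / errors.size());
}

int main() {
    // Hypothetical distances between estimated and ground-truth positions.
    std::vector<double> errors = {0.21, 0.35, 0.18, 0.42, 0.27};
    std::printf("accuracy (median):    %.2f m\n", accuracy(errors));
    std::printf("precision (std dev):  %.2f m\n", precision(errors));
    return 0;
}
```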

Fig. 1. Radar chart of requirements for three different use cases (axes: accuracy, precision, latency, detection rate, reliability, availability, scalability, cost/available funds; use cases: navigation system, obstacle warning, autonomous driving)

The radar chart (see Fig. 1) visualizes the requirements for each use case: a lower requirement is indicated closer to the center of the circle, whereas a higher requirement is closer to the outside. The total requirements are the lowest for the navigation application and the highest for the autonomous driving application, with the obstacle warning at an intermediate position.
IV. SYSTEM OVERVIEW

In our previous works [17] and [18], we have described the overall system architecture (see Fig. 2) and the purpose of each individual component in detail. The focus of this work is on the Lane Monitoring Worker (LMW), which connects to the cameras and accesses the video stream in order to perform motion and object detection as well as position determination of the detected objects. All this information is sent to a central software component, the Detection Module (DM), which performs local tracking, aggregates the position information and transmits it to the Tracking/Identification Module (TM/IM). The TM/IM performs global tracking and identification of the externally observed positions and forwards the appropriate data to the correct registered endpoints, i.e. the mobile positioning clients.

Fig. 2. FMC diagram of the system: mobile clients 1..n connect via REST to the backend server, which hosts the Tracking and Identification Module and, connected to it via socket/REST, the Detection Module; the Detection Module communicates via sockets with the distributed Lane Monitoring Workers 1..n, each of which accesses its lane camera via HTTP(S)/RT(S)P.
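The exact message format exchanged between the LMW and the DM is not specified in this section. Purely as an illustration of the data flow in Fig. 2, the following hypothetical C++ structures bundle the fields an LMW produces per processed frame; all names and types here are assumptions, not taken from the actual system.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-detection payload (field names are assumptions).
enum class ObjectClass { Vehicle, Pedestrian };

struct Detection {
    ObjectClass type;             // result of the classification stage
    double x, y;                  // root point on the floor plane, in meters
    double lat, lon;              // geo-position derived from the camera mount
    std::vector<float> features;  // appearance features used for local tracking
};

// Hypothetical message a Lane Monitoring Worker could send to the
// Detection Module for each processed frame.
struct LmwMessage {
    std::string cameraId;         // which lane camera produced the frame
    int64_t timestampMs;          // time of frame reception
    std::vector<Detection> detections;
};
```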

V. POSITIONING APPROACH


Our approach is realized in four stages, as illustrated in Fig. 3. The first stage (A.) of the actual image processing is to receive the latest camera frame. During the second stage (B.), a combined motion detection and classification algorithm locates and classifies all moving vehicles and pedestrians within the area observed by the camera. In the third stage (C.), the positions of the detected objects are calculated by a transformation from image plane (pixels) to floor plane (meters) coordinates. The geo-position is also calculated, based on a priori knowledge about the camera mounting point. During the last stage (D.), features of the detected vehicles are extracted and matched to the features of already known vehicles in order to track them within the camera's field of view. Finally, all information is sent to the DM. Note that the presented approach detects an object's position relative to the camera but not its pose or heading. Apart from the motion detection, the approach is stateless, i.e. each image is regarded individually.

Fig. 3. Activity diagram of the localization process: A. get frame → B. motion detection (if no motions, back to A.) and object classification → C. position determination for detected objects → D. local tracking (vehicles only) → send information
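The paper does not spell out how the image-plane to floor-plane transformation of stage C is implemented. A common realization for a fixed monocular camera is a planar homography estimated from a few surveyed point correspondences, sketched below with OpenCV; all coordinate values are made up for illustration.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <vector>

// Map a detected root point from pixel coordinates to floor-plane meters.
// The homography is estimated once from four surveyed correspondences
// between image pixels and floor positions in the carpark frame.
int main() {
    std::vector<cv::Point2f> imagePts = {
        {102.f, 540.f}, {860.f, 530.f}, {700.f, 260.f}, {250.f, 270.f}};
    std::vector<cv::Point2f> floorPts = {   // meters in the carpark frame
        {0.f, 0.f}, {6.f, 0.f}, {6.f, 12.f}, {0.f, 12.f}};
    cv::Mat H = cv::getPerspectiveTransform(imagePts, floorPts);

    // Root point of a detected object in the image (pixels).
    std::vector<cv::Point2f> rootPx = {{480.f, 400.f}}, rootM;
    cv::perspectiveTransform(rootPx, rootM, H);
    // rootM[0] now holds the floor-plane position in meters; a geo-position
    // follows from the surveyed offset/orientation of the camera mount.
    return 0;
}
```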

Due to the focus of this work, we limit the subsequent description to stages A. to C. The feature-based tracking in stage D. will be described in detail in a future work.

A. Grab Frames

An asynchronous worker thread manages both the connection to the network camera and the decoding of the video stream. Each received frame is stored together with its time of reception (timestamp) in a dual buffer. This enables us to provide the latest fully decoded and pre-processed frame to the image processing while simultaneously receiving the next frame in the background.
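A minimal sketch of such a grabbing thread, assuming cv::VideoCapture for stream access and C++11 threading; class and member names are ours, and a single mutex-guarded slot stands in for the dual buffer:

```cpp
#include <opencv2/highgui/highgui.hpp>
#include <atomic>
#include <chrono>
#include <mutex>
#include <string>
#include <thread>

struct TimestampedFrame {
    cv::Mat image;
    std::chrono::steady_clock::time_point stamp;
};

// Background thread decodes the camera stream; consumers always get the
// latest fully decoded frame together with its reception timestamp.
class FrameGrabber {
public:
    explicit FrameGrabber(const std::string& url) : cap_(url), stop_(false) {
        worker_ = std::thread([this] {
            cv::Mat frame;
            while (!stop_ && cap_.read(frame)) {
                std::lock_guard<std::mutex> lock(mutex_);
                latest_.image = frame.clone();  // write into the buffer slot
                latest_.stamp = std::chrono::steady_clock::now();
            }
        });
    }
    ~FrameGrabber() { stop_ = true; worker_.join(); }

    TimestampedFrame latest() {                 // consumer side
        std::lock_guard<std::mutex> lock(mutex_);
        return {latest_.image.clone(), latest_.stamp};
    }

private:
    cv::VideoCapture cap_;
    std::atomic<bool> stop_;
    TimestampedFrame latest_;
    std::mutex mutex_;
    std::thread worker_;
};
```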

B. Object Detection

As a first step in image processing, we analyze the video stream for moving objects in order to limit the subsequent object detection to these regions. To achieve this, we use an existing implementation of the motion template algorithm of Bradski and Davis [19], which uses Motion History Images (MHI) [20] to detect and track movements. The MHI are updated once in each iteration. All motions older than fps/2 frames (approx. 0.5 s) are deleted during this update. All others are filtered by size; elements whose dimensions are too small are discarded.
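A condensed sketch of this step, built around OpenCV's motion-template functions updateMotionHistory and segmentMotion; the difference threshold and the minimum segment size below are assumptions, not values from the paper.

```cpp
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// Frame differencing feeds a Motion History Image; segments older than
// historySec fade out automatically. mhi must be a CV_32FC1 zero matrix
// of the frame size on the first call.
std::vector<cv::Rect> detectMotion(const cv::Mat& prevGray,
                                   const cv::Mat& currGray,
                                   cv::Mat& mhi, double nowSec,
                                   double historySec /* ~0.5, i.e. fps/2 frames */) {
    cv::Mat silhouette;
    cv::absdiff(prevGray, currGray, silhouette);
    cv::threshold(silhouette, silhouette, 30, 1, cv::THRESH_BINARY);  // assumed threshold

    cv::updateMotionHistory(silhouette, mhi, nowSec, historySec);

    cv::Mat segMask;
    std::vector<cv::Rect> motions;
    cv::segmentMotion(mhi, segMask, motions, nowSec, 0.25);

    // Filter by size, as described above: drop elements that are too small.
    std::vector<cv::Rect> rois;
    for (size_t i = 0; i < motions.size(); ++i)
        if (motions[i].area() > 400) rois.push_back(motions[i]);  // assumed minimum
    return rois;
}
```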

Detecting vehicles in static images is a solved computer vision problem, as [21] demonstrates. Similar to this work, we use a cascade of boosted Haar-like feature classifiers for the vehicle detection within the predefined ROIs. This approach was first proposed in [22] and later improved in [23]. Our classifier was trained with approx. 3500 positive and negative training images taken in a carpark, combined with images from several computer vision test image sets (e.g. MIT Car Dataset, Caltech, INRIA, etc.). Related to the work in [24], we make use of Histograms of Oriented Gradients (HOG) features to detect pedestrians. The default HOG people detector shipped with OpenCV [25] is used.
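A sketch of this classification stage inside one motion ROI, assuming a trained vehicle cascade file (the path is a placeholder) and OpenCV's default HOG people detector:

```cpp
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/objdetect/objdetect.hpp>
#include <vector>

// Run the Haar cascade (vehicles) and the HOG detector (pedestrians)
// only inside a motion region of interest.
void classifyRoi(const cv::Mat& gray, const cv::Rect& roi,
                 std::vector<cv::Rect>& vehicles,
                 std::vector<cv::Rect>& pedestrians) {
    static cv::CascadeClassifier vehicleCascade("vehicle_cascade.xml");  // placeholder path
    static cv::HOGDescriptor hog;
    static bool init = false;
    if (!init) {
        hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
        init = true;
    }

    cv::Mat patch = gray(roi);
    vehicleCascade.detectMultiScale(patch, vehicles, 1.1, 3);
    hog.detectMultiScale(patch, pedestrians);

    // Shift hits back from ROI-local to full-image coordinates.
    for (size_t i = 0; i < vehicles.size(); ++i)    vehicles[i]    += roi.tl();
    for (size_t i = 0; i < pedestrians.size(); ++i) pedestrians[i] += roi.tl();
}
```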

C. Position Determination

At first, we apply the GrabCut algorithm [26] to separate the objects from the background in order to determine their root point [12], which is defined as the central connection between the object and the ground and is used for the position determination. For vehicles, the root point is defined as the floor position below the center of the front number plate. For pedestrians, the floor position below the person's center of gravity is defined as the root point. Fig. 4 illustrates the root point determination of our approach for vehicles and pedestrians: at the top, the object detection area is shown along with the manually annotated root point (red "x"); below, the GrabCut-segmented area and the root point (red "+") automatically determined by our approach are shown.
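The following sketch shows how such a GrabCut-based root point could be computed with OpenCV. As a simplification, it returns the bottom-center of the segmented silhouette rather than the plate-anchored point described above; the iteration count is an assumption.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Segment the detected object from the background with GrabCut, then take
// the bottom-center of the foreground mask as the point where the object
// touches the floor.
cv::Point2f rootPoint(const cv::Mat& frameBgr, const cv::Rect& detection) {
    cv::Mat mask, bgModel, fgModel;
    cv::grabCut(frameBgr, mask, detection, bgModel, fgModel,
                3 /* iterations, assumed */, cv::GC_INIT_WITH_RECT);

    // Keep definite and probable foreground pixels.
    cv::Mat fg = (mask == cv::GC_FGD) | (mask == cv::GC_PR_FGD);

    // Find the lowest foreground row inside the detection and its mean column.
    for (int y = detection.br().y - 1; y >= detection.tl().y; --y) {
        cv::Mat row = fg.row(y);
        if (cv::countNonZero(row) > 0) {
            cv::Moments m = cv::moments(row, true);
            return cv::Point2f(static_cast<float>(m.m10 / m.m00),
                               static_cast<float>(y));
        }
    }
    // Fallback: bottom-center of the detection rectangle.
    return cv::Point2f(detection.x + detection.width / 2.f,
                       static_cast<float>(detection.br().y));
}
```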

