
Robust Drivable Road Region Detection for Fixed-Route Autonomous Vehicles Using Map-Fusion Images

Yichao Cai 1,†, Dachuan Li 2,†, Xiao Zhou 1,2,* and Xingang Mou 1

1 School of Mechanical and Electronic Engineering, Wuhan University of Technology, Wuhan 430070, China; [email protected] (Y.C.); [email protected] (X.M.)
2 California PATH, University of California, Berkeley, Richmond, CA 94804-2468, USA; [email protected]
* Correspondence: [email protected]; Tel.: +1-510-570-9638
† These authors contributed equally to this work.

Received: 30 August 2018; Accepted: 20 November 2018; Published: 27 November 2018


Abstract: Environment perception is one of the major issues in autonomous driving systems. In particular, effective and robust drivable road region detection remains a challenge for autonomous vehicles on multi-lane roads, at intersections and in unstructured road environments. In this paper, a computer vision and neural network-based drivable road region detection approach is proposed for fixed-route autonomous vehicles (e.g., shuttles, buses and other vehicles operating on fixed routes), using a vehicle-mounted camera, a route map and real-time vehicle location. The key idea of the proposed approach is to fuse an image with its corresponding local route map to obtain a map-fusion image (MFI), in which the image and the route map complement each other. The information in the image can be utilized in road regions with rich features, while the local route map provides critical heuristics that enable robust drivable road region detection in areas without clear lane markings or borders. A neural network model built upon Convolutional Neural Networks (CNNs), namely FCN-VGG16, is utilized to extract the drivable road region from the fused MFI. The proposed approach is validated using real-world driving scenario videos captured by an industrial camera mounted on a testing vehicle. Experiments demonstrate that the proposed approach outperforms the conventional approach which uses non-fused images in terms of detection accuracy and robustness, and that it remains robust against adverse illumination conditions and pavement appearance, as well as projection and map-fusion errors.

Keywords: drivable road region detection; autonomous vehicles; map-fusion image; FCNs

1. Introduction

Environment perception is a critical technical issue for autonomous vehicles, and significant progress has been made during the past decade. In particular, road detection is one of the key aspects of environment perception, since identifying key elements of the road (such as road markings and lane borders) and drivable road regions is essential for providing valid road information to support decision-making and to navigate the vehicle on roads [1]. In the area of autonomous driving systems, various environment perception systems have been designed based on different perception sensors (monocular/stereo camera, LiDAR, radar, etc.), among which LiDAR and camera are the most commonly used sensors for environment perception. Although LiDAR has been applied to perception tasks in several autonomous vehicle systems and has demonstrated its effectiveness in experimental tests [2,3], its application in drivable road detection is still limited by its disadvantages such as high cost, low resolution and lack of texture information.


Cameras are preferable to LiDAR in road detection due to their low cost, light weight, high resolution and simplicity of installation and configuration. For camera-based perception systems, the objective of drivable road region detection is to identify a region of pixels that indicates the drivable (or navigation-free) area in a given image captured by the camera, and examples of applications of cameras in road/lane detection can be found in [4–9]. Real-world autonomous driving scenarios pose significant challenges to camera-based perception systems, due to the following factors: (1) Unstructured road environments: road markings and lane borders are not always available, and the markings/borders may be too vague to be identified; (2) variable illumination conditions: the images may contain shadows and other undesirable illumination conditions; (3) road curvatures: the camera's field-of-view may not capture the entire region needed due to curved road segments; (4) non-uniform pavement appearance and occlusions: the road pavement in the image may contain variable texture and color, and the objects in the camera's field-of-view may cause occlusions in the image. Despite the progress made in road detection systems, the robust and effective detection of drivable regions in variable and challenging road environments remains a major challenge to be addressed.

To address the above issues for autonomous vehicles, this paper proposes a drivable road region detection approach using a monocular vehicle-mounted camera. This research focuses particularly on autonomous vehicles that have access to route maps (e.g., shuttles, buses and other vehicles operating on predefined fixed routes). Taking advantage of the route map and the vehicle location information, the proposed approach extracts from the full route map a local map which corresponds to the currently processed image captured by the camera, which is then fused with the projected image to obtain a map-fusion image (MFI). The MFIs enable robust drivable region detection in the presence of undesirable illumination conditions, non-uniform pavement appearance, occlusions and the absence of road/lane markings. In addition, for multi-lane scenarios, the lane navigation information (the lane which the vehicle is supposed to operate in) can also be provided in the MFI. The road detection approach is built upon the Fully Convolutional Network (FCN)-VGG16 framework which is trained using the MFIs, and the drivable road regions in each MFI are identified using the trained FCN-VGG16. The proposed approach is validated using video data collected from real-world scenarios, and experimental results show that the proposed approach achieves high accuracy and robustness in drivable road region detection in various driving scenarios and outperforms the conventional approach.

The remainder of this paper is organized as follows: Section 2 reviews the existing approaches for road detection. Section 3 formulates the drivable road detection problem. Details of the proposed road detection approach are presented in Section 4. Experimental results based on real-world scenarios are presented and analyzed in Section 5, and the paper is concluded in Section 6.

2. Related Works

In general, drivable road region detection can be classified as a road detection problem. During the past decade, various computer vision-based approaches have been proposed to address the road detection problem for autonomous vehicle systems, and they can be generally divided into three categories.
One straightforward paradigm detects road regions by extracting the lane markings and road boundaries. This category of approaches utilizes conventional image processing techniques to extract the road boundaries by using image features such as edge, position and color [4–9]. After that, the boundaries are fitted (linear models [6,7] and high-order curves/splines [8] are used for straight and curved road segments, respectively), and the area between the identified lane markings and/or boundaries is considered as the drivable road region. The conventional approaches have the primary advantage of offering the opportunity to evaluate specific performance criteria and the factors that affect performance, such as the descriptors used. However, these approaches are sensitive to illumination variation and occlusion, and their performance may degrade significantly when dealing with roads with vague or discontinuous lane markings. Another category of approaches detects roads by analyzing texture features and attempting to find the inhomogeneity between the road area and the background in the image [10–14].


To specify texture information in the image, the Gabor filter [10,13] and the discrete cosine transform [14] are commonly used in this type of road detection approach, and they can achieve desirable performance. Although these approaches perform well on rural, single-lane roads, their performance is undesirable when detecting drivable regions on multi-lane roads. In general, the conventional approaches described above achieve desirable road detection performance only in specific scenarios and are less effective in more complex driving environments. For instance, the lane in the opposite direction of a two-way road may be incorrectly identified as drivable road region by the marking/boundary-based road detectors. In addition, there are usually multiple driving directions in intersection regions, and it is difficult to identify correct drivable road regions only from boundary and texture information. Therefore, the application of the conventional marking/boundary-based and texture-based approaches to real-world driving scenarios is still very limited.

As alternatives to the conventional approaches, machine-learning-based semantic segmentation approaches have emerged as effective tools [15] in the computer vision area, and due to their high precision and efficiency, they have been applied to a number of autonomous driving systems [16–19] in recent years. Semantic segmentation approaches can achieve pixel-level precision and demonstrate effectiveness in many road detection applications, and typical semantic segmentation algorithms are built upon CNNs, as proposed in [20–24]. A more recent example of such approaches is the Vanishing Point Guided Network (VPG-Net) [23], which can detect roads under complex environmental conditions. However, such CNN-based models typically process images using a local block as the input, which requires high computational cost. The performance of machine-learning-based approaches is heavily affected by the dataset used to train the network, and the size of the required dataset may increase dramatically if the network is required to adapt to various use cases. In order to improve the efficiency of CNNs, the Fully Convolutional Network (FCN) was proposed in [25], which can recover the class of each pixel from abstracted features in the image. Since it was proposed, a number of research works have focused on making extensions and improvements to FCNs and applying the approach in the field of autonomous driving systems [26–28]. One of the most preferred variants is MultiNet [28], an encoder-decoder structure model based on FCNs and VGG [27], which can accomplish joint classification, detection and semantic segmentation via a unified framework. The encoder of MultiNet is designed to extract features from the image, while the decoder is responsible for multiple tasks such as road detection. In road detection applications, the training parameters can be initialized with pre-trained VGGs to speed up training, and the road region is detected using FCNs. MultiNet demonstrates appealing performance on both structured and unstructured roads with a desirable trade-off between accuracy and training cost. Therefore, taking advantage of machine-learning-based frameworks, the approach proposed in this paper uses an FCN-VGG16 framework to identify drivable road regions from map-fusion images.
On the other hand, the real world contains many complex driving scenarios that may be difficult to handle for systems with a single perception data source. As described previously, for intersections containing multiple lanes and multiple driving directions, it can be difficult to determine which direction the vehicle should drive in by using solely the information in images. Therefore, fusing information from multiple sources is a preferable choice to improve the performance of road detection. A lane marking detection framework that fuses GPS data, vehicle velocity and a lane map is proposed in [29], and it demonstrates good performance in structured road scenarios. Similarly, GPS data and images are fused for localization and decision making in [30]. Inspired by the previous research, vehicle location data and route map data are used to generate local route maps in the proposed framework. The vehicle location data can either be obtained from a vehicle-mounted GPS or from other data feeds such as point cloud map-matching of LiDAR data. We fuse a local route map and its corresponding image captured by the vehicle-mounted camera into a map-fusion image (MFI), which provides high-level heuristics for drivable region detection.


3. Problem Formulation

Drivable road region detection can be considered as a subtask of road detection, which involves all key elements in the road environment. Without considering pedestrians, vehicles and other obstacles, the drivable road region can be defined as the specific road area that the vehicle is supposed to drive in. For monocular camera-based systems, the data perceived from the driving environment are the images captured by the vehicle-mounted camera. As shown in Figure 1a, a typical image of the driving environment can be divided into three parts: the background area, the drivable road area and the non-drivable road area. Given an image I (Figure 1a) captured by the camera at a certain timestamp t, the definitions used in drivable road region detection are as follows:

S—the set of all pixels in the image I, $S = \{p \mid p \in I\}$
S_b—the set of pixels in the background area (or non-road area)
S_r—the set of pixels in the road region
S_d—the set of pixels in the drivable road region
S_ud—the set of pixels in the non-drivable road region.

The relationship between the above sets is as follows (Figure 1b):

$S = S_b \cup S_r$, (1)

$S_b \cap S_r = \varnothing$, (2)

$S_r = S_d \cup S_{ud}$, (3)

$S_d \cap S_{ud} = \varnothing$, (4)

Given the above definitions, the monocular camera-based road region detection problem can be formulated as follows:

Definition 1 (Drivable Road Region Detection). Given a set of pixels S in an image I, extract the pixels of S_d from S.

In this paper, the objective of drivable road region detection is to extract the drivable regions in the captured images that correspond with the pre-specified route. In addition, for vehicles operating on fixed routes, note that the drivable road regions include regions of both single road segments and intersections.
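To make these set relations concrete, the following sketch represents each pixel set as a Boolean mask over the image grid and checks Equations (1)–(4) with simple array operations; the toy label map and the way the masks are produced here are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Illustrative label map for a 4x6 image: 0 = background, 1 = drivable road,
# 2 = non-drivable road (values chosen arbitrarily for this sketch).
labels = np.array([
    [0, 0, 0, 0, 0, 0],
    [2, 2, 1, 1, 2, 2],
    [2, 1, 1, 1, 1, 2],
    [1, 1, 1, 1, 1, 1],
])

S    = np.ones_like(labels, dtype=bool)   # all pixels of the image I
S_b  = labels == 0                        # background (non-road) pixels
S_d  = labels == 1                        # drivable road pixels
S_ud = labels == 2                        # non-drivable road pixels
S_r  = S_d | S_ud                         # road region, Equation (3)

assert np.array_equal(S, S_b | S_r)       # Equation (1): S = S_b U S_r
assert not np.any(S_b & S_r)              # Equation (2): S_b and S_r are disjoint
assert not np.any(S_d & S_ud)             # Equation (4): S_d and S_ud are disjoint

# Drivable road region detection (Definition 1) then amounts to recovering
# the S_d mask from the raw image alone.
print("drivable pixels:", int(S_d.sum()), "of", S.size)
```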

Figure 1. Definition of drivable region detection: (a) Regions in an image captured by the camera; (b) Relationship between the regions.

4. Drivable Road Region Detection

The overall framework of the proposed approach is depicted in Figure 2. Our approach consists of three primary processes: (1) Image pre-processing: the original images captured by the camera are transformed into bird's-eye-view images using the Inverse Perspective Mapping approach. (2) Map fusion: the homogenous local route map of the processed image is extracted from a global route map with the assistance of vehicle location data, and it is then fused with the image to generate a map-fusion image (MFI). (3) Drivable road region detection: an FCN-based deep neural network (FCN-VGG16) is utilized to detect the drivable road region from the MFIs. Note that the FCN-VGG16 is also trained using datasets of MFI images. Details of each process are presented in the following subsections.


Figure 2. Overall framework of the drivable road region detection based on map-fusion and monocular camera.
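To make the three processes concrete, the sketch below strings them together with OpenCV and NumPy: a fixed homography is pre-calibrated from point correspondences (Section 4.1), the GPS fix closest in time to the frame selects the local route (Section 4.2.2), and the route is drawn as a 2-pixel green polyline on the bird's-eye-view image (Section 4.2.3) before being passed to the segmentation network. The function names, the example bird's-eye-view size (224 × 448, following Section 5.1) and the `extract_local_route`/`segmentation_model` placeholders are illustrative assumptions, not the authors' implementation.

```python
import bisect
import cv2
import numpy as np

def calibrate_homography(pts_front, pts_bev):
    """Pre-calibrate the fixed IPM mapping matrix from front-view/bird's-eye-view
    point correspondences (cf. Equation (5)); done once, offline."""
    if len(pts_front) == 4:
        return cv2.getPerspectiveTransform(np.float32(pts_front), np.float32(pts_bev))
    H, _ = cv2.findHomography(np.float32(pts_front), np.float32(pts_bev), cv2.RANSAC)
    return H

def nearest_gps(frame_time, gps_times, gps_fixes):
    """Pick the GPS fix whose timestamp is closest to the image timestamp;
    Section 4.2.2 assumes the vehicle barely moves within milliseconds."""
    i = bisect.bisect_left(gps_times, frame_time)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(gps_times)]
    best = min(candidates, key=lambda j: abs(gps_times[j] - frame_time))
    return gps_fixes[best]

def build_mfi(front_view_img, H, route_px, size=(224, 448)):
    """IPM-project the frame with the fixed homography (cf. Equation (6)),
    then overlay the local route as a 2-pixel green polyline (Section 4.2.3)."""
    bev = cv2.warpPerspective(front_view_img, H, size)   # size = (width, height)
    pts = np.asarray(route_px, dtype=np.int32).reshape(-1, 1, 2)
    cv2.polylines(bev, [pts], isClosed=False, color=(0, 255, 0), thickness=2)
    return bev

def detect_drivable_region(frame, frame_time, H, gps_times, gps_fixes,
                           extract_local_route, segmentation_model):
    """End-to-end sketch: local map lookup -> map fusion -> FCN segmentation."""
    fix = nearest_gps(frame_time, gps_times, gps_fixes)
    # Placeholder: returns the ~20 m look-ahead route segment of Section 4.2.2,
    # already expressed in bird's-eye-view pixel coordinates.
    route_px = extract_local_route(fix, lookahead_m=20.0)
    mfi = build_mfi(frame, H, route_px)
    return segmentation_model(mfi)   # per-pixel drivable-region prediction
```

In keeping with Section 4.1, the homography is computed once offline and reused for every frame rather than being re-estimated online.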

4.1. Inverse Perspective Mapping

Vehicle-mounted cameras are typically installed in such a way that the lens faces toward the forward-driving direction of the vehicle. As depicted in Figure 3a, images captured by the vehicle-mounted camera are normally in the front view, which is more suitable for certain detection applications such as pedestrian detection [31] and traffic-sign detection [32]. However, for drivable road region detection tasks, images in the front view contain a lot of redundant information which may affect the effectiveness of the detection algorithm. Compared with the front view, the bird's-eye view is preferable in drivable region detection applications as the image contains less redundant information, and the lane markings become roughly parallel [4] (Figure 3b).

Without loss of generality, the road regions can be assumed to be homographic planes in road detection scenarios. Therefore, we can easily map the original images into the bird's-eye view using Inverse Perspective Mapping (IPM). The IPM can be considered as an image projection process and, in our application, the pixels in the original images can be projected to the bird's-eye-view images using a mapping matrix (projection matrix). In our application, a multiple-point correspondence-based method is used to estimate the mapping matrix Ĥ:

$p_i = H p_i'$, $(i = 1, 2, \ldots, n)$ (5)

$p = \hat{H} p'$, (6)

where p' and p represent the homogeneous coordinates of pixels in the original image I' (p' ∈ I') and the projected image I (bird's-eye view) (p ∈ I), respectively. The mapping matrix Ĥ can be estimated in two steps: First, select two sets of pixels p_i' and p_i (i = 1, 2, ..., n) from I' and I, respectively, and the optimal estimate Ĥ of the mapping matrix H can be calculated based on Equation (5). Once Ĥ is determined, all pixels of the front-view image I' can be projected into a bird's-eye-view image I using the estimated Ĥ (Equation (6)). (Note that the IPM is used in our approach to reduce redundant information instead of trying to ensure the lane markings are parallel.)

Figure 3. Images in front view and bird's-eye view: (a) Front-view image; (b) Bird's-eye-view image.

Note that the above-described estimation of the projection matrix Ĥ is a pre-calibration procedure, and the IPM process in our system uses the fixed Ĥ calculated via the pre-calibration. As a matter of fact, the roll and pitch movements of the vehicle can cause variations of Ĥ, and therefore, a more accurate way of performing IPM is to adjust Ĥ based on the measurement of the vehicle's roll and pitch in real time. However, considering that the vehicle performs only minor roll and pitch movements at low speeds in regular driving environments, the proposed drivable road detection approach can achieve robustness against such movements and small IPM errors (see Section 5.4). Therefore, it is reasonable to use a fixed, pre-calibrated Ĥ in our application.

4.2. Map-Fusion Image Generation

Figure 4 depicts the process of map-fusion image generation. The overall process consists of four steps which are described in the following subsections.

Figure 4. Generation of map-fusion images.

4.2.1. Route Map Generation

For vehicles driving on fixed routes (e.g., shuttles, buses and school buses) or vehicles with pre-established high-precision maps, the route maps provide critical heuristics for the environment perception and navigation of the vehicle. A route map is defined as a map that consists of the roads/lanes, navigation points and driving directions of the routes that the vehicle is supposed to operate on. In our system, the route map is presented and stored in the form of straight and curved segments (Figure 4), and each point in the route map represents a certain geographical location of the vehicle in the corresponding segment. In addition, for fixed routes, there is only one forward driving direction for the vehicle in a straight road segment or intersection, and this driving direction information is also included in the route map. The global map can be generated using various data sources such as GPS maps and high-precision 2D maps. For the framework proposed in this paper, a satellite map with geo-location information (i.e., GPS coordinates) is used to generate the global route map (Figure 4).

4.2.2. Local Map Extraction

When the vehicle operates on the route which is pre-specified in the global route map, a local route map that corresponds with the road region in the current image from the camera is extracted from the route map every time an image is captured by the vehicle-mounted camera.


In the proposed framework, the matching between the local route map and images is based on GPS data from the vehicle-mounted GPS module. For each frame of image and GPS coordinates, we search for its corresponding position and localize the corresponding region in the global route map using the GPS coordinates. After that, a local route map with a look-ahead length of 20 m can be extracted from the global route map, using the current position of the vehicle (according to the installation position of the camera, the equivalent height of each bird's-eye-view image is approximately 20 m above the road surface, leading to a look-ahead distance of 20 m). Note that vehicle locations from other data feeds (e.g., LiDAR point cloud matching) can also be utilized in the proposed approach.

The synchronization of the images and GPS data is a major issue in local route map extraction, as the update rates of the vehicle-mounted camera and the GPS module may be different. In our approach, it is appropriate to use the GPS data whose timestamp is closest to that of the image, under the assumption that the vehicle would not move a long distance within milliseconds.

4.2.3. Map-Image Fusion

After the local map is extracted, the image is fused with the corresponding local route map to generate a map-fusion image (MFI), where the road image contains the primary information, and the local route data provide heuristic navigation information for the autonomous vehicle, including location, specified lane and driving direction. As depicted in Figure 4, in the map-fusion image, the original bird's-eye-view image is overlaid with a line or curved segment representing the pre-defined route of the vehicle specified in the corresponding local route map (Figure 4). The midpoint of the bottom border of the image represents the position of the vehicle at the moment the image is captured. Therefore, the origin of the line/curve in the local map is aligned with the midpoint of the bottom border of the image. Experiment results indicate that the color and width of the line have only a slight effect on the detection results. Therefore, in our application, we use green-colored (RGB: 0, 255, 0) lines with a width of 2 pixels to represent the route in the MFI for the subsequent drivable road region detection process. However, for visualization purposes, broad red lines are used to represent the route in Figure 4.

4.3. FCN-VGG16 for Drivable Road Region Detection

4.3.1. Neural Network Model

As described in the previous subsections, CNNs demonstrate desirable performance in various detection tasks due to their high efficiency and storage-saving characteristics. The neural network model used in this research follows an extension of the CNN model, namely MultiNet, an end-to-end model first proposed in [28]. MultiNet is a multi-task framework, and the semantic segmentation model FCN-VGG16 of MultiNet is adopted in the proposed framework for drivable road region detection. The FCN-VGG16 model consists of two major components, a CNN encoder and a segmentation decoder. As the encoder in our framework, a pre-trained VGG16 network which consists of 13 convolutional layers is used to extract features from the MFI images. For the decoder module, the extracted features are processed by a cascade of convolutional layers and up-sampling layers to generate the output. A set of 1 × 1 convolutional layers is responsible for generating a low-resolution segmentation.
After that, three transposed convolution layers perform up-sampling of the low-resolution segmentation. Finally, skip layers composed of 1 × 1 convolutional layers extract high-resolution features and output the image with the detected drivable road regions. The architecture of the FCN-VGG16 model is depicted in Figure 5.

Figure 5. Segmentation model based on FCN-VGG16 [28]. (Input: 224 × 448 × 3; encoded feature: 7 × 14 × 512; FCN-8s decoder; output: 224 × 448 × 2.)
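The sketch below, assuming torchvision's VGG16 as the 13-convolutional-layer encoder, illustrates the encoder/decoder split and the FCN-8s-style skip connections indicated in Figure 5 (1 × 1 score layers followed by three transposed-convolution up-sampling stages). It is a structural illustration under those assumptions, not the authors' exact MultiNet configuration, and in practice the encoder would be initialized with pre-trained VGG16 weights as described above.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class FCNVGG16(nn.Module):
    """FCN-8s-style segmentation head on a VGG16 encoder (structural sketch)."""

    def __init__(self, num_classes=2):
        super().__init__()
        features = vgg16(weights=None).features   # load pre-trained weights in practice
        self.pool3 = features[:17]    # -> 256 channels, 1/8 resolution
        self.pool4 = features[17:24]  # -> 512 channels, 1/16 resolution
        self.pool5 = features[24:]    # -> 512 channels, 1/32 resolution

        # 1x1 convolutions produce low-resolution class score maps.
        self.score5 = nn.Conv2d(512, num_classes, kernel_size=1)
        self.score4 = nn.Conv2d(512, num_classes, kernel_size=1)
        self.score3 = nn.Conv2d(256, num_classes, kernel_size=1)

        # Three transposed convolutions up-sample back to input resolution.
        self.up2a = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1)
        self.up2b = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1)
        self.up8  = nn.ConvTranspose2d(num_classes, num_classes, 16, stride=8, padding=4)

    def forward(self, x):
        p3 = self.pool3(x)
        p4 = self.pool4(p3)
        p5 = self.pool5(p4)
        out = self.up2a(self.score5(p5)) + self.score4(p4)   # skip from pool4
        out = self.up2b(out) + self.score3(p3)               # skip from pool3
        return self.up8(out)                                 # per-pixel class scores

model = FCNVGG16(num_classes=2)
scores = model(torch.randn(1, 3, 224, 448))   # -> torch.Size([1, 2, 224, 448])
```

With a 224 × 448 × 3 MFI as input, the encoded feature map in this sketch is 7 × 14 × 512 and the output is a two-channel score map, matching the shapes shown in Figure 5.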


4.3.2. Selection of Loss Function

The cross-entropy function is widely used as the loss function in neural networks, as it can effectively avoid the gradient vanishing problem. Therefore, the loss function adopted in the training process of the FCN-VGG16 model is a cross-entropy function which is defined as follows:

$Loss(P, G) = \frac{1}{|I|} \sum_{i \in I} (G_i \log P_i)$, (7)

where I denotes the MFI inputs, |I| represents the size of the image, i represents a pixel in the MFI, G_i is the ground truth of i, and P_i is the predicted binary value of i. If i falls in the predicted drivable region, the value of P_i is set to 1, otherwise P_i equals 0. G_i is calculated in a similar way.
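As a minimal sketch of computing a pixel-wise cross-entropy objective in practice, the snippet below uses PyTorch's standard `nn.CrossEntropyLoss` over the two-channel score map and per-pixel ground-truth labels; this is the usual negative log-likelihood implementation of such a loss, averaged over the |I| pixels, and is not claimed to be the authors' exact training code.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()            # pixel-wise cross-entropy, averaged over all pixels

# scores: raw class scores from the network, shape (batch, 2, 224, 448)
# ground_truth: per-pixel labels G_i in {0, 1}, shape (batch, 224, 448)
scores = torch.randn(4, 2, 224, 448)
ground_truth = torch.randint(0, 2, (4, 224, 448))

loss = criterion(scores, ground_truth)       # scalar training loss
print(float(loss))
```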

5. Experimental Study

5.1. Experiment Platform

The proposed map-fusion and FCN-VGG16-based drivable road region detection was tested using real-world driving scenario videos captured by a camera (Figure 6a) mounted on an 18-meter bus (Figure 6b). Parameters of the vehicle-mounted camera are listed in Table 1. To obtain a desirable field of view, the camera was installed at the center of the upper border of the windshield and below the headliner of the bus. The testing drive was conducted on a 4.3 km testing track which includes four intersections, and videos of the testing drive and the corresponding GPS data were recorded to test the drivable road region detection algorithm.

Figure 6. Experiment platform: (a) 18 m testing bus; (b) vehicle-mounted camera.

Table 1. Parameters of the vehicle-mounted camera.

Model         Daheng® MER-125-30-UC
CCD sensor    Sony® ICX445
Resolution    1292 × 964
Frame rate    30 fps
Lens          Basler 4 mm fixed focal length

To reduce the training time of the FCN-VGG16 model, we normalize the converted IPM images to a resolution of 224 × 448 (the original resolution is 1292 × 964, see Table 1). The experiments were conducted on a PC with one Intel® Core i5-3570 CPU and one NVIDIA® 1080Ti GPU. With a learning rate of 0.00001, it took 3.5 h to train the FCN-VGG16 network for 30,000 iterations using the MFI images.

5.2. Evaluation Metric

The Pixel Accuracy (PA) [15] is used as the metric for evaluating the proposed approach. Since drivable road region detection can be regarded as a single-class semantic segmentation task, the calculation of PA can be simplified as follows:

$PA = \frac{p_{tp}}{p_{fp} + p_{tp} + p_{fn}}$, (8)

where p_tp, p_fp and p_fn represent the numbers of true positive, false positive and false negative pixels detected by the algorithm, respectively. Therefore, a high Pixel Accuracy value indicates that the algorithm achieves good performance in detecting drivable road regions.
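A small sketch of this metric on Boolean prediction/ground-truth masks is given below; the example masks are arbitrary, and only the counting of true-positive, false-positive and false-negative pixels follows Equation (8).

```python
import numpy as np

def pixel_accuracy(pred, gt):
    """Pixel Accuracy as in Equation (8): tp / (fp + tp + fn),
    computed over Boolean drivable-region masks of equal shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.count_nonzero(pred & gt)
    fp = np.count_nonzero(pred & ~gt)
    fn = np.count_nonzero(~pred & gt)
    return tp / (fp + tp + fn)

# Toy 2x4 example: 3 pixels correctly detected, 1 false positive, 1 missed.
pred = np.array([[1, 1, 1, 1],
                 [0, 0, 0, 0]])
gt   = np.array([[1, 1, 1, 0],
                 [1, 0, 0, 0]])
print(pixel_accuracy(pred, gt))   # 3 / (1 + 3 + 1) = 0.6
```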

5.3. Evaluation of Detection Accuracy

A dataset of 270 frames is selected from the recorded video to evaluate the performance of the proposed approach. The dataset includes straight road segments, curved segments and intersections. For comparison purposes, the algorithm is tested using both map-fusion images (MFIs) and raw images (without map-fusion). The PA of the detection results is calculated for each frame in both cases.

The route line added to the MFI can be considered as an artificial lane marking. Therefore, the choice of color and width of the route line may affect the drivable road region detection results. The color of the route line should be distinct from the color of the road region in the image, and the width of the line is relevant to the intensity of the high-level intervention. We test different choices of colors and widths of the lines in local maps to demonstrate the effect of the MFI in drivable road region detection. The performance (PA) of the different choices of line colors and widths for each frame is depicted in Figure 7 (the cases using raw images without map-fusion are marked as 'no line'). As shown in Figure 6, the testing site contains two intersections and turning sections without lane markings, which correspond with the first 100 frames and with the 160th to 203rd frames of the dataset, respectively (the other sections correspond with regular straight road segments). As can be seen from Figure 7, the PA value of detection is quite low at intersections and turning segments (curve marked as 'no line') due to the absence of lane markings, as there is little information for the algorithm to determine the drivable road region. Therefore, the route line in the map-fusion image provides important heuristics for drivable road region detection. As for the route line, experiment results demonstrate only mild differences between different choices of line colors and widths, as shown in Figure 7. Experiment results also suggest using colors that are distinctive from the color of the road surface, such as green and red. In our experiment, we chose to use the green color and a width of 2 pixels for the route line in the MFI (please note that the route lines are rendered in red in the subsequent figures just for clarity of visualization).


Figure 7. PA of detection for each frame.

By calculating the average PA value over all frames in the dataset, the average detection accuracy of the FCN-based detection algorithm using raw images and of our map-fusion-based algorithm is evaluated and listed in Table 2. The evaluation results indicate that the proposed approach outperforms the conventional FCN-based detection algorithm in terms of detection accuracy (PA). Examples of detection results are depicted in Figure 8, which contains typical scenarios such as intersections, straight roads and turning sections; the detected drivable road regions are marked in green. In contrast to the conventional FCN-based detection algorithm, the proposed approach can still effectively detect drivable road regions even in intersections and turning segments where lane markings are absent.

Table 2. Average detection accuracy.

Approach             PA
FCN without MFI      0.811
MFI and FCN-VGG16    0.917

Figure 8. Comparison of detection results. The upper row shows the results of the conventional FCN without MFI, and the lower row shows the detection results of the proposed approach. (a) Detection results of straight road segments; (b) Detection results of intersections and turning segments.

5.4. Evaluation of Robustness

The complex road scenarios of the testing site include the absence of lane/road markings, illumination variation, disturbances in road surface appearance and the presence of various road users (other vehicles, pedestrians, bicycles, etc.). In addition, errors introduced by the IPM conversion and the local route map extraction process may also lead to detection deviations.


The robustness of our approach was evaluated in the above complex road environments. Examples of the testing results are shown in Figures 9–11. As described in Section 5.3, the proposed approach demonstrates desirable robustness in the absence of road/lane markings (Figure 8). Figure 9 depicts examples of typical illumination variation scenarios including heavy shadows and variations of illumination direction. As can be seen from the figure, the proposed approach achieves desirable robustness in such scenarios, while conventional border or feature-based approaches may fail.


Figure 9. Robustness against illumination variation.


Figure 10 demonstrates examples of incorrect local route maps (Figure 10a) and IPM conversions (Figure 10b) due to errors. As can be seen from the figures, the drivable road region can still be detected by our approach even when the route lines are placed in totally incorrect positions of the image. These results indicate that the combination of map-fusion and FCN-VGG16 can improve the robustness of drivable road detection in complex road environments.

Figure 10. Robustness against conversion and fusion errors: (a) Detection results in the presence of local map extraction deviation; (b) Detection results in the presence of IPM errors.

Figure 11 shows the detection results of the proposed approach in the presence of undesirable road surface appearance. Still, our approach demonstrates robustness against large pavement cracks (Figure 11a) and inconsistent pavement colors (Figure 11b).

Figure 11. (a) Detection results in the presence of large cracks; (b) Detection results in the presence of inconsistent pavement colors.

Figures 9 and 10 present the performance of drivable road region detection in the presence of undesirable factors induced by illumination and road appearance. In most real-world driving scenarios, another aspect of the complexity of road environments is that the road region is usually shared by various road users such as other vehicles, pedestrians and motorcycles. We tested the proposed drivable road region detection approach in various scenarios with other road users, as shown in Figures 12–14 (for visualization purposes, these figures are shown in the front view). Figure 12 presents a scenario with a single, continuously moving pedestrian. The pedestrian walked across the lane the ego vehicle was driving in and away from the testing vehicle. As can be seen from Figure 12, the proposed approach can correctly and continuously extract drivable road regions as the distance between the pedestrian and the ego vehicle increased.

Figure 12. Detection results in the presence of a single, continuously-moving pedestrian.

Figure 13 shows the detection results of various scenarios with different types of road users other than pedestrians (from left to right: other vehicle, motorcycle and bicycle). As can be seen from Figure 13, the road users are of different types and sizes, with different distances and orientations relative to the ego vehicle. As the FCN-VGG16 network was trained using image datasets containing various road users in different scenarios, the proposed approach can effectively identify drivable road regions when different types of road users share the lane the ego vehicle was driving in.

Figure 13. Detection results in the presence of various types of other road users.

Testing results in the presence of multiple and various road users are shown in Figure 14. Figure 14a shows a scenario with a pedestrian and a motorcycle in the lane of the ego vehicle. The pedestrian walked across the lane and toward the ego vehicle, while the motorcycle drove in the same direction as the ego vehicle with changing distances. Similar to the case shown in Figure 12, the proposed approach can continuously identify the drivable road region as the two road users moved in the view of the camera. Figure 14b includes more complex, multiple-road-user driving scenarios that are common in real-world road environments. In such scenarios, the lane can be shared by multiple road users of different types, and they may drive in different directions. In addition, the opposite lane is usually occupied by other road users at the same time. As can be seen from Figure 14b, our approach can effectively deal with such scenarios, and the testing results indicate that the proposed approach has the potential to be applied to autonomous vehicle systems operating in real-world driving scenarios.


Figure 14. Detection results in the presence of: (a) multiple road users; (b) various road users.

6. Conclusions

This paper proposes a robust drivable road region detection approach based on a monocular vehicle-mounted camera for autonomous vehicles. The proposed approach takes advantage of the heuristics from the route map and vehicle location data and fuses the local route map with the camera image to generate map-fusion images, improving the robustness of detection. The map-fusion images provide significant heuristics (e.g., heading, location, specified driving direction and lane of the vehicle) for drivable road region detection in complex road environments, such as turning segments and intersections, in the absence of road/lane markings. To extract drivable road regions, the map-fusion images are segmented using an FCN-based neural network, which is also trained using a dataset consisting of selected map-fusion images. The proposed approach is validated using real-world videos captured by a camera mounted on a testing bus. Experiment results show that the proposed approach achieves desirable detection accuracy using an industrial camera, and it demonstrates robustness in complex driving scenarios involving absent road/lane markings, undesirable illumination conditions and pavement appearance, as well as various road users. Evaluation results also indicate that the proposed approach outperforms the conventional FCN-based algorithm using raw images in terms of detection accuracy.

Future work will focus on the incorporation of other vehicle location data feeds such as LiDAR point-cloud matching. More comprehensive investigation is also required to further improve the robustness of the proposed drivable road region detection approach. The implementation and integration of the drivable road detection approach into an autonomous driving system will be investigated. In addition, more testing will be conducted to further evaluate the performance of the proposed approach in more real-world driving scenarios.

Author Contributions: X.Z. proposed the methodology and designed the overall framework; Y.C. and D.L. accomplished the detailed design of the proposed framework and wrote the manuscript; Y.C. developed and implemented the algorithm; D.L. and X.Z. were in charge of the experiment design and the validation of the proposed approach; X.M. supervised the research.

Funding: This research was supported in part by the National Natural Science Foundation of China (61701357) and the China Scholarship Council under Grant 201606955118.
Acknowledgments: The authors would like to thank Kejun Ou for his assistance and insightful suggestions. Conflicts of Interest: The authors declare no conflict of interest.

