Defocussing Estimation for Obstacle Detection on Single Camera Smartphone Assisted Navigation for Vision Impaired People

Alessandro Grassi

Cataldo Guaragnella

MS student in Telecommunications Engineering, DEI: Electrical and Information Engineering Dept., Politecnico di Bari, PoliBa, Bari, Italy [email protected]

DEI: Electrical and Information Engineering Dept., Politecnico di Bari, PoliBa, Bari, Italy [email protected]

Abstract — An efficient application has been developed to detect close obstacles using a single smartphone camera and give speech information to vision impaired people. The proposed algorithm uses a simple technique that measures the local estimated image defocussing to roughly estimate the closeness of objects to the camera. Preliminary results of the system under development are presented.

Keywords — Smartphone; Obstacle detection and avoidance; Assisted navigation; Vision impaired

I. INTRODUCTION

Many techniques exist to estimate the distance of objects in photos taken with digital cameras. Most of them deal with the computation of disparity maps from stereo cameras, i.e. they require the same scene to be captured by two different cameras at a known distance from each other [1,2]; only a few techniques need a single camera, but generally the scene has to be captured several times with different focus settings, to apply algorithms such as depth-from-focus and depth-from-defocus [3,4,5,6]. In depth-from-focus an object is selected and the image is captured many times until the object is in focus; assuming that the camera parameters are known, the distance of the object can then be estimated from the equations of optical lenses. In depth-from-defocus the comparison of multiple images is used to calculate the distance of every object in the scene. There has also been a recent attempt to use a single camera to realize stereo vision [7].

In this paper we propose a different approach, aiming not to measure the precise distances of objects, but to highlight the closest objects in the scene and use this information in a simple navigation system for vision impaired people, developed on an Android smartphone. A warning message is raised when the distance of the closest object in the scene falls below a given threshold: the vision impaired user of the application is warned by a voice message about the presence of a possible obstacle.


Figure 1

This work, developed as a practical experience at Politecnico di Bari during the course on Statistical Signal Processing, follows a previous MS thesis that developed an obstacle avoidance system on a smartphone equipped with a stereo camera (FiatLux by Giovanni Sblano: http://www.youtube.com/watch?v=ICS5uYCWXdw). The project is part of a wider research/implementation student project pictorially described in Figure 1. In this paper a simpler but less accurate algorithm is proposed, requiring a single camera. The single-shot requirement, together with the low computational load, should allow continuous use without excessive battery drain. In the next sections the theoretical method and its implementation are presented, followed by some preliminary results. A discussion, future developments and conclusions close the paper.


II. THE PROPOSED METHOD

The proposed method is based on fixing the focus of the smartphone camera to infinity, so that the closer an object in the scene, the higher its defocussing. Defocussing appears on images as a blurring effect. Blurring is a low pass effect acting on sharp luminance transitions: a sharp transition in the image is smoothed, so that by simply computing the image gradient, the smoothing effect can be measured in every image region as the width of the region over which the image changes its values. In one dimension the gradient simplifies into a derivative; implemented numerically, the gradient at an edge can be approximated as the height of the luminance transition around the edge divided by its extension in the x domain, that is:

G(x) = dI(x)/dx

where I(x) is the luminance signal and G(x) represents the gradient function. The underlying idea is to measure the extension along the x direction of the luminance variation. If a finite difference approach is used, Δx, the extent of the transition region, can be computed as |Δx| = |ΔI(x)| / |G(x)|.

As an example, a one-dimensional gray scale image is represented in Figure 2. The intensity of the image, ranging from 0 to 1, is plotted in the graph. It contains a black-to-white transition and a wider white-to-black transition, identified by the edge points A and B respectively. The goal is, for each edge, to measure its focusing with respect to the other. In this example, a contrast measure will detect that point A has a higher contrast value and is therefore more focused than point B.

Figure 2: one-dimensional intensity profile I(x) with a sharp edge A and a wider edge B, of transition widths w(A) and w(B).

However, this will not work with most other intensity profiles, like the one shown in Figure 3. The three edge points A, B and C are equally focused, but B and C are respectively white-to-gray and gray-to-black transitions. A contrast measure gives them half the contrast of A, because the difference between the maximum and the minimum value within the transition is lower.

Figure 3: one-dimensional intensity profile I(x) with three equally focused edges A, B and C, of transition widths w(A), w(B) and w(C).

To compensate for this, the contrast value in each edge point has to be normalized by the difference between the maximum and the minimum values in the surrounding region, that is, approximately by the height of the luminance transition. The result is the normalized contrast for that point, which is closely related to how much the edge is focused. In Figure 3, A is normalized by 1 while B and C are normalized by 0.5, which yields the same normalized contrast for all three points.
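To make these definitions concrete, the following minimal sketch (a synthetic one-dimensional luminance profile; the array values and the window size are illustrative assumptions, not taken from the paper) estimates the transition width and the normalized contrast of a sharp and a blurred edge:

```python
import numpy as np

# Synthetic 1-D luminance profile in [0, 1]: a sharp black-to-white
# edge A at x = 20 and a blurred white-to-black edge B over x = 60..69.
I = np.zeros(100)
I[20:60] = 1.0
I[60:70] = np.linspace(1.0, 0.0, 10)

G = np.gradient(I)  # discrete approximation of G(x) = dI(x)/dx

def edge_measures(I, G, x0, win=5):
    """Return (transition width |dx|, normalized contrast) around x0.
    Assumes an actual edge falls inside the window."""
    lo, hi = max(x0 - win, 0), min(x0 + win + 1, len(I))
    dI = I[lo:hi].max() - I[lo:hi].min()   # local max - min excursion
    g = np.abs(G[lo:hi]).max()             # peak gradient magnitude
    return dI / g, g / dI                  # |dx| = |dI|/|G|, and its inverse

print(edge_measures(I, G, 20))  # sharp edge A: small width, high contrast
print(edge_measures(I, G, 65))  # blurred edge B: large width, low contrast
```

The normalized contrast is thus approximately the reciprocal of the transition width, which is the quantity the algorithm of the next section estimates on every edge point of a real image.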

III. THE PROPOSED ALGORITHM

Normalized contrast, as defined in the previous section, can be used on real images from digital cameras to detect close objects. This requires some knowledge of how the image is taken: in this work, the focus is set to infinity, so that close objects get blurred. The algorithm is tuned to detect all edges and then delete all of them except those that are blurred more than a desired amount. The block diagram of the algorithm is shown in Figure 4. The source image is assumed to be in gray scale, as no color information is used in the algorithm.

Figure 4: block diagram of the proposed algorithm. Source image → 1. Edge extraction → 2. Max − min → 3. Contrast → 4. Masking → 5. Normalization → 6. Threshold → Final image.

To simplify the description, the algorithm is presented hereafter as a sequence of steps. These should not be considered the actual implementation: in this form, the developed software is expensive both in memory and in computational complexity. An efficient version is being developed in C for the Android NDK, as the single camera evolution of the FiatLux project.

Step 1: edges are extracted using the Canny algorithm and saved for later use.

Step 2: morphological filtering. A new image is generated in which each pixel is replaced by the highest-intensity pixel in its surrounding region (dilation). Another image is generated with the same logic, but with each pixel replaced by the lowest-intensity surrounding pixel (erosion). The eroded image is subtracted from the dilated one and the result is saved for later use.

Step 3: a contrast measure is applied to the source image. Specifically, a Laplacian filter is used and the absolute value of the result is computed, so that the intensity of each pixel represents the energy of the image signal at that point. A blur filter is then applied, for two different reasons:
1. the Laplacian computed at edge points will be zero or almost zero, because they are likely to be inflection points; the surrounding pixels have an intensity proportional to the contrast of the edge, so this information needs to be transferred to the edge point itself, and a short-range blur is enough for this;
2. ideally, all the edges at the same distance have the same normalized contrast, so entire objects could easily be separated by applying a threshold; in real images a rather large variance is found and a threshold does not produce a perfect separation; blurring the image at this point helps with this issue, although a larger threshold than in the previous point must then be applied.
To further enhance the contrast, an exponential distortion is applied: the intensity values are raised to the power of two, improving the separation between high-focus and low-focus edges.

Step 4: all pixels of the image from the previous step are set to black, except edge points. The edge image produced in step 1 is used as a mask for this operation. This produces an image in which the only non-zero pixels are edge points, whose intensity quantifies their contrast.

Step 5: the intensity of each pixel of the image produced in step 4 is divided by the intensity of the corresponding pixel in the output of step 2. This yields the normalized contrast of the image, computed on edge points: higher values indicate a better focused edge, lower values a more blurred one. All other pixels remain black.

Step 6: a threshold is applied and all pixels above the threshold value are set to black. This removes the edges which are more focused and leaves only the low-focus edges, which belong to closer objects. The result is the final image, which reveals the presence and position of close objects. The threshold value determines the distance at which objects are cut off from the image.
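As an illustration only, the following minimal sketch implements the six steps with OpenCV in Python. The kernel sizes, the blur radius and the relative threshold are illustrative assumptions, and this is not the efficient C/NDK implementation mentioned above:

```python
import cv2
import numpy as np

def close_obstacle_edges(gray, rel_thresh=0.35):
    """Sketch of steps 1-6: keep only the most defocussed (close) edges.
    All numeric parameters here are illustrative assumptions."""
    # Step 1: edge extraction (Canny), kept as a mask for step 4.
    edges = cv2.Canny(gray, 50, 150)

    # Step 2: local max - min excursion via morphological dilate - erode.
    kernel = np.ones((7, 7), np.uint8)
    excursion = cv2.dilate(gray, kernel) - cv2.erode(gray, kernel)

    # Step 3: contrast measure: |Laplacian|, blurred to transfer the edge
    # energy onto the edge points themselves, then squared to stretch
    # the separation between high-focus and low-focus edges.
    contrast = np.abs(cv2.Laplacian(gray.astype(np.float32), cv2.CV_32F))
    contrast = cv2.GaussianBlur(contrast, (7, 7), 0) ** 2

    # Step 4: masking: keep contrast values on edge points only.
    contrast[edges == 0] = 0.0

    # Step 5: normalization by the local excursion of step 2.
    norm = contrast / np.maximum(excursion.astype(np.float32), 1.0)

    # Step 6: relative threshold: discard well focused (far) edges,
    # keep the blurred (close) ones.
    norm /= norm.max() + 1e-6
    out = np.zeros_like(edges)
    out[(edges > 0) & (norm < rel_thresh)] = 255
    return out

# Usage sketch:
# gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
# mask = close_obstacle_edges(gray)
```

The relative threshold here plays the role of the final thresholding value whose tuning is discussed in Section V.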

IV. PRELIMINARY RESULTS

The proposed algorithm has been applied to some test images and the results are presented in Figure 5, Figure 6 and Figure 7. The images were taken with the focus set to infinity, as assumed for processing. The processed images should show white details on a black background, but in this paper they have been inverted for better readability.

Figure 5

Figure 6

Figure 7

The proposed algorithm successfully detects and highlights close objects up to a limited distance: respectively a bench in Figure 5 and a car in Figure 6 and Figure 7. Some random details are detected in the remaining part of each picture, but they reveal almost none of the structure of the original image. To develop an easy video-to-speech interface describing the closeness of obstacles, the image is split into nine regions and a reasoning system is used to indicate the image regions affected by the presence of obstacles, depending on the number of contours present in each subregion, as shown in Figure 8, Figure 9 and Figure 10; the nine regions are marked as active or inactive and different audio messages are proposed to the vision impaired user for obstacle avoidance.

Figure 8

Figure 9

Figure 10
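A minimal sketch of this reasoning step follows; the 3×3 split matches the description above, while the per-region pixel-count threshold and the region names are illustrative assumptions:

```python
import numpy as np

REGION_NAMES = ["upper left", "upper center", "upper right",
                "middle left", "center", "middle right",
                "lower left", "lower center", "lower right"]

def active_regions(obstacle_mask, min_pixels=200):
    """Split the final edge image into a 3x3 grid and mark as active
    the regions containing enough obstacle contour pixels."""
    h, w = obstacle_mask.shape
    active = []
    for r in range(3):
        for c in range(3):
            cell = obstacle_mask[r * h // 3:(r + 1) * h // 3,
                                 c * w // 3:(c + 1) * w // 3]
            if np.count_nonzero(cell) >= min_pixels:
                active.append(REGION_NAMES[3 * r + c])
    return active

# Each active region would then trigger a different spoken warning,
# e.g. "obstacle ahead, lower center", through the phone's text-to-speech.
```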


V. DISCUSSION

Several issues were faced during the development of the algorithm:

• the effectiveness of the algorithm varies greatly with the characteristics of the digital camera; better results are obtained when close objects become very blurred; unfortunately, most mobile phone cameras have a short focal length and a large depth of field, so the defocussing effect is small and/or only appears on very close objects;

• the effectiveness of the algorithm also varies with image resolution; the best results have been obtained with images between 640 and 1024 pixels wide; in very large images most edges appear defocussed and no useful information can be extracted, but resizing them to the suggested range solves the problem; in very small images very few edges are detected in the first place; ideally, images should be captured at a resolution within the suggested range;

• some parameters still need manual tuning to give similar results on different images; the most important is the final threshold value, which is quite sensitive to variations around its optimal value: a lower value rapidly hides many interesting parts of the image, while a higher value shows many unwanted details;

• shadows are detected as edges, but they do not constitute real object boundaries and should be excluded; this can be achieved using the color information of the image: a shadow produces a variation in intensity but not in color, so it can be detected and excluded from the computation (a sketch follows this list).
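As a sketch of this shadow rejection, one possible realization (the HSV decomposition and both gradient thresholds are assumptions, not the paper's implementation) flags edge pixels where the intensity varies but the color does not:

```python
import cv2
import numpy as np

def shadow_edge_mask(bgr, val_grad_thresh=40.0, sat_grad_thresh=10.0):
    """Flag pixels where intensity changes but color stays flat:
    likely shadow boundaries, to be excluded from the edge computation.
    Both thresholds are illustrative assumptions."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    sat = hsv[:, :, 1].astype(np.float32)   # color (saturation) channel
    val = hsv[:, :, 2].astype(np.float32)   # intensity channel

    # Gradient magnitudes of intensity and color.
    val_grad = cv2.magnitude(cv2.Sobel(val, cv2.CV_32F, 1, 0),
                             cv2.Sobel(val, cv2.CV_32F, 0, 1))
    sat_grad = cv2.magnitude(cv2.Sobel(sat, cv2.CV_32F, 1, 0),
                             cv2.Sobel(sat, cv2.CV_32F, 0, 1))

    # Strong intensity edge with an almost flat color channel -> shadow.
    return (val_grad > val_grad_thresh) & (sat_grad < sat_grad_thresh)
```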

VI. CONCLUSIONS AND FUTURE WORK

The proposed algorithm shows the ability to separate close objects from far ones, as required. Even though its effectiveness is highly dependent on the camera of the mobile phone used to capture the image, an easy calibration procedure will be implemented to set up the processing parameters on first use of the application. For mobile phone cameras with a large depth of field, some preprocessing may also be needed to enhance the defocussing effects and allow high accuracy in obstacle detection.


REFERENCES


[1] J. Mrovlje, D. Vrančič, "Distance measuring based on stereoscopic pictures," Institut Jožef Stefan, 2008.
[2] J. Carnicelli, "Stereo vision: measuring object distance using pixel offset," http://www.alexandria.nu/ai/blog/entry.asp?E=32.
[3] Y. Xiong, S. Shafer, "Depth from Focusing and Defocusing," Robotics Institute, Carnegie Mellon University, 1993.
[4] V. M. Bove, "Entropy-based depth from focus," Massachusetts Institute of Technology, 1993.
[5] J. Ens, P. Lawrence, "An Investigation of Methods for Determining Depth from Focus," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 2, 1993.
[6] M. Subbarao, J.-K. Tyan, "Selecting the Optimal Focus Measure for Autofocusing and Depth-From-Focus," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, August 1998.
[7] C. Holzmann, M. Hochgatterer, "Measuring Distance with Mobile Phones Using Single-Camera Stereo Vision," 32nd International Conference on Distributed Computing Systems Workshops, 2012.
