Detecting Obstacles Part 2: Homogenizing sensor inputs

So let's assume we've gotten the white lines out of an image to a sufficient degree. Detecting surface features like lines is only one part of detecting obstacles; we still haven't dealt with cones, sawhorses, or other objects. For those, our robot has a LIDAR. LIDAR is essentially "light radar": it shoots out a beam of light and measures the time it took and the spectrum reflected back to get the distance and intensity of any object it hits. This is great for telling us whether something is in front of us and where it is relative to the robot, but it doesn't help with finding surface features like lines or flat color patches. We only care about detecting objects around our robot so we can avoid them later, but we need some way to combine all of our inputs into one single form of object detection, because later we are going to feed this into another map-making module.

To do this, we have to transform the data we've gotten from each of our sources into the same coordinate plane. This is typically done with a transform such as a homography transform or an affine transform; for our system we will be using homography transforms.

A homography transform is essentially a matrix that maps between two sets of planar coordinates (the linked example shows this technique applied really well): you "project" the first set of coordinates onto the second. OpenCV has a findHomography() function that computes the homography matrix from two sets of corresponding points. You then take this matrix, multiply it by every homogeneous coordinate vector [x, y, 1] you want projected, and divide the result by its third component. This projects your image onto the new set of coordinates.
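As a rough sketch of that workflow in Python with OpenCV (the pixel and meter values below are made up purely for illustration):

```python
import cv2
import numpy as np

# Four corresponding points in each coordinate system; at least four
# are needed.  These particular pixel and meter values are placeholders.
src_pts = np.array([[0, 0], [639, 0], [639, 479], [0, 479]], dtype=np.float32)
dst_pts = np.array([[-1.0, 3.0], [1.0, 3.0], [0.5, 1.0], [-0.5, 1.0]], dtype=np.float32)

# findHomography() returns the 3x3 matrix H plus an inlier mask.
H, mask = cv2.findHomography(src_pts, dst_pts)

# Project one pixel by hand: multiply H by [x, y, 1], then divide by
# the third component of the result (the perspective divide).
p = H @ np.array([320.0, 400.0, 1.0])
p /= p[2]
print("projected point:", p[:2])

# perspectiveTransform() does the same thing for a whole array of points.
pts = np.array([[[320.0, 400.0]], [[100.0, 450.0]]], dtype=np.float32)
projected = cv2.perspectiveTransform(pts, H)
```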

On our robot we use this to project the camera's pixel coordinates into the same coordinates our LIDAR uses when measuring distance; in our case that's a mapping from pixels to meters. We find out where the corners of the image touch the ground in meters, then project our line image there (this assumes the camera is pointed directly at the ground; otherwise you have to pick a set of points between the horizon and the bottom of the image). Because we are only looking for objects to avoid, it doesn't matter to us whether the lines appear at ground level or in the sky; we just need to be able to avoid them.
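A minimal sketch of that projection, assuming we've already measured where four image points touch the ground in the LIDAR's frame (the correspondences below are placeholders, not our actual calibration):

```python
import cv2
import numpy as np

# Hypothetical calibration: pixel locations and where those pixels touch
# the ground in meters in the LIDAR's frame (x forward, y left here).
# These numbers are placeholders, not a real measurement.
img_pts    = np.array([[0, 479], [639, 479], [520, 260], [120, 260]], dtype=np.float32)
ground_pts = np.array([[0.8, 0.6], [0.8, -0.6], [6.0, -2.5], [6.0, 2.5]], dtype=np.float32)

H, _ = cv2.findHomography(img_pts, ground_pts)

def line_pixels_to_meters(line_mask, H):
    """Project the nonzero pixels of a binary line image into
    ground-plane coordinates (meters) using the homography H."""
    ys, xs = np.nonzero(line_mask)
    if len(xs) == 0:
        return np.empty((0, 2), dtype=np.float32)
    pixels = np.stack([xs, ys], axis=1).astype(np.float32).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pixels, H).reshape(-1, 2)
```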

Our LIDAR data is not in a square (Cartesian) mapping; each reading is a distance and an angle from the sensor. We need to translate our projected camera points into data that corresponds to this. If we know where the LIDAR sits relative to the points in our new projected coordinate system, we can compute the distance to every point using the "two norm", i.e. the classic distance formula. Once we've done this with every camera image we wish to translate into LIDAR terms, we then have to compress the data into a one-dimensional array whose fidelity matches the LIDAR's (angular resolution, maximum distance, etc.). We do this by angle binning the new points (putting each point in the bin of the angle it rounds to), so that each piece of data corresponds to what the LIDAR would actually see had there been a real object there. Within each bin we take the closest point and use that in the LIDAR's final range array.
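Here is a sketch of that binning step, assuming the points are already in meters in the LIDAR's frame; the angular limits, resolution, and maximum range below are placeholders for the real sensor's specs:

```python
import numpy as np

def points_to_scan(points_xy, angle_min=-np.pi / 2, angle_max=np.pi / 2,
                   angle_increment=np.deg2rad(0.5), max_range=10.0):
    """Compress ground-plane points (meters, LIDAR frame, x forward /
    y left) into a 1D range array with one slot per angle bin.  The
    angular limits, resolution, and max range are placeholder specs."""
    n_bins = int(round((angle_max - angle_min) / angle_increment)) + 1
    ranges = np.full(n_bins, np.inf)

    for x, y in points_xy:
        r = np.hypot(x, y)                 # "two norm" / distance formula
        theta = np.arctan2(y, x)           # angle relative to the LIDAR
        if r > max_range or not (angle_min <= theta <= angle_max):
            continue
        b = int(round((theta - angle_min) / angle_increment))
        ranges[b] = min(ranges[b], r)      # keep only the closest hit per bin

    return ranges
```

One simple way to then merge this with the real LIDAR scan is an element-wise minimum of the two range arrays (np.minimum), so each angle bin keeps whichever sensor saw the nearer obstacle.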

Once we've fully translated all of our data into a single array, we can feed it into our gmapping service, which takes point data and turns it into a map. This will be discussed in Creating a History of the Environment.