So I’m in an embedded systems class. We have a bunch of labs, but as an alternative to the labs we can do a semester project, programming a Freescale car. This is lots of fun.
Here I will document the camera algorithm I developed, and the process I took to get there.
The Freescale Cup Car is a small autonomous car about a foot long which is supposed to follow a racetrack. It follows this racetrack using a 128 pixel analog line camera. That is, it’s a camera that only sees a single line. On our car (I’m on a team), the camera is mounted 12″ above the ground at the front bumper of the car, angled so that it sees about 2 feet ahead.
This is all fine and dandy. Except that the camera’s vision dims at the edges:
This graph shows a calibration frame, which is the camera looking at a blank white piece of paper, alongside a frame showing two lines with regulation spacing.
The conventional wisdom for parsing these camera frames is to take the derivative and search for peaks and zero crossings. Let’s do that:
Okay. Not too shabby. There’s clearly something going on. As a human we can see a line on the left and a line on the right.
How to get a computer to detect the lines? A dark line happens when the derivative goes down, and then goes up. So maybe we could use a threshold for the start of the dark line and the end of the dark line.
It’s clear that this won’t work very well in this case. A negative threshold for darkening won’t be able to catch the left-hand line without catching a substantial portion of the rest of the frame. A positive threshold for the frame getting lighter (lightening?) won’t be able to catch the right-hand line without catching a substantial portion of the rest of the frame.
A second derivative won’t work, either, because the noise spike in the middle is about the same height as the spike on the left. A little bit of smoothing might help, but too much and we smooth away the lines on the left or right.
An astute eye would notice a general slant from the upper left to the lower right in the first derivative. If we found a best-fit line and subtracted that from this sample, then we could more easily identify the lines.
There’s two ways to find the amount of this slant: Manual tuning and least-squares of some sort.
Manual tuning takes valuable developer time to get it right, and it will vary from track to track and arena to arena. The lighting in my house is different than the lighting outside which is different than the lighting in a gym, so we have to re-calibrate the car at every event.
Least-squares takes valuable computation time, but we get proportional gain (and later we’ll do parabolic least-squares, and get even more gain). If you constrain the best-fit line to pass through zero halfway through the frame, then all you care about is the slope, and it becomes an averaging problem. Averaging takes N integer adds and an integer division, but it turns out that this integer division is by N, which is a power of 2 – this is a very fast operation.
Both of these, however, will either take up N space to store the value of the line at each pixel, or will require an additional N integer multiplies to remove the trend from the data.
But we still have one more problem, as the title of this post suggests. This method is highly dependent on the focus of the camera. If the camera is out of focus, then the derivatives will be smaller, and the thresholds more subject to missing the line.
Notice how the first derivative has a straight trendline. Therefore, the second derivative (the slope of the slope) will be constant. We all know what that means – it’s a parabola.
Enter parabolic least-squared fits to the camera frame.
That sounds expensive, and it is, but it’s less susceptible to camera focus issues.
If you are able to calibrate the camera on a white background, then you can either remember that frame and subtract it from incoming frames or you can find the best fit parabola and subtract that from incoming frames. This is the results of that:
Two clearly identifiable peaks, both orders of magnitude above the noise floor.
This method still requires on-site calibration, though, which we want to avoid as much as possible. But we have a data source that looks a lot like the calibration data – the frame itself. If we fit a parabola to the frame, and then subtract that from the frame, we get something like:
A strangely parabolic noise in the center, caused by the lines on the edges throwing off the least-squares. This data smooths well, and then a peak-finding algorithm can be run to find the lines reliably.
The astute reader will notice that generalized linear least-squares is not a cheap algorithm. In fact, it requires 3*128 integer multiplies plus 3*128 integer adds plus a half-dozen floating-point multiplies and divides. Then to evaluate the parabola at every point, we have 128 * 4 multiplies and a similar number of additions.
This takes a while – we read a camera frame every 20ms, and this process takes up about 6-7ms or so. But we have the time to spare, and this is the most crucial part of the project, and now the camera doesn’t have to be re-calibrated under different light sources.
For reference, we’re using the KL25Z chipset:
Stay tuned for more car-related algorithms!