Debugging SIFT / RANSAC-based alignment

Another day, another (set of) bug(s). I'm taking a bunch of different RGB-D images, blowing them out into 3D space (instead of using the flat image), and stitching them together. Sort of like a 3D panorama.

The stitching pipeline involves a couple of steps, but I'm not happy with the first step. Here, we take a stream of images captured at times 1, 2, ..., t, find the relative transformation between frames 1 and 2, 2 and 3, and so on, and tack each successive frame onto the world we're building.
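
To make the bookkeeping concrete, here's a minimal sketch in Python/NumPy (not the actual MATLAB or CUDA code; the function names are mine) of what "tacking each frame onto the world" means when each pairwise estimate maps frame i+1 into frame i's coordinates:

```python
import numpy as np

def accumulate_poses(relative_transforms):
    """Compose 4x4 frame-to-frame transforms into absolute (world) poses."""
    pose = np.eye(4)                     # frame 1 defines the world coordinates
    poses = [pose]
    for T_rel in relative_transforms:    # T_rel maps frame i+1 into frame i
        pose = pose @ T_rel              # world <- frame i <- frame i+1
        poses.append(pose)
    return poses

def transform_points(T, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of 3D points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]
```

Every frame's points get pushed through its accumulated pose into one big cloud, so any error in a pairwise estimate drags all the later frames along with it.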

There are a bunch of optimizations later on, so this first image is a decent starting place (yes, mom, I know it looks bad).

This point cloud (you can tell because there are places that look like lots of dots) was produced by a MATLAB implementation, which typically means slow. We have a much faster version that's been CUDAfied (something I'll probably talk about in the future), but it's not getting great results.

In this image, you can see the ceiling is really all over the place. Not good! I'm writing this as I'm debugging, so no clue what the problem is yet. First step: isolate the problem. There are a few steps in this part of the pipeline:

  • Find key features in each image (that's SIFT, which we've seen before)
  • Find matching features between adjacent frames
  • Identify which set of matching features gives a transformation that best explains all the other matches (via RANSAC; my very brief layperson's explanation follows, with a code sketch after this list)
    • Take a set of matching features, compute the expected transformation required to get from the coordinates of the features in one frame to the features in the next
    • Check how many other pairs of matched features can be explained by this transformation
    • Repeat a bunch of times until we're satisfied we have a good transformation
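
For concreteness, here's a minimal sketch of that loop in Python/NumPy for 3D-3D matches (neither the MATLAB nor the CUDA implementation looks like this; the sample size, iteration count, and inlier threshold are placeholder values):

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rotation + translation mapping src points onto dst points,
    via the standard SVD (Kabsch) solution."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:             # repair a reflection if the SVD produced one
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = dst_c - R @ src_c
    return T

def ransac_rigid(src, dst, iters=1000, inlier_thresh=0.02, sample_size=3):
    """Sample a few matches, fit a transform, count how many other matches it
    explains, and keep the best transform found."""
    rng = np.random.default_rng(0)
    best_T, best_inliers = np.eye(4), 0
    for _ in range(iters):
        idx = rng.choice(len(src), size=sample_size, replace=False)
        T = fit_rigid_transform(src[idx], dst[idx])
        pred = src @ T[:3, :3].T + T[:3, 3]
        errors = np.linalg.norm(pred - dst, axis=1)
        inliers = int((errors < inlier_thresh).sum())
        if inliers > best_inliers:
            best_T, best_inliers = T, inliers
    return best_T, best_inliers
```

Keep an eye on iters and inlier_thresh; both of those knobs come back later.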

There are a zillion knobs to turn, so here goes nothing. First up, take the SIFT features (i.e., the first bullet) computed by the MATLAB implementation and see what they look like when run through the rest of the CUDA pipeline.

...And that looks even worse! SIFT, why do you forsake me?!? Technically, I should figure out what's going on with this SIFT stuff first, but I'm kind of curious to see if later parts of the pipeline are affected. In retrospect, I should've gone from the end of the pipeline to the front (i.e., start with the final transformations that MATLAB produces and just make sure the CUDA code that transforms point clouds is working...), but oh well. Lesson learned for next time.

Ok, but not great either. This is what I get if I use the matched key points that MATLAB produces (i.e., the second bullet point). The CUDA code runs RANSAC on these points and figures out the final transformations. Something is still wrong. So I'm jumping to the end of the pipeline to make sure there's nothing wrong there.

Derp derp. This is really wrong, but probably because I messed something up when converting the MATLAB transformations for the CUDA code. The results from before would've been way worse if this step were this wrong...

And yes, I messed up by accidentally passing the absolute transformations for each frame (i.e., the transformation of a given frame relative to the first frame, not the one immediately preceding it). The good news is that the result looks good and there doesn't seem to be a bug in this last step of the pipeline. But the last step is also by far the easiest to debug. Sigh.
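
For the record, here's the relationship between the two (a NumPy sketch with my own naming): an absolute pose maps a frame into frame 1's coordinates, and the relative transform the last stage actually wants comes from composing each absolute pose with the inverse of the previous one.

```python
import numpy as np

def absolute_to_relative(absolute_poses):
    """Given T_(world <- frame i) for every frame, recover T_(frame i <- frame i+1)."""
    return [np.linalg.inv(prev) @ curr
            for prev, curr in zip(absolute_poses, absolute_poses[1:])]
```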

Ok, found a bug in the CUDA RANSAC code. Basically, the CPU and GPU code calculated different numbers of iterations for the RANSAC step, so the data prepared by the CPU was not interpreted correctly by the GPU. If I use everything up to the RANSAC step from MATLAB, I get a better result, although still not as good as if I directly use the transforms from MATLAB (see image above).
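
I haven't checked exactly how either side computes its iteration count, but the textbook estimate looks something like the sketch below; if the host and device plug in different inlier-ratio guesses or round differently, the per-iteration data the CPU lays out won't line up with what the GPU kernel reads.

```python
import math

def ransac_iterations(confidence=0.99, inlier_ratio=0.5, sample_size=3):
    """How many random samples are needed so that, with probability
    confidence, at least one sample contains only inliers."""
    return math.ceil(math.log(1 - confidence) /
                     math.log(1 - inlier_ratio ** sample_size))
```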

Woo yeah! Resolved the RANSAC issue. Turns out there was a second problem: I was using too lax a threshold for acceptable errors (i.e., which points 'count' toward the consensus). Discipline is important...
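
To illustrate the failure mode with made-up numbers: with a sloppy threshold nearly every match "counts", so a mediocre transform can rack up as much consensus as a good one.

```python
import numpy as np

# Hypothetical residuals (in metres) between matched points after applying a candidate transform.
errors = np.array([0.004, 0.008, 0.012, 0.09, 0.15, 0.40])
print(int((errors < 0.50).sum()))   # lax threshold: 6 "inliers", tells us nothing
print(int((errors < 0.02).sum()))   # tight threshold: 3 inliers, actually discriminative
```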

And one more piece of good news. I think the remaining issue is how we match key features between two images (i.e., how do we know two points in the images are the same point in real life?). Why? Because when I use the SIFT features generated by the CUDA code versus the SIFT features generated by the MATLAB code, I get almost identical results. (Btw, you can click on images to make them bigger.)

Here's with the CUDA SIFT key points:

And here's with the MATLAB SIFT key points:
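
For reference, the kind of matching I'm talking about is roughly this (a brute-force sketch with Lowe's ratio test; neither implementation literally does it this way, and the ratio is exactly the sort of knob that differs between them):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return index pairs (i, j) of putative matches between two descriptor sets.
    A match is kept only if its best distance clearly beats the second best."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches
```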

After fiddling around with parameters (yeah, dirty, I know), we get a much better result. Still not quite as good as MATLAB, but good enough for now. The MATLAB and CUDA implementations differ a lot in how they implement key point matching, so they're hard to compare. An examination for another day. Mischief managed.