Same sequence, but with a different cost function in the optimization process. Previously, we optimized the relative rotations of all the camera poses, then the relative translations, and finally all the rotations and translations simultaneously. The residual vector in this scenario has length six: three parameters each for rotation and translation. Once we remove the last step (optimizing rotation and translation simultaneously), we get a better result. Still bad, but better.
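To make the six-element residual concrete, here is a minimal sketch of how one relative-pose term could be computed. It is not the code used for these results: the function names, the axis-angle parameterization of the rotation error, and the pose convention (camera-to-world poses, with the measurement giving camera j's pose in camera i's frame) are all assumptions made for illustration.

```python
import numpy as np

def rotvec_to_matrix(r):
    """Rodrigues formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def matrix_to_rotvec(R):
    """Inverse of rotvec_to_matrix, valid for rotation angles below pi."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * axis

def relative_pose_residual(R_i, t_i, R_j, t_j, R_ij, t_ij):
    """Six-element residual for one edge (hypothetical helper).

    Assumed convention: (R_i, t_i) and (R_j, t_j) are camera-to-world poses,
    and (R_ij, t_ij) is the measured pose of camera j in camera i's frame.
    """
    R_pred = R_i.T @ R_j              # predicted relative rotation
    t_pred = R_i.T @ (t_j - t_i)      # predicted relative translation
    rot_err = matrix_to_rotvec(R_ij.T @ R_pred)   # 3 rotation parameters
    trans_err = t_pred - t_ij                     # 3 translation parameters
    return np.concatenate([rot_err, trans_err])
```

Under this layout, the staged schedule described above would amount to holding translations fixed while minimizing the first three elements, then holding rotations fixed while minimizing the last three.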
There is some more improvement if I weight the relative poses between consecutive frames at one-tenth the cost of the relative poses between two loop-closing frames. In other words, I really want to make sure the optimization satisfies the loop closures.
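The weighting above can be sketched as follows. This is an illustrative assumption about how the one-tenth factor enters: it is taken to apply to the squared-error cost, so each residual is scaled by the square root of the weight. The constant and function names are hypothetical.

```python
import math

# Assumed weights: consecutive-frame edges cost one-tenth of loop-closure edges.
CONSECUTIVE_WEIGHT = 0.1
LOOP_CLOSURE_WEIGHT = 1.0

def weighted_residual(residual, is_loop_closure):
    """Scale a residual so that its squared norm is multiplied by the edge weight."""
    weight = LOOP_CLOSURE_WEIGHT if is_loop_closure else CONSECUTIVE_WEIGHT
    scale = math.sqrt(weight)  # a cost weight w corresponds to a residual scale sqrt(w)
    return [scale * r for r in residual]

def cost(residual):
    """Standard least-squares cost: half the squared norm of the residual."""
    return 0.5 * sum(r * r for r in residual)
```

With this scaling, the same pose error costs ten times more on a loop-closure edge than on a consecutive-frame edge, so the solver gives up consecutive-frame agreement first when the two conflict.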
Finally! Some success. It turns out I needed an additional constraint, because I was letting the camera twist freely (i.e., imagine rotating the resulting image) during the optimization. Adding a constraint to control for this gives a much better result. There were two other minor changes. First, we weight the time-based alignment much more heavily. Second, we fixed the cost function so that it changes coordinate systems correctly.
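One possible way to penalize camera twist is sketched below. It is not necessarily the constraint used here. It assumes the optical axis is the camera's z axis and measures the twist component of a rotation about that axis (the twist part of a swing-twist decomposition), which can then be added as an extra residual. All names are hypothetical.

```python
import math

def rot_z(angle):
    """Rotation by `angle` radians about the z axis (assumed optical axis)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(angle):
    """Rotation by `angle` radians about the x axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def twist_angle_about_z(R):
    """Twist of rotation matrix R about z, valid for rotation angles below pi.

    Uses the quaternion relations q_w ~ 1 + trace(R) and q_z ~ R[1][0] - R[0][1],
    so the twist angle is 2 * atan2(q_z, q_w).
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    return 2.0 * math.atan2(R[1][0] - R[0][1], 1.0 + trace)

def twist_residual(R, weight=1.0):
    """Extra scalar residual that grows with the twist about the optical axis."""
    return math.sqrt(weight) * twist_angle_about_z(R)
```

Here `R` would be the camera's rotation relative to some reference orientation. A rotation purely about z is penalized in full, while a tilt purely about x contributes nothing.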
Splitting the pose graph optimization and bundle adjustment into separate residual blocks made the optimization much faster, but the results aren't any different. Splitting the two into different optimization problems (pose graph first) was even faster and improved the loop closing (see the chairs at bottom left), but made the time-based portion (the top wall) worse.
Now it looks like there is one break, where we have a duplicate of the top left-hand corner of the room (it looks messier than that, but only because there are a lot of ducts and light fixtures).