The Final Bug, or, Please please please please please work this time!

I’ve been banging my head against this problem, on and off, for a couple months now. It’s an EP-like algorithm that a collaborator and I came up with for integrating external aggregate data into a Bayesian analysis. My colleague tried a simpler version on an example and it worked fine, then I’ve been playing around with a multivariate version and . . . it kinda works. At one point it was working ok, and then in writing up the algorithm I noticed some places where it could be improved, and then I did the improvements, and it was failing. I was getting extreme importance ratios and degenerate covariance matrices. Then I realized my algorithm wasn’t quite right, I was using the wrong factor in my EP computation so that it would not converge to what I wanted. So I fixed that. Then more problems. Etc etc. I tried going back to the simple version of the algorithm but it ran really poorly in my example. At this point I don’t know what I’m doing, I start playing desperately with the algorithm, pulling factors in and out of the importance weights, changing the distribution from which I was drawing the initial approx, etc etc etc. Can’t get it to work. Even when it doesn’t crash, I’m getting simulation efficiencies approaching zero.

Then I look at the code one more time. Damn! I was passing the arguments in the wrong order to my R function. OK, that’s the bug. I run it one more time . . . No! I was confused, the order of my arguments was just fine. So I’m still in the thick of it. Ugh.

P.S. On reading the comments I see there’s some confusion here. The problem is not simply: “I want to do X, I wrote code to do X, but it’s not working.” The problem is I’m doing research, I have a sense that this algorithm should work, but there could be something I’m missing. Bugs in the code are interacting with my incomplete understanding of the method that my colleagues and I are developing.

Also, I wrote the above post a couple of months ago. Since then we’ve made progress and I hope to post the paper soon.

25 thoughts on “The Final Bug, or, Please please please please please work this time!”

  1. Anyone who codes has been there with you! Right now I have the opposite problem — getting really good results from something that I’m pretty sure can’t be correct. That’s also frustrating!

  2. I, too, have often found myself in this situation. Here’s my advice:

    1. It’s hopeless to keep re-modifying the code you’ve got. Scrap it all and start over from scratch.

    2. The same thing is sort of true for what’s in your head about it. Leave it alone for a while–don’t think about it at all. And then come back to it with a fresh viewpoint.

  3. Depending on how interdependent the whole program is you could think about using unit tests. Break the problem down into bits you’re pretty sure you understand and test that you can get the expected sets of outputs from a given set of inputs for that component.
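A minimal sketch of the idea in comment 3, in Python rather than the R used in the post. The component here, `weighted_mean`, is a hypothetical stand-in for one small piece of an importance-sampling code, not anything from the actual algorithm. Each test pins down one input/output relationship that must hold whatever the surrounding method does.

```python
def weighted_mean(values, weights):
    """Weighted average of values; weights need not be normalized."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def test_equal_weights_give_plain_mean():
    assert abs(weighted_mean([1.0, 2.0, 6.0], [1.0, 1.0, 1.0]) - 3.0) < 1e-12


def test_rescaling_weights_changes_nothing():
    values = [1.0, 2.0, 6.0]
    a = weighted_mean(values, [0.2, 0.3, 0.5])
    b = weighted_mean(values, [2.0, 3.0, 5.0])
    assert abs(a - b) < 1e-12


def test_all_weight_on_one_point():
    assert weighted_mean([1.0, 2.0, 6.0], [0.0, 1.0, 0.0]) == 2.0


# Run the checks directly; a test runner such as pytest would also collect them.
test_equal_weights_give_plain_mean()
test_rescaling_weights_changes_nothing()
test_all_weight_on_one_point()
```

Tests like these say nothing about whether the overall method is right, only that this one block does what its name claims.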

  4. Use named arguments?

    Get a third set of eyes to look at the code? Sometimes one doesn’t notice the obvious but another person quickly can.

    Why not post the actual code on here?
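To illustrate the named-arguments suggestion, here is a small sketch in Python (R also lets you pass arguments by name). The function `log_normal_density` and its parameters are hypothetical, chosen only because swapping a mean and a standard deviation is an easy mistake to make and a hard one to see.

```python
import math


def log_normal_density(x, *, mean, sd):
    """Log density of Normal(mean, sd) at x.

    The bare * makes mean and sd keyword-only, so they cannot be
    passed positionally and therefore cannot be swapped silently.
    """
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


# With names attached, argument order no longer matters.
a = log_normal_density(1.0, mean=0.0, sd=2.0)
b = log_normal_density(1.0, sd=2.0, mean=0.0)
assert a == b

# A positional call such as log_normal_density(1.0, 0.0, 2.0) raises TypeError.
```

This would not have caught a bug in the method itself, but it does rule out one whole class of "was it the argument order?" doubts.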

  5. So it’s not only me who does that! When the problem just doesn’t want to be fixed I’ve found that it’s usually some little thing I’ve assumed must be right and that has nothing to do with the functionality I’m focussed on. Other people often spot it immediately.

    I’m a daily reader and really appreciate this blog. Your take-downs of all the PR bait that journals and departments put out are a tonic.

  6. Paraphrasing Tomas Lozano-Perez, “When you find yourself searching the space of Turing machines, you know you are in trouble, because there are an awful lot of Turing machines.”

  7. As you say in the PS, it’s not just that you have difficulty with the code, it’s that you can’t separate difficulty with the code from difficulty with the new method… So, unit testing the code is fine, but in the same way, a test-problem where you can work out the correct answers in some other way is also key.
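One way to picture the kind of test problem comment 7 has in mind: a toy importance-sampling run where the answer is known in advance. This is only a sketch under assumed choices, not the EP-like algorithm from the post. The target Normal(1, 1), the proposal Normal(0, 2), and the function names are all made up for illustration. The effective sample size is the usual (sum of weights)^2 / (sum of squared weights) diagnostic for the "simulation efficiency" the post mentions.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of Normal(mean, sd) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def importance_estimate(n, seed=0):
    """Estimate the mean of Normal(1, 1) using draws from Normal(0, 2).

    Returns (estimate, effective_sample_size). The true mean is 1,
    so the estimate can be checked against a known answer.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 2.0) for _ in range(n)]
    log_w = [
        log_normal_pdf(x, mean=1.0, sd=1.0) - log_normal_pdf(x, mean=0.0, sd=2.0)
        for x in draws
    ]
    # Subtract the max log weight before exponentiating, for numerical stability.
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    estimate = sum(wi * xi for wi, xi in zip(w, draws)) / total
    ess = total ** 2 / sum(wi ** 2 for wi in w)
    return estimate, ess


estimate, ess = importance_estimate(20000)
print(f"estimate = {estimate:.3f}, ESS fraction = {ess / 20000:.2f}")
```

If the code cannot recover a mean of about 1 with a healthy effective sample size on a problem this easy, the trouble is in the code. If it can, and the real problem still fails, the method itself becomes the more likely suspect.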

  8. RE your PS: After Rahul’s suggestions (name arguments, get second opinion), do what Jon M suggests for exactly the reasons Daniel Lakeland lays out.

    I find it’s helpful to approach problems top-down for design and bottom-up for coding. The key is to code in thoroughly testable units, building up functionality like a stack of blocks. Not doing it is hubris, and we all know how that ends. In addition, it can actually be much much faster to code this way, because it makes each layer easy to read and gives you the necessary confidence to keep building without continually doubting your infrastructure or looking there for bugs.

    I also keep meaning to write a whole post on research coding vs. production coding. The rules are definitely different, but not perhaps in the way most people might think. For research code, you don’t need to worry about getting your code working in multiple versions of R on different compilers and operating systems. You don’t need to worry about efficiency. You don’t need to worry about external-facing doc. But you do need to worry about testing. When you’re working on plugging components into and out of the research code, you need to trust them.

    • Would love that post. Especially about unit testing.

      It sounds so good in theory but I’ve never managed to use it on a practical problem. The overhead of writing the unit tests etc. just seems too large for research code. Maybe I’ve never worked on a project big enough or important enough.

        • I couldn’t find anywhere to comment on that post. I think you’re right that academic code is different than production code. It has a different purpose and therefore a different functional specification. One of the big differences is robustness to various inputs and portability and scalability. Even corner cases like no data, etc., don’t need to be implemented.

          Now, once you have your functional spec for research code, which is basically that it gets the analysis right, that’s what you want to test. So it’s not no tests, but different tests.

          I think the goal should be bug-free analyses and was rather surprised at the suggestion from Jeremy Foote’s post: “Instead of eliminating bugs, readers can accept the reality of bugs when assessing arguments based on academic code.” I don’t think I misinterpreted the recommendation, because the conclusion goes on to state “…even if researchers do more tests, bugs will still exist and that’s OK. As readers, we should just take the results of computational research projects with a grain of salt.”

        • Bob, thanks for the response.

          It sounds like it wasn’t super clear. I agree that the goal should be bug-free analysis, and think that researchers should write much, much better and more inclusive tests. My point was that the problem of bugs can also be attacked from the interpretation side.

          We live in a world in which many researchers are not writing even unit tests, much less integration tests, etc. I was trying to point out that this reality has practical consequences, which should include trusting the results of computational research a little bit less.

        • I was definitely confused by your urging people to squash bugs and then telling people they’re going to have to live with them, but now that you explain, it makes a lot more sense. Thanks for the clarification. And believe me, I do take the results of most research (computational, statistical, mathematical, whatever) with a grain of salt.

  9. I ran into something similar in my thesis doing importance sampling – the Mathematica code I was using did not make it clear (to me) whether it was marginalizing over rows or columns, and for the initial suite of test problems doing it incorrectly still provided a good approximation. In fact the wrong marginalizing appeared to give more realistic results, as the correct marginalizing seemed too accurate (I had yet to grasp why it was so accurate).

    The real cause of the confusion was a looming deadline and it being too important not to be wrong.

  10. I have always said that coding is much more empirical than it should be, considering there isn’t anything in the code that you didn’t put there (well, except other people’s code, but the number of times the problem has *actually* been in a library or api I can probably count on one finger).

  11. The Carl Friedrich Gauss file drawer effect – “Also, I wrote the above post a couple of months ago. Since then we’ve made progress and I hope to post the paper soon.”
