Difficult Bugs To Catch…
by HidekiAI on Oct.09, 2008, under Technology Opinions
Wikipedia’s “Software Bug” article has generalized sections on types of bugs, causes of bugs, and so on… But no matter what, I want to believe that catching/finding bugs is based on experience.
According to this page, Brian Kernighan is quoted as saying:
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
- “The Elements of Programming Style”, 2nd edition, chapter 2
We all get what Kernighan is saying and smile… A friend of mine once said “debugging is a skill of its own”, and I agree with him. I’ve worked with programmers who can read and understand code so quickly that they are impressive at spotting bugs. I’ve worked with programmers who can sit and debug for days and find that hardest-to-catch bug nobody else would have caught (so they get assigned that bug, because everybody knows they will find it). I’ve worked with a QA tester who could reproduce those hard-to-recreate bugs over and over again when no other tester could, so we’d set up a hardware ICE on his station to record and take snapshots of the cause of the bug.
Most bugs, I think, are quickly caught if we understand the intention behind the logic. That is why it is easier for programmers to debug their own code than to have others debug it for them. Alternatively, if the original author has done well with documentation (technical design documents, comments, etc.; basically any form of documentation) that captures the intention of the logic in the code (the overall picture of what it does), it becomes easier to find bugs. Of course, if the documentation is stale and was not updated when the intention changed, it can mislead the programmer doing the debugging.
In most cases, we care more about the input and output of the logic than about its implementation (optimization is a different story). There are so many ways to write and implement a piece of logic. For example, you can write a parser with scanf(), or boost::tokenizer, or boost::spirit, or boost::regex (or any combination of them), or lex/yacc, etc. You name it, there’s probably a way (again, disregarding optimization and correct usage of the libraries). But at the end of the day, we usually only care about the outcome: “did it parse as expected and as designed?”. The debugger (the person who is debugging, in this case) needs to know the “as designed” part in order to verify it. Nowadays there are practices such as unit testing, in which you purposely pass invalid parameters to verify that a specific function correctly handles its edge cases. But these are most likely the common edge cases, not one of those Heisenbugs…
But this blog post is not about those kinds of common bugs, the ones that simply do not follow the “expected as designed” behavior. It is more about bugs that follow the expected path most of the time (including edge cases) but once in a while steer the wrong way, and that just take longer to catch/find, even for the person who implemented the logic and fully understands the “expected as designed” perspective (see Unusual Software Bugs).
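As a rough illustration of testing against the “as designed” behavior rather than the implementation, here is a minimal sketch. The function parse_percent and its contract (a decimal integer from 0 to 100, anything else rejected) are made up for this example; the internals use strtol(), but the test would stay the same if they were swapped for any other parsing approach.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical function under test. "As designed": accept a decimal
// integer in [0, 100]; reject empty input, junk and out-of-range values.
bool parse_percent(const std::string& text, int& out)
{
    if (text.empty())
        return false;
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (end == text.c_str() || *end != '\0')  // no digits at all, or trailing junk
        return false;
    if (value < 0 || value > 100)
        return false;
    out = static_cast<int>(value);
    return true;
}

// The test only knows the design, not how the parsing is implemented.
void test_parse_percent_edge_cases()
{
    int v = -1;
    assert(parse_percent("0", v) && v == 0);      // lower boundary
    assert(parse_percent("100", v) && v == 100);  // upper boundary
    assert(!parse_percent("", v));                // empty input
    assert(!parse_percent("abc", v));             // not a number
    assert(!parse_percent("12abc", v));           // trailing junk
    assert(!parse_percent("-1", v));              // below range
    assert(!parse_percent("101", v));             // above range
}
```

These are exactly the common edge cases mentioned above; a test like this says nothing about a bug that only shows up once in a while.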
I think (from experience) there are a few kinds of bugs that are hard to catch, or that make you tell yourself once discovered, “oh gawd, I had this issue before, why didn’t I look for this first?” (it’s always the “last thing you tested” because you stop there *grin*). Here are some categories:
- Data bugs – this is the kind of bug where you end up saying “I knew it was the data!”, even though the data was the last thing you checked, because you kept telling yourself it must have been some subtle code change you made that caused a side effect. Wouldn’t techniques such as unit testing catch this? Sure, if the data is small and isolated. But there are cases where data gets retrieved and distributed to other systems, and along the way its context can get misinterpreted (granted, that can be more of a code bug than a data bug). It’s like the “whispering game”, where children stand in a line, the first child whispers a sentence to the next and so on, and the last child announces what s/he was told; in most cases the sentence has been completely garbled. Misinterpretation can also take other forms, for example units. As a video game developer, I often have to re-verify whether angles are being passed as radians or degrees. From the point of view of the function sin(x), x is just a number (real, float, double, etc.); there is no sin_radian(x) and sin_degree(x). The API does not know what unit you are passing, so it just computes what you tell it to. The infamous example is the sad loss of NASA’s Mars Climate Orbiter in 1999, where one piece of software reported thruster data in imperial units (pound-force seconds) while the software consuming it expected metric units (newton-seconds), and the probe was lost on arrival at Mars.
- Scoped Variables bugs – I ported games from one platform to another for a few years, so in a sense I’ve done indirect code review: I’ve looked at other people’s code and seen this happen a few times. One that commonly makes you do a double-take is the following code:
...
for (int i = 0; i < x; ++i)
{
    ...
    for (int i = j; i < y; ++i)
    {
        ...
        myvar[i] = z; // we're not mind-readers, so we don't really know
                      // whether the original author intended 'i' to be
                      // the outer loop's or really did mean the inner
                      // loop's, so we don't know whether this is a bug
                      // or valid
        ...
    }
    ...
}
...
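Going back to the radians-versus-degrees problem from the data bugs item: one way to make the unit visible to the API is to put it into the type. This is only a sketch, and the names Radians, Degrees, to_radians and sine are made up for illustration; they are not from any particular math library.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical wrapper types: the unit becomes part of the type,
// so a bare double can no longer be passed in by accident.
struct Radians { double value; };
struct Degrees { double value; };

constexpr double kPi = 3.14159265358979323846;

constexpr Radians to_radians(Degrees d)
{
    return Radians{d.value * kPi / 180.0};
}

// One overload per unit; std::sin() itself only ever sees radians.
inline double sine(Radians r) { return std::sin(r.value); }
inline double sine(Degrees d) { return sine(to_radians(d)); }

inline void units_demo()
{
    // Same angle written in two units gives the same answer.
    assert(std::fabs(sine(Degrees{90.0}) - 1.0) < 1e-12);
    assert(std::fabs(sine(Radians{kPi / 2.0}) - 1.0) < 1e-12);
    // sine(90.0);  // would not compile: a bare double carries no unit
}
```

It does not help with data that arrives from another system already mislabeled, but it does force the caller to state which unit they believe they are holding.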
Then there are the dreadful bugs, where you know what the bug is but cannot catch it because it only occurs at runtime. Often the application you are debugging is large-scale, or you wouldn’t be dreading it (meaning, it’s the kind of bug you’d easily catch in a prototype and/or small project, but because of the other dependencies involved it becomes difficult to catch):
- Race conditions – I am not sure who said this, but somebody once said “I’d rather have a deadlock than a race condition”, and I agree. If you are debugging multi-threaded code and you get a deadlock, you can easily inspect all the threads at the moment of the deadlock, figure out where each thread is, and most likely determine what caused it. With a race condition, on the other hand, it is usually too late to catch by the time it shows up, and it often surfaces as side effects such as memory leaks, buffer overflows and overruns, etc. For example, suppose you have a circular buffer that you use to read a network stream. One thread just writes to the buffer while the other thread (usually the main thread) reads from the buffer and processes it. A mutex would most likely solve this issue (or cause a deadlock instead, which at least makes you realize you’ll have to restructure); note that in C++, marking the variables volatile does not by itself make them thread-safe. But the locking is usually forgotten because the original code was intended for a single thread in an idealistic, perfect-world situation. So all is working fine 95% of the time, and then some uncommon edge case triggers the race condition 6 months after it was implemented, so you begin to blame the latest code or the newly added modules and components instead, barking up the wrong tree.
- Time Dependent – One of my heroes, Bill Gates, once said that he reboots his Windows every night. And then there is this issue, which cost the lives of brave soldiers and injured several others because of a time issue. Bugs that only appear after the program has been running for a long period are very difficult to catch and/or reproduce, simply because of the time each attempt takes.
- Timing Dependent – This is different from time dependent bugs in the sense that it can happen at any time, but only on exact timing, such as during the Vertical Sync (VSync) or when an NMI or another type of interrupt occurs. We once had a bug (back in the SEGA days) that would only occur when the joypad input arrived during the vertical blanking state. I’d imagine driver developers have more skills and tricks of the trade for catching these kinds of bugs than most of us do.
I’d imagine there are many more types of hard-to-catch bugs that I’ve encountered in the past, but I’ve run out of steam on this (I’ve been writing this blog post off and on for 3 or 4 days as I remember things).
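For the circular buffer in the race conditions item, a minimal sketch of the mutex approach could look like the following. The class LockedRing and the helper ring_round_trip are hypothetical names for this example, and a real network reader would probably want a condition variable instead of yielding in a loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical fixed-size byte ring shared by a network thread (push)
// and the main thread (pop). Every access to head_/tail_/count_ happens
// under the same mutex, so neither thread sees a half-updated state.
template <std::size_t Capacity>
class LockedRing
{
public:
    // Returns false when the ring is full; the caller decides what to do.
    bool push(std::uint8_t byte)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == Capacity)
            return false;
        data_[head_] = byte;
        head_ = (head_ + 1) % Capacity;
        ++count_;
        return true;
    }

    // Returns false when there is nothing to read.
    bool pop(std::uint8_t& out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0)
            return false;
        out = data_[tail_];
        tail_ = (tail_ + 1) % Capacity;
        --count_;
        return true;
    }

private:
    std::mutex mutex_;
    std::array<std::uint8_t, Capacity> data_{};
    std::size_t head_ = 0;
    std::size_t tail_ = 0;
    std::size_t count_ = 0;
};

// Usage sketch: one writer thread, the calling thread reads and checks
// that every byte arrives exactly once and in order.
inline void ring_round_trip(int total)
{
    LockedRing<16> ring;
    std::thread writer([&ring, total] {
        for (int i = 0; i < total; ++i)
            while (!ring.push(static_cast<std::uint8_t>(i & 0xFF)))
                std::this_thread::yield();  // ring full, let the reader catch up
    });
    for (int i = 0; i < total; )
    {
        std::uint8_t byte = 0;
        if (!ring.pop(byte)) { std::this_thread::yield(); continue; }
        assert(byte == static_cast<std::uint8_t>(i & 0xFF));
        ++i;
    }
    writer.join();
}
```

Take the two lock_guard lines out and this is the version that works fine 95% of the time: count_ gets updated from both threads, and sooner or later an update is lost.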
One of my favorite signatures, which I use as my default, is “There are 3 things inevitable in life: death, taxes, and software bugs”. Back when I was at CalPoly many years ago, one of the professors once said “There’s no such thing as bug-less software; if there is such a thing, that is a bug of its own”. Another professor once told a story about not taking a contract job related to medical equipment. He explained that when they asked him to do the contract, part of the agreement was that he must test it on himself. It’s similar to how roller-coaster engineers are traditionally the first to test-ride their coasters: they put their lives on the line to prove their design and engineering are sound.
I want to finish up with the most recent example I’ve encountered. This bug is not really on my list of hard-to-catch bugs, but for untrained eyes it could possibly be one. A third-party library decided that it wanted to be portable across multiple target platforms. So it defines a boolean type as BOOL, and behind the scenes it is an integer type (int).
This library also has a few APIs that are overloaded on the types of the parameters passed. The problem is that it was inconsistent when passing boolean values as parameters. Suppose you have two functions, overloaded on just one parameter:
void foo(const BOOL bar)
{
    ...
}
void foo(const INT bar)
{
    ...
}
As you already guessed, that won’t compile if both BOOL and INT are typedef’d as int, because the compiler will complain that foo(int) is already defined. So they must have hit this issue and gone ahead and changed the first one to the bool type, but left the second one as their own typedef for INT. So when I called their function passing in a BOOL value, it kept on doing unexpected stuff; presumably, since a BOOL is really an int, the call quietly resolved to the INT overload instead of the bool one.
I believe that today this wouldn’t be categorized as a hard-to-catch bug, thanks to IntelliSense (even Eclipse and KDevelop have an equivalent, and it has become such a spoiling feature for us developers), which can display the list of overloads so you can quickly catch this bug (try to imagine finding it back in the old VAX/VMS or line-editor days!).
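Here is a small sketch of how I understand the mismatch. The typedefs mirror what is described above, but the function bodies (returning a label so you can see which overload ran) and the helper call_with_library_bool are made up for illustration; this is not the library’s actual code.

```cpp
#include <cassert>
#include <string>

typedef int BOOL;  // assumption: the library's portable "boolean" is an int
typedef int INT;   // assumption: the library's integer alias

// Return a label instead of doing real work so the chosen overload is visible.
inline std::string foo(const bool /*bar*/) { return "foo(bool)"; }
inline std::string foo(const INT /*bar*/)  { return "foo(INT)"; }

inline std::string call_with_library_bool()
{
    const BOOL flag = 1;  // the library's own idea of "true"
    return foo(flag);     // flag is an int: exact match for foo(INT)
}

inline void overload_demo()
{
    assert(call_with_library_bool() == "foo(INT)");  // not the bool version!
    assert(foo(true) == "foo(bool)");                // a real bool is needed
}
```

The compiler is doing exactly what it was told: an int argument is an exact match for foo(INT), while foo(bool) would need a conversion, so the bool overload is never even considered for a BOOL.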
Enjoy bug-hunting!