A Model for Debug Complexity

 
Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek
 
 

Debugging programs is well known to be a complicated and time-consuming task. Unfortunately, there is surprisingly little research on debugging complexity. One interesting field of research is related to constraint programming ([CP07][CAEPIA09]), but at the moment these methods are aimed at constraint-based debugging, and seem to be of limited use for analyzing the debugging complexity of commonly used program representations. This article is a humble attempt to propose at least some mathematical model for the complexity of debugging. It is neither the only possible nor the best possible model; on the contrary, we are aware of the very approximate nature of the proposed model (and of all the assumptions which we’ve made to build it), and hope that its introduction (and the inevitable criticism which follows) will become a catalyst for developing much more precise models in the field of debugging complexity (where, we feel, such models are sorely needed).

[Cartoon: “Yes, it is indeed obvious...”]

Basic model for naïve debugging

Let’s consider a simple linear program which is N lines of code in size, and see what happens if it doesn’t produce the desired result. In the very simplistic case, the developer will go through the code in a debugger, from the very beginning, line by line, and check what is happening in the code. So, in this very simple case, and under all the assumptions we’ve made, we have that

    \[ T_{dbg} = N \times T_{line} \]

where Tline is the time necessary to check that after going through one line, everything is still fine.

So far, so obvious (we’ll analyze the possibility of using bisection a bit later).

Now, let’s try to analyze what Tline is about. Let’s assume that our program has M variables. Then, after each line, to see if anything has gone wrong, we need to check whether any of the M variables has gone wrong. Now, let’s assume that all our M variables are tightly coupled; moreover, let’s assume that the coupling is ‘absolutely tight’, i.e. that any variable affects the meaning of all other variables in a very strong way (so whenever a variable changes, the meaning of all the other variables may be affected). This means that, strictly speaking, to fully check that everything is fine with the program, one would need to check not only that all M variables are correct, but also (because of the ‘absolutely tight’ coupling) that all combinations of 2 variables (and there are C(M,2) such combinations) are correct, that all combinations of 3 variables (and there are C(M,3) such combinations) are correct, and so on. Therefore, we have that

    \[ T_{line} = \sum_{i=1}^M C(M,i)\times T_{singlecheck} \]

where

    \[ C(M,i) = \frac{M!}{i! \times (M-i)!} \]

and Tsinglecheck is the time necessary to perform a single validity check.

Therefore, if we have a very naïve developer who checks that everything is correct after each line of code, we’ll have

    \[ T_{dbg} = N \times \sum_{i=1}^M C(M,i) \times T_{singlecheck} \]

which is obviously (see picture above on how obvious it is) equivalent to

    \[ T_{dbg} = N \times  (2^M - 1) \times T_{singlecheck} \quad\quad\quad \textrm{(*)} \]
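For readers who want to play with the numbers, here is a minimal sketch in Python of the naïve model above; the values of N, M and Tsinglecheck are purely illustrative, and the function names are ours. It computes Tdbg both via the binomial sum and via the closed form (*), and the two agree as expected.

    # A minimal sketch of the naive model; the parameter values below are
    # purely illustrative, not measurements of real debugging sessions.
    from math import comb

    def t_dbg_naive_sum(n_lines: int, m_vars: int, t_singlecheck: float = 1.0) -> float:
        """T_dbg = N * sum_{i=1..M} C(M, i) * T_singlecheck:
        every combination of variables is checked after every line."""
        t_line = sum(comb(m_vars, i) for i in range(1, m_vars + 1)) * t_singlecheck
        return n_lines * t_line

    def t_dbg_naive_closed(n_lines: int, m_vars: int, t_singlecheck: float = 1.0) -> float:
        """The equivalent closed form (*): T_dbg = N * (2^M - 1) * T_singlecheck."""
        return n_lines * (2 ** m_vars - 1) * t_singlecheck

    # Both forms give 100 * 1023 = 102300 single checks for N=100, M=10.
    assert t_dbg_naive_sum(100, 10) == t_dbg_naive_closed(100, 10) == 102300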

Optimizations

Now let’s consider what can be done (and what is really done in the real world, compared to our hypothetical naïve developer) to reduce debugging time. Two things come to mind almost immediately. The first is that if a certain line of code changes only one variable X (which is a very common case), then the developer doesn’t need to check all 2^M−1 combinations of variables, but only variable X itself, plus all 2-variable combinations which include X (there are M−1 of them), plus all 3-variable combinations which include X (there are C(M−1,2) of them), and so on. Therefore, assuming that every line of code changes only one variable (including potential aliasing), we’ll have

    \[ T_{dbg} = N \times \bigg( \Big(1 + \sum_{i=1}^{M-1} C(M-1,i)\Big) \times T_{singlecheck} \bigg) \]

or

    \[ T_{dbg} = N \times 2^{M-1} \times T_{singlecheck} \]

Therefore, while this first optimization does provide benefits (roughly halving the number of per-line checks), the dependency on M is still exponential.
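Here is the same kind of sketch for this first optimization (again in Python, with illustrative values; the function name is ours):

    # A minimal sketch of the first optimization: each line changes exactly one
    # variable X, so only X itself and the combinations containing X are re-checked.
    from math import comb

    def t_dbg_one_var_per_line(n_lines: int, m_vars: int, t_singlecheck: float = 1.0) -> float:
        """T_dbg = N * (1 + sum_{i=1..M-1} C(M-1, i)) * T_singlecheck
        = N * 2^(M-1) * T_singlecheck."""
        checks_per_line = 1 + sum(comb(m_vars - 1, i) for i in range(1, m_vars))
        return n_lines * checks_per_line * t_singlecheck

    # For M = 10: 2^9 = 512 checks per line instead of 2^10 - 1 = 1023 --
    # roughly a 2x saving, but still exponential in M.
    assert t_dbg_one_var_per_line(100, 10) == 100 * 2 ** 9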

The second optimization to consider is bisection. If our program is deterministic (and its runs are therefore reproducible), then instead of going line by line, a developer can (and usually will) pick a point around the middle of the program, check the whole state there, and see whether the bug has already happened or not yet. The process is then repeated on the half which contains the bug, halving the remaining span each time. In this case, we’ll have

    \[ T_{dbg} = \log_2 N \times (2^M-1) \times T_{singlecheck} \]
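And a sketch of the bisection variant, under the same assumptions (a deterministic, reproducible program; the values of N and M are again illustrative):

    # A minimal sketch of the bisection model; assumes the program is deterministic,
    # so the run can be reproduced and the state inspected at any chosen line.
    from math import log2

    def t_dbg_bisection(n_lines: int, m_vars: int, t_singlecheck: float = 1.0) -> float:
        """T_dbg = log2(N) * (2^M - 1) * T_singlecheck: the whole state is
        checked at each of the ~log2(N) probe points."""
        return log2(n_lines) * (2 ** m_vars - 1) * t_singlecheck

    # For N = 1024 and M = 10: 10 * 1023 = 10230 single checks,
    # versus 1024 * 1023 for the naive line-by-line approach.
    print(t_dbg_bisection(1024, 10))  # 10230.0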

Note that we cannot easily combine our two optimizations, because the first one is essentially incremental (it relies on knowing which single variable has changed since the previous check), which doesn’t fit bisection. In practice, a ‘hybrid’ method is usually used (first the developer tries bisection, and then, when the span of the potential bug is small enough, goes incrementally), but we feel that this won’t change the asymptotic behaviour of our findings.

Basic sanity checks

Now that we have the model, we need to check how it corresponds to the real world. One can build a lot of internally consistent theories which become obviously invalid at the very first attempt to match them against the real world. We’ve already made enough assumptions to realize the need for sanity checks. We also realize that our list of sanity checks is incomplete, and that it is therefore perfectly possible that our model won’t stand up to further scrutiny.

The first sanity check is against the intuitive observation that the longer the program, and the more variables it has, the more difficult it is to debug. Our formulae seem to pass this check. The next sanity check is against another intuitive observation: that debugging complexity grows non-linearly with program size. Since both N and M usually grow with program size, our formulae seem to pass this check too.

Sanity check – effects of decoupling on debugging complexity

Now we want to consider a more complicated case. Since the 1970s, one of the biggest improvements in tackling program (and debugging) complexity has been the introduction of OO with hidden data, well-defined interfaces, and reduced coupling. In other words, we all know that coupling is bad (and we feel that no complexity model can be reasonably accurate without taking this into account); let’s see how our model deals with it. If we split our M ‘absolutely tightly’ coupled variables into two bunches which are not coupled to each other (except via their public member variables), each bunch being M/2 in size and still ‘absolutely tightly’ coupled internally, and each bunch exposing only M/2×Kp2p public member variables (Kp2p here stands for ‘public-to-private ratio’, and 0 <= Kp2p <= 1), then our initial formula (*) (the one without optimizations) will become

    \[ T_{dbg} = N \times (T_{firstbunch} + T_{secondbunch} + T_{interbunch} ) \]

where both Tfirstbunch and Tsecondbunch are equal to

    \[ \sum_{i=1}^{M/2} C(\frac{M}{2},i) \times T_{singlecheck} \]

and

    \[ T_{interbunch} = \sum_{i=1}^{\frac{M}{2} \times K_{p2p} \times 2} C\Big(\frac{M}{2} \times K_{p2p} \times 2, i\Big) \times T_{singlecheck} \]

After some calculations, we’ll get that

    \[ T_{dbg} = N \times \Big(2 \times \big(2^{\frac{M}{2}}-1\big) + \big(2^{M \times K_{p2p}} - 1\big)\Big) \times T_{singlecheck} \quad\quad\quad \textrm{(**)}\]

This means that if we have 10 variables, then splitting them into 2 bunches of 5 variables each, with only 2 public member variables exposed from each bunch (so Kp2p=0.4), will reduce debugging time from approx. N×2^10×Tsinglecheck (according to (*)) to approx. N×(2^6+2^4)×Tsinglecheck (according to (**)). So yes, indeed our model shows that coupling is a bad thing for debugging complexity, and we can consider this sanity check to be passed too.
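As a quick cross-check of the arithmetic above, here is a sketch of formula (**); the split into two equal bunches and the value Kp2p=0.4 are taken from the example in the text, while everything else (including the function name) is our own illustration.

    # A minimal sketch of the decoupling model (**): two equal, mutually uncoupled
    # bunches of M/2 variables each, interacting only via their public member variables.
    def checks_per_line_decoupled(m_vars: int, k_p2p: float) -> int:
        """Per-line checks under (**): 2 * (2^(M/2) - 1) + (2^(M*K_p2p) - 1)."""
        half = m_vars // 2
        public_total = round(m_vars * k_p2p)  # M/2 * K_p2p public variables per bunch, two bunches
        return 2 * (2 ** half - 1) + (2 ** public_total - 1)

    # 10 'absolutely tightly' coupled variables: 2^10 - 1 = 1023 checks per line.
    # Two bunches of 5 with 2 public variables each (K_p2p = 0.4):
    # 2 * (2^5 - 1) + (2^4 - 1) = 77 checks per line -- roughly 2^6 + 2^4, as in the text.
    print(checks_per_line_decoupled(10, 0.4))  # 77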

Conclusion

In this article, we have built a mathematical model of debugging complexity. This model is based on many assumptions and is far from perfect. Still, we hope that it can serve as a basis for building more accurate models, or that it will at least cause discussions leading to the development of better models in the field of debugging complexity.



Acknowledgements

This article was originally published in Overload Journal #114 in April 2013 and is also available separately on the ACCU website. Re-posted here with the kind permission of Overload. The article has been re-formatted to fit your screen.

Cartoons by Sergey Gordeev IRL from Gordeev Animation Graphics, Prague.
