The Cathedral and the Bazaar

Eric S. Raymond
written by different people, and that communications/coordination
overhead on a project tends to rise with the number of interfaces between human beings.
Thus, problems scale with the number of communications paths between developers,
which scales as the square of the number of developers (more precisely, according to the
formula N*(N - 1)/2 where N is the number of developers).
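The quadratic growth of that formula is easy to check numerically; a minimal sketch (the function name is illustrative, not from the essay):

```python
def comm_paths(n: int) -> int:
    """Communication paths in a complete graph of n developers:
    every pair needs its own channel, so paths = n*(n-1)/2."""
    return n * (n - 1) // 2

for n in (2, 5, 10, 50):
    print(n, comm_paths(n))
# 5 developers need 10 channels; 50 developers need 1225.
```

Doubling the team size roughly quadruples the coordination overhead, which is the heart of the Brooksian fear of large groups.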
The Brooks's Law analysis (and the resulting fear of large numbers in development
groups) rests on a hidden assumption: that the communications structure of the project
is necessarily a complete graph, that everybody talks to everybody else. But on
open-source projects, the halo developers work on what are in effect separable parallel
subtasks and interact with each other very little; code changes and bug reports stream
through the core group, and only within that small core group do we pay the full
Brooksian overhead. [SU]
There are still more reasons that source-code-level bug reporting tends to be very
efficient. They center around the fact that a single error can often have multiple possible
symptoms, manifesting differently depending on details of the user's usage pattern and
environment. Such errors tend to be exactly the sort of complex and subtle bugs (such as
dynamic-memory-management errors or nondeterministic interrupt-window artifacts) that
are hardest to reproduce at will or to pin down by static analysis, and which do the most
to create long-term problems in software.
A tester who sends in a tentative source-code-level characterization of such a
multi-symptom bug (e.g. "It looks to me like there's a window in the signal handling near
line 1250" or "Where are you zeroing that buffer?") may give a developer, otherwise too
close to the code to see it, the critical clue to a half-dozen disparate symptoms. In cases
like this, it may be hard or even impossible to know which externally-visible
misbehaviour was caused by precisely which bug, but with frequent releases, it's
unnecessary to know. Other collaborators will be likely to find out quickly whether their
bug has been fixed or not. In many cases, source-level bug reports will cause
misbehaviours to drop out without ever having been attributed to any specific fix.
Complex multi-symptom errors also tend to have multiple trace paths from surface
symptoms back to the actual bug. Which of the trace paths a given developer or tester can
chase may depend on subtleties of that person's environment, and may well change in a
not obviously deterministic way over time. In effect, each developer and tester samples a
semi-random set of the program's state space when looking for the etiology of a symptom.
The more subtle and complex the bug, the less likely that skill will be able to guarantee
the relevance of that sample.
For simple and easily reproducible bugs, then, the accent will be on the "semi" rather
than the "random"; debugging skill and intimacy with the code and its architecture will
matter a lot. But for complex bugs, the accent will be on the "random". Under these
circumstances many people running traces will be much more effective than a few people
running traces sequentially, even if the few have a much higher average skill level.
This effect will be greatly amplified if the difficulty of following trace paths from
different surface symptoms back to a bug varies significantly in a way that can't be
predicted by looking at the symptoms. A single developer sampling those paths
sequentially will be as likely to pick a difficult trace path on the first try as an easy one.
On the other hand, suppose many people are trying trace paths in parallel while doing
rapid releases. Then it is likely one of them will find the easiest path immediately, and
nail the bug in a much shorter time. The project maintainer will see that, ship a new
release, and the other people running traces on the same bug will be able to stop before
having spent too much time on their more difficult traces [RJ].
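The argument above can be illustrated with a toy model (my construction, not from the essay): assume trace-path costs follow a heavy-tailed distribution, so most paths are cheap but a few are very expensive. A lone developer's first pick is a random draw; a crowd of parallel tracers effectively gets the minimum of many draws, and everyone stops when the first tracer succeeds.

```python
import random

def path_cost(rng: random.Random) -> float:
    # Heavy-tailed costs (assumed): most trace paths are cheap,
    # an unlucky few are very expensive to follow.
    return rng.paretovariate(1.2)

def trial(k: int, rng: random.Random) -> tuple[float, float]:
    costs = [path_cost(rng) for _ in range(k)]
    solo = costs[0]     # lone developer's (random) first choice of path
    crowd = min(costs)  # k parallel tracers; the first success ends the hunt
    return solo, crowd

rng = random.Random(42)
trials = [trial(20, rng) for _ in range(10_000)]
solo_avg = sum(s for s, _ in trials) / len(trials)
crowd_avg = sum(c for _, c in trials) / len(trials)
print(f"lone developer, first path chosen: {solo_avg:.2f}")
print(f"20 parallel tracers, fastest path: {crowd_avg:.2f}")
```

Under these assumptions the crowd's average time to the bug is far lower than the lone developer's, precisely because the minimum of many heavy-tailed draws is almost always small, while a single draw is occasionally catastrophic.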
When Is a Rose Not a Rose?
Having studied Linus's behavior and formed a theory about why it was successful, I made
a conscious decision to test this theory on my new (admittedly much less complex and
ambitious) project.
But the first thing I did was reorganize and simplify popclient a lot. Carl Harris's
implementation was very sound, but exhibited a kind of unnecessary complexity common
to many C programmers. He treated the code as central and the data structures as support
for the code. As a result, the code was beautiful but the data structure design was ad hoc and
rather ugly (at least by the high standards of this veteran LISP hacker).
I had another purpose for rewriting besides improving the code and the data structure
design, however. That was to evolve it into something I understood completely. It's no fun