''a local WikiZen beat a nice essay on cheap tests out of MichaelBolton:''

''reposted from comp.software.testing without permission & with tiny edits''

''The blog entry at http://www.developsense.com/blog.html contained an intriguing line: "'''''lots of quick tests that have very low cost but uncertain value'''''"''

''(I don't know where its permanent link will go. Google the web for the subject line if you are in the future.)''

''How defensible is such a position? How could cheap but inaccurate tests possibly help anything?''

As Lllary pointed out, "uncertain value" does not mean "inaccurate".
It means "we don't know whether this test will reveal a bug or not".
In fact, all tests are like that, so it really means "we're even less
sure than usual that this test will reveal a bug".  What we do know
about a cheap test is that it takes little time, little effort, and
little cost to perform, so why not?

Certain kinds of automated tests are often--though not always--cheap.
In fact, the intent of good automated tests is to make testing cheaper.
If we can push a button, walk away, and do some other testing tasks
(such as manual tests or writing more automated tests), that's swell.

Here's a great example of a cheap test, arrived at through automation.
It comes from Doug Hoffman.  A while ago, he was working on a new 32-bit
processor, and he was trying to think of powerful tests for the integer
square root function.  He considered values that would be interesting in
various number bases, he considered values that would twiddle the carry
and overflow and sign flags in various ways, and he selected values that
were at the edges of various conceptual boundaries, based on all
sorts of theories of error.  All those tests passed.  Everything looked
good.  And then he thought:  wait--why not try them all?  On a 32-bit
processor, there are only 4 billion integers to choose from, and he had
access to an oracle in the form of a reference processor, so he cobbled
together a little program to try them all.  The program took six
minutes to run.  He found two bugs that he contends he never would have
found otherwise; they just didn't fall into any theory of error that he
had considered.
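
(A minimal sketch of that "try them all" test, in Python for
illustration.  The names isqrt_under_test and isqrt_oracle are
hypothetical stand-ins for the new processor and the reference
processor--neither appears in the original post--and a Python loop
over 2^32 values would of course take far longer than six minutes.)

 def exhaustive_isqrt_check(isqrt_under_test, isqrt_oracle):
     """Compare the device's integer square root against the
     reference oracle for every 32-bit input; collect mismatches."""
     failures = []
     for n in range(2**32):                  # all 4 billion integers
         if isqrt_under_test(n) != isqrt_oracle(n):
             failures.append(n)              # a bug no theory predicted
     return failures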

Now:  why wouldn't you always do this?  Today we're being asked to test
a 64-bit processor.   The same heuristic--"why not try them all?"--has
a teensy little problem:  the cost of the test has gone up from 6
minutes to 40,000 years.  So now we have to go back to more targeted
approaches--tests that we believe are most likely to expose bugs,
because what was once cheap is now expensive.
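
(The back-of-the-envelope arithmetic, as a sketch: the input space
grows by a factor of 2^32, and so does the run time.)

 minutes_32 = 6                            # the 32-bit exhaustive run
 minutes_64 = minutes_32 * 2**32           # input space grows by 2**32
 years_64 = minutes_64 / (60 * 24 * 365)   # about 49,000 years -- the
                                           # same order as the figure above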

There are other kinds of cheap tests, some of which can be done with
automation assistance, and some of which can more quickly and easily be
done manually. Undermining the application is a form of cheap test.
Unplug the network cable in the middle of a transaction--almost no
setup cost, and we expect that our product will recover gracefully, but
maybe not.  Use a tool like PerlClip to create a string of 10,000,000
characters on the clipboard in a second or two, and throw that at any
given input field.  We expect the input to be truncated pretty close to
the GUI level, but maybe not.  On the last project I worked on, I
wanted to do this in FitNesse, but found that either the tool or our
fixtures wouldn't support strings in excess of 60K or so.  (Why such a
big number?  A moderately huge input (60K) has some likelihood of
exposing a buffer overflow; a huge-huge input (10M) has a higher
likelihood.)
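
(A rough Python analogue of that clipboard trick, as a sketch.
PerlClip itself is a Perl tool; the third-party pyperclip package
here is my assumption, not anything from the original post.)

 import pyperclip   # third-party; pip install pyperclip

 payload = "x" * 10_000_000    # ten million characters, built in well under a second
 pyperclip.copy(payload)       # now paste it into any input field under test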

My daughter performs cheap tests on JumpStart Preschool by clicking at
weird times in weird places, too soon after the last click, and so
on--a quick test technique that James calls "click frenzy".  Her
hand-eye-mouse-pointer co-ordination is a little rough, which increases
the randomness of the behaviour.  But we get crashes often enough that
she recognizes the Windows General Protection Fault dialog (which
appears sometimes) and the runtime library's Bad Memory Allocation
dialog, and she says "OH-OH!" every time she sees them.  She's learned
to be sanguine about it, but our lives would have been easier if
JumpStart's testers had performed tests using this same approach.  It
takes less than a minute per screen.  It would likely have been
possible to automate this, but apparently they either automated it
poorly or didn't automate it at all, thinking it too expensive.  A
click frenzy session might have been sufficiently valuable and
sufficiently cheap for them, had they known to do it.
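
(A minimal sketch of automating a click frenzy, assuming the
third-party pyautogui package--my choice of tool, not James's.)

 import random
 import time
 import pyautogui   # third-party; pip install pyautogui

 def click_frenzy(seconds=60):
     """Click at random places, sometimes too soon after the last click."""
     width, height = pyautogui.size()
     stop = time.time() + seconds
     while time.time() < stop:
         pyautogui.click(random.randrange(width), random.randrange(height))
         time.sleep(random.uniform(0.0, 0.2))   # rough, erratic timing on purpose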

Cheap tests often cause developers to say "no user would ever do that". No user, for example, would place a shoe on a keyboard in order to see
how the application reacted to an endless stream of keystrokes.  But we
reframe "no user would ever do that" into "no user that I can think of,
and that I like, would do that on purpose".  A nice user might place a
book or a binder on a keyboard by mistake.  A malicious or impish user
(think of a 15-year-old encountering a keyboard on a kiosk) might hold
down a key for fun.

Here's the rub:  TestDrivenDevelopment tests, when run after the code has been written
and has passed the test the first time, tend to fall into the category of
cheap tests too.  We don't really believe that we're going to break
anything, do we?  But we run them, because they're already written,
they have very low cost, and uncertain but potentially very high value.
Moreover, WardCunningham has pointed out that TDD tests should be
relatively trivial; if they get too elaborate, they take too long to
run, and thus they won't get run; and if they get too elaborate, they
get too specific, and they'll need to be modified when the code changes
significantly enough.  The whole idea is to keep them at some uncertain
value but low cost.  As a tester, I like the idea of TDD.  Its risk is
that the tests will reflect the same mental models of the developers
that lead to bugs in the code, but the value is that they will detect
certain kinds of problems very rapidly and effectively, and they're
inserted into the development process at the time when it's really
inexpensive to fix a mistake (that is, immediately after the mistake
has been made).
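
(A sketch of the kind of deliberately trivial TDD-style check being
described, using Python's unittest; the add function is a hypothetical
stand-in for real code under test.)

 import unittest

 def add(a, b):                 # hypothetical code under test
     return a + b

 class AddTests(unittest.TestCase):
     # deliberately small and fast, so the suite stays cheap enough
     # to run constantly
     def test_small_numbers(self):
         self.assertEqual(add(2, 3), 5)

     def test_negatives(self):
         self.assertEqual(add(-1, 1), 0)

 if __name__ == "__main__":
     unittest.main()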

In answer to your description of "inaccurate tests":  I don't know
what you mean by inaccurate tests, so you can define them as you like.
:)  Do you mean "useless tests"?  Tests that are based on erroneous
presumptions?  Tests that ignore things like system requirements?
Tests that could not conceivably matter to anyone?  Tests that do not
address some risk?  Such tests can exist, but I don't think I would
call these "inaccurate"; perhaps "inappropriate" or "unhelpful" or
"(probably) valueless". But you may have something else in mind.

You suggested "I think it's a test that fails even when the user could
never perceive any bug."  There are a couple of sticking points for me
in that suggestion.  One is "the user"--but which user?  Some users
might be exposed to or vulnerable to some risk; others not.  Some
users--like hackers--might exploit some risk that "normal" users would
not.  Some users--like blind, deaf, or physically disabled
people--might be ill-served by things that would do fine for
able-bodied folks.  "Never" is another sticking point, and "perceive"
is another one.  As an ordinary observer, I would "never perceive" a
blown pointer, for example--for a while.

''In the paradox of Achilles and the Tortoise, we imagine the Greek hero Achilles in a footrace with the plodding reptile. Because he is so fast a runner, Achilles graciously allows the tortoise a head start of a hundred feet. If we suppose that each racer starts running at some constant speed (one very fast and one very slow), then after some finite time, Achilles will have run a hundred feet, bringing him to the tortoise's starting point; during this time, the tortoise has "run" a (much shorter) distance, say one foot. It will then take Achilles some further period of time to run that distance, during which the tortoise will advance farther; and then another period of time to reach this third point, while the tortoise moves ahead. Thus, whenever Achilles reaches somewhere the tortoise has been, he still has farther to go. Therefore, Zeno says, swift Achilles can never overtake the tortoise. Thus, while common sense and common experience would hold that one runner can catch another, according to the above argument, he cannot; this is the paradox.''

''(Any similarity between that and the WikiPedia entry must be just a shocking coincidence.)''

''Now suppose Achilles is a software tester, trying to exhaustively test a program. He writes 100 test cases, and tests the program to 50% exhaustion. The next 100 test cases go to 66% exhaustion. The next 100 go to 70% exhaustion. This geometric progression shows that Achilles can never squeeze out the last 0.00...1% of exhaustion.''

''Now suppose Achilles writes cheap tests, and writes a program that can be constrained by them. The tests can fail too easily, even when there's no bug. The metaphor here is Achilles leaping over the turtle instead of running to it. Achilles writes such a program as can be constrained by tests as cheap as he has written.''

Your Achilles metaphor is nicely put.  However, I'd suggest that the
tortoise has an effectively infinite head start, since anything other
than the most trivial program has an effectively infinite input space
(the set of all valid inputs PLUS the set of all invalid inputs gets to
infinity pretty quickly).  We never get anywhere near 100% exhaustion.
What we might get is near 100% skepticism that there's a bug there, but
it's asymptotic.  We do that to some degree with shotguns--some number
of quick, uncomplicated, scattered tests that fill a space relatively
close by--and to some degree with rifles--some number of harsh,
elaborate, precisely targeted tests that might be (to stretch the
metaphor) some distance off.

There are at least two risks associated with any test.  First, the test
might fail to identify a bug that's there (a false negative), and
second, it might falsely identify a bug that isn't there (a false
positive).  In the first case, we know going in
that a cheap test might fail to identify a bug, but running the test is
by definition cheap enough that we don't mind.  In the second case, if
a cheap test consistently finds bugs that aren't there, it defines
itself out of existence by being expensive, so we don't run it.

Finally, note that a test strategy composed entirely of cheap tests is
likely to be a failing strategy.  But a test strategy that is composed
entirely of expensive tests is likely to be... expensive.

Again, thanks for affording me the opportunity to explain this stuff.
I hope it helps.

---Michael B.