'''Test''''''s can have bugs too'''

''[from CodeComplete, chapter 25 (quoted here on CompleteCoverageIsExpensive)...]''

* "Test cases are often more likely to contain errors than the code being tested. [Weiland 1983, Jones 1986]" p. 612. [M''''''cConnell 1993]

That's because people are really lax when writing tests, tests are boring, and tests aren't normally elegant. Some test harnesses are so complicated that they need UnitTest''''''s of their own. So you can't even be sure your tests are correct, let alone whether your implementation is correct.

----

This is somewhat academic. In XP we use unit TestsAsScaffolding. This is different from trying to find all bugs, so you end up with a different set of tests.

----

There are several perspectives on this:

* '''A False Alarm''' (aka FalsePositive). The test may be wrong: the test prints an error message even though the code is correct.
* '''A Silent Horror''' (aka FalseNegative). The test may be inadequate: the test reports "all OK" even though the code it tests has bugs.

----

If the test prints an error message, but you find the application code is correct, then you fix the bug in the test.

''That happens to me pretty often, but it doesn't bother me; I just fix the test code and go on. (I fixed a bug; it just wasn't in the place I'd have expected it to be, if I thought my tests were perfect.) -- JeffGrigg''

----

This must be taken into consideration too: '''Tests have to be simple!'''

M''''''cConnell, Steve. ''Code Complete.'' Redmond, WA: Microsoft Press, 1993. p. 612. M''''''cConnell in turn cites the studies in:

Weiland, Richard J. ''The Programmer's Craft: Program Construction, Computer Architecture, and Data Management.'' Reston, VA: Reston Publishing, 1983.

Jones, Capers. ''Programming Productivity.'' New York: Mc''''''Graw-Hill, 1986.

''I'm confused. Explain how a test can be wrong, yet not fail, if the code isn't wrong also. Do you perhaps mean that the tests are '''''inadequate'''''?''

* Sometimes the test prints an error message even though the code is correct. But that's easy to fix, so it's not really a problem.
* If a test fails to run, then it fails to check whether the code is correct. But CodeUnitTestFirst will tend to avoid that problem.
* Tests, or the testing environment, may not be robust. For example, success may depend on initial conditions, such as the initial data in a database. Or two tests that use the same shared resource may interfere with each other.
: If there is an error in how the test is written, it may falsely report "all OK" when it should give an error message. Say a misunderstanding of the interface suggests that a function returns non-zero on success, when it really returns the number of characters transmitted. Then, if you are trying to verify that the function "fails" when the socket is closed, the function may correctly report that 0 characters were sent without the test flagging the error (which may come through another interface). That's a constructed example, but the point is that this is neither logically impossible nor outside my experience.

----

It is theoretically '''possible''' that both the test ''and'' the code will be defective, in such a manner that running the test will signal "all OK". The question is, how '''likely''' is such a situation? I don't think it's very likely.

''Disagree. If the code is constructed to make a faulty test pass, then it will naturally have a built-in defect exactly compensating for the error in the test. This is a weak point in any write-tests-first methodology. There is help available: keep test code simple; test redundantly via different algorithms; check that tests map properly onto requirements via reviews of some sort; and when you find a faulty test, treat it as a bug and write a new test for the code that will print an error if the bug ever comes back.''
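To make the "compensating defect" scenario concrete, here is a deliberately contrived sketch (the function, its contract, and all the names are invented for illustration). The test encodes a misunderstanding of the contract; code written test-first to satisfy that test inherits the same misunderstanding, and the run still reports "all OK":

 #include <cassert>
 #include <cstddef>
 #include <cstdio>
 #include <string>

 // Intended contract: Mid(s, pos, n) returns the n characters of s starting
 // at zero-based position pos.  The test author instead believed the third
 // argument was an end index.
 //
 // Written test-first against the faulty check below, the code ends up with
 // the compensating defect: it also treats 'n' as an end index.
 static std::string Mid(const std::string &s, std::size_t pos, std::size_t n) {
     return s.substr(pos, n - pos);   // should simply be s.substr(pos, n)
 }

 static void TestMid() {
     // Faulty expectation: under the intended contract this should be "cdef"
     // (four characters starting at position 2), not "cd".
     assert(Mid("abcdef", 2, 4) == "cd");
 }

 int main() {
     TestMid();
     std::printf("all OK\n");   // ...even though both test and code are wrong
     return 0;
 }

Note that a redundant check written from the same angle doesn't necessarily help: Mid("abcdef", 0, 3) == "abc" holds under both readings, because at position 0 a length and an end index coincide. Redundant tests only guard against this when they genuinely approach the contract differently, as suggested above.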
There is the more probable eventuality that the test doesn't test '''enough''' of the functionality of the code being tested; the test is correct, but incomplete. Therein, perhaps, lies the reason why XP advocates that one TestEverythingThatCouldPossiblyBreak. -- LaurentBossavit

It's not likely, but it's happened to me. It's not likely on the 'is-the-string-the-right-length?' tests, but tests that have large, complicated setups are more likely to have bugs.

My experience is that bugs in tests that cause them to print an error on the first run, even though the production code is correct, happen quite often. But it takes only seconds to find and fix them, and doing so gives me more confidence in my production code. There was exactly one test out of more than 200 (currently many more) in my previous project which succeeded while the production code was wrong. -- PP

''Just a reminder. In XP UnitTest''''''s, one starts with a failing test. This would seem to eliminate the likelihood of a false positive. Before I have written any of the application code, I know that my test will report failure. I then write the application code and verify that the test will also report success. [As an aside, this assumes the test and the code are deterministic. It is really up to the programmer to avoid causes of non-deterministic code, such as uninitialized variables.]''

The scenario where a bug in the test masks a bug in the code ''is'' likely when there are bugs in the programmer's brain. I've written tests based on a faulty understanding of some aspect of the domain or the technology, then written code that passed the tests. The test said the code worked, but the test and the code worked together to mask a misunderstanding. -- EricHodges

''That is why it is much better to DoBothUnitAndAcceptanceTests.''

----

In most domains, the quest for certainty will make you (and ultimately, your customer) miserable. Even in a trial for murder we do not ask for certainty. So do not ask that the tests give you certainty of having no bugs. Ask instead that they give you reasonable doubt that there are bugs. When you ask for reasonable doubt, it's not as worrisome that your tests are sometimes wrong or incomplete.

''As mentioned above, you know your tests are bogus when you need tests for your tests. At some point, you have to stop writing tests and rely on human inspection. The earlier you do this, the better. So, keep your tests simple. Actually, this is a good practice for normal code. The value of writing tests for simple code like''

 public Enumeration children() {
     return children.elements();
 }

''is far, far lower than writing some for the networking kernel.''

I ''think'' I agree with you, especially if you permit me to twist your words into this: "Tests are a re-runnable form of human inspection. The earlier you do this, the better."

''Agreed. That's exactly it. Why use eyeballs when you can use code? That allows you to slack off whilst the other rubes are manually running regression tests. "Why aren't you testing, Sunir?" "What are you talking about? I hit F5."''
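For what it's worth, the "hit one key" style above doesn't require much machinery. Here is a minimal sketch of such a self-checking harness (all names and the sample function are invented): every check prints an error message on failure, and the exit status tells the build whether anything broke.

 #include <cstdio>

 static int g_failures = 0;

 // Each check either stays quiet or prints an error message; main() returns
 // non-zero if anything failed, so the whole suite re-runs with one keystroke.
 static void Expect(bool ok, const char *what) {
     if (!ok) {
         std::printf("FAILED: %s\n", what);
         ++g_failures;
     }
 }

 // Hypothetical code under test.
 static int Clamp(int v, int lo, int hi) {
     return v < lo ? lo : (v > hi ? hi : v);
 }

 static void TestClamp() {
     Expect(Clamp(5, 0, 10) == 5,   "value inside range is unchanged");
     Expect(Clamp(-3, 0, 10) == 0,  "value below range clamps to lower bound");
     Expect(Clamp(42, 0, 10) == 10, "value above range clamps to upper bound");
 }

 int main() {
     TestClamp();                    // add further Test...() calls here
     if (g_failures == 0) std::printf("all OK\n");
     return g_failures;              // non-zero exit marks the run as broken
 }

Per the reminder above about starting from a failing test, each Expect line is only worth trusting once you have seen it print its error message at least once -- for example, by briefly stubbing Clamp to return a nonsense value such as 99.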
I hit F5."'' ---- ''Yes, but does BugsInTheTests'' matter ''that much?'' Types of BugsInTheTests: * The programmer and test writer (whether identical or not) both misunderstand the requirements in the same way. Having a test defined by someone other than the original programmer is one good way to avoid most of these cases. * a simple bug in the test exactly cancels out a corresponding bug in the real code. The only type that causes major problems. Extremely rare. * a simple bug in the test that is *not* canceled by a corresponding bug in the real code. Most common. The test will report an error, and the programmer will have to figure out whether it's the code, the test, or both, that are wrong. That's what tests are for. * a simple bug in the test causes the test to do nothing -- so it fails to print an error message even when it should. The practice of coding the test, checking that it fails, then making it work, avoids most of these problems. BugsInTheTests don't matter that much. -- RonJeffries Erm, can you back that up RonJeffries? Plenty of comments on this page (and common sense) suggest such bugs matter a great deal. Hmm. Until I started looking at test-first test suites, I'd always been convinced that any test suite with a certain degree of complexity had silent horrors - that I would be able to find them. The problem was that programmers who are accustomed to having testers check their work fall into a nasty trap when they build test scaffolding: no-one checks their work! So I think BugsInTheTests are a big deal, in general. But they may not matter much with TestDrivenDevelopment - assuming that the test harness is also built test-first. -- BretPettichord We have discovered several tests that have been falsely succeeding. * One set of tests had a MockObject overriding some virtuals that weren't calling the superclass virtuals. I discovered this while recycling the MockObject for another set of tests, and wondering why the virtuals under test in the superclass weren't being called. When "upgraded", the original set of tests demonstrated crashes. * Same MockObject's main virtual method was not called by production code. Since we initialized m_bSuccess = true, the test returned true. Instead, we should have initialized m_bSuccess = false and waited for the MockObject to explicitly set m_bSuccess = true in the virtual method. * return memcmp( &expected, &fixture, sizeof(expected) ); instead of return !memcmp(...); (This took me five minutes to figure out on a caffeine-free morning.) * Our precision fudge factor on our fixed point tests was too large. : ''Why do you even have a precision fudge factor? The whole point of fixed-point is that it internally uses integers, so that there is no precision fudging required...'' : First, can you demonstrate ''any'' digital computation that doesn't use integers and only integers? Using a Dogbertism, "I don't think you understand your own question." ;) : Anyway, recall high school physics. When you multiply two 32-bit quantities together, you get 64-bits of significant digits, which you must smush into 32-bits. That means you lose 32 bits of precision. This error compounds through calculations, hence the fudge factor. : Precision in fixed point gets more subtle than that. Our fixed point system is accurate to less than a bit of precision in the resultant, and we've done some acrobatics to make that happen. The tradeoffs in discrete digital arithmetic always come down to precision. 
Naturally, lack of coverage in the tests is also an issue for false positives. Maybe I'm just an idiot, but it's not unlikely that complex tests ''for complex subsystems'' return true instead of false. It's only off by one, after all. The best solution, I've found, is to write multiple tests covering the same production code from different angles. That way, they should either all fail or all succeed. In this way, the tests essentially test each other. -- SunirShah

----

But since UnitTest''''''s need to have neither false negatives nor false positives, you could say that the burden of correctness for UnitTest''''''s is possibly even higher than for regular code. As Ron writes, this isn't so much a problem if you practice TestFirstProgramming, or at least when you first write the tests. But if you ever want to refactor them, you can't check against false positives without crippling the underlying code, which seems really slow and cumbersome. See RefactoringTestCode.

''Well, with VersionControl, you've still got the version of the target that didn't pass the tests the first time around.''

----

This page is a little confusing to me. I think this page assumes that the reason you write UnitTest''''''s is to reduce the ''number'' of bugs in your code. That is probably true for a lot of people. But that is not the primary reason I use UnitTest''''''s. My code still goes to the user with about the same number of bugs in it as before I started using XP. The reason I use UnitTest''''''s is to reduce the ''time'' I spend fixing bugs, which means I can code new features faster - especially time spent fixing bugs that occur when I try to add a new feature to brittle code that I or others wrote many months ago. So my code can go to the user sooner than it did before I started using XP. If I started testing my UnitTest''''''s, I am guessing I would lose this advantage.

''Software Engineering 101 says that out of the threesome of quality, schedule and budget, you can pick at most two to control. Budget is usually in the hands of someone higher up, so you get to pick between quality and schedule. In your case, you seem to prefer to control schedule, while others writing here may be working on quality. Should be no need for confusion on this point.''
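----

Returning to the suggestion above of covering the same production code from different angles, here is a minimal sketch (the routine and both tests are invented for illustration). One test checks hand-computed spot values, the other checks properties that must hold regardless of any expected values, so a slip in one style of check is unlikely to be repeated identically in the other.

 #include <cassert>
 #include <cctype>
 #include <cstring>

 // Hypothetical routine under test: uppercase an ASCII string in place.
 static void UpCase(char *s) {
     for (; *s; ++s)
         *s = static_cast<char>(std::toupper(static_cast<unsigned char>(*s)));
 }

 // Angle 1: spot checks against hand-computed expected values.
 static void TestUpCaseSpotValues() {
     char buf[16];
     std::strcpy(buf, "aBc 1!");
     UpCase(buf);
     assert(std::strcmp(buf, "ABC 1!") == 0);
 }

 // Angle 2: properties that must hold for any input -- idempotence, and
 // leaving non-letters alone.
 static void TestUpCaseProperties() {
     char once[32], twice[32];
     std::strcpy(once, "mixed UP 42!");
     UpCase(once);
     std::strcpy(twice, once);
     UpCase(twice);
     assert(std::strcmp(once, twice) == 0);   // applying it twice changes nothing
     assert(std::strchr(once, '4') != 0);     // digits survive untouched
 }

 int main() {
     TestUpCaseSpotValues();
     TestUpCaseProperties();
     return 0;
 }

If the two angles ever disagree, at least one of them has a bug - which is cheap information compared to a silent horror.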
----

Here's an example for you. This is code from a real test suite. Can you find the ''silent horror'' or inadequate test? Actually, it contains two: one was in the original code, and the second was introduced when the code was first extracted to make an example of it. Which only proves how easily these kinds of bugs can happen.

 int main() {
     AddressBookE''''''ntry * abEntry = 0;
     THANDLE hDataFile;

     hDataFile = sp_OpenExcelFile(AB_DATA_FILE);
     if (Check(hDataFile, NULL)) {
         if (Check(sp_GetABDataFileEntry(hDataFile, "2.9", &abEntry), 0)) {
             switch (abEntry->Test.iSubStep) {
             case 1:
                 sp_StartTest(TextB''''''uffer, &abEntry->Test);
                 test1();
                 sp_EndTest(&abEntry->Test);
                 break;
             case 2:
                 sp_StartTest(TextB''''''uffer, &abEntry->Test);
                 test2();
                 sp_EndTest(&abEntry->Test);
                 break;
             case 3:
                 sp_StartTest(TextB''''''uffer, &abEntry->Test);
                 test3();
                 sp_EndTest(&abEntry->Test);
                 break;
             }
         }
     }
     sp_CloseExcelFile(hDataFile);
     return GetTestS''''''tatus();
 }

''No default in the switch statement ("silent horror"?), and closing an Excel file that may not be open?''

Why do you have 'if's without 'else's in your tests?

----

CategoryBug CategoryTesting