NULLcounter arguments to WhyNumberingShouldStartAtOne "Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration." -- Stan Kelly-Bootle ---- Here is EwDijkstra's justification (from 20 years ago): Given four possible ways of representing the sequence 2, 3, ..., 12: a) 2 <= i < 13 b) 1 < i <= 12 c) 2 <= i <= 12 d) 1 < i < 13 Dijkstra rejects b) and d) on the following grounds: "There is a smallest natural number. Exclusion of the lower bound... forces for a subsequence starting at the smallest natural number the lower bound as mentioned into the realm of the unnatural numbers. That is ugly." In other words, the sequence 0, 1, 2 would have to be written as -1 < i ... Dijkstra further rejects c) for: "inclusion of the upper bound would then force the latter to be unnatural by the time the sequence has shrunk to the empty one." That is, we would have to write 0 <= i <= -1. Dijkstra further observes that in an environment where all four options were provided to programmers, use of notations other than a) were linked to more bugs. Now if a) is the accepted form, there are two ways to deal with a sequence of length N: e) 1 <= i < N + 1 f) 0 <= i < N of which Dijkstra selects f) as the more elegant, stating: "an element's ordinal ... equals the number of elements preceding it in the sequence." You can read the whole thing for yourself at http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF. ---- Not totally convincing. In particular, the argument relating to the empty sequence seems weak, and the final 'benefit' of a subscript equalling the number of preceding elements appears no more useful than the subscript equalling the number of elements up to and including it. -- VickiKerr ''Yes. Empty sequences are perverse anyway. So the fact that their representation may be "perverse" is only natural. ;->'' ''I reject (a) using an argument similar to his - in that the top element may be forced to be invalid. March has the range of days 1..31. So '1 <= day < 32'? 32 is *not* a valid day of month. '1 <= day <= 31' would be more appropriate.'' This argument above seems compelling. When people in "natural language" speak of a range of selections from modular choices (days of the week, days of the month, months in a year) they '''always''' use form c). We say "from Tuesday through Friday". So ingrained is this that when a person says, "from Tuesday until Friday" it is '''still''' understood that they mean Tuesday to Friday inclusive of both Tuesday and Friday. If Dijkstra's form generates less bugs in programming (is there actual evidence for this?) then that's great for programming. Writing books that start at "Chapter Zero" sacrifices clarity for obtuse dweeb cuteness points (which possibly has greater value for hardcore f) aficionados). ''To answer your question, no there isn't any evidence that Dijkstra's form generates fewer bugs in programming. '' '''However''': The days of a month are (arbitrary) ordinals (first, second, third...). Dijkstra's argument does/can not apply to ordinals. His whole argument is based on counting, not ordering. See OrdinalsAndCardinals ---- (c) would be a relatively good choice if array indexing began at 1. Then the sequence of valid array indices would be 1 <= i <= 15 for 15 element arrays, 1 <= i <= 0 for 0 element arrays. '''But''', people's intuition breaks with sequences like 13 <= i <= 27: guess how many elements that contains? 15, of course... ''If indexing begins at 1, then 0 < i <= 15 (b) is a good choice for the same reasons that (a) is a good choice when indexing begins at 0.'' ---- I vote for option d). I don't see that 0 < i < 0 for the empty set is less clear than Dijkstra's option f) of 0 <= i < 0. -- AndyPierce ''The problem with choice (d) is that the empty set would then be representable by both 0 < i < 0 and 0 < i < 1. This non-unique representation of a unique entity could be argued to be ugly.'' On the other hand, with option (a), the empty set can be represented by '0 <= i < 0' or '1 <= i < 1' or '2 <= i < 2' or even '100 <= i < 3' for that matter. ''Which is still true of (d): 0 < i < 0, 1 < i < 1, 100 < i < 3, or whatever. I think the point was that (d) has an *additional* ambiguity over (a). For any fixed lower bound, there are two valid representations of the empty set.'' ''Question for the voter-on-d's - what about the original problem with sequences starting at 0; I have to agree with Dijkstra that -1 < i < N is inelegant (and problematic if you are working with signed values ;)'' ---- ItDepends applies here. If you number real objects (and avoid abstractions), numbering begins at 1. Zero Objects is an abstraction. If you number in the abstract, the numbering starts with zero and goes both ways (in a signed numbering system) in an unsigned system numbering starts with zero and numbers toward infinity or a limit using integer or non-integer numeric systems. There are other dependencies to numerous to continue including starting anywhere and using whatever incrementing, directional, non-directional, combinational or whatever numbering scheme you desire. (Where do you start with the temperature absolute zero? ItDepends on the scale, F, C or R). [-273<0 giving and , for each works as follows: 1) No repeats, nothing missing 2) No repeats, nothing missing 3) The split element is in both 4) The split element is in neither But which one's best? A top-down merge sort wants to still represent the whole range with no overlap, so options a and b are clearly the most natural. For instance, a.inplace_merge(i, j, k) will merge the sorted subranges [i,j) and [j,k), giving the sorted subrange [i,k). Binary search seems like it would want fully-open ranges, since if it were the correct one you could just return the result immediately. If comparisons are cheap, though, it's faster on average to only check for found at the very end, so half-open or half-closed ranges are again the right choice. Example: after testing p(k), you want to continue searching in [k, j) if the test passed, meaning k is a valid result, or in [i,k) if it didn't, making it clear that k is not a valid option. A partition on an arbitrary boolean predicate has no overlap and considers all elements, so half-open or half-closed ranges are again best. Example: You partition the range of elements at [i,j), resulting in the midpoint k, telling you that your predicate holds for the elements now at positions [i,k) and does not for those at [k,j). The inplace partition inside quicksort, however, would be happiest with fully-open ranges. This would split the elements (i,j) into the subsequences at (i,k) and (k,j), neither of which contains the element that was correctly-positioned at k. Recursing down a Binary Search Tree can also be though of as a split. If no duplicates are allowed, then the left and right children of a node with value x have values in (-inf, x) and (x, inf), so fully-open seems most natural. With duplicates allowed it's less nice, but the one type you really don't want to think in is fully-closed ranges, since if the children are [?, x] on one side and [x, ?] on the other, you have no way of knowing which side to traverse in order to find them. So I'll claim that off-by-one errors are more likely caused by using the wrong ''kind'' of range than by any particular range type being less intuitive than another. That said, I can't think of any algorithm that would be most natural with fully-closed ranges. So, since I've never heard anyone advocate half-closed or fully-open ranges, it seems like half-closed ones are the right choice. -- ScottMcMurray ---- Someone said above that "Since all options can be made to work, the argument can be only [...]". That, however, is only true since we're dealing in integers (or naturals). What if, instead, we look at uncountable reals or even countable rationals? Now it's impossible to specify the same set with different range types (since you can't add a "one over infinity" fudge factor). Consider these 4 different sets: 1) [a,c) 2) (a,c] 3) [a,c] 4) (a,c) What if we want to split them at b, where a < b < c: 1) [a,b) & [b,c) -- ok, that works 2) (a,b] & (b,c] -- that works too 3) [a,b] & [b,c] -- the union is correct, but the intersection isn't empty 4) (a,b) & (b,c) -- oops, that misses the b The one advantage that type 3 has over the others is that it can represent the degenerate case where the range contains exactly one element. This could, however, also be seen as a liability, as it creates a distinction between zero-size empty ranges and zero-size non-empty ranges. Consider the Dirac Delta function to explore some consequences of these different kinds of ranges. It's total integral is 1, but its value is 0 everywhere except the origin. (Of course that means it's not a Real function.) One of the properties of integrals is that the integral from a to c should be equal to the sum of the integrals from a to b and from b to c. If we choose a=-1, b=0, and c=1, then this property only holds if the integral used ranges of type 1 or 2. With the overlap from type 3, the sum gives 2, and the hole from type 4 means the sum gives 0. -- ScottMcMurray ---- Comment on the distinction between OrdinalsAndCardinals moved there. ---- 'Cos GisueppePeano was a badass: http://en.m.wikipedia.org/wiki/Peano_axioms -- PissedOffOfTumbridgeWells CategoryMath