''(moved from LeasedString''''''s to avoid the WikiNamePluralProblem)'' In CeeLanguage, strings are TerminatedStrings -- there's a null character that tells you where the end is. In PascalLanguage and some BasicLanguage implementations, strings begin with a byte or word (integer) that tells you how many characters are in the string. This initial byte/word is called a "leasing" byte/word, hence LeasedStrings. ''In case it's not clear, leased strings are generally considered inferior because they limit the possible length of a string (to 255 or 32K bytes).'' ''In case it's not clear, leased strings are generally considered superior because they allow O(1) determination of length, thereby making things like automatic bound checking practical, and they allow embedded null characters since they do not rely on in-band signals to describe meta-information.'' * On many modern architectures, the length is encoded as a 32-bit word -- few applications call for strings of 4 gigabyte length. A 64-bit word would be longer than the memory available on any computer available today (2004 --perhaps some server or supercomputer out there has 2^64 bytes of memory + backing store, but I doubt it). If you want to be pedantic about it and avoid the FixedQuantityOverflowBug, you could encode a pointer to a BigInt or use an unambiguous, variable-length form for the length field such as UtfSixteen. (UTF-16, and other UTF-xx encodings, while ostensibly for Unicode, can be used to encode any positive integer). For in-memory representation, size_t will always be sufficient. For serialization, I suppose you might want some sort of bignums (but pointer-serialization only adds useless complexity; for machine-independent purposes your pseudopointer will need to be a bignum anyway). Be careful: a common bignum representation is limited to the range [0, 256^{2^32-1}). (Okay, okay, there aren't enough bits in the universe to hold a string that long...) Oh, and last time I checked, officially UTF-xx can't handle over {8:31, 16:20_and_a_bit, 32:32}[xx] bit numbers, but UTF-8 can be extended to arbitrary length by allowing bytes FE and FF (which UTF-8 avoids for easier detection of BOMs). ---- ForthLanguage calls it a "counted string". They are used for symbol storage, such as word names in the dictionary. However, most AnsForth string routines use the more flexible (address,length) tuple. The word COUNT converts from a counted string to an addr-len string. Most stack operators have a two-cell equivalent (e.g. DUP -> 2DUP), which are handy for dealing with these tuples. ---- See StringWithoutLength, NonNullTerminatedString