It seems appropriate to begin with a definition. This is from the American Heritage Dictionary (through answers.com):

'''da·ta''' (dā'tə, dăt'ə, dä'tə) pl.n. (used with a sing. or pl. verb)
* 1. Factual information, especially information organized for analysis or used to reason or make decisions.
* 2. Computer Science. Numerical or other information represented in a form suitable for processing by computer.
* 3. Values derived from scientific experiments.
* 4. Plural of datum (sense 1).
[Latin, pl. of datum. See datum.]

And a '''datum''' (from Latin) means "something given".

----

'''Operant Definitions and Representations for Datum'''

A typical computer science text will identify '''"label:value"''' or '''"label:(tupleVal1, tupleVal2, ..., tupleValN)"''' (or some other isomorphic structure) as a conceptual representation for a datum. The latter is heavily utilized within the RelationalModel and is, thus, associated with the most common and popular of databases -- the RelationalDatabase. The former representation is commonly associated with ObjectOriented programming, as the labels correspond to attribute names of the object.

* I suspect it pre-dates the RelationalModel, but, alas, I have not been able to find a definitive reference. AnIntroductionToDatabaseSystems turns out not to be the source of the definition, so I'll have to re-visit some texts and examine their references. It might be found in Elmasri & Navathe's "Fundamentals of Database Systems" or FD Rolland's introductory "The Essence of Databases". If it's the latter, I should be able to easily find the original source by simply asking the author -- FD Rolland's office is next door to mine, and unlike many academics, he can usually be found in his office during office hours. BTW, the tuple is more accurately "label_t: (label1: tupleVal1, label2: tupleVal2, ..., labelN: tupleValN)". -- DV
* Any isomorphic representation is the same.
If you wish to use a mathematical record instead of a mathematical tuple, as you suggest, then that's fine, too.

Underlying this ''conceptual'' representation is a ''physical'' representation for the same data. In the physical representation, the label is often implicit (based, for example, on location in memory), and the value(s) are also represented in efficient formats for computation and retrieval.

While '''label:value''' and its variants are representations for data, they are not, technically, definitions for data. A particular '''label:value''' pair may simply be a value, possessing no extrinsic meaning and reflecting neither fact nor information about a world. As values, these representations are commonly used with records and tagged unions. To qualify as a datum, the presence of a particular '''label:value''' must also be (at least implicitly) interpreted as representing a fact about a world. E.g., when seen within an object, the '''label:value''' represents a fact about that object -- that the attribute referenced by 'label' is described by 'value'. When seen within a relational database, '''label:(value1,value2)''' indicates that a predication concept referenced by 'label' is true when applied over 'value1' and 'value2' (or the entities these values reference) within some implicit world.

* If I understand what you're illustrating, it may be more accurate to demonstrate your relationship between value1 and value2 as label_r:(label1:value1, label2:value2). As for a datum having to represent a fact about a world, I question this. Can one conceive of a "label: value" pair that does not represent a fact about a world? If so, by accepted uses of the term "datum," it would still be a datum even though it's a lie or random. If not, then the extension to the definition is arguably redundant. -- DV
* ''Can one conceive of a "label: value" pair that does not represent a fact about a world?'' Sure. It's very easy.
Simply make up a label, then make up a value, then stick 'em together. Examples: "snoobarble:(7,blab)". "axeltrough:(egz,xlagg)". "label:(value)". I can keep going. There are two conceptual problems with calling all "label:value" pairs data. The first is that, if you call ''all'' "label:value" pairs data, you have no qualifier for what ''isn't'' data. The very concept of a ''database'' would be pointless, since anything you can imagine (that fits 'label:value') is a valid datum... a valid piece of information, whether you are holding it in a database or not. The other problem is that, if the label means nothing, or means nothing when attached to the value, then you don't ''really'' have information/fact/etc. What does "dgzxves:(txshgfe,19193945)" mean to you? What does it mean about the world? No meaning -> Not Data.
* Are you sure you're not conflating the generally accepted (albeit subtle) distinctions (at least within information systems) between the meanings of "data" and "information"? Obviously, "dgzxves:(txshgfe,19193945)" doesn't mean anything. It's not even data, because 'txshgfe' and '19193945' are unlabelled. "dgzxves:(sploog:txshgfe,ghhk:19193945)" at least tells us that 'txshgfe' is a sploog and '19193945' is a ghhk. In examining it closely, I suspect the datum is either encrypted, written in an alien language, or is random gibberish. Of course, because it conveys no information to me -- though maybe it does to someone -- I have no way of telling which of the three is true. Fortunately, it doesn't matter. It's at least of the form 'label:value' so it's data by definition, which means I can work with it. I can construct database schemas to generate databases that will house it, write programs to manipulate it, and so forth. I can generate random data with which to test my programs and schemas, too. Whether the data makes sense -- i.e. delivers information -- to me or anyone else is immaterial.
-- DV
** In information systems, the most basic form of data is the collection of measurements from very real sources (raw facts -- received:(source, measured_value, time)). People typically distinguish as ''information'' that which they possess after processing this raw data in some meaningful manner to produce useful facts. However, one agent's information is another agent's data. It isn't ''"raw"'' anymore... you might call it "cooked" data. But it's data. Data and information are the same sort of thing; they're just at different stages of processing. Properly, if you were to categorize the output of a random label:value generator, the correct approach is to create data of the form: "received:{ value: (