Continuation (part 2) of RelationalTreesAndGraphsDiscussion, because it's getting TooBigToEdit. ----- '''Source of Utilities/Libraries''' ''Who do you mean to call the end-user of an RDBMS? The programmer, or the person using the program? I would say the latter, but I doubt programmers will often be entering trees via a CRUD screen. I suspect in your approach, the programmer (user of the RDBMS) will usually need to provide these utility operations. Perhaps you have a different perspective because you work mostly with CBA, whereas I primarily do systems software and so see a different user as primary.'' I view the tree operations/utils as being downloadable kits or libraries that programmers/DBA's share for the DB. Some would come with the standard "build" and some could be from others' custom libraries or some shared add-on repository. ''So the RDBMS will embody a different set of DML/DDL standards for each kit, possibly with some common standards if the kit programmers work closely together, and the application-side languages will also need a helper API kit for each RDBMS kit. I suppose this could work, though I feel it is bordering on TheMostComplexWhichCanBeMadeToWork. I would highly favor a slightly more advanced DML/DDL that doesn't require dedicated kits to allow extension for UDTs and UDFs; I believe it would better capture EssentialComplexity.'' We could add common "structure types" to the built-in DML/DDL, such as graphs and trees. This is fine by me as long as we can still do our regular relational operators on the nodes. If you can find a way to add custom data types without isolating the nodes from the usual and existing operators and indexes (node X can be simultaneously "in" Stack A and Tree B and Table C, for example), then please show how it's "done right". I see either encapsulation interfering with relational on a fundamental philosophical level, or else you have to make ugly or complex compromises. Prove me wrong. 
-t ''reply at: (page_anchor: don't see encapsulation)'' [What standard reference on the RelationalModel claims that "encapsulation interfer[es] with relational on a fundamental philosophical level"? Would it not be more accurate to state that encapsulation interferes with Top's TableOrientedProgramming approach?] If we add all relational operators to a "stack", is it still a "stack"? -t [Please explain how your question answers my question.] ----- '''Appending Trees and Node Identification''' ''Say I have two tree tables, treeTable1 and treeTable2. treeTable2 was produced by "table-izing" a particular EssExpression via a utility function, and has its own set of node_ids numbered from 1..N for N nodes. treeTable1 is a permanent tree table, carries about ten thousand trees, and has node_ids numbered from 1..~100k. Now I wish to update treeTable1 to contain the tree described in treeTable2. That is what I mean by "adding". The difficulty in doing so involves mucking around with node numbers. '' Tree nodes need *some* way to be uniquely identified in all known digital technology. Traditional RAM-pointer-based trees use RAM addresses to be unique. Thus, if your version of tree A and tree B are in the memory of one computer, then it's just a matter of linking up the two trees to "add" them. Simple, I will agree. However, this technique/assumption is limiting. What if you want to merge trees stored on different machines or one tree comes from disk or from an XML file from another company? There is no guarantee the node addresses will be unique if these cases exist. Thus, you have to solve a similar problem unless you hard-wire your system to be single-machine-only. You present your solution, and then I'll present mine. ''That was dishonest of you. The issue of different formats or distribution is not analogous. 
The relevant comparison regarding distribution and working with different formats would be, say, a union of tables on two different machines, or where one table comes from disk and the other is in a CSV file with non-standard headers.'' ''In the above problem, treeTable1 and treeTable2 can be presumed to be in the same format, on the same machine. One of them was a temporary table created to carry a single tree, and the other was a permanent table carrying many trees. This is a situation that exists because your own suggested strategies for working with complex values often involve the construction of a temporary table, and because the user is responsible for maintaining the tables. The analogous situation for an RDBMS with structured DomainValue''''''s is to insert into any table a tree value described in a query, which is performed with the normal insert statement. There is no 'tree table', no 'node ids' exposed to the user; if the RDBMS must move a value from volatile memory to persistent memory then that is done under the hood with no effort from the user. That is my solution. Now present yours.'' I'll give one approach now: serialize the 2nd tree, and then load it in the same way an original EssExpression-like doodad would be using an "appendTree" operation. (The appendTree and makeTree operation could be the same thing since appending to an empty table is the same as making, but may not be optimized for each task.) ''So you suggest I work around the RDBMS by serializing the tree from treeTable back out then loading it again with 'appendTree'. Nice. One would have thought you would favor a relational solution to these problems, especially with all your raving about the advantages of relational, especially after all your efforts to get these trees into relational form in the first place.'' Let's back up a bit. There seems to be a big miscommunication going on here. 
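To make the "mucking around with node numbers" concrete, here is a minimal sketch of the kind of append a utility library might perform, assuming the adjacency-list layout (node_id, parent_id, value) discussed above. The table and column names are illustrative rather than from any particular kit, and SQLite via Python stands in for a real RDBMS:

```python
import sqlite3

# Hypothetical sketch: append the single tree in treeTable2 into the
# permanent treeTable1 by remapping node ids past treeTable1's maximum.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE treeTable1 (node_id INTEGER, parent_id INTEGER, value TEXT)")
cur.execute("CREATE TABLE treeTable2 (node_id INTEGER, parent_id INTEGER, value TEXT)")

# treeTable1 already holds trees; its ids run from 1 upward.
cur.executemany("INSERT INTO treeTable1 VALUES (?,?,?)",
                [(1, None, "a"), (2, 1, "b")])
# treeTable2 holds one freshly table-ized tree with its own ids 1..N.
cur.executemany("INSERT INTO treeTable2 VALUES (?,?,?)",
                [(1, None, "x"), (2, 1, "y"), (3, 1, "z")])

# Shift every id in the incoming tree past treeTable1's current maximum,
# then append.  (NULL parent_id stays NULL under SQL arithmetic.)
(offset,) = cur.execute("SELECT MAX(node_id) FROM treeTable1").fetchone()
cur.execute("""INSERT INTO treeTable1
               SELECT node_id + ?, parent_id + ?, value FROM treeTable2""",
            (offset, offset))
conn.commit()
```

The `node_id + offset` rewrite of every row in the incoming tree is precisely the node-number bookkeeping under debate; a database-wide unique row-ID would remove the need for it.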
Suppose you have a "big type database" and have your two trees, A and B, in virtual memory, not actual memory, that you are planning to merge soon. (Actual memory is too small.) Suppose you need to shut down the server to fix a noisy fan. Thus, you save your "types database", with the trees, to disk somehow. When the fan is fixed, you boot up the computer and your "types DB" is somehow loaded back into virtual memory, and you now plan to do the merge/link of trees A and B. First question: how are the two trees' nodes kept separate (unique) while stored on disk during the fan repair? (Remember, the merge has not happened yet.) Second, when both trees are loaded back into memory after fan repair, how are the two trees' nodes kept separate (unique)? I suspect your answer will be: "We just have one big virtual address space. God-RAM. The two trees reside in different addresses; thus, there's never overlap." But we *could* emulate this technique in a RDBMS by having a row-ID that is unique across the database. No two rows would ever have the same address even if in *different* tables. Then it would not have the ID overlap issues you describe and our tree libraries could use very similar techniques to your RAM-centric approach. (Whether this has big down-sides or not has to be checked.) -t ''You said: ''"I suspect your answer will be: 'We just have one big virtual address space. God-RAM.'"'' - That is essentially correct, but there is no reason to be so disparaging about the idea. One big virtual address space is how most RDBMS's and FileSystem''''''s are implemented under the hood. There is no reason for one that supports complex DomainValue''''''s to be any different. 
Memory-mapped IO, a log, application-driven paging, and a few bits in each pointer for GarbageCollection and type annotation can even make this very efficient, scalable, and AtomicConsistentIsolatedDurable.'' ''You said: ''"we *could* emulate this technique in a RDBMS by having a row-ID that is unique across the database"'' - global auto-number would be a good idea, and it would resolve many of the problems described above, and would also simplify identifying relationships between tables just by looking at the data. I'd give it a "win" overall. It does imply the user (by which I mean the programmer) must 'reserve' node IDs when constructing new trees into tables for transport, but one can escape even that using RFC 4151 instead of numbers (at some cost to verbosity). Well... I'll give you that battle if you start pushing for global IDs. (But by no means the war. ;)'' Woah! We agree on something. I need a deep-scrub shower. I've proposed this in MultiParadigmDatabase. (One slight down-side is that it may take up slightly more bytes.) -t ------ '''Serializing and Transfer''' Re: "with user-defined types the language knows how to serialize all of the UDTs by their construction" - That sounds odd. Their creation technique/syntax shouldn't generally affect their output layout. Sounds like arbitrary binding of input and output. Please clarify. -t ''UDTs in an RDBMS would necessarily be described as part of the DataManipulation+Data Definition language, similar to how SQL has ways to describe new tables. The physical layout in the RDBMS persistent storage would be entirely up to the RDBMS and its optimizer, but would still be determined based on the UDT definition in the DML. Translation and serialization of these structures in communications between systems would also be determined automatically from the UDT description in the DML, possibly with modifications by the communications protocol (e.g. to allow lazy transmission of large values in a query). 
When I say "the language knows", what I really mean is "all protocols and APIs based on the DML/DDL" can automate the serialization, persistence, etc. of these values without any additional support from the user.'' How does this work when a given "data structure" shares nodes with a variety of different data structures/tables? ''The same as usual. Under the hood, it would be shared pointers into persistent storage, with GarbageCollection after the last pointer to a value is removed. Since values are immutable (modulo lazy evaluation) there is never an issue with sharing them... e.g. the same million-character string may be shared by a hundred thousand different 'facts' and yet have only one physical copy. Any RDBMS that supports "strings" as a type can do that sort of sharing for strings, but an RDBMS that supports UDTs can do it for any UDT. For example, a tree structure could be an element in stacks, part of various other trees, used in many different facts and tables, etc. and yet have only one physical copy. (As a side note, it is a considerable advantage for sharing to use reference-to-child rather than reference-to-parent: ref. to parent requires copies whenever the same subtree is part of two different parent trees.)'' And what keeps a RDBMS from doing the same? ''I '''AM''' talking about an RDBMS, you nut. What's up with that question? It sounds like an insinuation that a system can't be an RDBMS if it supports structured DomainValue''''''s. If you meant to ask: "what keeps my favored approach of destructuring DomainValue''''''s into tables from doing the same?" then the answer is that you've pushed the complexity of node maintenance (entry, layout in tables, use in queries, garbage collection, etc.) to the user of the DataManipulation language as opposed to such things being implied from the definition of a UDT. 
Since this stuff is in the hands of the user, the RDBMS lacks the necessary information and control to implement useful automation.'' related: (page_anchor: shoving it to the user) If a given data set fits our convention of "tree", then serialization can also be automatic. ''Indeed. And the effort to establish such conventions in a manner that is reliable enough for such automation is precisely equivalent to the effort required to standardize DML/DDL to support arbitrary UDTs.'' However, we might as well use the table-centric facilities already in place since our transfer file will be simpler (less formats) if we use tables instead of a different custom layout for each data structure "type". For example, if the customer in a different company does not have the "type definition" by chance, they can still read the data as tables. ''You can reasonably argue that your system has fewer primitives and thus a simpler implementation, but, before you argue that "we might as well use it," you would be wise to ensure said simplification isn't introducing AccidentalComplexity elsewhere. The need to introduce new information into a layout - node_ids in particular - is not insignificant AccidentalComplexity introduced to this file transfer. When comparing file transfers and serialization, this AccidentalComplexity must be weighed against the relative complexity of supporting a few extra UDT primitives (record type, union type, etc.) that would relieve the need for the AccidentalComplexity. In this particular case, I don't believe either of us can justify a claim that one side in this debate has an advantage over the other for serialization and file transfer.'' ''Regarding lack of "type definition", I (very strongly) favor structural typing where the serialization format automatically implies the type definition (see NominativeAndStructuralTyping); at most I need to ship with each value a version ID for the overall data format. 
However, with nominative typing one would ship a URI for the official type definition along with the data. Serialization and file transfer are well understood problems. One can complicate them up a bit with 'cursors' and 'lazy transport' and 'compression' and 'encryption', but I do not believe either of our favored approaches offer advantages or weaknesses on those issues.'' --------------------- RE: Since this stuff is in the hands of the user, the RDBMS lacks the necessary information and control to implement useful automation.'' (page_anchor: shoving it to the user) ''I am not "pushing it to the user". Why do you keep saying that?'' I'm not certain what you think when you see the word "user", but I consider an RDBMS to be SystemsSoftware and so (as with all systems software) "user" refers to the programmers who interface with the RDBMS and write the CRUD screens and such, most emphatically '''not''' to the people who use the CRUD screens except insofar as the categories overlap (users of the CRUD screens have a thick layer of indirection between them and the RDBMS). With my definition of "user" clarified, you are incorrect when you claim you are not pushing these details to the "user". This is fundamental; indeed, very intentionally, you are exposing (as opposed to encapsulating) the details of representation for trees. The users - programmers - are aware of such things as 'parent_id' and 'node_id', of the need to identify a tree by both a table and a node_id. The degree to which you "push" representation details to the user is also the degree to which you take that power away from the RDBMS and its optimizer, garbage collector, etc. Is (as I suspect) the confusion here about different understandings of the word "user" in SystemsSoftware? Or do you honestly believe you are ''not'' "pushing it to the user" even given clarification of my use of the word? 
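As a concrete illustration of what "pushing it to the user" means here: with trees destructured into an adjacency table, removing a subtree (a kind of manual garbage collection) is recursion the programmer must write and maintain, because the DBMS sees only rows, not a single tree value. This is a hypothetical sketch with illustrative names, again using SQLite from Python as a stand-in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tree (node_id INTEGER PRIMARY KEY, parent_id INTEGER, value TEXT)")
cur.executemany("INSERT INTO tree VALUES (?,?,?)",
                [(1, None, "root"), (2, 1, "left"), (3, 1, "right"),
                 (4, 2, "leaf1"), (5, 2, "leaf2")])

def delete_subtree(cur, root_id):
    # Orphan cleanup is the programmer's job: nothing in the schema tells
    # the DBMS these rows form one value, so we recurse over parent_id
    # links by hand and delete bottom-up.
    kids = [row[0] for row in
            cur.execute("SELECT node_id FROM tree WHERE parent_id = ?", (root_id,))]
    for kid in kids:
        delete_subtree(cur, kid)
    cur.execute("DELETE FROM tree WHERE node_id = ?", (root_id,))

delete_subtree(cur, 2)  # removes 'left' and both of its leaves
```

With an encapsulated tree DomainValue, the equivalent operation would be a value-level update, and reclaiming the storage would be the RDBMS's responsibility rather than the programmer's.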
------------------ ''RE: If you can find a way to add custom data types without isolating the nodes from the usual and existing operators and indexes (node X can be simultaneously "in" Stack A and Tree B and Table C, for example), then please show how it's "done right". -t'' Sigh. This is about the tenth time I've answered this question. For structured values, sharing is the default. The same 'string' can be (simultaneously) in Stack A and Tree B and Table C without difficulty. See CopyOnWrite, ValueObjectsShouldBeImmutable, KillMutableState. With immutable value structures, there is no writing so there also is no need to copy. Beyond persistent structure, one is also free to 'intern' value structures such that there is exactly one copy of any given value and so the same values aren't indexed multiple times. THESE ARE SOLVED PROBLEMS AND WELL KNOWN EVEN BY PRACTITIONERS/PROGRAMMERS OUTSIDE OF ACADEMIA. And indexes over values are no more difficult than is lexical indexing for strings (which is common in SQL). One keeps an index of substructure to the full value (e.g. word to string), and from that full value to table regions, thus allowing one to ask for all entries in a table containing strings containing particular words, or all rows containing trees containing 'red' nodes. THESE ARE SOLVED PROBLEMS, THOUGH THOSE WHO KNOW THEM TEND TO READ DBMS / INFORMATION PROCESSING BOOKS. ---------- ''RE: I see encapsulation as interfering with relational on a fundamental philosophical level. You have to make ugly or complex compromises. Prove me wrong. -t'' (page_anchor: don't see encapsulation) PROOF ONE: I suspect you'd agree that "it doesn't matter how the RDBMS represents strings or NULLs under the hood". Do you disagree? Are you going to argue that programmers should have exact control and knowledge of the bit-layout in the RDBMS? 
If you don't disagree, that is a '''killing''' blow (as sufficient and complete proof) against your assertion "I see encapsulation as interfering with relational". * "Incompatible" would perhaps be a better word than "interfere" for that case (but appropriate for others). If strings were in tables, such as a word per record, then I could do some parsing via relational easier than custom loops. As it is now, "string world" and "relational world" are two different worlds that don't share much currently. -t * ''"Distinct" or "opaque" would be fine words. "Incompatible" is still far too aggressive in its connotation since it is clearly the case, for example, that use of strings is not incompatible with use of the relational model. I would happily agree that strings are "distinct" from relations or "opaque" to relational operators (see PROOF FOUR). Will you use these words, or will you persist in use of aggressive words with unjustifiably negative connotations?'' PROOF TWO: RelationalModel is defined as relating DomainValue''''''s held in attributes. A 'value' is different from the 'representation' of a value... e.g. 4, four, 100 base2 can all 'represent' the same DomainValue. Because of this, an RDBMS remains compatible with the RelationalModel even if there is a layer of indirection that encapsulates the representation of DomainValue''''''s. Thus, if you '''don't''' believe that representation can be encapsulated without interfering with relational, you should. * But that's at a pretty low level. That's like saying that everything is 1's and 0's under the hood, and therefore relational integrates well with binary. It's comparing apples to orange seeds. -t * ''That complaint is not relevant to the proof, but I do agree that the first two proofs are about encapsulation of 'representation', which is lower level than proofs three and four. 
Your original statement was not precise enough to distinguish between encapsulation of representation and of value, and so I covered both bases. Encapsulation of representation, that is how values are represented under the hood, does not violate relational - do you agree or do you not?'' PROOF THREE: (Concluding) There is no encapsulation of DomainValue''''''s. (Proof) DomainValue''''''s are values. Values are immutable and are defined (mathematically) in terms of operators that, when applied, return other values in a formal and definite manner. Since values are immutable, you are free to apply the same operators to the original value in any permutation you wish, and thus explore all possible properties of the value that can be reached by use of operators. Since values are defined mathematically in terms of operators returning other values, the set of properties you can reach by use of operators is precisely identical to the definition of the value. If the whole definition of something (call it X) is accessible, it is unreasonable to say that X is encapsulated. The whole definition of any DomainValue is accessible, therefore it is unreasonable to say these values are encapsulated. From PROOF THREE we can also conclude that your complaint about encapsulation is a StrawMan. There never was any encapsulation of DomainValue''''''s to complain about, and the fact that a DomainValue is "structured" makes '''no difference''' on that fact. At most, one encapsulates '''representation''', which is not a problem for reasons described in PROOF ONE and PROOF TWO. * ''From a purely mathematical standpoint, you may be right. However, performance and other practical issues may make it unusable. -t'' * extended discussion moved to (page anchor: peekaboo) What you '''can''' say, TopMind, is that DomainValue''''''s (of all sorts) are "opaque to ''relational'' operators". However, that is very different from encapsulation, and has very different engineering properties. 
And anticipating your next complaint brings me to: PROOF FOUR: (Concluding) "Opaque" DomainValue''''''s do not interfere with relational. (Proof) RelationalModel is defined by its creator (EfCodd) and maintainers in terms of DomainValue''''''s and 'relations' between them. DomainValue''''''s in the RelationalModel are never given greater requirement in the RelationalAlgebra and RelationalCalculus than the ability to compare them for equality. This means that RelationalModel was defined in terms of "opaque" DomainValue''''''s. Thus, the RelationalModel would be inconsistent if opaque DomainValue''''''s interfered with relational. Assuming the RelationalModel is internally consistent, opaque DomainValue''''''s must not interfere with relational. ''I used "interfere" in an informal way, and it perhaps was not the best way to state it (details were given elsewhere). Thus, this proof is not useful to settle anything outstanding. You only "proved" your personal mis-interpretation of my phrase. -t'' Yes, you used 'interfere' because you believe everything should be broken into relational, and while structured DomainValue''''''s don't in any way inhibit that (i.e. an RDBMS supporting structured domain values can still be used the exact way you want to use it), they also don't ''require'' it, and you want it to be '''required''' the same way StaticTyping fans want to '''require''' you to ensure type-safety. Between your opinions on typing and your use of strings, I find your attitude on this subject to be entirely hypocritical. * ''It's that "requiring" relational is something I found far more useful than "requiring" static typing. I see insufficient value in type-heavy approaches (except for certain apps/domains where reliability trumps flexibility and economics).'' * Your attempt to excuse your hypocrisy fails when you're still not "requiring" relational for the types you use (integers, strings, etc.). 
And StaticTyping approaches are also useful for a balance of NonFunctionalRequirements including security and performance, and can be achieved without sacrifice to flexibility (using ImplicitTyping and MetaProgramming), but this isn't the place for another typing debate. That still leaves PROOF THREE as settling something outstanding. It destroys one of the premises you are using for half your arguments. You repeatedly make assertions about how DomainValue''''''s are encapsulated or how 'if' they are encapsulated it causes problems (run a search for 'ADT'), but the fact is that they aren't encapsulated, as demonstrated in PROOF THREE. ----- ''I perhaps should reword my complaint. It was written off-the-cuff. But opaqueness has down-sides, as the string example illustrates on a smaller scale. We'd get more powerful collections and reuse if strings could use relational operators and perhaps vice versa.'' [By which measure of "powerful"?] ''We cannot share operations and values across "collection types" as easily with hard types. I see no evidence that the upsides make up for this (with coded examples, not anecdotes and "trust me" HandWaving).'' ''In short, it's better factoring and reuse if these are '''shared''' across "collection types" (where appropriate):'' * Operators * Data values (to avoid copying if possible) * Indexes ''All else being equal, do you agree this is generally a good thing?'' Well, the "where appropriate" makes it difficult to say no, but also makes any agreement non-substantive. I believe this would make a yes/no answer misleading, so I'll refrain from producing one. More relevantly, I still have no reason to believe your assertions that, say, sharing GenericProgramming operators is 'easier' with relations than with other types. 
There is plenty of reason to believe value sharing of data values is achieved far more thoroughly with 'hard types' since one never encounters the 'parent_id' duplication of trees (two subtrees 100% identical except different parent_ids), and one is even free to 'intern' values such that there is exactly one copy of any given structure in the entire database (e.g. via use of a hashtable). Indexing is the only possible sticking point, and on that one I'll make a few comments: * The proposed strength of your approach for indexing is dubious at best. Reasons: ** First, by default one is unlikely to index anything but the 'key' for each node. *** ''I find this difficult to swallow.'' *** You find it hard to swallow? Then I give you a challenge: '''Name one RDBMS that, by default, indexes on something more than the primary key for each node.''' *** ''MicrosoftAccess: it guesses what needs indexes based on column name. I personally don't like that feature, but it proves you wrong. Second, as I read it, you were talking about the DBA (table designer), not the RDBMS. Good DBA's have a fairly good feel for what needs indexing. There are also some new (experimental?) RDBMS that index every column. They are meant for data-warehouses, so don't need to do many writes. -t'' *** I'll grant that a DBA might have a good feel for what needs to be indexed. But the case remains that, in general, one isn't getting the indexing 'automatically' simply because one has destructured values into tables. Therefore, one cannot argue there is a benefit here above requiring special indexing requests for structured types. *** ''I asked to "share" an index, not necessarily have automatic ones.'' *** So you never intended to imply your approach offers better sharing of indexes automatically? In that case, why did you keep bringing up indexes as though your approach has any advantage? 
*** ''I'm not quite sure what you are referring to, but one does not have to create a brand new indexing system to get indexing for new "structure types". And an index on a column shared by two or more "structure types" can be used by both "structure types".'' *** Are you claiming that one needs "brand new indexing systems for new structure types" in the other approach? Try convincing me of that claim rather than just insinuating it. Also, it sounds like you plan to stuff '''every single structure type from the entire database into one monolithic table''' so you can share indexes on columns... is that right? If so, how do you justify that behavior? *** ''Even for "fixed" file systems (trees) we may want to index on name (outside of tree), extension (file type), creation-date, change-date, archive-date (or flag in some OSs), and possibly size. I'd love to be able to use SQL on the file system with indexes.'' *** FileSystem != Tree, and is also not likely to be a DomainValue (unless one has a MetaModel simulating systems of FileSystem''''''s for some reason). *** ''Existing file systems are not trees? That's news to me.'' *** Unless FileSystem '''implies''' tree, you shouldn't act like it does. There have been filesystems that aren't trees. *** ''I meant a typical/common file system (as of 2009). I apologize for not clarifying this.'' ** Second, all indexes you might build by taking advantage of the relationships are one-dimensional, limited to properties over individual 'points' rather than the connections between them; by analogy, indexing a field for each node in a stack is a bit like indexing each ''character'' in a string rather than indexing whole words. *** ''I'd still like clarification. Indexes can be put on many-to-many tables if needed.'' *** Destructuring a stack into a table with one 'stack entry' per row is '''analogous''' to destructuring a string into a table with one 'character' per row. 
Just as indexing on individual characters (so you can, for example, find all rows with the character 'a') is of very limited use, indexing on individual entries in the stack is of very limited use. The issue about many-to-many tables is not relevant here. Does that help clarify things for you? *** ''Nothing except habit and hardware stops us from indexing words and paragraphs. I even built a working demo of a text indexing system for our company WAN in the mid 90's because most COTS products either cost too much or were not tuned for slow WANs. (They went with the COTS instead and it flopped so bad that my boss got fired for it.) It stored and indexed on words.'' *** Not even OnceAndOnlyOnce for representing values and other data will stop you, I suppose? The problem with the line of argument you just made is that, when it comes to explicitly specifying indexes to values, your approach offers no apparent advantage over doing the same thing with structured values. In the absence of such advantage, there is little value in bringing it up as though it were a point in favor of your approach. *** ''OnceAndOnlyOnce violations are only for efficiency. It's a known reason to violate it.'' *** True, but one doesn't need to violate OnceAndOnlyOnce to get the same indexes using structured values. *** ''And you have not demonstrated with code samples or RAM-trace examples how your approach uses indexes and joins as efficiently as a "regular" table would.'' *** And you have not studied what has already been done on that subject, such as lexical indexing of strings in MySql, so you aren't in a position to pretend competence to broach that subject. *** ''Your words don't count because they are obtuse. Hit the code or hit the road, Jack. You ignored my stack example but demand I focus on your lab toy examples. -t'' *** My words are only as "obtuse" as you are "incompetent to understand them". 
I'll provide some examples when you aren't still stumbling over the '''trivial''' information processing stuff such as interning of values and function memoization. *** ''Because I've been spending my career solving real problems for real customers using practical tools, not mumbling in a corner diddling with bits and registers.'' *** ["Diddling with bits and registers" can be just as much solving problems for real customers, using practical tools, as defining database schemata and wiring up CRUD screens. It simply depends on who the customer is and what he or she wants. I've made every attempt to understand the needs of customers ''in general'', regardless of the nature of their requirements, and so can appreciate both the technical and the business side of their requirements and these debates. You seem to insist on only seeing one side. Do you really think that is an enlightened point of view?] *** ''Your writing style does not appear to reflect sufficient experience with non-academics.'' *** [This is a technical forum, almost entirely inhabited by participants having the knowledge to discuss technical topics at an academic level, or at least the ability to gain the requisite knowledge to take part at a reasonable level. "Dumbing down" the discourse to suit one irascible participant is unreasonable, especially as you either (a) have the intellectual capacity to gain the relevant understanding and join in as an equal (and therefore must be refusing to do so out of stubbornness or argumentativeness), or (b) you are incapable of understanding. In either case, there's clearly no point in altering my writing style. By the way, you haven't answered my question.] *** This is a technical forum, not necessarily an academic forum. I've seen good technical writing, and yours is not it. 
*** [As this is a technical forum and not a classroom, it is reasonable to expect that the participants are (at least) practitioners possessing a roughly equal level of knowledge, or the desire and skills to obtain it, who will make an effort to look past less-than-perfect writing skills. It is unreasonable to expect participants to be technical writers, who will presumably author textbook-quality chapters to accompany every comment, for those few participants lacking in foundation knowledge or independent-learning skills and having no apparent desire or ability to obtain either on their own.] *** Aren't you being a bit too specific in picking and choosing what skills a WikiZen should or shouldn't have? Add the ability to produce basic business examples to illustrate one's claim to the list, if we can all play that game. -t *** [Huh? No, I'm simply pointing out the prerequisites for ''any'' reasonable technical discussion, on ''any'' technical subject (including, but not limited to, business software), on ''any'' technical forum. These requirements certainly aren't unique to WardsWiki. ''In general'', if you haven't the prerequisite knowledge to engage in discourse on a roughly equal level with your peers, it's unreasonable to expect your correspondents to teach you the subject. Asking them to clarify points, answer questions, or suggest resources for further learning is certainly reasonable, but asking them to work up from first principles -- simply because you're unwilling or unable to learn them on your own -- is not.] *** Your "first principles" are often your pet techniques and pet authors. I too could label something as "first principles", such as the ability to present decent examples beyond lab toy examples. There are plenty of reasonably-well-known domain scenarios you can choose from for semi-realistic illustration, including college grade tracking/reporting (CampusExample) and some aspect of airlines.
You appear to focus on "intellectually interesting" problems over problems that better reflect real-world issues. People with too much education sometimes fall into this habit; perhaps some kind of "exam reflex". *** [It's notable that you're so unaware of first principles that you're not even aware that you're not aware of them. First principles, in most discussions on WardsWiki, are no more and no less than ComputerScience itself. With that basic knowledge, it is possible to effectively discuss pet techniques and pet authors, and evaluate them rigorously in the context of the appropriate mathematical, logical, and conceptual foundations. The problem with toy examples like the CampusExample is that they represent a single scenario that is not necessarily reflective of the all-important ''concepts'' that are being discussed. Concepts are most effectively discussed via models, not examples, except where the examples clearly highlight issues raised by the models. Now, obviously there are real-world examples that appear to violate or contradict their presumed models, but I don't see you raising examples to identify these. Rather, I see you asking for examples because you hope that they will somehow deliver the foundation knowledge that you lack. They won't.] ** Third, you really aren't achieving much 'data value sharing'. The need for explicit garbage collection, data entry, and even the organization you favor (use of 'parent_id' in the child rather than 'child_id' in the parent) respectively make sharing difficult to achieve and (for 'parent_id') forbid it outright. Between these, the same 'structure' will need to be duplicated approximately once per use rather than once for all uses. This, in turn, undermines much of the efficiency of indexing on the structure or properties of it. *** ''What is an example of duplication? And parent-to-child linking can be considered poor normalization.'' *** Sigh.
This is something even a non-academic table user such as yourself should find obvious. It is not my job to educate you. Solve this problem yourself: using your 'parent_id' schema described for RelationalTreeEqualityExample, build a tree of strings of at least seven points ('''any''' tree) in a database such that the left and right sides are exactly identical in structure (same value) without duplicating the values (strings and structure) on the two sides. Then try the same thing using a 'child_id' schema. Put aside your concerns for normalization for the moment, and focus on 'data value sharing'. *** ''This is not clear.'' *** '''THEN DO THE EXERCISE''' *** ''The statement of the exercise is not clear. Same value without duplicating the values?'' *** The exercise is sharing as many nodes/values as possible in a seven node binary tree where the left side equals the right side. The problem must have been too deceptively simple for you to grasp, yes? I'll give you a hint: it can be done with just three tuples in the database using the child_id scheme. *** ''Sharing what with what? I'm tired of your demeaning and insulting talk. Your example can crawl up your.'' *** Sharing data values and structure within a binary tree. It seems you couldn't even ''try'' this two-minute exercise without whining and complaining like a spoiled brat. Exercise result at (page anchor: node sharing example). * What we really need are indexes and fast access to properties that precisely match the requirements of our queries. (That much should be trivial.) This can't be done perfectly, but something close can be achieved even when using structured DomainValue''''''s. Related: ** First, indexes over DomainValue''''''s often need to be performed over patterns, often of more than one dimension, such as searching for words in a string. This can be done; more precisely, it already is often done, with 'words in a string' being an example that is commonly implemented.
To implement this in the general case without requiring a sufficiently smart RDBMS optimizer (or AdaptiveCollection), I would suggest we offer the ability to hand a function to the RDBMS - e.g. a function over strings that returns a relation of the words in the string - and tell it to index such that we can find the string given the words. This index could be explicitly referenced if necessary, making sharing quite easy to achieve. ** Second, '''memoization''' of function application to the whole-value can make for both an optimization (if queries often need the memoized result) and an index (if queries often select/compare/join on a memoized result). Memoization is readily achieved by the same mechanism noted above - handing the RDBMS a function in advance of its use in queries. Your (TopMind's) approach doesn't allow implicit memoization; for example, if you needed the ability to rapidly find all stacks of a particular size, you'd need to ''explicitly'' store that value into the database on a more permanent basis, whereas the implicit approach is free to delay or forget memoized information (since it can always be reproduced later). ** Third, indexes over DomainValue''''''s often need to support regional clustering based on orderings or distance relationships involving the ''other'' DomainValue''''''s in the RDBMS. Indexing such that one can rapidly find all values within a particular region of floating point values is distinct from indexing over sub-structural properties like 'words' in a string. It isn't supported at all in your approach. This (in the general case) requires whole-value comparisons rather than point by point comparisons. So, at the very least, structured DomainValue''''''s offer a more efficient starting point for producing such indexes. ** Fourth, indexes over values are shared just as readily as the values are shared.
The interning available for structured DomainValue''''''s means that indexes are shared ''more'' readily with different tables that might carry these values. ** ''Huh? Monica Lewinsky?'' ** I've explained 'interning/internalizing' of values to you before, more than once. If you haven't paid attention, then look it up rather than making trite comments. (Try here: http://en.wikipedia.org/wiki/String_interning) ** ''I forgot. You talk too much and too obscure and round-a-bout, so some of your side jabber gets ignored.'' ** What I say only appears obscure because you get involved in discussions that are far outside your expertise. ** ''You just want to force-bend the discussion there because you have no real ammunition for the real issues.'' ** Oh, I've been firing live, real ammunition. It's just alternating between going over your head and being deflected by your ferrous cranus. The only time I'm in your league is when I lower my aim a bit and go for the balls. Indexing - and automatically optimizing queries to take advantage of indexes - still represents a grand technical challenge. It isn't something that becomes easier with structured DomainValue''''''s. I will agree with this much, TopMind: your approach of destructuring DomainValue''''''s like trees and graphs is better at indexing structured values other than strings '''today, when we don't have any RDBMSs that support structured values much less indexing of them (other than strings)'''. But '''strings''' themselves serve as a counterpoint to your argument, '''and you have provided NO good reason to believe your approach is inherently better for indexing or sharing indexes'''. Given these points and those above, I am forced to conclude your assertions about the alleged superiority of indexing in your approach have been premature, naive, and probably involved more WishfulThinking than technical analysis. ''Well, I give more weight to empirical testing than you do. 
You are falling into the "Greek Fallacy" of thinking that you can solve such intricate problems merely by thinking about them on paper hard enough. -t'' If you honestly think you've been giving weight to empirical testing, then show me the empirical testing of one system against the other that demonstrates the relative sharing and indexing properties of each approach that you have been claiming. If you cannot provide such empirical testing, then it is '''ridiculous''' for you to believe you are giving 'weight' to empirical testing in making those conclusions. In the absence of numbers, I at least rely on objectively justifiable statements. You don't even do that much. It seems you are arrogant enough to believe you can provide advice like "nested values shouldn't be in relational cells" without empirical observation AND without analyzing it based on known technical properties. Perhaps you should start a religion for IT practitioners... you have a book already, don't you? I often see you make a comment containing "in my book". * [If Top would write his "book" on TableOrientedProgramming rather than disseminate inaccuracies about the RelationalModel, I doubt there'd be anywhere near as much frustrating debate. Indeed, if he'd expend even half as much energy on creating a practical implementation of his TableOrientedProgramming approach as he does arguing here, he'd have something to empirically demonstrate the merit of his views.] * Where did I say something objectively and clearly wrong about relational? Why is your '''evidence always so far from your accusations'''? That is very very poor form. Don't splat accusations about so easily and randomly. Your advice about energy applies to type-heads also. -t * [You have repeatedly stated that "relational is against nesting" or some such, which is patently untrue. By the way, why such vitriol? You seem to insist on wanting to "contribute" to the RelationalModel.
Why not focus on TableOrientedProgramming as a distinct, and different, approach in which nesting et al. is explicitly forbidden and types are defined in terms of tables?] * ''You still have not provided clear evidence about what relational is or isn't that shows me flat wrong. Your insults and their defense are HandWaving. Wavity wavity wavity! '' * Evidence is irrelevant. RelationalModel has a precise, formal definition. Playing HumptyDumpty makes you look like an imbecile. * ''A non-answer, as usual.'' * That was a sufficient answer. Asking for evidence about "what is relational or isn't" when it has a formal definition was simply, utterly, horribly illogical. Therefore, pointing out that there is a formal definition is a sufficient answer. It would take an irrational nutcase to not recognize it as one, but... there you are. * ''That definition [of relational] does [not] explicitly define the domain type system (and shouldn't)'' ** Correct. And, '''BECAUSE''' RelationalModel doesn't define the domain type system beyond requiring an equality property, '''ALL''' domain type systems supporting equality tests are valid for use with RelationalModel, '''including the nested ones'''. That's how formal definitions work, and that is why you were objectively and clearly wrong whenever and wherever you argued that "relational is against nesting". It's also why you were illogical when demanding "evidence" about what is and isn't relational when you've got the formal definition telling you '''exactly''' what is and isn't relational. ** ''But you are not USING relational when you create structure types. You are bypassing it. You are skipping it within those types. Perhaps "against" is not quite the right word I'm seeking to explain this (it's not precise enough to be "objectively wrong", though, because "againstativity" cannot be measured without more specifications). But the problem still remains.
(By the way, it seems that only a Boolean result is needed to interface with relational, not even equality. However, such may hinder performance. But that's not important here.) -t'' ** Relational models "data" as relationships between DomainValue''''''s, and describes DomainValue''''''s as going into the tuple attributes in each relation. Based on this, I say YOU are not using 'relational' - not as it was designed, not as it models data - when you destructure DomainValue''''''s into relations. Also, saying "the problem remains" is dishonest because you have yet to demonstrate there ever was a "problem". BTW, equality is necessary because we're working with '''sets''', which preclude duplicates. That is, it isn't merely an issue of joins. ** [To Top: Creating "structure" types, or ''any'' types ("structure" is immaterial), is acceptable within the definition of the RelationalModel. To do so is not a "violation" of the RelationalModel. The RelationalModel describes relationships between attribute values that belong to domains, i.e., that are of specified types. These types may be anything within the capability of a given (unspecified) TypeSystem, as long as the TypeSystem allows values to be tested for equality. '''The RelationalModel has never required that "structure" types be defined in terms of relations or anything else.''' It simply states that attribute values belong to a domain aka type, and that the model describes '''relationships between those values''' -- not their type -- in terms of tuples and relations of tuples.] * ''and it does not exclude extensions from the base set of operations that DrCodd originally included'' ** Well, '''in general''' it does exclude extensions. What it doesn't exclude are '''specializations''', which are extensions that don't contradict anything already in the definition.
Extending the definition of RelationalModel to support NULLs or duplication of tuples is a violation of RelationalModel, since the demand for value equality comparisons and the definition's use of "set" excludes these. It might help to understand this general rule by observing a slightly more extreme example: extending the definition of the RelationalModel with the phrase ''"nevermind what was just said, when I say 'RelationalModel', I'm really talking about house-cats"''. Also, you need to be careful when talking about 'excluding' specializations, since just because you have a specialized definition doesn't mean ''that'' becomes the new definition of the RelationalModel. (Note that "RelationalModel with Concurrency" does qualify as a specialization, since the default is the more general "RelationalModel with ''or without'' Concurrency".) * ''If I said something that violates the "official" definition, then point it out and show specifically where in the definition text it produces a conflict.'' * Now you're just being pissy and looking for an excuse for more HumptyDumpty HostileStudent word-lawyering. ''The revolution really started in the Renaissance, when people were willing to get their hands dirty to test their thoughts. And there is also the learning curve: IT people know RDBMSs. They will be easier to absorb if we add some extensions to them rather than almost start from scratch. I'll take evolution over revolution until the "type base" has been road-tested. I hope your experiments go well, but we need something in the meantime. Relational: this generation; type-base: next generation. -t'' * [Note that a number of current-generation SQL DBMSes already support UDTs -- SqlServer, PostgreSql, and Oracle, for example. They're obviously lacking the level of TypeSystem sophistication found in some programming languages (e.g., Haskell), but such support is almost inevitable.
Much of the current research work on the RelationalModel is focused on RelationalModel-friendly type systems.] * Yes, but they are generally not "structure types", but merely "cell" types. It is structure types (stacks, trees, etc.) that are at issue here. * [Note that I wrote, "They're obviously lacking the level of TypeSystem sophistication found in some programming languages (e.g., Haskell), but such support is almost inevitable." That includes trees, graphs, etc.] * Oracle's trees and graphs generally build on the table concept, not hard-encapsulated ADT's. * [Oracle provides a CONNECT BY clause in SELECT to present hierarchical relationships in dynamic data, such as organisational charts, family trees, bills-of-materials, etc. This is very different from UDTs, which allow the user to define new column types. Remember that a column type or domain represents an abstract specification of some ''possible'' or ''allowable'' collection of immutable values, of which zero or more instances will be recognised (from user input, source code, etc.) and assigned to attributes aka columns. Types in SQL are traditionally integers, strings, dates, timestamps, etc., but some applications may require column values to be temperatures, weights, complex numbers, geographical locations, graphs, trees, and so on. The fact that trees may both represent the static relationships between the individual elements of a complex value of 'tree' type ''and'' represent the structural relationships in a dynamic collection of values in a 'tree' hierarchy is effectively coincidental. The concepts of "type", "DomainValue", and "relationships between DomainValue''''''s" should not be confused or conflated. A type is defined in terms of operations (e.g., equality, conversion to values of other types, extraction of interesting details about values) on immutable values of that type, rather than their internal representation.
A value's internal representation is irrelevant outside of the implementation of its type, but a dynamic tree of DomainValue''''''s is important ''because'' of its representation.] * How about we make a distinction between '''"soft types" and "hard types"''' (working defs only). Soft types would be like our table that is used as a stack. Even though we may write Push and Pop operations for it, it is still "exposed" to the rest of the RDBMS like any other table. We are breaking encapsulation. A "hard type" only allows changes to go through pre-designated operators (GateKeeper). I have nothing against soft types. -t * [Sorry, I'm not following you at all. Are you sure you posted this in the right place?] * Oracle's "tree" does not appear to satisfy no-handle's vision of a db type system. * [That's my point. Oracle's tree operator is intended to present hierarchical relationships between DomainValue''''''s stored in a table. It is not intended to represent types. The UDT mechanisms provide that facility, but, as I noted above, they are not particularly sophisticated. The overall point of this section was to indicate that DBMSes are providing support for user-defined types, but much as with popular programming languages, type systems are not very powerful except in languages like Haskell. However, though this is pure speculation, I've little doubt that sophisticated type systems will become common in both programming languages and DBMSes.] * Then I don't know why Oracle was brought up. The issue is not "simple" user-defined types, but structure types (not the same as "structured types"), like trees, graphs, stacks, etc. * [Oracle is an example of a SQL DBMS that provides user-defined type support. Not all DBMSes provide this, but it's increasingly becoming a point of consideration when choosing a DBMS. Read the above to see why I mentioned user-defined type support in general.] * Sorry, I don't see any magic key there. 
UDT for non-structures (AKA "cell only") was never ever an issue here that I know of. I'll repeat it again: '''I've not suggested being rid of RDBMS!''' You are pissing me off with your attitude. Seriously, '''your persistent misrepresentation of 'RDBMS' as though it is mutually exclusive of structured DomainValue''''''s belongs in ObjectiveEvidenceAgainstTopDiscussion. You should be ashamed of yourself.''' If you ''meant'': "IT people know '''SQL'''", perhaps I'd have agreed with your statement. But I doubt you meant that. I think you're once again preaching that ridiculous idea that structured DomainValue''''''s somehow violate something in the definition of the RelationalModel. Relational with inflexible SimplySimplistic types that force lots of workarounds: this generation; '''Relational''' with UDTs capable of supporting arbitrary domains: next generation. ''Not "ridding", but demoting. You are misrepresenting me. -t'' You represented yourself quite well when you used the words 'Relational' vs. 'type-base' to describe the two different approaches, and I have been quite clear in the role of Relational even with structured DomainValue''''''s. Saying I've been "demoting" relational is entirely dishonest (or at least incorrect, if you don't know the definition of "demote"). If you wish to call our approaches the "deep type" vs. "shallow type", or "structure type" RDBMS vs. "destructure type" RDBMS, do so. ------ ''You are not integrating [relational and types] well. They are kind of like neighbors who talk to each other every blue moon. Nor are you factoring CollectionOrientedVerbs (and features) well across types. -t'' Whereas you look at my approach and see "neighbors who talk to each other every blue moon", I see "neighbors that each get their jobs done without need to interact or interfere with one another except at predictable places and times". 
Correspondingly, I'd probably describe your approach as "kind of like one person with two homes, two jobs, and a schizophrenic identity crisis". To me, "integrated well" means PrimitivesAndMeansOfComposition and SymmetryOfLanguage where each 'primitive' does one thing and does it well, primitives don't overlap in purpose or capability, and primitives may be composed and interleaved efficiently and safely without "gotchas" or arbitrary obstructions. I consider defining relational operators for types other than relations to be a BadThing, since it means those other types overlap in purpose and capability with relations and therefore introduce LanguageIdiomClutter and make the optimization task more difficult. I suspect that what you consider "integrated well" would be what I'd call an amorphous, unstructured, inconsistent BigBallOfMud where primitives overlap in purpose and capability (such as arbitrary non-relation DomainValue''''''s responding to relational operators) and perform poorly (it is unclear to which purpose a primitive is applied, and it cannot be optimized), where reuse is made a chore by arbitrary restrictions and gotchas about what can be composed where, and where simultaneously achieving safety and efficiency is a new and unique challenge for each project. Why would I want ''that'' idea of "integrated well"? ''Because your magic dream Earth-verifying type compiler database does not exist yet.'' Let me repeat the question: Why would I ''want'' that idea of "integrated well"? ---- As far as "user" goes, can I introduce these classifications of participant, which are more specific?: * DBMS Builder - Builds the base of the database systems software. Example: Oracle corporation. * DBA-Level Developer (or just "DBA") - Adds shop-specific custom functions, extra libraries, or utilities to an organization's copy of the RDBMS. * App Developer - One who uses the query system to link the database to a custom application or tool.
* End-User - A person using applications software that happens to use the database. As far as knowing node ID's, typical tools that communicate with RDBMS have to use a representation of a unique "node" identifier. Otherwise, how is an app developer going to hook say a tree GUI widget to the DB? (Think of Windows Explorer's interface) It is an EssentialComplexity of interfacing with DBs. What is your GUI widget-friendly alternative other than IDs? Paths? Possible, but not always friendly, and there's not always a way to make a unique path, depending on a given tree's rules. We can with files, because of the way file systems are defined, but this is not a guaranteed property of trees in general. ''URIs and tags (RFC 4151) are quite friendly. They are verbose, I'll agree, seeing as they represent something global to the whole universe... but their being global (not tied to any particular DBMS) makes them that much more friendly, plus suitable for use in distributed systems, database integration, and transport between databases and applications without synchronization or extra message passing. I would like to take a position and assert that node IDs, especially '''auto-numbers''', should be excised from RDBMS utilities, and replaced with use of URIs, RFC 4151, and equivalent facilities.'' ''ObjectIdentity (including node_id) is something that should be avoided where unnecessary, since it is ultimately an artificial construct imposed by human institutions rather than nature or measurement, and is unnecessary to RelationalModel. But, where ObjectIdentity is useful, it should be done right - done just as globally as the data in the database. Auto numbering is a hack... one that is error-prone and causes much AccidentalComplexity. RFC4151 is designed for this purpose.'' ''Also, use of paths is unnecessary complexity; tags can generally be opaque to the DBMS. 
At least when acting in their role as unique identifiers (as opposed to addressing and protocol), properties between objects (paths, containment, etc.) should generally be expressed as relations rather than buried inside the string.'' * Tags? Please clarify. ** ''Clarification readily available by perusing RFC 4151. http://www.rfc-editor.org/rfc/rfc4151.txt'' ** Kind of a "fancy time-stamp" with, say, department and author name. I can see its use in written documents, but for mass data it needs more testing. Perhaps it's time for a new topic: UniqueIdentificationTechniques. -t * As far as criticisms of paths as identifiers and the use of "domain identifiers", see AutoKeysVersusDomainKeys and LimitsOfHierarchies. My experience with file paths and URL's is that they are not friendly to re-orgs. You cannot shuffle things around to clean up classifications as things change and grow without breaking references. Sets are a more flexible and more powerful categorization/classification technique, but don't offer help with uniqueness identification issues. Plus, paths can be space-hogs. -t * ''Seems we agree on our distaste for paths. I would note, however, that URIs in general need not be 'path' based past the domain name. WikiWiki serves as an example of a flat namespace. There is no reason sets of values (e.g. sets of short strings) couldn't be used to uniquely identify objects; this has some advantage since one can 'classify' objects as part of their name. E.g. this page might have been {relational, trees, graphs, discussion, two}, and would be found by any of those words, and would automatically be "closely" related to all pages you can get by adding or removing or changing just one word. So don't give up on sets for identifiers, TopMind! ;-)'' ''Anyhow, I don't object to further classifying users, but I don't think it will break my habit of referring to developers as 'users' of SystemsSoftware.''
----------
(page anchor: node sharing example)

A seven node tree:

              "root"
             /      \
        "node"      "node"
        /    \      /    \
   "leaf" "leaf"  "leaf" "leaf"

In both cases the '0' id is a null.

 child_id schema           parent_id schema
 key left right value      key parent seq value
 ---------------------     ---------------------
  1    0     0   "leaf"     1    0     0  "root"
  2    1     1   "node"     2    1     0  "node"
  3    2     2   "root"     3    1     1  "node"
                            4    2     0  "leaf"
                            5    2     1  "leaf"
                            6    3     0  "leaf"
                            7    3     1  "leaf"

Now you tell me which has better "data value sharing" and "node sharing".

''Nice example. If we are only dealing with binary trees, the first approach seems simpler in general. However, if we want to scale the branch count up, it may be problematic.''
-----------------------
''RE: the first approach seems simpler in general. However, if we want to scale the branch count up, it may be problematic.''

Scaling may also be performed in the child_id approach without difficulty. The underlying type requires a simple upgrade from

 type BTOS of tnode(left:BTOS right:BTOS value:String) | null

to

 type MTOS of tnode(children:{List MTOS} value:String)

//objection about change moved to (page_anchor: waitaminute) to reduce clutter

Scaling up the same tree:

 "root"
   | x3
 "node"
   | x4
 "leaf"

Comparing the schemas (using the ordered approach):

 child_id schema:                        parent_id schema:
 -----------------                       ------------------
 TABLE node_table                        TABLE node_table
   key                                     key
   children (key to children_table)        seq (order)
   value (string)                          parent (key to node_table)
 TABLE children_table                      value (string)
   key
   value (key to node_table)
   next (key to children_table)

Resulting datasets. The '0' id is used for null, but otherwise all IDs are unique in each schema to avoid confusion.

 child_id schema                          parent_id schema
 --------------------------------------   ----------------------
 node_table           children_table      node_table
 key children value    key value next     key seq parent value
  1     0    "leaf"     4    1    0        1   0    0    "root"
  2     7    "node"     5    1    4        2   0    1    "node"
  3    10    "root"     6    1    5        3   1    1    "node"
                        7    1    6        4   2    1    "node"
                        8    2    0        5   0    2    "leaf"
                        9    2    8        6   1    2    "leaf"
                       10    2    9        7   2    2    "leaf"
                                           8   3    2    "leaf"
                                           9   0    3    "leaf"
                                          10   1    3    "leaf"
                                          11   2    3    "leaf"
                                          12   3    3    "leaf"
                                          13   0    4    "leaf"
                                          14   1    4    "leaf"
                                          15   2    4    "leaf"
                                          16   3    4    "leaf"

The scores (lower is better):

                    SCHEMA
               child_id   parent_id
 node count:      10         16
 strings:          3         16
 cells:           30         64

* Why didn't you compare using the original schema also? I didn't need any changes to go from binary tree to multi-branch. I'd be more inclined to use the original schema as the default. If we need extra side tables for speed, then so be it. Not all tree type definitions are going to have the same performance/resource characteristics either, I'd note. -t
* ''To answer your question: For the obvious reason. The original child_id schema is for binary trees, not for M-ary trees. To reply to your smug "I didn't need to change anything", see (page anchor: waitaminute). Regarding your comments about "extra side-tables for speed", stop the HandWaving and show me the code and tables, preferably at a page anchor so it doesn't clutter this area.''
* First tell me why it is HandWaving. If I didn't supply enough info, then politely request more details. Learn some f8cking people skills.
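The score tallies above can be checked mechanically. Below is a minimal Python sketch, with plain tuples standing in for table rows; the row data mirrors the example tables, and ''score'' is an ad-hoc helper for counting, not part of either schema:

```python
# child_id schema: one interned row per distinct value, plus a linked
# children list.  Rows copied from the example tables above.
node_table = [            # (key, children, value)
    (1, 0,  "leaf"),
    (2, 7,  "node"),
    (3, 10, "root"),
]
children_table = [        # (key, value, next)
    (4, 1, 0), (5, 1, 4), (6, 1, 5), (7, 1, 6),
    (8, 2, 0), (9, 2, 8), (10, 2, 9),
]

# parent_id schema: one row per tree position, so the "node" and "leaf"
# strings repeat once per occurrence.
parent_table = [(1, 0, 0, "root")]                     # (key, seq, parent, value)
parent_table += [(k, s, 1, "node") for s, k in enumerate(range(2, 5))]
leaf_key = 5
for parent in (2, 3, 4):                               # 3 "node" rows...
    for seq in range(4):                               # ...4 "leaf" rows each
        parent_table.append((leaf_key, seq, parent, "leaf"))
        leaf_key += 1

def score(*tables):
    """Tally (rows, string cells, total cells) across the given tables."""
    rows = [r for t in tables for r in t]
    strings = [c for r in rows for c in r if isinstance(c, str)]
    return len(rows), len(strings), sum(len(r) for r in rows)

print(score(node_table, children_table))  # child_id  -> (10, 3, 30)
print(score(parent_table))                # parent_id -> (16, 16, 64)
```

The child_id figures stay small because each distinct string and subtree shape is stored once and referenced by key, whereas the parent_id layout repeats "leaf" once per position.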
* ''If you don't like the fact that everyone feels justified in claiming and believing that TopMind spends most of his time HandWaving, then try to anticipate objections and respond to them '''before''' people ask. Or, in your own words, "Learn some f8cking argument skills." My impression is that you have done nothing to verify that "extra side tables" '''can''' provide speed, much less do so without violating the various other "micro-rigor" metrics you care about.''

The sort of sharing seen in the child_id schema is algorithmically achieved by "interning" of values. I.e. use a table that indexes values for some reason (any reason), index the new values (from the bottom up) as they are added, and reuse old values wherever they already exist. This allows one to '''share the indexing''' that already exists with the old value, and (if done rigorously) also makes equality comparisons very cheap. E.g. since the child_id schema is ''fully interned'', I can compare any two values for equality simply by comparing their reference IDs.

It's worth noting that the degree of sharing, above, would increase '''exponentially''' in favor of the child_id schema for each 'layer' of depth added to the tree. Adding another layer, with 2 'flowers' per leaf, would require adding 3 nodes and 1 string in the child_id approach, and would require adding 32 nodes in the parent_id approach. The parent_id approach is, therefore, '''exponentially''' worse for sharing than the child_id approach.

This is even true in ''common'' cases, not just the general case. Rather than "adding at the bottom", consider the far more common problem of "wrapping at the top". Suppose you want to create a new tree of this form:

 . . . ."new_root"
 . . . ./ . . . \
 ."root" . . . "root" . (original "root" tree)
 . |x3 . . . . .|x3
 . ... . . . . ....
In the child_id approach one would do something like (in pseudo-SQL):

 TRANSACTION T
 . freshKeys NewRootID L1 L2;
 . INSERT INTO node_table(key children value) [(NewRootID L1 "new_root")];
 . INSERT INTO children_table(key value next) [(L1 3 L2) (L2 3 0)];
 END T

That is, one adds 3 nodes and 1 string, and voila! Done! This child_id schema seems pretty efficient and 'lean'.

For the 'parent_id' schema, however, one needs to duplicate the entire "root" tree, and do so '''twice''' if the original "root" tree is to be preserved because it is still being used. One then adds one new node and tweaks the parent of the top two nodes to perform the update:

 TRANSACTION T
 . freshKeys NewRootID;
 . rootCpyA = helperDuplicatesTree(tree_table_in:node_table tree_table_out:node_table key:1);
 . rootCpyB = helperDuplicatesTree(tree_table_in:node_table tree_table_out:node_table key:1);
 . INSERT INTO node_table(key seq parent value) [(NewRootID 0 0 "new_root")];
 . UPDATE node_table SET parent=NewRootID, seq=0 WHERE key=rootCpyA;
 . UPDATE node_table SET parent=NewRootID, seq=1 WHERE key=rootCpyB;
 END T

Thus, '''to add this "new_root" tree to the node_table in the parent_id schema requires adding 33 nodes and 33 strings''' (which I'm not showing for obvious reasons). Even if done with a destructive update, it requires adding 17 nodes and 17 strings and makes the original tree unusable. In addition, the greater verbosity increases code-bloat and the need for helper functions, and generally further reduces performance.

Scores after adding the "new_root" non-destructively (lower is better):

 . . . . . . . . . SCHEMA
 . . . . . . . . . child_id . parent_id
 node count: . . . 13 . . . . 49
 strings: . . . . . 4 . . . . 49
 cells: . . . . . .39 . . . .196

Despite the fact that it's often rare to have ''that'' much sharing within a single DomainValue, the issue of one value wrapping or carrying or sharing some structure with another is very common in domain maths for both intermediate representations and final results.
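The score arithmetic above can be double-checked mechanically. Below is a minimal Python sketch (not the pseudo-SQL dialect used on this page; the dict layouts mirror the two schemas, while the key numbering and helper names such as wrap_child_id and copy_subtree are invented for illustration) that performs the non-destructive "wrap at the top" in both schemas:

```python
import itertools

fresh = itertools.count(100).__next__   # fresh surrogate keys for the sketch

# child_id schema for root -> 3x"node" -> 4x"leaf", with full sharing:
node_table = {1: (0, "leaf"), 2: (7, "node"), 3: (10, "root")}  # key -> (children, value)
children_table = {4: (1, 0), 5: (1, 4), 6: (1, 5), 7: (1, 6),   # key -> (value, next)
                  8: (2, 0), 9: (2, 8), 10: (2, 9)}

def wrap_child_id(root):
    """Non-destructive wrap: a new root lists the old tree twice; nothing is copied."""
    nr, l1, l2 = fresh(), fresh(), fresh()
    children_table[l2] = (root, 0)
    children_table[l1] = (root, l2)
    node_table[nr] = (l1, "new_root")
    return nr

# parent_id schema for the same tree: one row per node occurrence.
parent_table = {1: (0, 0, "root")}                              # key -> (seq, parent, value)
for n in range(3):
    nk = fresh()
    parent_table[nk] = (n, 1, "node")
    for lf in range(4):
        parent_table[fresh()] = (lf, nk, "leaf")

def copy_subtree(key, new_parent, new_seq, snap=None):
    """Duplicate an entire subtree under a new parent; no sharing is possible."""
    if snap is None:
        snap = dict(parent_table)   # snapshot so freshly copied rows aren't re-copied
    nk = fresh()
    parent_table[nk] = (new_seq, new_parent, snap[key][2])
    for k, (s, p, v) in snap.items():
        if p == key:
            copy_subtree(k, nk, s, snap)
    return nk

def wrap_parent_id(root):
    """Non-destructive wrap: the old tree must be duplicated twice."""
    nr = fresh()
    parent_table[nr] = (0, 0, "new_root")
    copy_subtree(root, nr, 0)
    copy_subtree(root, nr, 1)
    return nr
```

Running both wraps reproduces the score rows: the child_id wrap touches only three new rows and one new string (13 nodes, 4 strings, 39 cells), while the parent_id wrap must insert 33 new rows because no subtree can be shared (49 nodes, 49 strings, 196 cells).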
The parent_id schema will almost invariably involve an abundance of value copies, which is wasteful of resources and does a remarkably '''poor''' job at sharing or allowing for efficient indexing or comparisons. If you've wondered why I've been goggling and calling "ridiculous" your claims of sharing of data values and indexing, this is why. (What I've wondered is how ''anybody'' in the information processing business who has held a position as a DBA doesn't already know all this and find it obvious... but perhaps this knowledge is more specialized than I had assumed.)

* It's called "normalization". If you have lots of large value repetition, you factor it. For example, instead of repeating an employee's name over and over, you use an employee_ID (or name_ID in a smaller setting). This also makes it easy to join with the rest of the Employee table and related info. In my domain, sharing such with other "structures" is a common need. For example, a company org chart is a tree (well, mostly). We don't need to store employee info *in* the tree if we can get it from the Employee table and related tables. Plus, it "updates" automatically if an employee gets married and has a name change. You can call this "interning", but I'll call it "normalization", thank you. -t
* [Normalisation is not about preventing "large value repetition", it's about avoiding update anomalies due to redundant representation of '''facts'''. Large values may acceptably be "duplicated" (though it isn't duplication, really) to represent foreign key relationships. However, the "duplication" may only be a user perception -- the internal machinery of the DBMS will eliminate any physical duplication as appropriate.]
* Normalization has multiple purposes. But if the DB can factor internally, then the alleged advantages described above wouldn't exist anyhow. Your cake is eating you and having you at the same time.
* [Many modern DBMSes can and do "factor" internally.
Each DomainValue (except for certain canonical primitives such as integers), for example, is typically represented OnceAndOnlyOnce. However, this is unrelated to normalization, which is strictly about avoiding update anomalies due to redundant representation of facts. Normalisation-like activities for other purposes -- such as improving performance or simplifying queries to use a single attribute rather than a composite attribute -- are something else.]
* [''What you (Top) did isn't normalization in any case. It's adding a surrogate key. Even worse, you have not removed one bit of duplication for your efforts. Where you had duplicate names, you now have exactly the same number of duplicate IDs.'']
* What is the goal here? I've shown several ways to optimize various factors: disk size, parent access speed, child access speed, node count, etc. We have choices. Nothing optimizes for everything. Same applies to your stuff too.
* [''You said you had "normalized" by adding a surrogate ID. That isn't normalization, regardless of goal. You also said that doing so reduces duplication. It didn't, regardless of goal.'']
* Regardless of what it's called, each reference technique has tradeoff advantages, and we can do the same thing regardless of name. (I may debate the definition of "normalized" later, but it's not necessary here.) -t
* [''Well, whatever you call it, you still haven't used it to reduce duplication.'']
* What specific duplication are you referring to? Whatever you can do with pointers, I can do with record keys. (The keys could even be RAM addresses, but that's another story.)
** ''RE: Whatever you can do with pointers, I can do with record keys.'' - true, but I can do a lot more with '''encapsulated''' pointers than you can do with '''un-encapsulated''' record keys. See (page anchor: favoring encapsulation of representation)
* [''"Where you had duplicate names, you now have exactly the same number of duplicate IDs."
I would have thought that is explicit enough, but I should never underestimate your inability to understand even the simplest of things. It's not surprising that you can do with surrogate keys what can be done with pointers. Pointers are surrogate keys that have been optimized for looking up the keyed values.'']
* Sigh. I meant DIFFERENT from your solution (more dups). That should have been obvious, but apparently you have the inability to understand even the simplest of things.
* [''Let's see, you claim to have removed duplication by "normalizing". I point out that what you did neither normalized nor removed duplication. You ask a question about the duplication and insert a RedHerring about pointers. I answer your question. Nope, no sign of a proposed solution on my part. I don't see how it can be obvious that you meant "DIFFERENT from your solution" when I haven't even proposed one. (I'm not even convinced that there is a problem. Your track record here at identifying them is rather poor.)'']
* I decided to back up and ask what your goals were. What "count" or metric are you trying to reduce or compare on? I never got a real answer.
* [''My goals? It's your goal. You wanted to reduce duplication. The metric is clearly the number of duplications. Since the number of duplications was the same both before and after your transformation, you clearly didn't reduce duplication.'']
* There is inter-value duplication and intra-value duplication. Different tree-stringing approaches favor different ratios. Not a key issue here anyhow, and thus not worth digging into further.

So... '''How does the above relate to DomainValue''''''s?''' you ask (rhetorically). Structured DomainValue''''''s and UDTs are typically implemented based on the child_id schema, where the notion of 'child_id' is replaced with 'pointer', and access to any underlying table is encapsulated.
Recall, '''encapsulation of representation''' for DomainValue''''''s is by no means 'against' the RelationalModel (see PROOF ONE, PROOF TWO above). By keeping it under-the-hood, the RDBMS would be able to perform the algorithmic "interning" of values automatically (rather than making each application or 'kit' developer reinvent it), thus achieving maximal levels of sharing of node values and sharing of indexes. Additionally, this '''encapsulation of representation''' can (a) prevent people from accidentally mutating a shared value, (b) support automatic GarbageCollection, (c) allow the serialization of structured values to be standardized and optimized separately from their persistence layout (so each RDBMS can go its own way when deciding how to represent these structures), (d) allow one to operate on DomainValue''''''s as whole units (e.g. for an equality test), which helps composability and GenericProgramming and eliminates the need for representation management helpers, and (e) allow lazy evaluation where functions are only executed insofar as they are needed. '''None of these points are small, insignificant, trivial, or irrelevant. Don't dismiss them simply because I'm not focusing on them here.'''

(page anchor: favoring encapsulation of representation)
* ''See PageAnchor "list3"''

People obtaining a ComputerScience education often have an option to take courses dedicated to information processing and data management techniques, and there is no shortage of them (there's a reason that databases are a multi-billion dollar per year industry), so I won't pretend to know all the indexing methods available today, but I can point out that indexing of 'strings' under MySql follows the 'structured DomainValue' approach. That is, programmers don't need to care that, under-the-hood, their variable-sized strings are represented in dedicated tables and have a bunch of dedicated indexes.
All they need to do is issue their "likeness" queries or whatever, and let the query optimizer take care of leveraging the indexes. Tables exist, under-the-hood, to support representation, indexing, interning, memoization, and so on. Due to encapsulation and full awareness of the purpose of these tables, the RDBMS can maintain them automatically and optimally with only moderate degrees of intelligence on the part of the RDBMS. One could, with some tweaking to an existing RDBMS, probably achieve similar automatic maintenance of 'views' for indexing in the destructured value approach, but the improved sharing of values means the RDBMS can make more optimal shared use of its indexes.

Indexing of the sort favored by MySql, without explicit access to tables representing values, is what I favor for DomainValue''''''s. Of course, in the absence of a "sufficiently smart" RDBMS, I'd need to tell the RDBMS what to index. A potential (and simple) way of telling the RDBMS what to index would be to hand it a function that, for any given value, returns a set of values that should be indexed to that value, from that value, or both. If I don't have a "sufficiently smart" query optimizer, I could give that function a name and simply use said name as part of any query that requires the index. For example: '''hasWord = (function body that returns set of words for input string)''', then '''full index UDT on hasWord''', and later in a query '''... where hasWord(A,W) and hasWord(B,W)'''. I mentioned this approach above, but I suspect at least one member of my audience of two lacked the context to understand it at the time.

And, in anticipation of a complaint: '''but, but, but... What about mutation?''' Well, the "goal" here would be to share structure across a mutation. There are a few points here: First is that we're dealing with DomainValue''''''s and we want there to be as much sharing '''before''' the mutation as possible.
Mutating a shared string directly in an exposed table would essentially mutate the string for all users sharing that value (i.e. changing all references to "red" to be "blue" '''in the entire database'''), which is probably not what you want. So, unless the approach already sucks woefully at sharing, you really don't want to update values that way without producing at least a partial copy.

Second is that most implementations of domain maths (both ObjectOriented and FunctionalProgramming) do a magnificent job of sharing structure across operations that allow for it. E.g. whether creating a new binary tree from two smaller ones, or grabbing the left side of a binary tree value. The problem is a bit more difficult for linear structures (sets and relations, arrays and tables, lists, strings, binary large objects), but even those can be shared efficiently if they are represented as ropes under the hood (a 'tree' representation of a linear structure; see 'Rope (computer science)' on Wikipedia). Even fragments of hashtables can be shared using bucket hashing (which is the basis for efficient distributed hashtables). Due to this sharing of structure, there is a lot of node, data value, and index sharing when working with values that are produced from other values. Interning of values (described above) will take care of the rest, allowing sharing to be maximal. The parent_id schema, by comparison, requires copying both smaller trees to produce a large one.

* ''No it does not if the row identifiers use the "godRam"-like technique we already went over. -t''
* That doesn't help; the need for copying exists even when every key is unique in the database. Each node designates its parent and role, so that node needs to be duplicated for each different parent and role, and the need for duplication then recurses. I.e. there is no sharing possible. Please review (page anchor: preservation).
* ''I haven't checked carefully, but it appears that each link direction favors different kinds of tree surgeries. One is not automatically "better". Plus, how nodes are linked is independent of whether we use relational or traditional pointers. The implementation of the relational node could even be identical to the traditional approach if we accept certain implementation tradeoffs. Having different link approaches just makes the implementation more difficult under either scenario because the "driver" has to support more link techniques.''
* Sigh. It is incorrect to say node linkage is independent, since relational supports the parent_id approach whereas pointers do not. And, since you have not implemented an RDBMS or any information-processing SystemsSoftware, I suggest you stop embarrassing yourself by making assertions about what is more or less difficult to implement. Most RDBMS, for example, would not have a "driver" that can distinguish between child_id and parent_id schemas... it just does the linking based on available indexes. And creating a DomainValue equal to the left side of an existing binary tree DomainValue requires either destructively mutating the left side's parent_id to 0 (thus affecting all other uses of the tree identified by said parent_id), or copying the whole left-hand tree. There is ''almost no'' structure sharing unless one performs it via destructive update. And if the goal ''is'' a destructive update that affects the entire database, even then the child_id approach is not at a loss (one simply destructively updates one of the child_ids), so there isn't even an advantage to parent_id on that issue.
* ''Not true, as covered elsewhere.''
* Please clarify what is "not true", then give me a page anchor.

One does lose the ability to explicitly destructively update DomainValue''''''s (on purpose OR on accident) if their representation is encapsulated into trees and UDTs, but even that is no great loss.
With the structured DomainValue approach one may always add layers of indirection to achieve the same effect (e.g. adding a language translation table, allowing one to logically change all references from "purple" to "violet"). With the alternative approach, there is no equivalent option to strip indirection away. The expressive power and performance advantage here still favors structured DomainValue''''''s.

Anyhow, I'll conclude this by saying: The use of structured DomainValue''''''s has been theoretically favored for '''performance, flexibility, composability and modularity, scalability, sharing, convenience (implicit support) for entry and garbage management, indexing, safety, security for data access and secrecy preservation, ability to add versioning and bitemporal databases''', and EVERY metric I have EVER tested them against except one. That ONE metric they fail on is simplicity of implementation. Admittedly, I'm only concerned about my own little corner of "micro-rigor". TopMind's approach might be better for obfuscation, job security, enhancing the industry for HDD manufacturers, improving demand for powerful CPUs, and a number of other 'macro-rigor' metrics that I simply don't bother looking at.

--------------

''RE: The underlying type requires a simple upgrade ...''

(page_anchor: waitaminute)

Hold on, Tex. First, you had to change the underlying definition. I didn't have to change squat. -t

''The fact that you "didn't have to change squat" to move from binary trees to multi-node trees really only means that people were able to violate the DomainValue type, which was "binary tree". To call this an advantage on your side is, at the very least, extremely dubious.''
* It's called "flexibility". And as already stated, a tree COULD hard-wire in just two child links (left_ID, right_ID) if we wanted to sacrifice future flexibility. And we could add constraints to the original form (parent_ID) if you want branch count enforcement. Both approaches can enforce a branch count of 2.
You are just being a purist for the sake of purity. -t
* ''I'm a purist for the sake of performance, security, composability, simplification of concurrency, expressive power, '''flexibility''', and more. Regardless, any form of "enforced" flexibility where constraint is desired is not a good thing, and flexibility for M-ary trees is readily supported as described above.''
* Purists often trick themselves into believing that, be it religion or technology.
* ''Perhaps, and yet RidiculousSimplicityGivesRidiculousResources seems to hold true empirically. Purity gives ridiculous simplicity.''

Second, you didn't address what to do with any existing data (an existing binary tree that now wants more branches). -t

''I would suggest converting it. This should do, and shouldn't be difficult to implement any number of ways:''

 insert into node_table(key children value) [(0 0 "")]; // preserve left-zeroes
 For each (Key L R Val) in original_tree_table(key left right value):
 . freshKeys NL NR;
 . insert into children_table(key value next) [(NL L NR) (NR R 0)];
 . insert into node_table(key children value) [(Key NL Val)]
 End Loop

You still had to do a dump/reload step.

''So? If you wanted to extend a binary tree with a new branch, you'd need to copy the tree too. See the note on "but, but, but... What about mutation?" above. The parent_id approach is NOT in a better position. We're dealing with DomainValue''''''s, not data. If data requires a new DomainValue, we create a new one.''
* ''RE: need to copy:'' Not true.
* ''Yes, it's true. With the parent_id schema, you need to make copies. If you don't believe it, then try it on paper.''
* My paper has no copies.
* ''Yes, I'm sure it has zero copies just as all blank sheets of paper do. More seriously: First, you need to share the value being modified (if you don't, you've begged the question simply by assuming values are copied ''before'' the exercise).
Second, you need to change the value only as seen by one of the users (if you don't, you've done some equivalent of updating 'red' to 'read' in the entire database, which is not a valid goal if we're dealing with DomainValue''''''s). Those are the assumptions you need to make if the exercise is to be at all meaningful.''

Also, if we know that our domain will be dealing with binary trees and only binary trees for a long time, then a binary-tree kit for tables could be built/found for such. It's one of those optimizing-for-current-requirements versus optimizing-for-future-flexibility decisions. -t

''Indeed, your approach allows any number of add-ons that overlap in purpose and reimplement bunches of different stuff.''

Third, that's not a minor definition change. You went from two scalars to a nested structure. -t

''No I didn't. '''type BTOS of tnode(left:BTOS right:BTOS value:String)|null''' is a type with two nested structures (a BTOS on the left and a BTOS on the right), and '''type MTOS of tnode(children:{List MTOS} value:String)''' is a type with one nested structure (a list of MTOS). I'll chalk the idea that those were scalars up to a misreading of the type definition language, which I'll readily forgive because it's sort of a hybrid type definition language between OCaml and Haskell and Oz.''

My mistake, they are not "scalars". But it's still a significant change, structure-wise. -t

''Is that supposed to be a complaint? By definition, it is pointless to make insignificant changes. More importantly, it's a simple and obvious change that any competent and interested person would discover without difficulty. "Oh, I want to scale this binary tree up to an M-ary tree! Hmmm... Oh, duh! I'll just turn this binary structure into an M-ary structure!"
Simple but significant is the best sort of change to make.''

----------------------

Unfortunately, we have no production-level data warehouse implementations of RDBMS's supporting structured data against which to compare empirical data between the use of structured DomainValue''''''s and others. That leaves us with no empirical evidence supporting either side, outside of the experiments on paper.

''That's not entirely true. We have Oracle's tree and graph extensions to tables as an example of hard-wired trees and graphs "tacked on" to relational systems. It's not user-defined types, though. But it's half way to the show. -t''

Oracle's tree and graph extensions would support your half, yes. But it still doesn't allow comparisons between the approaches.

''Yes it does:''
* ''Existing similar proof of concept for relational-centric approach: 1''
* ''Existing similar proof of concept for structure-type approach: 0''
''That's a comparison for at least one metric. -t''

Whenever I think you couldn't possibly be more daft and less reasonable, you surprise me. ''That'' is not a metric supporting comparisons between the approaches.

''You asked for "empirical evidence supporting either side", did you not? Implementing a (partial) solution at least proves that it is implementable (to at least some degree) and used in practice. Where's the problem? -t''

I asked for "empirical data '''between the use of'''". I don't believe implementability is at issue, here. If it's only implementability, we already have approximate implementations of Relational with structure values in Haskell and Prolog, we already have implementations for serializing structure values with YamlAintMarkupLanguage and pickling of values in a wide variety of languages, etc.

''But their use as a database has not been tested for large data-sets, heavy usage patterns, and multi-app sharing of data.
One can use JavaScript even as a small personal "database", but that doesn't tell us much.''

[Support for large data-sets, heavy usage patterns, and multi-app sharing of data are characteristics of DBMSes, mainly deriving from the architecture of their underlying storage mechanisms (large data-sets & heavy usage patterns) and/or the ubiquity of SQL and standardised access mechanisms like ODBC or JDBC (multi-app sharing of data). These characteristics have little to do with use (or not) of the RelationalModel. It is quite reasonable to create non-relational DBMSes that exhibit all of these characteristics, or create implementations of the RelationalModel that exhibit none of them. Haskell and Prolog are notable because they demonstrate production languages that support structure values. Large data-sets et al. are irrelevant. Using JavaScript as a small personal "database", for example, can be conceptually illustrative independent of its ability (or not) to handle large volumes of data or transactions.]
* ''You claim scaling doesn't matter with what could perhaps be called HandWaving, but it's hard to tell until it's actually done.''
* Scaling matters, of course, but it doesn't matter to a test of ''implementability'', which is what you were discussing above. We can't scientifically compare scalability and other properties of a DBMS that are 'due to structure values' unless we have many DBMS implementations controlled for equal quality.
* PageAnchor: Scaling01
* ''It's a mix. If we have infinite resources, then we may be able to have our feature cake and eat it too. For example, strings implemented as tables of characters and/or words, but that still act like our usual scalar strings, would be possible. Under the hood each string would be represented by a "string ID", which references a set of tables for holding strings. If we then want to access a string as table(s), we use a function like "sysStringID('columnName')" to enter the table-world of strings.
-t''
* I think I'd rather use memoized functions and prepared indices where you call a user-defined function that translates a string to a relation,
** ''This is not what I suggested.''
** Use of the word "rather" denotes an alternative, so there was no pretense of this being what you suggested. Are you objecting to the fact that I'm not agreeing with you? Or did you just get confused with English?
** ''Rudeness objected. Please tell me what "that translates a string to a relation" is in reference to.''
** The function, of course. "a user-defined function '''that''' translates a string to a relation". It could be any such function. E.g. one could have a user-defined function that translates a string into (offset,character) pairs in a relation, or a user-defined function that translates a string to a relation of (words). That much should have been clear enough, though, since I specified '''user-defined''' when describing the function. How is it you are confused?
** ''How does it relate to this context? I didn't depend on or mention the usage of such, so who did and where?''
** Did you not read the remainder of the paragraph before injecting your "that is not what I suggested" comment? You suggest giving users of the DataManipulation language access to "strings as tables" under-the-hood. I describe an alternative that is just as capable of offering 'table' conversions of strings, just as prepared in advance and efficient for indexing, doesn't involve any surrogate IDs like 'string ID', and doesn't violate security. All of this was described in the last 4/5ths of the paragraph that you split in two just to object that I wasn't agreeing with you. I'm starting to get a bit irritated with your apparent inability to read what is clearly written.
* then tell the RDBMS to prepare for that function in advance and even produce inverse indexes from it. That would give more freedom to the optimizer above your 'sysStringID' approach, and wouldn't require peeking under the hood...
one would simply use full-sized strings all the time. It also wouldn't violate security (the security problem with your approach is that you either give everyone access to every string in the entire database under-the-hood, or they don't get to enter the "table world of strings" at all, and it's going to involve awkward workarounds to do anything in between). I described this approach above. If the goal is performance and translation, the best way to achieve both is memoization + indexing. It's a common technique in FunctionalProgramming, and in algorithms design in general.

Implementability is covered - one can even use JavaScript, as you say. But it would be a pretty damn poor scientific control of variables to compare (for performance properties, usage characteristics, and such) a JavaScript implementation of a structure-value RDBMS to an SQL system written by a multi-billion dollar company that ships it as its primary product.

-----

PageAnchor: list3

'''(a) prevent people from accidentally mutating a shared value'''

''We can use the DB security system to keep out whatever needs to be kept out.''

Please justify. How will the DB security system carefully prevent mutations while still allowing the necessary mutations for the 'kits' developers use?

''Encapsulation is a tree-shaped (nested) security system. A DB security system offers set-based security, which is more powerful than tree-shaped.''

More powerful by which measure? And what, precisely, is "set-based security"? Can you point me to a set-based SecurityModel and a justification that it is more powerful in general than encapsulation? Encapsulation is pretty powerful; it's the basis of ObjectCapabilityModel. To be honest, it is my impression that what you said here is BrochureTalk and has no technical merit whatsoever, but I'm willing to hear your answers.
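For what it's worth, point (a) can be illustrated in miniature. The sketch below (in Python, with all table and variable names invented for the example) contrasts mutating a shared representation in an exposed table, which leaks the change to every referrer, with rebinding an encapsulated immutable value:

```python
# Exposed shared representation: two independent references share string key 10.
str_table = {10: "red"}        # hypothetical shared string table
orders = {"widget": 10}        # one "user" of the value
alerts = {"warning": 10}       # another "user" of the same value

# An in-place edit intended only for the alert leaks to every sharer:
str_table[alerts["warning"]] = "blue"
assert str_table[orders["widget"]] == "blue"   # the order's color changed too

# With encapsulated immutable values, "changing" one produces a new value
# and rebinds a single reference; other sharers are untouched:
orders2 = {"widget": "red"}
alerts2 = {"warning": "red"}
alerts2["warning"] = "blue"    # rebinding, not mutation of a shared cell
assert orders2["widget"] == "red"
```

The first half is exactly the "changing all references to 'red' to be 'blue' in the entire database" hazard described earlier; the second half is what encapsulation of representation rules out by construction.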
'''(b) support automatic GarbageCollection'''

''You keep talking about this, but never show how it's relevant to a practical problem (let alone define it in a DB context), at least in my domain. If it makes your domain easier at the expense of my domain, then fooey on it.''

I'll dumb it down as far as I am able, but as I do so I seriously question your professional competence.
* ''I question your business domain experience. Your examples are often unrealistic.''
* Whatever questionable degree of competence you possess in business domain experience is not particularly relevant to claims of node and index sharing, RDBMS implementation, and other areas in which you've been offering statements and advice.
* ''I'm looking for practical tools, not academic MentalMasturbation. If you cannot show stuff helping practical and typical settings, then practitioners will rightfully ignore you. There are shelves and shelves full of dusty academic toys. Attach it to the real world if you don't want to be dust in the wind.''

You have described something you do, that you stubbornly call "normalization" despite objections from peers and academics, whereby you mean factoring a database to avoid 'large' duplicate values. For example, you'll perform something like the following transform:

 . . . O R I G I N A L . . . . . . . | . . " N O R M A L I Z E D "
 . . . . . . . . . . . . . . . . . . |
 a_table . . . . . . . . . . . . . . | a_table . . . . str_table
 attr1 attr2 . . . . . . . . . . . . | attr1 attr2 . . key string
 . 1 . "a very long string 1" . . . .| . 1 . . 10 . . .10 . "a very long string 1"
 . 2 . "a really long string 2" . . .| . 2 . . 11 . . .11 . "a really long string 2"
 . 3 . "a very long string 1" . . . .| . 3 . . 10
 . 4 . "a really long string 2" . . .| . 4 . . 11
 . 5 . "a very long string 1" . . . .| . 5 . . 10
 . 6 . "a really long string 2" . . .| . 6 . . 11
 . 7 . "a very long string 1" . . . .| . 7 . . 10
 . 8 . "a really long string 2" . . .| . 8 . . 11
 . 9 . "a very long string 1" . . . .| . 9 . . 10

By doing so, you have reduced the apparent duplication of the strings, although many RDBMSs would still have the same count of strings in the physical database both before and after the transformation. A potential "benefit" is that you could now have the option of changing ALL references to a given string just slightly (e.g. you could ''UPDATE str_table SET string="a short string" WHERE key=10;'').
* ''You are missing the practical side of things. Often "long strings" are titles that are subject to change. '''They rarely exist in isolation''' outside of toy examples. We want to be able to change the title without changing the whole reference. For example, the title of a web page, news article, power-point presentation, etc. We want to keep the same reference, but only change the title. Thus, in practice it would be an "article_ID" stored in the "tree". You are beating up a strawman here. '''Why are you spending 99% of your time "solving" the 1% occurrences?''' -t''
* I'm providing a simple example to isolate a point. And you, sir, are the sort of arrogant imbecile who makes that especially difficult. Anything too simple, and you refuse to understand it because it isn't "realistic" enough. Anything more complicated, and you can't understand it because, frankly, you aren't "competent" enough, often because you refused to understand the simple version. And you ''never'' accept any of the blame for yourself when you fail to understand something. It's rather difficult to explain anything to you. A more 'realistic' example, a table of messages described by trees, is below.
Regarding your 'titles' discussion about "changing only the title", I agree that one would typically have some sort of ObjectIdentity for 'article_ID' (albeit almost never in a "tree", since we're discussing DomainValue trees, not hierarchical-relationship trees), but that is ultimately not helping you understand the problem. Entity identity for an article is not the same sort of thing as you described above with your so-called "normalizing" to isolate large values. I.e. if 4000 articles have the same title, or the same set of 'keywords', perhaps your 'heuristic' would have you isolate the title or the keywords set. In that case, you would be in the situation described above.

* ''The example below is not tied to any real-world scenario. And if 20% or more of all books share titles with 4000+ other books, then monkeys will fly out of my arse and I'll have to see the proctologist. Until then... Hey, maybe try the Chinese phone book companies. However, the probability of repetition in practice seems inverse to title size. The pointer to "Wong" may be longer than the name itself. (Remember the joke?: "You have more chins than the Chinese phonebook")''
* Article titles aren't the best example, which is why I also mentioned sets of keywords (which are pretty commonly shared among other entries). Anyhow, I'd have thought a proctologist would be your main doctor....
* ''Your shrink and proctologist are the same doctor. Generally, common keywords in a large system imply the need for categories. Categories reduce the need for synonyms. Supplemental keywords then support the rarer classifications. A tool would check for common keywords, and these would generally be turned into a category.''

On the negative side, your maintenance requirements have increased. First, if you wish to maintain the sharing, you need to write a bunch of helper functions to query the 'str_table' and decide between using an existing key and creating a new one.
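That "existing key or new key" helper is essentially string interning. A minimal sketch of what such a helper might look like, using SQLite as a stand-in for whatever RDBMS is at hand and the hypothetical 'str_table' schema from the example above:

```python
import sqlite3

def intern_string(conn, s):
    """Return the key for string s in str_table, inserting it if absent.
    (Hypothetical schema from the example above: str_table(key, string).)"""
    row = conn.execute("SELECT key FROM str_table WHERE string = ?", (s,)).fetchone()
    if row is not None:
        return row[0]                      # reuse the existing shared value
    cur = conn.execute("INSERT INTO str_table (string) VALUES (?)", (s,))
    return cur.lastrowid                   # new key for a new shared value

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE str_table (key INTEGER PRIMARY KEY, string TEXT UNIQUE)")
k1 = intern_string(conn, "a very long string 1")
k2 = intern_string(conn, "a really long string 2")
k3 = intern_string(conn, "a very long string 1")   # same key as k1
```

Note that even this tiny helper has a concurrency wrinkle (the SELECT/INSERT pair can race with another transaction interning the same string), which is part of the maintenance burden being described.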
Second, you need to perform GarbageCollection: Consider the case of deleting nodes 1, 3, 5, 7, 9 from a_table because the data they represent is no longer valid, possibly from five different transactions. Since all references to str_table where key=10 have been deleted, the value "a very long string 1" is now wasted space in the RDBMS. It isn't as though 'str_table' represents 'data', or that (key:10 string:"a very long string 1") is ''meaningful''. Unfortunately, the RDBMS doesn't know the difference between 'str_table' and a data table, so it cannot automate this GarbageCollection for you (but see note at page anchor: annotating for GC). Unless you want an unlimited amount of this cruft building up over time, you need to occasionally take out the garbage. Of course, it wouldn't be very efficient to GC after each delete, so you'd probably leave the task to a periodic cleanup, perhaps nightly or weekly depending on how fast the cruft builds. So you add something like the following to your nightly:

 gc_batchfile:
   DELETE FROM str_table WHERE key NOT IN (SELECT attr2 FROM a_table);

Alright, so everything is working, so far so good. But GC rapidly grows more complicated than this simplistic example. As a simple complication, imagine a programmer decides to add 'b_table', which relates pairs of really large strings and also uses str_table in order to get decent value sharing of indexes and the strings.

 TABLE b_table {
   strA (key into str_table)
   strB (key into str_table)
 }

Well, no problem... just don't forget to change the batch process! If you forget, you'll regret it later!

 gc_batchfile: (updated)
   DELETE FROM str_table
   WHERE key NOT IN ((SELECT attr2 FROM a_table)
              UNION (SELECT strA FROM b_table)
              UNION (SELECT strB FROM b_table));

That was just a simple complication, but it still demonstrates a pitfall and some growing complications.
When we start dealing with structure DomainValue''''''s, following your approach of splitting these DomainValue''''''s off into separate tables, those will also need GarbageCollection. Essentially, what your 'destructured DomainValue''''''s' approach does is the following. I'll use 'type MTOS of tnode(String {List MTOS})' for demonstration.

 . . O R I G I N A L . .
 c_table
 msgid  value
   1    tnode("X" [tnode("Y" []) tnode("Z" [])])
   2    tnode("A" [tnode("B" [tnode("C" []) tnode("C" [])]) tnode("Z" [])])
   3    tnode("F" [tnode("F" [tnode("F" [])])])
   4    tnode("X" [tnode("Y" []) tnode("Z" [])])
   5    tnode("A" [tnode("B" [tnode("C" []) tnode("C" [])]) tnode("Z" [])])
   6    tnode("F" [tnode("F" [tnode("F" [])])])

 . . " N O R M A L I Z E D " . .
 c_table            tree_table
 msgid  value       key  parent  seq  value
   1      7           7    0      0   "X"
   2      8           8    0      0   "A"
   3      9           9    0      0   "F"
   4      7          10    7      0   "Y"
   5      8          11    7      1   "Z"
   6      9          12    8      0   "B"
                     13    8      1   "Z"
                     14    9      0   "F"
                     15   12      0   "C"
                     16   12      1   "C"
                     17   14      0   "F"
                     18   17      0   "F"

(Just to note: the 'child_id' schema could have been used here, too. Sharing would be improved but GC wouldn't be any easier.) Anyhow, what would the gc_batchfile look like in this case? Well, a simple stab at it in a procedure:

 gc_batchfile_tree_table:
   TEMP T = SELECT value AS key FROM c_table;
   TEMP SAVED = T;
   // RECURSE DOWN EACH TREE AND SAVE CHILDREN
   WHILE (NOT EMPTY(T)) {
     T = SELECT tree_table.key FROM T, tree_table WHERE T.key = tree_table.parent;
     SAVED = (SAVED UNION T);
   }
   DELETE FROM tree_table WHERE key NOT IN SAVED;

(Note the final DELETE must test against SAVED, the accumulated set of reachable nodes; testing against T would be a bug, since T is empty when the loop exits.) This recurses down a tree.
It should work so long as c_table.value always points to a 'root' tree node (where parent=0). If not, the problem gets a lot more complicated to perform efficiently. (If you don't believe me, you try it.) Anyhow, this is the sort of GarbageCollection to which I've been referring, and so far you're only looking at relatively simple examples... for example, you're not dealing with graphs. The 'child_id' schema I described for the 'MTOS' tree would involve bouncing back and forth between two tables to decide what is conserved (but wouldn't require a search for parents). Anyhow, the problems with these batchfiles are manifold:

* they are batchfiles, which might be okay for your domain but isn't okay for realtime domains.
* there is considerable potential for error, especially if new tables are ever added to the database.
* they are exhaustive, which is really, ''really'' inefficient. RDBMS-supported GarbageCollection of DomainValue''''''s could be far more surgical by examining the logfiles.
* implementing them is, frankly, unnecessary work if you simply have FirstClass support for large DomainValue''''''s and don't try to do the RDBMS optimizer's job! Automatic GarbageCollection would be so much more convenient.

(page anchor: annotating for GC) Perhaps you could somehow ''annotate'' 'str_table' and 'tree_table' as being GC'd, but that harks back to what I said in RelationalTreesAndGraphsDiscussion when I asserted that ''the requirement for GarbageCollection serves as a practical, distinguishing property between DomainValue and WhatIsData''. That is, if you do annotate tables for GC, after doing so you'll know '''precisely''' which tables could have been represented via structured DomainValue''''''s. Each GC'd table could be a structure value-type. I'd even say "should". I'm a bit curious as to whether TopMind simply lets cruft build up in the DBMSs he manages, since he seems so unaware of the need for garbage management. Or does he push the job to the application?
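The gc_batchfile_tree_table procedure above is a mark-and-sweep over tables: mark everything reachable from c_table roots, then sweep the rest. A minimal sketch using SQLite and a cut-down version of the hypothetical tree_table schema; note the sweep tests against the accumulated reachable set, not the (empty) final frontier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
  CREATE TABLE c_table (msgid INTEGER PRIMARY KEY, value INTEGER);
  CREATE TABLE tree_table (key INTEGER PRIMARY KEY, parent INTEGER,
                           seq INTEGER, value TEXT);
  INSERT INTO c_table VALUES (1, 7);
  -- a live tree rooted at 7, plus an unreferenced tree rooted at 9:
  INSERT INTO tree_table VALUES
    (7, 0, 0, 'X'), (10, 7, 0, 'Y'), (11, 7, 1, 'Z'),
    (9, 0, 0, 'F'), (14, 9, 0, 'F');
""")

# MARK: start from the roots referenced by c_table, repeatedly add children.
saved = {v for (v,) in conn.execute("SELECT value FROM c_table")}
frontier = set(saved)
while frontier:
    qs = ",".join("?" * len(frontier))
    frontier = {k for (k,) in conn.execute(
        f"SELECT key FROM tree_table WHERE parent IN ({qs})", tuple(frontier))}
    frontier -= saved        # don't rescan already-marked nodes
    saved |= frontier

# SWEEP: delete unreachable nodes (assumes at least one root, so saved != {}).
qs = ",".join("?" * len(saved))
conn.execute(f"DELETE FROM tree_table WHERE key NOT IN ({qs})", tuple(saved))
remaining = sorted(k for (k,) in conn.execute("SELECT key FROM tree_table"))
```

This also illustrates the "exhaustive" complaint above: the mark phase rescans the whole table once per tree level, every time the batch runs.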
''Generally, data removal falls into one of four clean-up categories:''
* It's a temporary table for a temporary process and is cleaned up by the process or an automatic time quota.
* It's domain data that users are responsible for cleaning up or removing, such as a product in a product catalog.
* Some kind of time quota is set, such as keeping deactivated customer info for 7 years.
* Automatic deletion based on some time-sensitive process, such as schedules for events passed.

You have a serious problem, TopMind, in that you keep thinking of those tables you "normalize" to avoid value repetition as containing 'data' rather than 'shared values'. I suspect you're simply calling it "domain data" and '''inappropriately''' making cleanup the user's job. ''It's your claim, so the evidence is your burden. Give some typical biz examples of RDBMS failing to do your def of GC. Or, are you just playing word games? Is it not the domain user's job to remove products from the catalog? The programmer and the computer cannot know when marketers decide to pull a product by reading their minds. If the catalog is not the kind of scenario where you see problems, then state one in which you DO see problems.'' Deleting products from a catalog is not an issue. Deleting value resources (strings, images, geometric information, etc.) associated with and shared among entries in a catalog (or, more realistically, among variations and versions of a catalog that are targeted for different localities, seasons, etc.), however, would be a problem. The problem scenario is when you end up with (key,value) pairs of any sort where 'key' is a surrogate that has no meaning in the real world. ''It's called "cascading deletes". And one can set up ON_DELETE triggers for more fine-tuned/custom deletes. Setting this up right requires knowledge of the domain. A compiler cannot know the needs of the domain by itself.
And please flesh out your locational/seasonal example.'' You know, I sort of figured you'd bring up "cascading deletes", but I let it alone in part because I wondered if you were incompetent enough to bring it up. '''Cascading deletes work in the opposite direction'''. They don't help here; their use would be more like "I deleted this image, and now all the catalog entries that used that image are gone!" - that is, they delete entries containing a foreign key in response to their primary key being deleted. Also, ON_DELETE triggers suffer every one of the problems with the gc batchfiles described above, plus even greater inefficiency. Anyhow, I put no small effort into explaining GC to you, and all you've done is look for excuses to ignore it, so I'm not feeling particularly charitable at the moment. When I am convinced you have made an honest effort to understand what has already been written for your benefit, perhaps I will detail the example. ''You've provided no evidence that cascading deletes and on-delete triggers are not doing their job IN PRACTICE. You again appear to be '''inventing problems''' for your "solutions".'' I provided reason. You appealed to cascading deletes without even that much, and with a clear lack of understanding as to their utility. When I see such incompetence from you, it does not make your claims all that convincing.
* ''That's not reason, it's hand-wavy double-talk. Why can't you just provide a scenario? Are you afraid of scenarios? '''Scenariophobia'''.''
* I've provided scenarios. Two of them above, plus a sketch of another. You have been applying shitty, illogical, irrational reasons for dismissing them, along with hand-waving shouts in favor of objectively incorrect solutions (like cascading deletes).
Beyond that, you've made shit up about 'set-based security', you failed to accomplish even a simple 7-node example, you still are ignorant of exactly why copies get made in the parent_id schema, you state untrue things about RDBMS implementations, and you make up silly claims about 'phobias'. Tell you what: you find someone else to explain logically that the other scenarios I've provided are insufficient, and I'll write another for you. As is, all I have are your claims and objections, and your piss-poor track record to back them up. ''Please clarify your catalog image example. Note that whether accompanying images will be deleted or not may be a business decision. If they paid a graphics designer thousands of $ to create many of the images, they may decide to keep them around on the off-chance they will be needed again. Re-entry of the catalog item itself may cost $10, so they don't "protect" that side. A machine cannot tell people what they want to keep and toss. It's not smart enough to make case-by-case economic decisions like that.'' That is hardly at issue here. If people want to keep the image, it would be sufficient to keep a reference to it. You are directing your arguments at a StrawMan. ''That's one approach, but not necessarily the superior one. For one, the "addressing" could be obscure. If Milfard Pinkerwikkle keeps personal links to it, but nobody knows about Milfard Pinkerwikkle's links, then to all others it's as good as lost. If we instead keep them in the same table, then people remember the name of the table because that's where the images are kept, deleted or not. You are using spaghetti thinking.'' If you want to treat the images as distinct entities, that's available no matter what the approach, and still requires keeping a reference to each image. Anyhow, your entire line of argument is irrational.
You're basically arguing that disconnected pieces of data should stay around after people intentionally deleted that data, and you're inventing scenarios where it 'might' be useful, but only if people didn't really intend to delete the data. The twisted, 'spaghetti thinking' here is your own. If your goal is a versioned database where one can recover past versions of the database information, there are far more direct, efficient, and correct ways to go about it.

'''(c) allow the serialization of structured values to be standardized and optimized separately from their persistence layout (so each RDBMS can go its own way when deciding how to represent these structures)'''

''Databases are more than just "persistence mechanisms".'' I agree. Nonetheless, any production RDBMS contains a persistence mechanism, and will need to optimize it to support rapid access, indexing, queries, AtomicConsistentIsolatedDurable transactions, and so on. Are you going to pretend this is not a relevant issue? ''And nothing prevents one from re-formatting DB info as needed. "Standardized"? Please clarify.'' Your approach of representing "trees" as, for example, "tables that have 'key' and 'parent_id' attributes" (as you mentioned in RelationalTreeEqualityExample) is defining a representation - "standardizing" a "layout" - for both the representation of that tree in the database and its serialization. Use of structured values allows you to separate these concerns. Only the serialization needs to be standardized. And integrating with those "kits" you mention would often "prevent one from re-formatting the DB info as needed". So would backwards compatibility with distributed applications. '''(d) allow one to operate on DomainValues as whole units (e.g.
for an equality test), which helps composability and GenericProgramming and eliminates the need for representation management helpers''' ''Nothing prevents one from making equality comparators for table-based structures.'' That reply isn't relevant. The stated benefit is the ability to operate on DomainValue''''''s as whole units, and equality tests are but one example. Your 'tableEquals(Tbl1 ID1 Tbl2 ID2)' approach described in RelationalTreeEqualityExample is not a solution that supports composability or GenericProgramming, and '''is''' a representation management helper. ''Are these common needs, or another one of your lost "solutions" looking for a problem?'' [Extremely common. These issues arise any time it is necessary to store and retrieve a complex value in a database. It probably occurs very rarely when implementing common business-oriented applications like CRM, ERP, e-Commerce and so forth (the domains where SQL DBMSes hold sway), but occurs in every domain where complex values are present and it is appropriate to use a DBMS. It is quite common to essentially re-invent DBMSes on a per-application basis rather than endure the difficulty of shoe-horning complex values into existing SQL products. If SQL DBMSes possessed powerful type systems like those found in Haskell et al., this would not be an issue.] [I'm surprised, in fact, that the major DBMS vendors haven't already latched onto this as a way to expand their product sales into relatively un-tapped markets like scientific computing, games development, weather prediction, systems programming and so on. Defining powerful and DBMS-friendly type systems is an active research area, so I suspect it's only a matter of time before they appear in industrial systems.]
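The "whole units" point can be illustrated outside any particular DBMS. If tree values are first-class immutable values, equality, hashing, and generic collection machinery come for free, with no per-representation helper like 'tableEquals'. A minimal sketch, modeling the 'tnode(String {List MTOS})' notation used elsewhere on this page as nested tuples (a hypothetical encoding, not anyone's actual implementation):

```python
# A hypothetical 'tnode' value: an immutable (label, children) pair,
# mirroring 'type MTOS of tnode(String {List MTOS})'.
def tnode(label, children=()):
    return (label, tuple(children))

t1 = tnode("X", [tnode("Y"), tnode("Z")])
t2 = tnode("X", [tnode("Y"), tnode("Z")])   # same value, built separately
t3 = tnode("X", [tnode("Z"), tnode("Y")])   # different value (order matters)

# Whole-unit operations need no representation helpers:
same = (t1 == t2)                    # structural equality, for free
distinct_messages = {t1, t2, t3}     # hashable, so generic set/dedup machinery works
```

Because the values compose with ordinary generic machinery (sets, dicts, sorting by key), the same test works unchanged for any value type, which is the composability/GenericProgramming claim being made.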
''Perhaps because many such shops find a way to do it without a type-heavy approach such that the intersection between the domains you mentioned and those who like a type-heavy style is too small for Oracle.'' [Sorry, I don't know what you mean by "type-heavy style". Do you mean TypefulProgramming, StaticTyping, ManifestTyping or something else? Anyway, needing to deal with complex values which can be tested for equality, etc., and treated as encapsulated units is hardly a "style"; in many domains it's a requirement. This has nothing to do with StaticTyping, DynamicTyping, ManifestTyping, or any other kind of TypeSystem implementation, other than the need to create and manipulate complex values with the same ease as strings or integers.] ''Well, that's not a common need in my domain. If my domain has to sacrifice flexibility and sharability to get that, then it will not fly here. If you can demonstrate that one can have such without sacrifices and without fully tossing existing RDBMS knowledge, then please do. It still appears as if "hard" encapsulation has a big penalty. You are only trying to sell the benefits without addressing the downsides for my domain. I can live with soft encapsulation where nodes can fully participate in relational operators by default for any new "structure type". You can have '''all the type operators you want, as long as that does not preclude relational operators on those nodes.''' You can only tear away relational operators from my cold dead fingers. Keep your feature-crippled types out of my domain neighborhood. -t'' It has already been established at (page anchor: node sharing example) that your approach is the one that sacrifices node and index sharing, and that there is no loss in flexibility. It has also been established in PROOF ONE .. PROOF FOUR that encapsulation of representation is okay, and that there is no encapsulation of structured DomainValue. 
You have demonstrated none of your claims that these "sacrifices" exist, or that there are "downsides" for your domain in supporting structured DomainValue''''''s. We're not even taking relational operators away from you. Your opinions are not the product of reasoning, TopMind. Your entire line of argument has been one piece of unreasoning faith, extreme ignorance, blind denial, and fallacy after another, and that's when you're not outright making shit up like your comment about the 'power' of set-based security at point (a), above. You should be ashamed of your behavior. But attempting to reason with you is pointless. I'll continue on integrating functional, relational, and other systems. You'll continue being a ferrous cranus with a net negative contribution to every technical forum you visit. [Top, if complex types aren't needed in your domain, don't use them. No one is advocating that we remove canonical types or alter the RelationalModel, so everything you do now can be done '''exactly''' the same way you do it now. I can't envision any circumstance where providing support for complex types would result in a sacrifice anywhere. Adding 'tree', 'graph' or whatever types is exactly the same, in concept, as adding 'date' and 'money' types to the canonical list of 'integer', 'string', and 'float' types. Indeed, a proper type system would mean that tree values, graph values, date values, money values, integer values, and string values could be manipulated with equal ease, with no compromise to flexibility and sharability. Why should a tree value be treated any differently, in concept and from the RelationalModel's point of view, from a string, date, or integer value?] ''Like we already discussed, with strings, one cannot use the "regular" relational operators on them without conversion and copying. There is no "equal ease" for them. I don't want to do the same with graphs and stacks.
The author of a graph or stack may not allow/add any relational operations or ability to join with tables out of spite, stupidity, or laziness. The short-sighted will add operators for immediate needs and only immediate needs. I know how developers/designers act on a general basis. -t'' So... you're a hypocrite because you use strings, eh? You stop using strings for a few years, then come back and tell us whether you still believe FirstClass support for structure DomainValue''''''s is not useful to the business domain and, by extension, to other domains. Until then, I'm calling you on sheer, utter hypocrisy. Whatever appeals you'd make in favor of strings, performance or elsewhere, apply equally to other structure DomainValue''''''s. * ''Like I said before, a RDBMS *could* represent strings/words in tables. It just has not been practical to do so. Maybe if hardware gets cheap enough, it will. Plus, strings are usually small enough that copy-into-table is a sufficient workaround. But for a large, shared structure, it may not be. For isolated trees or stacks with less than say 2000 nodes, local copies may be sufficient. But if we want big trees and stacks and/or share/cross-reference them with other tables/data, then we need nodes that can act like DB table records (a super-set of a table record, perhaps). Strings are in the middle grey zone. -t'' * Unless you say it '''*should*''' represent strings in tables, and you do so with ''equal fervor'' to your claims about trees and graphs, your beliefs on this matter are hypocritical. If you think 'up to 2000 nodes' is okay for other structure DomainValue''''''s, then try it with strings: any time a string is ''possibly'' more than 2000 characters or words or some other sane unit, break it down into a table, refuse to keep it 'whole' even briefly; instead, write 'helper' functions for equality tests the same way you suggest for trees. 
I believe you are preaching something you have never practiced for reasons that you have never properly examined (which is obvious given you keep making claims about sharing and "copying" and such that have truth only in the most naive of implementations). Well, try it. You destructure those strings the same way you suggest people outside your domain destructure their DomainValue''''''s. Try using this in practice. Then try preaching this string-destructuring to your fellow "practitioners" rather than standing on a soap-box and blaming people you consider 'academic' for not listening to what you consider to be practitioner's wisdom.
* ''How about we let usage be our guide. If you need operation X a lot, then find ways to make operation X easier. If we only need it occasionally, then a slightly-less-than-natural "helper" function kit may be sufficient. *If* we are forced to choose between a first-class "type" and a second-class "type" (such that relational trumps it), then the choice should depend on usage. If the usage within different domains would put it in different camps, and we cannot find a way to make such a choice a fairly easy toggle-switch, then perhaps each domain needs its own database system. -t''
* While your proposition sounds reasonable, I don't think you're in a position to be offering advice about "guides" between two options until you've thoroughly tried both. EatYourOwnDogFood. Seriously, try it before you sell it. Are you not willing to follow your own advice about structure DomainValue''''''s the moment the structure DomainValue under examination is the one common in your domain?
* ''Such advice applies to both parties.''
* Indeed it does. Fortunately, I've already tried what I suggest, since arbitrary-length strings '''are''' structure DomainValue''''''s.
* ''You've tried strings that are also instantly accessible through relational ops?
No you haven't.''
* Where did I promote supporting application of relational operators on anything but relations? I do not believe I have. If I have not promoted such, it is irrational of you to accuse me of not trying it before promoting it.

[By the way, if you want to be a total relational purist, the only type you should have is 'boolean' -- everything else (including operators) can be represented as boolean column values, tuples and relations. A string or even an integer is conceptually no less a "complex" type than a tree; it differs only in the structure of its internal representation and in the operators that make it useful to a user. Because of established convention, we are used to trivially sharing canonical types, but this is effectively a relatively recent achievement in the history of computing. There is nothing that precludes eventually sharing trees and other complex values by convention, and there are even existing means for doing so.] ''No, because relational does not define the "domain math" (cell-level).'' [It doesn't have to "define the 'domain math'" for this to be true. That operators can be defined as relations on boolean values is true, even if the contents and headings thereof are not defined by the model.] ''I'm not necessarily a purist. I'll accept "violations" if there's a reasonable cause. I've already described the practical issues. For example, if an ADT's encapsulation "locks" tree nodes from participating in relational operations unless the ADT builder bothers to "allow" them, that is a problem because they usually won't. Experience tells me that interface authors do as little as possible and only satisfy immediate requirements. '''Complex structures need "soft" walls to be useful in my domain'''. -t'' TopMind has raised practical '''non'''-issues, encapsulation of ADTs being one example (countered in PROOF FOUR, above).
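The position that relational operators apply only to relations, with structured values converted as needed, can be sketched concretely: one conversion function per value type, after which generic relational-style operations (restriction, join, projection as set comprehensions) work unchanged. A hypothetical sketch, again encoding trees as (label, children) tuples:

```python
def to_edge_relation(tree, path=()):
    """Flatten a (label, children) tree value into a relation (a set) of
    (parent_path, child_path, label) tuples. Hypothetical helper: one such
    conversion per structure type suffices, after which the generic
    relational operators apply, rather than one operator per
    (structure type x relational operator) pair."""
    label, children = tree
    rel = {(path[:-1] if path else None, path, label)}
    for i, child in enumerate(children):
        rel |= to_edge_relation(child, path + (i,))
    return rel

t = ("A", (("B", (("C", ()),)), ("Z", ())))
rel = to_edge_relation(t)
# A generic restriction (relational SELECT) over the derived relation:
nodes_named_C = {row for row in rel if row[2] == "C"}
```

The same restriction, join, or aggregation code then works for any structure type that supplies a conversion, which is why the count is "at most 30 such functions" rather than 30 x 15 operators.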
[On one hand, I can see a certain limited value in providing operators to map user-defined complex values to and from relations. However, you appear to be seeking a certain relational symmetry across all elements of the language, wherein any (and every) given value is simultaneously represented as both its own type (i.e., has its own type-specific operators, presumably) and a relation. In principle, I can appreciate what you're aiming for -- and it wouldn't necessarily require that the internal representation of values ''be'' relations -- though I'd argue that ExtendedSetTheory is a better choice for experimenting with universal, er, homologous-ness than the RelationalModel. As we know, trees, graphs and other non-tabular structures don't map particularly elegantly to relations. However, extended sets map quite well to a wide variety of structures whilst easily incorporating the RelationalModel should it be deemed appropriate. D L Childs (the inventor of ExtendedSetTheory) has been advocating this approach since the late 1960s, and it's notable that in DrCodd's initial paper on the RelationalModel, the first reference is to one of D L Childs's papers on ExtendedSetTheory. Unfortunately, I suspect Childs's rather obscure writing style -- and the fact that the RelationalModel is probably more intuitive than ExtendedSetTheory -- means the former has seen wider adoption than the latter. ExtendedSetTheory has, however, been successfully used as the basis for some powerful database systems.] [On the other hand, if some pre-written UDT doesn't provide the operators you need, you can simply create your own UDT that does. That is a strong argument in favour of UDTs in general.] * ''Everything is convertible to everything else with enough effort and perhaps enough copying. But this is obviously far from the ideal. 
If we have 30 UDSTs (s=structure) and they lack relational operators, then building something for each could require up to 30 x R operators, where R is the number of relational operators. Thus, if we assume 15 relational operators, we have up to 450 operators to implement. And this says nothing about efficiency, concurrency, etc., as our million-element stack report example illustrates. -t''
* If nothing has been said about efficiency and concurrency, then there isn't much reason to believe they suffer. And why would I implement 450 relational operators when I could write simple functions to construct relations from values, then use the resulting relations? That way I'd need at most 30 such functions, likely fewer with GenericProgramming and composition.
* ''A '''real-world example? I want to perform relational operators on my existing file system'''. I know of no easy way without mass periodic copying or slow iterative loops. Maybe there is a way to hack the OS to maintain indexes automatically or the like, but this is busting encapsulation, making ADT implementation swapping problematic. We'd have to stick with *just* POSIX-like or FTP-like commands if we want to stick to the "purer" ADT model, and thus no internal hacks. -t''
* I imagine that a FileSystem (an entity that mutates in response to commands) differs in many critical ways from DomainValue''''''s, especially regarding immutability and the ability to receive commands. As noted in PROOF THREE and PROOF FOUR, above, DomainValue''''''s are never truly encapsulated. Since DomainValue''''''s are not encapsulated, something about your argument is incorrect: either FileSystem DomainValue''''''s need to be immutable constructs (like versions or snapshots) that can be fully observed and processed without any use of 'commands', or the FileSystem mustn't be a DomainValue.
* ''Continued in FileTreeMeetsRelational.''

As far as the EverythingIsa relation approach, one would do better with EverythingIsa database, where each 'cell' contains a full record of named relations, and each relation is a set of records of cells. Booleans are represented by TableDee and TableDum, integer N is represented by a relation containing integers 0..N-1, and only records of relations, especially including the empty record and the empty relation, are primitive.

 key: [] is relation or set, () is record, {} is application; for now using implicit (positional) labels.

 Value      SetTheory                        Relational/DB                       Notes
 0          []                               []                                  Empty (primitive)
 1          [[]]                             [([])]                              {0} Union 0
   also:    [0]                              [(0)]
 2          [[] [[]]]                        [([]) ([([])])]                     {1} Union 1
   also:    [0 1]                            [(0) (1)]
 3          [[] [[]] [[] [[]]]]              [([]) ([([])]) ([([]) ([([])])])]   {2} Union 2
   also:    [0 1 2]                          [(0) (1) (2)]
 false      []                               []                                  TableDum, equal to zero
 true       [[]]                             [()]                                TableDee != 1 in Relational
 (a b)      [[a] [a b]]                      (a b)
 (a b c)    [[[a] [a b]] [[[a] [a b]] [c]]]  (a b c)                             (a b c) is a database in relational/db
 (x:a y:b)  [['x' a] ['y' b]](*)             (x:a y:b)                           'x' and 'y' special in SetTheory approach
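The SetTheory column of that table is the von Neumann encoding of naturals (N = {0, 1, ..., N-1}, i.e. {N-1} Union N-1), which can be checked mechanically. A small sketch using Python frozensets to stand in for the bracketed sets:

```python
def nat(n):
    """Von Neumann natural: nat(n) = {nat(0), ..., nat(n-1)}, matching the
    SetTheory column above (0 is [], 1 is [[]], 2 is [[] [[]]], ...)."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset([s])   # successor: n+1 = n Union {n}
    return s

zero, one, two, three = nat(0), nat(1), nat(2), nat(3)
# In this encoding 'false' is the empty set (TableDum ~ 0) and 'true'
# is [[]], which coincides with 1 in the SetTheory column (though, as the
# table notes, TableDee != 1 on the Relational side).
false_, true_ = zero, one
```

The membership structure also falls out: each natural contains exactly its predecessors, which is why '3' literally equals {0, 1, 2}.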
As syntactic sugar and as an under-the-hood implementation, one would expect that natural numbers and such are presented in their more traditional forms, even if they provide relational or SetTheory semantics in accordance with the 'representations' of the value. For simple natural numbers, the each-cell-is-a-database representation is somewhat unwieldy compared to SetTheory, but it lends itself more readily to structured extensions and to typing (by defining each attribute as a whole schema, complete with inter-relation constraints). For example, one can model rational numbers using (dividend:1 divisor:3), and graphs and trees can easily be represented without the confusion regarding labels vs. values common to the pure SetTheory approaches. One critical feature still missing for graphs and trees is a 'scope' for names. I.e., the following two values represent the same graph up to an isomorphism on vertex names (which are typically assigned in '''accidental''' order):

 (vertices:[0 1 2 3] edges:[(1 2) (1 3) (1 0)])
 (vertices:[0 1 2 3] edges:[(2 1) (2 3) (2 0)])
 // the above could be shorthanded to 'vertices:4'

Achieving automatic isomorphism here requires some primitive support for scoping of 'identifiers' (or surrogate keys / 'points') as distinct from integers and scoped to some miniature database. This requires a simple, but significant, extension to relational, such that each database-value occurs in a scoped identifier 'space', potentially containing points from the parent space or from its own space:

 {space S: (vertices:[{S 0} {S 1} {S 2} {S 3}] edges:[({S 1} {S 2}) ({S 1} {S 3}) ({S 1} {S 0})])}

Spaces are compared isomorphically, and essentially degenerate to plain-old-values in the case where the space is unused in describing values.
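The isomorphic comparison described above can be sketched in a few lines. This is my own illustration, not a proposed implementation: two edge sets denote the same graph when some renaming of the vertices maps one edge set onto the other. The brute-force search over permutations is suitable for tiny graphs only.

```python
from itertools import permutations

def isomorphic(vertices, edges_a, edges_b):
    """True if some renaming of vertices maps edge set A onto edge set B."""
    vs = sorted(vertices)
    for perm in permutations(vs):
        rename = dict(zip(vs, perm))
        if {(rename[x], rename[y]) for (x, y) in edges_a} == set(edges_b):
            return True
    return False
```

Under this check, the two `vertices:[0 1 2 3]` values above compare equal (swapping vertices 1 and 2 maps one onto the other), which is exactly the equivalence that scoped identifier 'spaces' would make automatic.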
Given scoping, 'space S' is likely to appear inside a relation or set defined inside some other 'space R' inside some other 'space Q' and so on all the way up to the 'root' database (which would need to use identifiers in a global space). Use of spaces does require some special relational or set theory support when performing unions, intersects, etc. across sets in two different spaces. I haven't read enough literature on ExtendedSetTheory to have formed a solid opinion about it, but my impression is that it mostly focuses on the label vs. value problem (one that doesn't exist in RelationalModel) and doesn't solve the issues of accidental ordering for graphs. ''While an interesting exercise in atomic simplicity, many are not ready to part with relational yet, having grown up on it (or a form of it). Thus, the more immediate goal is to '''extend''' or enhance relational, not replace it. I still believe that a large part of the value of relational is that it maps well to the concept of a physical tabular sheet. This makes it easier to get one's head around and visually inspect and conceptualize. 8-dimensional creatures in a different universe or highly adept mathematicians, on the other hand, may find it limiting; that is, the primitives are "too flat". -t'' [Your goals are different from mine. My interest in ExtendedSetTheory is in providing a common representation (the extended set) for existing structures -- stacks, queues, files in a variety of formats, FrameBuffer''''''s, disks, lists, trees, database tables, etc. -- and an algebra over these. It appears to easily subsume the RelationalModel whilst incorporating other structures without the need to convert them to relations. It strikes me as having essentially the same graph representation problems as, say, XML, EssExpressions, the RelationalModel, etc., and demanding no worse solutions than these require, but I say this having given the issue only the most cursory consideration.] 
I suspect you may be creating Lisp-like problems: too flexible and open-ended for its own good, with insufficient conventions and culture.

[Could be, but pre-judging the assumed outcome prior to experimentation would be rather unscientific. What fun (or possible progress) is there in not trying it? If everyone had your attitude, we'd be programming (if at all) in Plankalkül. Or by knocking rocks together...]

At least they'd be relational rocks :-) I'm what you might consider a "practical experimenter" in that I try to improve on existing ideas rather than seek out blue-sky solutions. Both types of research are needed. Progress comes from both evolution and revolution. -t

'''(e) allows lazy evaluation where functions are only executed insofar as they are needed.'''

''Perhaps. But are you sure an RDBMS cannot also do the same?''

An RDBMS can offer lazy queries - even an NQRDBMS like you favor (a not-quite RDBMS where DomainValue''''''s are butchered). But offering lazy production of DomainValue''''''s is a different issue: since in your approach any access to a value requires a query of the whole 'structure value' table, laziness cannot be achieved across accesses to the value, which effectively means that laziness is simply not achieved.

---------------
(page anchor: peekaboo)

''From a purely mathematical standpoint, you may be right. However, performance and other practical issues may make it unusable. -t''

What evidence do you have that suggests "performance and other practical issues may make it unusable"? Or is that a baseless accusation?

''It's not baseless: you haven't demonstrated how it's done, such as showing how the index pointers are hooked up to nodes in RAM/disk to provide sharing. Plus, the default is not that it's not an issue. The default is "unknown". '''Your "proof" only addresses theory, not practice.''' -t''

Did you just appeal to ShiftingTheBurdenOfProof? "My accusations aren't baseless," TopMind sneers, "until you disprove them!"
If you don't have evidence supporting your accusation, it's baseless.

''I did not make a formal claim. If you want to play lawyer, go elsewhere. I'm only pointing out that proofs for theory don't necessarily translate to practice. Do you disagree with such a statement, in general? A time machine that requires the energy of 1,000 galaxies be harnessed may make for theoretical time travel. However, in practice, we are far from harnessing such power, so we don't have such technology. In theory, one can see the bottom of a big stack by popping a million nodes from it, and then pushing them back into place to leave it the way it was. In practice, waiting for the computer to perform such may suck. -t''

I could go around saying: "SMEQL might not be implementable" (it hasn't been disproven yet), or even "TopMind may be downloading child pornography again" (where's your counterproof?), etc. Would you think this is okay? The fact is that your comment '''insinuates''' a claim. You could have, with your "unknown" rule, said something else: "performance and other practical issues may make it superior to TopMind's approach by every metric". Thus, to say "performance and other practical issues may make it unusable" implies you have good reason to believe that assertion more so than its opposite. If you wanted a neutral statement in accordance with "I'm only pointing out that proofs for theory don't necessarily translate to practice", you could have said "However, we '''don't know''' about the performance yet". Even that wouldn't be entirely true (we have plenty of OOP and FunctionalProgramming languages to which we can look for actual performance), but at least it would be as neutral as the evidence you claim to possess.

''I don't see a significant difference between those two. I'm not sure why the first irritates you more than the second. But anyhow, let's move on rather than argue about arguing.
Would it be accurate to say that, as written, your proof does not address performance? Another issue to consider is the adaptation effort. Will independent structure types require more adaptation effort and code to serve as different "types"? -t''

TopMind might be wanking off to an image of Arnold Schwarzenegger for the tenth time. Seriously, it wouldn't bother you if I went around saying stuff like that simply because I "don't know" whether you're doing it or not? What if I used your real name instead? If you think it's okay, then we'll just call it a difference of opinion, and we'll move on, but I'll be awfully tempted to say stuff like that just to see whether you get irritated at the insinuated accusations and thus prove your hypocrisy. The other option, of course, is for you to simply admit your sophistry right now and try to do better in the future. Your choice.
* ''I don't know what you are rambling about here. Your poor articulation ability is spilling over to your accusations also.''
* There is some probability that TopMind gambles in chicken-fighting rings on the weekends. I'll continue until either you get it or I'm convinced that you really don't see claims made without knowledge as insinuating anything. You don't need to understand in order to participate.

Anyhow, performance has been addressed elsewhere (for node/value sharing vs. copies, functional operations including equality tests, indexing and memoization, etc.), and so has adaptation effort (relative difficulty of writing a function vs. a query, GenericProgramming, composition and code reuse), and so has maintenance effort (data entry, collection). Arguments on those points should not be raised again here as if they are entirely new and haven't been discussed.
Raising them here makes it appear as though you're just repeating yourself and completely ignoring all the discussion on those subjects thus far, much of which offers reason to believe performance and other practical issues are ''not'' going to be a problem. Your tendency to raise issues like they're fresh, even after people feel those points have been defeated in other arguments, is likely part of what earns you your reputation as a 'Ferrous Cranus' as mentioned in ObjectiveEvidenceAgainstTopDiscussion.

''You are the one who did such. You re-mention PROOF X as if the case is closed. That is what prompted me to revisit this section to begin with. And you have not illustrated shared indexing with something as simple as a stack (originally in RelationalWeeniesEmbraceOo PageAnchor "milstack"). You appear afraid to illustrate it, hiding behind academic vocabulary instead. -t''

The case IS closed. I re-mention PROOF X because you keep repeating the same accusations about ADTs and encapsulation that were '''never relevant''' and were '''demonstrated''' to be irrelevant in PROOF X. Your distraction tactics don't change that, and make no mistake, that's exactly what you're doing here: sharing in a stack is irrelevant to PROOF X, so raising it ''here'' is either sophistry or stupidity. But, since you bring it up, ''you'' have also not illustrated any advantage for shared indexing with something as simple as a stack (as you'd prefer to represent it) as compared to using even a completely non-indexed functional approach, and you still have not successfully completed composable equality tests.

''Just admit that you lack enough real-world experience to be equipped to deal with milstack by showing how it all works. You just handwave that FP or something else magically takes care of everything.''

I've already told you how to deal with 'milstack'.
The answer is memoization and pre-processing, just as is already implemented for indexing strings lexically in advance of 'likeness' queries. You lack sufficient programming competence to understand what 'memoization' is, as you demonstrate when you associate 'interning' with 'what, Monica Lewinsky?'. Thus, it appears like HandWaving to you. That is a problem with your competence, not with the solution. You repeatedly demonstrate an appalling degree of incompetence in your statements, such as your appeal to cascading deletes when it was inappropriate, your lack of comprehension about value-sharing and inability to solve a simple seven-node tree problem, and so on. I suspect even the simple stuff is all just magic to you. I am in the process of explaining the mechanism for indexing complex structures, at a level at which even a barely-competent layman ''might'' understand it, for FileTreeMeetsRelational, which gives you a smaller chance of understanding it since I can't compensate for your HostileStudent agendas, especially not in combination with your bouts of plain idiocy. But I'll give it a go since I said I would.

''I'm not your damned student. You are afraid of a concrete example because it puts you up against scrutiny; thus, you hide behind magic-acronym handwaving. You exaggerated the need for dealing with repeat words, and were left looking like someone locked in the ivory tower so long that the Hunchback of Notre Dame has a deeper tan than you.''

Every time you ask me for an example, you have implicitly asked to be my student, at least for a little while. The fact that you are ''hostile'' to this idea is what makes you a ''damned'' student, and a moron. It takes a rude and disrespectful bastard to ask for examples, clarification, etc., then not listen carefully to the answer.

''Requesting evidence == "student"? You think weird. And, I also think you are an arrogant blowhard idiot who lacks sufficient practical experience.
Thus, the hate is mutual.''

Requesting clarification and example == "student". Neither of those are 'evidence'; they're 'explanation'. If you don't want to be a student in a field, then gain competence in diagnosis and analysis of evidence so you don't need to ask for explanations, or at least look things like 'interning' up for yourself when you don't understand. And I'll agree that I lack 'custom biz app' experience, but I don't believe that sort of 'practical' experience is what one needs to bring into a discussion of RDBMS design.

''You are a spoiled whiner who wants everything on your own terms. There are probably more than 10 practitioners for every academic. Thus, it makes sense to target your communication accordingly. Or at least not complain about such requests like a spoiled elitist. "I don't care if the audience is English, I'm gonna use Swahili!"''
* [''As a practitioner, I have no trouble following what was being said here. I suspect that 70-80% of the practitioners that I work with would also have no problems following. Of the remaining, about 90% could follow if they first learned the concepts being discussed. I don't see a problem with how the audience has been targeted. BTW, throwing a tantrum because the example doesn't happen to fall in your domain of interest makes you look like the spoiled elitist.'']
* You could be exaggerating and I'd have no way to verify. Anyhow, if your technique cannot be demonstrated to be applicable to my domain, then why should I give a flying flip about your odd little example from Mars? Because "all knowledge is useful"? Maybe, but there are more relevant and immediate toys to test than yours.
* [''So could you, and since I've worked in a variety of environments, including those where I was the only one with any CS after high school, I find it likely that you have. You should "give a flying flip" because you wished to discuss the strengths and weaknesses of the approach.
It's your job to transfer this approach to your domain, not the job of the person explaining the approach. Asking us to perform your job, and getting upset when we say no, doesn't do much for the appearance of your maturity.'']
* If you want to "sell" the idea of a Type-DB to the custom biz domain, you will need practical demos. I'm just the messenger. I also suspect that your domain may have different needs than mine. Something that works well in domain A will not necessarily work well in domain B.
* (page anchor: What-DB?)
* [''Quite frankly, I don't care about what you call the "custom biz domain". As far as I can tell, it's just something you use to reject examples as not being "realistic".'']
* I try to justify why I don't find them realistic. If you disagree with the reasoning, then please state it. In some cases, I agree it's a matter of anecdotal evidence. But since we usually don't keep detailed logs and classify them, our memory is all we have to go on. I imagine we all have a personal bias such that we remember some problems more than others. It would make an interesting psycho/techno study.

People come to me on a regular and repeated basis for programming advice, for clarification as to the purpose or use of a programming pattern, and occasionally for help debugging their programs. I am one of very few in my immediate workplace who came into programming from a CS degree rather than out of EE or CE; I suppose that makes me the academic among them. I don't complain about their requests, I help them to the best of my ability, and I don't consider them any less intelligent for asking, or even for admitting they blanked on a concept and requesting it be reviewed again.
But there is a critical difference between them and you: they listen, they don't have agendas in attacking me when asking for clarification, they don't play HumptyDumpty games with language, they don't interject snide comments like "huh, Monica Lewinsky?", they don't pretend or delude themselves about their competencies, and they don't assume they already know the answers for which they're asking the questions. I.e., they're not self-important HostileStudent''''''s. I honestly believe most practitioners would be dreadfully insulted at the way you pretend to represent them, and that most of them would oust you as too belligerent, stubborn, and close-minded to be one of them. The idea that they're standing behind you is entirely your delusion. And, honestly, most academics wouldn't include me among their ranks, and I don't pretend they do. I don't do CBA, I'm not an IT person who maintains a server, but I also don't write white papers and teach classes for a living. I don't even have a Master's degree. The closest thing I have to being an academic is an academic's temperament and habit of being an EternalStudent.

''I have no way to verify what other people think of you. The most treacherous people often think that everybody loves them. G. W. Bush, for example. If I ask them outside of your presence, they may have a different story about you. Before I closed my email, I used to get fair volumes of what could be considered "fan mail" from people afraid to enter heated debates but glad that I press people to back their GoldenHammer claims. Even from at least 2 book authors. "OOP will save the world!", "Types will save the world!". You are just yet another with pretty much the same song and dance. Anyhow, we each think the other is an ahole, and I doubt that will change. -t''
----
(page anchor: What-DB?)

Re: "If you want to "sell" the idea of a Type-DB to the custom biz domain..."
I don't know what a "Type-DB" is supposed to be, but I do know about relational databases with support for user-defined types. Here's an example. Using TutorialDee syntax, imagine a trivial database of employees and departments with no support for user-defined types:

 var employee real relation {name char, address char} key {name};
 var department real relation {name char, city char} key {name};

Then issue the following query:

 employee join department

Argh! It gives no errors! It ''should'' give an error message, because joining employee to department by matching employee names to department names is utterly meaningless. Yet, it's easy to do accidentally in almost any query language, whether SQL or TutorialDee. So, we'll improve it, using user-defined types:

 type empname possrep {name char};
 type deptname possrep {name char};
 var employee real relation {name empname, address char} key {name};
 var department real relation {name deptname, city char} key {name};

Then issue the following query...

 employee join department

...and get the following error:

 ERROR: An attribute named 'name' is found in both operands but differs in type.
 Line 1, column 24 near 'department'

Hooray! Imagine the significant number of careless errors that users and developers can no longer make when creating queries. ''{Questions regarding "significant number" raised below.}''

By the way, structure types can be defined. A binary tree:

 type nametree possrep node {left nametree, right nametree} possrep leaf {name char};

A RelVar (i.e., table) with a tree-valued attribute:

 var myvar real relation {blah char, twee nametree} key {blah};

Populate it thusly:

 insert myvar relation {
   tuple {blah 'zot', twee node(leaf('zap'),leaf('zip'))},
   tuple {blah 'zoz', twee node(node(leaf('zoop'),leaf('zaz')),leaf('zip'))}
 };

Select tuples from it:

 myvar WHERE twee = node(leaf('zap'),leaf('zip'))

And so on. Obviously, operators can be defined to provide various operations on user-defined types.
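The error behaviour shown above can be mimicked even in a dynamically-typed setting. This is a sketch of mine (EmpName, DeptName, and join_on are invented names, not TutorialDee or any real DBMS API): wrap attribute values in distinct types so that a join helper refuses to match attributes whose declared types differ.

```python
# Distinct wrapper types for attributes that share a representation
# (both are strings) but have different meanings.

class EmpName(str): pass
class DeptName(str): pass

def join_on(left, right, attr):
    """Natural-join two lists of dict-rows on one attribute, rejecting type mismatches."""
    for l in left:
        for r in right:
            if type(l[attr]) is not type(r[attr]):
                raise TypeError(
                    f"attribute '{attr}' is found in both operands "
                    f"but differs in type: {type(l[attr]).__name__} "
                    f"vs {type(r[attr]).__name__}")
    return [dict(l, **r) for l in left for r in right if l[attr] == r[attr]]
```

With this in place, joining an employee row to a department row on "name" raises a TypeError rather than silently producing a meaningless result, which is the same safety the TutorialDee possrep types provide.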
A few days ago, I used this to "sell" the idea of user-defined types to a senior DBA. Prior to seeing the above, he was quite insistent that user-defined types were useless in his domain. Now he is convinced that they are absolutely necessary.

''The tree example?''

Both, and the general concept.
-------
Re: "Imagine the significant number of careless errors that users and developers can no longer make when creating queries."

''Why was my complaint about the frequency claim removed? Related: AlternativesToNaturalJoins. It's the kind of error that is usually caught immediately because it is outright wrong. The problem errors are more often those that work, say, 99.99% of the time. Or, more often, where the developer didn't understand the domain rules. -t''

Because it was nonsense. The meaning was obvious. It's the kind of error that frequently makes it into late development & testing, and sometimes makes it into production, rather than being trivially caught at the point where the query is created, because although it is "outright wrong", it's easy to make and can be hard to see in a complex query. As for the "frequency claim", I changed the offending sentence slightly so there should be no need for complaint. I'll remove this bit, too, once you've read it, because all these interjections make the text difficult to read. Create another section below if you have something genuine to add.

''I will agree that "number" may not necessarily be the same as frequency. But I feel this needs to be explicitly pointed out, because frequency of occurrence is generally more important in practice than the quantity of error paths. Also, I offer my anecdotal observation about the actual frequency of those kinds of errors. Such is useful information to practitioners, even if it breaks your types-save-the-world HobbyHorse world-view. Further, gumming up the schema with type declarations may make the schema harder to read, creating its own problems. Type-ness usually has a verbosity cost.
Further 2, if the type declarer is not careful, they may create overly-restrictive conditions. It's another activity that has to be monitored and managed. Developers/designers are not always automatically motivated to think things through. They may create type-spaghetti as job security, for example. -t''

Such errors may indeed be infrequent, overall, in simple schemas. In complex schemas (one example where I work has 40,000 tables), any mechanism to avoid ''any'' otherwise-easily-caught error represents a benefit. The notion that a schema becomes "harder to read" because it contains type declarations is nonsense, and I don't know what "overly-restrictive" conditions you're contemplating. "Type-spaghetti" as job security is pure paranoia on your part. Your arguments make me increasingly suspect that you are not a practitioner at all, but are, in fact, a largely non-technical middle manager, whose desire to eliminate abstractions, types, etc. -- and your insistence that programming be reduced to nothing but simplistic dynamically-typed procedural constructs -- owes entirely to the fact that you're afraid the developers under you will create code that you don't understand and therefore can't alter or micro-manage.

''40,000 tables smells like a design flaw. I'd like to see the justification for such, but probably will not get it from you. Anyhow, without specific scenarios to study, this will simply degenerate into a name-calling anecdote session. But would you at least agree that one could probably make "type-spaghetti" if they wanted to? In other words, heavy use of types by itself does not guarantee proper use of types.''

The system with 40,000 tables is a commercial enterprise data-management system for universities that covers ERP, CRM, HR, payroll, you-name-it, etc. According to one of the DBAs, the biggest design flaws are (a) a total absence of any relationship integrity at the database level (it's all done by the applications!
*gasp*), and (b) no user-defined data types, thus causing the sort of errors noted above, especially when creating ad-hoc reports and custom extensions. The tables, however, are apparently appropriately normalised and reflective of the business requirements. I will certainly agree, however, that one can make "-spaghetti" out of any language feature, whether types, tables, scripts, procedures, indexes, loops, classes, methods, lambdas, variables, etc. That a poor developer can mis-use a feature is no reason to avoid it, or we wouldn't be programming at all. I find it notable, by the way, that you have not disagreed with my assessment of your job role.

''I've said many times that I'm not a manager. My general experience is that only about a quarter of IT workers keep a "clean" system, whether that be schema design or code design. This is largely because management does not understand such, or doesn't stay around long enough in one position to care, and thus does not value it. Heavy typing just gives an employee more tools to create job-security messes with. If your experience differs, so be it. You don't have to insult people who observe different situations. And it still smells suspicious when the DB has more tables than students. -t''
* I didn't ask, but I suspect the proliferation of tables is a result of being forced to use them to represent invariant types and structured values.
* ''Perhaps a ConstantTable was in order.''
* A ConstantTable is a table! In many cases, they are used to represent invariant types.

Incompetent or "job-security"-seeking employees can create messes with anything -- like the incompetent developer I knew who created 100 procedures differing only in name and the initialisation of a local variable, and invoked them sequentially instead of using a 'for' loop. Because procedures can be misused, is that a justification not to use them? No matter how good the tool, an idiot can use it to break something.
''But a wider group of people can keep an eye on such a thing. The more obscure or specialized the technique, the harder it is to scrutinize and monitor. Things just don't magically stay clean in most cases. There has to be an incentive and monitoring.''

Have you even ''looked'' at the example above? In what way is using types an "obscure or specialized technique"? Conversely, why not consider loops to be obscure or specialized? Beginners to programming certainly consider them obscure, as do many non-technical managers. How do you determine where to divide between "commonplace and intuitive (or at least well-understood)" vs. "obscure or specialized"?

''Let's just say it's more things to master, leaving the merit issue aside.''

Procedures mean "more things to master" than a language without them. Why don't we avoid procedures?
------
Actually, I'd rather have warnings, or the option of warnings, for certain behaviors rather than outright errors. Sometimes when one is in a hurry, they don't have time to surf the type graph to figure out what is going on. The boss is breathing down your neck and you need to deliver ASAP. It's easy to fall into the trap of over-classifying things. -t

''So... What you're saying is: You want to join VAR x REAL RELATION {Name C''''''ustomerName} with VAR y REAL RELATION {Name D''''''epartmentName}, and it turns out that someone has inadvertently defined Name in x as type C''''''ustomerName when it should have been type D''''''epartmentName (the only circumstance I can see that justifies your suggestion), and your boss is breathing down your neck, so you'd like to join them anyway? Fine, in DateAndDarwensTypeSystem, the components of a type are always exposed.
E.g., given:''

 TYPE C''''''ustomerName POSSREP {a CHAR};
 TYPE D''''''epartmentName POSSREP {b CHAR};
 VAR x REAL RELATION {Name C''''''ustomerName} KEY {Name};
 VAR y REAL RELATION {Name D''''''epartmentName} KEY {Name};

''You can effectively bypass the type definitions by using their possrep components (i.e., a and b, which happen to be the same type) and join them as follows:''

 (EXTEND x ADD (THE_a(Name) AS N''''''ameForJoin) {ALL BUT Name})
   JOIN
 (EXTEND y ADD (THE_b(Name) AS N''''''ameForJoin) {ALL BUT Name})

Okay, but it's still one extra step. I'd like to see more payoff for all that beyond just preventing unintended joins. That's not at all near the top of my problem list.
----
MarchZeroNine