Dan Muller's Wiki page

(Please add new messages at the end of this page.)

----
You can send me an email message via http://spookydistance.com/dmuller_mail.html. I no longer publish an email address due to address harvesting by spammers.

See also http://www.spookydistance.com/dmuller/ and http://grault.net/adjunct/index.cgi?DanMuller.

...created Dec 22, 2000.

I'm a software engineer with strong interests in many aspects of the profession, which is why I kept ending up at Wiki -- all of the most interesting topics I search for on the Web have links to this place.

Some of my favorite subtopics are programming languages and programming tools. (With respect to tools, I tend towards the UNIX-style command-line camp, and am not a fan of comprehensive IDEs.) I've been working with C++ for many years and feel that I have an above-average grasp of the language. I've also worked a lot at various times with Pascal, Lisp, FORTRAN, and Perl. Since 1998, my day job has required learning a lot about database theory, which eventually led me to read DateAndDarwen's "Foundation for Future Database Systems: The Third Manifesto", (renamed second edition of "Foundation for Object/Relations Databases: The Third Manifesto"). This book sent me off on another jag of ruminating about programming language design. See TheThirdManifesto.

Just for fun, here are the programming languages I've worked with to varying degrees over the years, in rough chronological order:
* BasicLanguage (just a wee bit)
* AssemblyLanguage (IBM mainframe many years ago, Motorola 68000 family, Intel x86)
* PascalLanguage
* PL/C and PL/I (PliLanguage)
* LispLanguage (cherished educational experience and rediscovered love)
* FortranLanguage
* SnobolLanguage
* ModulaTwo
* Makefile languages of various flavors
* Custom bit-sliced microcode
* IconLanguage (briefly, for fun)
* CeeLanguage (followed rapidly by...)
* CeePlusPlus (the bulk of my programming career)
* PerlLanguage
* RubyLanguage
* OzLanguage (for fun only -- very interesting)
* .NET's Intermediate Language (IL), a virtual machine 'assembly language' (sort of).
* CeeSharp

Notably absent from the above: SmalltalkLanguage, PythonLanguage, JavaLanguage.

Other background details to satisfy the daily deluge of inquiries about my exciting past:
* Married, no kids.
* Hobbies: reading, woodworking, computer gaming
* Have worked on process control, networking (client and server), drivers for various OS, bootstrapping, bookkeeping applications, etc. etc.
* Occasional project lead but usually just a senior contributor.
* Interesting companies I've worked for in the past: Combustion Engineering (nuclear power plant monitoring), Prime Computer (one of the big minicomputer manufacturers that is no more), Banyan Systems (one of ''the'' major network operating system vendors, that is effectively no more), Digital Equipment (briefly, when it was already in decline), Gerber Scientific Instruments (one-time industry leader in photo plotters and large-bed plotters)
* Bachelor of Science in EE/CS from UConn, 1984, cum laude
* Born in Detroit, grew up in Michigan, Switzerland, and Connecticut.

C2 Wiki pages I've started (very few):
* IdealProgrammingLanguage (where I invite people to discuss their ideal language, and describe mine -- but my views have changed considerably since then)
* AllFeaturesShouldBeSimple (where I disagree with the premise, which came from another page)
* DateAndDarwensTypeSystem (where I tried to give a basic description of their type system)
* NotesOnaCeePlusPlusRdbmsApi

----
[2005-01-22]

'''On DateAndDarwensTypeSystem'''

Costin thinks that this type system is too informally or incompletely described to be of interest. I'm not sure that I agree. I've done some reading in the texts on type systems that he has recommended, but haven't finished. Here are some thoughts that have arisen so far.
* Costin is correct, because he's comparing it with much more complete and much more formal treatments. That's not to say that ChrisDate is utterly incorrect every time he opens his mouth on the topic. It's just that Date only addresses a subset of the issues -- and Date has gotten some of those issues wrong for long periods of time, as he himself admits more recently. He's not a good authority on types, although many of his discussions are adequate if one understands their limitations. See...what's the c2 page/s for Cardelli et al? Anyway, Date isn't '''always'' wrong, but this topic really isn't in his area of strength and background, so he does tend to be run off into the weeds.

The referenced treatments of type systems seem to be based heavily on types composed from parts, i.e. structure-like and union-like systems. There seems to be an alternate school of thought on types based on behavioral interfaces, though, which largely treats the values described by type systems as atomic. I haven't read far enough yet to see how/if these views are reconciled. At some level, the two views have to touch; composed types are, after all, ultimately composed from atomic, built-in types.
* They do have to touch, but Date's view is not particularly informed by theory on that topic, despite the fact that he seems to like taking a theory-based approach in many other areas.

In D&D's type system, composed types are described entirely by tuples. Collections of values are largely described by relvars, as collections of tuples. (The possibility of other collection types is left open, but let's leave that aside for now.) Other types (all of which are considered ''scalar'') are considered opaque in structure -- and in fact D&D explicitly omit discussion of their internal structure and how to describe it. This, I believe, is one of the aspects that Costin most takes issue with. In this respect, for the domain that D&D are primarily concerned with, they don't really differ from built-in types, so I don't really see the lack of a formal means of describing their internal structure as a debilitating impediment.
* Well, yeah...as Date frames the discussion, he doesn't leave much room for anything other than new types being formed by composition of old types, plus some validation of what's allowable and what's not (which TopMind seems to have latched onto). That's not so much completely wrong so much as it is non-theoretically-based and ad hoc. It's kind of a 1960s approach to types, minus the 1960s understanding of topology and its derivatives (operators as mappings between spaces). It is, in short, highly unsatisfactory and overall uninformed and naive, even though it makes a lot of pragmatic sense in the ways that Date discusses it.
* Put another way, it's not necessarily debilitating to the pragma of an everyday SQL programmer, but no way in hell is it adequate theoretically; it's not a point of view that is even informed on the theoretical issues. That's not inherently the same as "wrong", but it does make it more difficult to distinguish where he is wrong and where he's kind of ok rather than on thin ice (or wrong).

The introduction of THE_* operators and their use as pseudo-variables (assignment targets) seems to imply structure in these scalars. But I think that the explanation of assignments to THE_* expressions as shorthands for construction of new scalar values followed by assignment to variable, updating the ''entire'' state of a scalar variable, is an adequate formalism that allows one to continue considering such values as indivisible scalars. The question of ''what'' computations, exactly, are done by such functions is not really at issue; they can be considered as built-in manipulators of opaque, built-in scalar types, without invalidating the rest of the discussions which focus on the relational system.
* ChrisDate says explicitly (AnIntroductionToDatabaseSystems, page # available on request if it's not handy in the index (I have notes)) that THE_* assignment is SyntacticSugar, that it decomposes to multiple assignments to the components of the domain in question -- DMe
* Sorry, but that's incorrect. Assignment to a THE_ pseudovariable is shorthand for accessing ''other'' components, and passing their values and the assignment source value to a selector (aka constructor), defining a value that is the source for a ''single'' assignment to the complete variable. Which exactly supports my point. See page 126 of the second edition. The fact that the book doesn't address the definition of new atomic types is immaterial to the topics it covers.
* Example from the book: '''THE_X(P) := L''' is equivalent to '''P := POINT(L, THE_Y(P))'''. -- DanM

* '''Second edition'''? He frequently refers explicitly to corrections of errors (factually incorrect statements, not minor typos) in older editions; it's probably time to buy a later edition. Due to huge volume of class use, you can usually get them dirt cheap in like-new condition (modulo back to school buying sprees, I suppose) 
* I don't see how that contradicts what I said; maybe I used a poor phrasing? Explicitly, Date said (section 8.3 ''Attribute Constraints'' p 252 of 7th edition, in footnote): "In other words -- and despite the fact that we did not mention the point explicitly in Chapter 5 -- THE_ pseudovariables are logically unnecessary! That is, any assignment to a THE_ pseudovariable is always logically equivalent to (and is in fact defined to be shorthand for) assigning the result of a certain selector invocation to a regular variable instead."
* Wait, do I just have brain freeze? You're '''only''' talking about the multiple versus single assignment issue?? Yep, I think I was badly confused. But I'm not clear yet, because now it seems like a terminology issues. Tuples can be "scalars", and assignments can treat them as atomic, but their components are in fact visible otherwise, not opaque...aren't they? You're saying otherwise? Why? Gack, I should've just thought about it and re-read what you already said and not replied until later in the day, but I'd already typed in the quote above and saved. Sorry if I'm being obtuse; I'm not completely awake yet. -- Doug
* Yes, '''second edition''' -- and the '''''third edition''''' is scheduled to come out very soon! Chapter 6 was extensively revised in the second edition, and the preface explicitly mentions revisions to discussions of read-only vs. update operators, selectors, THE_ operators, tuple types vs. possible representations, multiple assignment, type constraints, datatabase constraints, and relvar and database predicates. 
** Quick note on just this: oh, you're talking about TheThirdManifesto. As noted above (possibly added after the fact by a WikiGnome, I don't recall), I've been exclusively talking about AnIntroductionToDatabaseSystems...and the page number you cited (126) is very close to being accurate for the latter book in 7th ed. (pg 117 discusses THE_ and cartesian points) -- just a coincidence, and you really did mean TheThirdManifesto? Ok. BTW in that regard I found Costin's recent comment on FundamentalsOfObjectOrientedDatabases interesting, did you notice it? -- Doug
** Oh, criminy. Yes, my bad, you did mention the book you were using. I use TTM for these topics, because it contains far more relevant detail. Regarding CC's remark: I remember noting the comment, but don't recall its substance ATM. -- DanM
** I forgot to mention that I asked because of comments of yours I just noted on page TheThirdManifesto about OO, and Costin's comments on page FundamentalsOfObjectOrientedDatabases relate to that. For the similar reasons, you may want to look at ExtendedSetTheoryStorageModel (offshoot of ExtendedSetTheory) and InheritanceIsNotSubtyping -- Doug
** Having refreshed my memory: I'm skeptical -- I know that I've run across references to older writings by, at least, Darwen in academic papers. Also, I wonder how much academic writing on the theory of OO databases actually existed to be referenced when TTM was first published in 1998. I didn't run across any, and that was about the time that I was hunting for information on the topic. (Perhaps it just wasn't referenceable via the Internet yet.) It is certainly a potentially valid criticism if this wasn't addressed in the second edition. How relevant it is depends on the content of the academic work, however. I'll have to take a look at FundamentalsOfObjectOrientedDatabases, assuming it's representative. -- DanM

* My point is that the structure of scalars is not apparent; they can always be thought of as atomic types, analogous to built-in types. Component accessors can be thought of as functions. THE_ pseudo-variable assignments can be thought of as atomic assignments to the argument variable as shown in the example above. Scalar types can be treated in all ways as analogous to primitive, atomic, 'built-in' types. The type theory discussions I've seen always seem to assume that some base types of this sort exist as a starting point. Thus I don't see that Costin's objections are relevant; the definition of new scalar types is simply not important to the material discussed in TTM, and the examples of such definitions given in the text are clearly ancillary to the main topic material.
* Tuples, on the other hand, are ''not'' scalars. Tuple types are composed (is that the right word?) types in the relational model as described in TTM, and are analogous to the composed types in discussions of type theory. If there are aspects of D&D's treatment of tuples that is in conflict with type theory, that would be significant.
* You're right that tuples -- and even relations -- can be treated atomically in many contexts. But their components (attributes) have a special status that scalar components don't, because of the role of attributes in relational operations. But yeah, you can draw analogies between tuple attribute access and scalar component access. Is that important? I don't know. -- DanM

There is a little more detail on this topic in a paper titled "MULTIPLE ASSIGNMENT" by Date, available at http://www.dbdebunk.com under Publications, but you have to pay for access to it. The paper is not, I think, essential to understanding the topic, but goes into more detail on the semantics of multiple (aka parallel) assignments. Of necessity, it discusses the role and meaning of THE_* pseudo-variables in such assignments.

D&D's work seems to implicitly assume manifest typing throughout. (At least I don't remember an explicit discussion of this as a choice among alternatives.) I've been playing around with an implicitly typed relational interface, and don't see any particular problems with such a system from a logical point of view. There are issues with implementation and performance, of course, but they don't seem insurmountable at this point.

These thoughts represent a small snapshot of my current thinking. I intend to do more reading on formal type theory. -- DanMuller

----
'''On Relation-Valued Attributes (RVAs) in TTM'''

One thing I haven't fully grokked is Date's treatment of avoiding nulls by using nested relations. For one thing, Date advocates using the latter to avoid the former, but then he elsewhere advocates avoiding the latter when possible, which seems like blowing hot and cold from the same mouth -- and so I wonder if any real world system has implemented his advice (which I doubt), and if not, is it because he's hand-waving, and his advocated approach actually '''is''' self-contradictory? -- DMe

I think "avoiding nulls by using nested relations" is a mischaracterization. Can you elaborate on where you got that impression from? D&D merely point out that there's no reason not to allow relations (or tuples) as attribute values, and they suggest some operators to help work with such. They do recommend against using such an arrangement in base relations, a recommendation that makes good sense to me. I don't remember a direct connection between this topic and nulls. -- DanM

In Chapter 18 ''Missing Information'' section 18.5 ''Outer Join (A Digression)'' page 599 of 7th edition, in discussing an issue where an inner join can, loosely speaking, "lose information", where an outer join does not, but produces NULLs, Date eventually says "''relation-valued attributes'' provide an alternative approach to the problem anyway -- an approach that does not involve nulls and does not involve outer join either and is in fact (in this writer's opinion) an altogether more elegant solution."

But of course in Chapter 11 ''Further Normalization I: 1NF, 2NF, 3NF, BCNF'' section 11.6 ''A Note on Relation-Valued Attributes'', pg 372 of 7th ed., Date says "it is possible for a relation to include an attribute whose values are relations in turn...relvars can have relation-valued attributes, too. From the point of view of database design, however, such rerlvars are usually contraindicated because they end to be ''asymmetric'' <footnote_below/> -- not to mention the fact that their predicates tend to be rather complicated! -- and such asymmetry can lead to various practical problems."

<footnote>"Historically, in fact, such relvars were not even legal -- they were said to be ''unnormalized'', meaning they were not even regarded as being in 1NF [10.4]. See Chapter 5"</footnote>

I understand that he's absolutely against NULLs, and that nested relations are just one of multiple ways to avoid NULLs, and that he isn't absolutely against nested relations, but nonetheless, it's partially conflicting advice, thus...well, thus my original paragraph above.

Hmmm...maybe I'm being obtuse again. The point I suppose is closer to "as a rule of thumb, '''do''' use nested relations in derived relvars, but '''don't''' use them in base relvars", which means there's less conflict than I had been thinking. I'm still not entirely happy with that, though, because I like things to be consistent and FirstClass in general, and did in fact just mention yesterday that I like the idea of using views fairly universally, and this advice would definitely demote views to second class citizens. It pierces the (hypothetical ideal) veil making base and view tables appear identical. -- DougMerritt

Yes, I think it has the status of a 'rule of thumb'. Logically, there's probably no particularly reason to prefer one over the other, although I think some functional dependencies of attributes in a child record on a parent record's key may be more readily apparent in a base schema if it eschews RVAs. OTOH, using RVAs gives you 'cascaded delete' and 'cascaded update' functionality for free, in a sense.

RVAs would really be useful for applications. Consider, for example, how much simpler the code to load an object's state would be, when the object hierarchically contains other objects. Or how much simpler code to drive report generation could be.

In cases where a relationship between two tables wasn't strictly a parent/child relationship, using RVAs in the base schema to represent one of them would seem unnormalized, although I haven't considered if it would be according to the current formal definitions of normal forms.

-- DanM

Another thought on RVAs: One of the advantages I've seen touted for business objects over relation-based interfaces to databases is the ease of navigating relationships. I think that RVAs potentially negate this advantage by making navigation just as easy with a direct representation of tuples and relations. Details could depend on the programming language and typing system, though.

-- DanM

----
'''Regarding Off-Topic Pages'''

Although blatantly off-topic pages are sometimes interesting, there's a continuum ranging from interesting trivia (e.g. the pages about countries and movies) to topics that, to treat them with anything approaching sober discourse, require an entire wiki of their own. Those same topics also tend to draw the most yahoo-out-there rhetoric. This quantity and type of rhetoric detracts from the purpose of this wiki in exactly the same way that spam detracts from the purpose of my email inbox. The analogy to spam is very close; both are pushed by people who rudely ignore the purpose of the venue. Both email and wiki spammers might argue that moving such topics to a less visible venue would deprive their message of attention, and they're right, and I'll even admit that there's an occasional legitimate lost opportunity to evoke someone's further curiosity on an important topic. I have occasionally gotten interesting spam in my email; but on principle I delete them without investigating further. The more common loss is the time spent to wade through the crud to find the legitimate email, which has a higher likelihood of relevance to my interests.

In an age where ''attention'' is a critical commodity, spamming of all sorts (inappropriate use of a communications medium) amounts to attempted ''theft''.

Attentive wiki readers will note that I have occasionally made minor contributions to such pages in the past. I will refrain from doing so in the future. OTOH, I may continue to contribute to off-topic pages that I deem to be non-controversial trivia.

-- DanMuller


----
CategoryHomePage