One of the key advantages that I see in using a PersistenceEngine over a RelationalDatabase is the simplifying assumption that the application developer has an InfiniteAmountOfTransactionalMemory. Obviously this assumption is a lie, but I claim it is a UsefulLie for at least the initial stages of an application's life (possibly for the entire life of an application). I claim the advantages hold for applications that are OLTP ( http://en.wikipedia.org/wiki/Online_transaction_processing ) in nature rather than OLAP applications. For OLAP ( http://en.wikipedia.org/wiki/Online_analytical_processing ) applications, a RelationalDatabase's dynamic query optimization capability and the ease of mapping SQL to a user-friendly GUI are too advantageous to give up. So, I'd like to constrain the discussion on this page to OLTP applications. Furthermore, I assume that YouArentGonnaNeedIt, OptimizeLater, and OnceAndOnlyOnce are all worth pursuing. Key advantages to assuming an InfiniteAmountOfTransactionalMemory: 1. Assuming an InfiniteAmountOfTransactionalMemory provides all the advantages of ObjectIdentity. 1. Polymorphism is handled exactly to the degree that the execution language supports it. 1. The PersistenceEngine handles bringing objects in and out of the persistence store and all of the associated caching issues. 1. There is no application code required to map the in-memory data model to the disk based data model, thus preserving OnceAndOnlyOnce. 1. Depending on the transaction models supported, no application code is required to implement object locking or synchronization (YouArentGonnaNeedIt). -- MarkAddleman ---- And, due to the complete reliance on the host language for all data processing... ''there is almost no "ImpedanceMismatch."'' ''("almost no" because transactions and root objects impose some minimal inconvenience.)'' ---- ''I needed to think a bit to clarify what bothers me about this. I'm really talking about scaling of the memory space to cover every single object. If it's OLTP then I assume you mean multiuser with few locking issues since each session doesn't share much data. It doesn't take many users to thrash a virtual memory system since it must write to disk on transaction boundaries. These considerations don't apply to a heterogenous system such as in OODBs. I think OODBs are rapidly converging with relational databases so scaling issues can be mitigated. OODBs can present an illusion of transparency, but it is just that. I prefer to be more explicit with moving objects in and out of the persistent space. This reflects the real situation, helps focus programming, and doesn't distribute the associated persistence attributes outside the code, in some (proprietary) metadata. --RichardHenderson'' ---- * I'd be more concerned about potential concurrency issues, because OODB locking tends to be course grained (all the object instances that are physically on the same page). But what I've read says that you should cluster objects that are likely to be used together, which seems like a good idea for both locking and performance. * OODBs that are "page servers" would probably be more lightweight, and hence scalable, for any given server configuration; most of the work is being done on the database's client. Client app. defined locking would also tend to dramatically improve server efficiency and scalability - at some cost in client code design difficulty. -- JeffGrigg What do you mean ''these considerations don't apply to heterogenous systems such as in OODBs''? ''OODBs, at least the better ones, do not think that paging virtual memory is the same as a transactional PersistenceMechanism, i.e. they do a basic relational mapping into a structured file-system with a big fat memory cache. If this were not the case then there would not need to be metadata, because your memory would be transactional and persistent. I suspect you apply your thesis to OODBs. I disagree. I have used systems with truly persistent memory (super-reliable telco boxes), and they work for any language that can run, with some minor generic configuration. Most of my arguments follow from this distinction. --RichardHenderson'' ---- ''I prefer to be more explicit with moving objects in and out of the persistent space. This reflects the real situation, helps focus programming, and doesn't distribute the associated persistence attributes outside the code, in some (proprietary) metadata'' Under what circumstances does it help focus programming? I find the situation is exactly the opposite. Moving things in and out of memory gets in the way of focusing on coding business logic. Perhaps I haven't worked on a system with a well-defined and somewhat hidden persistence layer (btw, I actually probably haven't!). ''You have if you've used an OODB. They put things on disk with primary keys and indexes and all sorts, albeit somewhat hidden.'' As far as proprietary metadata... I agree that one of the bigger risks that you take with an ObjectOrientedDatabase is vendor lock-in. I've worked on projects migrating from one RelationalDatabase to another and that's trouble enough. And I'm currently involved in migrating a very proprietary persistence scheme to GemStonej. Fortunately, many of the issues that could crop up don't because the old system was written entirely in Java so, aside from putting in GemStonej specific transaction code, all we have to do is bring the system up in GemStonej, let it read all the old persistent objects and write them out through GemStonej. I doubt it would be this easy in general, though. ''I'm thinking more of having a single object with code all over the place.'' One of the underlying assumptions of YouArentGonnaNeedIt is that the ExpectedNetPresentValue cost of changing your mind later is equal to or less than the cost of making the decision now. It would be interesting to see an analysis of the costs of switching from one proprietary PersistenceEngine to another all else being equal. Probably the biggest 'all else equal' is keeping the implementation language the same. ''The problem (well one of them) lies in efficiency. A non-trivial system has to consider the efficiency of its slowest elements, usually the transactional/persistent bits.'' This is true. However, this is a problem with a system of any size regardless of the initial development assumptions. To design a system for efficiency from the outset violates OptimizeLater and is probably counterproductive (at least it is in the real world examples that I've seen). ''Fair enough. You qualified your initial statement well in this regard. Moving from a homogeneous space to a partitioned space is not too difficult, though it can be quite tricky if the original object graph is very coarse grained.'' What do you mean by coarse grained? A few large objects as opposed to lots of little objects? I find that when I don't consider persistence issues (particularly if a RelationalDatabase is not involved), my objects are much smaller and I have lots more of them. ''And what exactly are your criteria for the quality of a domain model (the analogue to database schema), be it with large objects or with small objects?'' -- CostinCozianu I suspect we want a page called ObjectModelQuality. I think the short answer is OnceAndOnlyOnce. In an object model, this applies not only to data, but to code as well. See CodeNormalization. ---- ''I suspect you apply your thesis to OODBs.'' Well, I was thinking of OODBs when I wrote it (or, more accurately PersistenceEngine''''''s), but my thrust is that an InfiniteAmountOfTransactionalMemory is rather nice and that to the extent that we can construct an environment to simulate that, our lives as application developers will be easier. ''I have used systems with truly persistent memory (super-reliable telco boxes), and they work for any language that can run, with some minor generic configuration.'' Apart from some Y2k work, I've never really examined embedded systems like that. Are these systems transactional? Do they use flash RAM or some other persistent chip-based technology or does the embedded OS provide you with a really fancy virtual memory system? ''Well, since you ask :). The big ones were switches for mobile networks. They were hotrodded SPARC boxes with redundant everything. Memory was battery backed with disk for persistent data, cached in memory. Transactionality wasn't really relevant since only one client could change the configuration. The memory was actually mirrored so that entire boards could be changed on the fly. Embedded versions use Flash memory. You've probably seen them in switch/routers and fast networked disk arrays. Nothing stopping a similar technology underlying our hardware. At the end of the day, however, granularity considerations for locking will tend to move one towards a more explicit persistence style IMO. -- RichardHenderson.'' ---- I guess I was the biggest sinner lately generating lots of ThreadMode discussion, thus I'll try to contribute to refactoring of the discussions related to ObjectRelationalImpedanceMismatch. On this page, I'd like to see some issues addressed distinctly. On the programming model itself: *Is the model described here the default programming model for object databases in general? *Are there exceptions in the object database world to this approach? Then let's try to distinguish some potential drawbacks of this model: *Losing database independence with respect to application. See DatabaseApplicationIndependence *Related to the above what is the quality of an object model and how can you make sure that DatabaseIsRepresenterOfFacts in the case of a object database? Relational databases have traditionally been the preferred solution of data modeling experts, and at a minimum have some well defined quality criteria - the ''normal forms''. Even when a DBA decides to break the normal forms introducing redundancy in order to gain efficiency, RDBMSes offer some workarounds in order to enforce consistency. How you do that in this model? *In order to achieve this model object databases usually have to resort to a modified runtime. This introduces further problems. *What happens if ''client code'' runs away; is it possible to constrain custom code running inside an object database? *OO languages have limited (that means none) query capabilities, and no operation on sets. This has been addressed partially by ODMG 3.0 and OQL, but MarkAddleman told me elsewhere that he's likely to write some hand made collections of references to support queries. *Limited concurrency control flexibility. This has been discussed so far. Maybe we should move some discussions to separate pages?? There are also some other issues, but these are the most obvious. -- CostinCozianu ---- ''In order to achieve this model object databases usually have to resort to a modified runtime. This introduces further problems.'' To the extent that this statement is true, it is no more a problem than relying on ODBC or JDBC. ''There's no comparison. A typical client database library will not modify your runtime, it's just like any other library you use''. Depending on the PersistenceEngine, this statement is not true. Voss, Tenacity, OmniBase, OzoneDb, and JydJavaPersistenceEngine do not run in modified execution engine at all. All but Jyd rely on some form of proxy mechanism (more or less transparent to the application developer). The JydJavaPersistenceEngine requires the programmer to add one of two method calls to each instance method. ---- There are several possibilities for this model to work: *automatic detection of updates through modified runtime (either VM modification or compiled class modification) *automatic detection of updates through reflection *manual notification by the programmer to the framework *generating a framework related abstract class encapsulating data, from which the programmer will derive and use its services like in CMP EntityBean''''''s which is also an illustration of this model. The most efficient by far is the modified runtime. I believe GemstoneJ is using this approach. The others have several drawbacks, including efficiency. -- CostinCozianu ---- ''OO languages have limited (that means none) query capabilities, and no operation on sets. This has been addressed partially by ODMG 3.0 and OQL, but mark told me elsewhere that he's likely to write some hand made collections of references to support queries.'' See OperationsOnSetsInSmalltalk Depending on the PersistenceEngine, you get either some or no query facility. This isn't an issue because we're not talking about performing arbitrary queries on a data set. We're talking about retrieving known objects from known locations using predetermined access paths. This is typical in OLTP applications. ''Not likely, or not what I've seen. At most an OLTP might need a predetermined set of templates for queries. This is valid for one application, for several applications accessing a database (remember DatabaseIsRepresenterOfFacts) you need the flexibility of a full blown query engine and a query optimizer. Not to the same extent as in an OLAP, but you need it.'' -- Costin I'm not ''likely to write'' anything. I have thought about writing my own index facility to support hashtable or b-tree indices on non-primitive datatypes. I haven't written it yet, because I don't need such a thing yet. But if I do write it, I only intend on writing it once. See OaooApproachToIndices. To set the record straight, GemStone and GemStonej have query facilities and indexable collections. The query facility supports SQL-like queries (whether they're OQL queries, I do not know). [DeleteMe]''At risk of being off topic. How are the indexes specified? Metadata schemas or in code?'' Speaking of Gemstonej queries, I couldn't help but notice that a pretty knowledgeable Gemstone user complained on usenet that it cannot use a damn report generator. So what really are the query capabilities of Gemstone, and how does this relate to DatabaseIsRepresenterOfFacts and DatabaseApplicationIndependence? ''Perhaps if the user's report generator weren't damned, he would have better luck ;)'' Sure, except that the message was posted on news://comp.database.object and no one of the object gurus hanging there could offer a solution, only palliatives. ''As for DatabaseIsRepresenterOfFacts and DatabaseApplicationIndependence, I'm not sure I agree with the assumptions or the usefulness of either one of these points. I'm still mulling over my thoughts on this. -- MarkAddleman'' Now, of course both of the principles are not waiting for Mark's approval, they are happy even without :) So if you are not representing facts, what do you represent? Oh, I remembered, you represent objects. Maybe you should define your position in DatabaseIsRepresenterOfObjects and the first question to ask is whether it is a DesignPattern or AntiPattern. -- Costin I see Costin's point, and yet I am overwhelmed by the beauty of yagni as it applies to this discussion. Write your application as if it is privy to infinite transactional memory. You are then effectively using an object database at minimal cost by default, and you don't need to really care about whether or to what degree that is true. Should you run into pain, such as a need to support a wide range of adhoc queries or to share data with an application that prefers to talk to a sql database, then by all means add code to transform data between your application's memory and the database, in a way that best suits your situation at that time. If you never feel that pain, you'll never have to make such an investment. Also, in this age of the Web, it is possibly more likely that applications will want to speak HTTP to each other (REST) rather than share a sql database (or for more speed, flexibility and asynchrony, a binary message protocol such as AMQP). That's arguably a cleaner approach anyway. -- Charlie O'Keefe {Really? How is that a cleaner approach?} Arguably :) My main reason, I think, would be fewer interfaces. An application with a shared RDBMS together make an application that exposes all its interfaces plus an additional SQL interface to the outside. Supposing it is a webapp, it has to speak both HTTP and SQL with the outside world. It strikes me as simpler to have an application that speaks only HTTP to the outside world. If it is well-designed, that interface can be made useful to both human and computer clients. (I'm fine with calling the RDBMS a pseudorelational database for those on this site that are concerned about the distinction.) This might be venturing into DatabaseApplicationIndependence. ---- Have you seen an interesting approach for this in http://www.prevayler.org? There are also a project, inspired on first, hosted by Java.Net (https://space4j.dev.java.net) See and have fun... -- Yuri