''This is a '''design''' exercise for Java...'' '''Introduction''' It is generally accepted that the collection '''hierarchy''' problem has never been definitively solved for object-oriented languages, especially for those object-oriented languages like Java that supports both primitive types ''and'' classes. The solution that probably comes closest to being definitive is the SmallTalk collection hierarchy. SmallTalk provides a wide variety of data structures (both abstract and foundational) along with a consistent idiom for iterating over collection elements. Each collection class has a set of HigherOrderFunction''''''s used to apply ''blocks'' to their contents. However, I am of the opinion that even the design SmallTalk Collection Hierarchy is flawed in several areas. For example, the Collection Hierarchy classifies a Linked Lists as a ''kind-of'' integer indexed sequence (i.e. by subclassing ''Sequenceable''''''Collection'' or the older ''Indexed''''''Collection''). This makes it difficult to create generic queues or stacks that can adapt the ''S''''''equencableCollection'' protocol without having to consider the implementation details of the concrete class that is implementing that protocol. There is a rule being broken here somewhere. After all, if I have to consider a concrete-implementation when using an abstract interface, something is wrong in the classification scheme. A client cannot assume that the random-access provided by the Sequencable protocol will ''behave'' the same for a list as it does for an array. Worse yet, even if the SmallTalk solution ''were'' perfect, it would still not be an appropriate solution for Java. After all, SmallTalk has a dynamic type system - ''everything'' is an object. In Java, we need to consider collections that are keyed from primitive types such as integers as well as first-class objects. This immediately complicates an optimal solution. The ''getAt'', ''addAt'', ''removeAt'', or ''indexOf'' methods of a Java array type will '''most naturally''' deal in the primitive ''int'' type and not the ''Integer'' class. The STL provides a great solution for GenericProgramming. Unfortunately, it is not an OO solution. A GenericProgramming solution has little use for establishing common interfaces using inheritance. Instead, it can use genericity with ParametricPolymorphism to enable common interfaces using different types. This is why an STL approach in Java (such as the JavaGenericLibrary or JavaAlgorithmLibrary) is usually unwieldy, overly complex, un-extensible, and inefficient. Unfortunately, even if Java did support ''parametric polymorphism'', most of the proposals on the table cannot use both primitives and classes as template types. This exercise need not result in a good Collection Hierarchy design. Maybe the exercise will discover that the object-oriented design possibilities for Java are compromised by its type-system. Hmmm.. I guess we already know this. However, my guess is that it will at least give us some hard evidence or make more concrete the reasons why this sort of ''straddle the fence between dynamic and strong'' type-system has weaknesses in an object-oriented language. If nothing else, it should show the complexity that Java's support for both primitive values and objects adds to a data structures foundation. ---- '''The Exercise ''or'' Your challenge if you choose to accept it...''' Design an extensible class hierarchy for a set of ''foundational data structures'' and ''abstract data types'' that is suitable for languages like Java. You don't need show all the interface members - just enough to demonstrate how the class tree and its interfaces work. You won't need to implement any of the interface members; just declare them. Furthermore, we don't care about names - this isn't an exercise about class naming. What I '''am''' interested most in is the '''inheritance''' and '''aggregation''' relationships and their impact on interface based programming. For example, what happens if a doubly-linked list is the subclass of an integer indexed sequence? In the Java collection classes, you can use a ''List'' interface. However, this spells bottleneck if you then write the following code based only on the interface: void some_func( List list, int at, String spec ) { if ( list.get( at ).toString().endsWith( spec ) ) list.remove( at ); } You see, while ''L''''''inkedList'' and ''A''''''rrayList'' share the common ''List'' '''interface''', they actually end up requiring a different programming approach to solve various problems. To me, this is a blatant design flaw. The common interface deceives the programmer into thinking he is dealing with the same abstract when, in fact, a list and an array are completely different abstractions with disparate behavior. '''Requirements:''' The three requirements are to (1) create an extensible hierarchy with basic as well as abstract data types, (2) allow for intrusive as well as non-intrusive collections, and (3) assume HigherOrderFunction''''''s for iteration rather than decoupled GangOfFour or STL-style iterators. '''The first requirement''' suggests the need to represent both ''foundational data structures'' (i.e. lists or arrays) and ''abstract data types'' (i.e. sets, stacks, or maps). Some people consider that classes in the second category are''adapters'' for the classes in the first category. This results in a collection hierarchy that uses aggregation as well as inheritance. Others decide this distinction is not important and put the class from both categories into the same hierarchy. Which approach you take is up to you, as long as there are examples of each and new data structures can be added using the existing interfaces. '''The second requirement''' in its most simple form assumes we will have a traditional object-oriented linked-list where the node is an object-wrapper class that is hidden in the implementation of the list ''as well as'' an intrusive list that exposes the node interface and requires that elements subclass this type. Here is an example of each: ''class I''''''temList'' public void add( Object el ) { Snode node = new Snode( el ); node.setNext( m_end.getNext() ); m_end.setNext( node ); } ''class S''''''nodeList'' public void add( Object el ) { Snode node = (Snode)el; node.setNext( m_end.getNext() ); m_end.setNext( node ); } These two classes are the minimum one can do to satisfy the second requirement. Of course, it would be nice if the collection hierarchy included (or at least ''allowed'' for) intrusive and non-intrusive lists that are singly-linked as well as doubly-linked. Also note that you may want to consider the Item List an abstract data type and the Node List the foundational data structure it is implemented with: class I''''''temList implements Linear''''''Collection { private static class Link extends Snode { private Object m_item; Link( Object item ) { m_item = item; } } private Snode''''''List m_rep = new S''''''nodeList(); public void add( Object el ) { m_rep.add( new Link( el ) ); } } In this scenario, the foundational structure like ''resizeable array'', ''doubly-linked list'', and ''singly-linked list'' need not even be part of the ''primary'' collection hierarchy. '''The third requirement''' is for InternalIterator''''''s. While you do not ''have'' to show the implementation of the internal iterators, you can definitely '''not''' rely on ExternalIterator''''''s to abstract away the differences between the data structures. The rationale is that we want a natural collection hierarchy. We don't want big coupled classes that simply hide the details of each collection. Not only is this non-optimal, it also just isn't as clean as the internal iterator approach taken by SmallTalk (see: InternalizeExternalIterators). Whether it is true or not, this exercise assumes that '''external iterators are a sign of weakness in object-orientation''' (See: ftp://ftp.netcom.com/pub/hb/hbaker/Iterator.html). ---- '''Why a List is not an Array ''or'' Observing the true Nature of a Collection''' Just as there are many DesignPatterns, each with different behavior and applications, so are there many data-structures and algorithms. A set is not a bag and an array is not a tree. Many of us will take the time to understand the difference between a BridgePattern and an AdapterPattern but know very little when it comes to the best uses of a list over an array. Object-oriented programmers are particularly weak in this regard and many never go beyond the use of an Array (or ''O''''''rderedCollection''). Many just don't grok the situations where a list is more appropriate than a vector. Why is this? Why do we celebrate the differences between DesignPatterns - and discuss the appropriate usage of each - but yet we are happy to hide away those same differences in data-structures and algorithms? Just as with DesignPatterns, it is in their differences that their strength and power can be found. Most of us know that arrays provide random-access to its elements in constant-time. We also know other stuff about arrays, such as inserting an element requires that all elements trailing the insertion point be copied. Furthermore, we know that random-access in an linked-list performs only in linear-time. For example, if we want element N of a linked-list, we must iterate from the start of the list to N elements. This is the case for positional insertion, removal, or access. Consider the following: public Object at( final int nIndex ) { return ''detect''( new Block() { int nItem = 0; public boolean is( Object each ) { return ( nIndex == nItem++ ); } } ); } This method ''attempts'' to provide a member that allows its clients to ''believe'' that this linked-list has the same ''random access'' features as an array. But '''this is not the case'''. The truth is that '''random-access is not in the ''nature'' of a linked-list''', at least not as it is for an array. Random-access in a list has different behavior than random-access in a linked-list. However, most collection hierarchy designers are happy to have both Lists and Arrays descend from the same interface that provides random-access. This fools a client into believing that they can simply use this interface without regard to what concrete class it is instantiated with. Because a linked-list is a ''linear'' collection, we '''can''' access its elements by ''position'' as shown in the list implementation of #at. However, this is not that same as ''indexed-access''. We have to ''make'' a list look like it is integer-indexed by writing ''adapter'' code - we adapt the nature of a list (linear links) to support random-access features. Despite this fact, both SmallTalk '''and''' Java subclass their array and list classes from an indexed collection protocol. This means if you are a programmer with only a reference to this indexed protocol, you cannot use it generically and expect the same behavior. If a list isn't an indexed-collection, what is it? Well, it is a linear ''sequence''. If it is a singly-linked list, it is a Forward Moving Sequence, and if doubly-linked it is a Bidirectionally Accessed Sequence. This allows a list's elements to be accessed positionally, but not ''indexed''. In a list, we are not accessing the item #at '''n''', we are accessing the '''nth'' element. I hope this difference is clear. Contrast the previous list implementation of #at with this implementation for an array: public Object at( int nIndex ) { return m_contents[ nIndex ]; } In this example, the integer argument '''is really an element index'''. This is a ''natural'', constant-time operation for an array type. There is no way to get around it - '''lists are linear, they are not random access'''. It is my opinion that providing a common interface whose concrete classes have wildly different behavior is a design flaw. Generalization does not mean we should ''trick'' a client into believing that ''at(int)'' on a list is behaviorally the same as ''at(int)'' on an array. The common interface for a list and array is not an indexed-collection (such as ''java.util.List'') with members like ''addAt(int,Object)'', ''getAt(int)'', or even ''removeAt(int)''. Instead, it is a ''Sequence'' with members like ''addFront(Object)'', ''addBack(Object)'', ''getFirst()'', ''getLast()'', ''removeFront()'', ''removeBack()'', and various InternalIterator''''''s. It is simply an error to index a linked-list and expect it to have the same behavior as a indexing an array. We allow our clients to produce better code when we ''enable'' them without violating ''abstraction'' and ''generalization''. We should always keep this tension between '''natural''' and '''artificial''' members in mind when designing classes. Think about the '''behavior contract''' for each data structure - are we doing something that can mislead our clients? Try to create a difference in your head between ''abstracting or generalizing'' and ''obfuscating''. '''When we design, we want to abstract, not obfuscate'''. It is interesting to note that in generic programming, libraries like the StandardTemplateLibrary do not even provide a positional helper method for linked-lists. To access an element by position, the programmer has to iterate until the element at ''N'' is found. Only vectors, and other truly random-access collections, support the ''getAt(int)'' interface member. This results in preventing the programmer from making incorrect choices. The interface members do not betray the programmer. '''One Failed Attempted''' Now, the simplest way for me to think of all of this was to make ''Array'', ''I''''''temList'', ''N''''''odeList'', and even ''Map'' all subclasses of ''Indexed''''''Collection'' - i.e. a kind of collection whose elements are accessed by some type of index. For ''Array'', the index would be an Integer. ''For the (D|S)''N''''''odeList'', the index is doubly-linked or singly-linked ''Node''. For ''I''''''temList'', the index was a node-type cast to an immutable ''Index''. And finally, for ''Map'' the index type was the association's ''key'' value. There are big problems to this approach for the JavaLanguage. First, object creation is very expensive. To make the ''Array'' use Integer '''objects''' instead of primitive ''int'' values incurs unnecessary overhead. Randomly accessing 1,000 elements would require the client create 1,000 objects!! Instead, we would like to use ''int'' indexes for the ''Array''. Unfortunately, this breaks the class tree since ''Map'' and ''List'' want ''Object'' indexes. Fooey! Personally, I was not willing to hoist the totally unnecessary overhead of creating an object onto my clients for each time s/he needed to index an array element. ''--RobertDiFalco'' ------ '''Discussion:''' ''No, no, Robert - Tell us what you '''REALLY''' think! ;->'' Just a random thought: Maybe the "intrusive list support" of an object would involve the object implementing an interface. Perhaps a convenient way to do this would for the "intruded upon" class to have an inner class "node" that inherits from some common base class and supports an interface for fetching next node, previous node, and data item. Hmmm... Offhand, this seems like an interesting mix of intrusive and non-intrusive implementation techniques; a class could participate in several lists at once, by having several inner classes. ''(I'm not making claims that this is all "right" - I'm just thinking out loud. ;-)'' -- JeffGrigg Jeff, I may be missing the point, but to me that ''is'' the big difference between an intrusive list (nodelist) and a non-intrusive list (itemlist). The nodelist requires that the client's elements implement the node interface - i.e. that elements ''are'' nodes. The itemlist deals with nodes internally and allows items to be of any object type. However, I probably placed too much importance on this side issue of intrusive versus non-intrusive data structures. The real design issue is how to best organize the fundamental collection types in a language like Java where you have typed methods and overloading, both Object and non-Object types, and no parametric polymorphism. I didn't mention the Eiffel Collection Hierarchy only because it seems the worst of the bunch. In SmallTalk, for example, I could ignore the differences between using a Node for the Index in Linked Lists, an Integer for the Index in growable Arrays, and any hashable object for the Index in Maps/Dictionaries because everything is an Object. I wouldn't make the same mistake that smalltalk did by assuming ''#at'' and ''#indexOf'' in the Sequenceable Collection protocol be an ''integer''. Then, because List and Array are both Indexed Collections, I could make Stack, Deque, Queue, or Priority Queue simply adapters that take an Indexed Collection as a construction argument. Unfortunately, doing this in Java would kill performance since you would have to create an Object index for Arrays, or use an integer index (an ''nth'' lookup) for linked lists. Let's consider the following. Say I create an Indexed Collection interface that used both Object indexes ''and'' integer indexes. Something like the following: interface Indexable extends Collection { Object at( int index ); Object at( Index index ); Object remove( int index ); Object remove( Index index ); Object append( int index, Object item ); Object append( Index index, Object item ); int nthValue( Object item ); int nthReference( Object item ); Index indexOfValue( Object item ); Index indexOfReference( Object item ); ''...and so on...'' } Then, say I did something like the following for linked list types: public Object at( final int index ) { int count[] = {0}; return detectFirst( new Block() { public boolean is( Object each ) { return ( index == count[0]++ ); } } ); } public Object at( Index index ) { return ((Node)index).getItem(); } And this for integer indexed types like dynamic arrays: public Object at( int index ) { return m_contents[ index ]; } public Object at( Index index ) { return at( ((ArrayIndex)index).intValue() ); } Besides being VERY complex (two versions of everything for each indexed feature), it does work for the adapters stack, deque, or queue. Why? Well, the adapters would have to make an arbitrary decision - do we implement ourselves based on integer indices or object indices? If the latter, we have to create a lot of unnecessary objects for dynamic arrays. If the former, we end up exponentially decreasing the performance of lists since we have to do a look up for each integer indexed access. See the problem? Now, I know Wayne would tell me that I shouldn't be concerned with performance. However, the ''reason'' this wouldn't perform well is ''because'' the design is overly complex and requires clients to make arbitrary decisions - i.e. ''Do I use integers or objects when I have an Indexed Collection but do not know its concrete type?'' ''-- RobertDiFalco'' ---- '''Questions:''' 1. ''What is the semantic difference between a ''key'', a ''node'', and an ''index''? What are their similarities?'' 2. ''What is the semantic difference between a ''dictionary'' that contains ''associations'', a ''map'' that puts and retrieves a ''value'' at a ''key'', and a ''map'' that puts and retrieves a ''value'' at a ''value.hashCode''? What are their similarities? Are there different classes here or are they all one class?'' 3. ''In an O-O system, is it possible that a List or an Array actually implement a ''Stack'' interface instead of the traditional approach where a ''Stack'' is created using a ''List'' or an ''Array'' instance?'' ---- Robert - you have done an excellent job of defining the problem. I have a feeling (but I don't do excellent job's, so I haven't given it enough thought) that there is a danger in mixing functional requirements with implementation approaches. If we consider only the functional requirements of the collections(keyed, ordered, access [fast|other] by key/position/rank, etc.), and leave unspoken any potential implementation strategy) we might get a different cut on the interface requirements and class relationships. ''You're right. Do you think I should take out all the implementation strategy suggestions and analysis? I have no trouble doing that.'' '''Goodness no!''' ---- OK, my attempt at a solution. I would start defining a couple of interfaces that simulate Smalltalk block objects: public interface Predicate { boolean isTrue(Object o); } public interface Operation { void performOn(Object o); } Then we have an interface implemented by unordered collections: public interface UnorderedCollection { void add(Object o); void remove(Predicate p); void iterate(Operation op); void iterateSome(Operation op, Predicate p); } ...and for ordered collections: public interface OrderedCollection extends UnorderedCollection { void addAtBegin(Object o); void addAtEnd(Object o); void addAfter(Object o, Predicate p); void iterateForward(Operation o); void iterateBackward(Operation o); } The class hierarchy would be so structured that all you typically use are the interfaces. This makes the hierarchy flexible. Perhaps the classes can even be hidden behind a Factory object. -- StephanHouben So, would you create an array that used objects for indices instead of ''int'' values? i.e. ''addAfter( new Integer(10), "foo" )'' instead of ''addAfter( 10, "foo" )''? If you support both, it is confusing to the user. For example - ''When should I use '''Integer''' instead of '''int'''. Can I use '''int''' indexes in the Linked List class? Why are there so duplicate methods?'' And so on... The more I think about this problem, the more I wonder whether there even ''is'' an elegant O-O solution for a language like Java. I think it raises interesting doubts about its hybrid type-system (i.e. both objects and non-objects). -- RobertDiFalco ---- It seems to me that the 'collection classes' in Java2 are very well-designed (very few flaws). See . The addition of Java Generics should get around the SimpleTypesAreNotObjectsProblem. -- KeithRay Well, we will have to agree to disagree. I really, '''really''' dislike the Java Collections. While I think the genericity promised for 1.4 is great, it will not fix design-flaws. You will have to read this page to understand why I think the Java collection classes are so poorly designed. However, you need only benchmark them to see the effect of this incorrect design. There are many other issues I can get into besides what I mention here, but this page goes into the major ones. -- RobertDiFalco Have you looked at TheEiffelLibrary? It uses a mixture of multiple inheritance and genericity. Yes, I have. Unfortunately, this wouldn't work for a language like Java that has both objects and non-objects. And, as far as I know, GJ doesn't allow one to mix both. ----------- Most tree taxonomies turn out to be flawed or arbitrary in the long run. There are many '''orthogonal''' characteristics to "data structures". Trees tend not to work very well with orthogonal traits. You can make each level represent a trait, but that is stretching things IMO. For one, how do you determine which trait gets to be higher in the hierarchy? The importance of each trait tends to be relative to needs, not absolute. ''There's nothing that requires an interface hierarchy to be a tree. In fact, in Java, the ArrayList class implements RandomAccess to mark random accesses as fast even though its primary interface is List.'' Another way to study data structures is perhaps to identify features rather than IS-A type classifications. For example, a relational view would be to consider a DictionaryDataStructure a two-column table with one column being a unique index. IOW, how many columns, how many keys, and the nature of those keys (unique, non-unique). It is more of a smorgy-style way to look at things: "A little of this, a dash of that". Perhaps some people grok the world better if they can find a way to view everything as a tree. But not me. ''Are you stating that you don't like class hierarchies in general? I don't think this exercise assumes an is-a or has-a preference. The different data structures can be modeled with inheritance, aggregation, decorators ala STL, and so on. Of course, I think some set of common interfaces are important if you ever want to return some hing other than a concrete class. Otherwise converting between data structure implementations can have a lot of repercussions throughout a code base.'' How about an example from practice. See DedicatedStructuresVersusRdbms. ---- What do people think of the Java version of Foundation Framework? The ObjectiveCee version was part of NextStep/OpenStep, and now is part of CocoaFramework. But the Java version is part of WebObjects, and includes collection classes like NSArray, NSDictionary, and so on. ---- A rather design is available at: http://www.brpreiss.com/books/opus6/public/classes.gif as part of the book Data Structures and Algorithms with Object-Oriented Design Patterns. ---- See also: DesigningBetterJavaCollections ---- CategoryDesignIssues CategoryJava CategoryHierarchy