An architecture for distributed systems based on the concept of reliable message queuing. Messages are queued asynchronously between applications and systems.

* Multiple producers can post messages to a queue.
* There can be multiple message consumers attached to a single queue.
* The queuing infrastructure ensures messages are delivered exactly once.
* Messages can be successfully submitted to a queue even if the message consumer(s) for that queue are not running or are unreachable.

(A minimal sketch of this producer/consumer decoupling appears below.)

''(Alleged?) Benefits:''

* Scalability (due to multiple posters/readers)
* Reliability (some systems can make queues persistent)
* Abstraction, since producers and consumers aren't directly interconnected (rather similar to a loosely coupled PipesAndFilters architecture)

''Examples:''

IBM's MQSeries: http://www.software.ibm.com/ts/mqseries/

MicrosoftCorporation's MSMQ: http://www.microsoft.com/ntserver/techresources/appserv/MSMQ/MSMQ_Overview.asp or http://www.microsoft.com/mind/0398/queue.asp

Tibco's Active Enterprise: http://www.tibco.com/products/active_enterprise/index.html

----

One would suppose this architecture is suitable for wrapping legacy systems and gluing them together. But it's not clear to me how this type of architecture fits into a large distributed OO system model. What would be the equivalent of IDL for a system built along these lines? -- some guy

Actually, in one system I've worked on, the equivalent of IDL WAS IDL. That project spent a great deal of effort building a CORBA Event Service layer on top of a message queuing system so that it would have both IDL and reliability. Conceptually, it's not that much of a stretch, and it worked out well for that project.

P.S. One more thing about reliability: queues are persistent only in the case of so-called "store and forward" messaging systems. Many messaging systems support this capability, but in ''every'' delivered system I've ever seen, the developers had to turn that capability off at deployment time to achieve the needed transaction rates. Thus the system as a whole becomes exactly as "reliable" as the machine the queue is running on. In the case of an OS/390 box, that's pretty darn reliable; in the case of an NT, Solaris, or AIX box, your guess is as good as mine. (A sketch of this persistence trade-off appears below.) -- KyleBrown

The utility of message queuing is orthogonal to the OO-ness of the distributed model. The basic idea is this: invoke a method (send a message), and forget about it. This allows looser coupling between disparate systems, and loose coupling generally means (a) simpler to implement, (b) simpler to maintain, (c) scalable. So where do they meet? For example, suppose you could insert a queue between the client and the remote object it is using. Obviously not a general application, but suitable in some circumstances. Now the client can instantiate the remote object (via proxy), manipulate it via a well-defined interface, and then deactivate it. All of these operations can be queued, if you submit to some of the limitations, e.g. your logic must not depend on the results of prior method invocations. (A sketch of this queued-invocation style appears below.) For an illustration of the combination of OO and message queuing, see ComPlus, specifically Queued Components (but also see DnaVsOo for a discussion of how OO is DNA (and Com+)). Also, I believe the OMG has attempted to include some form of asynchronous comms in CORBA, but I must confess that I haven't followed it (has anyone?). -- DinoChiesa, 12 April 2000
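Here is the minimal sketch promised above of the producer/consumer decoupling in the opening bullet list. It is Python, with an in-process queue standing in for real queuing middleware; within one process, queue.Queue hands each message to exactly one consumer, which is the property the middleware extends across machines. All names are illustrative.

 import queue, threading

 q = queue.Queue()   # stands in for a durable queue managed by messaging middleware

 def producer(name):
     for i in range(3):
         q.put("%s-msg-%d" % (name, i))   # post and forget; consumers need not be running yet

 def consumer(name):
     while True:
         msg = q.get()                    # blocks until a message arrives
         print("%s handled %s" % (name, msg))
         q.task_done()

 # multiple posters and multiple readers attached to a single queue
 for i in range(2):
     threading.Thread(target=consumer, args=("c%d" % i,), daemon=True).start()
 producers = [threading.Thread(target=producer, args=("p%d" % i,)) for i in range(2)]
 for t in producers: t.start()
 for t in producers: t.join()
 q.join()   # returns once every message has been handled exactly once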
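On Kyle's point about persistence being turned off for throughput, a toy sketch of why: a store-and-forward queue that journals each message to disk before accepting it pays one synchronous disk write per message. Flip persistent to False and that cost (and the reliability) disappears. JournaledQueue and its journal file are hypothetical, not any product's API.

 import json, os, queue

 class JournaledQueue:
     # toy store-and-forward queue; 'persistent' trades throughput for reliability
     def __init__(self, path, persistent=True):
         self.q = queue.Queue()
         self.persistent = persistent
         self.log = open(path, "a") if persistent else None

     def put(self, msg):
         if self.persistent:
             self.log.write(json.dumps(msg) + "\n")
             self.log.flush()
             os.fsync(self.log.fileno())   # the per-message disk write that caps transaction rates
         self.q.put(msg)

     def get(self):
         return self.q.get()

 jq = JournaledQueue("queue.journal", persistent=True)   # set False for speed, at reliability's expense
 jq.put({"op": "debit", "amount": 100})
 print(jq.get())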
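And a sketch of Dino's "invoke a method and forget about it" style, in the spirit of (but not the actual API of) ComPlus Queued Components: a proxy records method calls as messages, and a server replays them later against the real object. The limitation Dino mentions is built into the shape of the code: nothing is returned to the caller, so client logic cannot depend on results. QueuedProxy and Account are hypothetical names.

 import queue

 call_queue = queue.Queue()

 class QueuedProxy:
     # records method calls as messages instead of executing them
     def __init__(self, target_id):
         self.target_id = target_id
     def __getattr__(self, method):
         def invoke(*args):
             call_queue.put((self.target_id, method, args))   # fire and forget: no return value
         return invoke

 class Account:   # the "remote" object
     def __init__(self):
         self.balance = 0
     def deposit(self, amount):
         self.balance += amount

 # client side: manipulate the object via proxy, then forget about it
 acct = QueuedProxy("account-42")
 acct.deposit(100)
 acct.deposit(250)

 # server side, possibly much later and elsewhere: drain the queue, replay the calls
 real = Account()
 while not call_queue.empty():
     target, method, args = call_queue.get()
     getattr(real, method)(*args)
 print(real.balance)   # 350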
The two 'orthogonal' aspects of messaging and OO are combined nicely in the OMG's Typed Notification Channel. This provides strongly-typed asynchronous communication, ideal for distributed systems. Quality of Service parameters can be used to request message persistence if the service needs to be reliable. Bear in mind that this works primarily for operations which don't require a response. -- Brenton Camac

----

'''On Reliability and Persistence'''

The appeal of a system such as IBM's MQSeries is flexibility. People used it as a distributed comms mechanism, just to get data, or requests, or messages, or whatever, from one place to another. In many cases, persistence of the messages was not a requirement. In other cases, though, people used the queue to buffer transaction requests; here persistence would obviously be required. It is true that persistence can be turned off at deployment time, and that your message throughput will go through the roof if you take the disk I/O out of the pipeline. But this is not the only way message queues are deployed. There are many successful implementations of message queuing middleware; some use persistence, and some don't. -- DinoChiesa

----

Kyle, or anyone: if a message queue is used without persistent queues, then is there a difference between such a system and a "transaction processing monitor"? Is a TP monitor ''more'' reliable than a non-persistent message queue? Thanks. -- PatrickLogan

----

DistributedComputing architectures are usually one of two types: either you have RemoteProcedureCall''''''s across a network, or you have message-oriented systems, usually MessageOrientedMiddleware like MQSeries or Tuxedo. Message-oriented systems have a lot of advantages over RPC systems, but they're also more complex to maintain. The problem begins with Ellen Ullman's unattributed quote in CloseToTheMachine: "I never saw a person with two systems who didn't want to connect them together." It's not simple to get two systems which use different messaging protocols to talk to each other: this is the whole point of MiddleWare. -- WillSargent

----

I'm going to try to add to (clarify?) the remarks by WillSargent to better answer Patrick's question. A TP system (as Will states) can use either an RPC mechanism or a message queue to move data in and out of itself. However, the communication mechanism is not the ''same'' as the TP system; it's a technology that is ''used'' by the TP system. The TP monitor is concerned with handling the transaction context and propagating that context around to all of the processing nodes and programs that deal with the data involved, so that the transaction can either entirely succeed or fail in such a way that consistency is preserved. (A sketch of the classic mechanism for this appears below.) -- KyleBrown
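A minimal sketch of the "entirely succeed, or fail preserving consistency" behaviour Kyle describes, using two-phase commit, the coordination protocol at the heart of the X/Open DTP model mentioned below. This is an illustration in Python, not any monitor's actual API; all names are made up.

 class Participant:
     # one resource manager enlisted in a distributed transaction
     def __init__(self, name, will_fail=False):
         self.name, self.will_fail = name, will_fail
     def prepare(self):    # phase 1: promise to commit, or vote no
         return not self.will_fail
     def commit(self):     # phase 2, success path
         print(self.name, "committed")
     def rollback(self):   # phase 2, failure path
         print(self.name, "rolled back")

 def coordinate(participants):
     if all(p.prepare() for p in participants):   # every participant must vote yes
         for p in participants:
             p.commit()        # the transaction entirely succeeds
         return True
     for p in participants:
         p.rollback()          # or fails with consistency preserved everywhere
     return False

 coordinate([Participant("database"), Participant("queue")])          # both commit
 coordinate([Participant("database"), Participant("queue", True)])    # both roll back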
----

A TP Monitor is more than a distributed transaction coordinator (which implies a communication mechanism and a commit coordinator). The class of products called ''TP Monitor'' evolved to fulfill the requirements of large-scale distributed systems, many of which were doing transactions (and some of those transactions were distributed). Though the term is not rigorously defined, TP Monitors typically provide:

* a scalable distributed communications mechanism
* security features, such as:
** encryption of messages
** authorization of requests
** audit capabilities
** integrity guarantees (hash, checksum)
** and, where necessary to support the above, authentication
* a distributed transaction coordinator (following X/Open's DTP model)
* load balancing of requests for scale-out
* load buffering or throttling
* data-dependent routing of requests
* a directory service (a la CORBA naming and/or query)
* concurrency services and threading
* application lifecycle services

TP Monitors did not focus on providing web integration or pretty front ends; as a class, these products evolved long before the browser became ubiquitous. Another general shortcoming of TP Monitors was poor development tool support. Basically, it was: "here's a bag of APIs you can use to get at our TP Monitor services; grab a text editor and go to it!" Contrast this with the EJB-based app server market, where it seems all vendors recognize the importance of both the tooling and the runtime.

CORBA tried to standardize many of the common but proprietary application services available in TPMs. Some of the CORBA offerings tried to reach the established TP Monitor space; for example, Iona was marketing Orbix as an Object Transaction Monitor (OTM): you get an OO design and programming model, but all the services you like from the TP Monitor model. In any case, it doesn't quite make sense to ask whether a TP Monitor is more or less reliable than message queuing with non-persistent queues: the comms is but one part of the TP Monitor product. -- DinoChiesa, 12 April 2000

----

Surely there must be patterns for such an old software architecture. Is anyone collecting them? -- some guy

I think any asynchronous interaction requires a queue of some sort, even if it's a single-entry buffer. So the pipe/filter paradigm would fit, wouldn't it? (A sketch of a queue-connected pipeline appears below.) -- Richard Henderson

Some common paradigms: MessagePipeline, WorkCrew (parallelism), MessageRouter, and PublishAndSubscribe (also sketched below). -- RossLonstein

More patterns:

* http://www.eaipatterns.com/
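A sketch of Richard's point: a MessagePipeline is just filters connected by queues, each stage reading from one queue and writing to the next. Python, in-process, all names illustrative.

 import queue, threading

 def filter_stage(inbox, outbox, fn):
     # read from one queue, transform, write to the next: pipe and filter
     while True:
         item = inbox.get()
         if item is None:            # sentinel flows down the pipeline to shut it down
             if outbox:
                 outbox.put(None)
             break
         if outbox:
             outbox.put(fn(item))
         else:
             print(fn(item))

 q1, q2 = queue.Queue(), queue.Queue()
 threading.Thread(target=filter_stage, args=(q1, q2, str.upper)).start()
 threading.Thread(target=filter_stage, args=(q2, None, lambda s: s + "!")).start()

 for word in ("hello", "pipeline"):
     q1.put(word)
 q1.put(None)   # prints HELLO! then PIPELINE!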
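And PublishAndSubscribe in the same style: a broker fans each message out to every subscriber of a topic, so publishers and subscribers never know about each other. Broker is a hypothetical name, not a real library.

 from collections import defaultdict
 import queue

 class Broker:
     # topic-based publish/subscribe: producers and consumers never meet
     def __init__(self):
         self.topics = defaultdict(list)   # topic -> subscriber queues
     def subscribe(self, topic):
         q = queue.Queue()
         self.topics[topic].append(q)
         return q
     def publish(self, topic, msg):
         for q in self.topics[topic]:      # fan a copy out to every subscriber
             q.put(msg)

 broker = Broker()
 a = broker.subscribe("orders")
 b = broker.subscribe("orders")
 broker.publish("orders", "order 17 placed")
 print(a.get(), "/", b.get())   # both subscribers see the message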
----

Counterpoint: there is no such thing as a "message queuing architecture." There are systems in which all components are required to communicate via message passing, and there are systems in which all IPC is required to pass through message queues. However, these two constraints are not enough to define an architecture of significance. While these constraints say something about the connectors of the architecture, they say nothing at all about the components. Without that, you have described somewhat less than an architecture or architectural style.

----

As a RelationalWeenie, I would be inclined to use queries or stored procedures to put messages in the queues (tables). Then other apps can do whatever they want to with the messages, including delete them, mark them "complete", etc. Since most large orgs already have RDBMS infrastructure in place, it seems easier just to use that rather than add a dedicated message queuing system. (A toy sketch of this queue-table approach appears after this thread.)

* ''This is an easy way to totally cripple your application. Message passing like this is normally extremely high volume, with lots and lots of very small updates happening constantly. The relational model isn't especially useful for handling this sort of data, and the performance of a general-purpose database will be cripplingly slow, with enormous memory and CPU overhead. There's a possible place for relational tools within the message dispatch, if you're doing publish and subscribe, but not in the actual implementation of the queue itself. If you've got nothing better to do with your hardware and are dogmatically dedicated to relational systems (or, less provocatively, if you have neither the expertise to write nor the money to buy a real messaging system), then you can toss one up on top of a database.''
* What is an example UseCase of such a high volume of messages? The only MQAs I have encountered more or less did the same stuff that stored procedures often do.
* ''Also keep in mind that messaging systems are commonly used to provide resiliency to a distributed system. Each machine in a distributed system runs a messaging daemon that provides the store-and-forward functionality. The daemon deals with network outages and forwards messages when the destination is available. Not everyone wants to put a DB engine on every machine, and accessing the DB over the network creates a dependency on the DB machine, not to mention a bottleneck.'' (A sketch of such a store-and-forward daemon appears after this thread.)
** Oracle has a lot of tools for fallback and reliability. Are you suggesting that MQAs use an Internet-like approach to find alternative routes when a node goes down?
** ''Some MQAs might use an Internet-like approach, but that's not specifically what I was getting at. More specifically, an MQA provides applications a fire-and-forget type of asynchronous communication. For example, suppose we have a distributed system in which each node is asked to perform a subset of some large processing task. We might have a central server dispatching the workload to the individual nodes. The distributed nature of the total system, however, means that network outages are a real possibility. In addition, we would like the ability to temporarily stop processing on a specific node -- even the central server -- for housekeeping without affecting the rest of the system. Given these requirements, we might have a database on the central server to track overall progress, and use an asynchronous MQ system to dispatch the jobs and receive status updates to and from the distributed nodes. This approach decouples the processing from the database and the network. Each distributed processing daemon simply reads a work request from a local queue, does the work, and queues the result back to the central server via the local MQ process. If, for any reason, the central server is unreachable, the MQ process stores the message for delivery later. In addition, if a worker node is down for maintenance, the central server doesn't necessarily need to take special action. It can fire off work requests with the knowledge that when the remote node is restored, its work requests will be waiting for it in its queue. Meanwhile, all other nodes in the system continue to work on their current tasks, completely oblivious to the maintenance done elsewhere. Note: the architecture I've just described is real. We use it at my place of work to manage a "cluster" of about 20 nodes, each with four CPUs constantly churning through data 24/7. The messages sent via the MQ system do not contain the data; rather, they just contain the commands. The data is usually stored on local drives at each node, but is sometimes accessed via a shared filesystem.''
** A database does not rule out putting commands in tables. To do this for a database setup, I would probably have the clients poll a central server(s) periodically and/or when they are done with any current requests. Maybe each client has a list of servers to use in case the primary one is down or too overloaded to take requests. I helped build a modem-based system to do something like this once. Each client was given a list of modems to try. The lists rotated the priority such that different clients would try a different modem first. They would keep trying the modems round-robin until either they got a confirmation message from the server that it received the sent info, or until a set time-out period was reached.
** ''Yes, that works fine for many situations, although I sense a client/server bias in the description. It might help to think of nodes in an MQA as closer to a peer-to-peer model than a client/server model. The central-table approach changes command distribution from a push to a pull model; and with a pull model comes polling, which may not scale well. Also note that an MQA allows us to push multiple commands to each node so that they are waiting for pickup, even if the controlling node is taken down after the commands are sent. In this case, a processing node can complete one work item, post the results to a queue, and pick up the next item, all while the controller is down. When the controller comes back up, it picks up all queued results, starts pushing out more commands, and everyone is happy.''
** If it is asynchronous and distributed and we don't care about things such as transactional integrity, then why not use '''email''' to send messages? By the way, polling is probably used anyhow, even if not explicitly programmed by the application arranger.
** ''You could use e-mail. In fact, an MQ system might use e-mail as a transport, but hide that from the application. I have written two MQ systems for different employers. The first supported multiple transport mechanisms: one was e-mail, the other was files sent over NFS. The second MQ system I wrote used direct TCP/IP connections for transport and local files for persistence. The point of having an MQ system is not to invent some new method of moving data, but to abstract it from the applications in the system and make guarantees about delivery and availability. On polling: the first MQ system I wrote did use polling, because it had to dial the modem every few hours to check for e-mail. The second system, however, allows clients to connect to the local MQ daemon and request a message from a queue. If no message is available, the application blocks on a network read and goes to sleep until the daemon wakes it up by sending data. There may be polling going on in the TCP/IP implementation, but it's all local to the box and low cost. From the application's point of view, however, it's a blocking call.'' (A sketch of such a transport abstraction appears after this thread.)
** Well, most shops are familiar with '''email, RDBMS, and FTP''' (at least the ones I deal with). An MQA introduces a foreign concept/protocol into the mix. Unless it is trivial and/or introduces a great benefit, I say go with one of those three (or a combo). At least one is bound to come pretty close to what is needed (although once-in-a-blue-moon needs do happen with anything), and at least two are open standards with OSS solutions.
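To make the RelationalWeenie proposal above concrete, a toy sketch of a queue as a table: enqueue is an INSERT, and a consumer claims a message by flipping its status, the "mark them complete" step. It uses sqlite3 only for self-containment; in a real multi-client database you would need row locking (e.g. SELECT ... FOR UPDATE) to keep two consumers from claiming the same row, and the performance objections in the thread still apply.

 import sqlite3

 db = sqlite3.connect(":memory:")
 db.execute("""CREATE TABLE msg_queue (
                 id     INTEGER PRIMARY KEY,
                 body   TEXT,
                 status TEXT DEFAULT 'pending')""")

 def enqueue(body):
     db.execute("INSERT INTO msg_queue (body) VALUES (?)", (body,))

 def dequeue():
     # claim the oldest pending message; the UPDATE is the "mark complete" step
     row = db.execute("SELECT id, body FROM msg_queue WHERE status = 'pending' "
                      "ORDER BY id LIMIT 1").fetchone()
     if row is None:
         return None                       # queue is empty
     db.execute("UPDATE msg_queue SET status = 'done' WHERE id = ?", (row[0],))
     return row[1]

 enqueue("refresh inventory")
 enqueue("send invoice")
 print(dequeue(), "/", dequeue(), "/", dequeue())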
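A sketch of the store-and-forward daemon described in the resiliency bullet above: the application posts and forgets, while a local spool retries delivery whenever the destination is unreachable. send_over_network is a stand-in for a real transport, here rigged to fail so the retry path is visible; a real daemon would spool to disk rather than memory.

 import collections, time

 spool = collections.deque()   # local store; a real daemon would keep this on disk

 def send_over_network(dest, msg):
     # stand-in for the real transport; rigged to simulate an outage
     raise ConnectionError("%s unreachable" % dest)

 def post(dest, msg):
     spool.append((dest, msg))   # fire and forget, from the application's point of view

 def forward_loop(rounds=3):
     for _ in range(rounds):
         for _ in range(len(spool)):
             dest, msg = spool.popleft()
             try:
                 send_over_network(dest, msg)
             except ConnectionError:
                 spool.append((dest, msg))   # keep it for the next round
         if not spool:
             break
         time.sleep(0.1)   # back off before retrying

 post("central-server", "job 17 done")
 forward_loop()
 print(len(spool), "message(s) still spooled for later delivery")   # 1: the link is down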
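Finally, a sketch of the point above that an MQ system's job is to abstract the transport from the application: the same client API can ride on e-mail, shared files, or raw sockets, chosen at deployment time. All class names here are hypothetical, and SmtpTransport just prints rather than actually mailing.

 class Transport:
     # abstract delivery mechanism; applications never see which one is in use
     def deliver(self, dest, payload):
         raise NotImplementedError

 class FileTransport(Transport):
     def deliver(self, dest, payload):            # e.g. drop a file on a shared NFS mount
         with open("outbox-%s.msg" % dest, "a") as f:
             f.write(payload + "\n")

 class SmtpTransport(Transport):
     def deliver(self, dest, payload):            # e.g. send the message as e-mail
         print("mail to %s: %s" % (dest, payload))   # stand-in for a real SMTP client

 class MessageQueueClient:
     def __init__(self, transport):
         self.transport = transport               # chosen at deployment, not in application code
     def put(self, dest, payload):
         self.transport.deliver(dest, payload)

 mq = MessageQueueClient(SmtpTransport())         # swap in FileTransport() without touching callers
 mq.put("node-7", "start batch 42")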
----

See also: DistributedComputing, MultiCaster, TupleSpace, MessagingPatterns, FlowBasedProgramming.