There are many different types of computer programs, most of which are mentioned in Wiki in various places (but nowhere are they listed in one place). By "type of computer program", we refer to a subset of programming which requires differing skills, methodologies, and tools than other types. A programmer can be familiar with one, several, or all of these types. The taxonomy presented may be a bit arbitrary, and certainly could use some fine tuning. The categories are not intended to be mutually exclusive -- a program can fall into more than one type. There are probably also some useful programs out there which don't fit into any of the types listed below. ------------- '''Business Programming'''. Also known as "business logic", "enterprise applications", and by a host of other names. This type consists mostly of custom software which is used to support an organization's management information system. This software is typically written in-house (or by contractors hired for the purpose), is usually not distributed outside the organization that it was written for, and is often highly integrated with the IT infrastructure and business processes of the organization. Typical characteristics of this software: * Extensive use of a RelationalDatabase. Whether a BigIron database like SqlServer or OracleDatabase, a database integrated into a programming environment (a la FoxPro), or a "toy" database like MicrosoftAccess, a database engine of some sort is almost always involved. * Performance usually not a major concern. This software runs on powerful servers and workstations; much of it is I/O bound user interface code or network-bound web or client-server. While CPU performance is important, optimizing away every last cycle is a waste of (development) time. Optimization time often pays off better by optimizing network I/O than CPU usage. * Rapid development is very important. Long compile times are unacceptable, as are programming environments which make development and debugging difficult. * Must operate in a distributed, heterogeneous environment. Even in a "single-architecture" shop, upgrades are typically deployed in stages. * Often maintained/upgraded in piecemeal fashion. The capability to replace one component with a newer version without breaking things is essential. ModuleDependencyProblem''''''s are unacceptable. * Cost of failure is moderate in most cases. Cost can be higher when implemented in database schema (an incorrect schema can lead to corrupt or inconsistent data in the database). Examples of this type of software: * "Business rules" -- code which describes and constrains transactions. Can be written as part of a database schema, or as code written separately that uses the database. * Data entry and report tools. CrudScreen''''''s, forms, etc. Various ways of entering and accessing business data. * E-commerce software. A specialization of this category (maybe it deserves a category all its own), software which powers an organization's website is increasingly important. * See CategoryBusinessDomain for more examples. Language features which benefit this category: * GarbageCollection. * Interpretation or VirtualMachines. Both increase portability; the former cuts down on compile time. * DynamicTyping. However, a language with a carefully designed static type system can work well (JavaLanguage is a key example). Large-team development seems to favor static/strong-typing than smaller teams or individuals. Another factor for typing style is whether financial transactions are involved or merely internal tracking and reporting. Strong typing is seen to reduce errors, but make development and changes more expensive. * Support for serialization, introspection, etc. Languages features which are detrimental to this category: * Compilation to native code. Reduces portability; couples code to the underlying platform. Also increases compile times. * PrematureOptimization. Many optimizations/assumptions made by typical C/C++ implementations make piecemeal deployment difficult or impossible -- by introducing the FragileBinaryInterfaceProblem, for example. ------------- '''Glue code/Scripting''' In many ways, this is similar to the business software above; many of the same design considerations apply. One major difference is the absence of a database here. Typical characteristics: * Often very simple operations which must be developed quickly. Rapid turnaround time is very important. Any compile-time greater than several seconds is detrimental. * Must function (and be developable) in limited resource environments--such as when the hard disk is dead, you're booting off of floppy, and the only editor is vi. * Extensive string/text processing. Examples: * CGIs for web sites, login/init scripts under Unix, etc. Language features which benefit this category. * Intepreted. If a VM or native binary is generated anywhere in the process, it essentially should be a "caching" operation and shouldn't be noticed by the developer under most circumstances. (PythonLanguage is a good example of this.) * Extensive string- and regular-expression handling capabilities. * Simplicity for simple applications. The shorter "hello, world!" is, the better. * Simple programming model. DynamicTyping, GarbageCollection, etc. Language features which are detrimental: * Premature optimizations of any sort. * Requirement for a GUI/IDE or similar environment. (If available, these can be useful) ------------- '''Realtime number-crunching'''. This type of software is about ''realtime data generation'' or ''digital signal processing'' in which computations are done piecemeal on data which is "streamed" in. Typical characteristics of this software: * Often deployed on hardware selected/designed for the computational task at hand (though not always--audio/video codecs are in this category, but are expected to run on garden-variety PCs and the like). * Performance is important to very-important, especially on low-end hardware. For DSP applications (especially those executing on digital signal processers embedded within consumer goods), squeezing out every last cycle (in order to use a less expensive part) is a key concern. * Typically deployed as a unit. * Often CPU-bound rather than IO-bound. * Often have RealTime constraints. Examples of this type of software: * Codecs * 3-D game software * Speech synthesis and recognition * Image processing Language features which benefit this category: * Compiled to native code. The overhead of VMs or interpreters is unacceptable. * High-quality optimization, including those selected/specified by the programmer. Of particular importance are optmiziations in CPU instruction scheduling. * Behavior of mathematical functions which is predictable and/or discoverable. How conditions like overflow/underflow, loss of precision, etc. are handled is very important. (Throwing an exception is often not acceptable; many algorithms take advantage of deterministic overflow behavior). * ManifestTyping. The overhead of dynamic typing is unacceptable here; and the ''programmer often wants control over the exact size and precision of numerical types''. Types which correspond nearly or exactly to those supported by the underlying processor are essential. (Static TypeInference might work in prototyping/early development; however explicit type declarations will probably be introduced at some point as an optimization step ''What? Inferred types are no less efficient than manifest types.''). Languages features which are detrimental to this category: * VMs, interpreters, dynamic typing. Too much overhead here. * Forced allocation from the heap, GarbageCollection. Memory usage patterns often are conducive to allocation objects from the stack; the overhead of a general-purpose heap is not acceptable. Likewise, unbounded GarbageCollection cycles are unacceptable here. ------------- '''Heavy batch computation''' - ''numeric analysis'' and ''symbol manipulation'', similar to the category above but differs in that usually the data is known in advance and there are no RealTime constraints. * characteristics mostly like in the previous category; however code performance is not at all as important. * often also done on high-end hardware with relatively high level languages (e.g., LispLanguage); often the estimated cost of optimisation exceeds that of buying more efficient hardware. * correctness is often as important as efficiency. Examples of this type of software: * Software for weather prediction, mineral exploration, other simulations (computational fluid dynamics, finite element modeling) * Large statistical analysis programs * Programs dealing with huge datasets (e.g., language corpora) Language features beneficial to this category: * StronglyTyped, whether dynamic or static * implicit parallelisation (e.g., Sisal, FortranLanguage) ---------- '''High performance data processing''' Another "high-performance" category; but here I/O performance and memory bandwidth are key design criteria, rather than efficient CPU utilization. In this sort of programming, throughput is very important--often a banner spec--and optimizations are often made to improve it. Typical characteristics of this software: * Domain independent. * Usually I/O bound, and highly sensitive to the performance of the underlying memory and I/O subsystems (including things such as processor caches, virtual memory, and network bandwidth). * Often developed by expert developers, and deployed as a third-party or free software product. Staff programmers in MIS departments do not work on this stuff generally (though Apache is a major exception). * May be "tuned" to a particular system, or take advantage of specific knowledge of a particular system to improve throughput. * High cost of failure--as this sort of software is often used as infrastructure, failures can cause widespread damage (especially if high-value application software runs on top). * Often operate on large datasets. Examples of this type of programming: * Database engines, web servers, and other middleware. * Compilers, linkers (maybe these deserve their own category; as these are more "offline" tools.) * Virtual machines, language runtimes. * Domain-independent libraries for such things as GUIs, etc. Language features which benefit this category: * Native binary support, static typing. Like the high-performance number crunching; the overhead of a VM or interpretation is frequently unacceptable--likewise with dynamic typing in many cases. * Extensive I/O capabilities, including such things as asynchronous I/O, pending on multiple descriptors, or other advanced features. * User control of memory management, including manual storage allocation/deallocation, stack-based allocation, and user-definable memory pools. * For some subcategories, type unsafe operations (forced casting, reinterpret_cast) are necessary. (Not all subcategories have this requriement--this is useful to a virtual machine; far less so to a web server). Language features which are detrimental to this category: * In some cases, DynamicTyping and GarbageCollection. (A language runtime, for example--if writing a garbage collector; you need explicit control over memory management). * Rudimentary I/O packages without advanced features. Definition issues * Isn't this considered "Systems Software" (also used below)? Or perhaps "System Tools"? ----------------- '''Monolithic Applications''' In many ways, one of the most complicated types of software--and one which some folks feel shouldn't be written (or at least, should be refactored into smaller parts). The monolithic application combines elements from many different categories above, into one big ball of wax. Characteristics of this category: * High degree of complexity, many LinesOfCode (or insert metric of your choice). * Requirements/use cases/user stories that are frequently complex and with lots of "special cases". User is often a computer neophyte. * User interface is a key concern. * Performance fairly important. While not a "banner spec"; program must appear to be responsive to the user for most operations. * Cost of failure usually low (annoyed user). * Due to complexity, can have long build times. Examples: * Office suite software (word processors, spreadsheets, presentation software, groupware) * Internet clients (web browsers, e-mail clients, messaging software) * Games (including 3-D games, which were also placed in the "number crunching" category). * Media authoring/manipulation software (Adobe PhotoShop, FinalCutPro, etc.) * Graphics programs / CAD / 3D modeling * text editors Language features which are beneficial: * Type-safety. The DynamicTyping/StaticTyping debate is less crucial here, but type safety remains important. If StaticTyping is used, though, good reflection capabilities are useful. * Ability to interface to other languages. AlternateHardAndSoftLayers is a good design technique. * Compilation to binary, or a fast virtual machine implementation. MooresLaw has made virtual machines a reasonable choice for this category (even five years ago; VMs were still to slow for much of this). * GarbageCollection. Many programs of this type written in C++ often have a homebrew or third-party memory management system included (an instance of SturgeonsLaw with Lisp refactored out, perhaps)? * Good support for use of external libraries * Hierarchical namespace * Fast compliation times. * Lots of error-checking in the compiler. Due to the large size of these programs, and the cost of fixing errors once deployed, UnitTest''''''s (and other runtime techniqes) are probably not sufficient. (Necessary, yes.) Language features which are detrimental: * PinkyAndTheBrainLanguage''''''s. * Manual storage allocation, other manifestations of a low-level programming model. The complexity of this software makes this unworkable. * Interpretation (unless only used for the highest level in the software). Too slow in many cases. * Languages optimized for small-scale software. Definition Issues * MonolithicDesign ------------- '''Data format converters''' like batch text-processing / image processing systems, protocol stacks, compilers, many Unix utilities, printer drivers etc. ------------- '''Systems programming''' Systems programming is that which a) interfaces directly with hardware (beyond peeks/pokes to RandomAccessMemory), and/or b) operates in the "kernel mode" of a microprocessor. In other words, code that runs on the metal. Characterstics of this category: * Direct access to hardware--either external devices located on some processor bus (which may or may not be a random-access memory bus), or the CPU's own registers (in particular, those which control a CPU's memory management unit). * Operates in kernel space, in particular when done on a "protected" OperatingSystem. * Limited runtime niceties--stack or heap space may be constrained, advanced language features may not be available, debugging support in particular is limited. * Incorrect behavior frequently results in a system crash (rather than a process being killed in a controlled fashion) * Throughput/speed often important; often there are RealTime concerns. * Generally not portable--often tied to a particular architecture or operating system. * Need to be concerned with processor's memory management system * Need to be concerned with hardware interrupts. Examples of this category: * Operating system kernels and related modules. * Device drivers * Bootloaders * Some embedded systems (see below). Language features which are beneficial: * Low-overhead runtime * Convenient way to access hardware registers and advanced processor features--either within the language or "dropping to assembly". * Limited memory footprint, user control of allocation strategies. * Static types, particularly those which model underlying hardware datatypes. * Physical aggregates--the ability to declare structs/records which match exactly the layout of a hardware descriptor, disk record, etc. (C structs come fairly close to this). * Ability to disable certain language optimizations (such as speculative loads/stores, caching of data in registers) when needed--the C/C++ "volatile" keyword for example. Language features which are detrimental. * Large runtimes * Mandatory GarbageCollection * Types which are opaque (no language-defined translation to hardware). ------------- '''Embedded systems''' Perhaps this category should be further subdivided, as there is a wide variation (compare a 4-bit microcontroller running inside a toaster with a PDA or set-top-box), but the embedded system is a key category all its own. The name of the game here is limited resources, especially memory. Software which runs on a modern PC or server will have many megabytes (if not gigabytes or even terabytes) of physical memory available--backed up by oodles of virtual memory on disk. And if a program ''still'' runs out of memory, the user has the option to go to FrysElectronics (or wherever) and buy more. In the embedded world, there usually is no disk, RAM is far more limited, and in many cases memory is not expandable. Key characteristics: * Limitations on memory. Usually no mass storage, RAM is limited, and expansion is often unavailable. Reducing both code size and dataset size is a key optimization. No virtual memory. * Slower CPU as well. In some embedded applications, CPU cycles are at a premium--8 bit micros (and even 4 bit ones) are often used in high-volume consumer devices. * In many applications, power consumption is also an issue. Which can affect software. * Often times, no way to expand or replace software (without special test harnesses not available ot consumers) * Wide variety of complexity of application. * Debugging may be difficult. * Many problems similar to systems programming. Examples At the high end (lots of processing power, high user interactivity): * PDAs * Cell phones * Set top boxes * Videogame consoles In the middle (lots of processing power, low user interactivity) * Automobile engine controllers * Controllers for things like printers, photocopiers, etc. At the low end (dirt cheap, user many not even know there's a computer in there) * Consumer electronics (toys, kitchen appliances, etc.) * Watches, calculators, etc. Language features which are beneficial: * Efficient memory use--small code size; efficient data model. * Ability to eliminate unneeded functionality from language runtime. * Other features dependent on application * Languages/implementations with cross-compilers and remote debugging capability; many embedded systems cannot host a development environment. Language features which are detrimental: * Large runtimes, especially ones which cannot have superfluous things removed. * For deeply-embedded systems, GarbageCollection. * Languages which assume the resources of a desktop system. * For some applications, DynamicTyping (especially deeply-embedded systems where this results in too much overhead). ---- The next three types aren't really categories all their own, independent of the above. Rather, they are particular sub-categories of the above which impose specific requirements--many of which are incompatible with many modern programming languages. '''Real-time systems''' A RealTime system is a system where the timeliness of a computation or side-effect is just as important as it's value. In other words, a late response is a wrong response. Note: here we are mainly referring to ''hard'' real-time; where the program fails irreversibly if a time constraint is not met; not systems where failure to meet a time constraint merely results in reduced quality of service. Also note that real-time doesn't address the severity of the consequences of failure (it could range from nuking a city or killing a patient, to something minor as ruining a printed page), just it's definition. Characteristics of this category: * Timing deadlines must be met, or system fails. Deadlines are specified as part of requirements. * No mass storage, virtual memory, etc. In some realtime system, even caches are a no-no (as proving bounded behavior in the presence of a cache is much more difficult). * In many cases, correctness of solution must be proven. * Often, a RealTimeOperatingSystem is used. * Might include a non-realtime component. In a printer, for example; the printhead controller needs to be realtime; the software that runs the UI and interacts with a host computer does not. * I/O (other than necessary to operate/monitor device(s) under control) very limited. Examples: * Many controller applications (engine controllers, printhead controllers, avionics, medical equipment). Language features which are beneficial: * All operations take a determinstic or bounded amount of time. * As much memory allocation as possbile on the "stack". Language features which are detrimental: * GarbageCollection. Outside a few hardware-assisted implementations which have been developed by researchers, this is incompatible with real-time. * In many real-time applications, even manual memory heap allocation (other than a bounded amount during system initialization) is unacceptable--many heap allocation strategies do not have bounded behavior. * Lazy evaluation--adds nondeterminism. ------------- '''Mission critical systems''' A mission-critical system is one where the ''consequences'' of failure are catastrophic. An airplane may crash, a patient may die, a multi-million dollar transaction will get bungled, etc. Often times, true mission-critical systems are subject to governmental regulatory regimes--any avionics software installed in an airplane (in the UnitedStates) must be developed with acceptable methodologies, reviewed, and approved by the FederalAviationAdministration, for example. In order to avert these catastrophies from happening, in many cases ProofOfCorrectness of an implementation is required--demonstrating an absence of defects through QualityAssurance, UnitTest''''''s, and other techniques which involve running the program and observing its behavior is insufficient. Characteristics of mission-critical systems: * Catastrophic consequences of failure. * Proof of correctness frequently required. * Sometimes, use of computational models less powerful than Turing machines (to get around the GeneralHaltingProblem) * Use of redundant/failsafe hardware. Examples: * Avionics, especially flight control software * Financial transaction software * Medical systems Language features which are beneficial: * Very strong static-type systems, such that ''all'' type errors are caught by the compiler. ** Erm, ErlangLanguage is used a lot for mission-critical systems but is very dynamic in its typing. Erlang urges the programmer to write the application as a layered model where a lower layer can always restart a higher layer if it crashes. * All language semantics defined. Language features which are detrimental: * Dynamic typing. DoesNotUnderstand is not allowed. * In many cases, any form of runtime polymorphism is a no-no. (In other words, no invoking a method through a base class reference--the compiler ''must'' know exactly what method is being invoked) * UndefinedBehavior--''all'' language semantics must be defined, anything that is undefined must be rejected by the compiler. Very few languages meet this requirement. * Heap-based memory allocation. The total memory consumed by a program should be provably bounded; out-of-memory conditions are never acceptable. * In many cases, recursion (unless it can be proven that recursion always terminates). ------------------- '''Data warehousing applications''' - databases, filesystems, VersionControlSystem''''''s, prevalence layers, transaction systems, etc. '''Secure/"Trusted" Systems'''. These two are opposite sides of the same coin; but they are listed together. A ''secure'' system is one which can safely execute unknown or untrusted (and possibly hostile) code (in a sandbox or some other limited environment). A ''"trusted"'' system is one which allows code to be executed in a hostile environment while preventing (or at least detecting) any attempts at tampering. Dealing with malicious programmers requires a lot more attention to design than merely securing a system against accidental defects. (Note: "Trusted" systems are ripe for abuse by the PowersThatBe...Microsoft would love it, for example, if our PCs were incapable of booting any OS other than Windows. And there already is such a PC--the XBox. Hackers have managed to defeat the security and make it load Linux; but expect that loophole to be closed.) Characteristics of this category: * Consequences if security is breached (theft of information, economic loss, data corruption/destruction, damage to property, or even injury or death in some cases). * Desired functionality is such that SeparationOfDataAndCode is not a sufficient design technique. Examples: * Applets, other software loaded by a web browser and run on a user's computer. * Applications run on a cell phone (in particular, to keep malicious CuteWare from surreptiously dialing Zimbabwe and racking up charges on the user's bill--a big concern for cell-phone vendors) * Multi-media clients designed to allow licensed playback of content while preventing piracy. * Smart cards Language features which are beneficial: * Virtual machines/execution environments (or interpreters). Much easier to secure one of these against hostile code than native code. (Generation of native-code sandboxes is possible; but most OperatingSystem''''''s don't provide anywhere near the security granularity to do this correctly). * All undefined or suspicious behavior is trapped; preferably before the program is run (such as ByteCodeVerification) * DynamicTyping everywhere (prevents type-attacks; where two different modules delivered at two different times have different assumptions about the type of an object embedded in the code. However; note that Java solves the problem with a StaticallyTyped language). ''Note: Java doesn't solve the problem 100% and many of the solutions are somewhat dynamic (load time) rather than static (compile time). Dynamic linking can allow various type attacks.'' * GarbageCollection. * Mandatory synchronization of multi-threaded constructs (or at least well-defined RaceCondition''''''s). * Defined semantics everywhere. Language features which are detrimental: * PointerArithmetic, stack-based objects, manual memory allocation. All can be used to generate WildPointer''''''s. * Type-unsafe casts of any sort. * Compilation to native code; difficult to secure. ---- "... EmbeddedSystems are a different universe ..." How many different "universes" are there for programming software? * software that will be installed and used by millions of people * software that will be used by millions of people, pre-installed on some piece of hardware (EmbeddedSystem''''''s) * software that will be used by a few people at one company (Is this what "bespoke software" means?) * software that will be used only by the original programmer Another way of categorizing software: * software that will be used once or twice * software that will be used a lot Are there other useful ways of categorizing the kinds of work programmers do? "Five Worlds," article by Joel Spolsky 2002-05-06 [http://www.joelonsoftware.com/articles/FiveWorlds.html], mentions five categories. (He immediately mentions that some of them are "basically the same", so really only 3). ---- RefactoringHint: This page is well structured, but too large. I think, that the types of programs deserve their own pages (partly these are already there and should be merged). This is very OnTopic, but not yet appropriately linked. ---- CategoryProgrammingLanguage