A computer industry writer once remarked that when animals are wounded, they often can still limp around in an attempt to find shelter in order to heal or evade a predator long enough for the herd to help out. However, when computer programs are "wounded", they often die a complete death. One binary digit gets flipped, and the result is often chaos or total and instant failure.

Related to this, once some colleagues and I had a fierce debate about whether report software that was to show averages should halt processing if it encountered a zero divisor, or continue on assuming an average of zero. Whether the divisor would ever be zero under "normal" circumstances was hard to determine from the requirements. Officially it wouldn't, but I suspected that such might slip through on occasion. Letting it continue with a zero average at least allows the report to be available. Sometimes an imperfect report is better than none when the user is in a hurry and there is no time to trace bad divisors. Ideas such as putting a large warning message on it were also considered.

''Some report software (such as Cognos' products) expects divide-by-zero situations to occur. Cognos products continue processing, but show the result as "/0", telling the user that a divide-by-zero was encountered.''

* If a process feeds to another system that doesn't recognize that, it can crash.
* "/0" is not very user-friendly. Ideally a tool gives a choice, such as "Div/0" or "Div-Zero".
* That feature should be optional. In some situations, you want it to crash (give error message and stop).
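
The halt-versus-continue choice debated above can be made an explicit option instead of being hard-wired. The following is a minimal sketch in Python, not how Cognos or any real report tool works; the names ZeroDivisorPolicy and average_cell and the default marker text are hypothetical.

```python
from enum import Enum


class ZeroDivisorPolicy(Enum):
    HALT = "halt"      # die: raise and stop the report
    ZERO = "zero"      # limp: silently assume an average of zero
    MARKER = "marker"  # limp, but flag the cell so the reader can see it


def average_cell(total, count, policy, marker="Div/0"):
    """Return the text for one report cell holding total / count."""
    if count == 0:
        if policy is ZeroDivisorPolicy.HALT:
            raise ZeroDivisionError("zero divisor in report average")
        if policy is ZeroDivisorPolicy.ZERO:
            return "0.00"
        return marker
    return f"{total / count:.2f}"


# Hypothetical report rows: (region, total, count).
rows = [("North", 90.0, 3), ("South", 0.0, 0)]
for region, total, count in rows:
    print(region, average_cell(total, count, ZeroDivisorPolicy.MARKER))
```

With MARKER the report still comes out and the bad divisor is visible; with HALT the same data stops the run. A downstream system that cannot parse the marker is still a risk, as the first bullet notes.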

In a more general sense, if the CPU or interpreter encounters certain fatal errors, in some circumstances it may be better to try to "push" them through. For example, I once had some old packaged software that did a CPU timing loop to know how to calibrate graphics events. When faster CPUs came along, the timing loop crashed with an overflow error, preventing the application from starting. (At least this is what user-groups had ascertained was happening.) If there was a way to perhaps say "Continue/Ignore" on such error, then at least the application might run, perhaps with jerky graphics, but run nonetheless.

Suppose software ran your pace-maker and you were out hiking away from civilization and the pace-maker had a fatal floating-point error of some kind. Would you rather the software just stop, or that it push through a zero and try to continue?

''That's a huge leap from generating a report or assessing CPU speed for gameplay to life-critical embedded software. No single solution applies across the range of software.''

I did not mean that it applies across the board. It is a question to ask, not an answer.

'''Hmm.''' This seems to be a discussion on one facet of graceful failure recovery. Are there pages here already dedicated to that? Let's take a gander around; perhaps this page can be merged with one already started on this topic. 

''Good idea:'' FailFast, FaultTolerance, FaultIsolation.

----------------

This page reeks of ArgumentByAnalogy. A computer with a bad bit is not a wounded animal, nor does that have any real bearing on a pacemaker. Mission critical systems in the real world are made of independent redundant individually-FailFast systems, not a single fault tolerant system.

To correct the analogy: if a bone breaks in an animal, the animal can limp away, but not because a single system is fault tolerant. That animal lost its bone, that system. It doesn't limp away on the broken bone; it uses its other 3 legs to limp away. It abandons the broken system and relies on backups. The animal has a large amount of redundancy of systems to accomplish this. It's an organic system, horribly inefficient, but very resilient. Most computer systems are not organic with many interrelated backup systems. They are generally made of a very small set of horribly brittle systems with little to no redundancy whatsoever. Just like an animal each individual system is brittle, but unlike an animal there are very few to no backup systems.

If you want to take cues from nature, for a mission critical system bring along 3 computers, each with its own hardware and independently written algorithm, and bring along a vote-holding machine which acts on the most popular vote and reboots the machine which makes an unpopular vote.
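
As a rough illustration of the voting arrangement just described, here is a small Python sketch. It assumes each "computer" can be modelled as a function returning a hashable answer; vote and the three square_* replicas are hypothetical names, and a real voter would also have to deal with timeouts and with being a single point of failure itself.

```python
from collections import Counter


def vote(replicas, x):
    """Run every replica on x; return (majority_answer, indices_to_reboot)."""
    answers = []
    for replica in replicas:
        try:
            answers.append(replica(x))
        except Exception:
            answers.append(None)  # a crash counts as a dissenting vote
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        raise RuntimeError("all replicas failed")
    winner, votes = counts.most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority")
    dissenters = [i for i, a in enumerate(answers) if a != winner]
    return winner, dissenters


# Three independently written (here: trivially different) implementations.
def square_a(x):
    return x * x


def square_b(x):
    return x ** 2


def square_buggy(x):
    return x * x + 1  # deliberately wrong, to show it being outvoted


answer, to_reboot = vote([square_a, square_b, square_buggy], 7)
print(answer, to_reboot)
```

Each replica is free to be brittle; the resilience comes from the voter discarding whichever one disagrees.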

A resilient system in the real world is not a system made resilient all the way through. A resilient system is a collection of brittle systems with many backups. It's not to say you can't make a single system resilient the whole way through; it's just much more expensive to do that, both in nature and in programming. It's simpler to abandon a broken system and rely on backups until the "main" system can be fixed. To try and make a system resilient all the way through requires an exorbitant amount of error checking code, most of which is nigh impossible to test, thereby actually increasing bugs and brittleness.

''While redundancy is part of it, it's not the whole story. Another strategy in organic systems is the ability to adapt. If an animal loses some of its vision, it may ''learn'' to rely on its nose more, and vice versa.''

--------


I usually hear the terms used as follows:
* '''robust''' - difficult to break
* '''graceful degradation''' - can continue to provide useful service, albeit degraded, after a partial failure
* '''resilient''' - can recover or be recovered; how readily service can be restored once cause of failure alleviated. 
* '''reliable''' - uptime/totaltime; affected by all of the above.

So, where you use the word 'resilient', above, it strikes me as GracefulDegradation, and you've completely neglected true resilience - i.e. the analogy to how the animal will 'heal'.

As to whether you claim that a "system" is fault-tolerant to some class of faults, vs. "made of fail-fast parts", really depends on how you circumscribe the "system". A fault-tolerant mission-critical system might be constructed of fault-tolerant parts just as easily as fail-fast parts. Animals with a broken limb will still use that limb unless it has taken too much damage. (There's a difference between a hairline fracture and extensive shattering.)

Computer systems use many mechanisms for tolerance to fault: RAID, bad-block tracking for HDDs and SSDs, CRCs and error-correcting codes to identify and correct faults, watchdogs, heartbeats, etc.
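
To make one of those mechanisms concrete, here is a minimal Python sketch of CRC-based fault detection using the standard zlib.crc32. The seal/unseal framing is a made-up convention for illustration. Note that a plain CRC only detects corruption; repairing it needs an error-correcting code or a redundant copy, which this sketch does not attempt.

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 so later corruption can be detected."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unseal(frame: bytes) -> bytes:
    """Return the payload, or raise ValueError if the CRC does not match."""
    payload, stored = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("CRC mismatch: frame is corrupt")
    return payload


frame = seal(b"limp or die")

# Simulate a single flipped bit in transit or on disk.
damaged = bytearray(frame)
damaged[2] ^= 0b00000100

try:
    unseal(bytes(damaged))
except ValueError as err:
    print("detected:", err)
```

Detection turns a subtle fault into an explicit one, after which the caller can choose to retry, fall back, or stop.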

The LimpVersusDie opening seems to be remarking upon StateOfTheArt GeneralPurposeProgrammingLanguage''''''s at the time. TypeSafety has pursued 'robustness' to ever greater degrees, but it would be nice if our languages also facilitated - made it easy - to achieve graceful degradation and high levels of resilience. Languages that help us declare fallback systems, that implicitly perform caching to protect against disruption, that have advanced mechanisms to restore communications after loss, etc. PersistentLanguage covers resilience for a small but relevant subset of faults, namely disruption, but doesn't by itself imply GracefulDegradation.

One reason for the weakness of languages today is that they're running "locally" by default. That was a good default assumption twenty years ago, perhaps, but the world is now wired together, and "program systems" tend to cross multiple architectures and systems that can fail independently. We need programming languages that recognize this, continue providing service even as individual components fail, that help identify failure conditions to turn what would otherwise be 'subtle' failures into FailFast subsystems that can be rebuilt after full failure, and that resiliently regenerate after temporary failures or after the faulty nodes are replaced. 

Redundancy is an important ingredient in achieving these properties, I'm sure. Data redundancy is needed in case disks or nodes fail, for example, and is required to regenerate systems when they recover. Redundancy also supports load-balancing and increased capacity while running normally, so serves as a basis for GracefulDegradation.
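
In the absence of language support, the fallback-plus-caching idea usually gets written by hand. A minimal Python sketch follows, assuming a remote lookup that can fail; CachedFallback, fetch_price and the ACME value are hypothetical names and data for illustration only.

```python
class CachedFallback:
    """Wrap a fetch function; serve the last good value when it fails."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._last_good = {}

    def get(self, key):
        """Return (value, is_fresh); is_fresh is False for a stale fallback."""
        try:
            value = self._fetch(key)
        except Exception:
            if key in self._last_good:
                return self._last_good[key], False  # degraded but useful
            raise  # nothing to fall back on: fail loudly
        self._last_good[key] = value
        return value, True


# Hypothetical flaky data source.
feed_up = {"ok": True}


def fetch_price(symbol):
    if not feed_up["ok"]:
        raise ConnectionError("feed unavailable")
    return {"ACME": 12.5}[symbol]


prices = CachedFallback(fetch_price)
first = prices.get("ACME")   # fresh value while the feed is up
feed_up["ok"] = False
second = prices.get("ACME")  # stale value, flagged as such
print(first, second)
```

Returning the freshness flag keeps the degradation visible to the caller, which is the difference between limping and silently serving wrong data.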

-------
A financial transaction either happened or didn't happen.  There may be a myriad of reasons ''why'' it didn't happen (application error, bad data, communications error, gateway error, bank rejection, etc.), but there's no room for fuzziness or limping along when it comes to whether or not it ''did'' happen.  Sometimes it just makes sense to 'die'.  Often, if not usually, in fact, it's better to fail than to proceed with invalid data.  If you don't shoot that horse, you're going to be in for a lot of trouble.
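
Within a single database, the "happened or didn't" property can be sketched with Python's standard sqlite3 module. The accounts table, names and amounts below are invented for illustration, and this says nothing about how transfers between separate institutions are handled.

```python
import sqlite3


def open_ledger():
    """Create an in-memory ledger with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = open_ledger()
transfer(conn, "alice", "bob", 30)
try:
    transfer(conn, "bob", "alice", 500)  # would overdraw bob
except sqlite3.IntegrityError:
    print("rejected; ledger unchanged:", balances(conn))
```

The failed transfer credits one side before the debit is rejected, yet the rollback leaves no trace of the credit: the transaction dies cleanly instead of limping on with a half-applied update.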

''That may be the ideal, but in general it ain't true. Fraud, failure, comms loss, shipping loss, etc. have all in the past been serious partial-failures for financial transactions. ACID properties for financial transactions attempt to achieve something better than what earlier systems ever offered, but even those are rare between banks, where all sorts of compensatory protocols are involved that don't achieve ACID properties.''

-------
All of this limping along is still relying on ArgumentByAnalogy. A computer program cannot limp. 

Limping is one of two things in an animal: 1- Not using the broken leg and using the 3 backup systems, aka the other legs. 2- Being gentle with a leg with a hairline fracture, being careful to not put a lot of weight on it. 

A computer program is not 1. It is not composed of independent backup systems. For some programming languages, the slightest misbehaving component can kill everything. 

More importantly, a computer program is not 2. A bone when fractured decays in properties and performance in a linear fashion. When a bone has a hairline fracture, it still has the properties of a bone. It can still support weight. Computer programs are vastly more complex and show chaotic behavior. If you have the slightest logic flaw in your program, or a cosmic ray twiddles a bit, then your program will not degrade in this linear fashion. The slightest unexpected "butterfly wing flap" will drastically change the characteristics of your program, very much unlike a bone. 

A computer program is not an animal! Stop arguing by analogy! (ArgumentByAnalogy)

However, a robust and reliable computer system is like that of an animal. It is a collection of individually brittle systems with backups. Each process is protected from each other, so while each individual program is quite brittle, the collection of these systems can be made robust and reliable. 

''I think your appeal on this subject is lacking perspective. Computer programs CAN and DO achieve a variety of behaviors that are very much analogous to "limping". Exponential backoff in TCP. Thrashing of vmem during large GCs or when too many apps are open. Continuing in the face of an unremedied exception. Undetected bit errors in UDP transport.''

''Computer programs are often composed of subsystems that achieve self-healing and backups, or that detect and recover after bit errors, or that can simply tolerate strange observations that occur as a result of errors. Sure, '''certain''' bit errors might result in critical and catastrophic breakdown or wild misbehavior. But animals and humans are not so different from computers in that regard. A heart palpitation or blood clot can kill a human or animal. A small breach or bit of damage to the spine can kill or paralyze permanently. One can choke on a small bone and die. Sometimes people even become allergic to themselves due to a tiny error by one t-cell somewhere in their body - a condition known as 'auto-immune disease', and reasonably common.''

''As far as the distinction between "computer program" and "computer system" - that's weak and artificial. At best you can distinguish between hardware and software, and otherwise within a ''specific'' software model (such as the Unix software model vs. Win3.1 software model). Consider: ErlangLanguage processes are also protected from one another, but a program often consists of more than one process. Same is true for Unix processes, once you start working with PipesAndFilters. Programs can also be made robust, reliable, even resilient. Windows 3.1 software could and often did thrash memory of other programs, and even later (Windows 95, 98, ME) it was quite common for one to see the "blue screen of death" - a point I make only to enforce that your "multi-process" and "each process is protected from the others" distinctions are at best incomplete and narrow minded. In the broad view, it's all one software system. In the broad view, that extends even to the Internet.'' 
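
For readers unfamiliar with the exponential backoff mentioned above, here is an application-level sketch in Python. It is not TCP's retransmission algorithm, only the same general idea; retry_with_backoff, its parameters and the flaky operation are hypothetical.

```python
import random
import time


def retry_with_backoff(operation, attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep):
    """Call operation until it succeeds, doubling the wait cap after each failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: give up and let the caller decide
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # random jitter below the cap


# Hypothetical operation that fails twice, then recovers.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("link down")
    return "ok"


recorded = []  # record the delays instead of actually sleeping
result = retry_with_backoff(flaky, sleep=recorded.append)
print(result, recorded)
```

Whether one calls this "limping" or "normal operation" is exactly the disagreement on this page; the code is the same either way.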

We're arguing over the validity of the analogy. I'm not sure I should continue. However, exponential backoff is not limping to me. That's par for the course. Limping to me is the result of an unexpected really bad thing (tm), not normal operation like exponential backoff. You seem to disagree with me on my assessment that computers are vastly different than animals. Computer programs / systems are many magnitudes more complex than animal body parts, and they lack enough similarity on the relevant properties. Any such analogies of a bone to a complicated computer program are automatically bunk as a basis of an argument. (Again, see ArgumentByAnalogy.) That was my entire point. I'm not saying you should kill a process on every error. I'm merely noting that most of this page is BS. A computer cannot have a hairline fracture. Exponential backoff is not a hairline fracture. The animal is not built with a hairline fracture in mind to solve a real world problem. A hairline fracture is never expected nor normal course of operation. Exponential backoff is an expected situation in normal course of operation. Animals and computers (as they exist today, and as commonly programmed today) are sufficiently different to make any argument based on analogy entirely null and void. 

''I suspect you'll find most large long-lived animals, including humans, suffer hairline fractures in at least some bones during the normal course of their lifetimes. In that sense, these sorts of partial failures are "expected situations in the normal course of an animal's operation." That they can heal from these failures indicates that they are accounted for. Are you assuming that failures that are anticipated and accounted for don't qualify towards resilience and robustness of a system? That is, are you assuming resilience and robustness can't be achieved by design?''

''Sigh. Regardless, your conclusion is utterly non-sequitur. The LimpVersusDie page doesn't actually have an ArgumentByAnalogy. That is, the LimpVersusDie page contains no argument of the form: "X is true for animals, and programs/computers/whatever are like animals, therefore X is true for programs/computers/whatever". Since there is no ArgumentByAnalogy, what is it you've been railing against? Oh, yes, what you've ''actually'' been objecting to is the ''use'' of analogy to ''explain'' a property that we might wish to achieve to a greater degree in computing. Explanation by analogy is not ArgumentByAnalogy. But even if this page did have an ArgumentByAnalogy, your objection to this analogy is constructed of various invalid premises. For example, you just now said "computer programs / systems are many magnitudes more complex than animal body parts", but any programmer with a comprehension of biology who has studied cells, DNA, ribosomes, enzymes, ATP, protein construction, etc. can tell you that ''even a single cell'' is more complex than many computer programs.''

I suggest at least reading ArgumentByAnalogy. I'm not suggesting it contains any literal argument by analogy à la Plato. What it does contain is explanation and reasoning rooted in analogy with whose applicability I very much disagree.

''I suggest you stop patronizing me by assuming I have not read that page. If you don't like an explanation, '''but you understood it''', then you have no room for complaint. Explanations don't need to be "liked". The only purpose of an explanation is to help an audience achieve understanding.''

Computer systems when faced with an unexpected problem generally either die asap or behave in a chaotic fashion. 

''That's simply untrue. I suspect your belief is rooted in '''confirmation bias''' (http://en.wikipedia.org/wiki/Confirmation_bias). Fact is, computer systems - when faced with unexpected problems - often behave quite well. But you don't notice when the computer system behaves well. You only notice when the computer system dies or behaves in a chaotic fashion. Therefore, you fail to notice when the computer system behaves well in the face of the unexpected problem. Indeed, you're unlikely to even know that the problem exists until after it is severe enough to muck things up.''

Animals sometimes display this when presented with a problem: e.g. your DNA example: a small error has chaotic effects on the system. However, the limping analogy applies to the other aspect of animal systems, that they tend to decay in a linear way to unexpected or abnormal conditions. I firmly believe this is a property which computer systems do not have in the same way as animals. A limb when injured gracefully decays, whereas a computer program with a single bit twiddled, or a single error case unhandled, can behave quite chaotically. (However, expected errors can be dealt with gracefully.) 

''Your firm belief... is wrong. Computer systems often degrade gracefully under abnormal conditions. Animals and computers are quite similar in this regard: each can fail catastrophically under some failure conditions, each can degrade gracefully under others. Whether it's "expected" or not shouldn't be part of the analysis or analogy unless you wish to consider whether the failure modes of animals were subject to 'expectations' of evolution or intelligent design. Expectations certainly weren't part of the analogy you're railing against.''

An analogy is useful only insofar as everyone understands and agrees with it. If someone disagrees with the analogy, do not persist with it. It only brings more confusion and arguments over the applicability of analogy instead of discussion of the actual technical issue. 

''No, an analogy in explanation is useful if everyone merely understands it. Agreement is not required. It's hard to build an explanation or analogy that resonates with the audience. But even an utterly stupid, offensive, disagreeable analogy can go a long way towards establishing comprehension. Anyhow, if your wish is to focus on the actual technical issue, then your decision to focus on applicability of an analogy smells of hypocrisy. Choosing to not "persist with" the analogy is fine, but railing against it - especially after it has successfully served its purpose - is utterly pointless.''

I also believe as a matter of empirical fact that it's "better" to make computer systems robust by making the individual components fail fast and brittle in the face of unexpected errors, like sanity check asserts of internal class invariants.

''What does "individual" component mean when talking about computer systems? A bit? An expression? A function? A library? A module? A language object? An OS process? A full OS with all processes? A full machine (CPU + local memory)? A domain of connected machines on a LAN? Of what elements are "individual" components constructed? You need a proper definition for empirical analysis, which makes me doubt your assertions about empirical fact. Add that to the confirmation bias, described above.''

* Whatever portion of the program which makes sense for isolation. It can be a collection of classes in Java published together aka a package, a process in C++, an entire OS, a dedicated piece of embedded hardware running some program, etc., whatever makes sense and is properly isolated. It's been my observation that when people try to handle "errors" which I would traditionally handle with a "killing-process assert", things go very badly. 
* ''Do you '''also''' mean to say that even an OS shouldn't handle the death of a process because they're part of the same 'computer system component' in some view? If not, then your statement is logically inconsistent from any objective viewpoint. If you merely wish to say that "errors shouldn't be handled locally unless you can recover from them locally", that's reasonable, but it does nothing to prevent an 'individual component' of a computer system from being robust against even unexpected errors. That is, the computer system might not anticipate that particular ''error (cause)'', but might know how to deal with its ''symptoms (effects)''.''

I also believe that errors which can be expected can and should be caught and dealt with gracefully and correctly. "Better" in this context is a measure of good for the company. Ideally, one would like to make the program robust the whole way through, but this is much more costly from a business perspective than making a computer product as a collection of systems which present robust external interfaces but internally use brittle fail-fast error handling like asserts. The idea is that such asserts allow you to establish class invariants which allow you to better and more quickly reason about correctness of code, and eliminate error handling code for "impossible" errors. Such error handling code for violated class invariants is basically dead code, (near) impossible to test, and thus much more likely to be buggy. Either your test suite becomes inordinately large, or your tests lack code coverage. Also, your code becomes much larger and more complex than what it needs to be to reach the goal. See GatedCommunity, AssertionsAsDefensiveProgramming, ComplexityIsBugs. 
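
A small Python sketch of that split between a robust external interface and fail-fast internals is given here. The Account class is a hypothetical example; note also that Python drops assert statements when run with -O, so this shows the style and is not a hardened implementation.

```python
class Account:
    """Validates caller input gracefully; asserts its own invariant brutally."""

    def __init__(self, balance=0):
        self._balance = balance
        self._check_invariant()

    def _check_invariant(self):
        # "Impossible" state: no recovery code, just die at the point of damage.
        assert isinstance(self._balance, int) and self._balance >= 0, \
            "invariant violated: balance must be a non-negative int"

    def withdraw(self, amount):
        # Expected errors: bad input from outside is reported, not asserted.
        if not isinstance(amount, int) or amount <= 0:
            raise ValueError("amount must be a positive integer")
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount
        self._check_invariant()
        return self._balance


acct = Account(100)
print(acct.withdraw(30))

try:
    acct.withdraw(1000)  # expected error: handled at the interface
except ValueError as err:
    print("rejected:", err)
```

The ValueError paths are ordinary, testable code. The assertion has no handler at all, which is the point being argued: there is no untested recovery branch for a state that should never occur.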

''I certainly agree that FailFast has its place. The LimpVersusDie page does not suggest the contrary; indeed, it opens by discussing a debate about which design results in a more robust system overall. But the ideal is not "robust the whole way through". Robustness itself is a means to an end, like reliability, predictability under composition, and survivability. Resilience (self-healing) can pick up where robustness is missing, and is somewhat more flexible in its ability to handle hardware failures. FailFast for predictable subsets of the system can improve resilience by making it easier for other parts of the system (including humans) to detect and repair or replace the failing subsystem. They can also help limit damage... like a fuse getting burned out to avoid damage in the fine electronics.''

I think we're agreeing about how to write programs (mostly) but disagreeing over analogy. I'll leave it at that. 

''I'm only playing DevilsAdvocate on behalf of the analogy; I don't care for it one way or the other. My objection is with your objection to the analogy - in particular, that you cried fallacy where there was none, that you've repeatedly made untrue assertions (likely from cherry-picking) about the relative nature of animals and computers, that you've let your beliefs about what 'should' be shape your beliefs about what 'is', etc. If you had provided a valid objection to the analogy, I'd let it stand.''

My problem with the analogy is and always has been that computer systems do not react gracefully to unexpected errors. Animals can. 
* ''Sigh. If you mean to say: "computer systems '''never''' react gracefully to unexpected errors", then you are wrong! If you mean to say: "computer systems '''sometimes fail''' to react gracefully to unexpected errors", then you've FAILED to make a logical distinction between computer systems and animals. Either way, the premise for your objection to the analogy is invalid.''
* [Well actually, if the code you programmed purposely reacted gracefully to the error, then the error was expected. Even if it was expected only a small amount for the person to know to write the error trap code, that means it was somewhat expected. An unexpected error can't really be programmed for. If it is unexpected entirely, then you can't even build code to recover. If it was somewhat expected, then you can. This is actually a nit pick and a problem with the word "exception". I don't think exceptional unexpected errors can be programmed for, I think only expected ones can. Even if you just reset the system when there was an "unexpected error", that's really expected since someone still put the code in there for the system to reset. You may have had a low expectation for that catastrophic problem, but you still had the expectation.]
* ''Oh, so your argument is: "Computer systems don't react gracefully to unexpected errors, which I hereby define as any error to which the system does not react gracefully", right? Nice tautology. Perhaps we should apply the same definition to Animals. But I do not believe it is what the author of the rant against the analogy intended. And it would be incorrect regardless: computers recover from or continue under conditions (states of being). Not all of these conditions it can continue under and eventually recover from are considered 'ideal'. But consider a bit-flip in an iterative algorithm to compute arbitrary-precision floating-point division: a bit flipped in the result would just result in many more iterations to reach the answer. When errors move the system into a continuable or recoverable condition, the system will often limp along, perhaps even recover gracefully. This is true even when the error was completely unexpected. The fact that many errors result in similar partial-failure conditions means that programming for one expected error may handle (and even hide!) hundreds of unexpected ones.''
* [My wording was missing some important details. I meant to say that if you ''put in at least some code to react gracefully'', then you expected the error. Computer systems do react gracefully by luck and depending on how serious the error is, too, all by themselves. Actually, sometimes it is the operating system that helps here, since they can recover problems that the programmer didn't take care of, and this is graceful recovery by someone else (consider that in Windows 3.1 a single program could bring the whole OS down since the OS programmers didn't think of adding grace to the OS). My point was that unexpected errors are near impossible to program for if they are "unexpected", because if they were slightly expected you could still program recovery for them. There are always cases where without programming a recovery for unexpected errors, the program limps along anyway (but don't think that the OS designers didn't add some grace to protect from bad programs!). The program can shoot itself in the foot, but shooting in the foot doesn't always disable the program from continuing on. But is shooting in the foot graceful, one has to ask? A lot of programs have bizarre errors and continue on after the person closes the little error box. Is the bizarre error box graceful, or is it actually frightening to the end user, even if the program does work after he clicks off that bizarre message? Programming for unexpected errors might be another topic altogether but I thought I would mention my views on unexpected vs expected. There's also the problem with what "grace" is defined as, I guess. Extremely ungraceful programs could be ones which limp along for hours, and then you find out your entire hard drive is corrupted only after being "graced" by it not crashing many hours before. What is grace, really?]
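
The claim above about a bit-flip in an iterative algorithm can be tried out directly. The Python sketch below is a stand-in, not the division algorithm mentioned: it uses Newton's square-root iteration on ordinary doubles, chosen because that iteration converges from any positive value. flip_bit, newton_sqrt and the chosen fault position are hypothetical, and only a mantissa bit is flipped; a flip in the sign or exponent bits could behave far less kindly.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE-754 double representation flipped."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def newton_sqrt(a, fault_at=None, fault_bit=50, tol=1e-12, max_iter=200):
    """Approximate sqrt(a) for a > 0; return (estimate, iterations used).

    If fault_at is given, the running estimate is corrupted by a single
    mantissa bit-flip at the start of that iteration.
    """
    x = a if a > 1 else 1.0
    for i in range(1, max_iter + 1):
        if i == fault_at:
            x = flip_bit(x, fault_bit)  # simulated cosmic ray
        nxt = 0.5 * (x + a / x)
        if abs(nxt - x) <= tol * nxt:
            return nxt, i
        x = nxt
    raise RuntimeError("did not converge")


clean_root, clean_iters = newton_sqrt(2.0)
faulty_root, faulty_iters = newton_sqrt(2.0, fault_at=4)
print(clean_root, clean_iters)
print(faulty_root, faulty_iters)
```

In this run the corrupted computation still reaches the same answer and simply takes more iterations, without any code written to anticipate the fault. That supports the narrower claim only: some errors land the system in a recoverable state.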

This is a result of the wildly different properties of the systems. Compare "malformed input data causing a buffer overrun to take over a system" vs "I broke my ankle and I'm now limping". The analogy hides the difference, assuming that all errors are recoverable and that a "good" computer program should limp along in the face of all errors, expected and unexpected. The very nature of computer programs and animal bodies is so vastly different that trying to draw maxims from one to another is a horribly bad idea. 

''The examples you're conveniently ignoring are things like "malformed input data causes interpretation failure and causes fallback" vs "damage to leg causes blood clot which rushes to brain and causes death". To be logical, you need to consider all the data, not just the data that supports your conclusion. There is a phrase for your behavior: it's called cherry picking (http://en.wikipedia.org/wiki/Cherry_picking). It's an easy trap to fall into.''

* If I can cherry pick examples, then the analogy does not hold as a basis of argument or as a rationale, perhaps only as expose or explanation. My claim is not "the analogy is always invalid". My claim is that some cases fit the analogy, some do not, and the analogy only obfuscates the technical issues under discussion by making people argue the validity of a comparison, the similarity of properties, etc. The things under this analogy are sufficiently different that no one ''should'' try to draw any parallels. At best, it happens to work out. At worst, people believe in false conclusions from a bogus analogy. 
* ''The fact that 'some cases fit the analogy, some do not' seems in this case to strengthen the analogy.''
** Wait what? 
** ''When you said "some cases fit the analogy, some do not", you spoke of the analogy to computers dying and animals limping, did you not? But the truth is that animals can also die or limp. Animals can, and do, die of trauma that would be relatively minor when measured in terms of physical trauma. A single bit flips. A single blood-clot forms and clogs some vessels. A buffer overrun causes a computer subsystem to crash. A flashing light causes epileptic fits. To note that some cases fail to match your "computers die, animals limp" expectations is to note a greater '''similarity''' between computers and animals, '''not''' a difference. And similarities, fundamentally, strengthen analogies.''
* ''After all, animals can also die due to relatively minor damage or a physiological hiccup, just like a computer system that breaks down after a single bit is flipped. And who has been drawing invalid conclusions? I don't believe any reader has been led by the opening into believing the analogy should be pushed to its outer limits.''
* AllAbstractionsLie. The question is how much do they lie. In this case, my opinion is that the abstraction of a computer system as an animal is so far out there that the abstraction is way beyond any useful threshold besides colorful metaphor. I just want people to stop saying "Computer systems should limp like an animal" as some sort of goal, as using animals as an abstraction of computer systems is horribly not useful. When you say that no reader should take it to extremes, yes I agree. That's true of any abstraction. However, ''good'' models generally let us know where that boundary is. Animals v computers has no such boundary. It's just gray. 
* ''Were '''you''' seriously confused about how far to take the model? Can you present anyone (even one person in the audience) who has overstepped the limits of application for the analogy (outside of humor)? If not, then your argument that the model might be taken too far is just unsubstantiated pessimism about what ''might'' go wrong. Explanations, like anything else, are written for an audience. English is already sufficiently vague that you should get off your pedestal about animals vs. computers and just tackle the whole language and how people "might" misinterpret things - I'm sure you can find much more evidence for that cause than for this particular analogy. And, based on how you keep bringing it up, I suspect you're mixing up two subjects: (1) your beliefs regarding how computer systems "should" behave under various classes of partial failure; (2) your objection to the analogy.'' 
* [Can someone present anyone that has overstepped the limits? Yes indeed. Ward C., Alan Kay, and many OO people claim that programming is just like biology, and human cells (or that we should be heading in that direction, using biology as a perfect system to follow). Anyone else that has overstepped the limits? Linus Torvalds (videos on YouTube) thinks that programs "evolve" rather than become designed by intelligent beings (even though evolution takes thousands if not millions of years, whereas programs can be drastically changed and created from scratch in 1 year or less). Or maybe you also haven't read much of Dijkstra, who thinks that artificial intelligence is sick and useless. Possibly you also didn't read Dijkstra's articles about anthropomorphism and how much he thought the programming community had overstepped and gone too far with them, giving programs human traits like real life things (and how much he thought OO was silly, due to it being an anthropomorphism). Maybe you also did not read about how Dijkstra disliked greatly when people started making comparisons to programs as if they were humans, one part of the program talking and speaking, another part of the program being "that guy over there" and the other part of the program being "this guy over here". Or maybe you missed his article on how he was angry when someone enjoyed the question "do submarines swim?" worrying that some people actually wondered and thought it was important to consider if submarines could be animals, not machines.]
* ''Maybe you just now failed to demonstrate any "overstepping". Maybe you're unskilled in English, so you're unaware that "evolve" has a meaning much older than and outside of speciation. Maybe humans and animals can be considered biological machines, so we aren't giving machines "human traits" but rather giving metal machines "meat machine traits". If the problem is of assigning traits, then the question "do humans limp" or "do humans swim" should be just as problematic as "do computers limp" or "do submarines swim". Why would such assignment of traits be important?''
* I say what I mean. I have no problem with describing a computer as limping per se. My problem is trying to determine useful properties or goals of computer systems by examining animal systems. 
* ''Why would that be a problem? If I were inspired to create a 'survivable' network based on, say, the interconnected root systems of certain forests, or perhaps the hibernation/crystallization prowess of the waterbear, why would that be a problem? If I were to develop a learning computer system based on the nervous systems of animals and weakening/strengthening of neural connections, or perhaps build optimizing heuristics based on pathfinding observed in ants, why would that be a problem? Your argument just now went from nonsense denying the potential robustness of individual program components to rejecting all inspiration for "useful properties or goals of computer systems by examining animal system." You did "say what you mean", right? I'm not at all impressed with your justifications thus far. Can you justify this even harder-line position? I think not.''
* You keep misconstruing, or I keep misspeaking. Probably a combination. Just because animal X has such and such properties, which are desirable for animal X, does not mean that we ''should'' model computer systems based on animal X. Limping is a desirable property of an animal. It may not be a desirable property of a computer system. 
* ''Where has someone suggested that "limping is a property of an animal '''therefore, it *should*''' be a property of computers"? I'm under the impression that your objection is to a non-event.''
* Drawing direct analogies from terms as vague as the ones on this page does more harm than good. Your example "model neural networks" is immediately bunk. It's a StrawMan: yes, if your goal is to model animals then you should take the properties directly from animals. 
* ''Perhaps it wasn't clear. The "goal" stated above was to "develop a learning system". The strategy or inspiration was "based on the nervous systems of animals and weakening/strengthening of neural systems". I did not intend to suggest a goal of modeling neural networks anywhere above, though I can see how the words were interpreted that way.''
* Also, trying to learn how ants do pathfinding is quite different than an animal limping on a broken leg; I agree that the former makes some sense. However, to argue based on example is still missing my point that any actual similarities are coincidental or purposefully contrived. Either you picked an animal and a property which happened to work out, or you specifically picked an animal which had this property which you already knew worked for computer systems. You should not pick an animal at random and a property of that animal at random and then say a computer should have it. The analogy does not always work. 
* ''Clearly, monkeys have long, prehensile tails that act like hooks to let them hang from trees; therefore, so should computer programs - have hooks to hang them from process trees, that is. Can you find anyone whom you can accuse of seriously suggesting computers *should* have properties *merely* on the basis that an animal has it? LimpVersusDie is certainly NOT an instance: the opening clearly introduces the analogy '''after''' having considered the question of whether a property analogous to limping - in particular, continuing through a divide-by-zero error - was desirable.''
* Computer systems are radically different than animals, and analogies, especially in this situation of drastically different things, are little more than teaching tools, not a basis for a search of desirable properties. I think you disagree with that, though I'm still not sure I'm getting my point across. (The paragraph below starting with "However the difference" below written by someone else is a good attempt.)
* When I say the analogy is bad, that computers don't limp, people immediately bring up examples of computers "limping". The term is sufficiently vague that any behavior in the face of an error is considered limping, whereas it conjures in my mind something more serious than handling an expected error. 
* ''Limping does not 'fix' the broken leg; it is not "handling" an error. It's continuing in the face of error. This is the image it conjures in most people's minds, and is common to the examples presented. What does it conjure to your mind?''
* The impreciseness of terms is getting to me. Also, limping is a loaded word in this context. It's immediately seen as a good plan; it's much better to limp than die, even though real computer systems are nowhere near as simple. In some cases, a missing database connection is not a good reason to die. You should "limp along". However, suppose your hashmap suddenly contains 15 entries but your stored size variable says 16. Oh shit, something's bad; maybe I should just die now and go into a debuggable state. However, someone will say that it's not "limping" and instead dying, which immediately brings up bad connotations. I want to avoid such connotations when discussing technical issues. The connotations come from an understanding of animal systems where it's always better to limp than die, but that analogy does not hold when discussing computer systems. Computer programs should not strive on in favor of all other ends. It's a mismatch of characteristics: animals are "designed" to self-perpetuate at all costs. Computers are not. [Insert a bazillion other very important differences here.] These bazillions of differences make (nearly?) all analogy between animals and computers horribly misplaced and misleading. 
* ''If the FalseDichotomy is getting to you, you could have productively objected to it rather than claiming ArgumentByAnalogy. And I agree with your point against 'connotations', but I don't believe the connotation that limping is always better than dying among animals has significantly influenced the discussions on this page (thus references to 'fierce debates' and the advantages of FailFast). I don't believe that there are as many differences between computer-systems and animals as you seem to suggest, especially when dealing with larger scale computer systems (i.e. operating systems, virtual machines, networks) at which point specific computations become analogous to cells in the body. There is much precedent in biology for the body killing off cells that appear to be malfunctioning in order to let the whole system operate in a degraded state while healing or recovery occurs. And there is no particular connotation that this is a bad thing, either.''
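
The two cases contrasted above (a missing database connection versus a hashmap whose stored size disagrees with its contents) can be put side by side in a minimal Python sketch. All names here (CountedMap, InvariantViolation, fetch_report) are hypothetical and invented for illustration; nothing on this page prescribes them.

```python
class InvariantViolation(Exception):
    """Raised when internal state is corrupt and continuing is unsafe."""


class CountedMap:
    """Hypothetical map that stores its size separately from its entries."""

    def __init__(self):
        self._entries = {}
        self._size = 0

    def put(self, key, value):
        if key not in self._entries:
            self._size += 1
        self._entries[key] = value

    def check_invariant(self):
        # Corrupt internal state: die now, while the state is still debuggable.
        if len(self._entries) != self._size:
            raise InvariantViolation(
                f"stored size {self._size} != actual {len(self._entries)}"
            )


def fetch_report(connect, cache):
    """Limp along: an unreachable database is an expected external fault."""
    try:
        return connect(), "fresh"
    except ConnectionError:
        # Degrade to cached data and say so, instead of stopping.
        return cache, "stale (database unavailable)"
```

In this sketch the choice is made per fault class: an external resource being unavailable is answered with degraded but labeled output, while a broken internal invariant stops the program immediately.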

[We've all had situations where a computer limps along for a while, and then could even get injured more later on if the problem is not addressed, similar to animals. Take for example Windows 98 which after a few serious errors, you could still use it for a few more hours. It limps along if you are careful.  It may even repair itself if you close some programs but leave others running. Or take a button that is disabled when it shouldn't be (broken leg) but you still have another leg to limp on (other shortcut keys or menu items).]

[However the difference between animals and computer programs is that programs do not have physical parts that can wear out. The computer has physical parts that can wear out.  This is one reason why some people object to the term "computer science" since computers are physical, and programming is about "computing". Another difference between animals and programs is that we can directly control the program to the most exact levels. One can change every little bit about the program at the lowest levels. We cannot directly control a running animal - we can only influence it by teaching it and training it. That may change when we are able to hack with DNA, however DNA is still something we are hacking and not actually in full control of where we can create it from thin air (programs can be created from thin air, animals cannot). A computer program can be changed without it being "taught", whereas animals require being taught or influenced in order to change slowly. A program could be designed so that it could be taught, but it is not a requirement for the program to change behavior - one could just edit the source directly and change it that way. With animals editing the source can not easily be done, unless we advance our DNA source code knowledge and figure out more how DNA works and how to program it (which might even be so dangerous that we'd better stick with computer programs and forget I even mentioned that) - but it does beg the question, about whether or not evolution and DNA has some programmer or source code behind it, even if it is all accidental and not a God] 

------

Cancer is an example of the horrible things that can happen when a system does not fail fast and decides to "limp". Cancerous cells are nothing but cells that are worn out (or "limping") after many divisions and/or damage by carcinogens, and that, instead of committing suicide like well-behaved cells, decide to live and reproduce without control. The only reason we believe biological systems are more resilient than our software systems is their extreme redundancy (a typical biological system has millions of cells), so the system seems to heal when in reality it "micro-dies" or "micro-fails-fast" and then "micro-replaces" the damaged elements with new ones. If defective elements fail to "micro-die", then the whole system is compromised and eventually destroyed. The same applies to software: sometimes the best solution for a problem is just to fail and start over, instead of continuing to run in failure mode and eventually poisoning the whole system.
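
As a rough software analogue of "micro-fail-fast, then micro-replace", here is a minimal Python sketch of a supervisor that discards a damaged worker and starts a fresh one. The Worker and Supervisor classes, and the healthy flag, are hypothetical names assumed for illustration only.

```python
class Worker:
    """Hypothetical unit of work that may become damaged."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.healthy = True

    def handle(self, job):
        if not self.healthy:
            # Fail fast: a damaged worker refuses to keep running.
            raise RuntimeError(f"worker {self.worker_id} is damaged")
        return job * 2


class Supervisor:
    """Keeps the pool going by replacing workers that fail."""

    def __init__(self, size):
        self.next_id = size
        self.pool = [Worker(i) for i in range(size)]
        self.replaced = 0

    def run(self, index, job):
        try:
            return self.pool[index].handle(job)
        except RuntimeError:
            # "Micro-replace": drop the failed worker, start a fresh one, retry once.
            self.pool[index] = Worker(self.next_id)
            self.next_id += 1
            self.replaced += 1
            return self.pool[index].handle(job)
```

The system as a whole appears to heal, but only because the individual part was allowed to die and be replaced rather than left to limp.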

''Although it's still a nascent field, from what I've read, there ''are'' many protections against run-away cells; and many, if not most, division-related mutations are eventually dealt with. But some slip through the cracks. Too much complexity in order to prevent all possible division-related mistakes itself may contribute to problems. It's kind of like government auditing: at what point does the cost of auditing exceed the savings of prevented problems? Biology doesn't "like" cancer and puts up a strong fight against it, including slowed metabolism and replenishment later in an animal's age, which is why Michael Jordan had to eventually retire. In the end, entropy wins. Past a certain point, biology decided that it's best to let natural selection, instead of more self-auditing, "fix" the problem. Billions of species over billions of years couldn't find a work-around for aging, only compromises. I suspect software also has a similar complexity limit where the cost to manage the complexity exceeds the benefit of having high complexity, and alternatives such as dividing an app up are the better bet even if it costs some duplication. -t''

I suggest reading ''The Greatest Show On Earth'' by Dawkins to dispel some myths you hold about evolution. I spotted no less than 2 flat out inaccuracies in your description of evolution by natural selection. "In the end, entropy wins." Yes. The second law of thermodynamics is pretty solid. However, Dawkins makes a great insight (one of the things I loved most when reading the book) that evolution by natural selection is the only known natural process (excluding intelligent design) which increases "information". (This is using not quite the technical definition of information. A counter example is a body radiating energy. It is decreasing its entropy, and thus it is increasing its information as defined by information theory. Let's just use a slightly more colloquial definition of information in this context.) I strongly disagree with your characterization that entropy wins in the end in this context, that evolution cannot find a way around a problem because "entropy always wins in the end", which leads me into my second point. "Past a certain point, biology decided that it's best to let natural selection, instead of more self-auditing, "fix" the problem. Billions of species over billions of years couldn't find a work-around for aging, only compromises." I will forgive the personification of biology and evolution; it's standard prose and we all do it. However, do specifically note that there is no intelligent design behind evolution. No agent decides what's best for the organism. Your observation that they "couldn't find a work-around for aging, only compromises" misses the point of evolution. Evolution will not drive species towards "ageless" or any other Platonic ideal. Instead, evolution drives a species towards the best replicator, and thus it does prove pretty well that ageless is not a good quality for a replicator. It does not prove that ageless is impossible or even impractical. 

''"In the end" refers to the scope of an individual only, not life in general.''
* Your quote was "In the end, entropy wins. Past a certain point, biology decided that it's best to let natural selection, instead of more self-auditing, 'fix' the problem." Admittedly, it wasn't the clearest sentence, but you seem to be tying "entropy" to "natural selection", and natural selection works on the population level, not the individual level, so you can hopefully understand why I was confused. I think I now see that you were trying to say that eventually the entropy of a single organism is too much for "nature" to handle, and it decides it's best to let the individual die in favor of the next generation. 
* ''Let me see if I can restate it. Evolution happens on many levels, not just the population. For discussion purposes I will focus on genes here. Genes that survive (in the population) will tend to be those that give an individual the greatest reproductive advantage (RA). An individual that lives a very long time seems like it would have the best RA. However, there are other factors that could come into play. The first is possible overhead in resources to prevent mutations at the organ level as organs refresh themselves, and the second is the "alpha male issue" (AMI). I'm not an expert in cell biology, but suspect there is a metabolic cost to DNA copy error protection. If the cost is relatively low, then it seems that it would be worth it because a long life would make up for a slightly lower RA rate for a given year.''
* ''But, AMI complicates this. If only the alpha male is able to reproduce, then a slight disadvantage caused by copy error prevention overhead would be greatly magnified. Being Number Two all the time is not good enough. You would almost always be locked out. It's best to be a reproductive rock-star: burn bright, burn fast, die young. This is one reason that human females live longer than males I believe.''
* ''Similar to AMI is predator/prey competition. If the lion is slightly faster than you, then you are dinner. A small edge can be magnified due to this. Being 2% slower due to having error correction hardware may increase your chances of becoming a lion's dinner by say 20% for each encounter. And, a similar thing on the lion's side: if you are slightly too slow to catch dinner, you starve. Similar to AMI, "almost" doesn't cut it.''

''As far as the intelligent-design-like statements, I was using a colloquial style. (See WhatIsIntent if you wish to toy with the idea that evolution has "intent".) As far as "ageless", a longer surviving adult would potentially have more opportunity to mate and reproduce. Thus, all else being equal, evolution should favor a longer life. I'm trying to illustrate that evolution generally selects a path that seems to favor an aggressive prime over long life.''

It varies with species.  Mayflies are an excellent example of your thesis - one day full of sex, and poof!  However, many reptiles continue to breed throughout their life (post sexual maturity), which contradicts your thesis.  It probably depends on the amount of resources in the current environment - a world with a steady surplus means that the next generations will not be limited by their parents continuing to breed, while a resource-poor environment might favor parents dying after breeding.

''Of course there are exceptions to the rule. But also note that long-living reptiles also tend to have very slow metabolism.''

Of course.  Evolution tends to find local maxima, as opposed to global ones.  Some species are a solution to the problem in one fashion, and others in a different fashion.

''As a simple example, perhaps it requires less complex DNA to have an individual mature to sexual maturity quickly, but as a side effect cause the individual to die of old age (local vs global maxima.) Another example might be that longer living individuals slow down the rate of evolution. If your great x20 grandfather was still alive, then your population is changing at a much slower rate than that if your ancestors are already dead, and populations which evolve quicker tend to be better replicators. Perhaps there are other factors in the optimization problem which we don't know. So my original point stands based on my original two arguments: 1- Evolution favors local not global maxima, so the lack of ageless individuals does not prove that it would be difficult to engineer such a thing. 2- Evolution favors the better replicator, and I have yet to be convinced that ageless replicators are better. If anything, I have made a much more compelling argument in this paragraph which argues that ageless replicators would be ''worse'', not better.''

Better replicator means more offspring, period.  The factors affecting that are a) onset of sexual maturity, b) length of reproductive span, and c) success of offspring.  "Rate of evolution" is meaningless - if the individuals continue to live, they are fit, and their genes are good, and therefore will persist in the population.  We know that most animals live longer in captivity, and breed for longer as well, which shows that the genes for long life are not selected '''against''', just less visible in the wild.

''First, you are ignoring many other selection pressures which aren't "onset of sexual maturity, length of reproductive span, and success of offspring". The biggest glaring missing thing is sexual selection, aka being attractive for a mate. There's also in-group selection bias, which depending on who you talk to matters / exists. That is helping out closely related family members increases the chances of your genes being passed on, otherwise known as a potential source of kindness, compassion, teamwork, and self sacrifice. (Optionally we can use Game Theory to explain some of that, but not self sacrifice.) I think I might be missing a couple other selective pressures, but I think I've made my point that it's not as simple as you make it out to be. Also, what do you mean that genes for long life are not selected against in the wild when animals live longer after a couple generations in captivity? This seems like a contradiction in a single sentence. If they don't manifest in the wild, but we can breed for them, then nature does select against these traits, at least to the degree that they're not as strong as they are under selective breeding. Selecting against doesn't mean that the trait is bad as a Platonic bad. It means in this particular case, that trait is too expensive in the optimization equation of evolution, and it is not chosen. A classic example is why do some animals have renewing teeth and some don't? Surely renewing teeth are always better? No, apparently they are not. Non-renewing teeth may be a local maxima problem, or, more likely, it's an expense problem. Having renewing teeth requires more calcium, for example, which may be better spent elsewhere, or maybe it would up the food intake requirements making that individual actually less successful. Either way it is a complex thing that one cannot simply armchair reason about. Finally, your pithy attack "rate of evolution is meaningless" is both ignorant and lacking content. 
You state your position as fact, as though this defeats my argument without even addressing it, and you do not address it. There's a reason sexual selection is a bit more common in the "higher order" organisms than asexual cloning. Sexual reproduction leads to sexual selection, which is almost always a negative selection pressure for the fitness of the species. It's also much less efficient than simple asexual cloning. Why do so many animals use sexual reproduction then? The increased rate of mixing of genes, aka the higher rate of evolution among the species which reproduce via sexual selection, allows them to evolve at a faster rate, and thus be more fit and successful.''

[Let's not forget that having more offspring isn't necessarily better in the first place.  Overpopulation can lead to a catastrophic drop in population.]

''Again, this is a misunderstanding of evolution. Evolution by natural selection does not 'care' about Platonic or utilitarian ideals. It doesn't 'care' if the population dies. If some behavior or quality is self destructive to the species, but beneficial to an individual, then evolution will favor that quality, up until the very last member of the species dies. There is no such thing as a prudent predator. Lions and tigers and bears, oh my, will hunt themselves to extinction if allowed. If a better hunting lion came along, and it killed more prey, then evolution would favor it, resulting in less prey, until eventually all the remaining lions would starve. Luckily, evolution also tends to make the prey more resilient as well, and generally in large enough systems the number of prey and predators goes round in a cyclic relationship, so those better hunting lions become less in number because less food stocks are available, until the food stocks rebound in X years. Aka the large system can handle the small shocks.''

* [What have I misunderstood?  I didn't say natural selection 'cared', so that can't be it.  If some behavior or quality is so self destructive to the species that it results in its extinction (even if it's only carried by a subset of the population), then evolution selects against it via the extinction of its species.  That agrees with what I said, so that can't be it.  I certainly attempt to be a prudent predator, by making my food choices as sustainable as my budget allows, so while we disagree, that's hardly a misunderstanding about evolution.  Lions, tigers, and bears, like most predators, don't go around killing everything in sight.  While I'm certain they don't consciously make food choices based on projected supplies, evolution has resulted in a system where the choices they do make don't result in their extinction even in the absence of external restraints.  And finally, the cycles you refer to are not generally the result of "shocks" to the system.  This might be a disagreement about what a "shock" is, since I consider shocks to be either something like the catastrophic change in population I mentioned earlier, or a sudden external change in the environment.  BTW, what happened to my question about Dawkins's insight?]

* ''And I say again, no. Evolution will not favor the predators which happen to be prudent, whether they are prudent by instinct or choice. This is a misunderstanding of evolution. Evolution will favor the predators which replicate the best individually, which generally precludes being prudent. If in a system where the usual predator-prey cycles can not happen, and an equilibrium cannot be established, then the predators will hunt themselves to extinction. To borrow an example from Dawkins, let's imagine the forest of friendship. In the real world, trees are usually constrained by available light. Trees grow high to get the available light, shutting out other trees. However, growing high has its own costs, like keeping the trunk alive, the costs of having a strong enough trunk (and this cost increases with height), etc. Thus, forests tend to reach an equilibrium in height where any tree growing taller pays a higher marginal cost than the marginal benefit it would receive of more sunlight. The forest of friendship is the hypothetical example where suppose all the trees happened to be prudent. Each agreed to only grow 10 feet tall. Thus everyone is better off. However, as soon as we had one aberrant mutation, one tree which broke this covenant, it would reap enormous benefits, and it would reproduce much better, until eventually this kind of tree would dominate. The same reasoning and logic applies to dispel the myth of prudent hunters. Suppose we had a bunch of lions which agreed, whether explicitly or by instinct, to hunt prudently. As soon as one aberrant lion came along which broke this trust, it would get more food, and replicate better, until only its progeny remained, possibly hunting themselves to extinction. The prudent lions would not survive because either A- the selfish lions would simply out-compete them, or B- the selfish lions would hunt the food supply to exhaustion, thereby starving all the lions. 
Again, evolution by natural selection selects on the level of the individual (or more correctly the gene for sexual reproduction), not the group nor species.''

* ''Also, what happened to the question of Dawkins insight? It was a misquote by me, so I fixed up my post with the correction and disclaimer, and removed the now irrelevant question. Was I being a bad WikiGnome?''

''I suspect this won't be settled until sufficient simulations are done to determine the effect of individual-level entropy on evolution. For example, allow test subjects to have cell-level metabolism happen with more error correction that has a metabolic cost, without a metabolic cost, and levels in-between. (Some bacteria have extra error-correcting mechanisms, but they are not free.) It would be nice to know if evolution "lets" entropy happen for reasons such as population thinning, or because it has too high a metabolic cost, such as sufficiently reducing the chance of being the alpha male. There are a lot of factors involved such that it's hard to know what is over-powering what. Tinkering with various factors and seeing the results is thus in order. -t''

Maybe it is just too complex a problem... even for something as capable of slowly creating amazingly complex systems as evolution is... maybe the problem is the explosion of communication paths... imagine a biological body as a team... the formula for communication paths is that if you have n people on your team, there are (n^2-n)/2 connections... if we take each cell in our body as a team member... the number of communication paths is huge! Too huge to deal with, so evolution is attacking it the best way we know how... by creating specialized teams that know a lot about how a task is done, but do not really know what the whole body is doing (your liver cannot do what your brain does). But, while that specialization and distribution of work makes it possible to create amazingly complex (and for a time apparently perfectly working) systems, it also means that communication is compromised, and some parts of the body are just unable to prevent others (and themselves) from making small mistakes, that eventually sum up and end up destroying the whole organism... --LuxSpes
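
To make the (n^2-n)/2 formula above concrete, here is a small Python sketch that evaluates it for a few team sizes. The function name communication_paths and the sample sizes are chosen for illustration only.

```python
def communication_paths(n: int) -> int:
    """Number of distinct pairwise connections among n members: (n^2 - n) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # n^2 - n = n * (n - 1) is always even, so integer division is exact.
    return (n * n - n) // 2


for n in (2, 10, 1_000, 1_000_000):
    print(n, communication_paths(n))
```

Two members share 1 path and ten share 45, but a million members already share 499,999,500,000, which is the quadratic growth the paragraph above is pointing at.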

''"Don't make copy errors" is not necessarily a system-wide problem or cause. What does the spleen have to do with copy errors in the big toe? A summary of possible reasons for the "allowance" of copy errors is as follows:''

* Too expensive to prevent nearly all copy errors - would render the organism unable to compete for mates and food, or to evade predators, against rivals who take the shortcut of less copy protection

* Evolution hasn't "learned" how - doubtful because some bacteria have found sophisticated error correction mechanisms.

* Some other ecological or genetic phenomenon, such as saving resources for the newer generation in order to speed evolution

* Some combination of the above

''I'm not really sure where this conversation is going. It seems the last couple of posts are sort of interjections, each marginally related to the previous post. However, I will add to the previous poster that evolution is "The non-random survival of randomly varying replicators.". If the replicators do not vary, aka if the replicators do not have copy errors, then there is no evolution. (However, even the most "perfect" copy scheme will have occasional errors, so practically speaking any replicator will evolve.) "Copy errors", as you put it, are required for evolution. Of course, too many copy errors would be selected against as it would not be a fit replicator, but evolution does favor a rate higher than 0, aka evolution favors an appreciable rate of evolution as that will lead to more fit individuals.''

So maybe death (failure) of individuals is the price for evolution? If a replicator achieves perfect copying, it stops evolving, and it's outcompeted by others, and so the advantages of never getting cancer are negated by the fact that it is now unable to evolve?

[Can we maybe talk about fault tolerance, please?]
-------
See: FailFast, FaultTolerance, FailureIsInevitable, FaultIsolation, PersistentLanguage, GracefulDegradation, GatedCommunity, AssertionsAsDefensiveProgramming

CategoryException