Can science help us select among multiple tools/methodologies? Science used for understanding and predicting the physical world is for the most part uncontroversial among WikiZens. You observe nature, form hypotheses/models to explain or predict it, test those models against observations that distinguish between them, and the best-fitting model is labelled as such. If multiple models fit roughly equally well, the simplest model gets preference. If all the candidates are poor, keep looking for new or reworked hypotheses.

But on this wiki we are often talking about tools, not nature. The question is '''how can science determine the best tool'''? What exactly the model is supposed to best "fit" is contentious, especially for software. Some want to view it like math and find the most "elegant" solution, perhaps using some kind of parsimony metric. Others lean toward improving the productivity of the "user" of the tool, which is typically a maintenance programmer. But "productivity" is tricky to measure and often gets into economics, organizational dynamics, and WetWare.

As a starting point, I propose this as the "scientific method" for testing tools:

* 1. Describe the factors being optimized for.
* 2. Rank the factors by importance, carefully describing the reasoning for such importance rankings.
* 3. Perform the measurements of the factors.
* 4. Show the results of the measurements and weigh them against the importance rankings in 2.
* 5. The tool with the best score is considered the "best known tool".

As usual, the devil's in the details. --top

''The set of rankings in Item 2 is inevitably based on -- and biased by -- individual perspective. An HR director may consider human resource costs to be the most important, whilst an engineer may consider reliability to be the most important. An accountant may consider return on investment to be the most important, whilst a medical doctor may consider health and safety to be the most important. And so on. As such, you've defined "best tool" by an individual set of rankings. In other words, if I do Step 2, it only qualifies what '''I''' think is the best tool. How do you determine the "best tool" across '''all''' the -- potentially highly variable -- possible rankings? In short, given a set of individual Step 2 rankings, who gets the most weight -- the HR director, the engineer, the accountant, or the medical doctor -- and how do you decide?''

Ultimately the owner of the "shop" that owns the software has the final say. (Software user satisfaction is sometimes used instead.) Ideally we'd spend a lot of time interviewing everybody and hold many meetings to determine and negotiate the factors and weightings among ''all the parties'' you listed, with the owner(s) making the final decisions. '''In practice''' such an extensive interaction process is NOT going to happen, and thus software design analysts will have to make their best guess about many owner preferences based on experience and indirect clues. I will describe what I believe the owner would care about if they had the time to investigate, based on my experience in the industry of what they like and don't like. True, my observations will probably differ from those of other WikiZens, as we have seen. That's fine: all "sides" give their opinion about what trade-off profile the owner will likely/typically want, and then we can LetTheReaderDecide whose opinion and analysis they value most or best fits the style of their own organization.
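As a purely illustrative sketch of the five steps above, here is one way such a scoring "model" might be parameterized, written in Python for concreteness. Every factor name, weight, and measurement below is a hypothetical placeholder rather than real data; the weights stand in for the Step 2 rankings and the numbers for the Step 3 measurements.

 # Hypothetical sketch of the five-step weighted-scoring proposal above.
 # Factor names, weights, and measurements are invented for illustration only.
 def score(measurements, weights):
     """Step 4: weigh each measured factor (0-10 scale) by its importance weight."""
     return sum(weights[factor] * value for factor, value in measurements.items())

 # Steps 1/2: factors and one possible importance ranking. An HR director,
 # an engineer, or an accountant would likely choose different weights.
 weights = {"grokkability": 0.4, "parsimony": 0.2, "reliability": 0.4}

 # Step 3: (made-up) measurements for two candidate tools.
 tools = {
     "tool_a": {"grokkability": 7, "parsimony": 5, "reliability": 7},
     "tool_b": {"grokkability": 4, "parsimony": 9, "reliability": 7},
 }

 # Step 5: the highest weighted score is the "best known tool" -- but only
 # for this particular set of weights.
 print(max(tools, key=lambda name: score(tools[name], weights)))

With these made-up numbers "tool_a" comes out ahead, but re-running with weights that favor parsimony (say 0.2/0.5/0.3) hands the win to "tool_b" -- which is precisely the Step 2 disagreement debated below: different stakeholders supply different weights, and the "best known tool" changes with them.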
No need to make it into something personal. We can still create "models" where the input parameters (weightings) are supplied by readers. Thus, an incomplete model may still be of use. But some try to steer the discussion away from owner preference and suggest ArgumentByElegance should play a bigger role, since estimating owner preferences is difficult. But that's the SovietShoeFactoryPrinciple in play: just because, say, parsimony (code size) is easier to measure than most factors doesn't mean it's the ''most'' important factor. Unfortunately, the importance of factors and their ease of measurement appear unrelated. That's just the ugly truth of complex tool measurements in our universe, and God/Mother-Nature is not going to make it easier for us. -t

''How does determining "what ... the owner would care about if they had the time to investigate" differ from the question I identified above: Who gets the most weight -- the HR director, the engineer, the accountant, or the medical doctor -- and how do you decide?''

I don't understand your question. I suppose EverythingIsRelative, but in a mostly-capitalistic system, it's typical to use '''the people who pay you''' as the reference point. After all, if you don't give them what they want, you (eventually) won't get paid. That's the working assumption I am going to use. If you want to optimize it for the gardener of the building, or whoever, that's your prerogative, but few are going to pay attention to the results.

''I agree, if you don't give the GoldOwner what he/she wants, you won't get paid. However, that doesn't mean the GoldOwner has any idea whose views amongst his/her employees -- the HR director, the engineer, the accountant, or the medical doctor -- should get the most weight. The people who pay you aren't smarter than you, and often they're relying on your expertise -- in software and IT -- that they do not possess. As the hired expert, I've been asked for my recommendations, advice, or even asked to make the decision on behalf of the GoldOwner. What he/she usually wants is the greatest success with the least risk, but determining that is usually a matter of deciding whose (presumably) expert views amongst the team get the most weight.''

''What you call ArgumentByElegance, by the way, is generally shorthand for aspects of software and systems -- reliability, provability, composability, security, reusability, flexibility, readability and so on -- that directly impact factors important to business, like time-to-market, maintenance costs, customer satisfaction, product quality, etc.''

Yes, they are factors; nobody's disputing that. But we are balancing them against each other and against time and costs. Related: IncompatibleGoals

''They are important factors. Sometimes, you appear to deprecate them.''

Sometimes you appear to artificially inflate their value.

''Sometimes you appear to disregard their genuine value.''

Get better and clearer evidence for "genuine value". Your evidence is often roundabout and indirect -- the kind Rube Goldberg would use if he were a scientist -- or detached from economics and/or reality in an AssumeCanOpener-like fashion.

''Clear evidence has frequently been provided for "genuine value". Mostly, you quibble about it, or introduce irrelevancies like the ill-defined ad-hoc use of qualitative-opinion-with-quantitative-badges that you like to call "economics".''

Demonstrate they are "irrelevant". Difficult to measure, yes, but being difficult to measure is not the same as being irrelevant.
That's the SovietShoeFactoryPrinciple again. For example, symbolic parsimony is relatively easy to measure, but that doesn't mean we should always favor symbolic parsimony over other factors such as grokkability, even though grokkability by developers is, by most accounts, an important factor. It appears you often favor the "math view" over grokkability JUST because grokkability is difficult to analyze as math, and are thus committing the mistake described in SovietShoeFactoryPrinciple. If you prove grokkability doesn't matter, I will ignore it. If you prove economics doesn't matter, I will ignore it. '''You have indeed made a good case that they are difficult to measure, but NOT that they are irrelevant factors''' (or even minor factors). I don't dispute that grokkability is difficult to measure, so pointing that out over and over, as above, doesn't add anything new to these discussions. That's not the point of contention, yet you keep repeating those complaints. That's annoying and textually wasteful. Please stop. --top

PageAnchor grokkability-01

''Until there is some definitive indication that your economic concerns are relevant, they are mere water-cooler talk, and mainly a statement of personal preference that you use to deprecate certain programming paradigms. Of what possible relevance is your personal preference?''

What ARE the relevant concerns then? DontComplainWithoutAlternatives. You are just as obligated as I am to show that your favorite factors are relevant. '''There are no defaults'''. And I don't know what you are talking about regarding "personal preference". You appear to be cross-confusing subjects. Grokkability is not the same as preference. (Although preference can be a shorthand ProxyFactor for grokkability when direct measurement is not feasible.) -t

''The relevant concerns are reliability, robustness, optimisability, durability, isolation, flexibility, composability, expressiveness, parsimony, provability, performance, resource consumption, extensibility, coupling, cohesion, etc.''

Okay, but how does one '''measure''' those in a practical way and with practical meaning? For example, all else being equal, one wants "provability" over non-provability, but it must be weighed against other factors. If obtaining provability hurts other factors, such as cost (economics), then it's not helping the net goal of the person/org that wants the software. Provability by itself is not something the end-user is concerned with, because its mere existence does not deliver characteristics the end-user can see. It may increase the quality of the software (fit to requirements), but it is not the only way to do that, and it may hurt other factors in the process.

''Defining metrics is typically done on a study-by-study basis, because there are no absolute metrics for reliability, durability, etc. However, in an individual study these can be precisely defined. For example, a casual study some years ago compared the durability of a number of commercial DBMSs by running a SQL script to insert a batch of data and then pulling the plug on the machine. They were then compared by how quickly they returned to normal use (or whether they did -- several didn't) and how much, if any, data was lost.''

''The weighting of various factors, if multiple ones are measured at once, is up to the reader. Such weightings almost invariably depend on individual requirements.''

And you left out code grokkability from your list, which probably affects about half of your list, at least.
Software that cannot be grokked by existing staff is not very "flexible" in practice, for example. '''You appear to want to analyze only symbols instead of human reactions to symbols,''' which misses the mark, because the primary PurposeOfProgramming is to communicate with human beings, not with machines or mathematicians (those are secondary concerns). Code should be optimized for the human head, not for the symbols' own sake. '''Symbols and formulas don't maintain software; humans do'''. Optimizing symbols for, say, parsimony will not necessarily improve the human-communication step (maintenance).

''Grokkability is subjective (e.g., Java seems more grokkable than BrainFuck, but try measuring how much) but perhaps the easiest factor to refine -- if you produce syntax that is difficult to grok, someone will suggest better syntax.''

It's ''not'' subjective, just difficult to measure. And as a reminder, the difficulty of measuring a factor is not necessarily related to its importance. And "suggesting better syntax" is often how these kinds of arguments start: what is "better"? Further, grokkability is not just syntax, but also aggregate code construction.

''What's grokkable to me might not be grokkable to you, and vice versa. That makes it subjective. Of course, we can test large populations to see whether they consider some software to be grokkable or not and therefore see if, on average, it is perceived to be grokkable. Similar studies are currently used to determine user friendliness. However, low grokkability may mean using the software requires expertise. A significant proportion of the population would find flying a fighter plane to be difficult to grok, but because we need fighter pilots, the lack of grokkability is considered acceptable. The same applies to software, and as with the other factors, the weighting of grokkability vs the other factors is up to the decision maker.''

We can say that measuring grokkability is objective within a group or group profile. (Note that you shouldn't typically be writing paid software that ''only'' you can grok.) And I do agree there are domains or projects where other factors outweigh grokkability. To keep a workable, semi-defined scope, I'll consider typical custom in-house business applications, rather than, say, life-support software for NASA missions or even SystemsSoftware, where quality trumps speedy grokking. The trade-off profiles for those are different. -t

-----

A common point of contention is that, based on my experience, organizations typically value PlugCompatibleInterchangeableEngineers far more than "elegance" and parsimony. "Excess" abstraction causes them staffing problems. See GreatLispWar and StaffingEconomicsVersusTheoreticalElegance for more. Some WikiZens claim I'm wrong. I suspect their specialty/background in intricate statistical and optimization-like projects biases their observations about staffing, but perhaps my background does also. That's fine, '''let both viewpoints live'''. -t

''Indeed, I suspect your speciality/background in record-keeping and reporting for non-IT organisations biases your observations about staffing. We're probably both right -- for the type of organisations with/for whom we've worked and work.''

Perhaps, but roughly 90% of organizations are not directly IT-related; thus, if I am biased because of that, then it affects a relatively small percent. ("Record keeping"?)
''100% of IT products are based directly on ComputerScience, so the issues that you claim "affects a relatively small percent" actually affect everyone.''

Bullshit! The industry selected what it "liked" among academic experiments and discarded others. Computer science '''almost NEVER''' proved software tool A was "better than" tool B, outside of machine performance issues. (Somewhere there is a similar topic.)

''Who said anything about ComputerScience "proving" (science doesn't do that) software tool A was "better than" (define "better") tool B? That's like using science to prove hammers are better than screwdrivers, or that Stanley screwdrivers are better than Craftsman screwdrivers. The former doesn't make sense, and about the latter nobody cares. Every industry product is based on ComputerScience whether it admits it (or recognises it) or not, and has fundamental qualities (or not) of reliability, robustness, performance, flexibility, extensibility, etc., as a direct result of its adherence (or not) to the discoveries and principles of ComputerScience.''

* In terms of machine performance, I agree. But not in terms of ''software'' maintenance. If you disagree, please provide a good example.
* ''"Software maintenance" has nothing to do with ComputerScience. It is an aspect of SoftwareEngineering. Search scholar.google.com for "software maintenance" to find about 80,000 citations.''
* The topic is "science", not necessarily "computer science".
* ''That's fine, but my response was to your comment that "'''Computer science''' almost NEVER proved software tool A was "better than" tool B, outside of machine performance issues."''
* Most of the heated debates on this wiki are about whether one tool is "better" than another. If "computer science" is not designed and/or intended to answer that, then it's unhelpful and off-topic. I didn't bring up the topic of "computer science" here. It harkens back to the original question: '''Can science help us select among multiple tools?''' If yes, how, and what are examples in and outside of IT? If no, what do we do instead to select? I will agree that academia has been a great engine for idea production, but it has done a piss-poor job of providing rigor to compare these ideas in terms of productive (economic) tools.

''You can't really say ComputerScience has done a "piss-poor job of providing rigor to compare these ideas", because it hasn't done it and never will. It's not what ComputerScience is about. At best, it's a ProjectManagement concern, or a business concern. Take university business schools to task for not rigorously studying the business productivity impact of using C# vs Java or HigherOrderFunctions vs not using HigherOrderFunctions; to ComputerScience and SoftwareEngineering they're irrelevant.''

That's why a multi-disciplinary education is important. Those who only study X will see the world only from X's perspective.

''A multi-disciplinary education is an excellent idea, but economics no more belongs in ComputerScience than religion belongs in physics.''

The topic is "science", not necessarily "computer science". (I'm not sure "computer science" has a clear meaning anyhow.)

''Again, search scholar.google.com for "software maintenance" to find about 80,000 citations.''

If that resource has something specific to a given debate topic, then by all means, bring it up. Note that in general I see a dearth of actual '''in-field testing''' of a lot of suggestions or models of maintenance.
In other words, there's not a lot of direct road-testing of whether technique X actually makes a shop more productive/profitable than X's competitors. That kind of research is expensive and difficult, and simplified proxy studies offered in place of road-testing should be taken with a grain of salt. A lot of it is also vendors trying to sell you consulting services via buzzwords and catch-phrases. -t

''Did you try searching scholar.google.com? You might be surprised what you'd find. A lot of it isn't vendors selling anything.''

* Yes.

''As for comparing productivity of tools, there is a morass of complexities and obstacles to making such research come anywhere near valid, and the results will almost inevitably be weakly conclusive. Comparing BrainFuck to JavaLanguage would almost certainly produce fairly conclusive results in a productivity study. Haskell vs C# almost certainly wouldn't.''

Yes, comparing complex tools is usually not easy. That's not news. The real issue is what to do about it on this wiki.

''The best thing to do on this wiki is not do it. Comparing tools only leads to ReligiousWar. And let's imagine a series of studies definitively shows that Haskell is 32% more productive than Java. Does that mean we should stop discussing Java because we should all use Haskell?''

* But it's a '''key issue of SoftwareEngineering'''. SoftwareEngineering is about making decisions, and those decisions involve comparing tools/techniques/paradigms. (Note that I include methodologies among "tools", not just languages.) Just because it's a difficult question doesn't mean we shouldn't ask it. -t
* ''Indeed, but getting a trustworthy answer is magnificently difficult and expensive.''
* I see no viable alternative, so we are stuck with messy ProxyFactor''''''s for now.

{Haskell and Java are different enough that this isn't a good analogy, but sometimes a tool is better for all purposes than another tool, and so the old tool is abandoned. Does anyone write in ALGOL anymore?}

''Why aren't Haskell and Java comparable -- they're both general-purpose development tools? Was ALGOL abandoned because it wasn't good enough, or because fashion changed? How do you define "better for all purposes"?''

'' '''Any''' comparison of tools will raise questions like these, and none of them can be answered with logic, and all lead to fruitless debate. Ultimately, we choose the tools we prefer, or the tools someone else prefers are imposed on us. Preference is powerful and illogical; even the most potent logical argument in favour of product A will not sway a user away from product B if they like B. Such arguments are like trying to convince a person who doesn't like broccoli to like broccoli because it's healthy. How often does that work?''

''Telling someone that they shouldn't use tool A when they like tool A is even worse. That doesn't work at all.''

''At best, when we find a tool we like, we should tell others that it exists and tell them why we like it. Maybe they'll like it too, or maybe they won't, but that's all we can do. If they use tools we don't like, there's no point in even telling them -- they're not going to change their minds. There are better things to do, and better things to discuss.''

Remember that one is typically writing software for ''other'' maintainers, not for oneself. Your own preferences may not migrate to others. -t

--------
See also: FakeIndustryCanon
--------
CategoryScience, CategoryPhilosophy
----
DecemberFourteen