(Based on material in ValueExistenceProofTwo) (C/I = "Compiler or Interpreter")

I/O Profile is a working term or working concept I have been using which describes an equivalency metric, or a way to '''test equivalency''', of two programs or compilers/interpreters in terms of their '''externally observable behavior''', which generally excludes implementation specifics. I created the topic to factor the description and questions into a single place.

Basically, it theoretically checks that all instances of output match byte-for-byte per corresponding instances of all possible input sources. In practice we use an approximation of "all", which is described later. For compiler/interpreter (C/I) testing, the source code and any system resources available to the "language", such as OS clocks, are usually also considered "input".

 // pseudo-code example IP-1A for interpreters or compilers
 for d = each possible data input file/stream
 . for s = each possible source code input file
 ... x1 = outputFromRun(modelFoo(d, s))         // the target test model or system
 ... x2 = outputFromRun(actualLanguage(d, s))   // the standard or reference model or system
 ... if (x1 != x2) then
 ..... write("Model Foo failed to match I/O Profile of actual language.")
 ..... stop
 ... end if
 . end for s
 end for d
 write("The I/O Profile of model Foo matches actual language.")

(Dots to prevent TabMunging.)

In practice we cannot test every possible combination, so '''spot-checking''' (sampling) is done instead. This reflects the practical side of software QA testing, and of science in general, in that we cannot practically test every combination of input or make every possible observation of every atomic particle that ever existed. We can approach the ideal asymptote of "full" testing, depending on the level of resources we wish to devote to testing (an economic decision), but we can never actually reach 100% testing. Thus, in practice it's not a "perfect" test, but so far a perfect ''and'' practical alternative remains elusive. All non-trivial interpreters/compilers probably have bugs or "unexpected oddities" because of similar "testing" economics inherent in nature.

Also, it's '''not a test of machine speed or resource usage''', although it could be modified for such, for example by putting upper limits on the execution time or resources considered "equivalent" per target usage.

Further, pass/fail doesn't have to be Boolean. It could be modified to produce an "equivalency score" based on, say, the number of failed instances of I/O sets and/or the number of differences in the output streams. But what's considered GoodEnough depends on intended use. For example, two compilers for the "same" language may produce slightly different values for floating point results for some computations, but a language user may otherwise not consider this sufficient to say they are not the "same" language.

Error messages are often different per brand or major version, which can complicate testing. To simplify testing, generally the existence of an error message can be considered equivalent regardless of the content of the messages. Thus, if vendor A's interpreter triggers an error for input set X and vendor B's interpreter also produces an error message for input set X (and matches any output before the "crash"), then we would consider that test instance "equivalent". (Most of the common languages have unfriendly error descriptions anyhow. The quality of error text could be considered a "soft criterion".)
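To make the spot-checking idea concrete, below is a minimal sketch, in Python, of a sampled approximation of IP-1A. The interpreter commands, sample file names, and helper names are hypothetical placeholders rather than any real tool; it applies the byte-for-byte comparison and the "any error matches any error" rule described above, and shows how a non-Boolean "equivalency score" could fall out of the failure count.

 # Sketch of spot-checking an I/O Profile: run a finite sample of
 # (source, data input) pairs through a reference C/I and a candidate C/I,
 # then compare externally observable output byte-for-byte.
 # All command names, file names, and helpers here are hypothetical.
 import subprocess

 def run_one(ci_command, source_file, input_bytes):
     """Run one (source, data input) pair; capture only external behavior."""
     proc = subprocess.run([ci_command, source_file], input=input_bytes,
                           capture_output=True, timeout=60)
     return proc.stdout, proc.returncode != 0     # (output bytes, "it errored")

 def equivalent(reference_result, candidate_result):
     """Byte-for-byte match, except any error is treated as matching any
     other error, provided the output produced before the error matches."""
     ref_out, ref_err = reference_result
     cand_out, cand_err = candidate_result
     if ref_err or cand_err:
         return ref_err and cand_err and ref_out == cand_out
     return ref_out == cand_out

 def spot_check(reference_cmd, candidate_cmd, samples):
     """samples: a finite, sampled list of (source_file, input_bytes) pairs --
     the practical stand-in for 'each possible' source and data input."""
     return [(src, data) for src, data in samples
             if not equivalent(run_one(reference_cmd, src, data),
                               run_one(candidate_cmd, src, data))]

 # Hypothetical usage, yielding a non-Boolean "equivalency score":
 # samples = [("payroll_report.src", b"2024-01\n"), ("invoice.src", b"")]
 # failures = spot_check("vendorA_interp", "vendorB_interp", samples)
 # print("equivalency score:", 1.0 - len(failures) / len(samples))

The "x1 != x2" test of IP-1A corresponds to "not equivalent(...)" here; a shop could swap in a looser comparator, as discussed under "Byte-for-Byte and Literalness" below.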
--top (AKA TopMind)

--------
'''Diagram of Model Testing Parts''' for Interpreters/Compilers

 [1. Data (bytes), optional]
        |
        |
        v
 [2. Model Foo] --------> [3. Output (bytes)]
        ^
        |
        |
 [4. Source code (bytes)]

----------------
'''Discussion'''

{Note that d and s are infinite sets, which precludes using the above for any practical purpose.}

The "spot-checking" paragraph covers that. It is indeed practical and indeed is the very way production software is often QA'd. Spot checking is a large part of what professional testers do, for a ''practical'' living. Perhaps what you meant is that "testing every case is not practical". While true, '''we don't have to test every case to get a useful result''' from such a comparison tool. In practice we often live with an approximation. The alternatives have not proven themselves in practice for non-trivial tools.

Ponder this. You are a professional IBM compiler tester and IBM is releasing their version of C-sharp. What's the best way to verify that the IBM version of C-sharp "runs properly" in terms of running existing source code written for MS's C-sharp compiler?

{Your claim is that your I/O Profile can test language equivalence. It can't, not as you've written it, because your description defines d and s as infinite sets, such that the supposed "I/O Profile" cannot exist. In practice, verifying a compiler is done using standard ValidationSuite''''''s -- e.g., http://www.plumhall.com/ZVS031_mp.pdf -- and by running all your existing code on it and re-running all the UnitTests if you've got them. That still doesn't prove, or even demonstrate, equivalency between compilers. At best, it merely demonstrates that your new compiler probably works on some of your existing code. (In practice, you chase obscure bugs for years; I speak from grim experience.)}

In practice we use an approximation of the ideal "I/O Profile", perhaps '''using unit tests and existing production applications from various "old" C/I users'''. I'll try to rework the wording a bit [in progress]. And while unit tests are helpful and useful, if you are trying to match another ''existing'' compiler/interpreter, such tests should '''also''' be run against the original C/I, creating the very kind of code samples and test setup that I/O Profile typically uses. (A sketch of that baseline-and-replay setup appears after the discussion below.)

{What, in practice, do we use "an approximation of the ideal 'I/O Profile'" for?}

That was already explained. If it's not clear to you, I have insufficient info about where your head interpreted it differently than my head, such as to correct such mistakes. I've reviewed it and added a few more specifics. I hope that helps.

{As for running unit tests "against the original C/I [compiler / interpreter?]", I don't know what that would demonstrate other than what we already know: All the tests pass. The "I/O Profile", therefore, would consist only of "true, true, true, true, true, true, true, true, true, true, true, true ..."}

How do you know the existing behavior of another C/I until you test? Please don't tell me you trust your personal interpretation of the manual enough that you skip empirical testing. That would be silly.

{So far, your "I/O Profile" seems to be nothing more than some source code recognised by a compiler or interpreter, along with some input and output of the resulting programs. I don't know what we're intended to do with that, other than it sounds like a rather obfuscated way of saying, "Here are a bunch of programming examples for language ."
Examples are obviously useful, but I don't know that calling them an "I/O Profile" helps anyone.}

No, it's not about libraries of examples. I don't know where you got that idea from. It's comparing two or more compilers/interpreters to ensure they work the same (produce the same results given the same input and source instances). If you can recommend a different or better approach to compare two or more compilers/interpreters to ensure they "work the same", please enlighten us.

{Apparently, you pass the "I/O Profile" through two or more compilers/interpreters as per your "Diagram of Model Testing Parts for Interpreters/Compilers", and if they exhibit identical behaviour, you consider them equivalent. Is that not the case? If so, the "I/O Profile" itself is obviously a collection of examples. It's a language test suite, like that at http://www.plumhall.com/ZVS031_mp.pdf }

Collections of samples don't test themselves. It's a '''process''', not ''merely'' a library of files. And the code "samples" may be actual application code bases, not just testing snippets. Input instances/samples are indeed part of the "system", but it's NOT just a static library of files. Thus, "a bunch of programming examples for language " is '''misleading'''. Stop writing misleading stuff.

''How is it misleading? That's precisely what it appears to be -- a library of files.''

* That's like saying "science ''is'' a series of tests". The tests are NOT pre-defined, that is unless you have a time machine and can visit the future (and change birth announcements in Hawaii newspapers).
* ''If the tests are NOT pre-defined, when, where, and how are they defined?''
* As-needed. As in science, we don't have the resources to explore everything everywhere, so we use a combination of prior test results and interest (practical and/or intellectual) to determine what to explore next or explore further. For example, if your shop doesn't use a lot of floating point calculations in its programs, then you may not spend a lot of time analyzing or fine-tuning the modelling of floating point of a language. However, a CADD shop may use them heavily and thus focus more resources on that area.
* ''Then how does "I/O Profile" differ from the usual reductionist approach to debugging, i.e., using I/O to identify behaviour (e.g., "it outputs a 3 and I expect a 4") and subdividing the system to narrow in on the area of interest (e.g., "it must be where the numbers are added")? Of course, for a reductionist debugging approach to be useful, it requires knowledge of language semantics. If you're suggesting that IoProfile disregards semantics -- which it appears you've done elsewhere -- then one of the inputs to an IoProfile, the source code, is effectively treated as an opaque string of symbols. If that's all the source code becomes, then all the IoProfile lets us do is determine that one program may or may not be the same program as another, and as has been noted elsewhere, all it takes is a minor change in the name of an identifier -- or even in formatting -- to cause the IoProfile to regard two programs as different despite being semantically identical. How, then, is the IoProfile meant to be useful?''
* Semantics may indeed be used for testing implementations and/or models. However, that's a different process. IoProfile is a verifying technique, not an idea-generating technique. It may tell you that two models are different, but not "why". That's where human intelligence typically comes in.
* ''How do models enter into this, and what is it that is modelled? I thought the IoProfile was for comparing language snippets or implementations. From the top of the page: isn't it "an equivalency metric, or a way to '''test equivalency''', of two programs or compilers/interpreters"?''
* Models can be seen as a form of implementation, but often for study purposes instead of production. For example, they may skip efficiency- and speed-related constructs to simplify the system, or make it more "human-friendly". One can use IoProfile to verify that a model closely matches an actual interpreter/compiler.
* ''But if a model is implemented, it's not a model; it's an implementation.''
* It's possible to be both, it seems to me, but that gets into deeper vocabulary philosophy.
* ''Are you sure you're not conflating "model" with "reference implementation"? It seems reasonable to use IoProfile to verify that a new implementation is equivalent to a reference implementation, but that's usually known as testing with a ValidationSuite. I'm not sure how IoProfile could be used to verify that an implementation meets some abstract model without the abstract model being implemented as a concrete implementation of the same language as the language being tested, at which point it's hard to say whether it's an implementation of a particular abstract model or not.''
* There are multiple reasons to create and use a model, which are probably beyond the scope of this topic.
* ''Using and creating models is reasonable. I'm still not clear how that relates to your IoProfile. How, for example, would you use your IoProfile with a conceptual model? I can see how you use it with implementations, but then how do you verify that an implementation is an accurate reflection of a conceptual model?''
* Comparing mental or conceptual models by themselves is probably beyond the intended scope of IoProfile.
* ''I see. Then the IoProfile is essentially a way to verify that one language implementation works the same as another implementation? To, for example, verify that the gcc compiler works the same as VisualStudio, or that OpenJDK works the same as Oracle Java?''
* No, it's a concept, not a tool.
* ''It's a concept that demonstrates or explains what, exactly?''
* It describes measuring a tool by its '''interface''' rather than its implementation. If you have a better way to describe measuring interfaces, I'd like to see it. Some have proposed comparing via "semantics notation", but it appears to be a poorly documented and/or complicated technique. (A sketch illustrating the interface-only view follows this discussion.)
* ''What does measuring a tool by its interface rather than its implementation tell you? How do you measure a tool by its interface using IoProfile if there's only one implementation?''
* You mean such as, "Does the I/O act like I expect it to?" Without some context or situation, that's hard for me to answer.
* ''No, that's not what I meant. I'm trying to understand what IoProfile, as a concept, is intended to do or what purpose it serves.''
* I thought I already gave such. I have no new words to add at this time. I'll ponder a different way of explaining it. How about "the behavior of a computer program in terms of input and output"? And note that one can use the process to test applications for "equivalency", or at least "approximate equivalency", not just compilers and interpreters. But keep in mind it's being presented as a concept at this point, not as a How-To for testing.
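As mentioned in the discussion above, here is a minimal sketch, in Python, of running the same sampled pairs against the ''original'' C/I to record a baseline, then replaying that baseline against a new implementation or model. Only bytes in and bytes out are stored, which is the sense in which the profile measures the '''interface''' rather than the implementation. The commands, file paths, and the JSON baseline format are hypothetical choices for illustration, not a prescribed tool.

 # Sketch: record a baseline I/O Profile from a reference C/I, then replay
 # it against a candidate. Only bytes in / bytes out are stored --
 # implementation internals never enter the picture.
 # Commands, paths, and the baseline file format are hypothetical.
 import json, subprocess

 def observe(ci_command, source_file, input_bytes):
     """Run one (source, data input) pair and keep only the external output."""
     proc = subprocess.run([ci_command, source_file], input=input_bytes,
                           capture_output=True, timeout=60)
     return proc.stdout

 def record_baseline(reference_cmd, samples, baseline_path):
     """Run the sampled pairs against the ORIGINAL C/I and save what it did."""
     baseline = [{"source": src,
                  "input": data.decode("latin-1"),
                  "output": observe(reference_cmd, src, data).decode("latin-1")}
                 for src, data in samples]
     with open(baseline_path, "w") as f:
         json.dump(baseline, f)

 def replay(candidate_cmd, baseline_path):
     """Re-run the recorded pairs against a candidate C/I or model; report
     which sources no longer produce the recorded output byte-for-byte."""
     with open(baseline_path) as f:
         baseline = json.load(f)
     mismatches = []
     for case in baseline:
         actual = observe(candidate_cmd, case["source"],
                          case["input"].encode("latin-1"))
         if actual != case["output"].encode("latin-1"):
             mismatches.append(case["source"])
     return mismatches

 # Hypothetical usage:
 # record_baseline("original_interp", [("payroll.src", b"2024-01\n")],
 #                 "io_profile_baseline.json")
 # print(replay("rewritten_interp", "io_profile_baseline.json"))

Whether the candidate is a production rewrite or a simplified study model, the replay step is identical; the baseline says nothing about how either side works internally.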
Certain people kept asking, "what exactly do you mean by I/O Profile?", so I created this topic to hopefully better explain it.

----
'''What is I/O?'''

There can be disputes or fuzzy areas in terms of what is considered input and output. Generally file I/O, such as that used by READ and WRITE statements, is not a source of dispute, and is often the easiest to test. However, binary transfers and debugging I/O may be subject to controversy or be of situational importance. For example, vendor A's version of an interpreter may internally store numeric bytes high-to-low (left-to-right), while vendor B's interpreter does the reverse. If, say, an ODBC driver grabbed the bytes as-is from RAM, then vendor B's interpreter will have a different I/O profile than A's if we compare under the assumption that that kind of direct access is part of I/O. (And the ODBC driver won't work as expected under one of them.) -t

Generally, in comparing tools, the intended use and environment of the tool matters. For example, you are in a hardware store and see two hammer brands, A and B. They are the same weight and price. But you notice this on one of the tags: "Not designed for use on Mars". (Suppose the store is near a NASA building such that those things matter to some.) Since you never expect to go to Mars, that difference is not a difference to you. For ''your'' intended use, hammers A and B are "equivalent".

-------
'''Byte-for-Byte and Literalness'''

(Based on material from TypeDefinitionsSmellBadly)

[In the IoProfile page, you state that the comparisons are byte-for-byte. Is it really typical for the developers you know to do byte-for-byte comparison on compiler output? Is it really typical for the developers you know to use anything that could reasonably be called an "approximation to all" inputs when trying out a new compiler?]

"Byte-for-byte" is the ideal, but [typical shops] are using an approximation of IoProfile. They may merely eyeball screens to make sure they "look right" as an ''approximation'' of "byte for byte". Generally, if it doesn't match byte-for-byte, problems that matter are discovered, with the possible exception of slight floating-point-related differences for non-monetary apps. "Does X match Y within shop-acceptable tolerances?" may be another way to state it.

The algorithm I gave is an illustration simplified for easy reader digestion, and is not necessarily intended to be taken 100% literally. It can be "refined" per shop standards/expectations. For example:

 Before:  if (x1 != x2) then   // per original algorithm at top of page
 After:   if (! similarEnoughForOurShop(x1, x2)) then

In practice, a human will probably be making the final judgement call. Automated tests may perhaps identify differences to be investigated, but human judgement will analyze them to see if they matter or not to that shop. For example, if report X under compiler A displays a ratio as 1.38372634, but report X under compiler B displays 1.38372635 (last digit different), then a given shop may say, "that difference doesn't matter to us". (The displayed value should probably have been rounded to 4 or so decimal places, but that's an application design issue.) A sketch of one possible such comparator appears below.

I've done similar testing for interpreter "rewrites" from vendors to check that they ran our shop apps, and this included checking the Web, trade mags, and bulletin boards (some of this was before the Web) for related complaints and opinions from others. Thus, '''one is not just using shop apps to judge the matching accuracy'''.
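Here is a minimal sketch, in Python, of what one shop's similarEnoughForOurShop() might look like. It assumes, purely for illustration, that the only difference this hypothetical shop tolerates is tiny floating-point drift in numbers embedded in otherwise identical output, as in the 1.38372634 vs. 1.38372635 example; the tolerance value and the helper itself are invented, not part of any standard tool.

 # Sketch of a shop-specific comparator: byte-for-byte, except that numbers
 # embedded in the output may differ by a tiny relative amount.
 # The tolerance is a hypothetical choice for illustration only.
 import re

 NUMBER = re.compile(rb"-?\d+(?:\.\d+)?")   # crude numeric-literal matcher

 def similar_enough_for_our_shop(x1, x2, rel_tol=1e-7):
     """x1, x2: output byte streams from the two C/Is being compared."""
     text1, nums1 = NUMBER.split(x1), NUMBER.findall(x1)
     text2, nums2 = NUMBER.split(x2), NUMBER.findall(x2)
     if text1 != text2 or len(nums1) != len(nums2):
         return False               # the non-numeric text must match exactly
     for a, b in zip(nums1, nums2):
         fa, fb = float(a.decode()), float(b.decode())
         if abs(fa - fb) > rel_tol * max(1.0, abs(fa), abs(fb)):
             return False           # numeric drift beyond the shop's tolerance
     return True

 # Example: the last-digit difference from the report example above passes,
 # but any wording difference fails.
 # similar_enough_for_our_shop(b"ratio: 1.38372634", b"ratio: 1.38372635")  -> True
 # similar_enough_for_our_shop(b"ratio: 1.38", b"Ratio: 1.38")              -> False

Each shop would substitute its own rules here; the point is only that the pass/fail test in IP-1A is a pluggable judgement call, not a fixed byte comparison.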
As real-world cases of such vendor rewrites: dBASE rewrote its interpreter foundation from assembler to C between versions III+ and IV, and ColdFusion rewrote its interpreter foundation from C++ to Java around 2000. Based on in-shop tests and outside opinions, early versions of the rewrites were buggy, but they eventually got pretty close to matching the original. Minor app-code tweaks had to be made for some apps, but nobody would consider such tweaks sufficient grounds for calling them "different languages" in a general sense. (In one case, a binary-level plug-in didn't work on the newer version, but a replacement was found. See "What is I/O?" above for a note about binary-level integration compatibility.)

And typically shops do a smaller-scale version of the same thing for minor-version interpreter/compiler upgrades: test your apps against the new version by comparing output (expected per gray-matter memory and/or as documented). -t

''Your IoProfile, as described above, requires "each possible data input file/stream" and "each possible source code input file". These are infinite. Thus, running a validation suite -- which is necessarily finite -- on your interpreters or compilers is, '''by your definition''', not an approximation of an IoProfile. Any finite approximation of infinity (which is itself a contradiction -- more accurate would be to describe "any finite approximation of infinity" as "any finite quantity") is infinitely less than infinity. Anyway, use of validation suites is nothing new; why describe it as an "I/O Profile"?''

When I use IoProfile, I generally mean a "'''practical approximation''' of IoProfile". As far as finding a better name, let's propose some and evaluate the merit of each. I personally think IoProfile is a decent and nicely compact description of the concept, but WetWare varies. I'd like feedback from about 3 other WikiZens before changing it. "Validation suite" is too open-ended, and I'm not necessarily referring to a "suite". For the rougher approximations, it could be somebody familiar with an app eyeballing screens, so it's not about a particular software product or even a general category of software products. It's a concept, not software.

* I have such trouble keeping up. I never got close to grokking the io profile thing. Validation suites, maybe I grokked. But now neither is what it was - now each is just an approximation of itself. No wonder I have trouble. -- ChaunceyGardiner
* Let me ask you this: how can one verify that two implementations (production and/or models) can be used interchangeably (performance aside)? -t
** Testing?
** [Which involves?...]
** A plan and tests.
** [Uh, I still find it non-specific.]
** Well, in the interest of balance, I find the io profile non-specific. I would like a tool that could do cross-language comparison of behaviour in a manner which analyzed the source somehow. If that existed, perhaps we could also use it to convert all that assembler code in all those libraries into LispLanguage. That would achieve a lot - programmers could talk one language, cpus could be built to run Lisp only, .......
* ''Use a ValidationSuite.''
* Below are further possibilities. I'm not sure how to better explain it at this point. It's kind of an idealistic or abstract concept, the same way a geometric plane is. There are specific techniques to measure whether points are on a given plane, but one must be careful not to mistake specific techniques for the concept of "plane" itself.
--------
'''In Practice''', these are often used together to verify that the IoProfile fits expectations:
* A formal test suite (ValidationSuite)
* Running existing shop code (does implementation X' behave the same as X?)
* Word of mouth (the "grapevine") from other practitioners and shops. This is essentially informally sampling others' IoProfile tests.

But keep in mind IoProfile is a concept, not a direct tool, such that this list may not be exhaustive.

------
See also: ScienceAndTools

----
CategoryMetrics JanuaryFourteen