I was reading TheSourceCodeIsTheDesign and LightweightDocumentation and
thought they were strongly related to my LoRD principle. I wrote it up on the
latter page, but decided to move it to its own Wiki page. I've written about
LoRD many times on Usenet and mailing lists over the past 3 years or so, but
I think this may be one of the few times I've written about it publicly in a
more persistent forum.

This is my second attempt at writing this up here on the Wiki. The first
attempt elicited much discussion and many comments, particularly from
MichaelFeathers, but also from KielHodges, BetsyHanesPerry, BillTrost,
and RonJeffries. This rewrite is an attempt to incorporate that
discussion, as well as a much earlier one over a year or so ago with
TimOttinger on the Usenet newsgroup news:comp.lang.c++.moderated.

I am greatly indebted to all of these individuals for helping flesh out
and clarify important points that were lacking in the previous
discussion, or else not clearly communicated. I don't expect this
rendition to be perfect, but hopefully it is a vast improvement.
Unfortunately, it's also much longer (about 3-4 times longer), though
still shorter than the original if you include all the clarifying
comments that were made.

-- BradAppleton

-----

'''The LoRD Principle -- Locality breeds Maintainability'''

	 :	''initial draft: January 1997''

A few years back, I coined a principle that I call ''LoRD'':
LocalityOfReferenceDocumentation. It tries to address the problem of
keeping code and documentation consistent with one another and up-to-date.
The principle may be stated formally as if it were a Newtonian Law of sorts:

	 :	The likelihood of keeping all or part of a software artifact consistent with any corresponding text that describes it, is inversely proportional to the square of the ''cognitive distance'' between them.

A less verbose, less pompous description would be simply: ''Out of sight; out of mind!''

Therefore, it is desirable for us to try and minimize the cognitive
distance between artifacts and their descriptions! This is not
without its presumptions and caveats (which are described later).
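
Taken literally (which is more than a tongue-in-cheek "law" deserves), the statement says that doubling the cognitive distance cuts the likelihood of staying consistent to a quarter. A minimal sketch in Python; the function name, the constant ''k'', and the idea of measuring distance as a single number are all assumptions made purely for illustration:

```python
def consistency_likelihood(distance: float, k: float = 1.0) -> float:
    """Relative likelihood that an artifact and its description stay in sync.

    Hypothetical model: likelihood = k / distance**2, for distance > 0.
    Neither k nor the unit of distance comes from any measurement.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k / distance ** 2


# Doubling the distance leaves one quarter of the likelihood.
ratio = consistency_likelihood(2.0) / consistency_likelihood(1.0)
```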

Ostensibly, LoRD refers to software documentation (all types of
documentation, not simply design), but it's obviously more general than
that. It's not strictly limited to the relationship between docs and
code. Locality of reference (without the 'D' :-) can apply equally to the
relationship of code with comments (though comments are easily viewed
as a form of documentation), use-cases with requirements, requirements
with design, use-cases with test-cases, test-cases with test docs, etc.
(the list goes on and on).  For this discussion however, I'm focusing
more on the relationship of source code artifacts to any kind of
documentation (formal or informal, end-user or design, etc.).

''It should be noted that LoRD does not suggest anything regarding how much documentation should be created'';
This is still left to your discretion. LoRD suggests ''only'' that it
is desirable to keep the documentation as close as possible to the code
(whatever that means, we'll delve further into that a bit later).

It is also important to notice that LoRD does ''not'' refer exclusively,
nor even primarily, to design documentation. It refers to ''all''
documentation, where possible, including end-user documentation
(manuals or user's guides, even if they are only available on-line).
Although it is often applied to design documentation, such documentation
is usually easier to keep minimal and mostly within the code itself
(see below for how this might be done). Many of the more noticeable
applications of LoRD are concerned with documentation of things like
programmatic interfaces and end-user documentation.


'''Cognitive Distance'''

The phrase "out of sight, out of mind" gives a vague indication of what
is meant by "cognitive distance" (there is a similar concept in the world
of user interface design named "visual distance"). It doesn't refer to a
Euclidean notion of distance, but to the amount of conceptual effort
required to recognize that the corresponding "item" exists and needs to
be changed to stay "in sync."
It also spans the effort required to navigate from one to the other in
the developer's workspace. The navigation effort may seem more "manual"
than conceptual, but it is significant because it relates to the
interruption of "flow" of the developers' thoughts between the time
they first thought of what they needed to do, and the time and effort
expended before they were actually able to begin doing it.

As an example, suppose I edit a source file and I now need to update a
corresponding document (or perhaps I've edited a ".c" source file and
need to update the corresponding ".h" header file). If the other file
isn't in the same directory as the one I just modified, I'm less likely
to remember that I need to modify it. Once I do realize this, suppose
it takes 2-3 commands to leave the editor (or current editor buffer)
and navigate to the file in the other directory, and then open it up
for editing. The effort to enter the commands and perform the
navigation may well interrupt my train of thought, and impact my
productivity.


'''Applying LoRD in Theory - Locality Fantasies'''

How can we keep documentation "cognitively close" to its corresponding
source code? How do I keep source declarations and interface specs
"in sync" with their corresponding definitions and implementations?

Well - if you have a really spiffy development environment, capable of
visually conveying impact or dependency analysis of related items and
information, then it would make it very clear to the user that the docs
needed to be updated after an artifact was modified, or the ".h" file
needed to be updated when the ".c" file was modified in a certain way.

What I'd love to see is a ''true'' visual programming environment.
Something that links the ''physical'' design (of files and directories)
with the logical design (e.g. methods, classes, and class categories).
I envision maybe a tool like Rational Rose where I can double click
on an object to see its subparts in more detail. Double click again
on a method name and browse/edit the code for that method (or a
data-member). And I could do the same for the method docs
(or object docs) too with textual notations.

Take it a step further and I can simply "change views" from development to project management and see a diagram of tasks and people and resources and milestones. I might double-click on a completed milestone and perhaps zoom in on the version-label and all the file-revisions that make it up, and from there browse the code; or maybe use a mouse-click to navigate from a revision to the task in which that change was made, and maybe continue on to the corresponding bug-report that prompted the development task. Switch "views" again and I can go from a piece of code to its requirement or test-cases by clicking on some hyperlink. Right now we are being told that XML may well make this all a reality before very long.

At least for the "development view,"
Smalltalk VisualWorks comes very close to attaining this in many
respects, so does Genitor, which is billed as an "Object Construction
Environment" for C++, and someday Java (see http://www.genitor.com).
I've also seen some amazing things programmed into Emacs to come
astoundingly close to such an environment (complete with color-coded
pretty-printing, automatic formatting, and syntax directed editing). 
Of course they still leave some things to be desired. ;-)

If you are using such a development environment, not only does it
require less conscious effort to apply LoRD (because the environment
supports it in many respects), but it can even require less
documentation. By this, I mean not just less "formal" documentation,
but less informal docs and comments. This can happen when the code
itself is clearly written with useful and readable naming conventions
(some of which are mentioned in SimplyUnderstoodCode,
TheSourceCodeIsTheDesign, and various ExtremeProgramming practices).

'''Applying LoRD in Practice - Locality Realities'''

When lacking such a development environment, however, we typically have
to rely upon the proximity of the related artifacts in the organization of
the source tree or project folder. There are several approaches that are
commonly used:

1. ''The document is in the code''

This is the approach taken by javadoc, and by Perl5 with its POD
mechanism (POD stands for "Plain Old Documentation"). The document (be
it user-docs, interface-docs, design-docs, or any combination thereof)
is embedded directly in the source code in the form of structured
comments. Then some extraction program or script parses the comments
from the code, and assembles them into a document that is
auto-generated for formal viewing/reading.

Issues of formatting, indexing, collating, presentation order, and
including diagrams and pictures typically need to be addressed here.
Some amount of formatting and indexing information may need to be
included in the source comments (typically in some kind of markup
language). Some kind of template with section ordering and
table-of-contents control is often employed to generate the document
with the right sections from the right files and directory in the right
ordering (note that this presumes there is a known "right" ordering).
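
A minimal sketch of this approach, using Python docstrings in place of javadoc or POD comments. The functions ''area'' and ''perimeter'' and the extractor ''extract_docs'' are hypothetical names invented for the example; the ''order'' argument stands in for the template that controls section ordering:

```python
import inspect


def area(width: float, height: float) -> float:
    """Return the area of a rectangle.

    Both sides are expected to be non-negative.
    """
    return width * height


def perimeter(width: float, height: float) -> float:
    """Return the perimeter of a rectangle."""
    return 2 * (width + height)


def extract_docs(functions, order):
    """Assemble a plain-text document from the functions' docstrings.

    `order` lists function names in presentation order; it plays the
    role of the template or table-of-contents control.
    """
    by_name = {f.__name__: f for f in functions}
    sections = []
    for name in order:
        doc = inspect.getdoc(by_name[name]) or "(undocumented)"
        sections.append(f"{name}\n{'-' * len(name)}\n{doc}\n")
    return "\n".join(sections)


# Presentation order differs from the order of definition in the file.
document = extract_docs([area, perimeter], order=["perimeter", "area"])
print(document)
```

Note that the presentation order has to be supplied from outside the code, which is exactly the presumption of a known "right" ordering.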

2. ''The document '''is''' the code''

With respect to design documentation, this is the approach taken by
TheSourceCodeIsTheDesign (and probably LightweightDocumentation to
some extent). It is also the approach Bertrand Meyer was aiming for
in his design of the Eiffel language and the ISE Eiffel programming
environment, which contains pretty-printing tools, as well as "short"
and "flat" forms for viewing the code and its comments (color-coded
no less). Eiffel was intended to look like a readable but formal
software specification language, so the result may often resemble a
formal software specification/design document.

3. ''The code is in the document''

This is DonKnuth's idea of LiterateProgramming. The code is richly
interspersed in the document, which appears ''in presentation order''.
And a code extraction tool separates the code out into appropriate
modules for building (compiling and linking). Much of the same
formatting issues arise here as with case #1 above. But now the
extraction tools have to process and spit-out the code segments, as
well as the documentation. Again, it is presumed there is a single,
known presentation order for all the embedded information. Issues
of collating the document for presentation are now replaced by (or
combined with) issues of collating the code for compilation.
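
The code-extraction step can be sketched as follows. The chunk markers (a line of the form ''<<filename>>='' to open a chunk and a lone ''@'' to close it) are a convention assumed for this sketch and are not Knuth's actual WEB syntax; ''tangle'' and ''LITERATE_SOURCE'' are likewise hypothetical names:

```python
from collections import defaultdict

# A tiny "document" in presentation order, with code chunks embedded.
LITERATE_SOURCE = """\
The greeting module says hello. We start with the function itself.

<<greet.py>>=
def greet(name):
    return "Hello, " + name
@

Later in the narrative we add the entry point, which belongs in the
same module even though it is presented separately.

<<greet.py>>=
print(greet("world"))
@
"""


def tangle(text):
    """Collect code chunks per target file, in order of appearance."""
    modules = defaultdict(list)
    target = None  # name of the file the current chunk belongs to
    for line in text.splitlines():
        if target is None:
            # Outside a chunk: prose is skipped, a marker opens a chunk.
            if line.startswith("<<") and line.endswith(">>="):
                target = line[2:-3]
        elif line == "@":
            target = None
        else:
            modules[target].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in modules.items()}


modules = tangle(LITERATE_SOURCE)
```

Here the chunks are collated for compilation simply in order of appearance; a real tool would need something richer whenever presentation order and compilation order disagree.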

4. ''The code '''is''' the document''

This is the same thing as case #2 -- ''The document '''is''' the code''.

5. ''User Guide as Requirements Spec''

This doesn't refer explicitly to the code per se, but it still refers to
related software artifacts that would otherwise need to stay in sync.
If the two related artifacts are treated as a single artifact that
meets both needs (not one artifact with both embedded as separate
parts, but the very same artifact comprising all parts of both and
serving as the lone presentation for both), then the distance between
the artifact and itself is zero; problem solved (or is it?).

This means information that some might feel inappropriate for a
requirements doc needs to be in the user docs, and vice-versa.
Personally, I have no problem with this, and have seen it work quite
well on numerous occasions. But I recognize that other projects with
other constraints (or certain contractual obligations) can't always get
away with this. If you can, however, it can work quite nicely!


'''Leveraging Locality at Multiple Levels'''

The LoRD principle can be applied at several different levels
of scale and abstraction throughout a project:

''1. Inter-artifact: the directory-tree level''

This applies for portions of documentation or code that, for various
reasons, must go in separate files. Here, you can try to place the code
and its "docs" in the same directory (or at least in the same subtree
of the source tree).  Some people put their C/C++ header files (.h
files) in the same directory as their .c or .cc files for this very
reason: because it makes it easier for them to remember to update one
file when the other has changed. It's right there in the same directory
when they list its contents or view it as a folder containing file
icons.

It also applies for documentation that neither rightfully nor logically
belongs in any one file; this is usually the case with things that
describe higher-level concepts or strategies that encompass the
interactions or relationships between multiple files or components.
I've seen projects that put the design docs for a subsystem (the overview
part at least) in the same subdirectory with the code for the subsystem
itself. The subsystem's code is spread across multiple files, so it may
not make sense to document the collaborations of the subsystem's parts
across the various subcomponent source files (because the document is
describing the workings of the whole - not of its parts).

In a similar manner, I've seen people put a README file in each
subdirectory of the source tree, to serve as a kind of documentation
for the directory itself, explaining its purpose and organizational
principles/criteria.
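
One way to keep such a directory-level convention from eroding is to check it mechanically. The following is a minimal sketch, assuming a convention that every directory holding C or C++ sources should also hold a file whose name starts with README; the function name and the list of suffixes are invented for the example:

```python
from pathlib import Path


def directories_missing_readme(root, source_suffixes=(".c", ".h", ".cc")):
    """Return directories under `root` that hold source files but no README."""
    root = Path(root)
    candidates = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    missing = []
    for directory in candidates:
        files = [p for p in directory.iterdir() if p.is_file()]
        has_source = any(p.suffix in source_suffixes for p in files)
        has_readme = any(p.name.upper().startswith("README") for p in files)
        if has_source and not has_readme:
            missing.append(directory)
    return missing
```

Running it over a source tree yields the directories where the "docs next to the code" rule has lapsed; directories without any source files are ignored.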

''2. Intra-artifact: the file level''

This is the LiterateProgramming and JavaDoc-type stuff described
above.  Docs that are applicable to a single artifact
(a file) can be embedded in the file itself. Often tools like javadoc
or Perl5 POD introduce the need for some undesirable redundancy or
additional references. This is because the particular "extraction
tool" has limitations regarding whether or not, and how often, it
is capable of treating certain portions of a file as ''both''
document ''and'' code.

If the tool can't recognize cases where portions of the code should be
made part of the documentation, the programmer is forced to repeat
certain redundant information in the docs that is readily apparent from
the code (e.g. parameter names and types). This is an unfortunate
drawback of such approaches when using tools with these limitations. It
needs to be carefully balanced against the amount and formality of
documentation that the project requires.
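
When the tool ''can'' read the code, the redundancy goes away. A minimal sketch using Python's standard ''inspect'' module, where parameter names and types are taken from the signature rather than repeated in the comment; ''scale'' and ''interface_summary'' are hypothetical names for the example:

```python
import inspect


def scale(vector: list, factor: float = 1.0) -> list:
    """Multiply every element of `vector` by `factor`."""
    return [factor * x for x in vector]


def interface_summary(func):
    """Render a small interface entry from the code itself."""
    signature = inspect.signature(func)
    lines = [f"{func.__name__}{signature}"]
    for param in signature.parameters.values():
        annotation = param.annotation
        # Parameters without an annotation are reported as "unspecified".
        type_name = (
            "unspecified"
            if annotation is inspect.Parameter.empty
            else getattr(annotation, "__name__", str(annotation))
        )
        lines.append(f"  {param.name}: {type_name}")
    lines.append(f"  {inspect.getdoc(func)}")
    return "\n".join(lines)


summary = interface_summary(scale)
print(summary)
```

The docstring only has to say what the signature cannot, so renaming a parameter changes the generated docs without anyone having to remember a second place to edit.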

Still, despite this drawback, many projects feel they have great success
using these approaches. They may not necessarily use ''all'' of the
facilities or "document sections" that something like javadoc or POD
provides (partly in response to this aforementioned problem), but they
manage to find a suitable subset that meets their needs quite nicely.

''3. Micro-artifact: the code-block/statement level''

This goes a step further than the previously mentioned level. If a
comment corresponding to a snippet of code is not very close to the
code itself, it is likely to become obsolete (which isn't very helpful,
and can even be harmful). The most common occurrence of this I've
seen is function-header comments. The function (or method) header
comment is a good place to say certain things about the protocol and/or
logic of the function ''in general'', but some folks often include
somewhat detailed pseudo-code in the function header as well. It almost
always becomes obsolete shortly thereafter because the pseudo-code is
all the way at the top of the function, even when code near the end of
the function changes. Whether or not the pseudo-code should even be
present is a separate matter; for now, let's presume we are talking
about comments that aren't redundant with the code.

Comments that correspond to specific parts of the function or method
(instead of the function as a whole) should immediately precede the
corresponding logical block of statements.  Similarly, comments at the
beginning of a class, or an interface, or even a source file can suffer
from the same problem. A certain amount of "high-level" or generally
descriptive text that applies to the class or file as a whole is fine.
But anything that could easily be separated out and placed closer to
the specific declaration, definition, or code-block, should be.
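
A small illustration of that placement, with a hypothetical function invented for the example. The header comment describes the function in general, and each remaining comment sits directly above the block it explains:

```python
def summarize_orders(orders):
    """Return (count, total_cents) for the paid orders in `orders`.

    The header says what the function is for; it deliberately does not
    restate the individual steps below.
    """
    # Ignore orders that were never paid, since they carry no revenue.
    paid = [order for order in orders if order["paid"]]

    # Totals are kept in integer cents to avoid floating-point drift.
    total_cents = sum(order["cents"] for order in paid)

    return len(paid), total_cents
```

If the summing logic changes, the comment that needs updating is on the line just above it, not a screenful away in the header.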

Some of this is what is described in SimplyUnderstoodCode. But other
aspects of it are more like the "inter-artifact" or "directory"
level discussed above. We have a single file, class, or function/method
that encompasses several smaller logical entities: a file or package
contains classes and methods or functions; a class or interface
contains definitions or declarations of several methods and data
members; and a function or method contains multiple logical sections or
blocks of code. So in this sense, the larger-grained "entity" can
be viewed somewhat like a "directory" of the smaller-grained entities,
but at the micro-level (all in the same artifact) rather than at the
mini or macro-level (across multiple artifacts).


'''Maintaining LoRD - The Facts of Refactoring'''

As the project evolves, code needs to be refactored, for any of a
number of reasons: Changes may be made which make it more convenient
to commonize code in other files; Things that seemed to belong in the
same file may later need to be split apart (sometimes files with high
checkout-contention are a good candidate for this); People or roles may
change during the lifetime of the project and artifacts are rearranged to
make a more "navigable living space" for their maintainer; or things
may need to be fused-together or split-apart simply to help decrease the
time required to build a large system.

In short, ''refactoring happens!'' When it does, the corresponding
sections of documentation will have to be appropriately refactored as
well. Code and docs that belong together will be easier to keep
together during the move if they are already close together to begin
with (especially if they are within the same set of lines). But some
docs that went into a single artifact will now refer to things in
several artifacts, and may themselves need to be split out into a separate,
"overview-level" artifact of their own. Or some things in an
overview-level document now will need to be moved into a single
artifact (or to a new directory) because the commonality or layer of
abstraction has been factored into its own logical class or subsystem
of the code.

Not all these efforts are necessarily in addition to what would need
to be done anyway when refactoring, but many of them are. And even though
they are done in a way that may still be faithful to the LoRD principle,
the extra code-doc-locality maintenance efforts can be non-trivial.


'''Competing Forces Combatting LoRD'''

Now - having said all of that, there are certain counterforces which
may drive you to put more "distance" between such related artifacts.

''1. Separate responsibility for maintaining docs and code''

It may be that separate groups of people are responsible for
the docs and the code. It may not always be prudent, for example, to
have tech writers and coders checking modules in and out in the same
directory at the same time. This goes double if the docs and code are
in the same file!!! Not only does it create more checkout contention
and/or merging, but the writers have an unprecedented opportunity to
mess with the code, and the coders with the docs (or maybe it's coders
and SQA, testers, configuration managers, ...).

''2. Increased parallel changes to the documentation''

The likelihood of parallel changes to the docs is increased because it
is distributed across many files which are constantly evolving.  This
not only makes merges of documentation versions more likely, it may make
them harder to perform if your tools for merging and comparing text files
aren't as nice as those offered by your word-processor for merging and
comparing document versions. (OTOH, if your word-processor has no such
merging facilities, it may be easier to merge the docs in text format.)

''3. Increased Build Times''

Checking out a file just to change the docs may cause a ''lot'' of
unnecessary recompilations when rebuilding the system. And in
file-based projects using compiled languages like C or C++,
keeping .c and .h files in the same subtrees may cause very long
directory search paths for include files and libraries. This may
also result in a dramatic slowdown when building the system. Some
may try to have it both ways by keeping headers in a single directory
with (symbolic) links in the source directories, but (sym)links take
up some amount of diskspace and use a slot in the filesystem
registry/tables. This can be non-trivial for systems with tens of
thousands of files (or more). The headers may also make the directory
look more cluttered (though that might be solved with subdirectories).

''4. Document Generation Complexity''

If the docs are distributed throughout the source-tree, just
like the source code, then your Makefiles have the added burden and
complexity of having to construct the documentation as well as the
programs and libraries. Extracting and collating and ordering all the
sections of documentation and generating the correct documents can be
just as complex as building the system, if not more so.

How do you know what order to put things in (the order may depend on
several variables)?  Often, the docs in a file are extracted in order
of appearance and the order in which to compile documentation from the
files is manually recorded or programmed (into a script, or perhaps a
makefile).  Sometimes sections of docs in the same file have known
identifier tags associated with them (or supply their own tags) and the
logic of how to extract them from one or more files in a tag-defined
order is non-trivial.  Often this can require some kind of precompiled
database of extracted docs with known "locations" for each "tag."
This doc-database can take up significant space in the source-tree (just
like object files and executables do) and will require additional
effort to make sure it is updated when the "docs" change, but is
recompiled with minimal effort (just like we try to do when using
makefiles to build the executable from the sources).
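
The tag-driven case can be sketched in a few lines. The marker syntax (''#@doc <tag>'' followed by ordinary comment lines) and all names below are assumptions invented for this example, not the conventions of any particular tool, and the precompiled database is reduced here to an in-memory dictionary:

```python
import re

# Hypothetical convention: a doc section starts with "#@doc <tag>" and
# continues through the "# " comment lines that immediately follow it.
TAG_LINE = re.compile(r"^#@doc\s+(\S+)\s*$")

SOURCES = {
    "queue.py": (
        "#@doc usage\n"
        "# Call push() to add work and pop() to take it.\n"
        "def push(item): ...\n"
        "#@doc overview\n"
        "# A queue holds pending work items.\n"
    ),
    "worker.py": (
        "#@doc usage\n"
        "# Start a worker with run().\n"
        "def run(): ...\n"
    ),
}


def extract_tagged(sources):
    """Map each tag to its doc lines, gathered across all files."""
    sections = {}
    for filename in sorted(sources):
        tag = None
        for line in sources[filename].splitlines():
            match = TAG_LINE.match(line)
            if match:
                tag = match.group(1)
            elif tag is not None and line.startswith("# "):
                sections.setdefault(tag, []).append(line[2:])
            else:
                tag = None  # any other line ends the current section
    return sections


def assemble(sections, tag_order):
    """Emit sections in the order given by the template, not by file."""
    parts = []
    for tag in tag_order:
        parts.append(tag.upper())
        parts.extend(sections.get(tag, []))
    return "\n".join(parts)


document = assemble(extract_tagged(SOURCES), ["overview", "usage"])
print(document)
```

Even in this toy version, the ordering of files within a tag and of tags within the document are two separate decisions that something outside the source files has to make.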

I worked on such a documentation extraction and generation tool once upon
a time. It was a third-party tool we were enhancing and bundling into our
suite of programming language development tools. It was pretty spiffy
indeed, but believe me when I say these issues can be extremely complex
to resolve - especially when the project needs to meet documentation
standards like DOD-STD-2167A or MIL-STD-498 ''[sigh]''. Still, I have to admit
that, as painful as it was, it was ''far'' less painful than keeping such
profuse, formal documentation up to date by hand ''[yikes]''.

''5. Multiple Perspectives and Dimensions of Information''

There is not necessarily a single ''best'' presentation order for both
the code and its related documents. Different audiences may be better
served by different views of the documents and source code.
Furthermore, there are multiple dimensions of information for each of
these multiple audiences that need to be coherently distributed
throughout the codebase.

LoRD presumes the most convenient organization of artifacts to use is
according to the usage patterns of those who spend the most time living
in the codebase: the developers. So the organization of the code comes
first; and the organization of the various docs has to be dispersed
across it. But the most logical organization for navigating through the
code won't necessarily be the best organization for navigating the
portions of a particular document.

'''Concluding Remarks (praise the LoRD ;-)'''

I would have to say that ''minimizing the amount of documentation''
that must be created and maintained in the first place is probably
one of the ''most important'' aspects of applying LoRD (perhaps even a
precursor). This is what concepts like "self-documenting code" and
TheSourceCodeIsTheDesign are trying to achieve (though admittedly
they do this only for design and interface docs, and not for things
like end-user documentation). Note that "minimal documentation" is
not the same as "zero documentation". Here, "zero documentation" means
not only no design docs, but no user-docs, nor any other kind of docs
(not even a single comment anywhere in the code -- since comments are a
form of documentation). If there is zero documentation, the distance
between artifact and its description is infinite; but if the artifact
''is'' also the documentation, then this distance approaches zero.

However, even when you do minimize the amount of documentation you will
provide, some modicum of docs must still exist. Certainly end-user
documentation is always necessary, even if it's all part of the online
help system. And some amount of design and interface documentation
will always be necessary, even if it's predominantly in the form of
clearly crafted, readily readable code with a light sprinkling of
clarifying comments.

It's certainly desirable to keep code and docs as close as possible. But
it's also possible to keep them too close, or to try too hard to keep
them close when faced with many of the obstacles mentioned above. The
strategies for applying LoRD, and the LoRD principle in general, are
not a hard and fast rule. As a general rule, the strongest mandate
that we can infer from LoRD is:

	 :	'''Code and documentation should be kept as close as feasible, but no closer!'''

I would say that LoRD is a ''force'', but not a pattern. Certainly some
of the ways of applying LoRD mentioned above may be patterns, but they
are not without their consequences and tradeoffs. There are other forces
and tradeoffs that may work against LoRD, and which have greater or lesser
importance depending upon the particular project, and its associated scale,
diversity, environment, and users.

Still, it's something to keep in mind, and to strive for whenever feasible.

-- BradAppleton

-----

There is truth here! Er, I mean "I HaveThisPattern". If I gave advice (I don't, I sell it), I (who drones on at infinite length) would advise: shorten this to 1/10 this size and lose nothing. It'll be hard, but the result will be pretty fine!  Keep at it, Brad! -- RonJeffries

----

Somewhat related to LocalityOfError?

----

This sounds like an idea we're playing with at TeamInaBox called Sub''''''Text. Have a look at http://www.teaminabox.co.uk/downloads/subtext/ but the essence of it is to provide dynamically built documentation to developers as they work. It uses a similar concept to AspectJ to help discover what documentation is relevant to the code being worked on. It looks like you've done a lot more thinking on this :-) (we had this idea a few days ago and threw a prototype together). We would value your ideas and comments. -- ChanningWalton

----

I would add items 6 and 7 to your list of approaches...

6. The CustomerTest''''s are the documentation.

7. The documentation is interwoven with the CustomerTest''''s.

See AgileRequirementsDocumentation

-- Steve Jorgensen

----

[Discussion of user docs moved to AssociateUserDocsWithSource]

----
CategoryDocumentation