Basically when unused blobs of code are hanging around in a system. Say, for example, code that was written to support Win16 while the product is now supporting only Win32. See the AntiPatternsBook by Brown et al for more detail. ---- Are there any tools for C, C++ or Java for identifying lava flows? I would think that it'd be feasible to graph the functions executed by a program and flag any which do not have "main()" as a root. The reason I ask is that I've got several modules in our legacy project that suffer from severe (and I do mean severe!) lava flows due to the evolution over the years. I'd like to eradicate these but there is a tremendous amount of superstition regarding the removal of code. If I could highlight which functions are unused, it should be a lot easier to convince everyone that it's safe to remove those blocks. -- BruceIde * ''Remove it. You can always grab it back out of SourceControl.'' ''Watch yourself pretty carefully, there, Bruce. You might have functionality that is invoked through some sort of interrupt. In that case there is no "root" for the functions to derive from; they are invoked through other mechanisms.'' ''Still, it would be nice to identify all the fluff that systems accumulate throughout the years and get rid of as much as one can. Non-existent code needs no VCS storage space, no supporting documents, and no maintenance. It does not cause bugs nor add complexity. It needs no testing, no validation, and no SEI metrics.'' ''Get rid of what you can. Keep what you must. This is the tao of product development, my son.'' Our code is pretty straightforward -- it's a back-end database loader with no GUI component. There are a few places now where we use tables of pointers to functions now, but overall it would be pretty easy to follow the flow of the code. I think it'd be possible to do what I need done using the data that cscope creates, though we might have to explicitly tell the program about our function pointers. The unused code does make it significantly more difficult to grasp the overall focus of the program, this increasing the difficulty with which it can be maintained. --BruceIde ''You might want to consider a coverage analysis tool. Parasoft makes an excellent one: http://www.parasoft.com/jsp/products/home.jsp?product=Insure&itemId=70 for C/C++ applications.'' ----- ''Since it's on the website anyway, I thought I'd share the whole LavaFlow text with the rest of the wiki...'' http://www.antipatterns.com/images/lavaflow_s.gif ''The Lava Flow of obsolete technologies and forgotten extensions, leaves hardened globules of dead code in its wake.'' * AntiPattern Name: '''Lava Flow''' * Also Known As: Dead Code * Most Frequent Scale: Application * Refactored Solution Name: Architectural Configuration Management * Refactored Solution Type: Process * Root Causes: Avarice, Greed, Sloth * Unbalanced Forces: Management of Functionality, Performance, Complexity '''Anecdotal Evidence''' : ''Oh that! Well Ray and Emil (they're no longer with the company) wrote that routine back when Jim (who left last month) was trying a work around for Irene's input processing code (she's in another department now, too). I don't think it's used anywhere now, but I'm not really sure. Irene didn't really document it very clearly, so we figured we just leave well enough alone for now' After all, the bloomin' thing works doesn't it?!'' '''Background''' In a data mining expedition, the authors were looking for insight into developing a standard interface for a particular kind of system. The system we were mining was very similar to those we hoped would eventually support the standard we were working on. It was also a research-originated system and was highly complex. As we delved into it, we interviewed many of the developers concerning certain components of the massive pages of code printed out for us. Over and over again we got the same answer, "I don't know what that class is for, it was written before I got here." We gradually realized that between 30 and 50% of the actual code comprising this complex system was not understood or documented by any one currently working on it. Furthermore, as we analyzed it, we learned that the questionable code really served no purpose in the current system, rather it was mostly there from previous attempts or approached by long-gone developers. The current staff, while very bright, was loath to modify or delete code that they didn't write or didn't know what it did for fear of breaking something and not knowing why or how to fix it. At this point, we began referring to these blobs of code as "hardened lava", referring to the fluid nature in which they originated as compared to the basalt-like hardness and difficulty in removing it once it had solidified. Suddenly, it dawned on both of us that we had identified a potential AntiPattern. Nearly a year later, and after several more data mining expeditions, and interface design efforts, we had encountered the same pattern so frequently that we were routinely referring to Lava Flow throughout the department. '''General Form''' http://www.antipatterns.com/images/fig3_s.gif ''Typical Lava Flow source code listing'' The Lava Flow AntiPattern is commonly found in systems that originated as research but ended up in production. It is characterized by the lava-like "flows" of previous developmental versions strewn about the code landscape, but now hardened into a basalt-like, immovable, generally useless mass of code which no one can remember much if anything about. This is the result of earlier (perhaps Jurassic) developmental times while in a research mode where developers tried out several ways of accomplishing things, typically in a rush to deliver some kind of demonstration, and therefore casting sound design to the winds and sacrificing documentation. The result is several fragments of code, wayward variables classes and procedures that are not clearly related to the overall system. In fact, these flows are often so complicated looking and spaghetti-like that they seem important but no one can really explain what they do or why they exist. Sometimes, an old gray-haired hermit like developer can remember certain details, but everyone has decided to "leave well enough alone" since they code in question "doesn't really cause any harm, and might actually be critical, and we just don't have time to mess with it." "Architecture is the art of how to waste space". -- Phillip Johnson While it can be fun to dissect these flows and study their anthropology, there is usually not time in the schedule for such meanderings. Instead, we take the expedient route and neatly work around them. This AntiPattern is, however, incredibly common in innovative design shops where proof of concept or prototype code rapidly moves into production. It is poor design, for several key reasons: The Lava Flows are expensive to analyze, verify and test. All such effort is expended entirely in vain and is an absolute waste. In practice verification and test are rarely possible. The Lava flow code can be expensive to load into memory, wasting important resources and impacting performance. As with many AntiPattern''''''s, you lose many of the inherent advantages of an OO design, in this case, you lose the ability to leverage modularization and reuse without further proliferating the lava flow blobs '''Symptoms And Consequences''' * Frequent unjustifiable variables and code fragments in the system. * Undocumented complex, important-looking functions, classes or segments which don't clearly relate toe the system Architecture. * Very loose "evolving" system architecture. * Whole blocks of commented-out code with no explanation or documentation. * Lots of "in flux" or "to be replaced" code areas. * Unused (dead) code, just left in. Unused, inexplicable or obsolete interfaces in header files. * Code that is actively working - all of the results of which are ultimately ignored or discarded without ever affecting anything else. * If existing lava flow code is not removed, it can continue to proliferate as code is reused in other areas. * If the process that leads to lava flow is not checked, there can be a geometric growth as succeeding developers, to rushed or intimidated to analyze the original flows, continue to produce new, secondary flows as they try to "work around the original ones. * As the flows compound and harden, it rapidly becomes impossible to document the code or understand its architecture enough to make improvements. '''Typical Causes''' * R&D code placed into production without thought toward Configuration Management. * Uncontrolled distribution of unfinished code. Implementation of several trial approaches toward implementing some functionality. * Single developer (lone wolf) written code. * Lack of configuration management or compliance with process management policies * Lack of architecture, or non-architecture driven development. This is especially prevalent with highly transient development teams. * Iterative Development Process - Often the goals of the software project are unclear or change repeatedly. To cope with the changes, the project must do rework, backtracking, and prototype development efforts. In response to a demonstration deadlines, there is a tendency to make hasty changes to code on-the-fly to deal with immediate problems. The code is never cleaned up, leaving architectural consideration and documentation postponed indefinitely. * Architectural Scars. Sometimes architectural commitments are made during requirements analysis, only to be discovered not to work after some amount of development. The system architecture may be reconfigured, but these in-line mistakes are seldom removed. It may not even be feasible to commented out unnecessary code, especially in modern development environments where hundreds of individual files comprise the code of a system. "Who's going to look in "all those files," just link em in!" '''RefactoredSolution''' There is only one sure-fire way to prevent the Lava Flow AntiPattern. That is by ensuring that sound architecture precedes production code development. This architecture must be backed up by a configuration management process that ensures architectural compliance, and accommodates mission creep (changing requirements). If architectural consideration is short-changed up front, code is developed that is ultimately not a part of the target architecture, and therefore redundant or dead code. Over time, dead code becomes problematic for analysis, testing and revision. In cases where Lava Flow exists already, the cure can be painful. An important principle is to avoid architecture changes during active development. In particular, this applies to computational architecture, the software interfaces defining the systems integration solution. Management can postpone development until a clear architecture has been defined and disseminated to developers. Defining the architecture may require one or more system discovery activities. System discovery is required to locate the components that are really used and necessary to the system. The system discovery also identifies those lines of code that can be safely deleted. This activity is tedious; it can require the investigative skills of an experienced detective. As suspected dead code is eliminated bugs will be introduced. When this happens, resist the urge to immediately fix the symptoms without fully understanding the cause of the error. Study the dependencies. This will help you to better define the target architecture. To avoid Lava Flow, it is important to establish system-level software interfaces that are stable, well defined, and clearly documented. Investment up-front in quality software interfaces can produce big dividends in the long-run compared to the cost of jack-hammering away hardened globules of lava flow code. VersionControl tools such as the Source-Code Control System (SCCS) assist in configuration management. SCCS is bundled with most Unix environments and provides a basic capability to record histories of updates to configuration-controlled files. '''Related Solutions''' In today's competitive world, it is often desirable to minimize the time delay between R&D and production. In many industries this is critical to a company's survival. Where this is the case, inoculation against lava flow can sometimes be found in a customized CM process that puts certain limiting controls in place at the prototyping stage, sort of like "hooks" into a real production-class development without the full restraining impact on the experimental nature of R&D. Where possible, automation can play a big role here, but the key is in customizing a quasi CM process that can readily be scaled into a full-blown CM control system once the product moves into a production environment. The issue here is really one of balance between the costs of CM in hampering the "creative process" and the cost of rapidly gaining CM control of the development once that "creative process" has birthed something useful and marketable. This approach can be facilitated by periodic mapping of a prototyping system into an updated system architecture, including limited, but standardized in line documentation of the code. '''Applicability to Other AntiPatternsViewPoints and Scales''' The architectural viewpoint plays a key role in initially preventing lava flows. Managers can also play a role in early identification of lava flows or the circumstances that can lead to lava flows. These managers must also have the authority to put the brakes on when lava flow is first identified, and then taking corrective measures. As with most AntiPattern''''''s, prevention is always cheaper than correction, so up front investment in good architecture and team education can typically insure a project against this and most other AntiPattern''''''s. While this initial investment will not show concrete results, it is certainly a good investment. -- regards, SkipMcCormick ---- LavaFlow is commonly used now to refer to APIs that were incorrectly designed (sometimes atrociously so) but cannot be changed because too many things (and not all of them under the developers control) depend on them behaving exactly as they do.