random-state.net

Nikodemus Siivola

<< next | top | previous >>

August 19th 2004 #
random, August 19th 2004

I am easily distracted. Instead of focusing on finishing one project I tend to have too many things on my hands at any given moment. Right now, counting only the SBCL related ones: linkage-table, STEP, docstring to texinfo converter enchancements, and plenty of documentation. (No, linkage-table and STEP won't make it in this month, but should do so for earlish 0.8.14.)

As if that was not enough I started looking at how to do XREF (LispM-style code cross-referencing) while drinking my morning coffee... Though I must say I was pleasantly surprised at the apparent simplicity of all the three main approaches:

  • File based static analysis, as done by the Mark Kantrowitz's now classic (portable) XREF package. Simply put, source files are read in and code is walked to produce the XREF data. Good: (1) simple and maintainable, (2) you pay for the luxury of XREF only when you want it, (3) can analyze code without loading it. Bad: (1) to get accurate WHO-CALLS for a symbol you need to process all the files for in the system, (2) cannot be distributed as a part of SBCL, since the licence is too restrictive — making it asdf-installable would be possible though, and this not a problem related to the technique itself.
  • Dynamic heap analysis, as is actually already done by SBCL's SB-INTROSPECT contrib module (based on similar work done for CMUCL by Helmut Eller, I believe). To list callers of a function traverse the heap, and see which functions contain references to the one we're interested in. Good: (1) actually dead easy, (2) pay as you go, (3) fairly accurate though I'm not sure if the current version picks up funcalls of constant symbols, and currently it certainly doesn't do WHO-REFERENCES — but I believe fixing that should be possible. Bad: (1) doesn't and cannot provide WHO-MACROEXPANDS, (2) doesn't and cannot provide XREF info for inlined functions.
  • Compile-time analysis, as done for CMUCL by Eric Marsden: XREF information is recorded by the compiler during the normal course of compilation. Good: (1) accurate, (2) provides the full shebang (WHO-CALLS, WHO-MACROEXPANDS, WHO-REFERENCES, etc.) Bad: (1) in order for the XREF information to be accurate you need to record it all the time — and this apparently has a rather severe performance hit on the compiler (not on the resulting code, just compilation time).

I'm thinking about a combination of the latter two: record macroexpansions and inline functions at compile-time (the theory is that this would have a neglible performance penalty), and get all the rest dynamically from the heap. This should give the best of both worlds: macroexpansions and inline functions can be seen in XREF, yet code compiled without recording can still be XREF'd in a limited fashion. This could potentially be further coupled with the static analysis approach by making the Kantrowitz's XREF package asdf-installable and allowing it to push the analysis results into SBCL's own database — but for me at least that's not too interesting.

So, once XREF exists, what can you do with it? For example:

  • Ask questions about the code to understand it better, including generation of fancy graphs.
  • Automate rebuilding of dependant parts when macros or inline functions are redefined. This is what I personally am most hungry for.
  • Automate DEFSYSTEM generation based on dependencies between code in image. (Though this requires something more then afaict provided by the implementations mentioned above: XREF information on types.)

So many hacks, so little time...