For a project I am doing right now, I descended into DocBook hell. Not completely unscathed, I made it through the learning curve (why don’t they call it what it is: The Unfathomable and Horrific Tunnel of Learning) and blinked slowly in the light of day.
I realized DocBook is nice, but it’s not actually what I wanted. Doh.
What I wanted was a structured way to represent my data, so that two things can happen to it. Today, I want to publish my data with DocBook. Tomorrow, I want someone to be able to suck the brains out of my DocBook document (leaving it to wander the earth as a zombie) and put them into a wiki, so that my project can become a community-maintained database instead of a single DocBook-formatted document living in my Subversion repository.
The way to do that is to step back from DocBook as the primary source and instead generate DocBook from my data. I organize my original data into XML, use XSLT to transform it from my own data structure into DocBook, and then let the DocBook stylesheets turn that into XSL-FO or HTML, which become the readable documents.
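To make the shape of that first step concrete, here is a minimal sketch in Python rather than XSLT (a swapped-in stand-in, purely so it runs with no extra tools). The `catalog`/`entry`/`name`/`note` elements are invented for illustration and are not my real data format, and the output is DocBook-flavored without the DocBook 5 namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical source data; these element names are made up for the example.
SOURCE = """\
<catalog title="Widgets">
  <entry id="w1"><name>Sprocket</name><note>Turns things.</note></entry>
  <entry id="w2"><name>Gasket</name><note>Seals things.</note></entry>
</catalog>
"""


def to_docbook(source_xml: str) -> ET.Element:
    """Map the custom data structure onto a DocBook-style article."""
    catalog = ET.fromstring(source_xml)
    article = ET.Element("article")
    ET.SubElement(article, "title").text = catalog.get("title", "")
    # One DocBook section per data entry: name becomes title, note becomes para.
    for entry in catalog.findall("entry"):
        section = ET.SubElement(article, "section", id=entry.get("id", ""))
        ET.SubElement(section, "title").text = entry.findtext("name", default="")
        ET.SubElement(section, "para").text = entry.findtext("note", default="")
    return article


if __name__ == "__main__":
    print(ET.tostring(to_docbook(SOURCE), encoding="unicode"))
```

The point is only that the data stays in its own structure and DocBook is one generated view of it. A wiki exporter would be a second function reading the same `catalog`.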
To other people contemplating the same idea, I’d say “go for it”. But beware: the startup cost is huge. The results are nice. Here are the things you’ll need to learn:
- How to use your XML editor (Emacs in nXML-mode for me). You cannot just limp along with Notepad or vi. Won’t work, don’t try it.
- How to write a schema for your new data structure. There are like 12 schema formats to choose from, but I chose RELAX NG Compact Syntax because nXML-mode prefers it. Don’t use DTDs or else your brain will melt. SGML = bad. Everything post-SGML = slightly less bad. Farther from SGML is better (thus RELAX NG Compact = best).
- How to load your schema into your editor. (Tip: C-c C-s C-f for nXML)
- XSLT (which is the ugliest, stupidest, most verbose language ever foisted by computer science on users)
- How to use XInclude to build up XML documents from pieces. Don’t skimp on this. Figure it out and use it, because the alternative is the supreme ugliness that is SGML external entities. Remember: SGML = bad, XML = slightly less bad.
- How to make your XSLT processor work, and which of the 12 to choose. Go with xsltproc: it’s super fast and stable, and doesn’t care which exact point release of Java you have. (Remember: C good. Java bad. Write once, test everywhere…)
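For the XInclude item in the list above, here is a rough sketch of what "building a document from pieces" means, using Python's standard `xml.etree.ElementInclude` module. The file names and contents are hypothetical, and they live in a dictionary instead of on disk so the example is self-contained.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical document pieces, kept in memory instead of in separate files.
PIECES = {
    "chapter1.xml": "<chapter><title>One</title></chapter>",
    "chapter2.xml": "<chapter><title>Two</title></chapter>",
}

MASTER = """\
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Assembled</title>
  <xi:include href="chapter1.xml"/>
  <xi:include href="chapter2.xml"/>
</book>
"""


def load_piece(href, parse, encoding=None):
    """Loader that ElementInclude calls once per xi:include element."""
    if parse == "xml":
        return ET.fromstring(PIECES[href])
    return PIECES[href]  # parse="text": include the raw string as-is


def assemble(master_xml: str) -> ET.Element:
    """Replace every xi:include in the master document with its piece."""
    root = ET.fromstring(master_xml)
    ElementInclude.include(root, loader=load_piece)
    return root


if __name__ == "__main__":
    print(ET.tostring(assemble(MASTER), encoding="unicode"))
```

With real files you would skip the custom loader entirely. In an xsltproc pipeline, the `--xinclude` option does the same expansion before the stylesheet runs.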
Don’t try to do it without a schema. You need to be 100% sure your data is in the right format before you go too far hacking on the XSLT, or else you’ll get confused and sad. Just suck it up and get the schema right, so that your editor can whack you with a clue-by-four before XSLT starts wasting your time going off into tag soup never-never land.