Apache FOP and document properties

I am a little bit obsessive about checking out the document properties in PDF files I read. I can’t explain why, but there you have it.

I was sad when I noticed that the PDF file generated from my DocBook XML had no document properties. So I figured, no problem, I can just go find the right FO tags, grep for them in the DocBook XSL and reverse engineer what I need to put into my DocBook to get them set right. I figured it would be something obscure, like <author> or something. But DocBook already knows who the author is, since it uses that to format the title page nicely, and the property still isn't set, so no dice there.

What I found is this page on the Apache FOP FAQ that explains that FOP can’t do it. WTF? This can’t be that hard, and it really seems like a natural thing that would make it into version 1.0. (Of course, FOP is at version 0.94, which might explain something as well.) In their defense, it seems like this is also braindeadness in the FO spec, because I found a commercial implementation of FO that says you have to resort to special extension namespaces to specify metadata in their implementation. But Apache FOP already has the “fox” extension namespace for this kind of thing, so adding it there should be no big deal.

The solution is to write your own Java program (real user-friendly there, guys) that uses a nifty PDF library called iText to add the metadata in a post-processing step.

In case someone else needs it, here’s what I came up with:

/* based on example here:
   http://itextdocs.lowagie.com/tutorial/general/copystamp/index.php
*/

import java.io.FileOutputStream;
import java.util.HashMap;

import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfStamper;

public class Stamp {
    public static void main(String[] args) {
        try {
            if (args.length != 6) {
                System.out.println("Arguments expected:");
                System.out.println("  pdf-in pdf-out title subject author keywords");
            } else {
                // we create a reader for a certain document
                PdfReader reader = new PdfReader(args[0]);

                // we create a stamper that will copy the document to a new file
                PdfStamper stamp =
                    new PdfStamper(reader,
                                   new FileOutputStream(args[1]));

                // adding the metadata
                HashMap<String, String> moreInfo = new HashMap<String, String>();
                moreInfo.put("Title", args[2]);
                moreInfo.put("Subject", args[3]);
                moreInfo.put("Author", args[4]);
                moreInfo.put("Keywords", args[5]);
                stamp.setMoreInfo(moreInfo);

                // closing PdfStamper will generate the new PDF file
                stamp.close();
            }
        }
        catch (Exception e) {
            // report the failure and exit non-zero so calling scripts notice
            e.printStackTrace();
            System.exit(1);
        }
    }
}
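
If you call this from a build script, you may not have a value for every field. Here is a minimal sketch (not part of the program above) of pulling the argument handling into its own method that skips empty values, so blank strings don't get written as properties. The class name `InfoArgs` and the method `toInfoMap` are made up for illustration; the keys are the same four used above, and it returns a `HashMap` only because that is what the program above hands to `setMoreInfo`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;

// Hypothetical helper, not part of iText or of the original program.
final class InfoArgs {
    // Same order as on the command line: title subject author keywords.
    private static final String[] KEYS = {"Title", "Subject", "Author", "Keywords"};

    // Expects the same six arguments as Stamp:
    //   pdf-in pdf-out title subject author keywords
    static HashMap<String, String> toInfoMap(String[] args) {
        if (args.length != 6) {
            throw new IllegalArgumentException(
                "expected: pdf-in pdf-out title subject author keywords");
        }
        // LinkedHashMap is a HashMap subclass; it just keeps insertion order.
        HashMap<String, String> info = new LinkedHashMap<String, String>();
        for (int i = 0; i < KEYS.length; i++) {
            String value = args[i + 2].trim();
            // Skip fields that were passed as "" (or only whitespace).
            if (!value.isEmpty()) {
                info.put(KEYS[i], value);
            }
        }
        return info;
    }

    public static void main(String[] args) {
        System.out.println(toInfoMap(args));
    }
}
```

With something like this in place, the four `put` calls in `Stamp` should reduce to `stamp.setMoreInfo(InfoArgs.toInfoMap(args));`.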

Comments

One response to “Apache FOP and document properties”

  1. Just an update on this – it seems FOP 1.1 supports metadata: https://xmlgraphics.apache.org/fop/1.1/metadata.html
