At work we've been writing all our big documents in OpenOffice.org format (.SXW files), then pumping them through Apache Forrest to create the site. This has some nice features:
- Creates PDF and HTML pages from XML source documents
- Can produce a static site for documentation redistributions, and a dynamic site for iterative document development.
- Can insert link aliases across all documents
- Can detect broken links
However, I don't think the PDF pages are on a par with what OpenOffice can do itself, because OOo is very much WYSIWYG. So I spent an afternoon working out how to get Ant to generate the PDFs by way of OpenOffice.
There is some discussion about a macro to do this; I took those macros and turned them into ones I could call from Ant: one to convert a single file, another to convert an entire directory.
' Convert a single document; the window stays visible, status messages are suppressed.
Sub ConvertWordToPDF( cFile)
    InnerConvertToPDF(cFile,False,true)
end sub
Sub InnerConvertToPDF( cFile ,hidden,quiet)
    hiddenValue=MakePropertyValue( "Hidden", hidden )
    exportValue=MakePropertyValue( "FilterName", "writer_pdf_Export" )
    cURL = ConvertToURL( cFile )
    ' Open the document.
    ' Just blindly assume that the document is of a type that OOo will
    ' correctly recognize and open -- without specifying an import filter.
    oDoc = StarDesktop.loadComponentFromURL( cURL, "_blank", 0, Array(_
        hiddenValue ,_
        ) )
    ' Swap the four-character extension (such as ".sxw") for ".pdf".
    file2 = Left( cFile, Len( cFile ) - 4 ) + ".pdf"
    url2 = ConvertToURL( file2 )
    if not quiet then
        Print "["+cURL+"] => ["+url2+"]"
    end if
    ' Save the document using a filter.
    oDoc.storeToURL( url2, Array(exportValue,))
    oDoc.close( True )
    set oDoc=Nothing
End Sub
Function MakePropertyValue( Optional cName As String, Optional uValue ) As com.sun.star.beans.PropertyValue
    Dim oPropertyValue As New com.sun.star.beans.PropertyValue
    If Not IsMissing( cName ) Then
        oPropertyValue.Name = cName
    EndIf
    If Not IsMissing( uValue ) Then
        oPropertyValue.Value = uValue
    EndIf
    MakePropertyValue() = oPropertyValue
End Function
' Convert a bunch of SXW Documents.
Sub BulkConvert(cFolder)
    InnerConvert(cFolder,true)
end sub
Sub InnerConvert(cFolder,quiet)
    ' This is the hardcoded pathname to a folder containing sxw files.
    'cFolder = "/home/someone/temp"
    ' Get the pathname of each file within the folder.
    pattern=cFolder + "/*.*"
    cFile = Dir$( pattern )
    converted = 0
    Do While cFile <> ""
        ' If it is not a directory...
        If cFile <> "." And cFile <> ".." and LCase( Right( cFile, 4 ) ) = ".sxw" Then
            converted = converted + 1
            InnerConvertToPDF(cFolder+"/"+cFile,false,quiet)
        EndIf
        cFile = Dir$
    Loop
    if converted=0 then
        Print "Warning, no files matching "+pattern+" were found"
    end if
End Sub
These macros need to be added to your OOo configuration. Using the tool's macro organizer, I created a library called SmartFrog and a module called Utils, on the (shareable) VM we use for creating releases. The macros were all pasted into this module.
We then have some Ant declarations to call these macros from the build:
<target name="init-ooo">
  <property name="ooffice.exe" value="ooffice" />
  <presetdef name="ooo">
    <exec executable="${ooffice.exe}" failonerror="true" />
  </presetdef>
  <macrodef name="pdf">
    <attribute name="file" />
    <sequential>
      <ooo>
        <arg value="macro:///SmartFrog.Utils.ConvertWordToPDF(@{file})" />
      </ooo>
    </sequential>
  </macrodef>
  <macrodef name="bulkpdf">
    <attribute name="dir" />
    <sequential>
      <ooo>
        <arg value="macro:///SmartFrog.Utils.BulkConvert(@{dir})" />
      </ooo>
    </sequential>
  </macrodef>
</target>
This target defines a new task, <ooo>, that runs OpenOffice.org and fails the build if the program reports an error. Two macros build on this preset: <pdf> exports a single file to PDF, and <bulkpdf> converts every .sxw file in a directory (and complains if there were none).
To use them, we just point them at a file, or an entire folder:
<target name="pdf-doc-folder" depends="init, init-ooo"
    description="Generate PDFs from all .SXW files in the documentation folder">
  <bulkpdf dir="${doc.dir}" />
</target>
This will create a .pdf file for every source .sxw file, in the same directory as the source files. That is a bit naughty, and goes against the normal premise of separating source content from generated files. Why do this, when it would be trivial to add a destination directory parameter?
The reason we build the PDFs into the source documentation directory is that we don't want to force everyone who builds the product to recreate the PDFs on demand. Generating them has several drawbacks:
- It requires OpenOffice.org to be installed
- It requires the custom macro to be added to a new library
- It does not work on a Unix/Linux server unless X11 is running, the user/machine doing the build has its DISPLAY variable set up, and it has the rights to access the display.
- It is asynchronous; the macro triggers the creation, but does not block until it is finished.
- When it is working, GUI windows appear everywhere.
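Two of these conditions can be checked before the build even tries to launch the office suite. Here is a minimal sketch of such a preflight check in shell. The function name ooo_preflight and the OOFFICE_EXE environment variable are hypothetical (the variable simply mirrors the ooffice.exe Ant property), and it assumes a non-empty DISPLAY is a good enough hint that X11 is reachable. It does not check that the macro library is installed.

```shell
# Preflight sketch: report why an OpenOffice.org-driven PDF build cannot run here.
# OOFFICE_EXE is a hypothetical variable mirroring the ooffice.exe Ant property.
ooo_preflight() {
    exe="${OOFFICE_EXE:-ooffice}"
    status=0
    # Is the office executable on the PATH?
    if ! command -v "$exe" >/dev/null 2>&1; then
        echo "missing executable: $exe"
        status=1
    fi
    # Is there a display to open windows on? (Only checks that the variable is set.)
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set"
        status=1
    fi
    return $status
}
```

Calling ooo_preflight before the Ant target and stopping on a non-zero return would turn a confusing failure into a one-line explanation.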
Now, the macros do have a hidden option that could be turned on in the macro source to stop the process being visible, but when this was set, things didn't work. Also, we need to see when things have finished; it can take a while to do a 200+ page document.
As a result, we do it by hand, we do it in front of our eyes, and whoever is doing it has to leave that VM alone for ten minutes. After doing it, the files are checked back in.
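Because the macro call returns before the export is done, one way to tell when a PDF is finished is to poll for the output file. This is a rough sketch, not something we run: the function name wait_for_pdf is made up, and it assumes that a file whose size is unchanged between two checks one second apart has been fully written, which is a heuristic rather than a guarantee.

```shell
# wait_for_pdf FILE [TIMEOUT_SECONDS]
# Returns 0 once FILE exists, is non-empty, and has the same size on two
# consecutive checks one second apart; returns 1 if the timeout expires first.
# Assumption: a stable size means the exporter has finished writing.
wait_for_pdf() {
    file="$1"
    timeout="${2:-600}"
    elapsed=0
    last_size=-1
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ -s "$file" ]; then
            # Arithmetic expansion strips any padding that wc adds.
            size=$(($(wc -c < "$file")))
            if [ "$size" -eq "$last_size" ]; then
                return 0
            fi
            last_size=$size
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

For example, wait_for_pdf "$DOC_DIR/manual.pdf" 900 would give a large document fifteen minutes before giving up; DOC_DIR and manual.pdf are placeholders.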
This isn't perfect, but the PDFs it creates look nice. A ten minute stage to rebuild and check in the docs after locking down the documentation is not a complex part of the release process, not compared with what it takes to test that RPMs work properly. And the result is nice!
Long term, it would be nice for the OpenOffice development tools to give me a direct Ant task that does synchronous document generation, ideally with errors feeding back into the build file, running on a headless build engine. Being all open source, I have the right to do this, but then so does any reader who is feeling suitably motivated...
Update 29 July 2007
Olivier Pernet had something like this working on a server, using xvfb and the -headless option of OpenOffice.org:
xvfb-run -a openoffice -headless "macro:///FileConversion.Conversion.ConvertTextToPDF($1,$2)"