What is the recommended toolchain for formatting XML DocBook? [closed]

Submitted by 安稳与你 on 2019-11-28 03:24:15
Gustavo Carreno

I've been doing some manual writing with DocBook, under cygwin, to produce One Page HTML, Many Pages HTML, CHM and PDF.

I installed the following:

  1. The docbook stylesheets (xsl) repository.
  2. xmllint, to test if the xml is correct.
  3. xsltproc, to process the xml with the stylesheets.
  4. Apache's fop, to produce PDFs. I make sure to add the installed folder to the PATH.
  5. Microsoft's HTML Help Workshop, to produce CHMs. I make sure to add the installed folder to the PATH.
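
Before running any build, it can help to confirm that the tools in the list above are actually reachable on the PATH. A minimal sketch, assuming a POSIX shell; the function name missing_tools is made up for this example, and fop.sh and hhc are the command names that the configure.in further down uses as defaults:

```shell
#!/bin/sh
# Print the name of every tool that cannot be found on the PATH.
# missing_tools is a hypothetical helper, not part of any DocBook package.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the name does not resolve to anything
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: report which parts of the toolchain still need installing.
# missing_tools xmllint xsltproc fop.sh hhc
```

An empty result means every tool was found; anything printed still needs installing or adding to the PATH.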

Edit: The code below relies on more files than the two shown. If someone wants a cleaned-up version of the scripts and the folder structure, please contact me: guscarreno (squiggly/at) googlemail (period/dot) com

I then use a configure.in:

AC_INIT(Makefile.in)

FOP=fop.sh
HHC=hhc
XSLTPROC=xsltproc

AC_ARG_WITH(fop, [  --with-fop  Where to find Apache FOP],
[
    if test "x$withval" != "xno"; then
        FOP="$withval"
    fi
]
)
AC_PATH_PROG(FOP,  $FOP)

AC_ARG_WITH(hhc, [  --with-hhc  Where to find Microsoft Help Compiler],
[
    if test "x$withval" != "xno"; then
        HHC="$withval"
    fi
]
)
AC_PATH_PROG(HHC,  $HHC)

AC_ARG_WITH(xsltproc, [  --with-xsltproc  Where to find xsltproc],
[
    if test "x$withval" != "xno"; then
        XSLTPROC="$withval"
    fi
]
)
AC_PATH_PROG(XSLTPROC,  $XSLTPROC)

AC_SUBST(FOP)
AC_SUBST(HHC)
AC_SUBST(XSLTPROC)

HERE=`pwd`
AC_SUBST(HERE)
AC_OUTPUT(Makefile)

cat > config.nice <<EOT
#!/bin/sh
./configure \
    --with-fop='$FOP' \
    --with-hhc='$HHC' \
    --with-xsltproc='$XSLTPROC' \

EOT
chmod +x config.nice

and a Makefile.in:

FOP=@FOP@
HHC=@HHC@
XSLTPROC=@XSLTPROC@
HERE=@HERE@

# Subdirs that contain docs
DOCS=appendixes chapters reference 

XML_CATALOG_FILES=./build/docbook-xsl-1.71.0/catalog.xml
export XML_CATALOG_FILES

all:    entities.ent manual.xml html

clean:
	@echo -e "\n=== Cleaning\n"
	@-rm -f html/*.html html/HTML.manifest pdf/* chm/*.html chm/*.hhp chm/*.hhc chm/*.chm entities.ent .ent
	@echo -e "Done.\n"

dist-clean:
	@echo -e "\n=== Restoring defaults\n"
	@-rm -rf .ent autom4te.cache config.* configure Makefile html/*.html html/HTML.manifest pdf/* chm/*.html chm/*.hhp chm/*.hhc chm/*.chm build/docbook-xsl-1.71.0
	@echo -e "Done.\n"

entities.ent: ./build/mkentities.sh $(DOCS)
	@echo -e "\n=== Creating entities\n"
	@./build/mkentities.sh $(DOCS) > .ent
	@if [ ! -f entities.ent ] || ! cmp -s entities.ent .ent; then mv .ent entities.ent ; fi
	@echo -e "Done.\n"

# Build the docs in chm format

chm:    chm/htmlhelp.hpp
	@echo -e "\n=== Creating CHM\n"
	@echo logo.png >> chm/htmlhelp.hhp
	@echo arrow.gif >> chm/htmlhelp.hhp
	@-cd chm && "$(HHC)" htmlhelp.hhp
	@echo -e "Done.\n"

chm/htmlhelp.hpp: entities.ent build/docbook-xsl manual.xml build/chm.xsl
	@echo -e "\n=== Creating input for CHM\n"
	@"$(XSLTPROC)" --output ./chm/index.html ./build/chm.xsl manual.xml

# Build the docs in HTML format

html: html/index.html

html/index.html: entities.ent build/docbook-xsl manual.xml build/html.xsl
	@echo -e "\n=== Creating HTML\n"
	@"$(XSLTPROC)" --output ./html/index.html ./build/html.xsl manual.xml
	@echo -e "Done.\n"

# Build the docs in PDF format

pdf:    pdf/manual.fo
	@echo -e "\n=== Creating PDF\n"
	@"$(FOP)" ./pdf/manual.fo ./pdf/manual.pdf
	@echo -e "Done.\n"

pdf/manual.fo: entities.ent build/docbook-xsl manual.xml build/pdf.xsl
	@echo -e "\n=== Creating input for PDF\n"
	@"$(XSLTPROC)" --output ./pdf/manual.fo ./build/pdf.xsl manual.xml

check: manual.xml
	@echo -e "\n=== Checking correctness of manual\n"
	@xmllint --valid --noout --postvalid manual.xml
	@echo -e "Done.\n"

# need to touch the dir because the timestamp in the tarball
# is older than that of the tarball :)
build/docbook-xsl: build/docbook-xsl-1.71.0.tar.gz
	@echo -e "\n=== Un-taring docbook-xsl\n"
	@cd build && tar xzf docbook-xsl-1.71.0.tar.gz && touch docbook-xsl-1.71.0

to automate the production of the above mentioned file outputs.
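
The entities.ent rule in the Makefile.in is meant to write to a scratch file (.ent) and replace the real file only when the content actually changed, so that make does not rebuild every output on each run. As a standalone sketch of that idiom (the function name replace_if_changed is an assumption for illustration, not part of the original scripts; unlike the Makefile rule, it also deletes the scratch file when nothing changed):

```shell
#!/bin/sh
# replace_if_changed NEW TARGET
# Move NEW over TARGET only when TARGET is missing or differs from NEW;
# otherwise discard NEW so that TARGET keeps its old timestamp.
replace_if_changed() {
    if [ ! -f "$2" ] || ! cmp -s "$1" "$2"; then
        mv "$1" "$2"
    else
        rm -f "$1"
    fi
}

# Example, mirroring the Makefile rule:
# ./build/mkentities.sh appendixes chapters reference > .ent
# replace_if_changed .ent entities.ent
```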

I prefer a *nix approach to the scripting because the toolset is easier to find and use, not to mention easier to chain.
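
Chained together, a full run with the two files above might look like the sketch below. The wrapper name build_manual, the run helper and its DRY_RUN switch are assumptions added for illustration; the target names come from the Makefile.in, and the order (default target first, so entities.ent exists before validation) is a guess at a sensible sequence rather than something the scripts enforce:

```shell
#!/bin/sh
# Echo commands instead of running them when DRY_RUN=1, so the sequence can
# be previewed on a machine that does not have the toolchain installed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# build_manual [configure options...]
# Hypothetical wrapper around the configure.in / Makefile.in pair.
build_manual() {
    run autoconf || return 1            # generate ./configure from configure.in
    run ./configure "$@" || return 1    # e.g. --with-fop=/opt/fop/fop.sh (example path)
    run make || return 1                # default target: entities.ent and HTML
    run make check || return 1          # validate manual.xml with xmllint
    run make pdf                        # CHM is a separate target: make chm
}
```

Setting DRY_RUN=1 before calling build_manual prints the five commands without executing them.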

We use XMLmind XML Editor for editing and Maven's docbkx plugin to create output during our builds. For a set of good templates, take a look at the ones Hibernate or Spring provide.

For HTML output, I use the DocBook XSL stylesheets with the XSLT processor xsltproc.

For PDF output, I use dblatex, which translates the document to LaTeX and then uses pdflatex to compile it to PDF. (I used Jade, the DSSSL stylesheets and jadetex before.)
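
As a rough sketch, the two steps just described could be wrapped like this. The function names and the DOCBOOK_XSL variable are made up for the example, and the default stylesheet path is the one used by the Debian/Ubuntu docbook-xsl package, which is only an assumption about the install:

```shell
#!/bin/sh
# Location of the DocBook XSL stylesheets. The default is the Debian/Ubuntu
# package path and is only an assumption; override it for other installs.
DOCBOOK_XSL=${DOCBOOK_XSL:-/usr/share/xml/docbook/stylesheet/docbook-xsl}

# build_html SOURCE.xml OUTPUT.html : single-page HTML via xsltproc
build_html() {
    xsltproc --output "$2" "$DOCBOOK_XSL/html/docbook.xsl" "$1"
}

# build_pdf SOURCE.xml OUTPUT.pdf : dblatex generates the LaTeX and runs
# the LaTeX compiler itself, so one call is enough
build_pdf() {
    dblatex -o "$2" "$1"
}

# Example:
# build_html manual.xml manual.html
# build_pdf  manual.xml manual.pdf
```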

Verhagen

We use

  • Serna XML Editor
  • Eclipse (plain xml editing, mostly used by the technical people)
  • own specific Eclipse plug-in (just for our release-notes)
  • Maven docbkx plug-in
  • Maven jar with specific corporate style sheet, based on the standard docbook style-sheets
  • Maven plug-in for converting csv to DocBook table
  • Maven plug-in for extracting BugZilla data and creating a DocBook section from it
  • Hudson (to generate the PDF document(s))
  • Nexus to deploy the created PDF documents
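
The CSV-to-DocBook-table step in the list above is handled by a Maven plug-in in this setup. Purely to illustrate the idea, here is a small shell/awk sketch of the same kind of conversion. It is not that plug-in: the function name is hypothetical, and it assumes simple CSV with no quoted fields or embedded commas:

```shell
#!/bin/sh
# csv_to_docbook_table FILE.csv
# Emit a DocBook <informaltable> with one <row> per CSV line.
# Assumes simple CSV: comma-separated, no quoted fields or embedded commas.
csv_to_docbook_table() {
    awk -F',' '
        function esc(s) {
            # escape XML special characters, ampersand first
            gsub(/&/, "\\&amp;", s)
            gsub(/</, "\\&lt;", s)
            gsub(/>/, "\\&gt;", s)
            return s
        }
        NR == 1 {
            # the column count is taken from the first line
            printf "<informaltable>\n  <tgroup cols=\"%d\">\n    <tbody>\n", NF
        }
        {
            printf "      <row>"
            for (i = 1; i <= NF; i++) printf "<entry>%s</entry>", esc($i)
            printf "</row>\n"
        }
        END {
            if (NR > 0) print "    </tbody>\n  </tgroup>\n</informaltable>"
        }
    ' "$1"
}

# Example:
# csv_to_docbook_table releases.csv > releases-table.xml
```

The first CSV line is treated as an ordinary row here; a header row would need a separate thead element.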

Some ideas we have:

Deploy with each product version not only the PDF but also the complete original DocBook document (we partly write the documents and partly generate them). Saving the full DocBook document makes it independent of future changes in the system setup: if the systems from which content was extracted change, or are replaced by different systems, we would no longer be able to regenerate exactly the same content. That could become a problem if we ever needed to re-release the whole product range of manuals, for example with a different style sheet. It is the same as with the jars: compiled Java classes are placed in Nexus rather than in the SCM, and we would do the same with the generated DocBook document.

Update:

I have just created a Maven HTML Cleaner Plug-in, which makes it possible to add DocBook content to a Maven project site (beta version available). Feedback is welcome through the Open Discussion Forum.

The DocBook stylesheets, plus FOP, work well, but I finally decided to spring for RenderX, which covers the standard more thoroughly and has some nice extensions that the DocBook stylesheets take advantage of.

Bob Stayton's book, DocBook XSL: The Complete Guide, describes several alternate tool chains, including ones that work on Linux or Windows (almost surely MacOS, too, though I have not personally used a Mac).

A popular approach is to use DocBook XSL Stylesheets.

Regarding the question about Apache's FOP: when we established our toolchain (similar to what Gustavo has suggested) we had very good results using the RenderX XEP engine. XEP's output looks a little more polished, and as far as I recall, FOP had some problems with tables (this was a few years ago, though, so it might have changed).

With FOP you get the features that someone decided they wanted badly enough to implement. I'd say that no one who's serious about publishing uses it in production. You're far better off with RenderX or Antenna House or Arbortext. (I've used them all over the last decade's worth of implementation projects.) It also depends on your business requirements, how much you want to automate, and what your team's skills, time, and resources are like. It's not just a technology question.

uman

If you're on Red Hat, Ubuntu, or Windows, you could take a look at Publican, which is supposed to be a fairly complete command line toolchain. Red Hat uses it extensively.

The article called The DocBook toolchain might be useful as well. It is a section of a HOWTO on DocBook written by Eric Raymond.

I've been using two CLI utils for simplifying my DocBook toolchain: xmlto and publican.

Publican looks elegant to me, but it is fitted fairly closely to the publication needs of Fedora and Red Hat.
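
For xmlto, a typical invocation is the output format followed by the source file, with -o naming the output directory. A small sketch (the function name xmlto_build is made up; the pdf format also needs a working FO or TeX backend on the system, which is assumed here):

```shell
#!/bin/sh
# xmlto_build SOURCE.xml OUTDIR : write chunked HTML and a PDF into OUTDIR
xmlto_build() {
    xmlto -o "$2" html "$1" || return 1
    xmlto -o "$2" pdf "$1"
}

# Example:
# xmlto_build manual.xml out
```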

I have released, and am still working on, an open-source project called bookshop, a RubyGem that installs a complete DocBook-XSL pipeline/toolchain. It includes everything needed to create and edit DocBook source files and output different formats (currently PDF and EPUB, with more being added quickly).

My goal is to make it possible to go from zero to exporting (PDFs or whatever) from your DocBook source in under 10 minutes.

The Summary:

bookShop is an OSS ruby-based framework for docbook toolchain happiness and sustainable productivity. The framework is optimized to help developers quickly ramp-up, allowing them to more rapidly jump in and develop their DocBook-to-Output flows, by favoring convention over configuration, setting them up with best practices, standards and tools from the get-go.

Here's the gem location: https://rubygems.org/gems/bookshop

And the source code: https://github.com/blueheadpublishing/bookshop

I prefer using Windows for most of my content creation (Notepad++ editor). Publican on Linux is a good toolchain for creating a sound documentation structure and processing outputs. I use Dropbox on my Windows machine as well as on a Linux virtual machine (other file-sharing services that work on both platforms should do as well). This setup gives me a combination that works great for me. Once editing is finished in Windows (which immediately syncs to the Linux machine), I switch to Linux to run publican build and create the HTML and PDF outputs, which Dropbox then syncs back to my Windows folder.
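
The Linux half of that workflow boils down to one command run inside the synced book directory. A minimal sketch, assuming the directory was set up with publican create and that en-US is the source language; the function name and the example path are made up:

```shell
#!/bin/sh
# publican_build BOOKDIR : build HTML and PDF for the en-US source language.
# The subshell keeps the caller's working directory unchanged.
publican_build() {
    ( cd "$1" && publican build --formats html,pdf --langs en-US )
}

# Example, with a hypothetical Dropbox path:
# publican_build "$HOME/Dropbox/docs/My_Book"
```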
