====== PDF Workflow ======
===== Undocumented PDF Common Workflow Issues =====
* split PDF into individual files (burst)
* convert to PostScript
* remove all comments/annotations
* remove or flatten forms
* PDF/X standard (no transparency, no bookmarks/links?, no annotations/forms?)
* document the exact procedure for creating valid PDF files with every company program (create a standard)
* flatten all comments/annotations
* print bookmarks list
===== Burst PDF pages into individual files =====
Bursting lets you work with each page individually. Comparing the resulting file sizes shows how large each page is, which you cannot see in Adobe Acrobat. You can then edit and compress the large pages and put everything back together in a single PDF file when you are finished.
* requires pdftk
* example:
pdftk input.pdf burst
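Once the pages are burst, sorting them by file size shows which ones are worth optimizing. The following is a minimal sketch, assuming pdftk's default burst output names (pg_0001.pdf, pg_0002.pdf, ...); rank_pages_by_size is a hypothetical helper name, not part of pdftk.

```shell
# Hypothetical helper: list burst pages in a directory, largest first.
# Assumes pdftk's default output names (pg_0001.pdf, pg_0002.pdf, ...).
rank_pages_by_size() {
    dir="${1:-.}"
    for f in "$dir"/pg_*.pdf; do
        [ -e "$f" ] || continue
        # wc -c gives the size in bytes; tr strips padding some systems add
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -rn
}

# Typical use (needs pdftk installed):
#   pdftk input.pdf burst
#   rank_pages_by_size . | head -n 5
# After optimizing the large pages, rejoin them in page order:
#   pdftk pg_*.pdf cat output rebuilt.pdf
```

Each output line is the size in bytes, a tab, and the file name, so the first few lines are the pages to look at first.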
===== Convert PDF to PS (PostScript) =====
* pdf2ps comes with Ghostscript
* this step can be the source of problems, so make sure to read the "Troubleshooting / Large Files" section below
* example:
pdf2ps input.pdf output.ps
or
pdf2ps -dLanguageLevel=1 input.pdf output.ps
===== Optimize individual PDF page =====
* these tools come with Ghostscript
* first burst the multi-page PDF file into single-page files
* options:
* convert the PDF file to PS, then use ps2pdf (a shell script that wraps gs) to write a new PDF file with forced image compression:
(force zip/flate image compression)
ps2pdf -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode input.ps output.pdf
or
(force jpeg compression)
ps2pdf -dAutoFilterColorImages=false -dColorImageFilter=/DCTEncode input.ps output.pdf
* use Ghostscript directly to write a new PDF file with forced compression for color and gray images (see "Image Compression with Ghostscript" below):
gs -dNOPAUSE -sDEVICE=pdfwrite -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode -dAutoFilterGrayImages=false -dGrayImageFilter=/FlateEncode -sOutputFile=output.pdf input.ps -c quit
===== Image Compression with Ghostscript =====
* http://casper.ghostscript.com/~ghostgum/pdftips.htm#image
* http://www.cs.wisc.edu/~ghost/doc/cvs/Ps2pdf.htm#note_2
* http://cosmocoffee.info/viewtopic.php?p=213
The Ghostscript defaults -dAutoFilterColorImages=true and -dAutoFilterGrayImages=true make Ghostscript decide for each image whether JPEG or Flate compression is more suitable. JPEG is good for photos. Flate is good for line drawings, cartoons and computer screenshots.
The compression can be forced to JPEG with
-dAutoFilterColorImages=false -dColorImageFilter=/DCTEncode
Other filters are /FlateEncode (zlib/gzip/pkzip) and /CCITTFaxEncode (ITU-T group 3 fax suitable for monochrome images).
To get smaller file sizes, enable image downsampling.
-dDownsampleColorImages=true -dColorImageDownsampleType=/Average
-dColorImageDownsampleThreshold=1.5 -dColorImageResolution=72
This says that if the image resolution is greater than 72*1.5=108dpi, it should be resampled to 72dpi by averaging the pixels. There are similar settings for Gray and Mono images.
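The threshold arithmetic is easy to check for other settings. The following is a small sketch; downsample_trigger_dpi is a hypothetical helper name, and the commented gs line is an illustration that uses the gray-image counterparts of the color options above.

```shell
# Hypothetical helper: print the resolution (dpi) above which an image is
# downsampled, i.e. target resolution times threshold.
downsample_trigger_dpi() {
    awk -v res="$1" -v thr="$2" 'BEGIN { printf "%g\n", res * thr }'
}

downsample_trigger_dpi 72 1.5     # prints 108
downsample_trigger_dpi 300 1.5    # prints 450

# The same settings for gray images (needs Ghostscript installed):
#   gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
#      -dDownsampleGrayImages=true -dGrayImageDownsampleType=/Average \
#      -dGrayImageDownsampleThreshold=1.5 -dGrayImageResolution=72 \
#      -sOutputFile=output.pdf input.pdf
```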
Using -dPDFSETTINGS=/screen will set color and gray image downsampling to 72dpi, -dPDFSETTINGS=/ebook will downsample to 150dpi, and -dPDFSETTINGS=/printer will downsample to 300dpi.
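A preset can be applied with a single gs call. The following is a sketch only: shrink_pdf and the DRY_RUN variable are hypothetical names introduced here, and the wrapper accepts just the three presets mentioned above.

```shell
# Hypothetical wrapper: rewrite a PDF with one of the presets named above.
# Set DRY_RUN=1 to print the gs command instead of running it.
shrink_pdf() {
    preset="$1"; src="$2"; dst="$3"
    case "$preset" in
        screen|ebook|printer) ;;
        *) echo "unknown preset: $preset" >&2; return 1 ;;
    esac
    # Build the command as positional parameters so file names with
    # spaces survive intact when the command is executed.
    set -- gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
        "-dPDFSETTINGS=/$preset" "-sOutputFile=$dst" "$src"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Typical use (needs Ghostscript installed):
#   shrink_pdf ebook input.pdf output.pdf
```

Running it with DRY_RUN=1 first is a cheap way to check the command line before processing a large file.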
===== Troubleshooting / Large Files =====
The PDF -> PS -> PDF conversion failed for me on one document in which text sat on top of an underlying image (the file was very large as a result). Everything worked once I opened the PDF file in Adobe Illustrator, saved it as a new PDF, and ran the conversion on that new file. The workarounds I have tested are:
* open the PDF file in Adobe Illustrator and save it as a PDF file, then pdf2ps, then ps2pdf
* open the PDF file in Apple Preview and save it as a PS file (via Print -> Save PDF as PostScript), then ps2pdf
* open the PDF file in Adobe Acrobat and save it as a PS file, then ps2pdf
Note that resaving the file in Acrobat 6 (even with optimization) did not produce a usable PDF file. The best indicator of this problem I can think of is an unusually large PostScript file compared with the others from the same document.
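The size comparison can be automated after converting each burst page to PostScript. The following is a rough sketch: flag_large_ps is a hypothetical helper, and the default factor of 5 times the median size is an arbitrary starting point, not a tested threshold.

```shell
# Hypothetical helper: print the .ps files in a directory whose size is more
# than FACTOR times the median size (FACTOR defaults to 5, chosen arbitrarily).
flag_large_ps() {
    dir="${1:-.}"; factor="${2:-5}"
    for f in "$dir"/*.ps; do
        [ -e "$f" ] || continue
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -n | awk -F '\t' -v k="$factor" '
        { size[NR] = $1 + 0; name[NR] = $2 }
        END {
            if (NR == 0) exit
            # input is sorted by size; use the lower middle entry as median
            median = size[int((NR + 1) / 2)]
            for (i = 1; i <= NR; i++)
                if (size[i] > k * median) print name[i]
        }'
}

# Typical use (needs pdftk and Ghostscript installed):
#   pdftk input.pdf burst
#   for f in pg_*.pdf; do pdf2ps "$f" "${f%.pdf}.ps"; done
#   flag_large_ps .
```

Pages it prints are candidates for one of the workarounds listed above.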
===== Export Images =====
You can export all images from a PDF file using the Advanced -> Export All Images... function in Adobe Acrobat 6.