PDF File Information
Content
A PDF file is often a combination of vector graphics, text, and raster graphics. The basic types of content in a PDF are:
- text stored as such
- vector graphics for illustrations and designs that consist of shapes and lines
- raster graphics for photographs and other types of image
In later PDF revisions, a PDF document can also support links (within the document or to a web page), forms, JavaScript (initially available as a plug-in for Acrobat 3.0), or any other type of embedded content that can be handled using plug-ins.
PDF 1.6 supports interactive 3D documents embedded in the PDF.
Two PDF files that look similar on a computer screen may be of very different sizes. For example, a high-resolution raster image takes more space than a low-resolution one. Typically, a higher resolution is needed for printing documents than for displaying them on screen. Other things that may increase the size of a file are embedding full fonts, especially for Asian scripts, and storing text as graphics.
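The effect of resolution on size can be estimated with simple arithmetic. The sketch below computes the uncompressed size of a raster image covering the same printed area at two resolutions. The 72 dpi and 300 dpi figures, the 8 x 10 inch area, and the function name are illustrative assumptions, and real PDF files usually compress image data, so actual sizes will be smaller.

```python
def raw_image_bytes(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Uncompressed size in bytes of a raster image covering the given area."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed example values: an 8 x 10 inch RGB photograph.
screen = raw_image_bytes(8, 10, 72)    # resolution adequate for on-screen display
press = raw_image_bytes(8, 10, 300)    # resolution commonly requested for print

print(f"screen: {screen:,} bytes")
print(f"print:  {press:,} bytes")
print(f"ratio:  {press / screen:.1f}x")
```

Because the pixel count grows with the square of the resolution, the print version needs roughly 17 times as much raw data as the screen version, even though both look alike on a monitor.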
File Versions
PDF Version | Introduced | Acrobat Version | New Features |
---|---|---|---|
1.0 | 1992 | 1.x | internal links, bookmarks, embedded fonts, RGB color only |
1.1 | 1994 | 2.x | external links, article threads, security features, device-independent color, notes; Acrobat 2.1 added support for multimedia (audio/video) in PDF files |
1.2 | 1996 | 3.x | OPI 1.3 specs, CMYK color space, spot colors could be maintained, halftone functions could be included as well as overprint instructions |
1.2 (PDF/X-1) | 1998 | none | can embed extra data like copydot files, ICC based colors, definitions of bleed, trim, and art-box, and a key that documents whether the file has already been trapped (more info in the subsets section below) |
1.3 | 1999 | 4.x | 2-byte CID fonts, OPI 2.0 specs, new color space (DeviceN) to improve spot colors, smooth shading (blends one color or tint to another), annotations |
1.4 partial (technote 5407) | 2000 | Illustrator 9 | transparency |
1.4 | 2001 | 5.x | transparency, improved security - including 128-bit encryption and the option of setting the quality of printing (you can define that a PDF can be printed but only in low resolution), improved support for JavaScript (including JavaScript 1.5 and better integration with databases), “Tagged PDFs” (more info in the Tagged PDF section below); also Acrobat 5 replaced the “Paper Capture” OCR plug-in with a fairly limited web-based service |
1.5 | 2003 | 6.x | improved compression techniques including object streams, JPEG2000 compression and allowing 16-bit images, additional encryption options, selectively hide content, ability to allow the display of a PDF file as a slideshow, additional annotation types, transition actions, support for layers, improved support for tagged PDF |
1.6 | 2004 | 7.x | AES encryption, selectively encrypt embedded files, specify size unit of user space enabling larger maximum page size, enhanced DeviceN color spaces, embedding OpenType fonts, enhanced markup annotations, ability to specify non-rectangular regions for link annotations, additional support for embedded file attachments, ability to specify relationships between dimensions of objects on the page and their real world counterparts, ability to incorporate 3-dimensional models using the U3D format, Tagged PDF enhancements |
1.7 | 2006 | 8.x | presentation enhancements for 3D artwork, additions to interactive features for markup and review, and the ability to specify printing characteristics intended for a legal setting, accessibility related features, document navigation additions to handle “Portable Collections”, security related features, general features for improving cross-platform stability |
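A PDF file normally declares its version in a header comment of the form `%PDF-1.4` at the start of the file. The sketch below reads that header and looks up the matching Acrobat release from the table above. The function name, the sample bytes, and the choice to search only the first 1024 bytes are assumptions made for illustration. From PDF 1.4 onward a `/Version` entry in the document catalog may override the header, which this sketch does not handle.

```python
import re

# Mapping taken from the version table above (PDF version -> Acrobat release).
ACROBAT_BY_PDF_VERSION = {
    "1.0": "1.x", "1.1": "2.x", "1.2": "3.x", "1.3": "4.x",
    "1.4": "5.x", "1.5": "6.x", "1.6": "7.x", "1.7": "8.x",
}

HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(data: bytes):
    """Return the version string from the %PDF-x.y header, or None if absent."""
    match = HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


# Hypothetical start of a PDF file, used here instead of reading from disk.
sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

version = pdf_header_version(sample)
print(version, "-> Acrobat", ACROBAT_BY_PDF_VERSION.get(version, "unknown"))
```

To check a real file, the same function can be called on the first bytes read with `open(path, "rb").read(1024)`.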
Subsets
Proper subsets of PDF have been, or are being, standardized under ISO for several constituencies:
- PDF/X for the printing and graphic arts as ISO 15930 (working in ISO TC130)
- PDF/A for archiving in corporate/government/library/etc environments as ISO 19005 (work done in ISO TC171)
- PDF/E for exchange of engineering drawings (work done in ISO TC171)
- PDF/UA for universally accessible PDF files
- A PDF/H variant (PDF for Healthcare) is being developed.[4] However, it may consist more of a set of “best practices” than of a specific format or subset.
PDF/X-1 History
Although vendors pushed hard to get PDF off the ground, the market was a bit slow to react. This was mainly because using PDF required additional tools as well as some know-how about the file format, its limitations and its curiosities. People also became disappointed with PDF when they discovered that it is a very open standard. Although the PDF standard was usable in a prepress environment, there were simply too many ways in which a perfectly valid but unusable PDF file could be created.
To solve this issue, a consortium of prepress companies got together and released the PDF/X-1 standard in 1998. PDF/X-1 is based on the PDF 1.2 file specifications, but it is a very well-defined description of what a PDF file should look like to allow for blind transfers. A PDF/X-1 file is a file in which you can be sure that all fonts are included, all high-resolution images are embedded, and so on.
The first version of PDF/X was called PDF/X-1. It is available in two flavours: PDF/X-1 1999 and PDF/X-1a 2001.
PDF/X-1 1999
As its name suggests, this first version of PDF/X-1 was finalised in 1999. It was approved as an American national standard in that year and is aimed primarily at ad delivery for publications and newsprint in an environment where CMYK data is preferred. PDF/X-1 1999 is based on the PDF 1.2 specifications. This was the original file format created by Acrobat Distiller 3. It also dictated the inclusion of a specific ICC profile. This particular ICC profile has caused colour shifts in some cases and, consequently, users have stripped out that profile to get the right output.
PDF/X-1a 2001
The new PDF/X-1a 2001 specification handles ICC profiling in a different way. An ICC profile is still involved, which is reasonable because it describes the printing process for which the PDF file was created. The profile is embedded in the PDF/X-1a 2001 file, but in a different, less 'risky' place called OutputIntents, where it is not automatically used by any standard piece of software or system: neither Acrobat, nor any PostScript RIP, nor any of the PDF workflow systems. Because the file contains only device colour (CMYK and spot colours), you can send it straight through any workflow, system or device and get what you would expect. PDF/X-1a 2001 files are based on PDF 1.3 (Acrobat 4 files) and support both the DeviceN colour space (meaning that spot colours are better supported, even in duotones) and smooth shading (improved quality of blends). PDF/X-1a 2001 is standardised as ISO 15930-1 and was released in the summer of 2001.
Main Characteristics
- All component files and resources must be embedded in the PDF/X-1 file. This includes high resolution data. OPI is not allowed in PDF/X-1 files.
- PDF/X-1 ignores music, movies and non-printable annotations.
- All fonts must be embedded in the file.
- All colour data must be CMYK (with the intended printing condition described by an ICC profile) or named spot colours.
- PDF/X-1 files contain extra operators that define the bleed and trim area.
- A PDF/X-1 embedded file may contain only raster data of the following types: TIFF/IT-P1, DCS 1 & 2 (including copydot data) and EPS.
- There is a separate flag (meaning a switch that is either ON or OFF) that details whether the PDF/X-1 file has already been trapped.
- Only a limited number of compression algorithms are supported.
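Several of the characteristics above lend themselves to an automated check. The sketch below shows how such a preflight check could be structured. It does not parse PDF files: the `FileSummary` class, its field names, and the `pdfx1_problems` function are hypothetical stand-ins for the information a real preflight tool would extract, and only a few of the listed rules are covered.

```python
from dataclasses import dataclass, field


@dataclass
class FileSummary:
    """Hypothetical summary of a PDF, as a preflight tool might extract it."""
    fonts: dict = field(default_factory=dict)        # font name -> is it embedded?
    colour_spaces: set = field(default_factory=set)  # e.g. {"CMYK", "Spot"}
    uses_opi: bool = False                           # any OPI references present?
    has_trim_area: bool = False                      # trim area defined?
    trapped_flag_present: bool = False               # trapped flag recorded (ON or OFF)?


# Assumed labels for the colour data PDF/X-1 accepts (CMYK or named spot colours).
ALLOWED_COLOUR_SPACES = {"CMYK", "Spot"}


def pdfx1_problems(summary):
    """Return a list of human-readable rule violations (empty if none found)."""
    problems = []
    for name, embedded in sorted(summary.fonts.items()):
        if not embedded:
            problems.append(f"font not embedded: {name}")
    for space in sorted(summary.colour_spaces - ALLOWED_COLOUR_SPACES):
        problems.append(f"colour space not allowed: {space}")
    if summary.uses_opi:
        problems.append("OPI references are not allowed")
    if not summary.has_trim_area:
        problems.append("trim area is not defined")
    if not summary.trapped_flag_present:
        problems.append("trapped flag is missing")
    return problems


# Made-up example job with one missing font and one RGB image.
job = FileSummary(
    fonts={"Helvetica": True, "MyCorporateFont": False},
    colour_spaces={"CMYK", "RGB"},
    has_trim_area=True,
    trapped_flag_present=True,
)
for problem in pdfx1_problems(job):
    print(problem)
```

Returning a list of problems rather than a single pass/fail value lets the sender fix everything in one round before attempting a blind transfer.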
PDF/X-2
PDF/X-2 is being developed as an international standard addressing the wider commercial print market. It was still being worked on when I last updated this description (May 2001). PDF/X-2 is similar to PDF/X-1, but in PDF/X-2 files, elements like fonts or high-resolution images can intentionally be omitted. It will also be based on the PDF 1.3 or 1.4 standard and support different colour spaces.
PDF/X-3
PDF/X-3 is being developed as an international standard, again primarily for ad delivery, but aimed at the needs of those who wish to use device independent colour. Contrary to PDF/X-1 and PDF/X-2, PDF/X-3 is mainly a European initiative. It is expected to be released in the second quarter of 2002.
Tagged PDFs
“Tagged PDFs” are PDF files that also contain structural information about the data that is represented by the PDF document. This means that meta-information such as titles, blocks of text,… can be part of a PDF document.
- This makes it easier to create PDF files that can adapt themselves to the device they will be used on. This new feature is mainly meant for the emerging market of ebooks, since it allows PDF files to be repurposed so they can be used on a wider variety of systems. Adobe has started shipping a version of Acrobat Reader that runs on PalmOS PDAs.
- It will also make it easier to repurpose content.
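The idea of adapting content to a device can be illustrated with a toy model. The sketch below represents tagged content as a list of (role, text) pairs and lays it out again for a narrow display. This is a heavily simplified, hypothetical data model and not how a PDF stores its structure; the roles `H1` and `P` mirror common structure types, and the `reflow` function and its uppercase heading style are illustrative choices.

```python
import textwrap

# Hypothetical, simplified structure: (role, text) pairs in reading order.
tagged_content = [
    ("H1", "Base 14 Fonts"),
    ("P", "Fourteen typefaces have a special significance to PDF documents."),
]


def reflow(content, width):
    """Lay the tagged content out again for a display `width` characters wide."""
    blocks = []
    for role, text in content:
        if role == "H1":
            text = text.upper()  # arbitrary way to set headings apart
        blocks.append(textwrap.fill(text, width=width))
    return "\n\n".join(blocks)


# The same content wrapped for a small handheld screen and a wide one.
print(reflow(tagged_content, 24))
print("-" * 40)
print(reflow(tagged_content, 80))
```

Without the structural information, a viewer only knows where glyphs sit on a fixed page and cannot reliably tell a heading from a paragraph, which is what makes this kind of re-layout hard for untagged files.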
Base 14 Fonts
There are fourteen typefaces that have a special significance to PDF documents: Times Roman (in standard, italic, bold and bold italic), Courier (in standard, oblique, bold and bold oblique), Helvetica (in standard, oblique, bold and bold oblique), Symbol and Zapf Dingbats. These should always be available to a viewer, either as the actual font or as a close substitute, and so need not be embedded in a PDF. [2] PDF viewers must know the metrics of these fonts. Other fonts may be substituted if they are not embedded in a PDF.
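A tool that decides which fonts to embed could use this list directly. The sketch below is one minimal way to do so, using the conventional PostScript names for the fourteen faces. The `needs_embedding` function is a hypothetical helper, and it assumes font names are passed exactly as written, without subset prefixes or vendor-specific aliases.

```python
# Conventional PostScript names of the fourteen typefaces listed above.
BASE_14_FONTS = frozenset({
    "Times-Roman", "Times-Italic", "Times-Bold", "Times-BoldItalic",
    "Courier", "Courier-Oblique", "Courier-Bold", "Courier-BoldOblique",
    "Helvetica", "Helvetica-Oblique", "Helvetica-Bold", "Helvetica-BoldOblique",
    "Symbol", "ZapfDingbats",
})


def needs_embedding(font_name):
    """Return True if the font is not one of the base 14 and should be embedded."""
    return font_name not in BASE_14_FONTS


# "Garamond" is just an example of a font outside the base 14.
for name in ("Helvetica-Bold", "ZapfDingbats", "Garamond"):
    print(name, "-> embed" if needs_embedding(name) else "-> may be left out")
```

Note that subsets such as PDF/X-1 require all fonts to be embedded, so in a prepress workflow this shortcut would not apply.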