
Remove Duplicate Fonts in PDF Files

Solutions are often already out there, waiting to be discovered. You just need to dig a little deeper and do some research. When you combine PDFs using tools like Pdftk or Apache PDFBox, fonts accumulate, and they (especially embedded ones) have a direct impact on the merged PDF file’s size.

Below is an 85-page, 4,153,344-byte PDF file with duplicate fonts. It could be much smaller without the duplicates. And 85 pages is nothing: in production, PDF files can have as many as 20,000 pages. Imagine the size of the merged PDF!

[Image: Merged PDF with Duplicate Fonts]

To remove these duplicates, we use iText’s PdfSmartCopy class (com.itextpdf.text.pdf.PdfSmartCopy).
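As a rough idea of what this can look like, here is a minimal sketch, assuming iText 5 is on the classpath. The class name, the method name, and the file names are illustrative placeholders, not necessarily the ones used in the project for this post. It copies every page of the merged PDF through PdfSmartCopy, which tries to reuse identical objects instead of writing them again.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfSmartCopy;

import java.io.FileOutputStream;
import java.io.IOException;

public class RemoveDuplicateFonts {

    // Copies all pages of src into dest through PdfSmartCopy.
    // Method name and parameters are hypothetical, chosen for this sketch.
    static void copy(String src, String dest) throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        Document document = new Document();
        PdfSmartCopy smartCopy = new PdfSmartCopy(document, new FileOutputStream(dest));
        document.open();
        // PDF page numbers start at 1.
        for (int page = 1; page <= reader.getNumberOfPages(); page++) {
            smartCopy.addPage(smartCopy.getImportedPage(reader, page));
        }
        document.close(); // also finishes and closes the output file
        reader.close();
    }

    public static void main(String[] args) throws IOException, DocumentException {
        // Placeholder file names; replace with your own paths.
        copy("merged.pdf", "merged-smaller.pdf");
    }
}
```

PdfSmartCopy does extra work to compare objects, so on very large files it will likely need more time and memory than a plain copy.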

Below is my Eclipse workspace showing the Java file, referenced libraries, and JDK version.

[Image: Eclipse Project (Kepler IDE)]

When you run the application, it generates a smaller PDF file.

[Image: New PDF file with smaller size]

You may still see the fonts listed in the PDF->Document Properties->Fonts tab, but the entries now point to the same set of references (or objects) within the PDF document. Other duplicates, such as shared images, are also removed.
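If you want a rough way to check this beyond the file size, one option is to count the font dictionaries stored in each file. The sketch below assumes iText 5 again. The class name, method name, and file names are made up for illustration. It walks the cross-reference table and counts objects whose /Type is /Font. Treat the number as an indicator only, since composite fonts contribute more than one such dictionary.

```java
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;

import java.io.IOException;

public class FontObjectCounter {

    // Counts indirect objects that are dictionaries with /Type /Font.
    // Hypothetical helper written for this sketch.
    static int countFontObjects(String pdfPath) throws IOException {
        PdfReader reader = new PdfReader(pdfPath);
        try {
            int count = 0;
            for (int i = 1; i < reader.getXrefSize(); i++) {
                PdfObject obj = reader.getPdfObject(i);
                if (obj != null && obj.isDictionary()) {
                    PdfObject type = ((PdfDictionary) obj).get(PdfName.TYPE);
                    if (PdfName.FONT.equals(type)) {
                        count++;
                    }
                }
            }
            return count;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names: the merged input and the PdfSmartCopy output.
        System.out.println("before: " + countFontObjects("merged.pdf"));
        System.out.println("after:  " + countFontObjects("merged-smaller.pdf"));
    }
}
```

A lower count for the second file would be consistent with the pages now sharing font objects instead of each carrying its own copy.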

Karl San Gabriel

Java and Enterprise Technologies Expert