Remove metadata from PDF

Removing metadata from a document seems like a trivial task. Information security paranoiacs have repeated the same recommendation a thousand times: “Be sure to remove excess metadata from documents before publication.” They have also explained why this may be needed (example). The web is full of instructions on how to do this for various image and document formats, yet there is little clear guidance for a format as common as PDF.

I ran a small experiment and, based on the results, put together a small toolchain of free utilities. That is what I want to share.

The first thing I tried was deleting the data with Adobe Acrobat itself, following the appropriate instructions. It works, but the result cannot be called satisfactory: first, it is like shooting sparrows with a cannon, and second, the output file for some reason grew by almost an order of magnitude.

Then, among a heap of crapware, I found the wonderful BeCyPDFMetaEdit utility. It copes confidently with PDF 1.6 and lower, but the result is not guaranteed for newer revisions of the format.

The ultimate solution, as usual, came from the *nix world and the open source community: a set of three utilities, ExifTool, QPDF and Xpdf, each of which is also available for Windows. Since the licenses of these utilities do not prohibit free redistribution without modification, I collected them in a single archive (WINx64) together with a script and basic instructions for use. In short: unpack the archive, put the PDF file to be cleaned into the resulting folder, and drag it onto DEMETA.bat. The script will run and your file will come out free of metadata.

