Pictures in a .doc file: surgical weight loss

Once upon a time there was a translation (the process, not the result). It went like this: fragments were cut out of a PDF of the original text and pasted straight into Word, and below each one the Russian text appeared through considerable mental effort, usually along with a couple of extra footnotes. And if the original had footnotes, pictures ended up in the footnotes too. The translation went on and the file grew - there were 51 images in a 16-page file, and Word became so slow that the process threatened never to become a result.
For some reason the built-in image compression feature did not help at all, so surgery was called for.

The file looks like this (narrow columns - not text, but pictures):


You can get inside a .doc file too, but it is much easier with .docx, so we re-save. Re-saving alone gained nothing: the file weighed 12 megabytes before, and it still does.


As you know, a .docx file is a zip archive with a special internal structure - that is why it is easy to look into its guts, and there ...

unpacked, the word directory would weigh almost seven hundred megabytes! No wonder Word refuses to work ...
The main culprits, of course, are the images, each of which takes about ten megabytes:
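The same look inside can be taken without unpacking anything. Below is a minimal sketch using Python's standard zipfile module; the function names largest_entries and total_unpacked_size are made up for illustration and are not part of the original workflow, which used an ordinary archiver.

```python
import zipfile


def largest_entries(docx_path, top=10):
    """Return (name, unpacked_bytes, packed_bytes) for the biggest archive members."""
    with zipfile.ZipFile(docx_path) as archive:
        # file_size is the unpacked size, compress_size the size inside the archive.
        infos = sorted(archive.infolist(), key=lambda info: info.file_size, reverse=True)
        return [(info.filename, info.file_size, info.compress_size) for info in infos[:top]]


def total_unpacked_size(docx_path):
    """Sum of unpacked sizes of all members, i.e. what the files would weigh on disk."""
    with zipfile.ZipFile(docx_path) as archive:
        return sum(info.file_size for info in archive.infolist())


# Example use (the file name is a placeholder):
# for name, unpacked, packed in largest_entries("translation.docx"):
#     print(f"{unpacked:>12} {packed:>12} {name}")
```

For a document like the one described, the top of the list would be expected to consist of the EMF files under word/media, but that depends on the file at hand.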


I wonder why Word stores what are plainly bitmaps in EMF, a vector format? There is probably some deep explanation involving the PDF source ("PDF is a vector format too!"), but that does not concern us now. To avoid editing document.xml, which a change of image format would require, we limit ourselves to reducing the image dimensions. Fortunately, among its many other useful features, IrfanView can write EMF (Photoshop, for example, cannot out of the box). It can also resize images in batches, and that is the feature we will use. 800 pixels on the long side is enough.
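For reference, the "800 pixels on the long side" rule comes down to the arithmetic below. This is only a sketch of the size calculation: the actual resampling and EMF writing were done by IrfanView, and the function name fit_long_side and the choice never to enlarge smaller images are assumptions made here.

```python
def fit_long_side(width, height, long_side=800):
    """Scale (width, height) so the longer side becomes long_side, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= long_side:
        # Assumption: images that are already small enough are left alone.
        return width, height
    new_width = max(1, round(width * long_side / longest))
    new_height = max(1, round(height * long_side / longest))
    return new_width, new_height


# A 4000 x 3000 image becomes 800 x 600.
print(fit_long_side(4000, 3000))
```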

(In the picture at the top right a fragment of the letter g is visible, and even that is the original already scaled down to the width of the screen.)

Then the modified files can be packed back into the archive - they will weigh significantly less (15 times less).
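Packing back can be done with any archiver; as a sketch, here is the same step in Python. The name pack_docx is hypothetical, and writing [Content_Types].xml first is a common convention that is assumed here rather than taken from the original. The output file should live outside the unpacked directory so that it does not end up inside itself.

```python
import os
import zipfile


def pack_docx(source_dir, docx_path):
    """Zip the contents of source_dir into docx_path, keeping relative paths."""
    names = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            full_path = os.path.join(root, name)
            names.append(os.path.relpath(full_path, source_dir))
    # Assumption: put [Content_Types].xml first, the rest in alphabetical order.
    names.sort(key=lambda rel: (rel != "[Content_Types].xml", rel))
    with zipfile.ZipFile(docx_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for rel in names:
            # Zip member names always use forward slashes, also on Windows.
            archive.write(os.path.join(source_dir, rel), arcname=rel.replace(os.sep, "/"))
    return names


# Example use (both paths are placeholders):
# pack_docx("unpacked", "translation_small.docx")
```

The files must sit at the root of the archive (word/, _rels/, [Content_Types].xml), not inside an extra top-level folder, which is why the paths are taken relative to source_dir.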

Word swallowed this file without complaint. True, when saved back to .doc it put on a little weight, but nothing like before:


Total: the file shrank 6.5 times with no visible loss of quality (it is not intended for printing, but even printed it would look fine), the translation went much faster (and with less cursing), and the house became somewhat calmer, which is, without a doubt, the main achievement.

UPD:
Discussion in the comments on exactly why Word inserts pictures in EMF
