teolink October 11, 2013 at 19:02

Pages to PDF, or a service in 2 hours

You don't have a Mac, and once again a .pages file has turned up in an incoming message? What do you do with it?
Last time, after a thorough search for “How to Browse iPages,” a simple and elegant solution turned up. And, mind you, it is no satellite-guided chainsaw:
1. Change the file extension to .zip
2. Open the resulting archive and find the pdf file in it
3. Profit!
Do you think we stopped there? We got so ~~carried away~~ inspired that we built a service for automatic conversion, ~~while thinking about a monetization scheme~~. Under the cut you will find a detailed explanation of how it works.

In fact, .pages files, like some other macOS formats, store preview data in PDF format.

To keep the implementation from being boring, we decided to flex our muscles in JS instead of processing on the backend.
Yes, because of this the converter does not work in IE10+ ~~who uses it?~~ because it lacks URL support (everything else can be emulated). But nothing needs to be uploaded to the server; everything runs on the client, which means it is instant (and also safe, since you are not sending anything anywhere).

So how does this work?

A .pages file is a zip container (like docx, xlsx and other new-generation office formats, by the way).
Inside it are:
- index.xml - the main presentation file
- buildVersionHistory.plist - a metadata file; what it does is clear from the name
- QuickLook/Thumbnail.jpg - the thumbnail image shown as a preview inside a folder
- QuickLook/Preview.pdf - the preview file itself, which opens in macOS when you press the space bar.
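
As a rough sketch of what the list above means in practice: before handing a file to any zip library, you can check that it really starts with the zip signature, and then look for the preview entry by its path. The helper names `looksLikeZip` and `findPreviewEntry` are made up for this illustration and are not part of the service's code; the list of entry names is assumed to come from whatever zip library you use.

```javascript
// Path of the embedded preview inside a .pages archive (from the list above).
const PREVIEW_PATH = "QuickLook/Preview.pdf";

// Hypothetical helper: true if a binary string starts with the
// zip local-file-header signature, the bytes "PK\x03\x04".
function looksLikeZip(binaryString) {
  return binaryString.slice(0, 4) === "PK\x03\x04";
}

// Hypothetical helper: pick the preview entry out of a list of entry
// names, however the zip library in use exposes them.
function findPreviewEntry(entryNames) {
  const match = entryNames.find((name) => name === PREVIEW_PATH);
  return match === undefined ? null : match;
}
```

If `findPreviewEntry` returns null, the document was probably saved without a preview, and there is nothing to extract.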

How to get files via drag-and-drop or through file inputs has been explained a hundred times; it is not interesting, so let's skip this step.
Once we have the file, we need a FileReader to read it.

Scripts that work with files expect different input formats: some accept a Blob, others a binary string. We took js-unzip, one of dozens of easy-to-google solutions, for its simplicity and clarity.

It requires a string as input, so we run FileReader with readAsBinaryString:

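if (file.type === "application/x-iwork-pages-sffpages") {
    var reader = new FileReader();
    // onload fires once the whole file has been read into memory
    reader.onload = function (event) {
        // event.target is the reader itself; result holds the binary string
        processZip(event.target.result);
    };
    // js-unzip expects a string, so read the file as a binary string
    reader.readAsBinaryString(file);
}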

Note that the event itself carries no useful information: event.target simply refers to the reader, so we could just as well write processZip(reader.result).
Almost all browser standards are very similar in syntax, and FileReader is modeled on XMLHttpRequest, so everything will look pretty familiar.
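
One caveat about the file.type check above: the browser derives the type from what the system knows about the extension, and reports an empty string when it does not recognize it. A minimal sketch of a fallback, assuming you are willing to trust the file name in that case (`isPagesFile` and `PAGES_MIME` are names invented for this example, not taken from the site):

```javascript
const PAGES_MIME = "application/x-iwork-pages-sffpages";

// Hypothetical helper: accept the file if the browser reports the Pages
// MIME type, or if it reports no type at all and the name ends in .pages.
function isPagesFile(file) {
  if (file.type === PAGES_MIME) {
    return true;
  }
  return file.type === "" && /\.pages$/i.test(file.name);
}
```

It only looks at the `name` and `type` properties, so it works with a File from an input or from a drop event alike.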

We will also skip working with the zip archive itself: there are plenty of libraries on the net for this task, each with its own syntax, and in this case the zip is just a container, so we did not even need the decompression mechanisms.

The most interesting part happens at the end (this code differs slightly from what is on the site, for readability):

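// One array element per byte of the extracted PDF
var uintArray = new Uint8Array(dataString.length);
for (var i = 0; i < dataString.length; i++) {
    // each character code of a binary string is a byte value (0-255)
    uintArray[i] = dataString.charCodeAt(i);
}
// Blob takes an array of parts, hence the extra [] around uintArray
var blob = new Blob([uintArray], {type: 'application/pdf'});
gotLink(URL.createObjectURL(blob));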

What happens here:
A binary string stores each byte as one character whose code is the byte's value. We create a typed array (Uint8Array) of single-byte unsigned integers, which also range from 0 to 255, and transfer the numeric values of the string's characters into it byte by byte.
This is necessary so that the Blob (a binary object in JS) is created with each number stored as exactly one byte; otherwise the characters can be interpreted differently and the file will not be generated correctly.
Also, Blob only accepts an array as input, so we additionally have to wrap uintArray in a regular array.
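
To see why the conversion matters, here is a small sketch with a made-up five-character sample. String parts passed to Blob are encoded as UTF-8, which `TextEncoder` reproduces here, so any byte above 127 turns into two bytes. The function name `binaryStringToBytes` and the sample data are ours, for illustration only.

```javascript
// Copy a binary string into a Uint8Array, one byte per character.
function binaryStringToBytes(binaryString) {
  const bytes = new Uint8Array(binaryString.length);
  for (let i = 0; i < binaryString.length; i++) {
    // keep only the low 8 bits; in a binary string the code is already 0-255
    bytes[i] = binaryString.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Made-up sample: the PDF magic "%PDF" followed by one byte with value 0xFF.
const sample = "%PDF\xff";

// Byte-for-byte copy: 5 bytes, the last one is 0xFF.
const asBytes = binaryStringToBytes(sample);

// UTF-8 encoding of the same string: 6 bytes, 0xFF became 0xC3 0xBF.
const asUtf8 = new TextEncoder().encode(sample);
```

A PDF is full of such bytes, so a Blob built directly from the string would come out larger than the original and would not open.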

Since the resulting link carries no file extension, we additionally specify a MIME type for the blob object.
And now the biggest piece of magic on the site: using the URL.createObjectURL(blob) function, we get a link to the blob in memory. Quite literally so: as soon as we close the parent document, the link stops working.
The link looks like this:

blob:http://localhost:8005/4222c9ec-1c66-4143-96a8-4223482148f6
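
Since the blob stays in memory for as long as the link is alive, it can be worth releasing it explicitly once the user is done with it, rather than waiting for the page to close. A hedged sketch follows; `createPdfLink` and its `release` method are names made up for this example, and the four sample bytes are just the "%PDF" magic, not a real document.

```javascript
// Hypothetical helper: wrap bytes in a PDF blob and return the object URL
// together with a function that frees it.
function createPdfLink(bytes) {
  const blob = new Blob([bytes], { type: "application/pdf" });
  const url = URL.createObjectURL(blob);
  return {
    url: url,
    release: function () {
      // after this call the url no longer resolves to the blob
      URL.revokeObjectURL(url);
    },
  };
}

// Sample input: the bytes of "%PDF".
const link = createPdfLink(new Uint8Array([0x25, 0x50, 0x44, 0x46]));
// link.url is a string that starts with "blob:"
```

Calling `link.release()` when the link is no longer needed lets the browser free the memory early.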

This is how you can get a separate file from the archive and send it back to the client without accessing the server.
Unfortunately, IE9 is out precisely because of the link from URL.createObjectURL. Without that requirement, we could have relied on the server to read the file for IE9; Blob.poly.js exists and can be worked with, but the base64 link it produced was so large that the browser simply refused to open it in a new window and hung.

P.S. If you find bugs, write to me in a PM (include the OS and browser version). I promise to fix them right away.

UPD: All sources are in a public repository on GitHub.
