E-books and their formats: EPUB, its history, pros and cons
Earlier on the blog, we wrote about how the DjVu and FB2 e-book formats came about.
The topic of today's article is EPUB.

Image: Nathan Oakley / CC BY

Format history
In the 1990s, proprietary solutions dominated the e-book market, and many reader manufacturers had their own formats. For example, NuvoMedia used files with the .rb extension: containers holding an HTML file and an .info file with metadata. This situation complicated publishers' work, since they had to typeset every book separately for each format. Engineers from Microsoft, together with NuvoMedia and SoftBook Press, set out to solve the problem.
At the time, Microsoft intended to conquer the e-book market and was developing a reading application for Windows 95, so creating a new format was arguably part of the company's business strategy.
NuvoMedia is considered the maker of the first mass-market e-reader, the Rocket eBook. The device had only eight megabytes of internal memory, and its battery life did not exceed 40 hours. SoftBook Press also developed e-readers, but its devices had a distinctive feature: a built-in modem that let users download books directly from the SoftBookstore.
In the early 2000s, both NuvoMedia and SoftBook Press were bought by the media company Gemstar and merged into the Gemstar eBook Group. For several years the group sold readers (for example, the RCA REB 1100) and digital books, but in 2003 it went out of business.
But back to the development of a single standard. In 1999, Microsoft, NuvoMedia and SoftBook Press founded the Open eBook Forum, which began work on the draft document that marked the beginning of EPUB. The standard was originally called OEBPS (Open eBook Publication Structure). It made it possible to distribute a digital publication as a single file (a ZIP archive) and simplified moving books between different hardware platforms.
Adobe, IBM, HP, Nokia, Xerox and the publishers McGraw Hill and Time Warner later joined the Open eBook Forum. Together, they continued developing OEBPS and the digital publishing ecosystem as a whole. In 2005, the organization was renamed the International Digital Publishing Forum (IDPF).
In 2007, the IDPF renamed the OEBPS format EPUB and began work on its second version, which was presented to the general public in 2010 as EPUB 2.0.1. The new version differed little from its predecessor but gained support for vector graphics and embedded fonts.
By then, EPUB had won over the market and become the default standard for many publishers and device makers. It was already used by O'Reilly and Cisco Press and supported by Apple, Sony, Barnes & Noble and ONYX BOOX devices.
In 2009, the Google Books project announced EPUB support and used the format to distribute more than a million free books. EPUB also began to gain popularity among writers. In 2011, J. K. Rowling announced plans to launch the Pottermore website and make it the only digital point of sale for the Harry Potter books.
EPUB was chosen as the distribution format primarily because it supports copy protection (DRM). All books in the author's online store are still available only in this format.
The third version of EPUB was released in 2011. The developers added support for audio, video and footnotes. The standard continues to evolve: in 2017, the IDPF even merged with the W3C, the consortium that develops technology standards for the World Wide Web.
How does EPUB work?
An EPUB book is a ZIP archive. It stores the text of the publication as XHTML pages, along with media content (audio, video or images), fonts and metadata. It may also contain CSS style sheets and PLS (Pronunciation Lexicon Specification) documents with pronunciation information for speech synthesis engines.
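To make the container idea concrete, here is a minimal sketch using Python's standard zipfile module. It follows the rules of OCF, EPUB's container format: a mimetype entry comes first and is stored uncompressed, and META-INF/container.xml points to the package document. The paths OEBPS/content.opf and OEBPS/chapter1.xhtml and the function name build_minimal_epub are illustrative assumptions, and the result is not validated against the specification.

```python
import io
import zipfile

# container.xml has a fixed location and tells reading systems where the
# package document lives. "OEBPS/content.opf" is a conventional but
# hypothetical path; any path inside the archive is allowed.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_minimal_epub(package_xml: str, chapter_xhtml: str) -> bytes:
    """Return the bytes of a bare-bones EPUB archive (a sketch; no validation)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must come first and be stored uncompressed,
        # so tools can identify the file by reading its first bytes.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # All other entries may be deflate-compressed.
        for name, data in (
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", package_xml),
            ("OEBPS/chapter1.xhtml", chapter_xhtml),
        ):
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


if __name__ == "__main__":
    epub = build_minimal_epub("<package/>", "<html/>")
    with zipfile.ZipFile(io.BytesIO(epub)) as zf:
        print(zf.namelist())
```

In a real book, the package and chapter strings would be complete documents. The placeholders here only exercise the archive layout.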
The content is marked up in XHTML, an XML-based language that determines how it is displayed. A book fragment with embedded audio and an image might, for example, offer a link to the entire transcript of the recording alongside a passage of text such as:

What does it mean to be human if we don't have a shared culture? What does a shared culture mean if we can't share it? It's only in the last 100, or 150 years or so, that we started tightly restricting how that culture gets used.
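Here is a hedged sketch of how such an XHTML fragment could be marked up. It is embedded in a Python string so that ElementTree from the standard library can check that it is well-formed and list the media it references. The paths images/speaker.jpg, audio/talk.mp3 and transcript.xhtml, and the helper name media_sources, are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical fragment: the media and transcript paths are placeholders.
FRAGMENT = """<section xmlns="http://www.w3.org/1999/xhtml">
  <img src="images/speaker.jpg" alt="The speaker"/>
  <audio src="audio/talk.mp3" controls="controls">
    <!-- Fallback shown where the audio element is not supported -->
    <a href="transcript.xhtml">the entire transcript</a>
  </audio>
  <p>What does it mean to be human if we don't have a shared culture?</p>
</section>"""


def media_sources(xhtml: str) -> list[tuple[str, str]]:
    """Parse the fragment (raises ET.ParseError if it is not well-formed)
    and return (tag, src) pairs for embedded images, then audio."""
    root = ET.fromstring(xhtml)
    found = []
    for tag in ("img", "audio"):
        for el in root.iter(f"{{{XHTML_NS}}}{tag}"):
            found.append((tag, el.get("src")))
    return found


print(media_sources(FRAGMENT))
```

Because XHTML must be well-formed XML, a missing closing tag makes the parser fail outright. HTML browsers would usually tolerate the same mistake.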

Besides the content files, the archive contains a special Navigation Document. It describes the structure of the book: its table of contents and, optionally, lists of pages and landmarks. Reading apps use it when the reader wants to jump to a chapter or skip several pages.
Another required file in the archive is the package document. It contains metadata, such as the author, publisher, language and title, as well as the spine, which sets the reading order of the book's sections. An example of a package document can be found in the IDPF repository on GitHub.
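In EPUB 3, the navigation document is itself an XHTML file. Its nav element marked epub:type="toc" holds the table of contents as nested ordered lists of links. The following minimal sketch, which uses only Python's standard library, shows how a reading app might extract those links. The chapter file names and the helper name toc_entries are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.w3.org/1999/xhtml"}
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"  # the epub:type attribute

# A minimal navigation document; the chapter file names are hypothetical.
NAV_DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body>
    <nav epub:type="toc">
      <ol>
        <li><a href="chapter1.xhtml">Chapter 1</a></li>
        <li><a href="chapter2.xhtml">Chapter 2</a>
          <ol><li><a href="chapter2.xhtml#sec1">Section 2.1</a></li></ol>
        </li>
      </ol>
    </nav>
  </body>
</html>"""


def toc_entries(nav_xhtml: str) -> list[tuple[str, str]]:
    """Return (label, href) pairs from the nav marked epub:type="toc",
    in document order (nested entries follow their parent)."""
    root = ET.fromstring(nav_xhtml)
    for nav in root.iterfind(".//x:nav", NS):
        # epub:type may hold several space-separated values.
        if "toc" in (nav.get(EPUB_TYPE) or "").split():
            return [(a.text, a.get("href")) for a in nav.iterfind(".//x:a", NS)]
    return []


for label, href in toc_entries(NAV_DOC):
    print(f"{label} -> {href}")
```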
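The package document (conventionally a file with the .opf extension) is XML with three main parts: metadata made of Dublin Core elements such as dc:title, a manifest listing every resource, and the spine, which refers to manifest items by id to set the reading order. Below is a minimal sketch of reading it with Python's standard library. The identifier, title, author, file names and the helper name read_package are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A minimal EPUB 3 package document with hypothetical values.
PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:12345678-1234-1234-1234-123456789abc</dc:identifier>
    <dc:title>Sample Book</dc:title>
    <dc:creator>Jane Doe</dc:creator>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2018-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def read_package(opf_xml: str) -> tuple[dict, list[str]]:
    """Return basic Dublin Core metadata and the spine's reading order."""
    root = ET.fromstring(opf_xml)
    metadata = root.find("opf:metadata", NS)
    info = {field: metadata.findtext(f"dc:{field}", namespaces=NS)
            for field in ("title", "creator", "language")}
    # The manifest maps ids to files; the spine lists ids in reading order.
    manifest = {item.get("id"): item.get("href")
                for item in root.iterfind("opf:manifest/opf:item", NS)}
    order = [manifest[ref.get("idref")]
             for ref in root.iterfind("opf:spine/opf:itemref", NS)]
    return info, order


info, order = read_package(PACKAGE)
print(info["title"], "by", info["creator"], order)
```

The navigation document appears in the manifest but not in the spine. Listing a resource in the manifest does not by itself make it part of the linear reading order.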
Advantages
The main advantage of the format is its flexibility. EPUB supports reflowable layouts that adapt to the screen size of the device. This is one of the main reasons why so many e-readers (and other electronic devices) support the format. For example, all ONYX BOOX readers open EPUB files out of the box, from the entry-level 6-inch Caesar 3 to the premium 9.7-inch Euclid.

ONYX BOOX Caesar 3
Because the format is built on widely used standards (XML), EPUB books are easy to convert for reading on the web. EPUB also supports interactive elements. PDF has similar elements, but they can usually be added to a PDF document only with proprietary software. In EPUB, they are added with ordinary markup and XML tags in any text editor.
Another strength of EPUB is its features for people with visual impairments or dyslexia. The standard lets you change how text is displayed on screen, for example by highlighting certain letter combinations.
As we have already noted, EPUB also lets publishers apply copy protection. E-book sellers can use their own mechanisms to restrict access to a document. To do this, they modify the rights.xml file in the archive's META-INF folder.
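OCF, EPUB's container format, reserves the names META-INF/rights.xml and META-INF/encryption.xml for rights and encryption information. A hedged heuristic in Python that reports which of the two files an archive contains might look like this. The helper name drm_related_files is hypothetical. Note also that encryption.xml is used for font obfuscation, so its presence alone does not prove that a book is DRM-protected.

```python
import io
import zipfile

# File names reserved by the OCF container specification for
# encryption and rights (DRM) information.
DRM_RELATED = ("META-INF/encryption.xml", "META-INF/rights.xml")


def drm_related_files(epub_bytes: bytes) -> list[str]:
    """Return which of the reserved DRM-related files the archive contains.

    Heuristic only: encryption.xml is also used for font obfuscation,
    so finding it does not by itself mean the book is DRM-protected.
    """
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        names = set(zf.namelist())
    return [name for name in DRM_RELATED if name in names]


if __name__ == "__main__":
    # Build a tiny in-memory archive that carries a rights.xml entry.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/rights.xml", "<rights/>")
    print(drm_related_files(buf.getvalue()))
```

The actual contents of rights.xml depend on each seller's DRM scheme. The sketch therefore checks only for the file's presence and does not interpret it.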
Disadvantages
Creating an EPUB publication requires an understanding of XML, XHTML and CSS syntax, and you have to work with a large number of tags and identifiers. By comparison, FB2 includes only the minimal set of tags needed to typeset fiction. Creating PDF documents requires no special knowledge at all, since dedicated software takes care of everything.
EPUB is also criticized because comics and other heavily illustrated books are hard to lay out in it. For such books, the publisher has to create a fixed layout with set coordinates for every image, which can take a lot of time and effort.
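The sketch below illustrates that manual work. In EPUB 3 fixed-layout books, the package document declares rendition:layout as pre-paginated, each page sets its size with a viewport meta tag, and images are typically placed with absolute CSS coordinates. The helper name fixed_layout_page and the panel file names are hypothetical, and real comics production tools may generate pages differently.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# For a fixed-layout book, the package document also declares
#   <meta property="rendition:layout">pre-paginated</meta>
# and each page sets its own viewport size, as below.
PAGE_TEMPLATE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>{title}</title>
    <meta name="viewport" content="width={width}, height={height}"/>
  </head>
  <body style="margin: 0">{images}</body>
</html>"""


def fixed_layout_page(title, width, height, panels):
    """Build one pre-paginated page.

    panels: list of (src, left, top, panel_width, panel_height) tuples,
    all in CSS pixels. Every image needs explicit coordinates.
    """
    images = []
    for src, left, top, w, h in panels:
        style = (f"position: absolute; left: {left}px; top: {top}px; "
                 f"width: {w}px; height: {h}px")
        images.append(f'<img src={quoteattr(src)} alt="" style={quoteattr(style)}/>')
    return PAGE_TEMPLATE.format(title=escape(title), width=width,
                                height=height, images="".join(images))


# Two hypothetical comic panels stacked on a 1200x1800 page.
page = fixed_layout_page("Page 1", 1200, 1800, [
    ("images/panel1.jpg", 0, 0, 1200, 900),
    ("images/panel2.jpg", 0, 900, 1200, 900),
])
ET.fromstring(page)  # raises ParseError if the markup is not well-formed
print(page)
```

Every panel on every page needs its own coordinates like these. Multiplied across a whole comic, that is the effort the criticism refers to.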
What's next
The IDPF is currently working on new specifications for the format. One of them, for example, will help create interactive textbooks with hidden sections. The same book will look different to the teacher and the student: in the student's version, answers to tests and review questions will be hidden.

Image: Guian Bolisay / CC BY-SA
The new feature is expected to help transform the educational process. EPUB is already used quite actively by large universities such as the University of Oxford, which added EPUB 3.0 support to its digital library app a few years ago.
The IDPF is also developing a specification for embedding Open Annotation notes in EPUB. This standard was developed at the W3C in 2013 and simplifies working with complex types of annotations. For example, it lets you attach a note to a specific region of a JPEG image. The standard also provides a mechanism for synchronizing changes to annotations between copies of the same EPUB document. Open Annotation notes can already be added to EPUB files, but a formal specification for them has not yet been adopted.
Work is also under way on a new version of the standard, EPUB 3.2. It will add support for WOFF 2.0 and SFNT fonts. WOFF 2.0 uses more efficient compression that in some cases reduces font file sizes by about 30%. The developers will also replace some obsolete HTML features. For example, instead of a separate trigger element for controlling audio and video playback, the new version will rely on the native HTML audio and video elements.
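A note attached to a region of an image can be sketched as a JSON-LD annotation. This example follows the later W3C Web Annotation Data Model, the successor to Open Annotation, rather than the 2013 Open Annotation vocabulary itself. It also leaves out how the note would be packaged inside an EPUB, which the article says is not yet specified. The URLs, the note text and the helper name image_region_note are hypothetical. A FragmentSelector with a Media Fragments xywh value picks out the rectangle.

```python
import json


def image_region_note(anno_id: str, image_url: str,
                      x: int, y: int, w: int, h: int, text: str) -> dict:
    """Build a Web Annotation that attaches `text` to a rectangle of an image.

    The region uses the Media Fragments syntax xywh=x,y,width,height (pixels).
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        "target": {
            "source": image_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


note = image_region_note(
    "https://example.org/annotations/1",
    "https://example.org/book/images/map.jpg",
    120, 80, 300, 200,
    "The harbour mentioned in chapter 3",
)
print(json.dumps(note, indent=2))
```

Because the annotation is plain JSON, copies of the same note can be stored and exchanged independently of the book file. This fits the synchronization scenario described above.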
A draft specification and a list of changes are already available in the W3C GitHub repository.