Creating an FB2 version of the latest issue of a magazine or newspaper

Background

More and more magazines and newspapers now publish their latest issues online (Vedomosti, Expert, Esquire, etc.). These online issues are fine, with one exception: you need an Internet connection to read them.
The problem is that the Internet is not available everywhere (the metro, for example) and not every device can connect to it (most e-ink readers).
Hence the idea: it would be nice to make copies of periodicals as e-books (for example, in FB2 format).

Task

Create a solution that generates an FB2 file from the part of a site containing the desired issue of a magazine (for example, Expert No. 32 for 2010).
The file should contain pictures and, preferably, a table of contents listing the articles.
Creating a file for a new issue should be (semi-)automatic, take no more than 5-10 minutes, and not require serious manual processing.

Search for a solution

Anything -> FB2 converter

As it turned out, HTML -> FB2 converters are few and far between. And there are none at all that can automatically process a batch of HTML pages, correctly compile a table of contents, and set up the links. ~~Although maybe I searched poorly or did not understand the capabilities of what I found.~~
To start, I tried all the editors described in the review of computer.

"Any to FB2" - completely mangled the Cyrillic text (most likely my own clumsy hands) and is built to work with a single page.
"FictionBook Designer" - a powerful tool, but it does not have (or I did not find) an automatic conversion function.
"Web2FB2" - closest to what I wanted, but it has a limit of 10 pages and dumps everything into one pile without a table of contents.

Further searching led me to the wonderful FeedConverter service (it has already been written about on Habr).
Testing on the first Russian RSS feed that came to hand showed that the service:

copes with the Cyrillic alphabet
generates a table of contents in the form of a list of entries
includes pictures

That is, to get the result it is now enough to give this service a feed that contains the full text of the issue's articles.
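How FeedConverter works internally is not described here, so the following is only a rough sketch of what turning feed entries into an FB2 file might look like. It assumes Python's standard xml.etree.ElementTree and the FictionBook 2.0 namespace; the function name build_fb2 and the sample data are made up for illustration. The output carries only a book title and one section per article, not the full metadata a complete FB2 file needs, and pictures are left out.

```python
import xml.etree.ElementTree as ET

FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"


def build_fb2(book_title, articles):
    """Build a minimal FB2-style document from (title, [paragraphs]) pairs.

    Hypothetical helper for illustration; not FeedConverter's actual code.
    """
    # Serialize the FB2 namespace as the default one (no prefix).
    ET.register_namespace("", FB2_NS)
    root = ET.Element(f"{{{FB2_NS}}}FictionBook")

    # Minimal metadata: only the book title is filled in here.
    description = ET.SubElement(root, f"{{{FB2_NS}}}description")
    title_info = ET.SubElement(description, f"{{{FB2_NS}}}title-info")
    ET.SubElement(title_info, f"{{{FB2_NS}}}book-title").text = book_title

    body = ET.SubElement(root, f"{{{FB2_NS}}}body")
    for title, paragraphs in articles:
        # One <section> per article, with the article title as its <title>.
        section = ET.SubElement(body, f"{{{FB2_NS}}}section")
        section_title = ET.SubElement(section, f"{{{FB2_NS}}}title")
        ET.SubElement(section_title, f"{{{FB2_NS}}}p").text = title
        for text in paragraphs:
            ET.SubElement(section, f"{{{FB2_NS}}}p").text = text

    # A unicode string keeps Cyrillic text as-is instead of escaping it.
    return ET.tostring(root, encoding="unicode")
```

Each article becomes a section with its own title, and FB2 readers typically build their table of contents from those section titles, which matches the "list of entries" behaviour noted above.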

Full-text RSS feed

The site does not provide a full-text RSS feed for its issues, only summaries of the latest issue.
Yahoo Pipes is a convenient way to create a full-text RSS feed. We give it our feed and, in a loop, load the full text of each article - http://pipes.yahoo.com/pipes/pipe.edit?_id=661b8231fa3df88317939d452e772c10 . If the site does not provide RSS feeds at all and only publishes articles (as Esquire does), Yahoo Pipes can parse the content of the page, extract the links from it, and download the necessary articles. For this I created a pipe: http://pipes.yahoo.com/pipes/pipe.edit?_id=85427a7ff66aa7c06a1fa8da677fbd25
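The pipes themselves are graphical, so there is no source to quote. As a rough equivalent of the "parse the index page and get the links" step, here is a minimal Python sketch using only the standard library. The class and function names, the example.com address, and the idea of filtering links by a path prefix are all assumptions for illustration; a real site would need its own filter.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a href> values whose path starts with a given prefix."""

    def __init__(self, base_url, path_prefix):
        super().__init__()
        self.base_url = base_url
        self.path_prefix = path_prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(self.path_prefix):
            absolute = urljoin(self.base_url, href)
            if absolute not in self.links:  # keep page order, skip duplicates
                self.links.append(absolute)


def article_links(index_html, base_url, path_prefix):
    """Return absolute article URLs found on an index page (hypothetical helper)."""
    parser = LinkCollector(base_url, path_prefix)
    parser.feed(index_html)
    return parser.links
```

Each returned URL could then be downloaded in a loop (for example with urllib.request.urlopen) to obtain the full text of the article.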
An advantage of this mechanism is that it can fetch any issue, not just the latest one.
To do this, you only need to change the parameters in the call URL that specify the year and the issue number within the year: http://pipes.yahoo.com/pipes/pipe.run?_id=85427a7ff66aa7c06a1fa8da677fbd25&_render=rss&number=31&year=2010 .
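The parameterized call above can also be assembled programmatically. A minimal sketch in Python, assuming only the parameter names visible in that URL (_id, _render, number, year); the function name issue_feed_url is made up for illustration.

```python
from urllib.parse import urlencode

PIPE_RUN_URL = "http://pipes.yahoo.com/pipes/pipe.run"
PIPE_ID = "85427a7ff66aa7c06a1fa8da677fbd25"  # pipe id taken from the article


def issue_feed_url(number, year):
    """Build the RSS URL for a given issue number and year (hypothetical helper)."""
    query = urlencode(
        {"_id": PIPE_ID, "_render": "rss", "number": number, "year": year}
    )
    return f"{PIPE_RUN_URL}?{query}"
```

For example, issue_feed_url(31, 2010) yields the address for issue No. 31 of 2010, and changing the two arguments selects any other issue.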

Summary

The final algorithm for creating an FB2 version of a periodical is as follows:

Find a site with the content
Take its RSS feed or index page
Parse the page in Yahoo Pipes and pull in the full text of the articles
Feed the pipe to FeedConverter and pick up the FB2 book
??????
PROFIT!

Once the pipe is configured, getting a new issue in FB2 form comes down to visiting the FeedConverter website and pressing the generate button.

A fly in the ointment

Because Yahoo Pipes is not very fast, generation may not succeed on the first try. I hope the creators of FeedConverter do something about this.
Yahoo Pipes limits how much processor time a single pipe can consume. Because of this, some large issues of the magazine do not fit into this Procrustean bed and fail with an error (for example, Expert No. 1 for 2010). What to do about this is not yet clear. It may be worth splitting parsing and text loading into separate pipes.
The full text of articles can be loaded with ReadBox.info (see below)
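If the feed is fetched by a script rather than by pressing a button, the "may not succeed on the first try" problem can be softened with a simple retry loop. This is only a sketch: the function name, the number of attempts, and the delays are arbitrary assumptions, and the fetch argument stands for whatever function actually requests the pipe's output.

```python
import time


def fetch_with_retries(fetch, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as error:  # a slow pipe may time out or return an error
            last_error = error
            if attempt < attempts:
                # Wait a little longer after each failed attempt.
                sleep(delay_seconds * attempt)
    raise last_error
```

Retrying helps with a slow pipe, but it does not help when an issue is simply too large for the processor-time limit; that case still needs the work to be split up.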

Update 1: The comments below suggested an excellent service for creating full-text versions - ReadBox.info. To get a full-text feed, you give it the RSS feed and the XPath of the block containing the text. This way, the text-loading function can be removed from Yahoo Pipes, which should let it work more stably.
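To make the "XPath of the block containing the text" idea concrete, here is a small Python sketch of pulling one block out of a page by XPath. It says nothing about how ReadBox.info itself is implemented. It assumes the page is well-formed XHTML and relies on the limited XPath subset in the standard xml.etree.ElementTree; real-world HTML would usually need a more tolerant parser. The function name and the sample markup are invented for illustration.

```python
import xml.etree.ElementTree as ET


def extract_text(xhtml, xpath):
    """Return the text of the first element matching xpath, or None if absent."""
    root = ET.fromstring(xhtml)
    block = root.find(xpath)
    if block is None:
        return None
    # itertext() yields the text of the block and all of its descendants;
    # split/join collapses the leftover whitespace into single spaces.
    return " ".join(" ".join(block.itertext()).split())


page = (
    "<html><body>"
    "<div class='menu'>Home</div>"
    "<div class='article'><h1>Title</h1><p>First <b>bold</b> paragraph.</p></div>"
    "</body></html>"
)
article_text = extract_text(page, ".//div[@class='article']")
```

Here the expression .//div[@class='article'] selects only the article block, so the navigation menu never reaches the resulting text.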
So the process can now look like this:

Find a site with the content
Take its RSS feed or index page
Keep only the necessary articles, or parse the index page in Yahoo Pipes
Pull in the full text of the articles using ReadBox.info
Feed the RSS to FeedConverter and pick up the FB2 book
??????
PROFIT!

Update 2: For those who are not afraid of digging through configs and reading manuals, there is an excellent program for our purpose - nmdparser. Here is an example of how it can be configured to produce an archive copy of Autoreview in FB2.
