Six weeks before Google Reader closes - save everything you can

    Google Reader appeared in 2005. A year or two later it became my main source of information. And then, out of the blue: not profitable, not core business, we're shutting it down... As a result, Google lost an advanced (geek) and loyal audience, and those same geeks immediately started writing or polishing various alternatives. Fragmentation increased, the problem of choice appeared, and quite a few people were simply upset...

    Over all this time I have accumulated about 30 subscriptions that I read regularly and plan to keep reading. The official recommendation on the Google Reader blog is to use Google Takeout to export subscriptions and bookmarks to a file.

    I went and exported everything. Then I looked for alternatives (one, two, three, four). Found some, imported my subscriptions. Problems appeared immediately:

    • Limited depth of feed history
    • Importing starred items doesn't work everywhere
    • Some of the blogs and articles that the subscriptions point to have long been offline


    For the sake of a complete archive, and to save posts from dead blogs, I had to sit down and write a tool that downloads full-text articles from Google Reader (everything that ever appeared in the RSS feed). There are articles I come back to periodically, and I don't have the habit of saving things to Instapaper / Scrapbook / Evernote. Besides, I often used services that turn truncated feeds (like Hacker News) into full-text RSS, so my subscriptions are quite readable right in the reader.

    To work with the Reader API there is documentation and a couple of Python modules (sorry, I didn't look at other languages). Of these, you can simply take libgreader and ignore the rest. The result is the fetch-google-reader project on GitHub.
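
    For illustration, the core of this approach with libgreader looks roughly like the sketch below: authenticate with a username and password, build the subscription list, and print the feed titles. This is a minimal sketch assuming the ClientAuthMethod / GoogleReader interface described in libgreader's README of that time; method names may differ between versions, so check the library's own documentation.

    # Minimal sketch: list Google Reader subscriptions via libgreader.
    # Assumes the ClientAuthMethod / GoogleReader API from the library's
    # README; exact method names may differ between libgreader versions.
    from libgreader import GoogleReader, ClientAuthMethod

    auth = ClientAuthMethod('YOUR-USERNAME@gmail.com', 'YOUR-PASSWORD')
    reader = GoogleReader(auth)

    # Fetch the subscription list and print feed titles, similar to the
    # numbered list that fetch-greader.py shows without the --feed option
    reader.buildSubscriptionList()
    for i, feed in enumerate(reader.getSubscriptionList()):
        print('[%d] %s' % (i, feed.title))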

    1. Install (preferably into a virtualenv; for Python < 2.7 you will also need the argparse module):

    pip install git+git://github.com/max-arnold/fetch-google-reader.git
    curl -s -k https://raw.github.com/max-arnold/fetch-google-reader/master/requirements.txt | xargs -n 1 pip install
    


    2. Create a directory where articles will be saved:

    mkdir rss-backup
    cd rss-backup
    


    3. Get the list of RSS subscriptions:

    fetch-greader.py -u YOUR-USERNAME@gmail.com -p YOUR-PASSWORD
    * Please specify feed number (-f, --feed) to fetch: *
    [0] Atomized
    [1] Both Sides of the Table
    [2] Hacker News
    [3] Signal vs. Noise
    [4] хабрахабр: главная / захабренные
    


    Select the desired one and start the download:

    fetch-greader.py -u YOUR-USERNAME@gmail.com -p YOUR-PASSWORD -f 0
    * Output directory: atomized *
    ---> atomized/2011-05-24-i-hate-google-everything/index.html
    ---> atomized/2011-01-19-toggle-between-root-non-root-in-emacs-with-tramp/index.html
    ---> atomized/2010-10-19-ipad/index.html
    ---> atomized/2010-09-01-im-not-going-back/index.html
    ---> atomized/2010-08-31-they-cant-go-back/index.html
    ---> atomized/2010-08-28-a-hudson-github-build-process-that-works/index.html
    ---> atomized/2010-08-18-frame-tiling-and-centering-in-emacs/index.html
    ---> atomized/2010-08-17-scratch-buffers-for-emacs/index.html
    ---> atomized/2010-07-01-reading-apress-pdf-ebooks-on-an-ipad/index.html
    


    By default all articles from the selected feed are downloaded, but you can restrict the download to starred items by adding the --starred switch. You can also choose where the files are saved with the --dir option.

    The feed is saved into a directory whose name is derived from the feed title (transliterated into Latin characters). Each article goes into its own subdirectory, named from the article's date and title, so that additional metadata or images can be stored next to it. At the moment images are not saved, because the utility is aimed at blogs that are no longer online, but nothing prevents you from adding that.
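
    As an illustration of the naming scheme (not necessarily the tool's exact code), here is a hypothetical helper that turns an item's date and title into a directory name like the ones shown above. The item keys 'published' and 'title' follow the Google Reader stream JSON; this sketch simply drops non-ASCII characters, whereas the real tool transliterates them.

    # Hypothetical sketch of the directory naming scheme; not the actual
    # fetch-google-reader code, just an illustration of the idea.
    import re
    import time
    import unicodedata

    def slugify(title):
        # Reduce the title to lowercase ASCII words joined by dashes.
        # (This drops non-Latin characters; the real tool transliterates
        # them instead.)
        ascii_title = unicodedata.normalize('NFKD', title).encode('ascii', 'ignore').decode('ascii')
        return re.sub(r'[^a-zA-Z0-9]+', '-', ascii_title).strip('-').lower()

    def article_dirname(item):
        # Google Reader stream items carry a "published" Unix timestamp
        # and a "title"
        date = time.strftime('%Y-%m-%d', time.gmtime(item['published']))
        return '%s-%s' % (date, slugify(item['title']))

    print(article_dirname({'published': 1306195200,
                           'title': u'I hate Google everything'}))
    # -> 2011-05-24-i-hate-google-everything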

    Question to the audience


    Which of the existing alternatives does not limit the depth of RSS feed history (i.e. keeps everything it has ever fetched) and caches full-text content forever? And, just in case, has an API or an export feature, so that all of this can be pulled down to your own computer when things go sideways again?

    PS Picture © mashable.com
