Python + Google Reader Podcast Script

    Introduction


    There was a useful post, "Automating podcast downloads to mp3 player". It was useful to me because I don’t use iTunes or similar software (and I don’t want to argue about it :). I just need to download the batch of podcasts that periodically piles up in my reader feed. And I prefer Python to PHP.

    I would like to hear some advice, since I am only learning Python. I also like to write posts with examples for beginners. Comments and criticism are welcome. But to the point.

    Organization of the process

    I keep a list of podcast feeds in Google Reader. The feeds are marked with their own tag and sit neatly in their own folder.


    To download new podcasts that land in the Podcasts folder, I wrote a small Python script. I took the pyrfeed library as a basis; it implements the useful GoogleReader class.

    The source code of the library is available and includes a small example of working with it. There is documentation, although I only found documentation on the Google Reader API, not on the library itself. There is also an example utility with a GUI for reading RSS feeds.

    Source

    A link to the archive with the source is at the end.
    And here is the source of the main script:
    import sys
    import os
    import time
    import urlparse
    import urllib
    import progressBar
    import GoogleReader

    downloadDir = "myDownloadDir"
    logFile = os.path.join(downloadDir, "PodcastsDownloadTool.log")
    tag = "Podcasts"
    login = "myGoogleReaderLogin"
    password = "myGoogleReaderPassword"

    def GetLocalFileNameFromURL(fullpath):
        # Keep only the file name from the path part of the URL
        (filepath, filename) = os.path.split(urlparse.urlparse(fullpath).path)
        return os.path.join(downloadDir, filename)

    def LogMessage(message):
        f = open(logFile, "a")
        print >> f, message
        f.close()

    def DownloadFile(url, filename):
        progressBar.ResetProgressBar()
        urllib.urlretrieve(url, filename, reporthook=progressBar.ProgressBarReportHook)

    def ProcessPodcastDownloading():
        # Check and create dir
        if not os.path.exists(downloadDir):
            os.mkdir(downloadDir)
        # Login to Google Reader
        gr = GoogleReader.GoogleReader()
        gr.identify(login, password)
        if gr.login():
            print "Login OK"
        else:
            print "Login KO"
            return
        # Request up to 17 entries with the tag, excluding entries already read
        xmlfeed = gr.get_feed(feed="user/-/label/%s" % tag, n=17, xt="user/-/state/com.google/read")
        for entry in xmlfeed.get_entries():
            try:
                googleID = entry['google_id']
                if entry.has_key('enclosure'):
                    # Prepare vars and print info
                    URLToDownload = entry['enclosure']
                    localFilePath = GetLocalFileNameFromURL(URLToDownload)
                    print "Title: %s" % entry['title']
                    print "Download from URL: %s ..." % URLToDownload
                    print "Local file: %s" % localFilePath
                    # Download file
                    DownloadFile(url=URLToDownload, filename=localFilePath)
                    # Log message
                    LogMessage("%s %s %s %s\n" % (time.strftime('%x %X'), URLToDownload, googleID, entry['published']))
                    print "Downloaded."
                    # Mark as read
                    gr.set_read(googleID)
                    print "Marked."
            except Exception:
                # Print and log error, then continue with the next entry
                print "Error:", sys.exc_info()
                LogMessage("%s\nError: %s\nEntry: %s\nException info: %s\n%s\n" % ("=" * 80, time.strftime('%x %X'), entry, sys.exc_info(), "=" * 80))

    if __name__ == '__main__':
        ProcessPodcastDownloading()
    


    Code Explanations

    The main parameters are set at the beginning of the script:
    • downloadDir - directory where podcasts will be downloaded
    • logFile - log file
    • tag - the name of the tag / folder in Google Reader whose feeds will be checked
    • login and password - the Google Reader login and password

    After that, nothing complicated:
    • authenticating with Google Reader
    • fetching the parsed RSS feed
    • looping over the entries, printing information and logging
    • the actual file download
    • marking the entry as read
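The download step relies on turning an enclosure URL into a local file name. Below is a minimal sketch of that step for Python 3, where the urlparse module has moved to urllib.parse. The function name local_file_name_from_url and the example URL are assumptions made for illustration; this is not the code from the archive.

```python
import os
from urllib.parse import urlparse, unquote


def local_file_name_from_url(url, download_dir):
    """Return the path inside download_dir for the file that url points to."""
    # Use only the path component, so query strings and fragments are dropped.
    path = urlparse(url).path
    # The last path segment is the file name; unquote turns %20 into a space.
    filename = unquote(os.path.basename(path))
    if not filename:
        raise ValueError("URL has no file name: %r" % url)
    # os.path.join inserts the separator, so download_dir needs no trailing slash.
    return os.path.join(download_dir, filename)


if __name__ == "__main__":
    print(local_file_name_from_url("http://example.com/shows/ep%2001.mp3?x=1", "myDownloadDir"))
```

Raising an error for a URL without a file name is one possible choice; the original script would simply have tried to write to the download directory itself.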

    The pyrfeed library itself is not included in the attached archive. It is enough to download it, change a couple of lines (more on that below), and put it somewhere Python can import it from. For example, the Lib directory of the Python installation; the library then becomes available to all scripts.
    In my setup, the GoogleReader and web directories sit in the same directory as my script.
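If you would rather not copy the library into the Python installation, another option is to extend the import path at the top of the script. This is a small Python 3 sketch; the helper name add_library_dir and the directory name in the comment are hypothetical.

```python
import os
import sys


def add_library_dir(lib_dir):
    """Put lib_dir at the front of sys.path unless it is already there."""
    lib_dir = os.path.abspath(lib_dir)
    if lib_dir not in sys.path:
        # Inserting at position 0 makes this directory win over other copies.
        sys.path.insert(0, lib_dir)
    return lib_dir


# Hypothetical usage, assuming the library was unpacked into ./pyrfeed:
#   add_library_dir("pyrfeed")
#   import GoogleReader
```

Setting the PYTHONPATH environment variable before starting the script has the same effect without touching the code.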

    Interface

    This is a console utility. Draw your own conclusions.
    A simple display of the download progress looks like this:


    The progress bar is taken from some example. I don’t remember exactly where. There are many examples on the Internet, and most of them are alike. The source is in the attached archive.
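For readers without the archive, here is a rough sketch of what a console progress hook can look like. It is not the progressBar module from the archive; the function names are made up for illustration. The only thing taken as given is the reporthook convention of urlretrieve, which calls the hook with the block number, the block size, and the total size (zero or negative when the server does not report it).

```python
import sys


def format_progress(block_num, block_size, total_size, width=40):
    """Build one line of progress text from the reporthook arguments."""
    downloaded = block_num * block_size
    if total_size <= 0:
        # Size unknown: only the number of bytes read so far can be shown.
        return "%d bytes" % downloaded
    # The last block may overshoot the total, so cap the fraction at 1.0.
    fraction = min(1.0, downloaded / float(total_size))
    filled = int(fraction * width)
    return "[%s%s] %3d%%" % ("#" * filled, "." * (width - filled), int(fraction * 100))


def progress_report_hook(block_num, block_size, total_size):
    """Reporthook: redraw the bar in place on the current console line."""
    sys.stdout.write("\r" + format_progress(block_num, block_size, total_size))
    sys.stdout.flush()
```

In Python 3 the hook would be passed as urllib.request.urlretrieve(url, filename, reporthook=progress_report_hook).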

    Patch for feed.py

    Unfortunately, the GoogleFeed class does not extract the download link from the resulting XML.
    I solved this problem by adding one more parsing step after this snippet:
    elif dom_entry_element.localName == 'link':
        if dom_entry_element.getAttribute('rel') == 'alternate':
            entry['link'] = dom_entry_element.getAttribute('href')


    I added this piece:
    if dom_entry_element.getAttribute('rel') == 'enclosure':
        entry['enclosure'] = dom_entry_element.getAttribute('href')


    The result looks like this:
    elif dom_entry_element.localName == 'link':
        if dom_entry_element.getAttribute('rel') == 'alternate':
            entry['link'] = dom_entry_element.getAttribute('href')
        if dom_entry_element.getAttribute('rel') == 'enclosure':
            entry['enclosure'] = dom_entry_element.getAttribute('href')
    elif dom_entry_element.localName == 'category':
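To see the effect of the patch in isolation, here is a self-contained Python 3 sketch that applies the same rel check to a hand-written Atom entry using xml.dom.minidom. The sample XML, the URLs, and the function name parse_entry_links are invented for illustration; the real feed.py handles many more elements.

```python
from xml.dom import minidom

# A made-up Atom entry with both kinds of link element.
ATOM_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Episode 1</title>
  <link rel="alternate" href="http://example.com/episode-1.html"/>
  <link rel="enclosure" href="http://example.com/episode-1.mp3" type="audio/mpeg"/>
</entry>"""


def parse_entry_links(entry_xml):
    """Collect the 'alternate' and 'enclosure' links of a single entry."""
    entry = {}
    dom = minidom.parseString(entry_xml)
    for node in dom.documentElement.childNodes:
        # Skip whitespace text nodes between the elements.
        if node.nodeType != node.ELEMENT_NODE:
            continue
        if node.localName == 'link':
            rel = node.getAttribute('rel')
            if rel == 'alternate':
                entry['link'] = node.getAttribute('href')
            elif rel == 'enclosure':
                entry['enclosure'] = node.getAttribute('href')
    return entry


if __name__ == "__main__":
    print(parse_entry_links(ATOM_ENTRY))
```

Without the enclosure branch, the resulting dictionary would contain only the page link, and the main script would skip the entry.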


    Shortcomings I can live with

    • Resuming downloads is not supported. If a download fails, the entry is not marked as read, so the next run of the script downloads the file again from the start.
    • Hence the next point: on the next run, faulty entries (for example, an entry with no link to a downloadable file, which does happen) will be processed again. You could mark them with a special tag and skip them in the future, but I am happy to review the feed for unread entries by hand afterwards.
    • Configuration storage: the login and password are kept in clear text. The login is not so bad, but the password... You could use the getpass.getpass() function or store it somewhere else.

    You can automate launching the script when a USB flash drive or player is connected, for example with the USB Detect & Launch utility (it has already been covered on Habr).
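As for the clear-text password, one possible approach is to read the credentials from environment variables and fall back to an interactive prompt. The sketch below is written for Python 3; the variable names GREADER_LOGIN and GREADER_PASSWORD and the function read_credentials are assumptions, not part of the original script.

```python
import getpass
import os


def read_credentials(env=None, ask_login=input, ask_password=getpass.getpass):
    """Return (login, password) from the environment, prompting for what is missing."""
    env = os.environ if env is None else env
    # Hypothetical variable names; pick whatever fits your setup.
    login = env.get("GREADER_LOGIN") or ask_login("Google Reader login: ")
    # getpass.getpass does not echo the typed password to the console.
    password = env.get("GREADER_PASSWORD") or ask_password("Google Reader password: ")
    return login, password
```

The prompt only works when someone is at the keyboard, so for a script launched automatically the environment variables (or some other storage outside the script file) would have to be set in advance.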

    Finish

    The sources are in the archive.
    I copied this post from my own page.
