Bing + Python, image search

    Sometimes you need to collect pictures on a certain topic so that you can choose the right one from the set. Search engines offer this, but you have to open a browser, click through the pages, work the mouse and so on. I wanted a “fire and forget” console utility that fetches a set of the pictures I need. This article looks at the Bing API, getting started with Python, and how to combine the two for image search.


    Introduction


    This is my first more-or-less large program in Python, a language I started learning only recently (many thanks, by the way, to kossmak for his translated articles). Usage examples are at the end of the article.

    Requirements


    A console program that takes a search string and the required number of images as input. The output is a subdirectory of the current directory containing the search results.

    Why Bing


    Some time ago I needed to test an asynchronous loader written in ActionScript. Google was chosen as the data source, but it turned out that Google returns no more than 64 results for API requests. That was enough at the time, but it left a bad aftertaste. After some searching I found Yahoo (with comments that much of the data it provides is out of date) and Bing (which promises up to 1000 results on its page). I chose Bing because, in addition to the query itself, it lets you apply filters to it (see below).

    Bing


    Development with Bing starts at the Bing Developer Center page. There you need to get an APP_ID, which signs every request; registration takes a minute. I didn't quite figure out what restrictions apply (maybe there are none), so I publish my test APP_ID along with the examples (if you intend to use the program, I recommend registering your own APP_ID and putting it into the code).
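    Instead of editing the APP_ID into the source each time, one option is to read it from an environment variable and fall back to the published test ID. A minimal sketch in Python 3; the variable name BING_APP_ID and the helper get_app_id are hypothetical, not part of the Bing API.

```python
import os

# Test APP_ID published in this article; register your own for real use.
DEFAULT_APP_ID = "4EFC2F2CA1F9547B3C048B40C33A6A4FEF1FAF3B"


def get_app_id(env=None):
    """Return the APP_ID from the (hypothetical) BING_APP_ID variable,
    falling back to the published test ID when it is unset or empty."""
    if env is None:
        env = os.environ
    return env.get("BING_APP_ID") or DEFAULT_APP_ID
```

    Passing the environment as a parameter keeps the function easy to test without touching the real process environment.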

    Bing API


    Client libraries exist for VB / C# / C++ / F# / JS, but this example sends plain HTTP requests. The image search API is described here.
    The minimal request that searches for pictures and returns a JSON response looks like this:
    http://api.search.live.net/json.aspx?appid=APP_ID&sources=image&query=SEARCH_QUERY
    Example query (search for apple):
    http://api.search.live.net/json.aspx?appid=4EFC2F2CA1F9547B3C048B40C33A6A4FEF1FAF3B&sources=image&query=apple
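    The request URL above can be assembled from a parameter dictionary, which also takes care of escaping the query. A small Python 3 sketch using urllib.parse.urlencode; the function name build_image_query_url is illustrative, and the parameters are the ones the program in this article uses.

```python
from urllib.parse import urlencode

SERVICE_URL = "http://api.search.live.net/json.aspx"


def build_image_query_url(app_id, query, count=8, offset=0):
    """Assemble the image-search URL from the parameters used in this article."""
    params = {
        "appid": app_id,
        "sources": "image",
        "query": query,
        "image.count": count,
        "image.offset": offset,
    }
    # urlencode escapes special characters (spaces become '+')
    return SERVICE_URL + "?" + urlencode(params)


print(build_image_query_url("APP_ID", "warhammer wallpaper", offset=16))
# http://api.search.live.net/json.aspx?appid=APP_ID&sources=image&query=warhammer+wallpaper&image.count=8&image.offset=16
```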

    Python


    Everything here is simple and cross-platform. Python itself (version 2.6.x) can be installed from here. As a development environment I really liked PyDev: install Eclipse (if you don't have it already), then install PyDev from within Eclipse.

    Algorithm


    I won't comment block by block: the code has plenty of comments, and it is small enough to present in one piece. In short:
    • The main loop sends a request to the Bing API and increases the image.offset parameter until either the required number of images has been collected or the Bing API reports that there are no more results.
    • Each request asks for 8 pictures (I settled on this size: 4 is too few, with 16 the response sometimes takes a long time, and the maximum is 50).
    • For each picture found, its URL is extracted and a thread is started that downloads the picture into memory and saves it to disk. I ran into a problem here: pictures often have the same file name. So the save function locks out the other threads and prepends “_” to the file name until it finds a name that does not exist yet. Then it saves the file and releases the lock.
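    The paging logic of the main loop described above can be tried offline if fetching a page is abstracted away. A Python 3 sketch, assuming a hypothetical fetch_page(offset, count) callable that returns the parsed JSON in the SearchResponse → Image → Total/Results → MediaUrl shape the program reads; the fake backend is made-up test data. Unlike the original loop, it also stops if a page comes back empty, which guards against looping forever.

```python
def collect_image_urls(fetch_page, wanted, page_size=8):
    """Collect up to `wanted` image URLs by advancing the offset page by page."""
    urls = []
    offset = 0
    while offset < wanted:
        section = fetch_page(offset, page_size)["SearchResponse"]["Image"]
        # Stop when no total is reported or the offset is past the end
        if "Total" not in section or offset >= section["Total"]:
            break
        results = section.get("Results", [])
        if not results:  # extra safeguard: an empty page ends the search
            break
        for result in results:
            urls.append(result["MediaUrl"])
            offset += 1
            if offset >= wanted:
                break
    return urls


# Fake backend with 20 results, for trying the loop without network access
ALL = ["http://example.com/%d.jpg" % i for i in range(20)]


def fake_fetch(offset, count):
    page = ALL[offset:offset + count]
    return {"SearchResponse": {"Image": {
        "Total": len(ALL),
        "Results": [{"MediaUrl": u} for u in page]}}}


print(len(collect_image_urls(fake_fetch, 12)))  # 12
print(len(collect_image_urls(fake_fetch, 50)))  # 20
```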


    The code


    # import used libraries
    import urllib, json, sys, os, threading

    def load_url(url, filename, filesystem_lock):
        try:
            # open connection to URL
            socket = urllib.urlopen(url)
            # read data
            data = socket.read()
            # close connection
            socket.close()
        # on all exceptions
        except:
            print "error loading", url
        # if no exceptions
        else:
            # save loaded data
            save_to_file(data, filename, filesystem_lock)
            
    def save_to_file(data, filename, filesystem_lock):
        # wait for the file system and block it; the with-statement
        # releases the lock even if saving fails
        with filesystem_lock:
            try:
                # while a file with this name already exists
                while os.path.isfile(filename):
                    # prepend '_' to the file name
                    filename = os.path.join(os.path.dirname(filename),
                                            "_" + os.path.basename(filename))
                # open for binary writing and save the data
                # (the with-statement closes the file)
                with open(filename, 'wb') as f:
                    f.write(data)
                print filename
            except:
                print "error saving", filename
        
    def main():
        # Bing search URL
        SERVICE_URL = "http://api.search.live.net/json.aspx"
        # request parameters dictionary (will append to SERVICE_URL) 
        params = {}
        params["appid"]         = "4EFC2F2CA1F9547B3C048B40C33A6A4FEF1FAF3B"
        params["sources"]       = "image"
        params["image.count"]   = 8
        params["image.offset"]  = 0

        # try to read command line parameters
        try:
            params["query"] = sys.argv[1]
            images_count = int(sys.argv[2])
            if len(sys.argv) > 3:
                params["image.filters"] = sys.argv[3] 
        # if have less than 2 parameters (IndexError) or
        # if second parameter cannot be cast to int (ValueError)
        except (IndexError, ValueError):
            # print usage string
            print "Bing image search tool"
            print "Usage: bing.py search_str images_count [filters]"
            # end exit
            return 1

        # make directory at current path
        dir_name = "./" + params["query"] + "/"
        if not os.path.isdir(dir_name):
            os.mkdir(dir_name)
            
        # list to store loading threads
        loaders = []
        # file system lock object
        filesystem_lock = threading.Lock()
        
        try:
        
            # loop for images count
            while(params["image.offset"] < images_count):
                
                # combine URL string, open it and parse with JSON
                response = json.load(urllib.urlopen(SERVICE_URL + "?%s" % urllib.urlencode(params)))
                # extract image section
                images_section = response["SearchResponse"]["Image"]
        
                # if current search offset greater or equal to returned total files  
                if "Total" not in images_section or params["image.offset"] >= images_section["Total"]:
                    # then break search loop
                    break
                
                # extract image results section 
                results = images_section["Results"]
                # loop for results
                for result in results:
                    # extract image URL
                    image_url = result["MediaUrl"]
                    # create new loading thread  
                    loader = threading.Thread(
                        target=load_url,
                        args=(image_url,
                              dir_name + os.path.basename(str(image_url)),
                              filesystem_lock))
                    # start loading thread
                    loader.start()
                    # and add it to loaders list
                    loaders.append(loader)
                    # advance search offset
                    params["image.offset"] += 1
                    # break if no more images needed
                    if params["image.offset"] >= images_count:
                        break
        
        # on all exceptions
        except:
            print "error occurred"
            return 1
        
        # wait for all loading threads to complete 
        for loader in loaders:
            loader.join()

        # all done
        print "done"
        return 0

    if __name__ == '__main__':
        status = main()
        sys.exit(status)
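    The name-collision rule in save_to_file can be pulled out into a separate function so that it is testable on its own. A Python 3 sketch; unique_path and save_unique are hypothetical helpers following the same rule (prepend “_” until the name is free). The check and the write still happen under one lock, because otherwise another thread could create the same file between the two steps.

```python
import os
import threading

_fs_lock = threading.Lock()


def unique_path(path):
    """Prepend '_' to the file name until no file with that name exists."""
    directory, name = os.path.split(path)
    while os.path.isfile(os.path.join(directory, name)):
        name = "_" + name
    return os.path.join(directory, name)


def save_unique(data, path, lock=_fs_lock):
    """Choose a free name and write the bytes while holding the lock."""
    with lock:
        target = unique_path(path)
        with open(target, "wb") as f:
            f.write(data)
    return target
```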

    Request Examples


    To refine a query, you can pass Bing API filters, separated by spaces, as the third argument.
    • bing.py apple 1000 - find 1000 pictures for “apple”.
    • bing.py "obama" 16 "size:large style:graphics face:face" - find 16 large, illustration-style portraits of Obama.
    • bing.py "warhammer wallpaper" 16 "size:width:1280 size:height:1024" - find 16 wallpapers on the “warhammer” theme, sized 1280x1024.
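    The argument handling behind these examples can be sketched as a small pure function, which is easier to check than code that reads sys.argv directly. A Python 3 sketch; parse_args is a hypothetical name, and it mirrors the program's usage string (search_str images_count [filters]). The last line shows how a filter string is escaped when it goes into the image.filters request parameter.

```python
from urllib.parse import urlencode


def parse_args(argv):
    """Mirror the usage: bing.py search_str images_count [filters].

    Returns (query, count, filters), or None if the arguments are invalid.
    """
    try:
        query = argv[1]
        count = int(argv[2])
    except (IndexError, ValueError):
        return None
    # Optional third argument: space-separated Bing filters
    filters = argv[3] if len(argv) > 3 else None
    return query, count, filters


print(urlencode({"image.filters": "size:large style:graphics face:face"}))
# image.filters=size%3Alarge+style%3Agraphics+face%3Aface
```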

    Building a single exe for Win32


    This requires py2exe, which you can install from here. Next, create a setup.py file in the program folder with the following contents (the program itself is in bing.py):
    from distutils.core import setup
    import py2exe, sys, os

    sys.argv.append('py2exe')

    setup(
        console=['bing.py'],
        options = {'py2exe': {'bundle_files': 1}},
        zipfile = None,
    )

    Then run it with the command "python setup.py". The “compiled” program appears in the ./dist folder (the w9xpopen.exe file can be deleted).
    It can then be compressed with UPX (this took it from 5182 KB down to 4061 KB).

    What I would like to improve


    • Support for Russian-language queries
    • An overall progress indicator for all files
    • Download progress for each file
    • A timeout when loading an image (the default seems to be about a minute)
    • Proper error handling
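    Two items from this list, an explicit timeout and per-file download progress, can be sketched together. A Python 3 sketch using urllib.request.urlopen with its timeout argument and reading the response in chunks; download and its progress(received, total) callback are hypothetical names, and total is None when the server sends no Content-Length header.

```python
import urllib.request


def download(url, timeout=10, chunk_size=8192, progress=None):
    """Download `url` into memory with an explicit timeout (in seconds),
    calling progress(received, total) after each chunk."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
        total = int(length) if length else None
        chunks = []
        received = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
            received += len(chunk)
            if progress:
                progress(received, total)
    return b"".join(chunks)
```

    In the program above, load_url could call such a function in place of urllib.urlopen and read, with a progress callback that prints a percentage when total is known.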

    PS


    A strange Habr glitch:
    0
    It does not display anything.
    Also, links of the form
    http://api.google.com
    are displayed without http://

    PPS


    The compiled exe for Win32 is here.
