kricha January 29, 2018 at 19:33

Time for laziness: Instagram

Foreword

Nowadays many people use Instagram (hereinafter simply "insta"): some collect photo albums there, some sell, some buy, and I am lazy there. I have always been curious how my friends, classmates, and colleagues were doing on insta. Whenever I wanted to find out what was new, I opened it, scrolled through the feed, saw everything that interested me, and left... BUT! For some reason I always felt I had to like every single post (I can't explain why, that's just how it is). Now imagine you haven't opened it for a week: you sit there liking a week's worth of posts, and with 200+ subscriptions that is pure hell.

Taking action

As a result, like any normal person, I got too lazy to like everything and gave up. Everything seemed fine: I stopped wasting time on useless likes, but my conscience kept nagging me. I realized that my followers were suffering without my royal like, they were sad, and blah blah blah... In short, I decided to write something small and simple that would solve this problem and maybe help someone else too. I had heard a lot from friends about Python and how nicely Selenium can be used to test applications or as a crawler. So I decided to use Python and Selenium together with PhantomJS. All of this was new to me, since I had never worked with these technologies before.

Why Selenium and PhantomJS?

Everything is simple here. Instagram's client side is written in React, so data can only be pulled from a page after it has been rendered. Selenium automates actions in the browser, and PhantomJS lets you do all of that without a display, so I chose them. Looking ahead, I later dropped PhantomJS because it is rather slow, and Chrome has a headless option that lets it run as a "headless" browser.
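To give an idea of what the headless setup looks like, here is a minimal sketch. The helper names `headless_chrome_arguments` and `make_headless_browser` and the window size are assumptions made for illustration, not part of the library; the sketch also assumes that Selenium and a matching chromedriver are installed.

```python
def headless_chrome_arguments(width=1280, height=1024):
    """Command-line switches for running Chrome without a visible window."""
    return [
        '--headless',
        '--window-size={},{}'.format(width, height),
    ]


def make_headless_browser():
    """Start Chrome in headless mode (needs selenium and chromedriver)."""
    # Imported lazily so the helper above works even without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

Calling `make_headless_browser()` should return a regular WebDriver object, so the rest of the code does not need to know whether a window is shown.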

Why python?

I had heard and read a lot that this language is great for working with big data, from which I concluded that it is convenient for working with data in general (parsing, sorting, comparing, formatting, and so on). I had also read somewhere that it is quick and easy to write your own mini-libraries for it (which is exactly what the bot needs in order to be as universal as possible). After weighing everything, I settled on Python 3 (before that, part of the project had already been written to run on both Python 2 and Python 3).

Developing the library for the bot

It would be pointless to describe the whole process, so let's dwell on the most interesting points:

Login

Since the bot repeats a large number of identical actions that require being logged in, I had to come up with something for this step. Logging in through the form every time looks very suspicious, so I decided to try saving the cookies and using them for authorization.

It turned out that Instagram makes this easy (whereas Mail.Ru gave me a wild headache):

import os
import pickle
import tempfile
import time

import selenium.common.exceptions as excp
from selenium.webdriver.common.by import By


def auth_with_cookies(browser, logger, login, cookie_path=tempfile.gettempdir()):
    """
    Authenticate to instagram.com with cookies
    :param browser: WebDriver
    :param logger: logger used for messages and screenshots
    :param login: account login, also used as the cookie file name
    :param cookie_path: directory where the cookie file is stored
    :return: True if the user is authenticated, False otherwise
    """
    logger.save_screen_shot(browser, 'login.png')
    try:
        logger.log('Trying to auth with cookies.')
        with open(os.path.join(cookie_path, login + '.pkl'), 'rb') as cookie_file:
            cookies = pickle.load(cookie_file)
        for cookie in cookies:
            browser.add_cookie(cookie)
        browser.refresh()
        if check_if_user_authenticated(browser):
            logger.log("Successful authorization with cookies.")
            return True
    except Exception:
        # No cookie file, a broken file, or a cookie the browser rejected:
        # fall through and report failure
        pass
    logger.log("Unsuccessful authorization with cookies.")
    return False


def auth_with_credentials(browser, logger, login, password, cookie_path=tempfile.gettempdir()):
    logger.log('Trying to auth with credentials.')
    login_field = browser.find_element(By.NAME, "username")
    login_field.clear()
    logger.log("--->AuthWithCreds: filling username.")
    login_field.send_keys(login)
    password_field = browser.find_element(By.NAME, "password")
    password_field.clear()
    logger.log("--->AuthWithCreds: filling password.")
    password_field.send_keys(password)
    submit = browser.find_element(By.CSS_SELECTOR, "form button")
    logger.log("--->AuthWithCreds: submitting login form.")
    submit.submit()
    time.sleep(3)
    if check_if_user_authenticated(browser):
        # Save the session cookie only after a successful login
        logger.log("--->AuthWithCreds: saving cookies.")
        with open(os.path.join(cookie_path, login + '.pkl'), 'wb') as cookie_file:
            pickle.dump([browser.get_cookie('sessionid')], cookie_file)
        logger.log("Successful authorization with credentials.")
        return True
    logger.log("Unsuccessful authorization with credentials.")
    return False


def check_if_user_authenticated(browser):
    # The profile icon in the navigation bar is shown only to logged-in users
    try:
        browser.find_element(By.CSS_SELECTOR, ".coreSpriteDesktopNavProfile")
        return True
    except excp.NoSuchElementException:
        return False

If cookie authentication fails, we log in with the login and password, save the cookie, and use it from then on: the standard scheme.

#TODO: I still haven't gotten around to checking the cookie's age
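One possible way to approach that TODO is sketched below. It is not what the library does: the function name `is_cookie_fresh` and the `min_seconds_left` threshold are hypothetical. It relies on the fact that a cookie dict returned by Selenium may carry an `expiry` key with a Unix timestamp, and it assumes that a cookie without that key should be treated as still usable.

```python
import time


def is_cookie_fresh(cookie, min_seconds_left=3600, now=None):
    """Return True if the cookie stays valid for at least `min_seconds_left` seconds.

    `cookie` is a dict in the shape Selenium returns from get_cookie();
    its optional 'expiry' key holds the expiration time as a Unix timestamp.
    """
    if not cookie:
        return False
    expiry = cookie.get('expiry')
    if expiry is None:
        # Assumption: a cookie without an explicit expiration is still usable.
        return True
    now = time.time() if now is None else now
    return expiry - now >= min_seconds_left
```

A check like this could run before the cookies are added to the browser, so that a cookie about to expire sends the bot straight to the login form.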

News feed

Since I wrote this primarily for myself, I wanted my news feed to always be fully liked. Initially everything was simple: the bot scrolled from the top down to the last processed post, putting the web elements of the posts into an array; then it reversed direction and, on the way back up, liked everything by walking through the web elements stored in that array. I was happy that everything worked exactly the way I wanted, but about two months later "the moon was in Capricorn" and my bot simply stopped working. I checked everything I could on different web drivers: visually nothing had changed, yet nothing worked. All in all, I spent about three days hunting for the problem.

The cause turned out to be very simple. Previously, when the bot went back through the scrolled posts, it took their objects from the array, scrolled to each post (imitating a person's actions), found the "like" button, pressed it, and moved on. Now Instagram keeps only about 9 posts in the HTML markup: the fifth one is the post currently active for the user, plus the 4 before it and the 4 after it, while all the rest are simply removed from the HTML. I had to solve the problem by collecting the links of the posts that need a like into an array; then, while scrolling up (plain and simple), the bot looks for the current post in the previously collected array and likes it if it is there.

That addiction ..

# By is imported at module level: from selenium.webdriver.common.by import By
for post in progress:
    # only the posts that are currently present in the markup
    real_time_posts = br.find_elements(By.TAG_NAME, 'article')
    post_link = post.get('pl')
    filtered_posts = [p for p in real_time_posts if self._get_feed_post_link(p) == post_link]
    if filtered_posts:
        real_post = filtered_posts.pop()
        # scroll to the real post in the markup
        heart = real_post.find_element(By.CSS_SELECTOR, 'div:nth-child(3) section a:first-child')
        self.browser.execute_script("return arguments[0].scrollIntoView(false);", heart)
        # get the elements that need to be processed
        author = real_post.find_element(By.CSS_SELECTOR, 'div:first-child .notranslate').text
        heart_classes = heart.find_element(By.CSS_SELECTOR, 'span').get_attribute('class')
        # check restrictions
        is_not_liked = 'coreSpriteHeartOpen' in heart_classes
        is_mine = author == login
        need_to_exclude = author in exclude
        if is_mine or not is_not_liked:
            self.post_skipped += 1
        elif need_to_exclude:
            self.post_skipped_excluded += 1
        else:
            # like this post
            time.sleep(.3)
            heart.click()
            time.sleep(.7)
            self.db.likes_increment()
            self.post_liked += 1
            log = '---> liked @{} post {}'.format(author, post_link)
            self.logger.log_to_file(log)
VICTORY!

Action limits

To avoid attracting too much attention, the bot needs some restrictions. To stick to them, the counters of performed actions have to be stored somewhere. I chose SQLite as the storage for all internal data: it is fast, convenient, and local. Right in the library I wrote a small module for working with the database and added migrations to it for future releases. Every like/follow is saved in the database together with the hour in which it was made; then the likes/follows for the current day and the current hour are counted, and based on these numbers the bot decides whether it may like or follow anyone else. The limits are still hard-coded in the library; they will need to be made configurable.
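The counting scheme described above can be sketched roughly as follows. This is not the library's actual module: the `ActionCounter` class, the table layout, and the numbers in `LIMITS` are assumptions made for illustration, and hours are taken in UTC to keep the example deterministic.

```python
import sqlite3
import time

# Hypothetical limits; the real values in the library may differ.
LIMITS = {'like': {'hour': 150, 'day': 1000}}


class ActionCounter:
    """Stores per-hour action counters in SQLite and checks them against limits."""

    def __init__(self, path=':memory:'):
        self.db = sqlite3.connect(path)
        self.db.execute(
            'CREATE TABLE IF NOT EXISTS actions ('
            'action TEXT, day TEXT, hour INTEGER, amount INTEGER, '
            'PRIMARY KEY (action, day, hour))'
        )

    @staticmethod
    def _slot(now):
        # gmtime(None) means "current time"
        moment = time.gmtime(now)
        return time.strftime('%Y-%m-%d', moment), moment.tm_hour

    def increment(self, action, now=None):
        day, hour = self._slot(now)
        with self.db:  # commits on success
            self.db.execute(
                'INSERT OR IGNORE INTO actions VALUES (?, ?, ?, 0)',
                (action, day, hour))
            self.db.execute(
                'UPDATE actions SET amount = amount + 1 '
                'WHERE action = ? AND day = ? AND hour = ?',
                (action, day, hour))

    def count(self, action, now=None):
        """Return (actions in the current hour, actions in the current day)."""
        day, hour = self._slot(now)
        per_hour = self.db.execute(
            'SELECT COALESCE(SUM(amount), 0) FROM actions '
            'WHERE action = ? AND day = ? AND hour = ?',
            (action, day, hour)).fetchone()[0]
        per_day = self.db.execute(
            'SELECT COALESCE(SUM(amount), 0) FROM actions '
            'WHERE action = ? AND day = ?',
            (action, day)).fetchone()[0]
        return per_hour, per_day

    def allowed(self, action, limits=LIMITS, now=None):
        per_hour, per_day = self.count(action, now)
        return per_hour < limits[action]['hour'] and per_day < limits[action]['day']
```

Before each like, the bot would call `allowed('like')` and, after a successful click, `increment('like')`. Passing `limits` as an argument is one simple way to make the limits configurable.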

A side branch during development

While the library for the bot was being written, I started wondering about numbers. I became curious how many likes, views, and comments a user has, per post or in total. To satisfy that curiosity I wrote a small class in the library that collects all available statistics through Instagram's private API (without authorization) and shows them to the user:


+-- https://instagram.com/al_kricha/ --------------------------+
|   counter                    |             value             |
+------------------------------+-------------------------------+
|   followed                   |              402              |
|   posts                      |              397              |
|   comments                   |             1602              |
|   likes                      |             20429             |
|   following                  |              211              |
|   video views                |             6138              |
|                                                              |
+--------- https://github.com/aLkRicha/insta_browser ----------+
+--------------------------------------------------------------+
|                       top liked posts                        |
+--------------------------------------------------------------+
|       https://instagram.com/p/BVIUvMkj1RV/ - 139 likes       |
|       https://instagram.com/p/BTzJ38-DkUT/ - 132 likes       |
|       https://instagram.com/p/BI8rgr-gXKg/ - 129 likes       |
|       https://instagram.com/p/BW-I6o6DBjm/ - 119 likes       |
|       https://instagram.com/p/BM4_XSoFhck/ - 118 likes       |
|       https://instagram.com/p/BJVm3KIA-Vj/ - 117 likes       |
|       https://instagram.com/p/BIhuQaCgRxI/ - 113 likes       |
|       https://instagram.com/p/BM6XgB2l_r7/ - 112 likes       |
|       https://instagram.com/p/BMHiRNUlHvh/ - 112 likes       |
|       https://instagram.com/p/BLmMEwjlElP/ - 111 likes       |
+--------------------------------------------------------------+
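The aggregation behind a table like the one above can be sketched roughly as follows. The function name `summarize_posts` and the dictionary keys are hypothetical; the sketch does not touch Instagram's API and only shows the summing and sorting over post data that has already been fetched.

```python
def summarize_posts(posts, top=10):
    """Sum up the counters over all posts and pick the most liked ones.

    `posts` is a list of dicts with hypothetical keys:
    'link', 'likes', 'comments' and an optional 'video_views'.
    """
    totals = {
        'posts': len(posts),
        'likes': sum(p['likes'] for p in posts),
        'comments': sum(p['comments'] for p in posts),
        'video views': sum(p.get('video_views', 0) for p in posts),
    }
    # Most liked posts first; sorted() is stable, so ties keep their input order.
    top_liked = sorted(posts, key=lambda p: p['likes'], reverse=True)[:top]
    return totals, [(p['link'], p['likes']) for p in top_liked]


# Made-up sample data, for illustration only.
sample = [
    {'link': 'post-a', 'likes': 10, 'comments': 2},
    {'link': 'post-b', 'likes': 30, 'comments': 1, 'video_views': 100},
    {'link': 'post-c', 'likes': 20, 'comments': 0},
]
totals, top_liked = summarize_posts(sample, top=2)
```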

With this data in hand, my friend (txwkx) decided to visualize it and created instameter.me, a small service where you can see a "summary" of any public Instagram account.


What can the bot do?

Today the bot can't do as much as I would like, but it does perform the key actions:
- Likes the news feed down to the last post that has not been liked yet
- Likes a specified number of posts by tag
- Likes a specified number of posts by location
- Auto-follows people from location/tag posts when this is enabled in the settings, not everyone in a row but only those who could potentially become followers
- Collects user statistics
- Keeps hourly statistics of the actions performed

What would I like to add in the future?
- Writing more or less meaningful comments
- Unfollowing unnecessary accounts
- Liking several posts of a newly followed person
- Rewriting the news feed algorithm
- Comparing multiple accounts

Conclusion

There is still a lot to finish, optimize, and rewrite. A tool can always be put to effective use for something other than its intended purpose. Laziness is definitely the engine of progress. I hope my bot will help someone in their work or hobby. The repository with the PyPI package may help a novice automation engineer, and the repository with examples may be useful to SMM specialists. Thank you all for your attention.

References

insta_browser - my mini library, the heart of the bot
insta_bot - the examples repository, the bot itself (this is the form in which I use it)
instameter - a project for collecting statistics on an Instagram account
