Golang, PHP, Kinopoisk and Telegraph: What Unites Them?

From time to time, so as not to gather dust, I try to build interesting things that could make someone's life easier, and I try to make them more useful than a social network for cats. One recent example is a Telegram bot that finds known Wi-Fi access points near given coordinates and shows their passwords.

This time was no exception: I decided to create a bot that would let me watch my favorite movies and TV shows with maximum comfort and minimum effort, and would even offer the content in several dubbed versions. No sooner said than done. Now that man's iron friend is happily serving users their favorite shows, I would like to talk about what went into building the bot, which problems got in my way, and how I solved them. The first chapter covers Go through the eyes of a PHP developer, the second covers the search for Zen while parsing Kinopoisk, and the third covers an undocumented Telegraph feature.


1. $alexander->useLanguage(GOLANG);


My name is Alexander, I am 21 years old. I am engaged in web development and most often write in PHP.

Clap clap

I can't say that PHP is a dream language. Like any other language, it has strengths and weaknesses. However, I began to notice that I was getting tired of PHP: developing in it was gradually becoming boring, and its childhood ailments were wearing on me, such as similar functions that take similar arguments in a different order, behavior that is not always predictable and, of course, weak typing. So for the next product I decided to use Golang. When I started, this is what I knew about it:

  • Strong typing
  • Not a lot of keywords
  • Goroutines: convenient parallelism out of the box
  • They say that the language is simple and predictable.

Also, I had once leafed through a Golang book out of boredom. At first everything felt very unusual... well, for the first 3-5 hours. Yes, getting into the language really is easy. The lack of magic, the small number of keywords, and the predictable behavior do the trick: if you already know some programming language, immersing yourself in Go will not take long. One important caveat: if all you have done for three years is build one-page sites, I take my words back. The language's predictability and strict typing let you write a very large amount of code without compiling and running the binary to check it. Of course, runtime errors still happen, but after PHP it is a breath of fresh air: when something fails, you can see that the mistake was your own, and that it was not an obvious one.

Code organization in Golang is simple: “Here is a directory for you; by the way, it is also a namespace. Keep everything here.” And... it works. It is so easy to develop and maintain that tears of happiness well up on their own. To be honest, I don't know how large a project can get with this approach. I looked through the repositories of several large libraries and they look sane, but I can't speak to how they are to maintain. Subjectively, a PHP code base of the same size is harder to maintain than a Go one.

In fairness, convenient and obvious handling of arrays (slices) is not Go's strong suit:

//...
s.KeyBoard = [][]string{}
s.KeyBoard = append(s.KeyBoard, []string{})
s.KeyBoard[0] = append(s.KeyBoard[0], s.text.GetAction(locale, "view-prev"))
//...

From Golang's point of view everything looks logical, but from a human point of view it is slightly strange. This topic is described in more detail here.
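To show why the snippet above needs so many `append` calls, here is a minimal sketch of the same pattern wrapped in a helper. The name `addButton` and its signature are assumptions made for illustration and are not taken from the bot's code. The key point is that `append` returns a new slice value, so the result has to be assigned back both for the outer slice and for the row.

```go
package main

import "fmt"

// addButton appends label to the given row of a keyboard, creating
// empty rows as needed. Hypothetical helper, for illustration only.
func addButton(keyboard [][]string, row int, label string) [][]string {
	// Grow the outer slice until the requested row exists.
	for len(keyboard) <= row {
		keyboard = append(keyboard, []string{})
	}
	// append may reallocate, so the row must be reassigned as well.
	keyboard[row] = append(keyboard[row], label)
	return keyboard
}

func main() {
	var keyboard [][]string // a nil slice; append works on it directly
	keyboard = addButton(keyboard, 0, "prev")
	keyboard = addButton(keyboard, 0, "next")
	keyboard = addButton(keyboard, 1, "back")
	fmt.Println(len(keyboard), keyboard) // 2 [[prev next] [back]]
}
```

Hiding the bookkeeping in one function keeps call sites short, at the cost of one more place where the "assign the result of append" rule has to be remembered.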

Also, Golang uses goroutines (lightweight threads managed by the Go runtime) for parallel work, while in PHP it is customary to use forks (processes). My project has few places where goroutines fit, but where they are used, the result looks so logical and simple that I have no desire to go back to forks. Since forks are independent processes, they usually need a third party such as Redis or Memcache to communicate with each other. In Golang the same problem is solved with channels, a part of the language that is available out of the box. Just think about it: parallel execution out of the box, with synchronization support as well. I had never even dreamed of this before. I don't think I am demanding too much from PHP, because parallel work is commonplace in modern backend development. Also, I do not want to say
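As an illustration of the goroutine-and-channel approach described in this chapter, here is a minimal sketch in which several workers run in parallel and report back through a channel instead of an external store. `fetchTitle` is a hypothetical stand-in for real work such as an HTTP request; none of these names come from the bot itself.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchTitle stands in for real work such as an HTTP request.
// Hypothetical placeholder, not part of the bot.
func fetchTitle(id int) string {
	return fmt.Sprintf("movie-%d", id)
}

// fetchAll runs fetchTitle for every id in its own goroutine and
// collects the results through a channel.
func fetchAll(ids []int) map[int]string {
	type result struct {
		id    int
		title string
	}
	results := make(chan result)
	var wg sync.WaitGroup

	for _, id := range ids {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			results <- result{id: id, title: fetchTitle(id)}
		}(id)
	}

	// Close the channel once every worker has sent its result,
	// so that the range loop below terminates.
	go func() {
		wg.Wait()
		close(results)
	}()

	titles := make(map[int]string, len(ids))
	for r := range results {
		titles[r.id] = r.title
	}
	return titles
}

func main() {
	titles := fetchAll([]int{1, 2, 3})
	fmt.Println(len(titles), titles[2]) // 3 movie-2
}
```

Only the collecting goroutine touches the map, so no lock is needed: the channel is the synchronization point.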

2. Alexander.NeedInfo()


At some point, the API that I used to retrieve movie information from Kinopoisk stopped working.

And, apparently, forever.

So I decided to write my own Kinopoisk parser (guys from the Kinopoisk team, don't throw slippers at me; better to make a public API).

v1 - Lone Hero


The first implementation was simple and head-on: a lone PHP script lived in the project and, on each call, took the address of a random proxy server from the queue and requested the movie page from Kinopoisk through it. The parsing itself was also done in PHP. Because the lone hero did not use cookies, Kinopoisk banned every address (that is, started showing a captcha) after a single request, and not all of the proxies were fast either.
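The original script was written in PHP and is not shown here, so the following is only a rough Go sketch of the same idea: take a proxy address from a queue and build an HTTP client that sends its requests through it. The type `proxyQueue`, the function `newProxyClient`, and the proxy addresses are all assumptions for illustration. The sketch rotates through the queue instead of picking at random, purely to keep it deterministic.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync"
	"time"
)

// proxyQueue hands out proxy addresses in rotation: each address taken
// from the front is put back at the end. Hypothetical type.
type proxyQueue struct {
	mu    sync.Mutex
	addrs []string
}

// take returns the next proxy address, or "" if the queue is empty.
func (q *proxyQueue) take() string {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.addrs) == 0 {
		return ""
	}
	addr := q.addrs[0]
	q.addrs = append(q.addrs[1:], addr)
	return addr
}

// newProxyClient builds a client that routes requests through proxyAddr.
// The timeout guards against slow proxies.
func newProxyClient(proxyAddr string) (*http.Client, error) {
	u, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, err
	}
	return &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(u)},
		Timeout:   10 * time.Second,
	}, nil
}

func main() {
	// Placeholder addresses; no request is actually sent in this sketch.
	q := &proxyQueue{addrs: []string{"http://10.0.0.1:3128", "http://10.0.0.2:3128"}}
	for i := 0; i < 3; i++ {
		addr := q.take()
		if _, err := newProxyClient(addr); err != nil {
			fmt.Println("skipping", addr, err)
			continue
		}
		fmt.Println("would fetch via", addr)
	}
}
```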

It would seem enough to add cookie support and be done with it. However, I noticed that even with cookie support, Kinopoisk showed the captcha to my parser more often than it showed it to me in the browser. I decided not to dig deeper into Kinopoisk's anti-parsing protection, because I realized it was starting to smell like JS code being executed on the client.

v2 - Full client


The next version of the parser was a web server in Go that, on a GET request, launched PhantomJS with the necessary parameters and the given movie ID. It worked. I no longer needed proxy servers; I went to Kinopoisk directly from my own IP. I had session support, a full browser and, in general, everything was convenient. But it was very slow: PhantomJS honestly waited for all static assets to load and executed all the required JS code. Besides being slow, it was very expensive in resources: parsing one page took 100-150 MB of RAM. What finally put a bullet in this version was PhantomJS's voracity and instability; for example, its processes did not always exit, remaining in the running state and never freeing their memory. I tried different versions of PhantomJS,
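For readers who want a picture of such a server, here is a minimal sketch of a Go handler that starts an external headless browser per request. It is not the author's code: the binary name `phantomjs`, the script name `parse.js`, the `id` query parameter, and the 30-second limit are assumptions. The deadline via `exec.CommandContext` is one common way to keep hung processes from piling up, not necessarily what this version did.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os/exec"
	"regexp"
	"time"
)

// movieIDPattern accepts numeric IDs only, so that user input never
// reaches the command line unchecked.
var movieIDPattern = regexp.MustCompile(`^[0-9]+$`)

func validMovieID(id string) bool { return movieIDPattern.MatchString(id) }

// renderPage runs the external browser with a hard deadline.
// "phantomjs" and "parse.js" are assumed names for illustration.
func renderPage(ctx context.Context, movieID string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()
	// CommandContext kills the process when ctx expires.
	return exec.CommandContext(ctx, "phantomjs", "parse.js", movieID).Output()
}

// movieHandler validates the ID, renders the page, and returns the output.
func movieHandler(w http.ResponseWriter, r *http.Request) {
	id := r.URL.Query().Get("id")
	if !validMovieID(id) {
		http.Error(w, "bad movie id", http.StatusBadRequest)
		return
	}
	out, err := renderPage(r.Context(), id)
	if err != nil {
		http.Error(w, "render failed", http.StatusBadGateway)
		return
	}
	w.Write(out)
}

func main() {
	// Exercise the handler without opening a port or running PhantomJS.
	// A real server would call http.HandleFunc and http.ListenAndServe.
	req := httptest.NewRequest(http.MethodGet, "/movie?id=abc", nil)
	rec := httptest.NewRecorder()
	movieHandler(rec, req)
	fmt.Println(rec.Code) // 400
}
```

Note that the deadline only kills the process that was started directly; any child processes it spawns would need separate handling.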

v99


While searching for the Holy Grail of Kinopoisk parsing, I lost count of how many parser versions and modifications I had created, so I simply called the next version the ninety-ninth. It was written in PHP. I used Guzzle (an HTTP client for PHP), kept sessions, and tried to look as much like a user's browser as possible. I gave up on JS support. Captchas are still shown, of course, but much less often than with the first version of the parser, and in principle this option can be called comfortable. This is the version I settled on.
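The ninety-ninth version was written in PHP with Guzzle, and its code is not shown here. As a rough analogue, the sketch below does the same two things with Go's standard library: it keeps cookies between requests with `net/http/cookiejar` and sends browser-like headers. The header values and the function names are assumptions. A local test server stands in for the remote site, so the sketch runs without any network access beyond loopback.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
	"time"
)

// newSessionClient returns a client that keeps cookies between requests.
func newSessionClient() (*http.Client, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	return &http.Client{Jar: jar, Timeout: 15 * time.Second}, nil
}

// browserGet sends a GET with headers resembling a desktop browser and
// returns the response body. The header values are illustrative.
func browserGet(c *http.Client, url string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36")
	req.Header.Set("Accept-Language", "ru-RU,ru;q=0.9,en;q=0.8")
	resp, err := c.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// visitTwice calls a local test server twice with the same client.
// The server answers "new" without a session cookie, "returning" with one.
func visitTwice() (first, second string, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, err := r.Cookie("session"); err == nil {
			fmt.Fprint(w, "returning")
			return
		}
		http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc", Path: "/"})
		fmt.Fprint(w, "new")
	}))
	defer srv.Close()

	client, err := newSessionClient()
	if err != nil {
		return "", "", err
	}
	if first, err = browserGet(client, srv.URL); err != nil {
		return "", "", err
	}
	second, err = browserGet(client, srv.URL)
	return first, second, err
}

func main() {
	first, second, err := visitTwice()
	if err != nil {
		panic(err)
	}
	fmt.Println(first, second)
}
```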

Also, I know that Kinopoisk can provide access to its API on request, but I did not consider this option: even if I had access, it would be a potential point of failure, because access can be revoked at any time.

3. Video.Publish()


After the war with Kinopoisk, I found myself in a situation where I was ready to give the user a link to the video but had nowhere to play it: the Telegram Bot API offers no convenient way to show a video by link, and I absolutely did not want to register a domain, host anything other than the bot and the parser, or get into front-end development.

What to do?


We will publish the video somewhere. After a little thought, I decided that Telegraph would do nicely as that "somewhere." A site that is the de facto place to publish articles from Telegram? Just what I need! One problem: you can't post videos by link (except from YouTube or Vimeo).

And what if we take a closer look?


Looking at how easily and dynamically blocks are created on the page, and how an article is published at the press of a single button, you can't help wondering: how does it work? Especially if you are looking for a place to publish content. I decided to find out.

And what did I see?
[{
    "tag": "p",
    "children": ["Story"]
  }, {
    "tag": "p",
    "children": [{
        "tag": "br"
      }
    ]
  }, {
    "tag": "figure",
    "children": [{
        "tag": "div",
        "attrs": {
          "class": "figure_wrapper"
        },
        "children": [{
            "tag": "img",
            "attrs": {
              "src": "/file/a2e8087fbc53679c14fa1.jpg"
            }
          }
        ]
      }, {
        "tag": "figcaption",
        "children": ["Pff"]
      }
    ]
  }, {
    "tag": "p",
    "children": [{
        "tag": "br"
      }
    ]
  }
]

The POST request that publishes a post contains JSON that looks suspiciously like HTML markup. What if we try to add a video tag, following the structure we already have? Let's do it. A little patience, and we get this...

Structure
[{
    "tag": "p",
    "children": ["Story"]
  }, {
    "tag": "p",
    "children": [{
        "tag": "br"
      }
    ]
  }, {
    "tag": "figure",
    "children": [{
        "tag": "div",
        "attrs": {
          "class": "figure_wrapper"
        },
        "children": [{
            "tag": "img",
            "attrs": {
              "src": "/file/a2e8087fbc53679c14fa1.jpg"
            }
          }
        ]
      }, {
        "tag": "figcaption",
        "children": ["Pff"]
      }
    ]
  }, {
    "tag": "p",
    "children": [{
        "tag": "video",
        "attrs": {
          "src": "https://www.w3schools.com/html/mov_bbb.mp4"
        }
      }
    ]
  }
]

If you send an editing POST request with the structure above, an arbitrary video will be added to the publication by link. Just what I need.
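The structure above can also be generated rather than written by hand. Here is a minimal Go sketch that models one element as a struct and serializes a paragraph containing a video. The type `node` and the helper names are assumptions for illustration; the sketch only builds the JSON and does not send any request.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node mirrors one element of the JSON structure shown above.
// Children may be plain strings or nested nodes, hence []interface{}.
type node struct {
	Tag      string            `json:"tag"`
	Attrs    map[string]string `json:"attrs,omitempty"`
	Children []interface{}     `json:"children,omitempty"`
}

// videoParagraph wraps a video element pointing at src in a paragraph.
func videoParagraph(src string) node {
	return node{
		Tag: "p",
		Children: []interface{}{
			node{Tag: "video", Attrs: map[string]string{"src": src}},
		},
	}
}

// encodeContent serializes the nodes into the JSON form used in the request.
func encodeContent(nodes []node) (string, error) {
	b, err := json.Marshal(nodes)
	return string(b), err
}

func main() {
	content := []node{
		{Tag: "p", Children: []interface{}{"Story"}},
		videoParagraph("https://www.w3schools.com/html/mov_bbb.mp4"),
	}
	out, err := encodeContent(content)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Using `omitempty` keeps the output close to the original structure: elements without attributes or children simply omit those keys.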

Not so fast


Everything works, and there is no problem. The trouble is that most attributes are not supported, which means you can forget about subtitles or, say, a poster for the video. In other words, the solution fell into the "be thankful there is anything at all" category. Without thinking twice, I decided to use XSS so that I could configure the player. This is probably where normal development ends, but there was nowhere to retreat: I had to get video publishing working. I tried different ways of embedding third-party code in the page, even through the picture, but everything was in vain and Telegraph heroically held out. Then again, I am not an information security expert. Perhaps if I had spent more time, I would have found a working XSS for Telegraph, which I would have used exclusively to customize the player, but I abandoned the idea. I tried a few more sites for publishing my content, but everywhere something was missing or something did not work. So in the end I implemented a video player on my side after all...

However, this is a completely different story.


P.S. If Telegraph developers are reading this article: please add video publishing by link to the interface, since the functionality is already there.
