A Go microservice for grabbing video from tweets

    Good afternoon, Habr readers! This article is for beginners, so you will not find any new ideas here, and this functionality has most likely been implemented dozens of times in various languages. The idea is simple: given a link to a Twitter post that contains a video, fetch that video and convert it to mkv.


    Let's get down to business!


    What we need:


    • Go
    • ffmpeg
    • docker (it is possible to do without it, but who goes without it these days? ;)

    We have a link to a tweet with a video:


    https://twitter.com/FunnyVines/status/1101196533830041600

    Of the entire link, we are only interested in the numeric ID at the end, so we pull out the trailing run of digits with a simple regular expression:


    var reurl = regexp.MustCompile(`\/(\d*)$`)
    // some more code here
    e.GET("/*video", func(c *fasthttp.RequestCtx) {
      // apply the regex to the path we received
      url := reurl.FindSubmatch([]byte(c.UserValue("video").(string)))
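
    As a minimal, standalone sketch of this step: the helper name extractTweetID is hypothetical, and the pattern uses \d+ instead of \d* so that a path with no trailing digits is rejected rather than producing an empty ID. That stricter behavior is my assumption, not something the original service necessarily does.

```go
package main

import (
	"fmt"
	"regexp"
)

// tweetIDRe matches the run of digits that ends the path.
// Assumption: \d+ (not \d*) so that an empty ID never matches.
var tweetIDRe = regexp.MustCompile(`/(\d+)$`)

// extractTweetID is a hypothetical helper that returns the numeric
// tweet ID found at the end of path, and false if there is none.
func extractTweetID(path string) (string, bool) {
	m := tweetIDRe.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	paths := []string{
		"/https://twitter.com/FunnyVines/status/1101196533830041600",
		"/1101196533830041600",
		"/not-a-tweet",
	}
	for _, p := range paths {
		id, ok := extractTweetID(p)
		fmt.Printf("%q -> id=%q ok=%v\n", p, id, ok)
	}
}
```

    Both the full tweet URL and the bare ID yield the same ID, which is what lets a single route accept either form.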

    With the extracted ID, we request the following address:


    resp, err := client.Get("https://twitter.com/i/videos/tweet/" + id)

    From the response we take the link to the JS code of the video player:


    src="https://abs.twimg.com/web-video-player/TwitterVideoPlayerIframe.f52b5b572446290e.js"

    From this JS file we need one very important thing: the bearer token for authorization in the Twitter API.


    Let's pull it out with a regex!


    re, _ := regexp.Compile(`(?m)authorization:\"Bearer (.*)\",\"x-csrf`)
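
    Here is a small self-contained sketch of that extraction. The helper name extractBearer and the sample script fragment are made up for illustration, and the pattern is a slight variation on the one above: it uses [^"]+ instead of .* so the match cannot run past the closing quote of the token.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// bearerRe captures the token between `Bearer ` and the closing quote.
// [^"]+ keeps the capture from swallowing text after the token.
var bearerRe = regexp.MustCompile(`authorization:"Bearer ([^"]+)","x-csrf`)

// extractBearer is a hypothetical helper that pulls the bearer token
// out of the player script body.
func extractBearer(js []byte) (string, error) {
	m := bearerRe.FindSubmatch(js)
	if m == nil {
		return "", errors.New("bearer token not found in player script")
	}
	return string(m[1]), nil
}

func main() {
	// A made-up fragment shaped like the part of the script we care about.
	sample := []byte(`headers:{authorization:"Bearer AAAA-made-up-token","x-csrf-token":t}`)
	bearer, err := extractBearer(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("bearer:", bearer)
}
```

    Returning an error instead of discarding it makes it obvious when the player script changes shape and the pattern stops matching.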

    This is not enough to access the API; you also need a guest_token. It can be obtained with a POST request to https://api.twitter.com/1.1/guest/activate.json, passing along the personalization_id and guest_id values from the cookies that the server returned when we requested the previous URL:


        var personalization_id, guest_id string
        cookies := resp.Cookies()
        for _, cookie := range cookies {
            if cookie.Name == "personalization_id" {
                personalization_id = cookie.Value
            }
            if cookie.Name == "guest_id" {
                guest_id = cookie.Value
            }
        }
        // Activate a guest session to obtain a guest_token
        url, _ := url.Parse("https://api.twitter.com/1.1/guest/activate.json")
        request := &http.Request{
            Method: "POST",
            URL:    url,
            Header: http.Header{
                "user-agent":      []string{"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"},
                "accept-encoding": []string{"gzip", "deflate", "br"},
                "authorization":   []string{"Bearer " + bearer},
                "cookie":          []string{"personalization_id=\"" + personalization_id + "\"; guest_id=" + guest_id},
            },
        }
        resp, err = client.Do(request)

    The guest_token changes periodically. I did not figure out how often it has to be requested (most likely it changes on a timer, at a 15-minute interval), but it seems that regularly re-activating through /1.1/guest/activate.json lets you get around the API limit of 300 requests.
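
    One way to act on this is to cache the token and request a new one once it is older than some interval. The sketch below is only an illustration under assumptions: the guestTokenCache type, its fetch callback (standing in for the activate.json request), and the 15-minute default are all hypothetical, with the interval taken from my guess above rather than from any documented value.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// defaultGuestTokenTTL is an assumed lifetime, not a documented one.
const defaultGuestTokenTTL = 15 * time.Minute

// guestTokenCache keeps the last guest token and refreshes it
// through fetch once it is older than ttl.
type guestTokenCache struct {
	mu      sync.Mutex
	token   string
	fetched time.Time
	ttl     time.Duration
	fetch   func() (string, error) // stands in for the activate.json call
}

func newGuestTokenCache(ttl time.Duration, fetch func() (string, error)) *guestTokenCache {
	return &guestTokenCache{ttl: ttl, fetch: fetch}
}

// Get returns the cached token while it is fresh and fetches a new one otherwise.
func (c *guestTokenCache) Get() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token != "" && time.Since(c.fetched) < c.ttl {
		return c.token, nil
	}
	tok, err := c.fetch()
	if err != nil {
		return "", err
	}
	c.token, c.fetched = tok, time.Now()
	return tok, nil
}

func main() {
	calls := 0
	cache := newGuestTokenCache(defaultGuestTokenTTL, func() (string, error) {
		calls++
		return fmt.Sprintf("token-%d", calls), nil
	})
	first, _ := cache.Get()
	second, _ := cache.Get() // served from the cache, no second fetch
	fmt.Println(first, second, "fetches:", calls)
}
```

    The mutex matters because the HTTP handler may run concurrently; without it two requests could trigger two activations at once.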


    The response is gzip-compressed; in Go it can be unpacked something like this:


    res, err := gzip.NewReader(resp.Body)
    if err != nil {
        return "", err
    }
    defer res.Close()
    r, err := ioutil.ReadAll(res)
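
    To show the unpacking end to end, here is a sketch that decompresses a body and decodes the token from it. It assumes the activation response is a JSON object of the form {"guest_token":"..."}; the names guestActivation, decodeGuestToken and gzipped are hypothetical, and gzipped exists only to fabricate a compressed body locally so the example runs without network access.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// guestActivation mirrors the assumed response body {"guest_token":"..."}.
type guestActivation struct {
	GuestToken string `json:"guest_token"`
}

// decodeGuestToken reads a gzip-compressed JSON body and returns the token.
func decodeGuestToken(body io.Reader) (string, error) {
	zr, err := gzip.NewReader(body)
	if err != nil {
		return "", err
	}
	defer zr.Close()

	var gt guestActivation
	if err := json.NewDecoder(zr).Decode(&gt); err != nil {
		return "", err
	}
	if gt.GuestToken == "" {
		return "", errors.New("response contains no guest_token")
	}
	return gt.GuestToken, nil
}

// gzipped compresses s in memory, imitating a gzip-encoded response body.
func gzipped(s string) io.Reader {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(s)) // writes to a bytes.Buffer do not fail
	zw.Close()
	return &buf
}

func main() {
	tok, err := decodeGuestToken(gzipped(`{"guest_token":"1234567890"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("guest token:", tok)
}
```

    In the real service the reader passed in would be resp.Body. Decoding straight from the gzip reader avoids holding the whole body in memory first.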

    Well, that’s it! Now we have everything we need to call the API:


        url, _ = url.Parse("https://api.twitter.com/1.1/videos/tweet/config/" + id + ".json")
        request = &http.Request{
            Method: "GET",
            URL:    url,
            Header: http.Header{
                "user-agent":      []string{"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"},
                "accept-encoding": []string{"gzip", "deflate", "br"},
                "origin":          []string{"https://twitter.com"},
                "x-guest-token":   []string{gt.GuestToken},
                "referer":         []string{"https://twitter.com/i/videos/tweet/" + id},
                "authorization":   []string{"Bearer " + bearer},
            },
        }
        resp, err = client.Do(request)

    The API responds with JSON describing the video and, most importantly, the URL from which to fetch it (playbackUrl):


    {"contentType":"media_entity","publisherId":"4888096512","contentId":"1096941371649347584","durationMs":11201,"playbackUrl":"https:\/\/video.twimg.com\/ext_tw_video\/1096941371649347584\/pu\/pl\/xcBvPmwAmKckck-F.m3u8?tag=6","playbackType"
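
    Decoding this response might look like the sketch below. It assumes the fields shown above sit inside a "track" object, which is what the videoURL.Track.PlaybackURL access later in the article suggests. The videoConfig type, the parseVideoConfig helper and the "track" key are my assumptions, and the sample body is stitched together from the fragment above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// videoConfig models only the fields used here.
// Assumption: the fields live under a top-level "track" object.
type videoConfig struct {
	Track struct {
		ContentType string `json:"contentType"`
		DurationMs  int    `json:"durationMs"`
		PlaybackURL string `json:"playbackUrl"`
	} `json:"track"`
}

// parseVideoConfig is a hypothetical helper that decodes the config JSON.
func parseVideoConfig(data []byte) (videoConfig, error) {
	var cfg videoConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	// Sample body assembled from the fragment in the article.
	sample := []byte(`{"track":{"contentType":"media_entity","durationMs":11201,` +
		`"playbackUrl":"https:\/\/video.twimg.com\/ext_tw_video\/1096941371649347584\/pu\/pl\/xcBvPmwAmKckck-F.m3u8?tag=6"}}`)

	cfg, err := parseVideoConfig(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// encoding/json turns the escaped "\/" sequences back into plain slashes.
	fmt.Println(cfg.Track.PlaybackURL)
}
```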

    And finally, we have the video address, which we hand to ffmpeg after checking the video format. I have seen two possible formats; the first is mp4:


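    if strings.Contains(videoURL.Track.PlaybackURL, ".mp4") {
        // copy the streams into an mkv container without re-encoding
        convert := exec.Command("ffmpeg", "-i", videoURL.Track.PlaybackURL, "-c", "copy", "./videos/"+id+".mkv")
        convert.Stdout = os.Stdout
        convert.Stderr = os.Stderr
        // return the error from ffmpeg itself, not a stale err from an earlier call
        if err := convert.Run(); err != nil {
            return "", err
        }
        return id, nil
    }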

    The second is an m3u8 playlist file. This option needs one more step: we fetch the playlist with a GET and take the content URL for the resolution we want:


    #EXT-X-INDEPENDENT-SEGMENTS
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=256000,RESOLUTION=180x316,CODECS="mp4a.40.2,avc1.4d0015"
    /ext_tw_video/1039516210948333568/pu/pl/180x316/x0HWMgnbSJ9y6NFL.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=832000,RESOLUTION=464x816,CODECS="mp4a.40.2,avc1.4d001f"
    /ext_tw_video/1039516210948333568/pu/pl/464x816/Z58__ptq1xBk8CIV.m3u8
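
    Picking a variant out of such a playlist can be sketched as follows. This is only one possible policy (take the highest BANDWIDTH), not necessarily what the original service does. The names bestVariant and samplePlaylist are hypothetical, and so is the playlist address used as the base for resolving the relative paths.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net/url"
	"regexp"
	"strconv"
	"strings"
)

// samplePlaylist repeats the playlist lines shown in the article.
const samplePlaylist = `#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=256000,RESOLUTION=180x316,CODECS="mp4a.40.2,avc1.4d0015"
/ext_tw_video/1039516210948333568/pu/pl/180x316/x0HWMgnbSJ9y6NFL.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=832000,RESOLUTION=464x816,CODECS="mp4a.40.2,avc1.4d001f"
/ext_tw_video/1039516210948333568/pu/pl/464x816/Z58__ptq1xBk8CIV.m3u8
`

// bandwidthRe matches the BANDWIDTH attribute but not AVERAGE-BANDWIDTH.
var bandwidthRe = regexp.MustCompile(`[:,]BANDWIDTH=(\d+)`)

// bestVariant returns the absolute URL of the highest-bandwidth variant
// listed in a master playlist that was fetched from playlistURL.
func bestVariant(playlistURL, playlist string) (string, error) {
	base, err := url.Parse(playlistURL)
	if err != nil {
		return "", err
	}
	bestURI, bestBW := "", -1
	pendingBW := -1 // bandwidth of the last #EXT-X-STREAM-INF tag, -1 if none

	sc := bufio.NewScanner(strings.NewReader(playlist))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "#EXT-X-STREAM-INF:"):
			pendingBW = 0
			if m := bandwidthRe.FindStringSubmatch(line); m != nil {
				pendingBW, _ = strconv.Atoi(m[1])
			}
		case line == "" || strings.HasPrefix(line, "#"):
			// other tags and blank lines carry no variant URI
		case pendingBW >= 0:
			// the first URI line after a STREAM-INF tag belongs to that tag
			if pendingBW > bestBW {
				bestBW, bestURI = pendingBW, line
			}
			pendingBW = -1
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	if bestURI == "" {
		return "", errors.New("no variant streams found")
	}
	ref, err := url.Parse(bestURI)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	// Hypothetical address the master playlist was downloaded from.
	u, err := bestVariant("https://video.twimg.com/ext_tw_video/1039516210948333568/pu/pl/master.m3u8?tag=6", samplePlaylist)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```

    Resolving against the playlist's own URL matters because the variant paths are host-relative; the resulting absolute URL can then be passed to ffmpeg with -i just like the mp4 case.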

    And then, once again, ffmpeg.


    And now, a little about the HTTP server


    I used fasthttp together with tcplisten.



    The logic is as follows. We start the server:


    cfg := tcplisten.Config{
        ReusePort:   true,
        FastOpen:    true,
        DeferAccept: true,
        Backlog:     1024,
    }
    ln, err := cfg.NewListener("tcp4", ":8080")
    if err != nil {
        log.Fatalf("error in reuseport listener: %s\n", err)
    }
    serv := fasthttp.Server{
        Handler:                       e.Handler,
        ReduceMemoryUsage:             false,
        Name:                          "highload",
        Concurrency:                   2 * 1024,
        DisableHeaderNamesNormalizing: true,
    }
    if err := serv.Serve(ln); err != nil {
        log.Fatalf("error in fasthttp Server: %s", err)
    }

    And we handle just one route, /*video:


    // both of these will work:
    // http://localhost:8080/https://twitter.com/FunnyVines/status/1101196533830041600
    // and http://localhost:8080/1101196533830041600
    e.GET("/*video", func(c *fasthttp.RequestCtx) {

    How do we build all this?


    For example, a simple Makefile (make build, make run ...):


    build:
        go build -o main
        docker build -t tvideo .
    # the leading "-" lets make carry on when no tvideo container is running
    run: build
        -docker kill tvideo
        docker run -d --rm --name tvideo -v /etc/ssl:/etc/ssl:ro -v videos:/opt/videos -p 8080:8080 tvideo
        docker logs -f tvideo

    Note the flag "-v /etc/ssl:/etc/ssl:ro": the ubuntu base image had no root certificates, so the HTTP client could not verify Twitter's HTTPS certificate. I passed the certificates in from the host machine through --volume (these days it is apparently more correct to use --mount).


    Dockerfile


    FROM ubuntu
    # put the application binary into the docker image
    COPY main /opt/app
    # install ffmpeg, our wonder weapon
    RUN apt-get update && \
        apt-get install -y ffmpeg && \
        chmod +x /opt/app
    EXPOSE 8080
    WORKDIR /opt
    CMD ./app

    I certainly have not discovered America with this article, but perhaps it will come in handy for someone.


    The source code is available here.

