A Go microservice for grabbing video from tweets
Good afternoon, Habr readers! This article is for beginners: you will not find any new ideas here, and this functionality has most likely been implemented dozens of times in various languages. The idea is simple: given a link to a Twitter post that contains a video, fetch that video and convert it to mkv.
Let's get down to business!
What we need:
- Go
- ffmpeg
- Docker (you can manage without it, but where would we be without it these days?! ;)
We have a link to a tweet with a video:
https://twitter.com/FunnyVines/status/1101196533830041600
Of the entire link, we are only interested in the numeric ID, so we pull out the trailing run of digits with a simple regular expression:
var reurl = regexp.MustCompile(`\/(\d*)$`)
// some more code goes here
e.GET("/*video", func(c *fasthttp.RequestCtx) {
// apply the regexp to what we received
url := reurl.FindSubmatch([]byte(c.UserValue("video").(string)))
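The ID extraction step can be tried in isolation. Below is a minimal sketch, assuming a hypothetical helper named extractTweetID (it is not part of the original service) and a pattern that requires at least one digit, so a path with nothing after the final slash is rejected rather than yielding an empty ID:

```go
package main

import (
	"fmt"
	"regexp"
)

// tweetIDRe matches a run of digits at the very end of the path.
// It requires at least one digit (\d+), so an empty ID never matches.
var tweetIDRe = regexp.MustCompile(`/(\d+)$`)

// extractTweetID is a hypothetical helper: it returns the numeric tweet ID
// from either a full tweet URL or a bare "/<id>" path.
func extractTweetID(path string) (string, bool) {
	m := tweetIDRe.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	paths := []string{
		"/https://twitter.com/FunnyVines/status/1101196533830041600",
		"/1101196533830041600",
		"/not-a-tweet",
	}
	for _, p := range paths {
		id, ok := extractTweetID(p)
		fmt.Printf("%q -> %q (matched: %v)\n", p, id, ok)
	}
}
```

Returning a boolean alongside the ID lets the handler answer with an error status instead of calling Twitter with an empty ID.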
With the extracted ID, we request the following address:
resp, err := client.Get("https://twitter.com/i/videos/tweet/" + id)
From the response we take the link to the video player's JS code:
src="https://abs.twimg.com/web-video-player/TwitterVideoPlayerIframe.f52b5b572446290e.js"
From this JS file we need one very important thing: the bearer token for authorizing against the Twitter API.
Let's pull it out with a regexp!
re, _ := regexp.Compile(`(?m)authorization:\"Bearer (.*)\",\"x-csrf`)
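For illustration, here is a small sketch of the same extraction wrapped in a function with error handling. The helper name extractBearer and the sample script fragment are made up for this example; the pattern is a slightly stricter variant that stops at the first closing quote instead of using a greedy `.*`:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// bearerRe captures everything between `Bearer ` and the closing quote.
// [^"]+ keeps the match from running past the end of the token.
var bearerRe = regexp.MustCompile(`authorization:"Bearer ([^"]+)","x-csrf`)

// extractBearer is a hypothetical helper that returns the bearer token
// embedded in the player script, or an error if the pattern is absent.
func extractBearer(js string) (string, error) {
	m := bearerRe.FindStringSubmatch(js)
	if m == nil {
		return "", errors.New("bearer token not found in player script")
	}
	return m[1], nil
}

func main() {
	// Made-up fragment that only mimics the shape the regexp expects.
	js := `headers:{authorization:"Bearer EXAMPLE_TOKEN","x-csrf-token":t}`
	token, err := extractBearer(js)
	fmt.Println(token, err)
}
```

Checking for a missing match matters here: if Twitter changes the player script, the service should fail loudly rather than send an empty Authorization header.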
This alone is not enough to access the API; you also need a guest_token. It can be obtained by sending a POST request to https://api.twitter.com/1.1/guest/activate.json, passing the personalization_id and guest_id values from the cookies (which we received in the server's response to the previous URL):
var personalization_id, guest_id string
// pick the two cookies we need out of the previous response
cookies := resp.Cookies()
for _, cookie := range cookies {
	if cookie.Name == "personalization_id" {
		personalization_id = cookie.Value
	}
	if cookie.Name == "guest_id" {
		guest_id = cookie.Value
	}
}
// Get activation
url, _ := url.Parse("https://api.twitter.com/1.1/guest/activate.json")
request := &http.Request{
	Method: "POST",
	URL:    url,
	Header: http.Header{
		"user-agent":      []string{"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"},
		"accept-encoding": []string{"gzip", "deflate", "br"},
		"authorization":   []string{"Bearer " + bearer},
		"cookie":          []string{"personalization_id=\"" + personalization_id + "\"; guest_id=" + guest_id},
	},
}
resp, err = client.Do(request)
The guest token changes periodically. I did not work out how often it has to be requested again (most likely it changes on a timer, at a 15-minute interval), but it seems that regularly calling /1.1/guest/activate.json lets you get around the API limit of 300 requests.
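One way to avoid activating a guest token on every request is to cache it and refresh it once it is older than some interval. The sketch below is only an illustration: the guestTokenCache type is hypothetical, the 15-minute lifetime is the guess mentioned above rather than a documented value, and the fetch function stands in for the real activate.json call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// guestTokenCache is a hypothetical helper that re-activates the guest
// token once the cached one is older than ttl.
type guestTokenCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	fetch   func() (string, error) // performs the activate.json request
	token   string
	fetched time.Time
}

// Get returns the cached token, refreshing it first if it is missing
// or has expired. The current time is passed in to keep Get testable.
func (c *guestTokenCache) Get(now time.Time) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token == "" || now.Sub(c.fetched) >= c.ttl {
		t, err := c.fetch()
		if err != nil {
			return "", err
		}
		c.token, c.fetched = t, now
	}
	return c.token, nil
}

// demo asks for a token at 0, 5 and 16 minutes and returns what it got.
func demo() []string {
	calls := 0
	cache := &guestTokenCache{
		ttl: 15 * time.Minute, // assumed lifetime, not a documented value
		fetch: func() (string, error) {
			calls++
			return fmt.Sprintf("token-%d", calls), nil // stand-in for the HTTP call
		},
	}
	start := time.Now()
	var got []string
	for _, offset := range []time.Duration{0, 5 * time.Minute, 16 * time.Minute} {
		t, _ := cache.Get(start.Add(offset))
		got = append(got, t)
	}
	return got
}

func main() {
	fmt.Println(demo())
}
```

Running it prints `[token-1 token-1 token-2]`: the second call reuses the cached token, and the third one, past the assumed lifetime, triggers a new activation.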
The response is gzip-compressed; in Go it can be unpacked roughly like this:
res, err := gzip.NewReader(resp.Body)
if err != nil {
	return "", err
}
defer res.Close()
r, err := ioutil.ReadAll(res)
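To show how the unpacked body might end up in a struct with a GuestToken field, here is a self-contained sketch. The decodeGuestToken and fakeResponse helpers are hypothetical, and the "guest_token" JSON field name is an assumption about the response shape, not something confirmed by the text:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// guestActivation mirrors the part of the activation response we need.
// The "guest_token" field name is an assumption.
type guestActivation struct {
	GuestToken string `json:"guest_token"`
}

// decodeGuestToken gunzips body and decodes the JSON document inside it.
func decodeGuestToken(body io.Reader) (string, error) {
	zr, err := gzip.NewReader(body)
	if err != nil {
		return "", err
	}
	defer zr.Close()
	var gt guestActivation
	if err := json.NewDecoder(zr).Decode(&gt); err != nil {
		return "", err
	}
	if gt.GuestToken == "" {
		return "", errors.New("response contains no guest token")
	}
	return gt.GuestToken, nil
}

// fakeResponse gzips a JSON string; it exists only to imitate a
// compressed server response in this example.
func fakeResponse(jsonBody string) io.Reader {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(jsonBody))
	zw.Close()
	return &buf
}

func main() {
	token, err := decodeGuestToken(fakeResponse(`{"guest_token":"1234567890"}`))
	fmt.Println(token, err)
}
```

Decoding straight from the gzip reader avoids holding the whole body in memory, although for a response this small the difference is negligible.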
Well, that’s it! Now we have everything we need to call the API:
url, _ = url.Parse("https://api.twitter.com/1.1/videos/tweet/config/" + id + ".json")
request = &http.Request{
	Method: "GET",
	URL:    url,
	Header: http.Header{
		"user-agent":      []string{"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"},
		"accept-encoding": []string{"gzip", "deflate", "br"},
		"origin":          []string{"https://twitter.com"},
		"x-guest-token":   []string{gt.GuestToken},
		"referer":         []string{"https://twitter.com/i/videos/tweet/" + id},
		"authorization":   []string{"Bearer " + bearer},
	},
}
resp, err = client.Do(request)
The API responds with JSON describing the video and, most importantly, the URL for fetching it (playbackUrl):
{"contentType":"media_entity","publisherId":"4888096512","contentId":"1096941371649347584","durationMs":11201,"playbackUrl":"https:\/\/video.twimg.com\/ext_tw_video\/1096941371649347584\/pu\/pl\/xcBvPmwAmKckck-F.m3u8?tag=6","playbackType"
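A response like this can be decoded into a small struct. The sketch below is an assumption-laden illustration: the type videoConfig and the function parseVideoConfig are hypothetical names, and the outer "track" object is inferred from the videoURL.Track.PlaybackURL expression used later in the code, since the sample above is truncated.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// videoConfig covers only the fields used here. Wrapping them in a
// "track" object is an assumption based on videoURL.Track.PlaybackURL.
type videoConfig struct {
	Track struct {
		ContentType string `json:"contentType"`
		DurationMs  int    `json:"durationMs"`
		PlaybackURL string `json:"playbackUrl"`
	} `json:"track"`
}

// parseVideoConfig decodes the API response body.
func parseVideoConfig(data []byte) (videoConfig, error) {
	var cfg videoConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	// Hand-written sample assembled from the fields shown above.
	sample := []byte(`{"track":{"contentType":"media_entity","durationMs":11201,` +
		`"playbackUrl":"https:\/\/video.twimg.com\/ext_tw_video\/1096941371649347584\/pu\/pl\/xcBvPmwAmKckck-F.m3u8?tag=6"}}`)
	cfg, err := parseVideoConfig(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(cfg.Track.PlaybackURL)
}
```

Note that encoding/json turns the escaped `\/` sequences back into plain slashes, so the resulting URL can be passed on without any extra cleanup.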
Finally, we have the video address. We pass it to ffmpeg, first checking which format the video is in. I have seen two possible formats; the first is mp4:
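if strings.Contains(videoURL.Track.PlaybackURL, ".mp4") {
	// remux the stream into an mkv container without re-encoding
	convert := exec.Command("ffmpeg", "-i", videoURL.Track.PlaybackURL, "-c", "copy", "./videos/"+id+".mkv")
	convert.Stdout = os.Stdout
	convert.Stderr = os.Stderr
	// return the error from ffmpeg itself, not a stale err from earlier
	if err := convert.Run(); err != nil {
		return "", err
	}
	return id, nil
}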
The second is an m3u8 playlist file. This option needs one more step: we fetch the playlist with a GET request and take the content URL for the required resolution:
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=256000,RESOLUTION=180x316,CODECS="mp4a.40.2,avc1.4d0015"
/ext_tw_video/1039516210948333568/pu/pl/180x316/x0HWMgnbSJ9y6NFL.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=832000,RESOLUTION=464x816,CODECS="mp4a.40.2,avc1.4d001f"
/ext_tw_video/1039516210948333568/pu/pl/464x816/Z58__ptq1xBk8CIV.m3u8
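Picking a variant from such a playlist can be sketched as follows. This is only one possible policy (take the entry with the highest BANDWIDTH); the function name bestVariant and the master playlist URL in main are made up for the example, and the parser handles only the tags shown above rather than the full HLS format.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net/url"
	"regexp"
	"strconv"
	"strings"
)

// bandwidthRe reads the BANDWIDTH attribute of an #EXT-X-STREAM-INF tag.
var bandwidthRe = regexp.MustCompile(`(?:^|[:,])BANDWIDTH=(\d+)`)

// bestVariant scans a master playlist and returns the absolute URL of the
// variant with the highest BANDWIDTH, resolved against playlistURL.
func bestVariant(playlistURL, playlist string) (string, error) {
	base, err := url.Parse(playlistURL)
	if err != nil {
		return "", err
	}
	best, bestBW, pendingBW := "", -1, -1
	sc := bufio.NewScanner(strings.NewReader(playlist))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "#EXT-X-STREAM-INF:"):
			// remember the bandwidth until the URI line that follows
			pendingBW = 0
			if m := bandwidthRe.FindStringSubmatch(line); m != nil {
				pendingBW, _ = strconv.Atoi(m[1])
			}
		case line == "" || strings.HasPrefix(line, "#"):
			// other tags and blank lines carry no variant URI
		case pendingBW >= 0:
			// the first URI after a STREAM-INF tag belongs to that variant
			if pendingBW > bestBW {
				best, bestBW = line, pendingBW
			}
			pendingBW = -1
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	if best == "" {
		return "", errors.New("no variant streams found")
	}
	ref, err := url.Parse(best)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	playlist := `#EXTM3U
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=256000,RESOLUTION=180x316,CODECS="mp4a.40.2,avc1.4d0015"
/ext_tw_video/1039516210948333568/pu/pl/180x316/x0HWMgnbSJ9y6NFL.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=832000,RESOLUTION=464x816,CODECS="mp4a.40.2,avc1.4d001f"
/ext_tw_video/1039516210948333568/pu/pl/464x816/Z58__ptq1xBk8CIV.m3u8
`
	// The master playlist URL below is made up for the example.
	u, err := bestVariant("https://video.twimg.com/ext_tw_video/1039516210948333568/pu/pl/master.m3u8", playlist)
	fmt.Println(u, err)
}
```

Resolving the variant path against the playlist URL matters because the entries are host-relative, while ffmpeg needs an absolute URL as its input.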
And then, once again, ffmpeg.
And now, a little about the HTTP server
I used fasthttp with a tcplisten listener and a router for fasthttp.
The logic is as follows. We start the server:
cfg := tcplisten.Config{
	ReusePort:   true,
	FastOpen:    true,
	DeferAccept: true,
	Backlog:     1024,
}
ln, err := cfg.NewListener("tcp4", ":8080")
if err != nil {
	log.Fatalf("error in reuseport listener: %s\n", err)
}
serv := fasthttp.Server{
	Handler:                       e.Handler,
	ReduceMemoryUsage:             false,
	Name:                          "highload",
	Concurrency:                   2 * 1024,
	DisableHeaderNamesNormalizing: true,
}
if err := serv.Serve(ln); err != nil {
	log.Fatalf("error in fasthttp Server: %s", err)
}
And we handle only one route, /*video:
// both of these will work:
// this one: http://localhost:8080/https://twitter.com/FunnyVines/status/1101196533830041600
// and this one: http://localhost:8080/1101196533830041600
e.GET("/*video", func(c *fasthttp.RequestCtx) {
How do we build all this?
For example, with a simple Makefile (make build, make run ...):
build:
	go build -o main
	docker build -t tvideo .
run:
	go build -o main
	docker build -t tvideo .
	docker kill tvideo
	docker run -d --rm --name tvideo -v /etc/ssl:/etc/ssl:ro -v videos:/opt/videos -p 8080:8080 tvideo
	docker logs -f tvideo
Note the flag "-v /etc/ssl:/etc/ssl:ro": the base ubuntu image had no root certificates, so the HTTP client could not verify Twitter's HTTPS certificate. I passed the certificates through from the host machine with --volume (these days, it seems, --mount is the more correct choice).
Dockerfile
FROM ubuntu
# put the application binary into the docker image
COPY main /opt/app
RUN apt-get update && \
    # install the wonder weapon (ffmpeg)
    apt-get install -y ffmpeg && \
    chmod +x /opt/app
EXPOSE 8080
WORKDIR /opt
CMD ./app
I certainly did not discover America with this article, but perhaps it will come in handy for someone.
The source code is available here.