mkevac February 28, 2017 at 18:01

The story of one thick binary

Translation


Hi. My name is Marco, and I'm a systems programmer at Badoo. Here is my translation of a post about Go that I found interesting. Go is often criticized for its fat binaries, yet praised for static linking and for the convenience of deploying a single file. Fat binaries are rarely a problem on modern servers, but on embedded systems they still are. The author describes how he dealt with them in Go.

Small file sizes matter for applications that run under tight resource limits. In this article we will build an agent program that has to run on a variety of low-power devices. Their memory and CPU resources will be small, and I cannot even predict how small.

Go binaries are small and self-contained: when you build a program in Go, you get a single binary file that contains everything you need. Compare this with platforms such as Java, Node.js, Ruby and Python, where your code is only a small part of the application and everything else is a pile of dependencies, which also have to be packaged if you want a self-contained bundle.

Despite such an important convenience as the ability to create self-contained binaries, Go does not have built-in tools to help you estimate the size of dependencies, so that developers can make informed decisions about whether to include these dependencies in a file or not.

The gofat tool will help you understand the size of the dependencies in your Go project.

Creating an IoT Agent

I'll talk a little about how we designed and built one of our services: an IoT agent that will be deployed on low-power devices around the world. We will look at its architecture from an operational point of view.

Sample code can be downloaded from here: https://github.com/jondot/fattyproject

First, we need good CLI ergonomics, so let's use kingpin, a POSIX-compatible library for CLI flags and options (I like this library so much that I have used it in many of my projects). In fact, I will start from my go-cli-starter project, which already includes this library:

$ git clone https://github.com/jondot/go-cli-starter fattyproject
Cloning into 'fattyproject'...
remote: Counting objects: 55, done.
remote: Total 55 (delta 0), reused 0 (delta 0), pack-reused 55
Unpacking objects: 100% (55/55), done.

Since our program is an agent, it has to run continuously. As an example, we will use a loop that endlessly performs a meaningless operation.

for {
    f := NewFarble(&Counter{})
    f.Bumple()
    time.Sleep(time.Second * 1)
}

During long-running operation, all sorts of junk accumulates: small memory leaks, forgotten open file descriptors. Even a tiny leak can grow into a giant one if the application runs non-stop for years. Fortunately, Go has built-in metrics and a way to monitor the health of a program: expvar. This helps a lot when inspecting the agent's internals: since it has to run non-stop for a long time, from time to time we will want to analyze its state, such as CPU usage, garbage collection cycles, and so on. expvar exposes this data for us, and expvarmon is a very convenient tool for watching it.

To use expvar we need a magic import. It is magic because the import itself registers a handler with the default HTTP server mux. For that handler to be reachable, we need a running HTTP server from net/http.

import (
    _ "expvar"
    "net/http"
    // ...
)

// ...
go func() {
    // Serves /debug/vars, registered by the expvar import, on port 5160.
    http.ListenAndServe(":5160", nil)
}()

Since our program is turning into a complex service, we can also add a leveled logging library to capture errors and warnings, and to confirm when the program is working normally. For this we will use zap (from Uber).

import (
    // ...
    "go.uber.org/zap"
    // ...
)

// ...
logger, _ := zap.NewProduction()
logger.Info("OK", zap.Int("ip", *ip))

A service that runs non-stop on a remote device that you do not control, and that most likely cannot be updated, must be extremely stable. So it is advisable to build some flexibility into it: for example, the ability to execute custom commands and scripts, that is, a mechanism for changing the behavior of the service without redeploying or restarting it.

Let's add a way to run an arbitrary remote script. This looks suspicious, but if it is your own agent or service, you can prepare a sandboxed runtime inside it for running such code. The runtimes most often embedded are JavaScript and Lua.

We will use otto, an embedded JS engine.

import (
    // ...
    "github.com/robertkrimen/otto"
    // ...
)

// ...
vm := otto.New()
for {
    // ...
    vm.Run(`
        abc = 2 + 2;
        console.log("\nThe value of abc is " + abc); // 4
    `)
    // ...
}

If we assume that the content passed to Run comes from outside, we now have a complex, self-updating IoT agent!

Understanding Go binary dependencies

So, where have we ended up?

$ ls -lha fattyproject
... 13M ... fattyproject*

We assume that we need all the dependencies we added, but as a result the binary has grown to 12 megabytes. This is small compared to other languages and platforms, but given the modest capabilities of IoT hardware, it would be wise to reduce the file size and the computing resources it consumes.

Let's find out how dependencies are added to our binary.

First, let's examine a well-known binary. GraphicsMagick is a modern variation of the popular image processing system ImageMagick. You probably already have it installed. If not, on OS X you can install it with brew install graphicsmagick.

otool is the OS X alternative to ldd. With it, we can inspect a binary and find out which libraries it is linked against.

$ otool -L `which convert`
/usr/local/bin/convert:
    /usr/local/Cellar/imagemagick/6.9.3-0_2/lib/libMagickCore-6.Q16.2.dylib (compatibility version 3.0.0, current version 3.0.0)
    /usr/local/Cellar/imagemagick/6.9.3-0_2/lib/libMagickWand-6.Q16.2.dylib (compatibility version 3.0.0, current version 3.0.0)
    /usr/local/opt/freetype/lib/libfreetype.6.dylib (compatibility version 19.0.0, current version 19.3.0)
    /usr/local/opt/xz/lib/liblzma.5.dylib (compatibility version 8.0.0, current version 8.2.0)
    /usr/lib/libbz2.1.0.dylib (compatibility version 1.0.0, current version 1.0.5)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.5)
    /usr/local/opt/libtool/lib/libltdl.7.dylib (compatibility version 11.0.0, current version 11.1.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1226.10.1)

From this list you can also look up the size of each dependency:

$ ls -lha /usr/l/.../-0_2/lib/libMagickCore-6.Q16.2.dylib
... 1.7M ... /usr/.../libMagickCore-6.Q16.2.dylib

Can we thus get a fairly complete picture of any binary file? Obviously, the answer is no.

By default, Go links dependencies statically. Thanks to this we get a single self-contained binary. But it also means that otool, like any similar tool, will be useless.

$ cat main.go
package main
func main() {
    print("hello")
}
$ go build && otool -L main
main:

If we still want to break a Go binary down into its dependencies, we will have to use a tool that understands the format of these binaries. Let's look for something suitable.

To get a list of available tools, we will use go tool:

$ go tool
addr2line
api
asm
cgo
compile
cover
dist
doc
fix
link
nm
objdump
pack
pprof
trace
vet
yacc

We can go straight to the source code of these tools. Take nm, for example, and look at its package documentation. I mentioned this tool deliberately: as it turns out, nm comes very close to what we need, but it is still not enough. It can list symbols and object sizes, but that is of little use when we want an overall picture of a binary's dependencies.

$ go tool nm -sort size -size fattyproject | head -n 20
  5ee8a0    1960408 R runtime.eitablink
  5ee8a0    1960408 R runtime.symtab
  5ee8a0    1960408 R runtime.pclntab
  5ee8a0    1960408 R runtime.esymtab
  4421e0    1011800 R type.*
  4421e0    1011800 R runtime.types
  4421e0    1011800 R runtime.rodata
  551a80     543204 R go.func.*
  551a80     543204 R go.string.hdr.*
  12d160     246512 T github.com/robertkrimen/otto._newContext
  539238     100424 R go.string.*
  804760      65712 B runtime.trace
   cd1e0      23072 T net/http.init
  5e3b80      21766 R runtime.findfunctab
  1ae1a0      18720 T go.uber.org/zap.Any
  301510      18208 T unicode.init
  5e9088      17924 R runtime.typelink
  3b7fe0      16160 T crypto/sha512.block
  8008a0      16064 B runtime.semtable
  3f6d60      14640 T crypto/sha256.block

The sizes shown (second column) may be accurate for individual symbols, but we cannot simply add these values up. In the listing, for example, several runtime symbols share the same address and the same size, so a plain sum would count those bytes more than once.
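As a rough illustration of why adding up the nm column is misleading, the sketch below parses five lines copied from the listing above and compares a plain sum with one that counts each address only once. Deduplicating by address is just a simple heuristic chosen for this example, not a complete way to attribute bytes to packages.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample holds a few lines copied from the nm listing in this article.
const sample = `  5ee8a0    1960408 R runtime.eitablink
  5ee8a0    1960408 R runtime.symtab
  5ee8a0    1960408 R runtime.pclntab
  5ee8a0    1960408 R runtime.esymtab
  12d160     246512 T github.com/robertkrimen/otto._newContext`

// sumSizes returns the plain sum of the size column and the sum that
// counts every distinct address only once.
func sumSizes(nmOutput string) (naive, distinct int64) {
	seen := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(nmOutput))
	for sc.Scan() {
		f := strings.Fields(sc.Text()) // address, size, type, name
		if len(f) < 4 {
			continue
		}
		size, err := strconv.ParseInt(f[1], 10, 64)
		if err != nil {
			continue
		}
		naive += size
		if !seen[f[0]] {
			seen[f[0]] = true
			distinct += size
		}
	}
	return naive, distinct
}

func main() {
	naive, distinct := sumSizes(sample)
	fmt.Println(naive, distinct) // prints "8088144 2206920"
}
```

The four runtime symbols describe the same region, so the plain sum reports almost 8 MB for what is roughly 2 MB of data.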

Gofat

There was one last trick left that should work. When you compile your binary, Go generates intermediate files for each dependency before statically linking them into a single file.

Let me introduce gofat, a shell script that combines the Go toolchain with a few Unix tools. It analyzes dependency sizes in Go binaries:

#!/bin/sh
eval `go build -work -a 2>&1` && find $WORK -type f -name "*.a" | xargs -I{} du -hxs "{}" | gsort -rh | sed -e s:${WORK}/::g

If you are in a hurry, just copy or download this script and make it executable (chmod +x). Then run it without arguments in your project's directory to get information about its dependencies.

Let's break this command down:

eval `go build -work -a 2>&1`

The -a flag tells Go to ignore the cache and build the project from scratch, so all dependencies are forcibly rebuilt. The -work flag prints the temporary working directory as a WORK=... line and keeps it on disk; eval turns that line into a shell variable so that we can inspect the directory afterwards (thanks to the Go developers!).

find $WORK -type f -name "*.a" | xargs -I{} du -hxs "{}" | gsort -rh

Then, with find, we locate all the *.a files, which are our compiled dependencies. We pass all the lines (file paths) to xargs. This utility lets you apply a command to each incoming line, in our case du, which reports the file size.

Finally, we use gsort (the GNU version of sort) to sort the sizes in descending order.

sed -e s:${WORK}/::g

We strip the WORK directory prefix everywhere and print a clean line for each dependency.

Now for the most interesting part: what makes up the 12 MB in our binary?

Lose weight

Let's run gofat for the first time on our toy IoT agent project. We get the following data:

2.2M    github.com/robertkrimen/otto.a
1.8M    net/http.a
1.4M    runtime.a
960K    net.a
820K    reflect.a
788K    gopkg.in/alecthomas/kingpin.v2.a
668K    github.com/newrelic/go-agent.a
624K    github.com/newrelic/go-agent/internal.a
532K    crypto/tls.a
464K    encoding/gob.a
412K    math/big.a
392K    text/template.a
392K    go.uber.org/zap/zapcore.a
388K    github.com/alecthomas/template.a
352K    crypto/x509.a
344K    go/ast.a
340K    syscall.a
328K    encoding/json.a
320K    text/template/parse.a
312K    github.com/robertkrimen/otto/parser.a
312K    github.com/alecthomas/template/parse.a
288K    go.uber.org/zap.a
232K    time.a
224K    regexp/syntax.a
224K    regexp.a
224K    go/doc.a
216K    fmt.a
196K    unicode.a
192K    compress/flate.a
172K    github.com/robertkrimen/otto/ast.a
172K    crypto/elliptic.a
156K    encoding/asn1.a
152K    os.a
136K    strconv.a
128K    os/exec.a
128K    github.com/Sirupsen/logrus.a
128K    flag.a
112K    vendor/golang_org/x/net/http2/hpack.a
104K    strings.a
104K    net/textproto.a
104K    mime/multipart.a

If you experiment, you will notice that with gofat the build time increases significantly. The reason is that we run the build in -a mode, in which everything is rebuilt from scratch.

Now we know how much space each dependency takes. Let's roll up our sleeves, analyze, and take action.

1.8M    net/http.a

Everything related to HTTP handling takes 1.8 MB. Perhaps we can throw it out. We drop expvar and instead periodically dump critical parameters and program state to the log file. If we do this often enough, everything will be fine.

Update: with the release of Go 1.8, net/http grew to 2.2 MB.

788K    gopkg.in/alecthomas/kingpin.v2.a
388K    github.com/alecthomas/template.a

And this is a big surprise: about 1 MB is taken up by a very convenient POSIX-style flag parsing feature. We can drop it and use the flag package from the standard library, or even do away with flags entirely and read the configuration from environment variables (which will also take up some space).

Newrelic adds another 1.3 MB, so you can also drop it:

668K    github.com/newrelic/go-agent.a
624K    github.com/newrelic/go-agent/internal.a

Let's throw out zap too and use the standard package for logging:

392K go.uber.org/zap/zapcore.a

Otto, being an embedded JS engine, weighs a lot:

2.2M github.com/robertkrimen/otto.a 
312K github.com/robertkrimen/otto/parser.a 
172K github.com/robertkrimen/otto/ast.a

At the same time, logrus takes up little space for such a feature-rich logging library:

128K    github.com/Sirupsen/logrus.a

You can leave it.

Conclusion

We found a way to measure dependency sizes in Go and saved about 7 MB. We also decided to drop certain dependencies and use their counterparts from the Go standard library instead.

Moreover, if we try hard and experiment with the set of dependencies, we can shrink our binary from the original 12 MB down to 1.2 MB.

It’s not necessary to do this, because the dependencies in Go are already small compared to other platforms. But you definitely need to have tools at hand to help you better understand what you are creating. And if you are developing software for environments with very limited available resources, then one of these tools might be gofat.

PS: if you want to experiment more, here is the reference repository: https://github.com/jondot/fattyproject .
