
The story of one fat binary
Hi. My name is Marco, and I'm a systems programmer at Badoo. Here is my translation of a post about Go that I found interesting. Go is often criticized for its fat binaries, yet praised for static linking and for the convenience of shipping a single file. Fat binaries may not be a problem on modern servers, but on embedded systems they still are. The author describes his experience of fighting them in Go.
Small binaries matter for applications that run under very tight resource constraints. In this article we will look at building an agent program that has to run on a variety of low-power devices. Their memory and CPU resources will be small, and I cannot even predict how small.
Go binaries are small and self-contained: when you build a program in Go, you get a single binary file that contains everything it needs. Compare this with platforms such as Java, Node.js, Ruby and Python, where your code is only a small part of the application and everything else is a pile of dependencies that also have to be packaged if you want a self-contained bundle.
Despite the convenience of self-contained binaries, Go has no built-in tool for estimating the size of dependencies, so developers cannot make informed decisions about whether a dependency is worth including.
The gofat tool will help you understand the size of the dependencies in your Go project.
Creating an IoT Agent
I'll talk a little about how we designed and built one of our services: an IoT agent that will be deployed on low-power devices around the world. We will look at its architecture from an operational point of view.
Sample code can be downloaded from here: https://github.com/jondot/fattyproject
First, we need good CLI ergonomics, so let's use kingpin, a POSIX-compatible library for CLI flags and options (I like it so much that I have used it in many of my projects). In fact, I will start from my go-cli-starter project, which already includes this library:
$ git clone https://github.com/jondot/go-cli-starter fattyproject
Cloning into 'fattyproject'...
remote: Counting objects: 55, done.
remote: Total 55 (delta 0), reused 0 (delta 0), pack-reused 55
Unpacking objects: 100% (55/55), done.
Since our program is an agent, it has to run continuously. As an example, we will use a loop that endlessly performs a meaningless operation.
for {
f := NewFarble(&Counter{})
f.Bumple()
time.Sleep(time.Second * 1)
}
During long-term operation all sorts of junk accumulates: small memory leaks, forgotten open file descriptors. Even a tiny leak can grow into a huge one if the application runs non-stop for years. Fortunately, Go has built-in metrics and a way to monitor the health of a process: expvar. This will help a lot when inspecting the agent's internals: since it has to run non-stop for a long time, from time to time we will check its state (CPU usage, garbage collection cycles, and so on). expvar takes care of this for us, and expvarmon is a very convenient tool for watching it.
To use expvar we need a magic import. It is magic because the import itself adds a handler to the default net/http mux. For that handler to be reachable, we need a running HTTP server from net/http.
import (
_ "expvar"
"net/http"
:
:
go func() {
http.ListenAndServe(":5160", nil)
}()
Since our program is turning into a complex service, we can also add a leveled logging library to get information about errors and warnings and to see when the program is running normally. For this we will use zap (from Uber).
import(
:
"go.uber.org/zap"
:
logger, _ := zap.NewProduction()
logger.Info("OK", zap.Int("ip", *ip))
A service that runs non-stop on a remote device that you do not control and most likely cannot update must be extremely robust. So it is worth building some flexibility into it, for example the ability to execute custom commands and scripts, that is, a mechanism for changing the behavior of the service without redeploying or restarting it.
Let's add a way to run an arbitrary remote script. This looks suspicious, but if it is your own agent or service, you can prepare a built-in sandboxed runtime for running the code. The most commonly embedded runtimes are JavaScript and Lua.
We will use otto, an embeddable JS engine.
import(
:
"github.com/robertkrimen/otto"
:
for {
:
vm.Run(`
abc = 2 + 2;
console.log("\nThe value of abc is " + abc); // 4
`)
:
}
If we assume that the content passed to Run comes from outside, we have ended up with a complex, self-updating IoT agent!
Understanding Go binary dependencies
So, where have we ended up?
$ ls -lha fattyproject
... 13M ... fattyproject*
We assume that we need all the dependencies we added, but as a result the binary has grown to about 12 megabytes. That is small compared with other languages and platforms, but given the modest capabilities of IoT hardware it would be wise to reduce the file size and the resource cost.
Let's find out how dependencies are added to our binary.
First, let's examine a well-known binary. GraphicsMagick is a modern variation of the popular image processing system ImageMagick. You probably already have it installed. If not, on OS X you can install it with brew install graphicsmagick.
otool is the OS X alternative to ldd. With it we can inspect a binary and find out which libraries it is linked against.
$ otool -L `which convert`
/usr/local/bin/convert:
/usr/local/Cellar/imagemagick/6.9.3-0_2/lib/libMagickCore-6.Q16.2.dylib (compatibility version 3.0.0, current version 3.0.0)
/usr/local/Cellar/imagemagick/6.9.3-0_2/lib/libMagickWand-6.Q16.2.dylib (compatibility version 3.0.0, current version 3.0.0)
/usr/local/opt/freetype/lib/libfreetype.6.dylib (compatibility version 19.0.0, current version 19.3.0)
/usr/local/opt/xz/lib/liblzma.5.dylib (compatibility version 8.0.0, current version 8.2.0)
/usr/lib/libbz2.1.0.dylib (compatibility version 1.0.0, current version 1.0.5)
/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.5)
/usr/local/opt/libtool/lib/libltdl.7.dylib (compatibility version 11.0.0, current version 11.1.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1226.10.1)
From this list, you can also look up the size of each dependency:
$ ls -lha /usr/l/.../-0_2/lib/libMagickCore-6.Q16.2.dylib
... 1.7M ... /usr/.../libMagickCore-6.Q16.2.dylib
Can we get a reasonably complete picture of any binary this way? Obviously, the answer is no.
By default, Go links dependencies statically. Thanks to this we get a single self-contained binary. But it also means that otool, like any similar tool, will be useless.
$ cat main.go
package main
func main() {
print("hello")
}
$ go build && otool -L main
main:
If we still want to break a Go binary down into its dependencies, we have to use a tool that understands the format of these binaries. Let's look for something suitable.
To get a list of available tools, we will use go tool:
$ go tool
addr2line
api
asm
cgo
compile
cover
dist
doc
fix
link
nm
objdump
pack
pprof
trace
vet
yacc
We can go straight to the source code of these tools. Take nm, for example, and look at its package documentation. I mention this tool deliberately. As it turns out, nm comes very close to what we need, but it is still not enough. It can list symbols and object sizes, but that is of little use if we want a general picture of a binary's dependencies.
$ go tool nm -sort size -size fattyproject | head -n 20
5ee8a0 1960408 R runtime.eitablink
5ee8a0 1960408 R runtime.symtab
5ee8a0 1960408 R runtime.pclntab
5ee8a0 1960408 R runtime.esymtab
4421e0 1011800 R type.*
4421e0 1011800 R runtime.types
4421e0 1011800 R runtime.rodata
551a80 543204 R go.func.*
551a80 543204 R go.string.hdr.*
12d160 246512 T github.com/robertkrimen/otto._newContext
539238 100424 R go.string.*
804760 65712 B runtime.trace
cd1e0 23072 T net/http.init
5e3b80 21766 R runtime.findfunctab
1ae1a0 18720 T go.uber.org/zap.Any
301510 18208 T unicode.init
5e9088 17924 R runtime.typelink
3b7fe0 16160 T crypto/sha512.block
8008a0 16064 B runtime.semtable
3f6d60 14640 T crypto/sha256.block
Although the sizes shown (second column) may be exact for the individual entries, in general we cannot simply add these values up.
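One visible reason is that several symbols in the listing share the same address and size, so a plain sum counts the same region more than once. The sketch below is only an illustration of that effect on the first seven lines of the listing: sumSizes is a hypothetical helper, and counting each start address once is still not a correct size accounting, because regions can also overlap or nest.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample holds the first seven lines of the nm listing shown above.
const sample = `5ee8a0 1960408 R runtime.eitablink
5ee8a0 1960408 R runtime.symtab
5ee8a0 1960408 R runtime.pclntab
5ee8a0 1960408 R runtime.esymtab
4421e0 1011800 R type.*
4421e0 1011800 R runtime.types
4421e0 1011800 R runtime.rodata`

// sumSizes parses "address size type name" lines. It returns the naive sum
// of the size column and the sum when every start address is counted once.
func sumSizes(listing string) (naive, byAddr int64) {
	seen := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(listing))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 4 {
			continue // not a full symbol line
		}
		size, err := strconv.ParseInt(fields[1], 10, 64)
		if err != nil {
			continue // size column is not a decimal number
		}
		naive += size
		if !seen[fields[0]] {
			seen[fields[0]] = true
			byAddr += size
		}
	}
	return naive, byAddr
}

func main() {
	naive, byAddr := sumSizes(sample)
	fmt.Println("naive sum:         ", naive)
	fmt.Println("one per address:   ", byAddr)
}
```

On these seven lines the naive sum is 10877032 bytes, while counting each address once gives 2972208 bytes, so the naive figure is several times too large.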
Gofat
There is one last trick that should work. When you compile your binary, Go generates intermediate files for each dependency before statically linking them into a single file.
Let me introduce gofat, a shell script that combines Go tooling with a few Unix tools. It analyzes the sizes of dependencies in Go binaries:
#!/bin/sh
eval `go build -work -a 2>&1` && find $WORK -type f -name "*.a" | xargs -I{} du -hxs "{}" | gsort -rh | sed -e s:${WORK}/::g
If you are in a hurry, just copy or download this script and make it executable (chmod +x). Then run it without arguments in your project directory to get information about its dependencies.
Let's break this command down:
eval `go build -work -a 2>&1`
The -a flag tells Go to ignore the cache and build the project from scratch, so all dependencies are forcibly rebuilt. The -work flag prints the temporary work directory and keeps it after the build, so that we can inspect it (thanks, Go developers!).
find $WORK -type f -name "*.a" | xargs -I{} du -hxs "{}" | gsort -rh
Then we use find to locate all the *.a files, which are our compiled dependencies. We pass all the lines (file paths) to xargs. This utility runs a command for each line it receives, in our case du, which reports the file size.
Finally, we use gsort (the GNU version of sort) to sort the file sizes in descending order.
sed -e s:${WORK}/::g
We strip the WORK directory prefix everywhere and print a clean line for each dependency.
Now for the most interesting part: what makes up the 12 MB of our binary?
Losing weight
Let's run gofat for the first time on our toy IoT agent project. We get the following output:
2.2M github.com/robertkrimen/otto.a
1.8M net/http.a
1.4M runtime.a
960K net.a
820K reflect.a
788K gopkg.in/alecthomas/kingpin.v2.a
668K github.com/newrelic/go-agent.a
624K github.com/newrelic/go-agent/internal.a
532K crypto/tls.a
464K encoding/gob.a
412K math/big.a
392K text/template.a
392K go.uber.org/zap/zapcore.a
388K github.com/alecthomas/template.a
352K crypto/x509.a
344K go/ast.a
340K syscall.a
328K encoding/json.a
320K text/template/parse.a
312K github.com/robertkrimen/otto/parser.a
312K github.com/alecthomas/template/parse.a
288K go.uber.org/zap.a
232K time.a
224K regexp/syntax.a
224K regexp.a
224K go/doc.a
216K fmt.a
196K unicode.a
192K compress/flate.a
172K github.com/robertkrimen/otto/ast.a
172K crypto/elliptic.a
156K encoding/asn1.a
152K os.a
136K strconv.a
128K os/exec.a
128K github.com/Sirupsen/logrus.a
128K flag.a
112K vendor/golang_org/x/net/http2/hpack.a
104K strings.a
104K net/textproto.a
104K mime/multipart.a
If you experiment, you will notice that with gofat the build time increases significantly. The reason is that we run the build with -a, which rebuilds everything from scratch.
Now we know how much space each dependency takes. Let's roll up our sleeves, analyze and take action.
1.8M net/http.a
Everything related to HTTP handling takes 1.8 MB. Perhaps we can throw it away. We give up expvar; instead, we will periodically dump critical parameters and information about the program's state to a log file. If you do this often enough, everything will be fine.
Update: as of Go 1.8, net/http weighs 2.2 MB.
788K gopkg.in/alecthomas/kingpin.v2.a
388K github.com/alecthomas/template.a
And this is a big surprise: about 1 MB is taken up by a very convenient POSIX-style flag parsing library. You can drop it and use the flag package from the standard library, or even drop flags altogether and read the configuration from environment variables (which will also take up some space).
Newrelic adds another 1.3 MB, so we can drop it too:
668K github.com/newrelic/go-agent.a
624K github.com/newrelic/go-agent/internal.a
Let's throw out Zap too and use the standard logging package:
392K go.uber.org/zap/zapcore.a
Otto, being an embedded JS engine, weighs a lot:
2.2M github.com/robertkrimen/otto.a
312K github.com/robertkrimen/otto/parser.a
172K github.com/robertkrimen/otto/ast.a
At the same time, logrus takes up little space for such a feature-rich logging library:
128K github.com/Sirupsen/logrus.a
You can leave it.
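As a rough cross-check, the archive sizes of the packages singled out above can simply be added up. The sketch assumes that every one of them is removed, otto included, and that the M suffix means 1024 KiB. Archive sizes are rounded and are not the same thing as bytes in the linked binary, so the result is only an estimate.

```go
package main

import "fmt"

// entry is one line of the gofat output quoted above, converted to KiB.
type entry struct {
	Name string
	KiB  float64
}

// dropped lists the archives singled out for removal in this section.
var dropped = []entry{
	{"net/http", 1.8 * 1024},
	{"gopkg.in/alecthomas/kingpin.v2", 788},
	{"github.com/alecthomas/template", 388},
	{"github.com/newrelic/go-agent", 668},
	{"github.com/newrelic/go-agent/internal", 624},
	{"go.uber.org/zap/zapcore", 392},
	{"github.com/robertkrimen/otto", 2.2 * 1024},
	{"github.com/robertkrimen/otto/parser", 312},
	{"github.com/robertkrimen/otto/ast", 172},
}

// totalKiB adds up the sizes of all entries.
func totalKiB(entries []entry) float64 {
	var sum float64
	for _, e := range entries {
		sum += e.KiB
	}
	return sum
}

func main() {
	total := totalKiB(dropped)
	fmt.Printf("%.0f KiB, about %.1f MiB\n", total, total/1024)
}
```

The total comes to roughly 7440 KiB, or a little over 7 MB of archives.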
Conclusion
We found a way to calculate dependency sizes in Go and saved about 7 MB. We also decided not to use certain dependencies and to take their equivalents from the Go standard library instead.
Moreover, if we try hard and experiment with the set of dependencies, we can shrink our binary from the original 12 MB to 1.2 MB.
It is not strictly necessary to do this, because Go dependencies are already small compared with other platforms. But you definitely need tools at hand that help you understand what you are building. And if you are developing software for environments with very limited resources, gofat might be one of those tools.
PS: if you want to experiment more, here is the reference repository: https://github.com/jondot/fattyproject