BratSinot January 18, 2014 at 17:59

[Translation] Arrays, slices (and strings): The 'insert' mechanism

Introduction

One of the most common features of procedural programming languages is the concept of an array. Arrays may seem like something simple, but on the other hand, before adding them to the language, you need to solve several issues, such as:

Fixed or variable size?
Is size part of the type?
What will multidimensional arrays be?
What is the concept of an empty array?

Answers to these questions will define arrays as a simple feature of the language, or as the main part of its design.

In the early days of Go, finding answers to these questions required years of discussion before their concept began to look as we thought fit. The key step was to create the concept of slices , which are based on fixed-size arrays, in order to create a flexible and extensible data structure. However, many newcomers to Go stumble over the principles of slicing, perhaps because their experience with other languages has left its mark.

In this publication, we will try to dispel all these misunderstandings. We will achieve this in pieces explaining how the append function works , and why it works just like that and nothing else.

Arrays

Arrays are an important piece of the Go language, but, like the foundation of the building, it is hidden under more visible parts. We must get you up to speed before moving on to more interesting, powerful, and noticeable slicing features.

Arrays are not often found in Go programs, because the type of an array includes its size, which limits its use.

For example, an ad:

var buffer [256]byte

creates a buffer variable that contains 256 bytes. The buffer variable type includes size and looks like this: [256] byte . An array of 512 bytes will be of type [512] byte .

The data associated with the array is just an array of elements. Schematically, our buffer in memory will look something like this:

buffer: byte byte byte ... 256 times ... byte byte byte

That is, the variable contains 256 bytes of data and nothing more. We can access the elements using the familiar indexing syntax buffer [0] , buffer [1] , and so on to buffer [255] (an index from 0 to 255 covers 256 elements). Attempting to go beyond this range will cause the program to crash.

There is a built-in len function that returns the number of elements in an array, slice, and some other types. Obviously what exactly will return len for the array. In our case, len (buffer) will return 256.

For arrays, you can find your place of application. For example, they are good for transforming matrices, but their most common use in Go is storing slices.

Slices: Slice Header

Slices where something interesting happens, but before you start using them, you will need to understand their need and what they are doing.

A slice is a data structure describing multiple partitioning of an array and stored separately from a variable. A slice is not an array . A slice describes part of an array.

If we take the buffer array from the previous section, we can create a slice that will describe elements from 100 to 150 (to be precise, from 100 to 149 inclusive) by slicing the array:

var slice []byte = buffer[100:150]

In this piece of code, to be precise, we used the full variable declaration. The slice variable is of type [] byte , read as “slice of bytes” and created from the buffer array , by slicing from element 100 (inclusive) to 150 (exclusively). In a more “canonical” syntax, we would omit the type that will be defined during the initialization process:

var slice = buffer[100:150]

And inside the function, we would use a short form of declaration:

slice := buffer[100:150]

What is a slice? This is not a complete description, but from now on, think of a slice as a small structure consisting of two elements: a length and a pointer to an array element. Consider this behind the scenes as something like this:

type sliceHeader struct {
    Length        int
    ZerothElement *byte
}
slice := sliceHeader{
    Length:        50,
    ZerothElement: &buffer[100],
}

Of course, this is just an illustration. Despite this, one can understand from this example that the sliceHeader structure is not accessible to the programmer, and the type of pointer depends on the type of elements, but it makes it possible to understand the basic idea of the mechanics of slicing.

So far, we have used the operation of slicing an array, but we can also slice the slice:

slice2 := slice[5:10]

In exactly the same way as before, this operation creates a new slice, but in this case from elements 5 to 9 (inclusive) relative to the original slice, which means elements 105 to 109 of the original array. The basic structure of sliceHeader variable slice2 will look like this:

slice2 := sliceHeader{
    Length:        5,
    ZerothElement: &buffer[105],
}

Note that the header still points to the base array located in the buffer variable .

We can also re-cut , which means to cut a slice and save the result in the structure of the cut slice. Those. after:

slice = slice[5:10]

the sliceHeader structure for the slice variable will look the same as for the slice2 variable . You will often see overshoot, for example, to cut a slice. In this example, the first and last element of our slice will be omitted:

slice = slice[1:len(slice)-1]

You can often hear from experienced Go programmers about the “slice header” because this is what is stored in the slice variable. For example, when you call a function that takes a slice as an argument, such as bytes.IndexRune , a header will be passed to the function. In this example:

slashPos := bytes.IndexRune(slice, '/')

the slice argument will be passed to the IndexRune function and, in fact, this is just a “slice header”.

There is another data element in the “slice header” that we will talk about below, but first, let's take a look at what the “slice header” means when you write a program that uses slicers.

Passing slices to functions

It is very important to understand that even if a slice contains a pointer, it is a value in itself. Under the hood, this is a structure containing a pointer and a length. This is not a pointer to a structure.

It is important.

When we call IndexRune in the previous example, it takes a copy of the "top of the header." This behavior has an important consequence.

Consider a simple function:

func AddOneToEachElement(slice []byte) {
    for i := range slice {
        slice[i]++
    }
}

She does exactly what is written in the title, i.e. traverses all slice elements (using the for range loop ), increasing its elements.

Try:

func main() {
    slice := buffer[10:20]
    for i := 0; i < len(slice); i++ {
        slice[i] = byte(i)
    }
    fmt.Println("before", slice)
    AddOneToEachElement(slice)
    fmt.Println("after", slice)
}

Despite the fact that the “ slice header ” is passed to the function, it includes a pointer to the elements of the array, so the original slice header and its copy describe the same array. As a result, when the function ends, the changed elements can be seen through the original slice.

The argument to the function is actually a copy, and this example shows this:

func SubtractOneFromLength(slice []byte) []byte {
    slice = slice[0 : len(slice)-1]
    return slice
}
func main() {
    fmt.Println("Before: len(slice) =", len(slice))
    newSlice := SubtractOneFromLength(slice)
    fmt.Println("After:  len(slice) =", len(slice))
    fmt.Println("After:  len(newSlice) =", len(newSlice))
}

Here we see that the contents of the argument can be changed through the function, but not its title . The length is stored in the slice variable and does not change as a result of calling the function, because a copy of the slice header is passed to the function, not the original one. Thus, if we want to write a function that changes the title, we must return it, as we did in the example. The slice variable does not change, but the return value has a new length, which is stored in newSlice .

Slicer Pointers: Production Methods

There is another way to write a function that changes the slice header, and this is passing a pointer to the slice into the function. Here is a variation of our example to demonstrate this feature:

func PtrSubtractOneFromLength(slicePtr *[]byte) {
    slice := *slicePtr
    *slicePtr = slice[0 : len(slice)-1]
}
func main() {
    fmt.Println("Before: len(slice) =", len(slice))
    PtrSubtractOneFromLength(&slice)
    fmt.Println("After:  len(slice) =", len(slice))
}

An example may seem a little awkward, given the extra level of abstraction (temporary variable), but there is one case where you will use pointers to slices. It is accepted to use the pointer as a receiver when writing a method that modifies a slice.

Let's say we wanted a method that eliminates the last slash. We can write it something like this:

type path []byte
func (p *path) TruncateAtFinalSlash() {
    i := bytes.LastIndex(*p, []byte("/"))
    if i >= 0 {
        *p = (*p)[0:i]
    }
}
func main() {
    pathName := path("/usr/bin/tso") // Преобразование из строки в path
    pathName.TruncateAtFinalSlash()
    fmt.Printf("%s\n", pathName)
}

If you run the example, you will see that what is required will happen, that is, the method will change the slice.

On the other hand, if we want to write a method for path that sets the upper case of ASCII characters (no behavior is defined with English letters), the method can operate on a value, not a pointer, because the receiver value still points to the array we need.

type path []byte
func (p path) ToUpper() {
    for i, b := range p {
        if 'a' <= b && b <= 'z' {
            p[i] = b + 'A' - 'a'
        }
    }
}
func main() {
    pathName := path("/usr/bin/tso")
    pathName.ToUpper()
    fmt.Printf("%s\n", pathName)
}

Here the ToUpper method uses two variables in for range in order to use the index of the element and, directly, the slice element itself. This will avoid re-writing to p [i] .

Capacity

Consider the following function, which increases the slice of ints by one element:

func Extend(slice []int, element int) []int {
    n := len(slice)
    slice = slice[0 : n+1]
    slice[n] = element
    return slice
}

Now run:

func main() {
    var iBuffer [10]int
    slice := iBuffer[0:0]
    for i := 0; i < 20; i++ {
        slice = Extend(slice, i)
        fmt.Println(slice)
    }
}

Let's see how the slice grows until ... it grows.

It's time to talk about the third component of the slice header: its capacity . In addition to the pointer to the array and its length, the slice header contains its capacity:

type sliceHeader struct {
    Length        int
    Capacity      int
    ZerothElement *byte
}

The Capacity field contains a record of how much space the array actually occupies - this is the maximum value that Length can reach . An attempt to increase the slice above its capacity will lead to the outflow of the array and cause emergency program termination.

This example creates a slice

slice := iBuffer[0:0]

and its title looks like:

slice := sliceHeader{
    Length:        0,
    Capacity:      10,
    ZerothElement: &iBuffer[0],
}

The Capacity field is equivalent to the length of the original array minus the index of the array element, which is the first slice element (in this case, zero). If you want to know the capacity of the slice, then use the cap function :

if cap(slice) == len(slice) {
    fmt.Println("slice is full!")
}

Make

What if we want to increase the cut more than its capacity? We can not! By definition, capacity is the growth limit. But you can get the same result by creating a new array, copying the data and changing the slice describing the new array.

Let's start by highlighting. We can use the new function to allocate a larger array and as a result of a larger slice, but it will be easier to use the make function . It selects a new array and creates a slice header. The make function has three arguments: the slice type, the initial length and its capacity, where the length of the array is what makeallocates for slice data. As a result, this function call creates a slice of length 10 and the possibility of expanding by 5 (15-10), which you can see by running this:

    slice := make([]int, 10, 15)
    fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice))

This snippet doubles the capacity of our int slice, but leaves the same length:

    slice := make([]int, 10, 15)
    fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice))
    newSlice := make([]int, len(slice), 2*cap(slice))
    for i := range slice {
        newSlice[i] = slice[i]
    }
    slice = newSlice
    fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice))

After that, the slice has much more room for growth before it needs another redistribution.

When creating slices, it often happens that length and capacity are one and the same. The make function has a shortened version. The default length becomes capacity, so you can specify them in a single value. After

gophers := make([]Gopher, 10)

at the gophers slice , the length and capacity will be 10.

Copy

After we doubled the capacity of our slice in the previous section, we rewrote the loop to copy the old data to a new slice. Go has a built-in copy function that simplifies this task. Her arguments are two slices and she copies the data from the right to the left slice. Here is our example rewritten to use copy :

    newSlice := make([]int, len(slice), 2*cap(slice))
    copy(newSlice, slice)

The copy function is smart. It copies only what it can, paying attention to the length of both arguments. In other words, the number of elements to be copied is equal to the minimum of the lengths of both slices. This can save a little "bureaucracy." In addition, copy returns an integer value - the number of elements that were copied, although this is not always worth checking.

The copy function also takes into account cases when the source and the receiver intersect (approx. Trans. This is like memmove () in C), which means that the function can be used to move elements inside one slice. Below is an example of how to use the copy function to paste a value in the middle of a slice.

// Вставляет вставляемое значение в срез по указанному индексу,
// который должен быть из определенного диапазона.
// Срез должен иметь свободное место для нового элемента.
func Insert(slice []int, index, value int) []int {
    // Увеличиваем срез на один элемент
    slice = slice[0 : len(slice)+1]
    // Используем copy для перемещения верхней части среза наружу и создания пустого места
    copy(slice[index+1:], slice[index:])
    // Записываем новое значение.
    slice[index] = value
    // Возвращаем результат.
    return slice
}

There are several points that you may notice in this feature. First, which is obvious, it must return the slice, because its length has changed. Secondly, a convenient contraction is used. Expression

slice[i:]

means the same thing as

slice[i:len(slice)]

In addition, we did not use another trick, we can also leave the first element of the expression empty; by default it will be zero. Thus

slice[:]

It simply means slicing itself, which is useful when slicing an array. This expression is the shortest let’s say: “slice describing all elements of the array”:

array[:]

But that was in between cases, let's try our Insert function .

    slice := make([]int, 10, 20) // Заметим, что capacity > length: есть место для вставки элемента.
    for i := range slice {
        slice[i] = i
    }
    fmt.Println(slice)
    slice = Insert(slice, 5, 99)
    fmt.Println(slice)

Insert: Example

A few sections ago, we wrote an Extend function that extended a slice by one element. It was wrong, if only for the reason that if the capacity of the slice was too small, the function might fail (in our example, the Insert function is subject to the same problem). Now we know how to fix this, so let's write a reliable implementation of the Extend function for integer slices.

func Extend(slice []int, element int) []int {
    n := len(slice)
    if n == cap(slice) {
        // срез полон; должны расти.
        // Мы удвоили его размер и добавили 1, поэтому если размер равен нулю, мы по прежнему можем вырасти
        newSlice := make([]int, len(slice), 2*len(slice)+1)
        copy(newSlice, slice)
        slice = newSlice
    }
    slice = slice[0 : n+1]
    slice[n] = element
    return slice
}

In this case, it is especially important to return the slice, because when we redistributed it, the slice we got describes a completely different array. Here is a small slice to demonstrate what happens if the slice fills up:

    slice := make([]int, 0, 5)
    for i := 0; i < 10; i++ {
        slice = Extend(slice, i)
        fmt.Printf("len=%d cap=%d slice=%v\n", len(slice), cap(slice), slice)
        fmt.Println("address of 0th element:", &slice[0])
    }

Note the redistribution when the initial array of size 5 is filled. The capacity and address of the null element change when a new array is allocated.

Using the robust Extend function as the basis, we can write an even better function that allows us to expand the slice with several elements. To do this, take advantage of Go's ability to turn the list of arguments into an array. That is, we use Go's ability to use a variable number of function arguments.

Let's call the function append . For the first version, we can just call Extend until the function arguments run out. The prototype of the Append function looks like this:

func Append(slice []int, items ...int) []int

This tells us that Append takes one argument - a slice, and it follows from zero to infinity of arguments of type int . These elements are future slices slices, as you can see:

// Append добавляет элементы в срез.
// Первая версия: просто циклически вызываем Extend.
func Append(slice []int, items ...int) []int {
    for _, item := range items {
        slice = Extend(slice, item)
    }
    return slice
}

Notice that the for range loop is executed for each element of the items argument , which is of type [] int . Also, note the use of the empty identifier _, which discards the index, because we do not need it in the loop.

Try it:

    slice := []int{0, 1, 2, 3, 4}
    fmt.Println(slice)
    slice = Append(slice, 5, 6, 7, 8)
    fmt.Println(slice)

Another new trick in this example is that slice initialization is done with a composite literal, which consists of the type and slice elements enclosed in braces:

    slice := []int{0, 1, 2, 3, 4}

The function Append as the same is interesting for another reason. We can not just add elements, we can pass entire slices as arguments to the function using ...:

    slice1 := []int{0, 1, 2, 3, 4}
    slice2 := []int{55, 66, 77}
    fmt.Println(slice1)
    slice1 = Append(slice1, slice2...) // '...' обязательно!
    fmt.Println(slice1)

Of course, we can make Append more efficient by single selection based on Extend :

// Append добавляет элементы в срез.
// Эффективная версия.
func Append(slice []int, elements ...int) []int {
    n := len(slice)
    total := len(slice) + len(elements)
    if total > cap(slice) {
        // Перераспределение. Увеличим размер в 1.5 раза, что позволит нам расти дальше.
        newSize := total*3/2 + 1
        newSlice := make([]int, total, newSize)
        copy(newSlice, slice)
        slice = newSlice
    }
    slice = slice[:total]
    copy(slice[n:], elements)
    return slice
}

Note that here we use copy twice to move the data slice to a new piece of memory and then copy the added items to the end of the old data.

Try it, the behavior is the same as before:

    slice1 := []int{0, 1, 2, 3, 4}
    slice2 := []int{55, 66, 77}
    fmt.Println(slice1)
    slice1 = Append(slice1, slice2...) // '...' обязательно!
    fmt.Println(slice1)

Append: Built-in Function

So, we came to the conclusion that in Go you need to add the built-in function append . It does the same thing as our Append function from the example, with the same efficiency, but works for any type of slicer.

Go's weakness is that any operation defined on a "generic type" must be provided at run time. Someday this may change, but now, in order to simplify work with slices, Go provides a built-in general function append . It works the same as our integer version, but for any type of slice.

Remember that since the slice header is updated every time you call append, you need to save the resulting slice after the call. In fact, the compiler will not allow you to call append without saving the result.

Here are some single-line output. Try them, change and explore:

    // Создаем пару начальных срезов.
    slice := []int{1, 2, 3}
    slice2 := []int{55, 66, 77}
    fmt.Println("Start slice: ", slice)
    fmt.Println("Start slice2:", slice2)
    // Добавляем элемент в срез.
    slice = append(slice, 4)
    fmt.Println("Add one item:", slice)
    // Добавляем один срез в другой.
    slice = append(slice, slice2...)
    fmt.Println("Add one slice:", slice)
    // Делаем копию среза (int).
    slice3 := append([]int(nil), slice...)
    fmt.Println("Copy a slice:", slice3)
    // Копируем срез в конец самого себя.
    fmt.Println("Before append to self:", slice)
    slice = append(slice, slice...)
    fmt.Println("After append to self:", slice)

You should stop and think about the last lines of the example and understand how the design of slices allows you to make such simple calls right.

There are many examples on the Slice Tricks wiki (community-created) using append , copy, and other ways to use slices.

Nil

Furthermore, by using newly acquired knowledge we can understand what a "zero» ( nil ) slice. Naturally, this is the null value of the slice header:

sliceHeader{
    Length:        0,
    Capacity:      0,
    ZerothElement: nil,
}

or simply

sliceHeader{}

The key is that the pointer is also nil . This slice

array[0:0]

has zero length (and maybe even zero capacity), but its pointer is not nil , so this is still not a zero slice.

Obviously, an empty slice can grow (assuming that it is not of zero capacity), but a “nil” ( nil ) slice does not contain an array where you can put values and cannot grow, even to contain at least one element.

It is worth noting that the "zero» ( nil ) cut, in fact, equivalent to slice a zero length, even if he did not for that does not specify. It has zero length and you can add something to it with selection. As an example, look at a few lines above, wherein the slice is copied by adding "zero» ( nil ) slice.

Lines

Now briefly about the lines in Go in the context of slices.

Actually, the strings are very simple: these are read-only byte slices, with a little extra syntactic support from the language.

Since they are read-only, they do not have capacity (you cannot increase them), however in most cases you can treat them as slices of bytes.

For starters, we can use indexing to access individual bytes:

slash := "/usr/ken"[0] // записывает байт '/'

We can slice a string to create a substring:

usr := "/usr/ken"[0:4] // записывает строку "/usr"

It should be obvious what happens behind the curtain when we cut the string.

So we can take a regular byte slice and create a string from it, by a simple conversion:

str := string(slice)

as well as make a slice of bytes from a string:

slice := []byte(usr)

The array underlying the strings is hidden from view, there is no way to access it except through a string. This means that when we do these transformations, a copy of the array must be created. Go takes the job, so don’t worry about it. After any such conversion, changing the array of the underlying byte slice does not affect the corresponding line.

An important consequence of the similar behavior of strings as slices is that substring creation is very efficient. All that is required is the creation of the two tops of the lines. Since the string is read-only, the original string and the string obtained as a result of slicing can safely share the same array.

Historical note: Early string implementations always stood out separately, but since then slices have appeared in the language, which made it possible to create more efficient work with strings. Some benchmarks began to show a huge increase in speed.

Of course, there is much more to tell about the lines, but this topic is for another publication.

Conclusion

Understanding the principles of slicing, helps to understand how they are made. There is a small data structure, the slice heading associated with a slice type variable, and this heading describes part of a separately allocated array. When we create a slice, the header is copied, but the array is always split.

Once you evaluate their work, slices will become not only easy to use, but powerful and expressive, especially with the built-in copy and append functions .

Original publication - blog.golang.org/slices

Tags: