Finger trees (Part 1. Presentation)

Original author: Andrew Gibiansky
  • Transfer
  • Tutorial
Recently there was an article on Habrice about how you can create structures in a functional language such as Queue (the first went in, the first went out) and Dec (resembles a two-way stack - the first went in, the first came out from both ends). I looked at this code and realized that it was terribly inefficient - the complexity of the order O(n). I quickly figured out how to create a structure with which O(1)I did not succeed, so I opened the library implementation code. But there was not an easy and understandable implementation, but <много кода>. This was a description of finger trees, the necessity and elegance of which for this data structure is well disclosed in the current article.

Finger trees

In this article we will consider finger trees. These are functional, immutable general-purpose data structures developed by Ginze and Patterson. Finger-trees provide a functional data structure Sequence ( sequence), which provides amortized access constant in time for addition to both the beginning and the end of the sequence, as well as the logarithmic time for concatenation and for random access. In addition to good asymptotic execution time, the data structure is incredibly flexible: in combination with monoidal tags on the elements, finger trees can be used to implement effective random access sequences, ordered sequences, interval trees, and priority queues.

The article will consist of 3 parts:

Finger trees (Part 1. Presentation)
Finger trees (part 2. Operations)
Finger trees (Part 3. Application)

Designing a data structure

The basis and motivation of finger trees came from 2-3 trees. 2-3 trees are trees that can have two or three branches at each inner vertex and which have all their leaves on the same level. While a binary tree of the same depth dshould have 2 d leaves, 2-3 trees are much more flexible, and can be used to store any number of elements (the number should not be a power of two).
Consider the following 2-3 tree:

This tree holds fourteen elements. Access to any of them requires three steps, and if we were to add more elements, the number of steps for each of them would grow logarithmically. We would like to use these trees to model the sequence. However, in many applicable sequences, very often and repeatedly refer to the beginning or end, and much less often to the middle. To satisfy this wish, we can change this data structure so that the priority of access to the beginning and the end is the highest, unlike other features.

In our case, we add two fingers. The finger is just the point at which you can access part of the data structure, in imperative languages ​​this would be just a pointer. In our case, however, we will restructure the entire tree and make the parents of the first and last children the two roots of our tree. Visually, considering the question of changing the tree above, grab the first and last nodes on the penultimate layer, and drag them up, allowing the rest of the tree to hang:

This new data type is known as the finger tree. The finger tree consists of several layers (circled in blue), which are strung on the axis shown in brown

Each finger tree layer has a prefix (on the left) and a suffix (on the right), as well as a link to a further hike inland. The prefix and suffix contain the values ​​of the finger tree: on the first layer they contain the values ​​of 2-3 trees of depth 0, on the second layer they contain 2-3 trees of depth 1, on the third layer they contain 2-3 trees of depth 2 and so on. The main element of this 2-3 tree is now the element at the very bottom.

Well, having a description, let's describe the data structure. To begin with, we must determine the structure of 2-3 trees that will need to be used to preserve things strung on an axis.

data Node a = Branch3 a a a    -- Ветка (node) может содержать 3х детей.
            | Branch2 a a      -- ... или только 2х детей.
            deriving Show

Note that the branch is parameterized for its children. This allows you to have nested branches for representing 2-3 trees and guarantee the same depth. For example, a 2-3 tree with a depth of 1 can be Node Char:
> -- 2-3 деревья со 2го суффиксного слоя экземпляра пальчикового дерева.
> Branch3 'n' 'o' 't'
Branch3 'n' 'o' 't'
> Branch2 'a' 't'
Branch2 'a' 't'

However, we can also create deeper 2-3 trees. For example, 2-3 tree, depth 2 may be Node (Node Char):

> Branch2 (Branch3 'n' 'o' 't') (Branch2 'a' 't')
Branch2 (Branch3 'n' 'o' 't') (Branch2 'a' 't')

We notice that the mapping guarantees that 2-3 trees will be of the same depth, because the depth is present in the type of tree. This also has its drawbacks, since it is more difficult to write functions that are parametric with respect to the tree depth parameter. But this is not so bad for our case.
To further our convenience, add a few methods that will enable us to transform the values of length of 2 branches from the lists, or 3. To do this, we will use the extension OverloadedLists for the GHC , which will allow us to write functions fromListand toListfor different types of data, and then use them for comparison with the samples if we use lists:

{- LANGUAGE OverloadedLists, TypeFamilies -}
import GHC.Exts (IsList(..))
instance IsList (Node a) where
  type Item (Node a) = a
  toList (Branch2 x y)   = [x, y]
  toList (Branch3 x y z) = [x, y, z]
  fromList [x, y]    = Branch2 x y
  fromList [x, y, z] = Branch3 x y z
  fromList _ = error "Node must contain two or three elements"

Now that we have our type of 2-3 tree, we also need a type to save the prefix and suffixes that are strung on the axis of the finger tree. If our finger tree is a complete analogy of 2-3 trees, then every very first prefix and suffix can have 2 or 3 elements, and the middle ones can only have 1 or 2 (because one of the links goes up one level along the axis). However, to reduce information content, the requirement is weakened for finger trees, and, instead, each prefix and suffix contain from 1 to 4 elements. There can be no more values. We could allow us to save the prefix and suffix as lists, but instead we will use more selective constructors, each of which is responsible for its own correct number of elements:

-- Параметризуем аффикс типом данных, который он сохраняет
-- Тип эквивалентен спискам длиной от 1 до 4
data Affix a = One a
             | Two a a
             | Three a a a
             | Four a a a a
             deriving Show

Working with this data type is not so convenient, so we will quickly add helper functions that allow you to work with affixes as lists.

-- Работам с аффиксами как со списками
instance IsList (Affix a) where
  type Item (Affix a) = a
  toList (One x)        = [x]
  toList (Two x y)      = [x, y]
  toList (Three x y z)  = [x, y, z]
  toList (Four x y z w) = [x, y, z, w]
  fromList [x]          = One x
  fromList [x, y]       = Two x y
  fromList [x, y, z]    = Three x y z
  fromList [x, y, z, w] = Four x y z w
  fromList _ = error "Affix must have one to four elements"
-- Следующая функция может быть куда более эффективной
-- Мы же будем использовать самую простую реализацию на сколько возможно
affixPrepend :: a -> Affix a -> Affix a
affixPrepend x = fromList . (x :) . toList
affixAppend :: a -> Affix a -> Affix a
affixAppend x = fromList . (++ [x]) . toList

Now that we have determined the type of data needed to store the values ​​(2-3 trees that store values ​​and affixes attached to the axis), we can create an axial data structure. This axial structure is what we call the finger tree, and it is defined as follows:

-- Как обычно, параметрезируется тип тем типом, который он сохраняет в пальчиковом дереве
data FingerTree a 
  = Empty      -- Дерево может быть пустым
  | Single a   -- Дерево может содержать единственное значение, удобно это записать в отдельный случай
  -- Общий случай с префиксом, суффиксом и ссылкой на вглубь дерева
  | Deep {
    prefix :: Affix a,             -- Значения слева
    deeper :: FingerTree (Node a), -- Более глубокая часть пальчикового дерева, хранящая 2-3 деревья
    suffix :: Affix a              -- Значения справа
  deriving Show

In the definition above, a deep field FingerTree ais of type FingerTree (Node a). This means that the values ​​stored on the next layer are 2-3 trees that are one level deeper. So, affixes of the first layer are FingerTree Charstored only Char, the second layer is stored FingerTree (Node Char)and has affixes that store 2-3 trees with a depth of 1 ( Node Char). The third layer will already be FingerTree (Node (Node Char))and has affixes that store 2-3 trees with a depth of 2 ( Node (Node Char)).

Now that we have determined our type of finger tree, we will spend a little more time and consider the example that was shown above in order to understand how we translate it into a structure FingerTree Char:

Transferring this to a tree, we get:

layer3 :: FingerTree a
layer3 = Empty
layer2 :: FingerTree (Node Char)
layer2 = Deep prefix layer3 suffix
    prefix = [Branch2 'i' 's', Branch2 'i' 's']
    suffix = [Branch3 'n' 'o' 't', Branch2 'a' 't']
layer1 :: FingerTree Char
layer1 = Deep prefix layer2 suffix
    prefix = ['t', 'h']
    suffix = ['r', 'e', 'e']
exampleTree :: FingerTree Char
exampleTree = layer1

> exampleTree
Deep {prefix = Two 't' 'h', deeper = Deep {prefix = Two (Branch2 'i' 's') (Branch2 'i' 's'), deeper = Empty, 
suffix = Two (Branch3 'n' 'o' 't') (Branch2 'a' 't')}, suffix = Three 'r' 'e' 'e'}

In the next part of the article, we will learn how to easily work with finger trees as sequences.

Also popular now: