Interplanetary File System - Trivial Hash (Identity), DAG Block, and Protocol Buffers

    I recently added support for a trivial (identity) hash to IPFS. In this article I describe it and show how it can be used.


    As a reminder, the InterPlanetary File System is a new decentralized file-sharing network (HTTP server, Content Delivery Network). I introduced it in the article "Interplanetary File System IPFS".

    Normally, when data passes through a hash function, it is irreversibly "compressed" into a short identifier. That identifier lets you find the data on the network and verify its integrity.


    The trivial hash is the data itself. The data does not change at all and, accordingly, the size of the "hash" is equal to the data size.


    The trivial hash serves the same purpose as a data: URL. The content identifier then contains the data itself instead of a hash. This lets you embed child blocks in the parent, so they are available as soon as the parent is received. You can also put site data directly into a DNS record.


    As an example, let's encode the text string "Привет мир" ("Hello world" in Russian) into a content identifier (CID) with a trivial hash.


    ID structure:


    [base prefix][varint CID version][varint content type][varint hash ID][varint hash length][hash]

    Let's start from the end.


    [hash]


    In our case the trivial hash is the string itself. Convert it to HEX.


    "Привет мир" = 0x"D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"

    This is the HEX of the string in UTF-8 encoding. To make sure the browser treats it as a UTF-8 string, prepend 0xEFBBBF, the byte order mark (BOM).


    0x"EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"

    [varint hash length]


    Now we can calculate the length of the hash. Every two HEX characters are one byte, so the resulting string is 22 bytes long. In HEX that is 0x16.


    Prepend 0x16 to the string.


    0x"16 EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"
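The byte string and its length can be double-checked with a few lines of Python. This is only an illustrative sketch; the variable names are chosen for this example and are not part of IPFS.

```python
# Sketch: build the payload of the identity "hash" for the example string.
BOM_UTF8 = b"\xef\xbb\xbf"  # UTF-8 byte order mark

text = "Привет мир"
payload = BOM_UTF8 + text.encode("utf-8")

print(payload.hex().upper())     # HEX form of the payload
print(len(payload))              # length in bytes
print(f"0x{len(payload):02X}")   # the same length in HEX
```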

    [varint hash ID]


    Now we need the hash identifier. In the multihash table, the trivial hash (identity) has the identifier 0x00.


    Prepend 0x00 to the string.


    0x"00 16 EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"

    This is already the multihash part of the identifier. You could convert the HEX to Base58 and the multihash would be ready, but ipfs does not recognize it outside a content identifier (CID).
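Both the hash ID and the length are varints, so a payload of 128 bytes or more needs more than one length byte. A minimal sketch of the encoding, assuming the usual unsigned varint layout (7 data bits per byte, high bit set on every byte except the last); the function names are hypothetical:

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if value < 0:
        raise ValueError("varint values must be non-negative")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)         # last byte
            return bytes(out)

IDENTITY = 0x00  # identity entry in the multihash table

def identity_multihash(payload: bytes) -> bytes:
    """Multihash layout: [varint hash ID][varint length][payload]."""
    return encode_varint(IDENTITY) + encode_varint(len(payload)) + payload

payload = b"\xef\xbb\xbf" + "Привет мир".encode("utf-8")
multihash = identity_multihash(payload)
print(multihash.hex().upper())
```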


    Let's move on.


    [varint content type]


    Now let's look at the multicodec table to find the content type. In our case it is raw data, whose identifier is 0x55.


    Prepend 0x55 to the string.


    0x"55 00 16 EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"

    [varint CID version]


    We are encoding in the format of the first version of the content identifier (CIDv1), so the version is 0x01.


    Prepend 0x01 to the string.


    0x"01 55 00 16 EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"

    We are almost at the finish line.


    [base prefix]


    The base prefix indicates which binary-to-text encoding is used.


    HEX (F)


    We can use the HEX string directly by prepending the HEX base prefix character "F":


    F01550016EFBBBFD09FD180D0B8D0B2D0B5D18220D0BCD0B8D180

    We now have a HEX content identifier that contains the UTF-8 string "Привет мир".


    Test it: /ipfs/F01550016EFBBBFD09FD180D0B8D0B2D0B5D18220D0BCD0B8D180
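The whole HEX walkthrough fits in one short function. This is a sketch under the assumption that every header field fits in a single byte, which holds for this example; `identity_cid_hex` is a hypothetical helper, not an IPFS API.

```python
CID_V1 = 0x01    # CID version
RAW = 0x55       # multicodec: raw data
IDENTITY = 0x00  # multihash: identity

def identity_cid_hex(payload: bytes, codec: int = RAW) -> str:
    """Return an "F"-prefixed HEX CID for payloads shorter than 128 bytes."""
    if len(payload) >= 0x80 or codec >= 0x80:
        raise ValueError("this sketch only handles single-byte varints")
    header = bytes([CID_V1, codec, IDENTITY, len(payload)])
    return "F" + (header + payload).hex().upper()

payload = b"\xef\xbb\xbf" + "Привет мир".encode("utf-8")
print("/ipfs/" + identity_cid_hex(payload))
```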


    Base58btc (z)


    A base58btc identifier is shorter, so let's convert our HEX string to base58btc. You can use an online converter.


    0x"01 55 00 16 EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180" = "3NDGAEgXCxbPucFFCQc9s5ScqZjqVFNr56P" (base58btc)

    Prepend the base58btc base prefix character "z" to the resulting string:


    z3NDGAEgXCxbPucFFCQc9s5ScqZjqVFNr56P

    We now have a base58btc content identifier that contains the UTF-8 string "Привет мир".


    Test it: /ipfs/z3NDGAEgXCxbPucFFCQc9s5ScqZjqVFNr56P
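Instead of an online converter, the conversion can be done locally. Below is a minimal base58btc encoder, assuming the Bitcoin alphabet and the usual rule that each leading zero byte becomes a "1"; the function name is hypothetical.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58btc_encode(data: bytes) -> str:
    """Encode bytes as base58 using the Bitcoin alphabet."""
    number = int.from_bytes(data, "big")
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        digits = ALPHABET[remainder] + digits
    # Leading zero bytes carry no numeric value, so they are encoded explicitly.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits

cid_bytes = bytes.fromhex("01550016EFBBBFD09FD180D0B8D0B2D0B5D18220D0BCD0B8D180")
cid_text = "z" + base58btc_encode(cid_bytes)  # "z" is the base58btc prefix
print("/ipfs/" + cid_text)
```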


    DAG block


    Plain text is fine, but to encode an HTML page we need to attach its data to a DAG directory block.


    Here is our HTML:


    <b><i><u>Привет мир</u></i></b>

    Following the steps above, we get the base58btc content identifier for this text:


    zeExnPvBXdTRwCBhfkJ1fHFDaXpdW4ghvQjfaCRHYxtQnd3H4w1MPbLczSqyCqVo

    Now we write the JSON file:


    {
        "links": [{
            "Cid": {
                "/": "zeExnPvBXdTRwCBhfkJ1fHFDaXpdW4ghvQjfaCRHYxtQnd3H4w1MPbLczSqyCqVo"
            },
            "Name": "index.html"
        }],
        "data": "CAE="
    }

    1. "data" specifies the type of the DAG block: directory.
    2. "links" is an array of links to files.
    3. "Name" is the corresponding file name.
    4. "Cid" contains the content identifier.
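The "data" value "CAE=" is Base64 for the two bytes 08 01, the serialized unixfs message with Type = Directory. The sketch below rebuilds the same JSON in Python to make that visible; `directory_node` is a hypothetical helper written for this example.

```python
import base64
import json

# Protobuf bytes: key 0x08 (field 1, varint), value 1 (Directory).
UNIXFS_DIRECTORY = bytes([0x08, 0x01])

def directory_node(files: dict) -> dict:
    """Build the JSON structure for a directory; `files` maps name -> CID string."""
    return {
        "links": [{"Cid": {"/": cid}, "Name": name} for name, cid in files.items()],
        "data": base64.b64encode(UNIXFS_DIRECTORY).decode("ascii"),
    }

node = directory_node({
    "index.html": "zeExnPvBXdTRwCBhfkJ1fHFDaXpdW4ghvQjfaCRHYxtQnd3H4w1MPbLczSqyCqVo",
})
print(json.dumps(node, indent=4))
```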

    The command ipfs dag put -f"protobuf" converts the JSON into a DAG block via IPFS.


    I got the multihash: QmXXixn4rCzGguhxQPjXQ8Mr5rdqwZfJTKkeB6DfZLt8EZ


    At this stage we have a block containing a directory with one file embedded in the block itself.


    Next, use this multihash to export the finished block.


    ipfs block get QmXXixn4rCzGguhxQPjXQ8Mr5rdqwZfJTKkeB6DfZLt8EZ > block.dag

    Convert the contents of block.dag to HEX:


    0x"123F0A2F0155002BEFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E120A696E6465782E68746D6C18000A020801"

    Prepend:


    1. CID version (0x01)
    2. DAG Content Type (0x70)
    3. trivial hash (0x00)
    4. data size 69 bytes (0x45)

    0x"01 70 00 45 123F0A2F0155002BEFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E120A696E6465782E68746D6C18000A020801"
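The size byte is easy to get wrong by hand, so it is worth letting a script count it. A sketch assuming the block is shorter than 128 bytes (so its length fits in a single varint byte); in practice the bytes would be read from block.dag rather than pasted as HEX.

```python
DAG_PB = 0x70  # multicodec: MerkleDAG protobuf

block = bytes.fromhex(
    "123F0A2F0155002B"
    "EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D180"
    "3C2F753E3C2F693E3C2F623E"
    "120A696E6465782E68746D6C18000A020801"
)
# Alternative: block = open("block.dag", "rb").read()

if len(block) >= 0x80:
    raise ValueError("longer blocks need a multi-byte varint length")

# [CID version][content type][identity hash][size] followed by the block itself
cid_bytes = bytes([0x01, DAG_PB, 0x00, len(block)]) + block
print(len(block), f"0x{len(block):02X}")
print(cid_bytes.hex().upper())
```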

    Convert to Base58btc and add the prefix "z"


    z6S3Z3W1zuRxio8AJC41jRTdyU9pZWnU6sNbvyGyypEdD8JVNdW42ZmGYWKWGbVDELLvJNWcMspaZMUPZKt7JQmhdyXCqq7j37GL

    We now have a content identifier holding a directory that contains the HTML page index.html with the text "Привет мир".


    Test it: /ipfs/z6S3Z3W1zuRxio8AJC41jRTdyU9pZWnU6sNbvyGyypEdD8JVNdW42ZmGYWKWGbVDELLvJNWcMspaZMUPZKt7JQmhdyXCqq7j37GL


    This identifier can in turn be embedded in another block or written to a DNS dnslink record. This way a small, simple site fits into a single block.


    DAG block and Protocol Buffers


    A DAG block can also be assembled manually. A DAG block is data in the Protocol Buffers format. The outer layer is defined by merkledag.proto, whose Data field holds a message defined by unixfs.proto.


    Protocol buffers


    Every protobuf field starts with a varint field key. The key often occupies a single byte because its value is below 0x80. In our case the first byte is 0x12. The lower 3 bits of the key are the wire type; the remaining bits are the field ID specified in the proto file.


    Length-delimited

    Decode the key:


    0x12 & 0x07 = 2 (Type: Length-delimited)
    0x12 >> 3 = 2 (ID: 2)

    Length-delimited means that the key is immediately followed by the field size in bytes, as a varint, and then by the field contents. This type is used for nested structures as well as raw data (string, bytes, embedded messages, packed repeated fields). Which of these it is depends on the proto file.
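The same bit arithmetic in Python, applied to one length-delimited field taken from the example block (the bytes of the file name). This is a sketch with hypothetical helper names, and it only handles single-byte lengths.

```python
WIRE_TYPES = {0: "Varint", 2: "Length-delimited"}  # only the two types used here

def split_key(key: int):
    """Split a field key into (field ID, wire type)."""
    return key >> 3, key & 0x07

def read_length_delimited(buf: bytes, pos: int):
    """Read a single-byte length at `pos`; return (contents, next position)."""
    length = buf[pos]
    if length >= 0x80:
        raise ValueError("multi-byte lengths are not handled in this sketch")
    start = pos + 1
    return buf[start:start + length], start + length

field = bytes.fromhex("120A696E6465782E68746D6C")  # key, size, contents
field_id, wire_type = split_key(field[0])
contents, end = read_length_delimited(field, 1)
print(field_id, WIRE_TYPES[wire_type], contents.decode("utf-8"))
```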


    Varint

    Now decode a key of another type:


    0x18 & 0x07 = 0 (Type: Varint)
    0x18 >> 3 = 3 (ID: 3)

    Varint means that the value follows immediately as a varint. This wire type is used for many value types (int32, int64, uint32, uint64, sint32, sint64, bool, enum). Which of these it is also depends on the proto file.
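Decoding a varint is a small loop: take the low 7 bits of each byte and stop at the first byte whose high bit is clear. A minimal sketch, with a hypothetical function name:

```python
def decode_varint(buf: bytes, pos: int = 0):
    """Decode an unsigned varint at `pos`; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry data
        if (byte & 0x80) == 0:           # high bit clear: this was the last byte
            return value, pos
        shift += 7

print(decode_varint(bytes([0x7E])))        # a single-byte value
print(decode_varint(bytes([0xAC, 0x02])))  # a two-byte value
print(0x18 >> 3, 0x18 & 0x07)              # field ID and wire type of key 0x18
```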


    Let's parse the block.dag that we converted to HEX above.


    To parse a block, you can use a site that automatically parses any Protocol Buffer without using proto files.


    0x"123F0A2F0155002BEFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E120A696E6465782E68746D6C18000A020801"

    Parse the block and match the identifiers from the proto files.


    merkledag.proto
    // An IPFS MerkleDAG Link
    message PBLink {
      // multihash of the target object
      optional bytes Hash = 1;
      // utf string name. should be unique per object
      optional string Name = 2;
      // cumulative size of target object
      optional uint64 Tsize = 3;
    }
    // An IPFS MerkleDAG Node
    message PBNode {
      // refs to other objects
      repeated PBLink Links = 2;
      // opaque user data
      optional bytes Data = 1;
    }

    unixfs.proto
    message Data {
        enum DataType {
            Raw = 0;
            Directory = 1;
            File = 2;
            Metadata = 3;
            Symlink = 4;
            HAMTShard = 5;
        }
        required DataType Type = 1;
        optional bytes Data = 2;
        optional uint64 filesize = 3;
        repeated uint64 blocksizes = 4;
        optional uint64 hashType = 5;
        optional uint64 fanout = 6;
    }

    12 (Type: 2 (Length-delimited). ID: 2 (PBLink PBNode.Links (merkledag.proto)))
     3F (Size: 63 bytes)
      0A (Type: 2 (Length-delimited). ID: 1 (PBLink.Hash))
       2F (Size: 47 bytes)
        0155002B (CIDv1 Raw Identity 43 bytes)
         EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E
          = "<b><i><u>Привет мир</u></i></b>"
      12 (Type: 2 (Length-delimited). ID: 2 (PBLink.Name))
       0A (Size: 10 bytes)
        696E6465782E68746D6C = "index.html"
      18 (Type: 0 (Varint). ID: 3 (PBLink.Tsize))
       00 (Value: 0)
    0A (Type: 2 (Length-delimited). ID: 1 (PBNode.Data = Data (unixfs.proto)))
     02 (Size: 2 bytes)
      08 (Type: 0 (Varint). ID: 1 (Data.Type))
       01 (1 == Data.DataType.Directory)
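The breakdown above can be reproduced without proto files by a small schema-less parser. This sketch handles only the two wire types that occur in the block; the function names are hypothetical, and the field IDs are matched by hand against merkledag.proto and unixfs.proto.

```python
def read_varint(buf, pos):
    """Decode an unsigned varint at `pos`; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7

def parse_fields(buf):
    """Return a list of (field ID, value) pairs for one protobuf message."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_id, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                     # Varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                   # Length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields.append((field_id, value))
    return fields

block = bytes.fromhex(
    "123F0A2F0155002B"
    "EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D180"
    "3C2F753E3C2F693E3C2F623E"
    "120A696E6465782E68746D6C18000A020801"
)

links = []
unixfs_type = None
for field_id, value in parse_fields(block):
    if field_id == 2:                          # PBNode.Links, a nested PBLink
        link = dict(parse_fields(value))
        links.append((link[2].decode("utf-8"), link[1], link[3]))
    elif field_id == 1:                        # PBNode.Data, a nested unixfs Data
        unixfs_type = dict(parse_fields(value))[1]

for name, cid, tsize in links:
    print(name, cid.hex().upper(), tsize)
print("unixfs type:", unixfs_type)
```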

    Accordingly, a block with two files will look like this:


    12 (Type: 2 (Length-delimited). ID: 2 (PBLink PBNode.Links (merkledag.proto)))
     3B (Size: 59 bytes)
      0A (Type: 2 (Length-delimited). ID: 1 (PBLink.Hash))
       2F (Size: 47 bytes)
        0155002B (CIDv1 Raw Identity 43 bytes)
         EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E
          = "<b><i><u>Привет мир</u></i></b>"
      12 (Type: 2 (Length-delimited). ID: 2 (PBLink.Name))
       06 (Size: 6 bytes)
        312E68746D6C = "1.html"
      18 (Type: 0 (Varint). ID: 3 (PBLink.Tsize))
       00 (Value: 0)
    12 (Type: 2 (Length-delimited). ID: 2 (PBLink PBNode.Links))
     3B (Size: 59 bytes)
      0A (Type: 2 (Length-delimited). ID: 1 (PBLink.Hash))
       2F (Size: 47 bytes)
        0155002B (CIDv1 Raw Identity 43 bytes)
         EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E
          = "<b><i><u>Привет мир</u></i></b>"
      12 (Type: 2 (Length-delimited). ID: 2 (PBLink.Name))
       06 (Size: 6 bytes)
        322E68746D6C = "2.html"
      18 (Type: 0 (Varint). ID: 3 (PBLink.Tsize))
       00 (Value: 0)
    0A (Type: 2 (Length-delimited). ID: 1 (PBNode.Data = Data (unixfs.proto)))
     02 (Size: 2 bytes)
      08 (Type: 0 (Varint). ID: 1 (Data.Type))
       01 (1 == Data.DataType.Directory)

    That is, the PBNode.Links field (0x12) is repeated once for each file to be placed in the block.


    To check, prepend "F 01 70 00" (HEX, CIDv1, DAG, Identity) and the size of the DAG block, "7E" (126 bytes):


    F017000 7E
    12 3B 0A 2F 015500 2B EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E 1206 312E68746D6C 1800
    12 3B 0A 2F 015500 2B EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E 1206 322E68746D6C 1800
    0A 020801

    Check: /ipfs/F0170007E123B0A2F0155002BEFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E1206312E68746D6C1800123B0A2F0155002BEFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E1206322E68746D6C18000A020801
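Assembling such a block by hand is error-prone, so here is a sketch that builds a directory block for any number of embedded files and wraps it in an identity CID. It follows the byte layout shown above (Links first, then Data, Tsize set to 0); all function names are hypothetical and this is not how the ipfs tooling itself is implemented.

```python
def varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

def length_delimited(field_id: int, contents: bytes) -> bytes:
    """Protobuf field with wire type 2: key, size, contents."""
    return varint((field_id << 3) | 2) + varint(len(contents)) + contents

def identity_cid(codec: int, payload: bytes) -> bytes:
    """Binary CIDv1 with the identity hash: version, codec, 0x00, size, payload."""
    return varint(0x01) + varint(codec) + varint(0x00) + varint(len(payload)) + payload

def directory_block(files: dict) -> bytes:
    """Serialize a directory whose files are embedded as raw identity CIDs."""
    block = b""
    for name, content in files.items():
        link = (
            length_delimited(1, identity_cid(0x55, content))  # PBLink.Hash
            + length_delimited(2, name.encode("utf-8"))        # PBLink.Name
            + varint((3 << 3) | 0) + varint(0)                 # PBLink.Tsize = 0
        )
        block += length_delimited(2, link)                     # PBNode.Links
    unixfs = varint((1 << 3) | 0) + varint(1)                  # Data.Type = Directory
    return block + length_delimited(1, unixfs)                 # PBNode.Data

html = b"\xef\xbb\xbf" + "<b><i><u>Привет мир</u></i></b>".encode("utf-8")
block = directory_block({"1.html": html, "2.html": html})
print(len(block))
print("/ipfs/F" + identity_cid(0x70, block).hex().upper())
```

Because every size goes through `varint`, the same sketch also covers blocks of 128 bytes or more, where the size no longer fits in one byte.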


    Conclusion


    I hope this gives you enough information to implement the creation of blocks and identifiers yourself.

