Secure crypto programming. Part 1

    In this post we would like to introduce Habr users to the basic rules of cryptographic algorithm programming. This set of rules, called the “Cryptography coding standard”, was created in 2013 at the initiative of one of the gurus of modern cryptography, Jean-Philippe Omasson . Despite the fact that the approaches described in it are well known to those who are professionally involved in the development of security systems, for beginners and students, we think it will be interesting to get acquainted with the proposed text, which is a translation of the set of rules from the cryptocoding.net website .

    1. Compare rows of secret data in constant time


    Problem


    Byte comparison of strings can be used to implement attacks that use information about the execution time of the program (so-called timing attacks), for example, to fake authentication codes for MAC messages (see the vulnerability in the Google Keyczar cryptographic library ).

    Built-in implementations of the comparison functionality, such as the memcmp function, the Arrays.equals (Java) method, or the == (Python) operator, usually do not execute in constant time. Consider, for example, the implementation of [memcmp] from Microsoft CRT :

    EXTERN_C int __cdecl memcmp(const void *Ptr1, const void *Ptr2, size_t Count)
    {
        INT v = 0;
        BYTE *p1 = (BYTE *)Ptr1;
        BYTE *p2 = (BYTE *)Ptr2;
        while(Count-- > 0 && v == 0) {
            v = *(p1++) - *(p2++);
        }
        return v;
    }
    

    Decision



    Use comparison functions performed over a fixed time, such as those offered in the NaCL library:

    
    int crypto_verify(const unsigned char *x,const unsigned char *y)
    {
      unsigned int differentbits = 0;
    #define F(i) differentbits |= x[i] ^ y[i];
      F(0)
      F(1)
      F(2)
      F(3)
      F(4)
      F(5)
      F(6)
      F(7)
      F(8)
      F(9)
      F(10)
      F(11)
      F(12)
      F(13)
      F(14)
      F(15)
      return (1 & ((differentbits - 1) >> 8)) - 1; /* returns 0 if equal, 0xFF..FF otherwise */
    }
    

    A more general version of the same approach is as follows:

    
    int util_cmp_const(const void * a, const void *b, const size_t size) 
    {
      const unsigned char *_a = (const unsigned char *) a;
      const unsigned char *_b = (const unsigned char *) b;
      unsigned char result = 0;
      size_t i;
      for (i = 0; i < size; i++) {
        result |= _a[i] ^ _b[i];
      }
      return result; /* returns 0 if equal, nonzero otherwise */
    }
    

    Consider examples of implementing comparison functions and tests for 32-bit values ​​that are executed in a fixed time:

    #include 
    /* Сравнение беззнаковых величин */
    /* Возвращает 1 если условие выполнено, 0 в противном случае*/
    int ct_isnonzero_u32(uint32_t x)
    {
        return (x|-x)>>31;
    }
    int ct_iszero_u32(uint32_t x)
    {
        return 1 ^ ct_isnonzero_u32(x);
    }
    int ct_neq_u32(uint32_t x, uint32_t y)
    {
        return ((x-y)|(y-x))>>31;
    }
    int ct_eq_u32(uint32_t x, uint32_t y)
    {
        return 1 ^ ct_neq_u32(x, y);
    }
    int ct_lt_u32(uint32_t x, uint32_t y)
    {
        return (x^((x^y)|((x-y)^y)))>>31;
    }
    int ct_gt_u32(uint32_t x, uint32_t y)
    {
        return ct_lt_u32(y, x);
    }
    int ct_le_u32(uint32_t x, uint32_t y)
    {
        return 1 ^ ct_gt_u32(x, y);
    }
    int ct_ge_u32(uint32_t x, uint32_t y)
    {
        return 1 ^ ct_lt_u32(x, y);
    }
    /* Сравнение знаковых величин */
    /* Возвращает 1 если условие выполнено, 0 в противном случае*/
    int ct_isnonzero_s32(uint32_t x)
    {
        return (x|-x)>>31;
    }
    int ct_iszero_s32(uint32_t x)
    {
        return 1 ^ ct_isnonzero_s32(x);
    }
    int ct_neq_s32(uint32_t x, uint32_t y)
    {
        return ((x-y)|(y-x))>>31;
    }
    int ct_eq_s32(uint32_t x, uint32_t y)
    {
        return 1 ^ ct_neq_s32(x, y);
    }
    int ct_lt_s32(uint32_t x, uint32_t y)
    {
        return (x^((x^(x-y))&(y^(x-y))))>>31;
    }
    int ct_gt_s32(uint32_t x, uint32_t y)
    {
        return ct_lt_s32(y, x);
    }
    int ct_le_s32(uint32_t x, uint32_t y)
    {
        return 1 ^ ct_gt_s32(x, y);
    }
    int ct_ge_s32(uint32_t x, uint32_t y)
    {
        return 1 ^ ct_lt_s32(x, y);
    }
    /* Generate a mask: 0xFFFFFFFF if bit != 0, 0 otherwise */
    uint32_t ct_mask_u32(uint32_t bit)
    {
        return -(uint32_t)ct_isnonzero_u32(bit);
    }
    /* Возвращает x или y в зависимости от значения параметра bit*/
    /* Эквивалентно: return bit ? x : y */
    uint32_t ct_select_u32(uint32_t x, uint32_t y, uint32_t bit)
    {
        uint32_t m = ct_mask_u32(bit);
        return (x&m) | (y&~m);
        /* return ((x^y)&m)^y; */
    }
    

    These approaches do not give any guarantees: C and C ++ use the “as if” rule , which allows the compiler to implement operations arbitrarily, if the observed behavior of the program (runtime is not considered observable behavior in both languages) remains unchanged. For example, consider the following ct_select_u32 option:

    uint32_t ct_select_u32(uint32_t x, uint32_t y, _Bool bit)
    {
        uint32_t m = ct_mask_u32(bit);
        return (x&m) | (y&~m);
    }
    

    If we now compile this code with the parameters clang-3.5 -O2 -m32 -march = i386, we get the result:

    ct_select_u32:                          
    	mov	al, byte ptr [esp + 12]
    	test	al, al
    	jne	.LBB0_1
    	lea	eax, dword ptr [esp + 8]
    	mov	eax, dword ptr [eax]
    	ret
    .LBB0_1:
    	lea	eax, dword ptr [esp + 4]
    	mov	eax, dword ptr [eax]
    	ret
    

    Due to the uneven speed of execution of conditional branch instructions, this code can potentially reveal the selected secret value. Since the compiler has almost unlimited freedom in generating code that can be executed at different times, it is necessary to check the final assembly for execution for a fixed time.

    Avoid secret program branching


    Problem


    If the conditional branching of the program (if, switch, while, for) depends on the secret data, then the executable code, as well as the time of its execution, depends on the secret data.

    A classic example of exploiting this vulnerability is timing attacks on exponential algorithms based on squaring and multiplication (or doubling and addition for algorithms using elliptic curves).

    A particular case of this problem are cycles in which the boundary value of the counter depends on the secret value.

    Decision


    Leaks in the execution time of the program can be counteracted by inserting dummy operations in the program branches in order to guarantee a fixed time for its execution. However, it is much more reliable to completely avoid branching, for example, by implementing conditional statements in the form of a straightforward program. For example, a choice of two inputs | a | and | b | depending on the control bit | bit | can be implemented as follows:

    unsigned select (unsigned a, unsigned b, unsigned bit) 
    {
        /* -0 = 0, -1 = 0xff....ff */
        unsigned mask = - bit;
        unsigned ret = mask & (a^b);
        ret = ret ^ a;
        return ret;
    }
    

    For Intel processors, a faster solution is possible, which is based on the use of CMOV conditional branch instructions .

    Avoid accessing secret sensitive data table


    Problem


    The access time to a table element may vary depending on the value of its index (for example, if the element is not in the cache). This effect was used in a number of cache attacks on AES.

    Decision


    Replace table calls with a sequence of logical operations with constant execution time, for example, using the bit-slice technique.

    Avoid cycles in which the limit value of the counter depends on the secret value


    Problem


    A loop with a counter boundary value, which depends on the secret value, directly makes the program vulnerable to attacks by runtime. For example, the implementation of the Montgomery ladder (exponentiation algorithm) in OpenSSL 0.9.8o allowed the logarithm (secret random number) to leak in the ECDSA algorithm, which could be used to obtain the TLS server secret key. The corresponding program code is presented below. Here | scalar | Is a secret random number, and | scalar-> d | - pointer to an array of its bits:

    /* find top most bit and go one past it */
    i = scalar -> top - 1; j = BN_BITS2 - 1;
    mask = BN_TBIT ;
    while (!( scalar -> d[i] & mask )) { mask >>= 1; j --; }
    mask >>= 1; j - -;
    /* if top most bit was at word break , go to next word */
    if (! mask )
    {
      i - -; j = BN_BITS2 - 1;
      mask = BN_TBIT ;
    }
    for (; i >= 0; i - -)
    {
      for (; j >= 0; j - -)
      {
        if ( scalar ->d[ i] & mask )
        {
          if (! gf2m_Madd ( group , & point ->X , x1 , z1 , x2 , z2 , ctx )) goto err ;
          if (! gf2m_Mdouble ( group , x2 , z2 , ctx )) goto err ;
        }
        else
        {
          if (! gf2m_Madd ( group , & point ->X , x2 , z2 , x1 , z1 , ctx )) goto err ;
          if (! gf2m_Mdouble ( group , x1 , z1 , ctx )) goto err ;
        }
        mask >>= 1;
      }
      j = BN_BITS2 - 1;
      mask = BN_TBIT;
    }
    

    Decision


    Make sure that the number of iterations in all loops is limited by a constant (or at least some unclassified variable).

    In particular, make sure, as far as possible, that the boundaries of the cycles and situations in which the counter values ​​exceed these limits or do not reach them do not depend on user-controlled input data (doesn’t anyone need your own Heartbleed ?).

    To be continued…

    Also popular now: