Hardware support for the AES algorithm in modern processors

    In 2008, Intel proposed new x86 instructions that add hardware support for the symmetric AES (Advanced Encryption Standard) encryption algorithm. AES is currently one of the most popular block ciphers, so a hardware implementation should speed up programs that use it (OpenSSL, The Bat, TrueCrypt, ...). The new instruction set extension is called AES-NI. It contains the following instructions:
    • AESENC - perform one round of AES encryption,
    • AESENCLAST - perform the last round of AES encryption,
    • AESDEC - perform one round of AES decryption,
    • AESDECLAST - perform the last round of AES decryption,
    • AESKEYGENASSIST - assist in generating AES round keys,
    • AESIMC - Inverse MixColumns, used to convert round keys for decryption.

    Since much has already been said about the AES encryption algorithm itself, in this post we will look at how to use these instructions.

    First, let us recall how AES works; this is needed to understand what these instructions actually do.
    AES is built from four transformations of the 16-byte state:
    1. AddRoundKey - XOR (exclusive OR) of the state with the round key,
    2. SubBytes - substitution of every byte through a fixed table (the S-box),
    3. ShiftRows - cyclic shift of each row of the 4x4 state by a row-dependent offset,
    4. MixColumns - mixing of the four bytes within each column.
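    For reference, the four transformations can be sketched in portable C. This is a minimal illustration, not a usable implementation. All names (gf_mul, sbox, sub_bytes, ...) are hypothetical. The S-box is computed on the fly from its FIPS-197 definition (multiplicative inverse in GF(2^8) followed by an affine map) instead of being read from a table. The 16-byte state is stored column by column, as in FIPS-197.

```c
#include <stdint.h>
#include <string.h>

/* Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11b). */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return p;
}

/* S-box by definition: inverse in GF(2^8) (0 maps to 0), then the affine map.
   The brute-force inverse is for illustration only; real code uses a table. */
static uint8_t sbox(uint8_t x) {
    uint8_t inv = 0;
    for (int y = 1; x != 0 && y < 256; y++)
        if (gf_mul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    uint8_t s = inv;
    for (int i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));  /* rotate left by i */
    return (uint8_t)(s ^ 0x63);
}

/* Hypothetical names. State layout as in FIPS-197: byte r + 4*c is row r, column c. */

/* 1. AddRoundKey: XOR the state with the round key. */
static void add_round_key(uint8_t s[16], const uint8_t k[16]) {
    for (int i = 0; i < 16; i++) s[i] ^= k[i];
}

/* 2. SubBytes: replace every byte through the S-box. */
static void sub_bytes(uint8_t s[16]) {
    for (int i = 0; i < 16; i++) s[i] = sbox(s[i]);
}

/* 3. ShiftRows: rotate row r to the left by r positions. */
static void shift_rows(uint8_t s[16]) {
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = s[r + 4 * ((c + r) % 4)];
    memcpy(s, t, 16);
}

/* 4. MixColumns: multiply each column by {03}x^3 + {01}x^2 + {01}x + {02}. */
static void mix_columns(uint8_t s[16]) {
    for (int c = 0; c < 4; c++) {
        uint8_t *col = s + 4 * c;
        uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
        col[0] = (uint8_t)(gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3);
        col[1] = (uint8_t)(a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3);
        col[2] = (uint8_t)(a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3));
        col[3] = (uint8_t)(gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2));
    }
}
```

    You can check the functions against the first round of the FIPS-197 Appendix B example. Apply SubBytes, ShiftRows, MixColumns and AddRoundKey to the round-1 input state with the round-1 key, and the result should equal the state listed for the start of round 2.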

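    The encryption algorithm itself works as follows. The block is first XORed with the initial round key (AddRoundKey). Then Nr - 1 full rounds of SubBytes, ShiftRows, MixColumns and AddRoundKey are applied. A final round omits MixColumns. Nr is 10, 12 or 14 for 128-, 192- and 256-bit keys respectively.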

    Getting started

    First, make sure that the processor supports the AES-NI extension. This is done with the CPUID instruction. When executed with eax = 0x00000001, it reports the supported extensions as individual bits in its output registers. AES-NI is reported in bit 25 of the ECX register:
    AES-NI verification code:
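    		mov eax, 0x00000001   ; CPUID leaf 1: processor feature flags
    		cpuid                 ; flags are returned in ECX and EDX (EBX is overwritten too)
    		test ecx, 0x2000000   ; bit 25 of ECX = AES-NI
    		jz L_no_AES           ; bit clear: AES-NI is not available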
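    The same check can be done from C. The sketch below assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid. MSVC offers __cpuid in <intrin.h> instead. The function name cpu_has_aesni is hypothetical, and on non-x86 targets it simply reports false.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang-specific header */
#endif

/* Hypothetical helper: true if CPUID leaf 1 reports AES-NI (ECX bit 25). */
static bool cpu_has_aesni(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                  /* leaf 1 not supported */
    return (ecx & (1u << 25)) != 0;    /* same mask as 0x2000000 above */
#else
    return false;                      /* not an x86 processor */
#endif
}
```

    A program would call this once at startup and fall back to a software AES implementation when it returns false.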

    If the bit is set to 1, then we can move on to encryption.

    Key Expansion / ExpandKey

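    The key expansion algorithm from FIPS-197 in pseudocode looks like this (Nb = 4 is the block size in 32-bit words, Nk is the key length in words, Nr is the number of rounds):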
    KeyExpansion(byte key[4*Nk], word w[Nb*(Nr+1)], Nk)
        word temp
        i = 0;
        while ( i < Nk)
            w[i] = word(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3])
            i = i+1
        end while
        i = Nk
        while ( i < Nb * (Nr+1))
            temp = w[i-1]
            if (i mod Nk = 0)
                temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
            else if (Nk > 6 and i mod Nk = 4)
                temp = SubWord(temp)
            end if
            w[i] = w[i-Nk] xor temp
            i = i + 1
        end while
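    A direct C transcription of this pseudocode might look like the following. It is a sketch for illustration only. The word type, key_expansion and the on-the-fly sbox are hypothetical names. Nb is fixed at 4 and Nr = Nk + 6. Rcon[i/Nk] is produced by doubling in GF(2^8) (01, 02, 04, ..., 80, 1b, 36) rather than read from a table.

```c
#include <stdint.h>

/* Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return p;
}

/* S-box by definition (brute-force inverse, then affine map); illustration only. */
static uint8_t sbox(uint8_t x) {
    uint8_t inv = 0;
    for (int y = 1; x != 0 && y < 256; y++)
        if (gf_mul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    uint8_t s = inv;
    for (int i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
    return (uint8_t)(s ^ 0x63);
}

/* A 32-bit word kept as 4 bytes in key order: word(key[4i], ..., key[4i+3]). */
typedef struct { uint8_t b[4]; } word;

/* Hypothetical transcription of KeyExpansion for Nk = 4, 6 or 8.
   w must have room for Nb * (Nr + 1) = 4 * (Nk + 7) words. */
static void key_expansion(const uint8_t *key, word *w, int nk) {
    const int nb = 4, nr = nk + 6;
    uint8_t rcon = 0x01;  /* first byte of Rcon[i/Nk]; the other three bytes are 0 */
    for (int i = 0; i < nk; i++)
        for (int j = 0; j < 4; j++)
            w[i].b[j] = key[4 * i + j];
    for (int i = nk; i < nb * (nr + 1); i++) {
        word temp = w[i - 1];
        if (i % nk == 0) {
            uint8_t first = temp.b[0];          /* RotWord: [a0,a1,a2,a3] -> [a1,a2,a3,a0] */
            temp.b[0] = temp.b[1];
            temp.b[1] = temp.b[2];
            temp.b[2] = temp.b[3];
            temp.b[3] = first;
            for (int j = 0; j < 4; j++)
                temp.b[j] = sbox(temp.b[j]);    /* SubWord */
            temp.b[0] ^= rcon;                  /* xor Rcon[i/Nk] */
            rcon = gf_mul(rcon, 0x02);          /* next Rcon value */
        } else if (nk > 6 && i % nk == 4) {
            for (int j = 0; j < 4; j++)
                temp.b[j] = sbox(temp.b[j]);    /* extra SubWord for 256-bit keys */
        }
        for (int j = 0; j < 4; j++)
            w[i].b[j] = (uint8_t)(w[i - nk].b[j] ^ temp.b[j]);
    }
}
```

    With the 128-bit key from FIPS-197 Appendix A.1 (2b7e1516 28aed2a6 abf71588 09cf4f3c), the expected values are w[4] = a0fafe17 and w[43] = b6630ca6.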

    In hardware, key expansion is assisted by the AESKEYGENASSIST instruction, which does the following:
    AESKEYGENASSIST xmm1, xmm2/m128, imm8
    Tmp := xmm2/m128
    X3[31-0] := Tmp[127-96]
    X2[31-0] := Tmp[95-64]
    X1[31-0] := Tmp[63-32]
    X0[31-0] := Tmp[31-0]
    RCON[7-0] := imm8
    RCON[31-8] := 0
    xmm1 := [RotWord(SubWord(X3)) XOR RCON, SubWord(X3), RotWord(SubWord(X1)) XOR RCON, SubWord(X1)]   (dwords listed from bits 127-96 down to 31-0)
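    To make the byte order concrete, here is a hypothetical portable model of what the instruction computes. It treats an xmm register as 16 bytes, with byte 0 the least significant. Dwords are little-endian, so RotWord of a dword (a rotate right by 8 bits) turns its bytes b0 b1 b2 b3 into b1 b2 b3 b0. This is the same as the FIPS-197 RotWord applied to key bytes in memory order. X0 and X2 are not used.

```c
#include <stdint.h>

/* Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return p;
}

/* S-box by definition (brute-force inverse, then affine map); illustration only. */
static uint8_t sbox(uint8_t x) {
    uint8_t inv = 0;
    for (int y = 1; x != 0 && y < 256; y++)
        if (gf_mul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    uint8_t s = inv;
    for (int i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
    return (uint8_t)(s ^ 0x63);
}

/* Hypothetical model of AESKEYGENASSIST xmm1, xmm2, imm8 on 16 bytes
   (byte 0 = bits 7-0). Dword Xk occupies bytes 4k..4k+3. */
static void keygenassist_model(uint8_t out[16], const uint8_t in[16], uint8_t imm8) {
    for (int half = 0; half < 2; half++) {
        const uint8_t *x = in + 4 + 8 * half;  /* X1 (half 0) or X3 (half 1) */
        uint8_t *d = out + 8 * half;           /* low or high 64 bits of the result */
        for (int j = 0; j < 4; j++)
            d[j] = sbox(x[j]);                 /* SubWord(X) */
        for (int j = 0; j < 4; j++)
            d[4 + j] = sbox(x[(j + 1) % 4]);   /* RotWord(SubWord(X)): bytes b1 b2 b3 b0 */
        d[4] ^= imm8;                          /* XOR RCON = imm8 zero-extended */
    }
}
```

    Take the FIPS-197 Appendix A.1 key with imm8 = 1. The top dword of the result is then 8b 84 eb 01, which is the value FIPS-197 lists as temp "after XOR with Rcon" when computing w[4].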

    As you can see, the instruction does not perform the final step:
    w[i] = w[i-Nk] xor temp

    You have to do these XORs yourself with SSE2 instructions (pshufd, pslldq, pxor).
    Key expansion example (128-bit key)
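    aeskeygenassist xmm2, xmm1, 0x1 ; round 1: xmm1 = previous round key, RCON = 0x01
    pshufd xmm2, xmm2, 0xff         ; broadcast temp = RotWord(SubWord(w3)) xor RCON
    movdqa xmm3, xmm1               ; xmm3 = [w0, w1, w2, w3] (dwords listed low to high)
    pxor xmm2, xmm3                 ; dword 0 = w4 = w0 xor temp
    pshufd xmm2, xmm2, 0x00         ; xmm2 = [w4, w4, w4, w4]
    pshufd xmm3, xmm3, 0x39         ; xmm3 = [w1, w2, w3, w0]
    pslldq xmm3, 0x4                ; xmm3 = [0, w1, w2, w3]
    pxor xmm2, xmm3                 ; dword 1 = w5 = w4 xor w1
    pshufd xmm2, xmm2, 0x14         ; xmm2 = [w4, w5, w5, w4]
    pshufd xmm3, xmm3, 0x38         ; xmm3 = [0, w2, w3, 0]
    pslldq xmm3, 0x4                ; xmm3 = [0, 0, w2, w3]
    pxor xmm2, xmm3                 ; dword 2 = w6 = w5 xor w2
    pshufd xmm2, xmm2, 0xA4         ; xmm2 = [w4, w5, w6, w6]
    pshufd xmm3, xmm3, 0x34         ; xmm3 = [0, 0, w3, 0]
    pslldq xmm3, 0x4                ; xmm3 = [0, 0, 0, w3]
    pxor xmm2, xmm3                 ; dword 3 = w7 = w6 xor w3; xmm2 = round key 1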
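    The same step is often written with C intrinsics. The sketch below assumes an x86-64 GCC or Clang compiler. It uses the target attribute so the file builds without -maes, but the caller must still check CPUID before running it. Instead of the pshufd/pslldq chain above, it builds the running XOR w0, w0^w1, ... with two shifts. This is a shorter variant of the three-shift sequence in Intel's AES-NI white paper. expand_step, EXPAND and aes128_expand_key are hypothetical names.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_shuffle_epi32, _mm_slli_si128, _mm_xor_si128 */
#include <wmmintrin.h>  /* AES-NI: _mm_aeskeygenassist_si128 */

/* Assumes x86-64 with GCC or Clang; lets the file build without -maes. */
#define AESNI_FN static __attribute__((target("aes,sse2")))

/* w[i] = w[i-4] xor temp, for all four words of the next round key. */
AESNI_FN __m128i expand_step(__m128i key, __m128i assist) {
    assist = _mm_shuffle_epi32(assist, 0xff);          /* broadcast RotWord(SubWord(w3)) ^ RCON */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));  /* [w0, w0^w1, w1^w2, w2^w3] */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 8));  /* [w0, w0^w1, w0^w1^w2, w0^w1^w2^w3] */
    return _mm_xor_si128(key, assist);
}

/* The AESKEYGENASSIST immediate must be a compile-time constant,
   so each round is written out instead of looping over RCON values. */
#define EXPAND(i, rcon) \
    rk[i] = expand_step(rk[(i) - 1], _mm_aeskeygenassist_si128(rk[(i) - 1], (rcon)))

/* Hypothetical helper: writes the 11 AES-128 round keys as bytes. */
AESNI_FN void aes128_expand_key(const uint8_t key[16], uint8_t round_keys[11][16]) {
    __m128i rk[11];
    rk[0] = _mm_loadu_si128((const __m128i *)key);
    EXPAND(1, 0x01); EXPAND(2, 0x02); EXPAND(3, 0x04); EXPAND(4, 0x08); EXPAND(5, 0x10);
    EXPAND(6, 0x20); EXPAND(7, 0x40); EXPAND(8, 0x80); EXPAND(9, 0x1b); EXPAND(10, 0x36);
    for (int i = 0; i < 11; i++)
        _mm_storeu_si128((__m128i *)round_keys[i], rk[i]);
}
```

    For the FIPS-197 Appendix A.1 key, round key 1 should be a0fafe17 88542cb1 23a33939 2a6c7605 and round key 10 should be d014f9a8 c9ee2589 e13f0cc8 b6630ca6.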


    One round of encryption is performed by the AESENC instruction, which does the following:

    AESENC xmm1, xmm2/m128
    Tmp = xmm1
    Round Key := xmm2/m128
    Tmp = ShiftRows (Tmp)
    Tmp = SubBytes (Tmp)
    Tmp = MixColumns (Tmp)
    xmm1 = Tmp xor Round Key 
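    A quick way to check this description is to apply one AESENC to the state at the start of round 1 in FIPS-197 Appendix B, using the round-1 key. The result should be the state at the start of round 2. SubBytes works on each byte independently, so doing ShiftRows before SubBytes, as the instruction description does, gives the same result as the standard's SubBytes-then-ShiftRows order. The sketch assumes x86-64 with GCC or Clang, and aesenc_bytes is a hypothetical helper.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 loads and stores */
#include <wmmintrin.h>  /* _mm_aesenc_si128 */

/* Hypothetical helper; assumes x86-64 with GCC or Clang. The target attribute
   lets it compile without -maes, but the CPU must support AES-NI at run time. */
static __attribute__((target("aes,sse2")))
void aesenc_bytes(uint8_t state[16], const uint8_t round_key[16]) {
    __m128i s = _mm_loadu_si128((const __m128i *)state);
    __m128i k = _mm_loadu_si128((const __m128i *)round_key);
    s = _mm_aesenc_si128(s, k);  /* ShiftRows, SubBytes, MixColumns, XOR round key */
    _mm_storeu_si128((__m128i *)state, s);
}
```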

    The last round of encryption is performed by the AESENCLAST instruction:
    AESENCLAST xmm1, xmm2/m128
    Tmp = xmm1
    Round Key := xmm2/m128
    Tmp = ShiftRows (Tmp)
    Tmp = SubBytes (Tmp)
    xmm1 = Tmp xor Round Key 

    The only difference from AESENC is that the MixColumns step is not performed.
    Encryption Example
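    pxor xmm1, xmm0        ; initial AddRoundKey: xmm1 = plaintext block, xmm0 = round key 0
    aesenc xmm1, xmm2      ; one full round with round key i (rounds 1..9 for AES-128)
    aesenclast xmm1, xmm3  ; final round with the last round key; xmm1 = ciphertext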
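    A complete AES-128 block encryption needs the initial AddRoundKey, nine AESENC rounds and one AESENCLAST. The sketch below shows the full sequence, including its own copy of the key expansion so that it compiles on its own. It assumes x86-64 with GCC or Clang and a CPU with AES-NI, and all names are hypothetical. You can check it against the FIPS-197 Appendix C.1 example.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */
#include <wmmintrin.h>  /* AES-NI */

/* Hypothetical names; assumes x86-64 with GCC or Clang and a CPU with AES-NI. */
#define AESNI_FN static __attribute__((target("aes,sse2")))

/* Next AES-128 round key from the previous one and the AESKEYGENASSIST result. */
AESNI_FN __m128i expand_step(__m128i key, __m128i assist) {
    assist = _mm_shuffle_epi32(assist, 0xff);
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 8));
    return _mm_xor_si128(key, assist);
}

#define EXPAND(i, rcon) \
    rk[i] = expand_step(rk[(i) - 1], _mm_aeskeygenassist_si128(rk[(i) - 1], (rcon)))

AESNI_FN void aes128_encrypt_block(const uint8_t key[16], const uint8_t in[16], uint8_t out[16]) {
    __m128i rk[11];
    rk[0] = _mm_loadu_si128((const __m128i *)key);
    EXPAND(1, 0x01); EXPAND(2, 0x02); EXPAND(3, 0x04); EXPAND(4, 0x08); EXPAND(5, 0x10);
    EXPAND(6, 0x20); EXPAND(7, 0x40); EXPAND(8, 0x80); EXPAND(9, 0x1b); EXPAND(10, 0x36);

    __m128i s = _mm_xor_si128(_mm_loadu_si128((const __m128i *)in), rk[0]);  /* AddRoundKey */
    for (int i = 1; i <= 9; i++)
        s = _mm_aesenc_si128(s, rk[i]);       /* full rounds 1..9 */
    s = _mm_aesenclast_si128(s, rk[10]);      /* round 10: no MixColumns */
    _mm_storeu_si128((__m128i *)out, s);
}
```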


    To implement the decryption procedure, the AESDEC instruction is used:

    AESDEC xmm1, xmm2/m128
    Tmp = xmm1
    Round Key = xmm2/m128
    Tmp = InvShiftRows (Tmp)
    Tmp = InvSubBytes (Tmp)
    Tmp = InvMixColumns (Tmp)
    xmm1 = Tmp xor Round Key 

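    AES-NI decryption follows the "equivalent inverse cipher" of FIPS-197: AESDEC applies InvMixColumns before adding the round key. The round keys for the middle rounds (1 to Nr - 1) must therefore be passed through InvMixColumns first. The instruction that does this is AESIMC xmm1, xmm2/m128.
    For the last round of decryption, the AESDECLAST instruction is used: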
    AESDECLAST xmm1, xmm2/m128
    State = xmm1
    Round Key = xmm2/m128
    Tmp = InvShiftRows (State)
    Tmp = InvSubBytes (Tmp)
    xmm1 = Tmp xor Round Key
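    AESIMC applies InvMixColumns to a whole 128-bit round key. Per column, this is multiplication by the fixed polynomial {0b}x^3 + {0d}x^2 + {09}x + {0e} from FIPS-197. The portable sketch below shows that math; it is for illustration only and the names are hypothetical. Applied to the output of MixColumns, it returns the original column.

```c
#include <stdint.h>

/* Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return p;
}

/* Hypothetical model of what AESIMC computes: InvMixColumns on each
   4-byte column (column c = bytes 4c..4c+3, as in FIPS-197). */
static void inv_mix_columns(uint8_t s[16]) {
    for (int c = 0; c < 4; c++) {
        uint8_t *col = s + 4 * c;
        uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
        col[0] = (uint8_t)(gf_mul(a0, 0x0e) ^ gf_mul(a1, 0x0b) ^ gf_mul(a2, 0x0d) ^ gf_mul(a3, 0x09));
        col[1] = (uint8_t)(gf_mul(a0, 0x09) ^ gf_mul(a1, 0x0e) ^ gf_mul(a2, 0x0b) ^ gf_mul(a3, 0x0d));
        col[2] = (uint8_t)(gf_mul(a0, 0x0d) ^ gf_mul(a1, 0x09) ^ gf_mul(a2, 0x0e) ^ gf_mul(a3, 0x0b));
        col[3] = (uint8_t)(gf_mul(a0, 0x0b) ^ gf_mul(a1, 0x0d) ^ gf_mul(a2, 0x09) ^ gf_mul(a3, 0x0e));
    }
}
```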

    Decryption Example
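    aesimc xmm2, xmm2      ; convert an encryption round key for use with aesdec
    pxor xmm1, xmm0        ; initial AddRoundKey: xmm1 = ciphertext, xmm0 = last round key
    aesdec xmm1, xmm2      ; one decryption round (converted round keys 9..1 for AES-128)
    aesdeclast xmm1, xmm3  ; final round with round key 0; xmm1 = plaintext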
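    A full AES-128 decryption works in this order:
    1. XOR in the last round key.
    2. Run nine AESDEC rounds with the AESIMC-converted keys 9 down to 1.
    3. Finish with AESDECLAST using round key 0.
    The sketch below assumes x86-64 with GCC or Clang. The names are hypothetical, and it carries its own copy of the key expansion so that it compiles on its own. It should turn the FIPS-197 Appendix C.1 ciphertext back into the plaintext.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */
#include <wmmintrin.h>  /* AES-NI */

/* Hypothetical names; assumes x86-64 with GCC or Clang and a CPU with AES-NI. */
#define AESNI_FN static __attribute__((target("aes,sse2")))

AESNI_FN __m128i expand_step(__m128i key, __m128i assist) {
    assist = _mm_shuffle_epi32(assist, 0xff);
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 8));
    return _mm_xor_si128(key, assist);
}

#define EXPAND(i, rcon) \
    rk[i] = expand_step(rk[(i) - 1], _mm_aeskeygenassist_si128(rk[(i) - 1], (rcon)))

AESNI_FN void aes128_decrypt_block(const uint8_t key[16], const uint8_t in[16], uint8_t out[16]) {
    __m128i rk[11];
    rk[0] = _mm_loadu_si128((const __m128i *)key);
    EXPAND(1, 0x01); EXPAND(2, 0x02); EXPAND(3, 0x04); EXPAND(4, 0x08); EXPAND(5, 0x10);
    EXPAND(6, 0x20); EXPAND(7, 0x40); EXPAND(8, 0x80); EXPAND(9, 0x1b); EXPAND(10, 0x36);

    __m128i s = _mm_xor_si128(_mm_loadu_si128((const __m128i *)in), rk[10]); /* start with the last key */
    for (int i = 9; i >= 1; i--)
        s = _mm_aesdec_si128(s, _mm_aesimc_si128(rk[i]));  /* middle keys go through AESIMC */
    s = _mm_aesdeclast_si128(s, rk[0]);                     /* last round uses round key 0 as is */
    _mm_storeu_si128((__m128i *)out, s);
}
```

    In real code, the AESIMC-converted keys would be computed once per key rather than on every block.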

    So hardware support should noticeably speed up encryption. To conclude the post, here is a C++ class (see the links below) that implements encryption and decryption in ECB mode. In my test, encryption on one core of an i5-3740 (3.2 GHz) reached 320 MB/s.
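    The C++ class itself is not reproduced here. As a rough C illustration of ECB mode, in which each 16-byte block is encrypted independently with the same round keys, a hypothetical aes128_ecb_encrypt could look like this. It again assumes x86-64 with GCC or Clang and an input length that is a whole number of 16-byte blocks.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */
#include <wmmintrin.h>  /* AES-NI */

/* Hypothetical names; assumes x86-64 with GCC or Clang and a CPU with AES-NI. */
#define AESNI_FN static __attribute__((target("aes,sse2")))

AESNI_FN __m128i expand_step(__m128i key, __m128i assist) {
    assist = _mm_shuffle_epi32(assist, 0xff);
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 8));
    return _mm_xor_si128(key, assist);
}

#define EXPAND(i, rcon) \
    rk[i] = expand_step(rk[(i) - 1], _mm_aeskeygenassist_si128(rk[(i) - 1], (rcon)))

/* ECB: the key is expanded once, then every block is encrypted on its own. */
AESNI_FN void aes128_ecb_encrypt(const uint8_t key[16], const uint8_t *in,
                                 uint8_t *out, size_t nblocks) {
    __m128i rk[11];
    rk[0] = _mm_loadu_si128((const __m128i *)key);
    EXPAND(1, 0x01); EXPAND(2, 0x02); EXPAND(3, 0x04); EXPAND(4, 0x08); EXPAND(5, 0x10);
    EXPAND(6, 0x20); EXPAND(7, 0x40); EXPAND(8, 0x80); EXPAND(9, 0x1b); EXPAND(10, 0x36);

    for (size_t b = 0; b < nblocks; b++) {
        __m128i s = _mm_xor_si128(_mm_loadu_si128((const __m128i *)(in + 16 * b)), rk[0]);
        for (int i = 1; i <= 9; i++)
            s = _mm_aesenc_si128(s, rk[i]);
        s = _mm_aesenclast_si128(s, rk[10]);
        _mm_storeu_si128((__m128i *)(out + 16 * b), s);
    }
}
```

    Because the blocks are independent, identical plaintext blocks produce identical ciphertext blocks, a well-known weakness of ECB mode.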


    1. Intel Advanced Encryption Standard (AES) New Instructions Set
    2. List of processors supporting the AES-NI extension
    3. C++ class with AES-NI inline assembly
    4. Animation showing how AES works
    5. Wikipedia article listing libraries and programs that use the AES-NI instructions
