AES 白盒学习

前言

最近终于考完学校里的一些课程了，也有时间来看看这个最近几次在逆向题目中遇到的大坑了。

本文对互联网上部分查询到的资料进行学习，并将其进行整理记录，具体深入了解还需要读者自行完成。

~~实际上就是给自己菜找个借口~~

白盒AES的实现

白盒算法是将密钥混淆到算法中，让攻击者无法获取到算法的内部细节，也无法还原出密钥的一种算法(实际上还是有方法的)。

通常来说白盒AES的实现有三种方式：Chow等人的查找表方式、Bringer等人的插入扰乱项的方式、Biryukov等人的多变量密码的方式，下面进行简单的介绍三种方式：

查找表技术

白盒实现:

Chow 等设计的 AES 白盒实现的主要方法是给定一个密钥, 把 AES 的每一轮拆分成一个个小模块, 然后对每个小模块进行置乱编码, 并将每个模块所有可能的输入输出做成一个查找表, 用查找表来表示这些模块. 白盒 AES 的执行过程就转换成对一个个查找表进行查找的过程.

效率评价:

如上所述, 整个 AES 的执行过程都可以用查找表来实现, 每一次执行需要 3008 次查表. 总的查找表大小为 770048 B(752 KB).

安全评价:

在 2004 年, Billet 等人提出了一个非常有效的 BGE 攻击方法, 他们选择某些特定的查找表, 合并成一个可以用输入输出表示的函数, 使用代数的方法去掉其中的非线性部分, 提取出隐藏在T-Box 中的密钥。这个攻击在 2008 年由 Michiels 等人改进为一种通用攻击方法, 可以对类似算法的白盒实现进行攻击。2013 年, Lepoint 等人提出了一种更加有效的攻击方法, 能够以 2的22次方复杂度恢复 AES 的私钥。

插入扰乱项

白盒实现:

与前面所使用的方法全然不同, 2006 年 Bringer 等人中提出一个新 AES 白盒实现方法, 该方法使用同构多项式问题(Isomorphism of Polynomials, IP problem), 其主要思想是增加额外的扰乱方程到原始的方程中去, 以此来扰乱原本方程中的代数结构, 从而使得针对代数结构进行的攻击变得困难。

效率评价:

Bringer 白盒 AES 实现一个实例需要占用 142 MB 空间, 因为其方案中采用的 0 多项式由 4个多项式来构造, 需要 4 个运行实例, 所以该白盒实现方法最后需要占用 568 MB 空间, 其总共占用空间比Chow 的白盒实现提高了近 900 倍。

安全评价:

Bringer 等人在文中从三个方面对其方案的安全性进行了分析, 说明该方案有足够的能力抵抗其描述的攻击. 但是, 2010 年 Mulder 等人提出对该白盒 AES 实现的攻击方法, 能够以较低的复杂度恢复出等价密钥。

多变量密码

白盒实现:

在各种白盒实现方案被相继攻破之后, 2014 年, Alex Biryukov 等人提出了一种新的白盒密码设计方式。这种设计方式并不是已有的密码算法的白盒实现, 而是基于 ASASA(Affine-Sbox)结构的通用白盒密码设计。

Alex 等人的设计分为两类: 强白盒密码设计和弱白盒密码设计。在强白盒密码中, 作者使用有限域上的多变量多项式的方法, 分别以 χ-scheme 和扩展 S 盒(expanding S-box)来实现 ASASA 中的 S 盒, 得出两种不同的 ASASA 白盒方案, 这两种方案都需要插入扰乱项来抵抗攻击。

在弱白盒密码中, 使用查找表的方式来实现。第一种方案是单一 ASASA 结构的对称加密方案；第二种方案是一个 SPN 结构的对称加密方案, 该方案 S(代换)层由多个第一种方案中构造的 ASASA 或单一的 S 盒构成, 每一个 ASASA 模块或 S 盒由一个查找表实现。

效率评价:

以 128 bit 输入为例, 在强白盒密码的两个方案中, 以 χ-scheme 构造非线性层的方案需要300 MB 存储空间, 以扩展 S 盒来构造非线性层的方案需要 24 MB. 弱白盒密码的两个方案中, 单一ASASA 结构的对称加密方案需要 196 KB, SPN 结构的对称加密方案按照不同层数最低需要 8 MB 的存储空间, 最高需要 20 GB 的存储空间。

安全评价:

目前, 除了文中作者的安全性分析以外, 还未有针对这四种方案的已知攻击(数据截止至2020年)。

置换表方案的白盒AES构造思想

Chow方案的白盒AES正如我们前面所提到的是通过将AES中的执行过程以查找表的方式来进行实现，而为了方便基于查表的AES方法的实现，需要将原来的AES结构进行修改，原先的AES的结构如下：

在最终轮的运算过程中没有列混合运算

调整轮函数结构

而在最开始的初始变换的过程中，我们实际上是进行了轮密钥加的一个操作的过程，如果我们稍微对其流程进行修改为下图，其整体上整个算法的结构还是不变的：

交换字节替代和行移位

因为行以为本身不影响数据的内容，而只是单纯的将其位置的移动。而字节替代又和数据相关，因此如果我们将这两项进行顺序的交换也是无伤大雅的。参考下图：

整体的字节替代过程中我们相当于是将输入的数据的十六进制的第一位作为对应SBox的行，第二位作为对应的列，因此对应的字节替代过程仅与数据相关。具体可以参考下面实现代码：

void subBytes(int a[4][4], int encode){
    // encode 为1 代表字节替代，为0代表逆字节替代
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            int temp = a[i][j];
            int row = temp / 16;
            int column = temp % 16;
            if (encode)
                a[i][j] = S_BOX[row][column];
            else
                a[i][j] = INVERSE_S_BOX[row][column];
        }
    }
}

那么此时我们的AES的执行流程如下：

交换轮密钥加和行移位

因为行移位是线性的，而轮密钥加则是按找对应矩阵的位置进行异或数据，如果我们将行移位和轮密钥加进行交换时，那么我们对对应的轮密钥进行同样的置换运算即可与原来的数据进行对应起来。此时我们的AES执行流程如下：

置换表方案的白盒AES置换表构造

T-Boxes

我们回到我们之前所修改的AES流程上，我们观察之前的轮密钥加以及字节替代我们可以构造以下函数，对任意输入的一个状态我们得到特定的输出：

其中 r 为轮次，i 为密钥轮次，x 为对应的输入数据(x 的实际取值应该是 0 - 255)

我们枚举上述中的所有情况便可以得到一个轮密钥加和字节替代结合的一张置换表(T-boxes)，Rust实现代码如下：

fn add_round_key_after_shift(state: &mut [u8; 16], key: &[u32; 4]) { // 用于计算ty_boxes 
    for i in 0..4 {
        for j in 0..4 {
            state[i * 4 + j] ^= (key[(j + i) % 4] >> ((3 - j) * 8)) as u8; // key 也进行了行移位
        }
    }
}

fn calculate_t_boxes(round_key: &[u32; 44]) -> [[[u8; 256]; 16]; 10] {
    let mut t_boxes = [[[0u8; 256]; 16]; 10];
    for r in 0..10 { // 相当于执行完了10轮的加密，得到了t_boxes
        for x in 0..256 {
            let mut state = [x as u8; 16];
            add_round_key_after_shift(&mut state, &round_key[r * 4..(r * 4 + 4)].try_into().expect(""));
            sub_bytes(&mut state);
            if r == 9 {
                add_round_key(&mut state, round_key[40..44].try_into().expect(""))
            }
            for i in 0..16 {
                t_boxes[r][i][x] = state[i];
            }
        }
    }
    return t_boxes;
}

Ty-Boxes

对于列混淆，我们计算的过程是GF(256)上的加法运算，计算过程有如下图：

那么对于上述的过程我们其实也可以找到一个函数：

我们分别计算对应有限域上的固定矩阵数据分别对0 - 255范围的数据进行异或后得到的值构建相应的表即可得到对应的置换表，Rust实现如下：

fn calculate_ty() -> [[[u8; 4]; 256]; 4] {
    let mut ty = [[[0u8; 4]; 256]; 4];
    for x in 0..=255 {
        ty[0][x][0] = gmul(x as u8, 2);
        ty[0][x][1] = gmul(x as u8, 3);
        ty[0][x][2] = x as u8;
        ty[0][x][3] = x as u8;
        ty[1][x][0] = x as u8;
        ty[1][x][1] = gmul(x as u8, 2);
        ty[1][x][2] = gmul(x as u8, 3);
        ty[1][x][3] = x as u8;
        ty[2][x][0] = x as u8;
        ty[2][x][1] = x as u8;
        ty[2][x][2] = gmul(x as u8, 2);
        ty[2][x][3] = gmul(x as u8, 3);
        ty[3][x][0] = gmul(x as u8, 3);
        ty[3][x][1] = x as u8;
        ty[3][x][2] = x as u8;
        ty[3][x][3] = gmul(x as u8, 2);
    }
    ty
}

异或表 Xor-Tables

实际上是对于每轮当中的两个半字节进行一个查表的异或运算

其中 n 代表一轮需要的异或次数，r 代表轮次，i，j 代表 4 bits 数的所有可能(0 - 255)，实现代码如下：

fn calculate_xor_table() -> [[[[u8; 16]; 16]; 96]; 9] {
    let mut xor_table = [[[[0u8; 16]; 16]; 96]; 9];
    for r in 0..9 { // 代表第几轮
        for n in 0..96 { // 代表一轮需要的异或次数
            for i in 0..16 { // 代表第一个输入
                for j in 0..16 { // 代表第二个输入
                    xor_table[r][n][i][j] = (i ^ j) as u8;
                }
            }
        }
    }
    xor_table
}

表的合并

我们可以发现T-Boxes和Ty-Boxes可以进行合并处理，通过组合查找表可以减少执行加密所需的单个表的访问次数：

对应实现代码如下所示：

fn calculate_ty_boxes(round_key: &[u32; 44]) -> ([[[u32; 256]; 16]; 10], [[u8; 256]; 16]) {
    let t_boxes = calculate_t_boxes(&round_key); // 计算t_boxes
    let ty: [[[u8; 4]; 256]; 4] = calculate_ty(); // 列混淆的置换表 ty_boxes
    let mut ty_boxes = [[[0u32; 256]; 16]; 10];
    let mut last_t_boxes = [[0u8; 256]; 16];
    for r in 0..9 {
        for x in 0..256 {
            for j in 0..4 {
                for i in 0..4 {
                    let v0 = ty[0][t_boxes[r][j * 4 + i][x] as usize][i] as u32;
                    let v1 = ty[1][t_boxes[r][j * 4 + i][x] as usize][i] as u32;
                    let v2 = ty[2][t_boxes[r][j * 4 + i][x] as usize][i] as u32;
                    let v3 = ty[3][t_boxes[r][j * 4 + i][x] as usize][i] as u32;
                    ty_boxes[r][j * 4 + i][x] = (v0.wrapping_shl(24) | (v1.wrapping_shl(16)) | v2.wrapping_shl(8) | v3) as u32; // 表的合并
                }
            }
        }
    }
    for x in 0..256 {
        for i in 0..16 {
            last_t_boxes[i][x] = t_boxes[9][i][x];
        }
    }
    // println!("ty_boxes{:x?}", ty_boxes);
    (ty_boxes, last_t_boxes)
}

置换的总体实现

对应的通过置换表的方式来进行实现可以通过T-Boxes、Ty-Boxes、Xor-Tables来进行实现，对应的整体逻辑流程图可以参考下图：

对应置换表的方式实现代码(Rust)：

use std::convert::TryInto;
// region 标准aes实现
static SBOX: [[u8; 16]; 16] = [
    [0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76],
    [0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0, 0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0],
    [0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc, 0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15],
    [0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a, 0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75],
    [0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0, 0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84],
    [0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b, 0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf],
    [0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85, 0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8],
    [0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5, 0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2],
    [0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17, 0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73],
    [0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88, 0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb],
    [0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c, 0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79],
    [0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9, 0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08],
    [0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6, 0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a],
    [0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e, 0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e],
    [0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94, 0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf],
    [0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16],
];
fn gmul(ap: u8, bp: u8) -> u8 {
    let mut p = 0u8;
    let mut a = ap;
    let mut b = bp;
    while a != 0 && b != 0 {
        if b & 1 != 0 {
            p ^= a;
        }
        if a & 0x80 != 0 {
            // XOR with the primitive polynomial x^8 + x^4 + x^3 + x + 1 (0b1_0001_1011) – you can change it but it must be irreducible
            a = (((a << 1) as u16) ^ 0x11b) as u8;
        } else {
            a = a << 1;
        }
        b >>= 1;
    }
    return p & 0xFF;
}
fn add_round_key(state: &mut [u8; 16], key: &[u32; 4]) {
    for i in 0..4 {
        for j in 0..4 {
            state[i * 4 + j] ^= (key[i] >> ((3 - j) * 8)) as u8;
        }
    }
}
fn add_round_key_after_shift(state: &mut [u8; 16], key: &[u32; 4]) { // 用于计算ty_boxes 
    for i in 0..4 {
        for j in 0..4 {
            state[i * 4 + j] ^= (key[(j + i) % 4] >> ((3 - j) * 8)) as u8; // key 也进行了行移位
        }
    }
}
fn sub_bytes(state: &mut [u8; 16]) {
    for i in 0..4 {
        for j in 0..4 {
            state[i * 4 + j] = SBOX[(state[i * 4 + j] >> 4) as usize][(state[i * 4 + j] & 0x0F) as usize];
        }
    }
}
fn shift_rows(state: &mut [u8; 16]) {
    let shifts = [
        0, 5, 10, 15,
        4, 9, 14, 3,
        8, 13, 2, 7,
        12, 1, 6, 11,
    ];
    let in_state = [
        state[0], state[1], state[2], state[3],
        state[4], state[5], state[6], state[7],
        state[8], state[9], state[10], state[11],
        state[12], state[13], state[14], state[15],
    ];
    for i in 0..16 {
        state[i] = in_state[shifts[i]];
    }
}
fn mix_columns(state: &mut [u8; 16]) {
    for i in 0..4 {
        let (a, b, c, d) = (state[4 * i + 0], state[4 * i + 1], state[4 * i + 2], state[4 * i + 3]);
        state[0 + 4 * i] = gmul(a, 2) ^ gmul(d, 1) ^ gmul(c, 1) ^ gmul(b, 3);
        state[1 + 4 * i] = gmul(b, 2) ^ gmul(a, 1) ^ gmul(d, 1) ^ gmul(c, 3);
        state[2 + 4 * i] = gmul(c, 2) ^ gmul(b, 1) ^ gmul(a, 1) ^ gmul(d, 3);
        state[3 + 4 * i] = gmul(d, 2) ^ gmul(c, 1) ^ gmul(b, 1) ^ gmul(a, 3);
    }
}
fn sub_word(word: u32) -> u32 {
    let mut result = 0u32;
    for i in 0..4 {
        let upper_nibble = ((word >> (4 + 8 * i)) & 0xF) as usize;
        let lower_nibble = ((word >> (8 * i)) & 0xF) as usize;
        result += (SBOX[upper_nibble][lower_nibble] as u32) << (8 * i);
    }
    return result;
}
fn rot_word(x: u32) -> u32 {
    (x << 8) | (x >> 24)
}
fn expend_keys(key: &[u8; 16], nk: u32, nr: u32) -> [u32; 44] { // nk = 4, nr = 10 for AES-128
    let nb = 4;
    let r_con: [u32; 15] = [
        0x01000000, 0x02000000, 0x04000000,
        0x08000000, 0x10000000, 0x20000000,
        0x40000000, 0x80000000, 0x1b000000,
        0x36000000, 0x6c000000, 0xd8000000,
        0xab000000, 0x4d000000, 0x9a000000,
    ];
    let mut w = [0u32; 44];
    for i in 0..4 {
        w[i] = ((((key[4 * i + 0]) as u32) << 24)
            | ((key[4 * i + 1] as u32) << 16)
            | ((key[4 * i + 2] as u32) << 8)
            | ((key[4 * i + 3] as u32) << 0)) as u32;
    }
    for i in nk..(nb * (nr + 1)) {
        let mut temp = w[(i - 1) as usize];
        if i % nk == 0 {
            temp = sub_word(rot_word(temp)) ^ r_con[((i - 1) / nk) as usize];
        } else if nk > 6 && (i % nk) == 4 {
            temp = sub_word(temp);
        }
        w[i as usize] = w[(i - nk) as usize] ^ temp;
    }
    w
}
fn encrypt_block(bytes: &[u8; 16], round_key: &[u32; 44]) -> [u8; 16] {
    let mut result = [0u8; 16];
    let mut state = [
        bytes[0], bytes[1], bytes[2], bytes[3],
        bytes[4], bytes[5], bytes[6], bytes[7],
        bytes[8], bytes[9], bytes[10], bytes[11],
        bytes[12], bytes[13], bytes[14], bytes[15],
    ];
    add_round_key(&mut state, round_key[0..4].try_into().expect(""));
    for i in 1..10 {
        sub_bytes(&mut state);
        shift_rows(&mut state);
        mix_columns(&mut state);
        add_round_key(&mut state, round_key[i * 4..(i + 1) * 4].try_into().expect(""));
    }
    sub_bytes(&mut state);
    shift_rows(&mut state);
    add_round_key(&mut state, round_key[40..44].try_into().expect(""));
    for i in 0..16 {
        result[i] = state[i];
    }
    result
}
// endregion
fn calculate_t_boxes(round_key: &[u32; 44]) -> [[[u8; 256]; 16]; 10] {  // 相当于执行完了10轮的加密，得到了t_boxes
    let mut t_boxes = [[[0u8; 256]; 16]; 10];
    for r in 0..10 {
        for x in 0..256 {
            let mut state = [x as u8; 16]; 
            add_round_key_after_shift(&mut state, &round_key[r * 4..(r * 4 + 4)].try_into().expect("")); 
            sub_bytes(&mut state);
            if r == 9 {
                add_round_key(&mut state, round_key[40..44].try_into().expect(""))
            }
            for i in 0..16 {
                t_boxes[r][i][x] = state[i];
            }
        }
    }
    return t_boxes;
}
fn calculate_ty() -> [[[u8; 4]; 256]; 4] { // 计算列混淆的置换表
    let mut ty = [[[0u8; 4]; 256]; 4];
    for x in 0..=255 {
        ty[0][x][0] = gmul(x as u8, 2);
        ty[0][x][1] = gmul(x as u8, 3);
        ty[0][x][2] = x as u8;
        ty[0][x][3] = x as u8;
        ty[1][x][0] = x as u8;
        ty[1][x][1] = gmul(x as u8, 2);
        ty[1][x][2] = gmul(x as u8, 3);
        ty[1][x][3] = x as u8;
        ty[2][x][0] = x as u8;
        ty[2][x][1] = x as u8;
        ty[2][x][2] = gmul(x as u8, 2);
        ty[2][x][3] = gmul(x as u8, 3);
        ty[3][x][0] = gmul(x as u8, 3);
        ty[3][x][1] = x as u8;
        ty[3][x][2] = x as u8;
        ty[3][x][3] = gmul(x as u8, 2);
    }
    ty
}
fn calculate_ty_boxes(round_key: &[u32; 44]) -> ([[[u32; 256]; 16]; 10], [[u8; 256]; 16]) {
    let t_boxes = calculate_t_boxes(&round_key); // 计算t_boxes
    let ty: [[[u8; 4]; 256]; 4] = calculate_ty(); // 列混淆的置换表
    let mut ty_boxes = [[[0u32; 256]; 16]; 10];
    let mut last_t_boxes = [[0u8; 256]; 16];
    for r in 0..9 {
        for x in 0..256 {
            for j in 0..4 {
                for i in 0..4 {
                    let v0 = ty[0][t_boxes[r][j * 4 + i][x] as usize][i] as u32;
                    let v1 = ty[1][t_boxes[r][j * 4 + i][x] as usize][i] as u32;
                    let v2 = ty[2][t_boxes[r][j * 4 + i][x] as usize][i] as u32;
                    let v3 = ty[3][t_boxes[r][j * 4 + i][x] as usize][i] as u32;
                    ty_boxes[r][j * 4 + i][x] = (v0.wrapping_shl(24) | (v1.wrapping_shl(16)) | v2.wrapping_shl(8) | v3) as u32; // 表的合并
                }
            }
        }
    }
    for x in 0..256 {
        for i in 0..16 {
            last_t_boxes[i][x] = t_boxes[9][i][x];
        }
    }
    // println!("ty_boxes{:x?}", ty_boxes);
    (ty_boxes, last_t_boxes)
}
fn calculate_xor_table() -> [[[[u8; 16]; 16]; 96]; 9] {
    let mut xor_table = [[[[0u8; 16]; 16]; 96]; 9];
    for r in 0..9 { // 代表第几轮
        for n in 0..96 { // 代表一轮需要的异或次数
            for i in 0..16 { // 代表第一个输入
                for j in 0..16 {  // 代表第二个输入
                    xor_table[r][n][i][j] = (i ^ j) as u8;
                }
            }
        }
    }
    xor_table
}
fn encrypt_block_by_table(bytes: &[u8; 16], round_key: &[u32; 44]) -> [u8; 16] {
    let mut result = [0u8; 16];
    let (ty_boxes, last_t_boxes) = calculate_ty_boxes(round_key);  // 计算ty_boxes
    let xor_table = calculate_xor_table(); // 计算xor_table
    let mut input = bytes.clone();
    for r in 0..9 {
        shift_rows(&mut input);
        for j in 0..4 {
            let aa = ty_boxes[r][j * 4 + 0][input[j * 4 + 0] as usize] as usize; // 实现列混淆的异或
            let bb = ty_boxes[r][j * 4 + 1][input[j * 4 + 1] as usize] as usize;
            let cc = ty_boxes[r][j * 4 + 2][input[j * 4 + 2] as usize] as usize;
            let dd = ty_boxes[r][j * 4 + 3][input[j * 4 + 3] as usize] as usize;
            let n0 = xor_table[r][j * 24 + 0][(aa >> 28) & 0xf][(bb >> 28) & 0xf] as usize; // 实现列混淆的异或
            let n1 = xor_table[r][j * 24 + 1][(cc >> 28) & 0xf][(dd >> 28) & 0xf] as usize;
            let n2 = xor_table[r][j * 24 + 2][(aa >> 24) & 0xf][(bb >> 24) & 0xf] as usize;
            let n3 = xor_table[r][j * 24 + 3][(cc >> 24) & 0xf][(dd >> 24) & 0xf] as usize;
            input[j * 4 + 0] = (xor_table[r][j * 24 + 4][n0][n1] << 4) | xor_table[r][j * 24 + 5][n2][n3];
            let n0 = xor_table[r][j * 24 + 6][(aa >> 20) & 0xf][(bb >> 20) & 0xf] as usize;
            let n1 = xor_table[r][j * 24 + 7][(cc >> 20) & 0xf][(dd >> 20) & 0xf] as usize;
            let n2 = xor_table[r][j * 24 + 8][(aa >> 16) & 0xf][(bb >> 16) & 0xf] as usize;
            let n3 = xor_table[r][j * 24 + 9][(cc >> 16) & 0xf][(dd >> 16) & 0xf] as usize;
            input[j * 4 + 1] = (xor_table[r][j * 24 + 10][n0][n1] << 4) | xor_table[r][j * 24 + 11][n2][n3];
            let n0 = xor_table[r][j * 24 + 12][(aa >> 12) & 0xf][(bb >> 12) & 0xf] as usize;
            let n1 = xor_table[r][j * 24 + 13][(cc >> 12) & 0xf][(dd >> 12) & 0xf] as usize;
            let n2 = xor_table[r][j * 24 + 14][(aa >> 8) & 0xf][(bb >> 8) & 0xf] as usize;
            let n3 = xor_table[r][j * 24 + 15][(cc >> 8) & 0xf][(dd >> 8) & 0xf] as usize;
            input[j * 4 + 2] = (xor_table[r][j * 24 + 16][n0][n1] << 4) | xor_table[r][j * 24 + 17][n2][n3];
            let n0 = xor_table[r][j * 24 + 18][(aa >> 4) & 0xf][(bb >> 4) & 0xf] as usize;
            let n1 = xor_table[r][j * 24 + 19][(cc >> 4) & 0xf][(dd >> 4) & 0xf] as usize;
            let n2 = xor_table[r][j * 24 + 20][(aa >> 0) & 0xf][(bb >> 0) & 0xf] as usize;
            let n3 = xor_table[r][j * 24 + 21][(cc >> 0) & 0xf][(dd >> 0) & 0xf] as usize;
            input[j * 4 + 3] = (xor_table[r][j * 24 + 22][n0][n1] << 4) | xor_table[r][j * 24 + 23][n2][n3];
        }
    }
    shift_rows(&mut input);
    for i in 0..16 {
        result[i] = last_t_boxes[i][input[i] as usize];
    }
    result
}

#[cfg(test)]
mod test {
    use crate::wbaes::{expend_keys, encrypt_block, encrypt_block_by_table};
    #[test]
    fn test_expand_key() {
        let key = [43, 126, 21, 22, 40, 174, 210, 166, 171, 247, 21, 136, 9, 207, 79, 60];
        println!("{:x?}", key);
        let input = [50, 67, 246, 168, 136, 90, 48, 141, 49, 49, 152, 162, 224, 55, 7, 52];
        // let input = [65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65];
        let w = expend_keys(&key, 4, 10);
        println!("{:?}", w); // 拿到了所有的轮密钥
        let block = encrypt_block(&input, &w);  // 传统的AES加密
        println!("{:x?}", block);
        let block_table = encrypt_block_by_table(&input, &w); // 通过置换表的方式实现的AES加密
        println!("{:x?}", block_table);
    }
}

Chow方法下的AES白盒的实现

在上述的查表过程中我们实际上T-Boxes、Ty-Boxes是存有密钥的，我们可以看到是在进行扩展密钥后将其丢入到T-Boxes、Ty-Boxes的生成中，对于此我们可以依据下面的公式进行对其进行攻击：

对应攻击代码：

fn attack_table(n: u32, i: usize) -> u8 {
    let ty: [[[u8; 4]; 256]; 4] = calculate_ty();
    let v0 = (n >> 24 & 0xFF) as u8;
    let v1 = (n >> 16 & 0xFF) as u8;
    let v2 = (n >> 8 & 0xFF) as u8;
    let v3 = (n >> 0 & 0xFF) as u8;
    let state = [v0, v1, v2, v3];
    for x in 0..=255 {
        if state[i] == ty[i][x][i] {
            let inv = x;
            return INV_SBOX[((inv >> 4) & 0xF) as usize][(inv & 0xF) as usize] as u8;
        }
    }
    return 0;
}
// test code
fn test_attack() {
    let key = [43, 126, 21, 22, 40, 174, 210, 166, 171, 247, 21, 136, 9, 207, 79, 60];
    let w = expend_keys(&key, 4, 10);
    let t_boxes = calculate_t_boxes(&w);
    for i in 0..16 {
        let inv = t_boxes[0][i][0];
        let result = INV_SBOX[((inv >> 4) & 0xF) as usize][(inv & 0xF) as usize] as u8;
        print!("{:?}, ", result);
    }
    print!("\n");
    let (ty_boxes, _) = calculate_ty_boxes(&w);
    let mut result = [0u8; 16];
    for i in 0..16 {
        let x = ty_boxes[0][i][0];
        let key = attack_table(x, i % 4);
        result[i] = key;
    }
    println!("{:?}", result);
}

因为原始的Ty-Boxes中依据上述的方式可以直接获取到对应的密钥，因此Chow的方案相当于是对Ty-Boxes进行了以此输入输出的编码。为了保护T-Boxes表，其采用了两个个双射函数f、g，依据下面公式得到一个新的表项：

如果是要保护两个表，比如R和T，那么可以找三个双射函数， f、h、g，然后按照如下构造:

混合双射

这里Chow的方案采用了一个可逆的线性变换，这个被称作混合双射，通俗点说，这个选择一个GF(2)有限域上面的一个可逆的随机矩阵，这个变换对于第 2 到第 9 轮都是一致的，具体过程如下:

对于第 2 到第 9 轮，选择 16 个 8bit -> 8bit的混合双射，这个作为 T-Boxes 的输入编码
对于第 1 到第 9 轮，选择 4 个 32bit -> 32bit的混合双射，这个作为 Ty-Boxes 的输出编码

至于为什么需要这么做，其实也比较好理解，观察上面的Encoding函数，如果第一轮也做一个输入编码，那么相比于原始的输入就多了一次变换，如果最后一轮如果做混合双射之后，那么最后的结果也就变了。

此处代码参考方案：https://github.com/balena/aes-whitebox

#include <iostream>
#include <NTL/mat_GF2.h>
#include "log.h"
constexpr uint8_t SBOX[16][16] = {
        {0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76},
        {0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0, 0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0},
        {0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc, 0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15},
        {0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a, 0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75},
        {0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0, 0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84},
        {0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b, 0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf},
        {0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85, 0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8},
        {0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5, 0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2},
        {0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17, 0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73},
        {0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88, 0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb},
        {0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c, 0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79},
        {0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9, 0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08},
        {0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6, 0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a},
        {0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e, 0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e},
        {0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94, 0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf},
        {0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16}
};
constexpr int SHIFT_ROWS_TAB[16] = {
        0, 5, 10, 15,
        4, 9, 14, 3,
        8, 13, 2, 7,
        12, 1, 6, 11,
};
constexpr int InvShiftRowsTab[16] = {
        0, 13, 10, 7,
        4, 1, 14, 11,
        8, 5, 2, 15,
        12, 9, 6, 3,
};
template<typename T>
inline NTL::vec_GF2 from_scalar(T in);
template<>
inline NTL::vec_GF2 from_scalar(uint8_t in) {
    NTL::vec_GF2 result;
    result.SetLength(8);
    for (int i = 0; i < 8; i++) {
        result[7 - i] = ((in >> i) & 1);
    }
    return result;
}
template<>
inline NTL::vec_GF2 from_scalar(uint32_t in) {
    NTL::vec_GF2 result;
    result.SetLength(32);
    for (int i = 0; i < 32; i++) {
        result[31 - i] = ((in >> i) & 1);
    }
    return result;
}
template<typename T>
inline T to_scalar(const NTL::vec_GF2 &in);
template<>
inline uint8_t to_scalar(const NTL::vec_GF2 &in) {
    uint8_t result = 0;
    for (int i = 0; i < 2; i++) {
        long i0 = NTL::rep(in[i * 4 + 0]), i1 = NTL::rep(in[i * 4 + 1]),
                i2 = NTL::rep(in[i * 4 + 2]), i3 = NTL::rep(in[i * 4 + 3]);
        result = (result << 4) | (i0 << 3) | (i1 << 2) | (i2 << 1) | (i3 << 0);
    }
    return result;
}
template<>
inline uint32_t to_scalar(const NTL::vec_GF2 &in) {
    uint32_t result = 0;
    for (int i = 0; i < 8; i++) {
        long i0 = NTL::rep(in[i * 4 + 0]), i1 = NTL::rep(in[i * 4 + 1]),
                i2 = NTL::rep(in[i * 4 + 2]), i3 = NTL::rep(in[i * 4 + 3]);
        result = (result << 4) | (i0 << 3) | (i1 << 2) | (i2 << 1) | (i3 << 0);
    }
    return result;
}
template<typename T>
inline T mul(const NTL::mat_GF2 &mat, T x) {
    return to_scalar<T>(mat * from_scalar<T>(x));
}
NTL::mat_GF2 GenerateGF2RandomMatrix(int dimension) {
    NTL::mat_GF2 mat(NTL::INIT_SIZE, dimension, dimension);
    for (int i = 0; i < dimension; i++) {
        for (int j = 0; j < dimension; j++) {
            mat[i][j] = NTL::random_GF2();
        }
    }
    return mat;
}
NTL::mat_GF2 GenerateRandomGF2InvertibleMatrix(int dimension) {
    for (;;) {
        NTL::mat_GF2 result = GenerateGF2RandomMatrix(dimension);
        if (NTL::determinant(result) != 0)
            return result;
    }
}
inline void add_round_key(uint8_t state[16], const uint32_t round_key[4]) {
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            state[i * 4 + j] ^= round_key[i] >> ((3 - j) * 8);
        }
}
void add_round_key_after_shift(uint8_t state[16], const uint32_t round_key[4]) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            state[i * 4 + j] ^= round_key[(j + i) % 4] >> ((3 - j) * 8);
        }
    }
}
void sub_bytes(uint8_t state[16]) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            state[i * 4 + j] = SBOX[state[i * 4 + j] >> 4][state[i * 4 + j] & 0x0F];
        }
    }
}
void shift_rows(uint8_t state[16]) {
    uint8_t shifts[16] = {
            0, 5, 10, 15,
            4, 9, 14, 3,
            8, 13, 2, 7,
            12, 1, 6, 11,
    };
    const uint8_t in[16] = {
            state[0], state[1], state[2], state[3],
            state[4], state[5], state[6], state[7],
            state[8], state[9], state[10], state[11],
            state[12], state[13], state[14], state[15],
    };
    for (int i = 0; i < 16; i++) {
        state[i] = in[shifts[i]];
    }
}
void calculate_t_boxes(const uint32_t round_key[44], uint8_t t_boxes[10][16][256]) {
    for (int r = 0; r < 10; ++r) {
        for (int x = 0; x < 256; ++x) {
            uint8_t state[16] = {
                    (uint8_t) x, (uint8_t) x, (uint8_t) x, (uint8_t) x,
                    (uint8_t) x, (uint8_t) x, (uint8_t) x, (uint8_t) x,
                    (uint8_t) x, (uint8_t) x, (uint8_t) x, (uint8_t) x,
                    (uint8_t) x, (uint8_t) x, (uint8_t) x, (uint8_t) x
            };
            add_round_key_after_shift(state, &round_key[4 * r]);
            sub_bytes(state);
            if (r == 9) {
                add_round_key(state, &round_key[40]);
            }
            for (int i = 0; i < 16; ++i) {
                t_boxes[r][i][x] = state[i];
            }
        }
    }
}
uint8_t gmul(uint8_t ap, uint8_t bp) {
    uint8_t a = ap, b = bp, p = 0;
    while (a != 0 && b != 0) {
        if (b & 1) {
            p ^= a;
        }
        if (a & 0x80) {
            a = (a << 1) ^ 0x11b;
        } else {
            a = a << 1;
        }
        b >>= 1;
    }
    return p;
}
void calculate_ty(uint8_t ty[4][256][4]) {
    for (int x = 0; x < 256; ++x) {
        ty[0][x][0] = gmul(x, 2);
        ty[0][x][1] = gmul(x, 3);
        ty[0][x][2] = x;
        ty[0][x][3] = x;
        ty[1][x][0] = x;
        ty[1][x][1] = gmul(x, 2);
        ty[1][x][2] = gmul(x, 3);
        ty[1][x][3] = x;
        ty[2][x][0] = x;
        ty[2][x][1] = x;
        ty[2][x][2] = gmul(x, 2);
        ty[2][x][3] = gmul(x, 3);
        ty[3][x][0] = gmul(x, 3);
        ty[3][x][1] = x;
        ty[3][x][2] = x;
        ty[3][x][3] = gmul(x, 2);
    }
}
void calculate_ty_boxes(uint32_t round_key[44], uint32_t ty_boxes[10][16][256], uint8_t t_boxes_last[16][256],
                        uint32_t MBL[10][16][256]) {
    uint8_t t_boxes[10][16][256];
    uint8_t ty[4][256][4];
    calculate_t_boxes(round_key, t_boxes);
    calculate_ty(ty);
    for (int r = 0; r < 9; ++r) {
        for (int x = 0; x < 256; ++x) {
            for (int j = 0; j < 4; ++j) {
                for (int i = 0; i < 4; ++i) {
                    uint32_t v0 = ty[0][t_boxes[r][j * 4 + i][x]][i],
                            v1 = ty[1][t_boxes[r][j * 4 + i][x]][i],
                            v2 = ty[2][t_boxes[r][j * 4 + i][x]][i],
                            v3 = ty[3][t_boxes[r][j * 4 + i][x]][i];
                    ty_boxes[r][j * 4 + i][x] = (v0 << 24) | (v1 << 16) | (v2 << 8) | v3;
                    MBL[r][j * 4 + i][x] = x << ((3 - i) << 3);
                }
            }
        }
    }
    for (int x = 0; x < 256; ++x) {
        for (int i = 0; i < 16; ++i) {
            t_boxes_last[i][x] = t_boxes[9][i][x];
        }
    }
    // region MBL
    NTL::mat_GF2 MB[9][4];
    for (auto &r : MB) {
        for (auto &i : r) {
            i = GenerateRandomGF2InvertibleMatrix(32);
        }
    }
    for (int r = 0; r < 9; r++) {
        for (int x = 0; x < 256; x++) {
            for (int i = 0; i < 16; i++) {
                ty_boxes[r][i][x] = mul<uint32_t>(MB[r][i >> 2], ty_boxes[r][i][x]);
                MBL[r][i][x] = mul<uint32_t>(NTL::inv(MB[r][i >> 2]), MBL[r][i][x]);
            }
        }
    }
    NTL::mat_GF2 L[9][16];
    for (auto &r: L) {
        for (auto &i: r) {
            i = GenerateRandomGF2InvertibleMatrix(8);
        }
    }
    for (int r = 0; r < 9; ++r) {
        if (r > 0) {
            for (int i = 0; i < 16; ++i) {
                uint32_t copy_ty_boxes[256];
                for (int x = 0; x < 256; ++x) {
                    copy_ty_boxes[x] = ty_boxes[r][i][x];
                }
                for (int x = 0; x < 256; ++x) {
                    ty_boxes[r][i][x] = copy_ty_boxes[mul<uint8_t>(NTL::inv(L[r - 1][i]), x)];
                }
            }
        }
        for (int j = 0; j < 4; ++j) {
            for (int x = 0; x < 256; ++x) {
                uint32_t out0 = MBL[r][j * 4 + 0][x];
                uint32_t out1 = MBL[r][j * 4 + 1][x];
                uint32_t out2 = MBL[r][j * 4 + 2][x];
                uint32_t out3 = MBL[r][j * 4 + 3][x];
                MBL[r][j * 4 + 0][x] = (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 0]], out0 >> 24) << 24)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 1]], out0 >> 16) << 16)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 2]], out0 >> 8) << 8)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 3]], out0 >> 0) << 0);
                MBL[r][j * 4 + 1][x] = (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 0]], out1 >> 24) << 24)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 1]], out1 >> 16) << 16)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 2]], out1 >> 8) << 8)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 3]], out1 >> 0) << 0);
                MBL[r][j * 4 + 2][x] = (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 0]], out2 >> 24) << 24)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 1]], out2 >> 16) << 16)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 2]], out2 >> 8) << 8)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 3]], out2 >> 0) << 0);
                MBL[r][j * 4 + 3][x] = (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 0]], out3 >> 24) << 24)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 1]], out3 >> 16) << 16)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 2]], out3 >> 8) << 8)
                                       | (mul<uint8_t>(L[r][InvShiftRowsTab[j * 4 + 3]], out3 >> 0) << 0);
            }
        }
    }
    for (int i = 0; i < 16; i++) {
        uint8_t copy_ty_boxes_last[256];
        for (int x = 0; x < 256; x++) {
            copy_ty_boxes_last[x] = t_boxes_last[i][x];
        }
        for (int x = 0; x < 256; x++) {
            t_boxes_last[i][x] = copy_ty_boxes_last[mul<uint8_t>(NTL::inv(L[8][i]), x)];
        }
    }
    // endregion
}
void calculate_xor_table(uint8_t xor_tab[9][96][16][16]) {
    for (int r = 0; r < 9; ++r) {
        for (int n = 0; n < 96; ++n) {
            for (int i = 0; i < 16; ++i) {
                for (int j = 0; j < 16; ++j) {
                    xor_tab[r][n][i][j] = i ^ j;
                }
            }
        }
    }
}
// 这里只是加密一个分组, 链接模式也不实现了
void encrypt(
        uint8_t in[16],
        uint8_t xor_table[9][96][16][16],
        uint32_t ty_boxes[10][16][256], uint8_t t_boxes_last[16][256],
        uint32_t MBL[10][16][256]
) {
    for (int r = 0; r < 9; r++) {
        shift_rows(in);
        for (int j = 0; j < 4; ++j) {
            uint8_t n0, n1, n2, n3;
            uint32_t aa, bb, cc, dd;
            aa = ty_boxes[r][j * 4 + 0][in[j * 4 + 0]],
            bb = ty_boxes[r][j * 4 + 1][in[j * 4 + 1]],
            cc = ty_boxes[r][j * 4 + 2][in[j * 4 + 2]],
            dd = ty_boxes[r][j * 4 + 3][in[j * 4 + 3]];
            n0 = xor_table[r][j * 24 + 0][(aa >> 28) & 0xf][(bb >> 28) & 0xf];
            n1 = xor_table[r][j * 24 + 1][(cc >> 28) & 0xf][(dd >> 28) & 0xf];
            n2 = xor_table[r][j * 24 + 2][(aa >> 24) & 0xf][(bb >> 24) & 0xf];
            n3 = xor_table[r][j * 24 + 3][(cc >> 24) & 0xf][(dd >> 24) & 0xf];
            in[j * 4 + 0] = (xor_table[r][j * 24 + 4][n0][n1] << 4) | xor_table[r][j * 24 + 5][n2][n3];
            n0 = xor_table[r][j * 24 + 6][(aa >> 20) & 0xf][(bb >> 20) & 0xf];
            n1 = xor_table[r][j * 24 + 7][(cc >> 20) & 0xf][(dd >> 20) & 0xf];
            n2 = xor_table[r][j * 24 + 8][(aa >> 16) & 0xf][(bb >> 16) & 0xf];
            n3 = xor_table[r][j * 24 + 9][(cc >> 16) & 0xf][(dd >> 16) & 0xf];
            in[j * 4 + 1] = (xor_table[r][j * 24 + 10][n0][n1] << 4) | xor_table[r][j * 24 + 11][n2][n3];
            n0 = xor_table[r][j * 24 + 12][(aa >> 12) & 0xf][(bb >> 12) & 0xf];
            n1 = xor_table[r][j * 24 + 13][(cc >> 12) & 0xf][(dd >> 12) & 0xf];
            n2 = xor_table[r][j * 24 + 14][(aa >> 8) & 0xf][(bb >> 8) & 0xf];
            n3 = xor_table[r][j * 24 + 15][(cc >> 8) & 0xf][(dd >> 8) & 0xf];
            in[j * 4 + 2] = (xor_table[r][j * 24 + 16][n0][n1] << 4) | xor_table[r][j * 24 + 17][n2][n3];
            n0 = xor_table[r][j * 24 + 18][(aa >> 4) & 0xf][(bb >> 4) & 0xf];
            n1 = xor_table[r][j * 24 + 19][(cc >> 4) & 0xf][(dd >> 4) & 0xf];
            n2 = xor_table[r][j * 24 + 20][(aa >> 0) & 0xf][(bb >> 0) & 0xf];
            n3 = xor_table[r][j * 24 + 21][(cc >> 0) & 0xf][(dd >> 0) & 0xf];
            in[j * 4 + 3] = (xor_table[r][j * 24 + 22][n0][n1] << 4) | xor_table[r][j * 24 + 23][n2][n3];
            aa = MBL[r][j * 4 + 0][in[j * 4 + 0]];
            bb = MBL[r][j * 4 + 1][in[j * 4 + 1]];
            cc = MBL[r][j * 4 + 2][in[j * 4 + 2]];
            dd = MBL[r][j * 4 + 3][in[j * 4 + 3]];
            n0 = xor_table[r][j * 24 + 0][(aa >> 28) & 0xf][(bb >> 28) & 0xf];
            n1 = xor_table[r][j * 24 + 1][(cc >> 28) & 0xf][(dd >> 28) & 0xf];
            n2 = xor_table[r][j * 24 + 2][(aa >> 24) & 0xf][(bb >> 24) & 0xf];
            n3 = xor_table[r][j * 24 + 3][(cc >> 24) & 0xf][(dd >> 24) & 0xf];
            in[j * 4 + 0] = (xor_table[r][j * 24 + 4][n0][n1] << 4) | xor_table[r][j * 24 + 5][n2][n3];
            n0 = xor_table[r][j * 24 + 6][(aa >> 20) & 0xf][(bb >> 20) & 0xf];
            n1 = xor_table[r][j * 24 + 7][(cc >> 20) & 0xf][(dd >> 20) & 0xf];
            n2 = xor_table[r][j * 24 + 8][(aa >> 16) & 0xf][(bb >> 16) & 0xf];
            n3 = xor_table[r][j * 24 + 9][(cc >> 16) & 0xf][(dd >> 16) & 0xf];
            in[j * 4 + 1] = (xor_table[r][j * 24 + 10][n0][n1] << 4) | xor_table[r][j * 24 + 11][n2][n3];
            n0 = xor_table[r][j * 24 + 12][(aa >> 12) & 0xf][(bb >> 12) & 0xf];
            n1 = xor_table[r][j * 24 + 13][(cc >> 12) & 0xf][(dd >> 12) & 0xf];
            n2 = xor_table[r][j * 24 + 14][(aa >> 8) & 0xf][(bb >> 8) & 0xf];
            n3 = xor_table[r][j * 24 + 15][(cc >> 8) & 0xf][(dd >> 8) & 0xf];
            in[j * 4 + 2] = (xor_table[r][j * 24 + 16][n0][n1] << 4) | xor_table[r][j * 24 + 17][n2][n3];
            n0 = xor_table[r][j * 24 + 18][(aa >> 4) & 0xf][(bb >> 4) & 0xf];
            n1 = xor_table[r][j * 24 + 19][(cc >> 4) & 0xf][(dd >> 4) & 0xf];
            n2 = xor_table[r][j * 24 + 20][(aa >> 0) & 0xf][(bb >> 0) & 0xf];
            n3 = xor_table[r][j * 24 + 21][(cc >> 0) & 0xf][(dd >> 0) & 0xf];
            in[j * 4 + 3] = (xor_table[r][j * 24 + 22][n0][n1] << 4) | xor_table[r][j * 24 + 23][n2][n3];
        }
    }
    shift_rows(in);
    for (int i = 0; i < 16; i++) {
        in[i] = t_boxes_last[i][in[i]];
    }
}
int main() {
    // 这里密钥固定了，不实现密钥扩展算法了，只保留最核心的功能
    uint32_t w[44] = {
            729683222, 682545830, 2885096840, 164581180, 2700803607, 2287217841, 597899577, 711751173, 4072838642,
            2056698179, 1496678522, 1935275647, 1031817085, 1192689214, 505642564, 1836746811, 4014253377, 2823969663,
            3060868411, 3674975488, 3570517752, 2089000327, 3404904636, 301536700, 1837671290, 285949693, 3690563137,
            3389035517, 1314191118, 1600113139, 2225491890, 1319558223, 3939660577, 3045964498, 824964448, 2139957551,
            2893506291, 435870753, 684796225, 1465647214, 3491035560, 3387827593, 3779005640, 3059944614
    };
    uint8_t xor_table[9][96][16][16];
    uint32_t ty_boxes[10][16][256];
    uint8_t t_boxes_last[16][256];
    uint32_t MBL[10][16][256];
    calculate_xor_table(xor_table);
    calculate_ty_boxes(w, ty_boxes, t_boxes_last, MBL);
    uint8_t in[16] = {50, 67, 246, 168, 136, 90, 48, 141, 49, 49, 152, 162, 224, 55, 7, 52};
    encrypt(in, xor_table, ty_boxes, t_boxes_last, MBL);
    PRINT_ARRAY("%x,", in, 16);
    return 0;
}

AES白盒的攻击

对于此类问题一般采用DFA差分攻击，通过在AES的最后一次列混合过程中插入一些缺陷数据

为什么采用DFA：因为应用面广，硬件上容易实现。本质是用高能射线打向硬件，导致 1 比特的偏转，进而导致加密结果不同

具体测试过程如下：

我们拿一个AES程序，如我们采用一个原始的AES程序进行处理：https://github.com/TinyNiko/white_aes

还原ROUNDKEY

首先我们需要固定住输入。我的demo代码中是直接做了一个aes 加密，那么我们是可以知道原始 aeskey 的，后续只要验证，DFA攻击是否能还原出 key 就行。

1
2

const uint8_t *key = "\x6C\x28\x93\xF2\x1B\x61\x85\xE8\x56\x72\x38\xCB\x78\x18\x49\x45";
uint8_t plain[33] = "\x19\xec\x35\x72\xac\xca\xa7\x92\xbc\xb1\xe8\xe1\xdc\x59\x53\x66\x54\xbf\x08\x84 \x82\xa4\xef\x35\x5a\x66\x7f\xa1\x4e\x12\x24\xe9\x00";

记录第一次正确的 aes 加密结果，记为 golden_ref。

然后最关键的一步，在第 9 轮 列混合(Mix Column)之前，更改状态矩阵的一个字节数据，比如：

uint8_t *pp = (uint8_t *)state;
for (round = 1; ; ++round)
{
	SubBytes(state);
    ShiftRows(state);
    if (round == Nr) {
      break;
    }
    if(round == Nr - 1){
      // Modify index here
      pp[15] = 0x33;
    }
    MixColumns(state);
    AddRoundKey(round, state, RoundKey);
  }
  // Add round key to last round
  AddRoundKey(Nr, state, RoundKey);
}

我在第九轮的列混合前，加了一个判断，如果当前是第 9 轮，那么状态矩阵的最后一个字节变为 0x33(随便设置一下)。
然后记录这个时候的加密结果，看看和 golden_ref 是不是只有4个字节不一样，如果刚好四个字节不一样，那么这个关键点就找到了，如果只有1字节改变，
说明点找得太晚，如果有16字节不同说明点找得太早。

之后就是一个重复的过程，把状态矩阵里的每一个位置都改一遍，就可以得到 16 组不同的加密结果，也就是fault 数据。

然后利用 phonenixAES，把 golden_ref 和fault 数据 一起丢进去，最后可以得到一个Roundkey 10。至此，最关键的一步就完成了。

#!/usr/bin/env python3
import phoenixAES

with open('tracefile', 'wb') as t:
    t.write("""
868fc14bccc36e3c657b73271a32dcd7
ab8fc14bccc36e0f657b7e271a9edcd7
e98fc14bccc36ed9657b80271adcdcd7
d58fc14bccc36e5d657b6a271a38dcd7
da8fc14bccc36e0a657b62271a6fdcd7
865ac14b08c36e3c657b73201a3239d7
8613c14b86c36e3c657b73401a325cd7
86ecc14b82c36e3c657b73701a32afd7
86bdc14be4c36e3c657b73851a3257d7
868f004bcc0b6e3c617b73271a32dc7a
868f544bcc2c6e3c527b73271a32dc61
868f6f4bcccc6e3ce17b73271a32dc4a
868f394bcccc6e3c0e7b73271a32dc80
868fc1ecccc3993c657d73273c32dcd7
868fc1bfccc3a83c650f73274032dcd7
868fc1c7ccc38f3c651c7327da32dcd7
868fc1b1ccc33d3c658f73278632dcd7
""".encode('utf8'))

phoenixAES.crack_file('tracefile')
# Last round key #N found:
# 040D08DA68001026F3DC0D68897148B4

还原AES KEY

在拿到 RoundKey 之后，我们可以使用 https://github.com/SideChannelMarvels/Stark 中的 aes_keyschedule 反推出 aeskey，然后对这个
aeskey 进行验证即可。

另一个AES白盒解密的库

我们需要拿到最后一轮的AES的置换表(T-Boxes、Ty-Boxes合并后的表的最后一轮)内容。之后我们利用Github上的一个解密库：https://github.com/GrosQuildu/CryptoAttacks

from CryptoAttacks.Block.whitebox_aes_sage import *
from CryptoAttacks.Utils import *

TTyboxFinal = [
                        [
                                xxx
                                ...
                        ],
                        ...
                ]
key_recovered = recover_key_unprotected_wbaes_from_TTyboxFinal(TTyboxFinal)
key = matrix_to_array(key_recovered)
print(key)