-mem2reg: Promote Memory to Registers

在了解这个 pass 之前需要学习的背景知识是 LLVM IR Variables and Utilization

Description

-mem2reg pass 在做的事也是把 stack-allocated 的 variable 给 promote 成 SSA registers,这样就直接变成了 instruction-level computations,减少了很多不必要的 allocastore ,和 load instructions,进而提升performance。

Code Example

给定一段如下的 C code。

int sum(int n) {
    int result = 0;
    for (int i = 1; i <= n; i++) {
        result += i;
    }
    return result;
}

我们先把它 compile 成原始的 IR code。

define i32 @sum(i32 %n) {
entry:
    ; Allocate space on the stack for 'result' and 'i'
    %result = alloca i32
    %i = alloca i32
    ; Initialize 'result' to 0
    store i32 0, i32* %result
    ; Initialize 'i' to 1
    store i32 1, i32* %i
    ; Jump to the condition check of the loop
    br label %loop.cond

loop.cond:
    ; Load the value of 'i'
    %loaded_i = load i32, i32* %i
    ; Compare 'i' with 'n'
    %cmp = icmp sle i32 %loaded_i, %n
    ; Conditional branch based on the comparison result
    br i1 %cmp, label %loop.body, label %loop.end

loop.body:
    ; Load 'result' and 'i'
    %loaded_result = load i32, i32* %result
    %loaded_i = load i32, i32* %i
    ; Add 'i' to 'result'
    %sum = add i32 %loaded_result, %loaded_i
    ; Store the updated 'result' back
    store i32 %sum, i32* %result

    ; Increment 'i'
    %next_i = add i32 %loaded_i, 1
    store i32 %next_i, i32* %i
    ; Jump back to the loop condition check
    br label %loop.cond

loop.end:
    ; Load the final value of 'result'
    %final_result = load i32, i32* %result
    ; Return 'result'
    ret i32 %final_result
}

经过 -mem2reg transform 之后的 IR 可以变成如下所示

define i32 @sum(i32 %n) {
entry:

    ; 'result' and 'i' are promoted to SSA registers
    %result = phi i32 [ 0, %entry ], [ %sum, %loop.body ]
    %i = phi i32 [ 1, %entry ], [ %next_i, %loop.body ]
    ; Jump to the condition check of the loop
    br label %loop.cond

loop.cond:
    ; Compare 'i' with 'n'
    %cmp = icmp sle i32 %i, %n
    ; Conditional branch based on the comparison result
    br i1 %cmp, label %loop.body, label %loop.end

loop.body:
    ; Add 'i' to 'result' using the SSA registers directly
    %sum = add i32 %result, %i
    ; Increment 'i' using the SSA register directly
    %next_i = add i32 %i, 1
    ; Jump back to the loop condition check
    br label %loop.cond

loop.end:
    ; 'result' is already in SSA form and does not need loading
    ; Return 'result' directly
    ret i32 %result
}

可以看到,不必要的 memory access,包括 allocastore , 和 load instructions 都被 remove 了。性能得到了提升。