使用 LLVM 实现一个简单编译器
作者:tomoyazhang,騰訊 PCG 后臺開發工程師
1. 目標
這個系列來自 LLVM 的Kaleidoscope 教程,增加了我對代碼的注釋以及一些理解,修改了部分代碼。現在開始我們要使用 LLVM 實現一個編譯器,完成對如下代碼的編譯運行。
#?斐波那契數列函數定義 def?fib(x)if?x?<?3?then1elsefib(x?-?1)?+?fib(x?-?2)fib(40)#?函數聲明 extern?sin(arg) extern?cos(arg) extern?atan2(arg1?arg2)#?聲明后的函數可調用 atan2(sin(.4),?cos(42))這個語言稱為 Kaleidoscope, 從代碼可以看出,Kaleidoscope 支持函數、條件分支、數值計算等語言特性。為了方便,Kaleidoscope 唯一支持的數據類型為 float64, 所以示例中的所有數值都是 float64。
2. Lex
編譯的第一個步驟稱為 Lex, 詞法分析,其功能是將文本輸入轉為多個 tokens, 比如對于如下代碼:
atan2(sin(.4),?cos(42))就應該轉為:
tokens?=?["atan2",?"(",?"sin",?"(",?.4,?")",?",",?"cos",?"(",?42,?")",?")"]接下來我們使用 C++來寫這個 Lexer, 由于這是教程代碼,所以并沒有使用工程項目應有的設計:
//?如果不是以下5種情況,Lexer返回[0-255]的ASCII值,否則返回以下枚舉值 enum?Token?{TOKEN_EOF?=?-1,?????????//?文件結束標識符TOKEN_DEF?=?-2,?????????//?關鍵字defTOKEN_EXTERN?=?-3,??????//?關鍵字externTOKEN_IDENTIFIER?=?-4,??//?名字TOKEN_NUMBER?=?-5???????//?數值 };std::string?g_identifier_str;??//?Filled?in?if?TOKEN_IDENTIFIER double?g_number_val;???????????//?Filled?in?if?TOKEN_NUMBER//?從標準輸入解析一個Token并返回 int?GetToken()?{static?int?last_char?=?'?';//?忽略空白字符while?(isspace(last_char))?{last_char?=?getchar();}//?識別字符串if?(isalpha(last_char))?{g_identifier_str?=?last_char;while?(isalnum((last_char?=?getchar())))?{g_identifier_str?+=?last_char;}if?(g_identifier_str?==?"def")?{return?TOKEN_DEF;}?else?if?(g_identifier_str?==?"extern")?{return?TOKEN_EXTERN;}?else?{return?TOKEN_IDENTIFIER;}}//?識別數值if?(isdigit(last_char)?||?last_char?==?'.')?{std::string?num_str;do?{num_str?+=?last_char;last_char?=?getchar();}?while?(isdigit(last_char)?||?last_char?==?'.');g_number_val?=?strtod(num_str.c_str(),?nullptr);return?TOKEN_NUMBER;}//?忽略注釋if?(last_char?==?'#')?{do?{last_char?=?getchar();}?while?(last_char?!=?EOF?&&?last_char?!=?'\n'?&&?last_char?!=?'\r');if?(last_char?!=?EOF)?{return?GetToken();}}//?識別文件結束if?(last_char?==?EOF)?{return?TOKEN_EOF;}//?直接返回ASCIIint?this_char?=?last_char;last_char?=?getchar();return?this_char; }使用 Lexer 對之前的代碼處理結果為(使用空格分隔 tokens):
def?fib?(?x?)?if?x?<?3?then?1?else?fib?(?x?-?1?)?+?fib?(?x?-?2?)?fib?(?40?)?extern?sin?(?arg?) extern?cos?(?arg?)?extern?atan2?(?arg1?arg2?)?atan2?(?sin?(?0.4?)?,?cos?(?42?)?)Lexer 的輸入是代碼文本,輸出是有序的一個個 Token。
3. Parser
編譯的第二個步驟稱為 Parse, 其功能是將 Lexer 輸出的 tokens 轉為 AST (Abstract Syntax Tree)。我們首先定義表達式的 AST Node:
//?所有?`表達式`?節點的基類 class?ExprAST?{public:virtual?~ExprAST()?{} };//?字面值表達式 class?NumberExprAST?:?public?ExprAST?{public:NumberExprAST(double?val)?:?val_(val)?{}private:double?val_; };//?變量表達式 class?VariableExprAST?:?public?ExprAST?{public:VariableExprAST(const?std::string&?name)?:?name_(name)?{}private:std::string?name_; };//?二元操作表達式 class?BinaryExprAST?:?public?ExprAST?{public:BinaryExprAST(char?op,?std::unique_ptr<ExprAST>?lhs,std::unique_ptr<ExprAST>?rhs):?op_(op),?lhs_(std::move(lhs)),?rhs_(std::move(rhs))?{}private:char?op_;std::unique_ptr<ExprAST>?lhs_;std::unique_ptr<ExprAST>?rhs_; };//?函數調用表達式 class?CallExprAST?:?public?ExprAST?{public:CallExprAST(const?std::string&?callee,std::vector<std::unique_ptr<ExprAST>>?args):?callee_(callee),?args_(std::move(args))?{}private:std::string?callee_;std::vector<std::unique_ptr<ExprAST>>?args_; };為了便于理解,關于條件表達式的內容放在后面,這里暫不考慮。接著我們定義函數聲明和函數的 AST Node:
//?函數接口 class?PrototypeAST?{public:PrototypeAST(const?std::string&?name,?std::vector<std::string>?args):?name_(name),?args_(std::move(args))?{}const?std::string&?name()?const?{?return?name_;?}private:std::string?name_;std::vector<std::string>?args_; };//?函數 class?FunctionAST?{public:FunctionAST(std::unique_ptr<PrototypeAST>?proto,std::unique_ptr<ExprAST>?body):?proto_(std::move(proto)),?body_(std::move(body))?{}private:std::unique_ptr<PrototypeAST>?proto_;std::unique_ptr<ExprAST>?body_; };接下來我們要進行 Parse, 在正式 Parse 前,定義如下函數方便后續處理:
int?g_current_token;??//?當前待處理的Token int?GetNextToken()?{return?g_current_token?=?GetToken(); }首先我們處理最簡單的字面值:
//?numberexpr?::=?number std::unique_ptr<ExprAST>?ParseNumberExpr()?{auto?result?=?std::make_unique<NumberExprAST>(g_number_val);GetNextToken();return?std::move(result); }這段程序非常簡單,當前 Token 為 TOKEN_NUMBER 時被調用,使用 g_number_val,創建一個 NumberExprAST, 因為當前 Token 處理完畢,讓 Lexer 前進一個 Token, 最后返回。接著我們處理圓括號操作符、變量、函數調用:
//?parenexpr?::=?(?expression?) std::unique_ptr<ExprAST>?ParseParenExpr()?{GetNextToken();??//?eat?(auto?expr?=?ParseExpression();GetNextToken();??//?eat?)return?expr; }///?identifierexpr ///???::=?identifier ///???::=?identifier?(?expression,?expression,?...,?expression?) std::unique_ptr<ExprAST>?ParseIdentifierExpr()?{std::string?id?=?g_identifier_str;GetNextToken();if?(g_current_token?!=?'(')?{return?std::make_unique<VariableExprAST>(id);}?else?{GetNextToken();??//?eat?(std::vector<std::unique_ptr<ExprAST>>?args;while?(g_current_token?!=?')')?{args.push_back(ParseExpression());if?(g_current_token?==?')')?{break;}?else?{GetNextToken();??//?eat?,}}GetNextToken();??//?eat?)return?std::make_unique<CallExprAST>(id,?std::move(args));} }上面代碼中的 ParseExpression 與 ParseParenExpr 等存在循環依賴,這里按照其名字理解意思即可,具體實現在后面。我們將 NumberExpr、ParenExpr、IdentifierExpr 視為 PrimaryExpr, 封裝 ParsePrimary 方便后續調用:
///?primary ///???::=?identifierexpr ///???::=?numberexpr ///???::=?parenexpr std::unique_ptr<ExprAST>?ParsePrimary()?{switch?(g_current_token)?{case?TOKEN_IDENTIFIER:?return?ParseIdentifierExpr();case?TOKEN_NUMBER:?return?ParseNumberExpr();case?'(':?return?ParseParenExpr();default:?return?nullptr;} }接下來我們考慮如何處理二元操作符,為了方便,Kaleidoscope 只支持 4 種二元操作符,優先級為:
'<'?<?'+'?=?'-'?<?'*'即'<'的優先級最低,而'*'的優先級最高,在代碼中實現為:
//?定義優先級 const?std::map<char,?int>?g_binop_precedence?=?{{'<',?10},?{'+',?20},?{'-',?20},?{'*',?40}};//?獲得當前Token的優先級 int?GetTokenPrecedence()?{auto?it?=?g_binop_precedence.find(g_current_token);if?(it?!=?g_binop_precedence.end())?{return?it->second;}?else?{return?-1;} }對于帶優先級的二元操作符的解析,我們會將其分成多個片段。比如一個表達式:
a?+?b?+?(c?+?d)?*?e?*?f?+?g首先解析 a, 然后處理多個二元組:
[+,?b],?[+,?(c+d)],?[*,?e],?[*,?f],?[+,?g]即,復雜表達式可以抽象為一個 PrimaryExpr 跟著多個[binop, PrimaryExpr]二元組,注意由于圓括號屬于 PrimaryExpr, 所以這里不需要考慮怎么特殊處理(c+d),因為會被 ParsePrimary 自動處理。
//?parse //???lhs?[binop?primary]?[binop?primary]?... //?如遇到優先級小于min_precedence的操作符,則停止 std::unique_ptr<ExprAST>?ParseBinOpRhs(int?min_precedence,std::unique_ptr<ExprAST>?lhs)?{while?(true)?{int?current_precedence?=?GetTokenPrecedence();if?(current_precedence?<?min_precedence)?{//?如果當前token不是二元操作符,current_precedence為-1,?結束任務//?如果遇到優先級更低的操作符,也結束任務return?lhs;}int?binop?=?g_current_token;GetNextToken();??//?eat?binopauto?rhs?=?ParsePrimary();//?現在我們有兩種可能的解析方式//????*?(lhs?binop?rhs)?binop?unparsed//????*?lhs?binop?(rhs?binop?unparsed)int?next_precedence?=?GetTokenPrecedence();if?(current_precedence?<?next_precedence)?{//?將高于current_precedence的右邊的操作符處理掉返回rhs?=?ParseBinOpRhs(current_precedence?+?1,?std::move(rhs));}lhs?=std::make_unique<BinaryExprAST>(binop,?std::move(lhs),?std::move(rhs));//?繼續循環} }//?expression //???::=?primary?[binop?primary]?[binop?primary]?... std::unique_ptr<ExprAST>?ParseExpression()?{auto?lhs?=?ParsePrimary();return?ParseBinOpRhs(0,?std::move(lhs)); }最復雜的部分完成后,按部就班把 function 寫完:
//?prototype //???::=?id?(?id?id?...?id) std::unique_ptr<PrototypeAST>?ParsePrototype()?{std::string?function_name?=?g_identifier_str;GetNextToken();std::vector<std::string>?arg_names;while?(GetNextToken()?==?TOKEN_IDENTIFIER)?{arg_names.push_back(g_identifier_str);}GetNextToken();??//?eat?)return?std::make_unique<PrototypeAST>(function_name,?std::move(arg_names)); }//?definition?::=?def?prototype?expression std::unique_ptr<FunctionAST>?ParseDefinition()?{GetNextToken();??//?eat?defauto?proto?=?ParsePrototype();auto?expr?=?ParseExpression();return?std::make_unique<FunctionAST>(std::move(proto),?std::move(expr)); }//?external?::=?extern?prototype std::unique_ptr<PrototypeAST>?ParseExtern()?{GetNextToken();??//?eat?externreturn?ParsePrototype(); }最后,我們為頂層的代碼實現匿名 function:
//?toplevelexpr?::=?expression std::unique_ptr<FunctionAST>?ParseTopLevelExpr()?{auto?expr?=?ParseExpression();auto?proto?=?std::make_unique<PrototypeAST>("",?std::vector<std::string>());return?std::make_unique<FunctionAST>(std::move(proto),?std::move(expr)); }頂層代碼的意思是放在全局而不放在 function 內定義的一些執行語句比如變量賦值,函數調用等。編寫一個 main 函數:
int?main()?{GetNextToken();while?(true)?{switch?(g_current_token)?{case?TOKEN_EOF:?return?0;case?TOKEN_DEF:?{ParseDefinition();std::cout?<<?"parsed?a?function?definition"?<<?std::endl;break;}case?TOKEN_EXTERN:?{ParseExtern();std::cout?<<?"parsed?a?extern"?<<?std::endl;break;}default:?{ParseTopLevelExpr();std::cout?<<?"parsed?a?top?level?expr"?<<?std::endl;break;}}}return?0; }編譯:
clang++?main.cpp?`llvm-config?--cxxflags?--ldflags?--libs`輸入如下代碼進行測試:
def?foo(x?y)x?+?foo(y,?4)def?foo(x?y)x?+?yyextern?sin(a)得到輸出:
parsed?a?function?definition parsed?a?function?definition parsed?a?top?level?expr parsed?a?extern至此成功將 Lexer 輸出的 tokens 轉為 AST。
4. Code Generation to LLVM IR
終于開始 codegen 了,首先我們 include 一些 LLVM 頭文件,定義一些全局變量:
#include?"llvm/ADT/APFloat.h" #include?"llvm/ADT/STLExtras.h" #include?"llvm/IR/BasicBlock.h" #include?"llvm/IR/Constants.h" #include?"llvm/IR/DerivedTypes.h" #include?"llvm/IR/Function.h" #include?"llvm/IR/IRBuilder.h" #include?"llvm/IR/LLVMContext.h" #include?"llvm/IR/LegacyPassManager.h" #include?"llvm/IR/Module.h" #include?"llvm/IR/Type.h" #include?"llvm/IR/Verifier.h" #include?"llvm/Support/TargetSelect.h" #include?"llvm/Target/TargetMachine.h" #include?"llvm/Transforms/InstCombine/InstCombine.h" #include?"llvm/Transforms/Scalar.h" #include?"llvm/Transforms/Scalar/GVN.h"//?記錄了LLVM的核心數據結構,比如類型和常量表,不過我們不太需要關心它的內部 llvm::LLVMContext?g_llvm_context; //?用于創建LLVM指令 llvm::IRBuilder<>?g_ir_builder(g_llvm_context); //?用于管理函數和全局變量,可以粗淺地理解為類c++的編譯單元(單個cpp文件) llvm::Module?g_module("my?cool?jit",?g_llvm_context); //?用于記錄函數的變量參數 std::map<std::string,?llvm::Value*>?g_named_values;然后給每個 AST Class 增加一個 CodeGen 接口:
//?所有?`表達式`?節點的基類 class?ExprAST?{public:virtual?~ExprAST()?{}virtual?llvm::Value*?CodeGen()?=?0; };//?字面值表達式 class?NumberExprAST?:?public?ExprAST?{public:NumberExprAST(double?val)?:?val_(val)?{}llvm::Value*?CodeGen()?override;private:double?val_; };首先實現 NumberExprAST 的 CodeGen:
llvm::Value*?NumberExprAST::CodeGen()?{return?llvm::ConstantFP::get(g_llvm_context,?llvm::APFloat(val_)); }由于 Kaleidoscope 只有一種數據類型 FP64, 所以直接調用 ConstantFP 傳入即可,APFloat 是 llvm 內部的數據結構,用于存儲 Arbitrary Precision Float. 在 LLVM IR 中,所有常量是唯一且共享的,所以這里使用的 get 而不是 new/create。
然后實現 VariableExprAST 的 CodeGen:
llvm::Value*?VariableExprAST::CodeGen()?{return?g_named_values.at(name_); }由于 Kaleidoscope 的 VariableExpr 只存在于函數內對函數參數的引用,我們假定函數參數已經被注冊到 g_name_values 中,所以 VariableExpr 直接查表返回即可。
接著實現 BinaryExprAST, 分別 codegen lhs, rhs 然后創建指令處理 lhs, rhs 即可:
llvm::Value*?BinaryExprAST::CodeGen()?{llvm::Value*?lhs?=?lhs_->CodeGen();llvm::Value*?rhs?=?rhs_->CodeGen();switch?(op_)?{case?'<':?{llvm::Value*?tmp?=?g_ir_builder.CreateFCmpULT(lhs,?rhs,?"cmptmp");//?把?0/1?轉為?0.0/1.0return?g_ir_builder.CreateUIToFP(tmp,?llvm::Type::getDoubleTy(g_llvm_context),?"booltmp");}case?'+':?return?g_ir_builder.CreateFAdd(lhs,?rhs,?"addtmp");case?'-':?return?g_ir_builder.CreateFSub(lhs,?rhs,?"subtmp");case?'*':?return?g_ir_builder.CreateFMul(lhs,?rhs,?"multmp");default:?return?nullptr;} }實現 CallExprAST:
llvm::Value*?CallExprAST::CodeGen()?{//?g_module中存儲了全局變量/函數等llvm::Function*?callee?=?g_module.getFunction(callee_);std::vector<llvm::Value*>?args;for?(std::unique_ptr<ExprAST>&?arg_expr?:?args_)?{args.push_back(arg_expr->CodeGen());}return?g_ir_builder.CreateCall(callee,?args,?"calltmp"); }實現 ProtoTypeAST:
llvm::Value*?PrototypeAST::CodeGen()?{//?創建kaleidoscope的函數類型?double?(doube,?double,?...,?double)std::vector<llvm::Type*>?doubles(args_.size(),llvm::Type::getDoubleTy(g_llvm_context));//?函數類型是唯一的,所以使用get而不是new/createllvm::FunctionType*?function_type?=?llvm::FunctionType::get(llvm::Type::getDoubleTy(g_llvm_context),?doubles,?false);//?創建函數,?ExternalLinkage意味著函數可能不在當前module中定義,在當前module//?即g_module中注冊名字為name_,?后面可以使用這個名字在g_module中查詢llvm::Function*?func?=?llvm::Function::Create(function_type,?llvm::Function::ExternalLinkage,?name_,?&g_module);//?增加IR可讀性,設置function的argument?nameint?index?=?0;for?(auto&?arg?:?func->args())?{arg.setName(args_[index++]);}return?func; }實現 FunctionAST:
llvm::Value*?FunctionAST::CodeGen()?{//?檢查函數聲明是否已完成codegen(比如之前的extern聲明),?如果沒有則執行codegenllvm::Function*?func?=?g_module.getFunction(proto_->name());if?(func?==?nullptr)?{func?=?proto_->CodeGen();}//?創建一個Block并且設置為指令插入位置。//?llvm?block用于定義control?flow?graph,?由于我們暫不實現control?flow,?創建//?一個單獨的block即可llvm::BasicBlock*?block?=llvm::BasicBlock::Create(g_llvm_context,?"entry",?func);g_ir_builder.SetInsertPoint(block);//?將函數參數注冊到g_named_values中,讓VariableExprAST可以codegeng_named_values.clear();for?(llvm::Value&?arg?:?func->args())?{g_named_values[arg.getName()]?=?&arg;}//?codegen?body然后returnllvm::Value*?ret_val?=?body_->CodeGen();g_ir_builder.CreateRet(ret_val);llvm::verifyFunction(*func);return?func; }至此,所有 codegen 都已完成,修改 main:
int?main()?{GetNextToken();while?(true)?{switch?(g_current_token)?{case?TOKEN_EOF:?return?0;case?TOKEN_DEF:?{auto?ast?=?ParseDefinition();std::cout?<<?"parsed?a?function?definition"?<<?std::endl;ast->CodeGen()->print(llvm::errs());std::cerr?<<?std::endl;break;}case?TOKEN_EXTERN:?{auto?ast?=?ParseExtern();std::cout?<<?"parsed?a?extern"?<<?std::endl;ast->CodeGen()->print(llvm::errs());std::cerr?<<?std::endl;break;}default:?{auto?ast?=?ParseTopLevelExpr();std::cout?<<?"parsed?a?top?level?expr"?<<?std::endl;ast->CodeGen()->print(llvm::errs());std::cerr?<<?std::endl;break;}}}return?0; }輸入測試:
4?+?5def?foo(a?b)a*a?+?2*a*b?+?b*bfoo(2,?3)def?bar(a)foo(a,?4)?+?bar(31337)extern?cos(x)cos(1.234)得到輸出:
parsed?a?top?level?expr define?double?@0()?{ entry:ret?double?9.000000e+00 }parsed?a?function?definition define?double?@foo(double?%a,?double?%b)?{ entry:%multmp?=?fmul?double?%a,?%a%multmp1?=?fmul?double?2.000000e+00,?%a%multmp2?=?fmul?double?%multmp1,?%b%addtmp?=?fadd?double?%multmp,?%multmp2%multmp3?=?fmul?double?%b,?%b%addtmp4?=?fadd?double?%addtmp,?%multmp3ret?double?%addtmp4 }parsed?a?top?level?expr define?double?@1()?{ entry:%calltmp?=?call?double?@foo(double?2.000000e+00,?double?3.000000e+00)ret?double?%calltmp }parsed?a?function?definition define?double?@bar(double?%a)?{ entry:%calltmp?=?call?double?@foo(double?%a,?double?4.000000e+00)%calltmp1?=?call?double?@bar(double?3.133700e+04)%addtmp?=?fadd?double?%calltmp,?%calltmp1ret?double?%addtmp }parsed?a?extern declare?double?@cos(double)parsed?a?top?level?expr define?double?@2()?{ entry:%calltmp?=?call?double?@cos(double?1.234000e+00)ret?double?%calltmp }至此,我們已成功將 Parser 輸出的 AST 轉為 LLVM IR。
5. Optimizer
我們使用上一節的程序處理如下代碼:
def?test(x)1?+?2?+?x可以得到:
parsed?a?function?definition define?double?@test(double?%x)?{ entry:%addtmp?=?fadd?double?3.000000e+00,?%xret?double?%addtmp }可以看到,生成的指令直接是 1+2 的結果,而沒有 1 + 2 的指令,這種自動把常量計算完畢而不是生成加法指令的優化稱為 Constant Folding。
在大部分時候僅有這個優化仍然不夠,比如如下代碼:
def?test(x)(1?+?2?+?x)?*?(x?+?(1?+?2))可以得到編譯結果:
parsed?a?function?definition define?double?@test(double?%x)?{ entry:%addtmp?=?fadd?double?3.000000e+00,?%x%addtmp1?=?fadd?double?%x,?3.000000e+00%multmp?=?fmul?double?%addtmp,?%addtmp1ret?double?%multmp }生成了兩個加法指令,但最優做法只需要一個加法即可,因為乘法的兩邊 lhs 和 rhs 是相等的。
這需要其他的優化技術,llvm 以"passes"的形式提供,llvm 中的 passes 可以選擇是否啟用,可以設置 passes 的順序。
這里我們對每個函數單獨做優化,定義 g_fpm, 增加幾個 passes:
llvm::legacy::FunctionPassManager?g_fpm(&g_module);int?main()?{g_fpm.add(llvm::createInstructionCombiningPass());g_fpm.add(llvm::createReassociatePass());g_fpm.add(llvm::createGVNPass());g_fpm.add(llvm::createCFGSimplificationPass());g_fpm.doInitialization();... }在 FunctionAST 的 CodeGen 中增加一句:
llvm::Value*?ret_val?=?body_->CodeGen();g_ir_builder.CreateRet(ret_val);llvm::verifyFunction(*func);g_fpm.run(*func);?//?增加這句return?func;即啟動了對每個 function 的優化,接下來測試之前的代碼:
parsed?a?function?definition define?double?@test(double?%x)?{ entry:%addtmp?=?fadd?double?%x,?3.000000e+00%multmp?=?fmul?double?%addtmp,?%addtmpret?double?%multmp }可以看到,和我們期望的一樣,加法指令減少到一個。
6. Adding a JIT Compiler
由于 JIT 模式中我們需要反復創建新的 module, 所以我們將全局變量 g_module 改為 unique_ptr。
//?用于管理函數和全局變量,可以粗淺地理解為類c++的編譯單元(單個cpp文件) std::unique_ptr<llvm::Module>?g_module?=std::make_unique<llvm::Module>("my?cool?jit",?g_llvm_context);為了專注于 JIT,我們可以把優化的 passes 刪掉。
修改 ParseTopLevelExpr,給 PrototypeAST 命名為__anon_expr, 讓我們后面可以通過這個名字找到它。
//?toplevelexpr?::=?expression std::unique_ptr<FunctionAST>?ParseTopLevelExpr()?{auto?expr?=?ParseExpression();auto?proto?=std::make_unique<PrototypeAST>("__anon_expr",?std::vector<std::string>());return?std::make_unique<FunctionAST>(std::move(proto),?std::move(expr)); }然后我們從 llvm-project 中拷貝一份代碼 llvm/examples/Kaleidoscope/include/KaleidoscopeJIT.h 到本地再 include, 其定義了 KaleidoscopeJIT 類,關于這個類,在后面會做解讀,這里先不管。
定義全局變量 g_jit, 并使用 InitializeNativeTarget*函數初始化環境。
#include?"KaleidoscopeJIT.h"std::unique_ptr<llvm::orc::KaleidoscopeJIT>?g_jit;int?main()?{llvm::InitializeNativeTarget();llvm::InitializeNativeTargetAsmPrinter();llvm::InitializeNativeTargetAsmParser();g_jit.reset(new?llvm::orc::KaleidoscopeJIT);g_module->setDataLayout(g_jit->getTargetMachine().createDataLayout());... }修改 main 處理 top level expr 的代碼為:
auto?ast?=?ParseTopLevelExpr();std::cout?<<?"parsed?a?top?level?expr"?<<?std::endl;ast->CodeGen()->print(llvm::errs());std::cout?<<?std::endl;auto?h?=?g_jit->addModule(std::move(g_module));//?重新創建g_module在下次使用g_module?=std::make_unique<llvm::Module>("my?cool?jit",?g_llvm_context);g_module->setDataLayout(g_jit->getTargetMachine().createDataLayout());//?通過名字找到編譯的函數符號auto?symbol?=?g_jit->findSymbol("__anon_expr");//?強轉為C函數指針double?(*fp)()?=?(double?(*)())(symbol.getAddress().get());//?執行輸出std::cout?<<?fp()?<<?std::endl;g_jit->removeModule(h);break;輸入:
4?+?5def?foo(a?b)a*a?+?2*a*b?+?b*bfoo(2,?3)得到輸出:
parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:ret?double?9.000000e+00 }9 parsed?a?function?definition define?double?@foo(double?%a,?double?%b)?{ entry:%multmp?=?fmul?double?%a,?%a%multmp1?=?fmul?double?2.000000e+00,?%a%multmp2?=?fmul?double?%multmp1,?%b%addtmp?=?fadd?double?%multmp,?%multmp2%multmp3?=?fmul?double?%b,?%b%addtmp4?=?fadd?double?%addtmp,?%multmp3ret?double?%addtmp4 }parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%calltmp?=?call?double?@foo(double?2.000000e+00,?double?3.000000e+00)ret?double?%calltmp }25可以看到代碼已經順利執行,但現在的實現仍然是有問題的,比如上面的輸入,foo 函數的定義和調用是被歸在同一個 module 中,當第一次調用完成后,由于我們 removeModule, 第二次調用 foo 會失敗。
在解決這個問題之前,我們先把 main 函數內對不同 TOKEN 的處理拆成多個函數,如下:
void?ReCreateModule()?{g_module?=?std::make_unique<llvm::Module>("my?cool?jit",?g_llvm_context);g_module->setDataLayout(g_jit->getTargetMachine().createDataLayout()); }void?ParseDefinitionToken()?{auto?ast?=?ParseDefinition();std::cout?<<?"parsed?a?function?definition"?<<?std::endl;ast->CodeGen()->print(llvm::errs());std::cerr?<<?std::endl; }void?ParseExternToken()?{auto?ast?=?ParseExtern();std::cout?<<?"parsed?a?extern"?<<?std::endl;ast->CodeGen()->print(llvm::errs());std::cerr?<<?std::endl; }void?ParseTopLevel()?{auto?ast?=?ParseTopLevelExpr();std::cout?<<?"parsed?a?top?level?expr"?<<?std::endl;ast->CodeGen()->print(llvm::errs());std::cout?<<?std::endl;auto?h?=?g_jit->addModule(std::move(g_module));//?重新創建g_module在下次使用ReCreateModule();//?通過名字找到編譯的函數符號auto?symbol?=?g_jit->findSymbol("__anon_expr");//?強轉為C函數指針double?(*fp)()?=?(double?(*)())(symbol.getAddress().get());//?執行輸出std::cout?<<?fp()?<<?std::endl;g_jit->removeModule(h); }int?main()?{llvm::InitializeNativeTarget();llvm::InitializeNativeTargetAsmPrinter();llvm::InitializeNativeTargetAsmParser();g_jit.reset(new?llvm::orc::KaleidoscopeJIT);g_module->setDataLayout(g_jit->getTargetMachine().createDataLayout());GetNextToken();while?(true)?{switch?(g_current_token)?{case?TOKEN_EOF:?return?0;case?TOKEN_DEF:?ParseDefinitionToken();?break;case?TOKEN_EXTERN:?ParseExternToken();?break;default:?ParseTopLevel();?break;}}return?0; }為了解決第二次調用 foo 失敗的問題,我們需要讓 function 和 top level expr 處于不同的 Module, 而處于不同 Module 的話,CallExprAST 的 CodeGen 在當前 module 會找不到 function, 所以需要自動在 CallExprAST 做 CodeGen 時在當前 Module 聲明這個函數,即自動地增加 extern, 也就是在當前 Module 自動做對應 PrototypeAST 的 CodeGen.
首先,增加一個全局變量存儲從函數名到函數接口的映射,并增加一個查詢函數。
std::map<std::string,?std::unique_ptr<PrototypeAST>>?name2proto_ast;llvm::Function*?GetFunction(const?std::string&?name)?{llvm::Function*?callee?=?g_module->getFunction(name);if?(callee?!=?nullptr)?{??//?當前module存在函數定義return?callee;}?else?{//?聲明函數return?name2proto_ast.at(name)->CodeGen();} }更改 CallExprAST 的 CodeGen, 讓其使用上面定義的 GetFuntion:
llvm::Value*?CallExprAST::CodeGen()?{llvm::Function*?callee?=?GetFunction(callee_);std::vector<llvm::Value*>?args;for?(std::unique_ptr<ExprAST>&?arg_expr?:?args_)?{args.push_back(arg_expr->CodeGen());}return?g_ir_builder.CreateCall(callee,?args,?"calltmp"); }更改 FunctionAST 的 CodeGen, 讓其將結果寫入 name2proto_ast:
llvm::Value*?FunctionAST::CodeGen()?{PrototypeAST&?proto?=?*proto_;name2proto_ast[proto.name()]?=?std::move(proto_);??//?transfer?ownershipllvm::Function*?func?=?GetFunction(proto.name());//?創建一個Block并且設置為指令插入位置。//?llvm?block用于定義control?flow?graph,?由于我們暫不實現control?flow,?創建//?一個單獨的block即可llvm::BasicBlock*?block?=llvm::BasicBlock::Create(g_llvm_context,?"entry",?func);g_ir_builder.SetInsertPoint(block);//?將函數參數注冊到g_named_values中,讓VariableExprAST可以codegeng_named_values.clear();for?(llvm::Value&?arg?:?func->args())?{g_named_values[arg.getName()]?=?&arg;}//?codegen?body然后returnllvm::Value*?ret_val?=?body_->CodeGen();g_ir_builder.CreateRet(ret_val);llvm::verifyFunction(*func);return?func; }修改 ParseExternToken 將結果寫入 name2proto_ast:
void?ParseExternToken()?{auto?ast?=?ParseExtern();std::cout?<<?"parsed?a?extern"?<<?std::endl;ast->CodeGen()->print(llvm::errs());std::cerr?<<?std::endl;name2proto_ast[ast->name()]?=?std::move(ast); }修改 ParseDefinitionToken 讓其使用獨立 Module:
void?ParseDefinitionToken()?{auto?ast?=?ParseDefinition();std::cout?<<?"parsed?a?function?definition"?<<?std::endl;ast->CodeGen()->print(llvm::errs());std::cerr?<<?std::endl;g_jit->addModule(std::move(g_module));ReCreateModule(); }修改完畢,輸入測試:
def?foo(x)x?+?1foo(2)def?foo(x)x?+?2foo(2)extern?sin(x) extern?cos(x)sin(1.0)def?foo(x)sin(x)?*?sin(x)?+?cos(x)?*?cos(x)foo(4) foo(3)得到輸出:
parsed?a?function?definition define?double?@foo(double?%x)?{ entry:%addtmp?=?fadd?double?%x,?1.000000e+00ret?double?%addtmp }parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%calltmp?=?call?double?@foo(double?2.000000e+00)ret?double?%calltmp }3 parsed?a?function?definition define?double?@foo(double?%x)?{ entry:%addtmp?=?fadd?double?%x,?2.000000e+00ret?double?%addtmp }parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%calltmp?=?call?double?@foo(double?2.000000e+00)ret?double?%calltmp }4 parsed?a?extern declare?double?@sin(double)parsed?a?extern declare?double?@cos(double)parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%calltmp?=?call?double?@sin(double?1.000000e+00)ret?double?%calltmp }0.841471 parsed?a?function?definition define?double?@foo(double?%x)?{ entry:%calltmp?=?call?double?@sin(double?%x)%calltmp1?=?call?double?@sin(double?%x)%multmp?=?fmul?double?%calltmp,?%calltmp1%calltmp2?=?call?double?@cos(double?%x)%calltmp3?=?call?double?@cos(double?%x)%multmp4?=?fmul?double?%calltmp2,?%calltmp3%addtmp?=?fadd?double?%multmp,?%multmp4ret?double?%addtmp }parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%calltmp?=?call?double?@foo(double?4.000000e+00)ret?double?%calltmp }1 parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%calltmp?=?call?double?@foo(double?3.000000e+00)ret?double?%calltmp }1成功運行,執行正確!代碼可以正確解析 sin, cos 的原因在 KaleidoscopeJIT.h 中,截取其尋找符號的代碼。
JITSymbol?findMangledSymbol(const?std::string?&Name)?{ #ifdef?_WIN32//?The?symbol?lookup?of?ObjectLinkingLayer?uses?the?SymbolRef::SF_Exported//?flag?to?decide?whether?a?symbol?will?be?visible?or?not,?when?we?call//?IRCompileLayer::findSymbolIn?with?ExportedSymbolsOnly?set?to?true.////?But?for?Windows?COFF?objects,?this?flag?is?currently?never?set.//?For?a?potential?solution?see:?https://reviews.llvm.org/rL258665//?For?now,?we?allow?non-exported?symbols?on?Windows?as?a?workaround.const?bool?ExportedSymbolsOnly?=?false; #elseconst?bool?ExportedSymbolsOnly?=?true; #endif//?Search?modules?in?reverse?order:?from?last?added?to?first?added.//?This?is?the?opposite?of?the?usual?search?order?for?dlsym,?but?makes?more//?sense?in?a?REPL?where?we?want?to?bind?to?the?newest?available?definition.for?(auto?H?:?make_range(ModuleKeys.rbegin(),?ModuleKeys.rend()))if?(auto?Sym?=?CompileLayer.findSymbolIn(H,?Name,?ExportedSymbolsOnly))return?Sym;//?If?we?can't?find?the?symbol?in?the?JIT,?try?looking?in?the?host?process.if?(auto?SymAddr?=?RTDyldMemoryManager::getSymbolAddressInProcess(Name))return?JITSymbol(SymAddr,?JITSymbolFlags::Exported);#ifdef?_WIN32//?For?Windows?retry?without?"_"?at?beginning,?as?RTDyldMemoryManager?uses//?GetProcAddress?and?standard?libraries?like?msvcrt.dll?use?names//?with?and?without?"_"?(for?example?"_itoa"?but?"sin").if?(Name.length()?>?2?&&?Name[0]?==?'_')if?(auto?SymAddr?=RTDyldMemoryManager::getSymbolAddressInProcess(Name.substr(1)))return?JITSymbol(SymAddr,?JITSymbolFlags::Exported); #endifreturn?null可以看到,在之前定義的 Module 找不到后會在 host process 中尋找這個符號。
7. SSA
繼續給我們的 Kaleidoscope 添加功能之前,需要先介紹 SSA, Static Single Assignment,考慮下面代碼:
y?:=?1 y?:=?2 x?:=?y我們可以發現第一個賦值是不必須的,而且第三行使用的 y 來自第二行的賦值,改成 SSA 格式為
y_1?=?1 y_2?=?2 x_1?=?y_2改完可以方便編譯器進行優化,比如把第一個賦值刪去,于是我們可以給出 SSA 的定義:
每個變量僅且必須被賦值一次,原本代碼中的多次變量賦值會被賦予版本號然后視為不同變量;
每個變量在被使用之前必須被定義。
考慮如下 Control Flow Graph:
加上版本號:
可以看到,這里遇到一個問題,最下面的 block 里面的 y 應該使用 y1 還是 y2, 為了解決這個問題,插入一個特殊語句稱為 phi function, 其會根據 control flow 從 y1 和 y2 中選擇一個值作為 y3, 如下:
可以看到,對于 x 不需要 phi function, 因為兩個分支到最后的都是 x2。
8. Control Flow
我們現在實現的 Kaleidoscope 還不夠完善,缺少 if else 控制流,比如不支持如下代碼:
def?fib(x)if?x?<?3?then1elsefib(x?-?1)?+?fib(x?-?2)首先讓我們的 Lexer 能識別 if then else 三個關鍵字,增加 TOKEN 類型:
TOKEN_IF?=?-6,??????????//?ifTOKEN_THEN?=?-7,????????//?thenTOKEN_ELSE?=?-8,????????//?else增加識別規則:
//?識別字符串if?(isalpha(last_char))?{g_identifier_str?=?last_char;while?(isalnum((last_char?=?getchar())))?{g_identifier_str?+=?last_char;}if?(g_identifier_str?==?"def")?{return?TOKEN_DEF;}?else?if?(g_identifier_str?==?"extern")?{return?TOKEN_EXTERN;}?else?if?(g_identifier_str?==?"if")?{return?TOKEN_IF;}?else?if?(g_identifier_str?==?"then")?{return?TOKEN_THEN;}?else?if?(g_identifier_str?==?"else")?{return?TOKEN_ELSE;}?else?{return?TOKEN_IDENTIFIER;}}增加 IfExprAST:
//?if?then?else class?IfExprAST?:?public?ExprAST?{public:IfExprAST(std::unique_ptr<ExprAST>?cond,?std::unique_ptr<ExprAST>?then_expr,std::unique_ptr<ExprAST>?else_expr):?cond_(std::move(cond)),then_expr_(std::move(then_expr)),else_expr_(std::move(else_expr))?{}llvm::Value*?CodeGen()?override;private:std::unique_ptr<ExprAST>?cond_;std::unique_ptr<ExprAST>?then_expr_;std::unique_ptr<ExprAST>?else_expr_; };增加對 IfExprAST 的解析:
std::unique_ptr<ExprAST>?ParseIfExpr()?{GetNextToken();??//?eat?ifstd::unique_ptr<ExprAST>?cond?=?ParseExpression();GetNextToken();??//?eat?thenstd::unique_ptr<ExprAST>?then_expr?=?ParseExpression();GetNextToken();??//?eat?elsestd::unique_ptr<ExprAST>?else_expr?=?ParseExpression();return?std::make_unique<IfExprAST>(std::move(cond),?std::move(then_expr),std::move(else_expr)); }增加到 ParsePrimary 中:
//?primary //???::=?identifierexpr //???::=?numberexpr //???::=?parenexpr std::unique_ptr<ExprAST>?ParsePrimary()?{switch?(g_current_token)?{case?TOKEN_IDENTIFIER:?return?ParseIdentifierExpr();case?TOKEN_NUMBER:?return?ParseNumberExpr();case?'(':?return?ParseParenExpr();case?TOKEN_IF:?return?ParseIfExpr();default:?return?nullptr;} }完成了 lex 和 parse,接下來是最有意思的 codegen:
llvm::Value*?IfExprAST::CodeGen()?{llvm::Value*?cond_value?=?cond_->CodeGen();//?創建fcmp?one指令,?cond_value?=?(cond_value?!=?0.0)//?轉為1bit?(bool)類型cond_value?=?g_ir_builder.CreateFCmpONE(cond_value,?llvm::ConstantFP::get(g_llvm_context,?llvm::APFloat(0.0)),"ifcond");//?在每個function內我們會創建一個block,?這里一定在這個block內,根據block得到//?對應的上層functionllvm::Function*?func?=?g_ir_builder.GetInsertBlock()->getParent();//?為then?else以及最后的final創建blockllvm::BasicBlock*?then_block?=llvm::BasicBlock::Create(g_llvm_context,?"then",?func);llvm::BasicBlock*?else_block?=llvm::BasicBlock::Create(g_llvm_context,?"else");llvm::BasicBlock*?final_block?=llvm::BasicBlock::Create(g_llvm_context,?"ifcont");//?創建跳轉指令,根據cond_value選擇then_block/else_blockg_ir_builder.CreateCondBr(cond_value,?then_block,?else_block);//?codegen?then_block,?增加跳轉final_block指令g_ir_builder.SetInsertPoint(then_block);llvm::Value*?then_value?=?then_expr_->CodeGen();g_ir_builder.CreateBr(final_block);//?then語句內可能會有嵌套的if/then/else,?在嵌套的codegen時,會改變當前的//?InsertBlock,?我們需要有最終結果的那個block作為這里的then_blockthen_block?=?g_ir_builder.GetInsertBlock();//?在這里才加入是為了讓這個block位于上面的then里嵌套block的后面func->getBasicBlockList().push_back(else_block);//?與then類似g_ir_builder.SetInsertPoint(else_block);llvm::Value*?else_value?=?else_expr_->CodeGen();g_ir_builder.CreateBr(final_block);else_block?=?g_ir_builder.GetInsertBlock();//?codegen?finalfunc->getBasicBlockList().push_back(final_block);g_ir_builder.SetInsertPoint(final_block);llvm::PHINode*?pn?=?g_ir_builder.CreatePHI(llvm::Type::getDoubleTy(g_llvm_context),?2,?"iftmp");pn->addIncoming(then_value,?then_block);pn->addIncoming(else_value,?else_block);return?pn; }這里使用了上一節 SSA 中提到的 phi function,輸入:
def?foo(x)if?x?<?3?then1elsefoo(x?-?1)?+?foo(x?-?2)foo(1) foo(2) foo(3) foo(4)得到輸出:
parsed?a?function?definition define?double?@foo(double?%x)?{ entry:%cmptmp?=?fcmp?ult?double?%x,?3.000000e+00%booltmp?=?uitofp?i1?%cmptmp?to?double%ifcond?=?fcmp?one?double?%booltmp,?0.000000e+00br?i1?%ifcond,?label?%then,?label?%elsethen:?????????????????????????????????????????????;?preds?=?%entrybr?label?%ifcontelse:?????????????????????????????????????????????;?preds?=?%entry%subtmp?=?fsub?double?%x,?1.000000e+00%calltmp?=?call?double?@foo(double?%subtmp)%subtmp1?=?fsub?double?%x,?2.000000e+00%calltmp2?=?call?double?@foo(double?%subtmp1)%addtmp?=?fadd?double?%calltmp,?%calltmp2br?label?%ifcontifcont:???????????????????????????????????????????;?preds?=?%else,?%then%iftmp?=?phi?double?[?1.000000e+00,?%then?],?[?%addtmp,?%else?]ret?double?%iftmp }parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%calltmp?=?call?double?@foo(double?1.000000e+00)ret?double?%calltmp }1 parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%calltmp?=?call?double?@foo(double?2.000000e+00)ret?double?%calltmp }1 parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%calltmp?=?call?double?@foo(double?3.000000e+00)ret?double?%calltmp }2 parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%calltmp?=?call?double?@foo(double?4.000000e+00)ret?double?%calltmp }3成功完成了斐波那契數列的計算,接下來我們需要增加循環的支持,在此之前我們實現一個 printd 函數:
extern?"C"?double?printd(double?x)?{printf("%lf\n",?x);return?0.0; }編譯:
clang++?-g?main.cpp?\`llvm-config?--cxxflags?--ldflags?--libs\`?-Wl,-no-as-needed?-rdynamic輸入:
extern?printd(x)printd(12)得到輸出:
parsed?a?extern declare?double?@printd(double)parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%calltmp?=?call?double?@printd(double?1.200000e+01)ret?double?%calltmp }12.000000 0可以看到,我們成功給 Kaleiscope 添加了 printd 函數,接下來看我們需要實現的循環語法, 使用 C++代碼作為注釋:
def?printstar(n):for?i?=?1,?i?<?n,?1.0?in?#?for?(double?i?=?1.0;?i?<?n;?i?+=?1.0)printd(n)同樣,我們增加 for 和 in 的 TOKEN:
enum?Token?{TOKEN_EOF?=?-1,?????????//?文件結束標識符TOKEN_DEF?=?-2,?????????//?關鍵字defTOKEN_EXTERN?=?-3,??????//?關鍵字externTOKEN_IDENTIFIER?=?-4,??//?名字TOKEN_NUMBER?=?-5,??????//?數值TOKEN_IF?=?-6,??????????//?ifTOKEN_THEN?=?-7,????????//?thenTOKEN_ELSE?=?-8,????????//?elseTOKEN_FOR?=?-9,?????????//?forTOKEN_IN?=?-10??????????//?in };增加 TOKEN 的識別:
//?識別字符串if?(isalpha(last_char))?{g_identifier_str?=?last_char;while?(isalnum((last_char?=?getchar())))?{g_identifier_str?+=?last_char;}if?(g_identifier_str?==?"def")?{return?TOKEN_DEF;}?else?if?(g_identifier_str?==?"extern")?{return?TOKEN_EXTERN;}?else?if?(g_identifier_str?==?"if")?{return?TOKEN_IF;}?else?if?(g_identifier_str?==?"then")?{return?TOKEN_THEN;}?else?if?(g_identifier_str?==?"else")?{return?TOKEN_ELSE;}?else?if?(g_identifier_str?==?"for")?{return?TOKEN_FOR;}?else?if?(g_identifier_str?==?"in")?{return?TOKEN_IN;}?else?{return?TOKEN_IDENTIFIER;}}增加 ForExprAST:
//?for?in class?ForExprAST?:?public?ExprAST?{public:ForExprAST(const?std::string&?var_name,?std::unique_ptr<ExprAST>?start_expr,std::unique_ptr<ExprAST>?end_expr,std::unique_ptr<ExprAST>?step_expr,std::unique_ptr<ExprAST>?body_expr):?var_name_(var_name),start_expr_(std::move(start_expr)),end_expr_(std::move(end_expr)),step_expr_(std::move(step_expr)),body_expr_(std::move(body_expr))?{}llvm::Value*?CodeGen()?override;private:std::string?var_name_;std::unique_ptr<ExprAST>?start_expr_;std::unique_ptr<ExprAST>?end_expr_;std::unique_ptr<ExprAST>?step_expr_;std::unique_ptr<ExprAST>?body_expr_; };添加到 Primary 的解析中:
//?forexpr?::=?for?var_name?=?start_expr,?end_expr,?step_expr?in?body_expr std::unique_ptr<ExprAST>?ParseForExpr()?{GetNextToken();??//?eat?forstd::string?var_name?=?g_identifier_str;GetNextToken();??//?eat?var_nameGetNextToken();??//?eat?=std::unique_ptr<ExprAST>?start_expr?=?ParseExpression();GetNextToken();??//?eat?,std::unique_ptr<ExprAST>?end_expr?=?ParseExpression();GetNextToken();??//?eat?,std::unique_ptr<ExprAST>?step_expr?=?ParseExpression();GetNextToken();??//?eat?instd::unique_ptr<ExprAST>?body_expr?=?ParseExpression();return?std::make_unique<ForExprAST>(var_name,?std::move(start_expr),std::move(end_expr),?std::move(step_expr),std::move(body_expr)); } //?primary //???::=?identifierexpr //???::=?numberexpr //???::=?parenexpr std::unique_ptr<ExprAST>?ParsePrimary()?{switch?(g_current_token)?{case?TOKEN_IDENTIFIER:?return?ParseIdentifierExpr();case?TOKEN_NUMBER:?return?ParseNumberExpr();case?'(':?return?ParseParenExpr();case?TOKEN_IF:?return?ParseIfExpr();case?TOKEN_FOR:?return?ParseForExpr();default:?return?nullptr;} }開始 codegen:
llvm::Value*?ForExprAST::CodeGen()?{//?codegen?startllvm::Value*?start_val?=?start_expr_->CodeGen();//?獲取當前functionllvm::Function*?func?=?g_ir_builder.GetInsertBlock()->getParent();//?保存當前的blockllvm::BasicBlock*?pre_block?=?g_ir_builder.GetInsertBlock();//?新增一個loop?block到當前functionllvm::BasicBlock*?loop_block?=llvm::BasicBlock::Create(g_llvm_context,?"loop",?func);//?為當前block增加到loop_block的跳轉指令g_ir_builder.CreateBr(loop_block);//?開始在loop_block內增加指令g_ir_builder.SetInsertPoint(loop_block);llvm::PHINode*?var?=?g_ir_builder.CreatePHI(llvm::Type::getDoubleTy(g_llvm_context),?2,?var_name_.c_str());//?如果來自pre_block的跳轉,則取start_val的值var->addIncoming(start_val,?pre_block);//?現在我們新增了一個變量var,因為可能會被后面的代碼引用,所以要注冊到//?g_named_values中,其可能會和函數參數重名,但我們這里為了方便不管//?這個特殊情況,直接注冊到g_named_values中,g_named_values[var_name_]?=?var;//?在loop_block中增加body的指令body_expr_->CodeGen();//?codegen?step_exprllvm::Value*?step_value?=?step_expr_->CodeGen();//?next_var?=?var?+?step_valuellvm::Value*?next_value?=?g_ir_builder.CreateFAdd(var,?step_value,?"nextvar");//?codegen?end_exprllvm::Value*?end_value?=?end_expr_->CodeGen();//?end_value?=?(end_value?!=?0.0)end_value?=?g_ir_builder.CreateFCmpONE(end_value,?llvm::ConstantFP::get(g_llvm_context,?llvm::APFloat(0.0)),"loopcond");//?和if/then/else一樣,這里的block可能會發生變化,保存當前的blockllvm::BasicBlock*?loop_end_block?=?g_ir_builder.GetInsertBlock();//?創建循環結束后的blockllvm::BasicBlock*?after_block?=llvm::BasicBlock::Create(g_llvm_context,?"afterloop",?func);//?根據end_value選擇是再來一次loop_block還是進入after_blockg_ir_builder.CreateCondBr(end_value,?loop_block,?after_block);//?給after_block增加指令g_ir_builder.SetInsertPoint(after_block);//?如果是再次循環,取新的值var->addIncoming(next_value,?loop_end_block);//?循環結束,避免被再次引用g_named_values.erase(var_name_);//?return?0return?llvm::Constant::getNullValue(llvm::Type::getDoubleTy(g_llvm_context)); }輸入:
extern?printd(x)def?foo(x)if?x?<?3?then1elsefoo(x?-?1)?+?foo(x?-?2)for?i?=?1,?i?<?10,?1.0?inprintd(foo(i))輸出:
parsed?a?extern declare?double?@printd(double)parsed?a?function?definition define?double?@foo(double?%x)?{ entry:%cmptmp?=?fcmp?ult?double?%x,?3.000000e+00%booltmp?=?uitofp?i1?%cmptmp?to?double%ifcond?=?fcmp?one?double?%booltmp,?0.000000e+00br?i1?%ifcond,?label?%then,?label?%elsethen:?????????????????????????????????????????????;?preds?=?%entrybr?label?%ifcontelse:?????????????????????????????????????????????;?preds?=?%entry%subtmp?=?fsub?double?%x,?1.000000e+00%calltmp?=?call?double?@foo(double?%subtmp)%subtmp1?=?fsub?double?%x,?2.000000e+00%calltmp2?=?call?double?@foo(double?%subtmp1)%addtmp?=?fadd?double?%calltmp,?%calltmp2br?label?%ifcontifcont:???????????????????????????????????????????;?preds?=?%else,?%then%iftmp?=?phi?double?[?1.000000e+00,?%then?],?[?%addtmp,?%else?]ret?double?%iftmp }parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:br?label?%looploop:?????????????????????????????????????????????;?preds?=?%loop,?%entry%i?=?phi?double?[?1.000000e+00,?%entry?],?[?%nextvar,?%loop?]%calltmp?=?call?double?@foo(double?%i)%calltmp1?=?call?double?@printd(double?%calltmp)%nextvar?=?fadd?double?%i,?1.000000e+00%cmptmp?=?fcmp?ult?double?%i,?1.000000e+01%booltmp?=?uitofp?i1?%cmptmp?to?double%loopcond?=?fcmp?one?double?%booltmp,?0.000000e+00br?i1?%loopcond,?label?%loop,?label?%afterloopafterloop:????????????????????????????????????????;?preds?=?%loopret?double?0.000000e+00 }1.000000 1.000000 2.000000 3.000000 5.000000 8.000000 13.000000 21.000000 34.000000 55.000000 0成功遍歷了斐波那契數列。
9. User-Defined Operators
在 C++中,用戶可以重載操作符而不能增加操作符。在這里,我們將給 Kaleidoscope 增加一個功能,讓用戶可以增加二元操作符。
#?新增二元操作符?`>`,?優先級等于內置的?`<` def?binary>?10?(LHS?RHS)RHS?<?LHS#?新增二元操作符?`|`,?優先級為5 def?binary|?5?(LHS?RHS)if?LHS?then1else?if?RHS?then1else0#?新增二元操作符?`=`,優先級為9,這個操作符類似C++的?`==` def?binary=?9?(LHS?RHS)!(LHS?<?RHS?|?LHS?>?RHS)增加 TOKEN 的類型:
enum?Token?{...TOKEN_BINARY?=?-11,?????//?binary };增加 TOKEN 的識別:
//?從標準輸入解析一個Token并返回 int?GetToken()?{...//?識別字符串if?(isalpha(last_char))?{...if?(g_identifier_str?==?"def")?{return?TOKEN_DEF;}?else?if?(g_identifier_str?==?"extern")?{return?TOKEN_EXTERN;}?else?if?(g_identifier_str?==?"if")?{return?TOKEN_IF;}?else?if?(g_identifier_str?==?"then")?{return?TOKEN_THEN;}?else?if?(g_identifier_str?==?"else")?{return?TOKEN_ELSE;}?else?if?(g_identifier_str?==?"for")?{return?TOKEN_FOR;}?else?if?(g_identifier_str?==?"in")?{return?TOKEN_IN;}?else?if?(g_identifier_str?==?"binary")?{return?TOKEN_BINARY;}?else?{return?TOKEN_IDENTIFIER;}}... }我們把新增的二元操作符視為一個函數,所以不需要新增 AST,但是需要修改 PrototypeAST。
//?函數接口 class?PrototypeAST?{public:PrototypeAST(const?std::string&?name,?std::vector<std::string>?args,bool?is_operator?=?false,?int?op_precedence?=?0):?name_(name),args_(std::move(args)),is_operator_(is_operator),op_precedence_(op_precedence)?{}llvm::Function*?CodeGen();const?std::string&?name()?const?{?return?name_;?}int?op_precedence()?const?{?return?op_precedence_;?}bool?IsUnaryOp()?const?{?return?is_operator_?&&?args_.size()?==?1;?}bool?IsBinaryOp()?const?{?return?is_operator_?&&?args_.size()?==?2;?}//?like?`|`?in?`binary|`char?GetOpName()?{?return?name_[name_.size()?-?1];?}private:std::string?name_;std::vector<std::string>?args_;bool?is_operator_;int?op_precedence_; };修改 parse 部分:
//?prototype //???::=?id?(?id?id?...?id) //???::=?binary?binop?precedence?(id?id) std::unique_ptr<PrototypeAST>?ParsePrototype()?{std::string?function_name;bool?is_operator?=?false;int?precedence?=?0;switch?(g_current_token)?{case?TOKEN_IDENTIFIER:?{function_name?=?g_identifier_str;is_operator?=?false;GetNextToken();??//?eat?idbreak;}case?TOKEN_BINARY:?{GetNextToken();??//?eat?binaryfunction_name?=?"binary";function_name?+=?(char)(g_current_token);is_operator?=?true;GetNextToken();??//?eat?binopprecedence?=?g_number_val;GetNextToken();??//?eat?precedencebreak;}}std::vector<std::string>?arg_names;while?(GetNextToken()?==?TOKEN_IDENTIFIER)?{arg_names.push_back(g_identifier_str);}GetNextToken();??//?eat?)return?std::make_unique<PrototypeAST>(function_name,?arg_names,?is_operator,precedence); }修改 BinaryExprAST 的 CodeGen 處理自定義 Operator, 增加函數調用指令:
llvm::Value*?BinaryExprAST::CodeGen()?{llvm::Value*?lhs?=?lhs_->CodeGen();llvm::Value*?rhs?=?rhs_->CodeGen();switch?(op_)?{case?'<':?{llvm::Value*?tmp?=?g_ir_builder.CreateFCmpULT(lhs,?rhs,?"cmptmp");//?把?0/1?轉為?0.0/1.0return?g_ir_builder.CreateUIToFP(tmp,?llvm::Type::getDoubleTy(g_llvm_context),?"booltmp");}case?'+':?return?g_ir_builder.CreateFAdd(lhs,?rhs,?"addtmp");case?'-':?return?g_ir_builder.CreateFSub(lhs,?rhs,?"subtmp");case?'*':?return?g_ir_builder.CreateFMul(lhs,?rhs,?"multmp");default:?{//?user?defined?operatorllvm::Function*?func?=?GetFunction(std::string("binary")?+?op_);llvm::Value*?operands[2]?=?{lhs,?rhs};return?g_ir_builder.CreateCall(func,?operands,?"binop");}} }在 FunctionAST 的 CodeGen 時,注冊操作符優先級,從而讓自定義操作符被識別為操作符。
llvm::Value*?FunctionAST::CodeGen()?{PrototypeAST&?proto?=?*proto_;name2proto_ast[proto.name()]?=?std::move(proto_);??//?transfer?ownershipllvm::Function*?func?=?GetFunction(proto.name());if?(proto.IsBinaryOp())?{g_binop_precedence[proto.GetOpName()]?=?proto.op_precedence();}//?創建一個Block并且設置為指令插入位置。//?llvm?block用于定義control?flow?graph,?由于我們暫不實現control?flow,?創建//?一個單獨的block即可llvm::BasicBlock*?block?=llvm::BasicBlock::Create(g_llvm_context,?"entry",?func);g_ir_builder.SetInsertPoint(block);//?將函數參數注冊到g_named_values中,讓VariableExprAST可以codegeng_named_values.clear();for?(llvm::Value&?arg?:?func->args())?{g_named_values[arg.getName()]?=?&arg;}//?codegen?body然后returnllvm::Value*?ret_val?=?body_->CodeGen();g_ir_builder.CreateRet(ret_val);llvm::verifyFunction(*func);return?func; }輸入:
#?新增二元操作符?`>`,?優先級等于內置的?`<` def?binary>?10?(LHS?RHS)RHS?<?LHS1?>?2 2?>?1#?新增二元操作符?`|`,?優先級為5 def?binary|?5?(LHS?RHS)if?LHS?then1else?if?RHS?then1else01?|?0 0?|?1 0?|?0 1?|?1得到輸出:
parsed?a?function?definition define?double?@"binary>"(double?%LHS,?double?%RHS)?{ entry:%cmptmp?=?fcmp?ult?double?%RHS,?%LHS%booltmp?=?uitofp?i1?%cmptmp?to?doubleret?double?%booltmp }parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%binop?=?call?double?@"binary>"(double?1.000000e+00,?double?2.000000e+00)ret?double?%binop }0 parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%binop?=?call?double?@"binary>"(double?2.000000e+00,?double?1.000000e+00)ret?double?%binop }1 parsed?a?function?definition define?double?@"binary|"(double?%LHS,?double?%RHS)?{ entry:%ifcond?=?fcmp?one?double?%LHS,?0.000000e+00br?i1?%ifcond,?label?%then,?label?%elsethen:?????????????????????????????????????????????;?preds?=?%entrybr?label?%ifcont4else:?????????????????????????????????????????????;?preds?=?%entry%ifcond1?=?fcmp?one?double?%RHS,?0.000000e+00br?i1?%ifcond1,?label?%then2,?label?%else3then2:????????????????????????????????????????????;?preds?=?%elsebr?label?%ifcontelse3:????????????????????????????????????????????;?preds?=?%elsebr?label?%ifcontifcont:???????????????????????????????????????????;?preds?=?%else3,?%then2%iftmp?=?phi?double?[?1.000000e+00,?%then2?],?[?0.000000e+00,?%else3?]br?label?%ifcont4ifcont4:??????????????????????????????????????????;?preds?=?%ifcont,?%then%iftmp5?=?phi?double?[?1.000000e+00,?%then?],?[?%iftmp,?%ifcont?]ret?double?%iftmp5 }parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%binop?=?call?double?@"binary|"(double?1.000000e+00,?double?0.000000e+00)ret?double?%binop }1 parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%binop?=?call?double?@"binary|"(double?0.000000e+00,?double?1.000000e+00)ret?double?%binop }1 parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%binop?=?call?double?@"binary|"(double?0.000000e+00,?double?0.000000e+00)ret?double?%binop }0 parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%binop?=?call?double?@"binary|"(double?1.000000e+00,?double?1.000000e+00)ret?double?%binop }110. Mutable Variables
本節我們將讓 Kaleidoscope 支持可變變量,首先我們看如下 C 代碼:
int?G,?H; int?test(_Bool?Condition)?{int?X;if?(Condition)X?=?G;elseX?=?H;return?X; }由于變量 X 的值依賴于程序的執行路徑,會加入一個 phi node 來選取分支結果。上面代碼的 LLVM IR 如下:
@G?=?weak?global?i32?0???;?type?of?@G?is?i32* @H?=?weak?global?i32?0???;?type?of?@H?is?i32*define?i32?@test(i1?%Condition)?{ entry:br?i1?%Condition,?label?%cond_true,?label?%cond_falsecond_true:%X.0?=?load?i32*?@Gbr?label?%cond_nextcond_false:%X.1?=?load?i32*?@Hbr?label?%cond_nextcond_next:%X.2?=?phi?i32?[?%X.1,?%cond_false?],?[?%X.0,?%cond_true?]ret?i32?%X.2 }上面的 X 是符合 SSA 格式的,但是這里真正的難題是給可變變量賦值時怎么自動添加 phi node。我們先了解一些信息,LLVM 要求寄存器變量是 SSA 格式,但卻不允許內存對象是 SSA 格式。比如上面的例子中,G 和 H 就沒有版本號。在 LLVM 中,所有內存訪問都是顯示的 load/store 指令,并且不存在取內存地址的操作。注意上面的例子中,即使@G/@H 全局變量定義時用的 i32, 但其類型仍然是 i32*, 表示在全局數據區存放 i32 的空間地址。
現在假設我們想創建一個類似@G 但是在棧上的內存變量,基本指令如下:
define?i32?@example()?{entry:%X?=?alloca?i32???????????;?type?of?%X?is?i32*....%tmp?=?load?i32*?%X???????;?load?the?stack?value?%X?from?the?stack.%tmp2?=?add?i32?%tmp,?1???;?increment?itstore?i32?%tmp2,?i32*?%X??;?store?it?back...于是我們可以把上面使用 phi node 的 LLVM IR 改寫為使用棧上變量:
@G?=?weak?global?i32?0???;?type?of?@G?is?i32* @H?=?weak?global?i32?0???;?type?of?@H?is?i32*define?i32?@test(i1?%Condition)?{ entry:%X?=?alloca?i32???????????;?type?of?%X?is?i32*.br?i1?%Condition,?label?%cond_true,?label?%cond_falsecond_true:%X.0?=?load?i32*?@Gstore?i32?%X.0,?i32*?%X???;?Update?Xbr?label?%cond_nextcond_false:%X.1?=?load?i32*?@Hstore?i32?%X.1,?i32*?%X???;?Update?Xbr?label?%cond_nextcond_next:%X.2?=?load?i32*?%X???????;?Read?Xret?i32?%X.2 }于是我們找到了一個處理任意可變變量而且不需要創建 phi node 的辦法:
每個可變變量在棧上創建
變量讀取變為 load from stack
變量更新變為 store to stack
使用棧上地址作為變量地址
但是這會帶來一個新的問題,因為內存速度不如寄存器,大量使用棧會有性能問題。不過,LLVM 優化器有一個 pass 稱為"mem2reg", 專門將 stack 的使用自動地盡可能轉為使用 phi node, 下面為自動優化的結果:
@G?=?weak?global?i32?0 @H?=?weak?global?i32?0define?i32?@test(i1?%Condition)?{ entry:br?i1?%Condition,?label?%cond_true,?label?%cond_falsecond_true:%X.0?=?load?i32*?@Gbr?label?%cond_nextcond_false:%X.1?=?load?i32*?@Hbr?label?%cond_nextcond_next:%X.01?=?phi?i32?[?%X.1,?%cond_false?],?[?%X.0,?%cond_true?]ret?i32?%X.01}mem2reg 實現了一個稱為"iterated dominance frontier"的標準算法來自動創建 SSA 格式。對 mem2reg 的使用需要注意:
mem2reg 只能優化棧上變量,不會優化全局變量和堆上變量;
mem2reg 只優化 entry block 中的棧上變量創建, 因為在 entry block 中就意味著只創建一次;
如果對棧上變量有 load 和 store 之外的操作, mem2reg 也不會優化;
mem2reg 只能優化基本類型的棧上變量,比如指針,數值和數組。其中數組的大小必須為 1. 對于結構體和數組等的優化需要另一個稱為"sroa"的 pass。
因為我們后面需要啟用 mem2reg,我們先把優化器加回來,修改全局定義:
std::unique_ptr<llvm::Module>?g_module; std::unique_ptr<llvm::legacy::FunctionPassManager>?g_fpm;修改 ReCreateModule:
void?ReCreateModule()?{g_module?=?std::make_unique<llvm::Module>("my?cool?jit",?g_llvm_context);g_module->setDataLayout(g_jit->getTargetMachine().createDataLayout());g_fpm?=?std::make_unique<llvm::legacy::FunctionPassManager>(g_module.get());g_fpm->add(llvm::createInstructionCombiningPass());g_fpm->add(llvm::createReassociatePass());g_fpm->add(llvm::createGVNPass());g_fpm->add(llvm::createCFGSimplificationPass());g_fpm->doInitialization(); }在 FunctionAST::CodeGen 中執行優化器:
g_ir_builder.CreateRet(ret_val); llvm::verifyFunction(*func); g_fpm->run(*func);修改 main:
int?main()?{llvm::InitializeNativeTarget();llvm::InitializeNativeTargetAsmPrinter();llvm::InitializeNativeTargetAsmParser();g_jit.reset(new?llvm::orc::KaleidoscopeJIT);ReCreateModule();... }我們有兩種類型的變量,分別是函數參數以及 for 循環的變量,這里我們將這兩種變量也修改為使用內存,再讓 mem2reg 進行優化。因為所有的變量都會使用內存,修改 g_named_value 存儲的類型為 AllocaInst*:
std::map<std::string,?llvm::AllocaInst*>?g_named_values;編寫一個函數 CreateEntryBlockAlloca,簡化后續工作,其功能是往函數的 EntryBlock 的最開始的地方添加分配內存指令:
llvm::AllocaInst*?CreateEntryBlockAlloca(llvm::Function*?func,const?std::string&?var_name)?{llvm::IRBuilder<>?ir_builder(&(func->getEntryBlock()),func->getEntryBlock().begin());return?ir_builder.CreateAlloca(llvm::Type::getDoubleTy(g_llvm_context),?0,var_name.c_str()); }修改 VariableExprAST::CodeGen, 由于我們所有變量都放在內存你上,所以增加 load 指令:
llvm::Value*?VariableExprAST::CodeGen()?{llvm::AllocaInst*?val?=?g_named_values.at(name_);return?g_ir_builder.CreateLoad(val,?name_.c_str()); }接下來我們修改 for 循環里變量的 CodeGen:
llvm::Value*?ForExprAST::CodeGen()?{//?獲取當前functionllvm::Function*?func?=?g_ir_builder.GetInsertBlock()->getParent();//?將變量創建為棧上變量,不再是phi?nodellvm::AllocaInst*?var?=?CreateEntryBlockAlloca(func,?var_name_);//?codegen?startllvm::Value*?start_val?=?start_expr_->CodeGen();//?將初始值賦給varg_ir_builder.CreateStore(start_val,?var);//?新增一個loop?block到當前functionllvm::BasicBlock*?loop_block?=llvm::BasicBlock::Create(g_llvm_context,?"loop",?func);//?為當前block增加到loop_block的跳轉指令g_ir_builder.CreateBr(loop_block);//?開始在loop_block內增加指令g_ir_builder.SetInsertPoint(loop_block);//?現在我們新增了一個變量var,因為可能會被后面的代碼引用,所以要注冊到//?g_named_values中,其可能會和函數參數重名,但我們這里為了方便不管//?這個特殊情況,直接注冊到g_named_values中,g_named_values[var_name_]?=?var;//?在loop_block中增加body的指令body_expr_->CodeGen();//?codegen?step_exprllvm::Value*?step_value?=?step_expr_->CodeGen();//?var?=?var?+?step_valuellvm::Value*?cur_value?=?g_ir_builder.CreateLoad(var);llvm::Value*?next_value?=g_ir_builder.CreateFAdd(cur_value,?step_value,?"nextvar");g_ir_builder.CreateStore(next_value,?var);//?codegen?end_exprllvm::Value*?end_value?=?end_expr_->CodeGen();//?end_value?=?(end_value?!=?0.0)end_value?=?g_ir_builder.CreateFCmpONE(end_value,?llvm::ConstantFP::get(g_llvm_context,?llvm::APFloat(0.0)),"loopcond");//?和if/then/else一樣,這里的block可能會發生變化,保存當前的blockllvm::BasicBlock*?loop_end_block?=?g_ir_builder.GetInsertBlock();//?創建循環結束后的blockllvm::BasicBlock*?after_block?=llvm::BasicBlock::Create(g_llvm_context,?"afterloop",?func);//?根據end_value選擇是再來一次loop_block還是進入after_blockg_ir_builder.CreateCondBr(end_value,?loop_block,?after_block);//?給after_block增加指令g_ir_builder.SetInsertPoint(after_block);//?循環結束,避免被再次引用g_named_values.erase(var_name_);//?return?0return?llvm::Constant::getNullValue(llvm::Type::getDoubleTy(g_llvm_context)); }修改 FunctionAST::codegen()使得參數可變:
llvm::Value*?FunctionAST::CodeGen()?{PrototypeAST&?proto?=?*proto_;name2proto_ast[proto.name()]?=?std::move(proto_);??//?transfer?ownershipllvm::Function*?func?=?GetFunction(proto.name());if?(proto.IsBinaryOp())?{g_binop_precedence[proto.GetOpName()]?=?proto.op_precedence();}//?創建一個Block并且設置為指令插入位置。//?llvm?block用于定義control?flow?graph,?由于我們暫不實現control?flow,?創建//?一個單獨的block即可llvm::BasicBlock*?block?=llvm::BasicBlock::Create(g_llvm_context,?"entry",?func);g_ir_builder.SetInsertPoint(block);//?將函數參數注冊到g_named_values中,讓VariableExprAST可以codegeng_named_values.clear();for?(llvm::Value&?arg?:?func->args())?{//?為每個參數創建一個棧上變量,并賦初值,修改g_named_values使得后面的引用//?會引用這個棧上變量llvm::AllocaInst*?var?=?CreateEntryBlockAlloca(func,?arg.getName());g_ir_builder.CreateStore(&arg,?var);g_named_values[arg.getName()]?=?var;}//?codegen?body然后returnllvm::Value*?ret_val?=?body_->CodeGen();g_ir_builder.CreateRet(ret_val);llvm::verifyFunction(*func);g_fpm->run(*func);return?func; }輸入:
extern?printd(x)def?foo(x)if?x?<?3?then1elsefoo(x?-?1)?+?foo(x?-?2)for?i?=?1,?i?<?10,?1.0?inprintd(foo(i))輸出:
parsed?a?extern?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????[13/48988] declare?double?@printd(double)parsed?a?function?definition define?double?@foo(double?%x)?{ entry:%x1?=?alloca?double,?align?8store?double?%x,?double*?%x1,?align?8%cmptmp?=?fcmp?ult?double?%x,?3.000000e+00br?i1?%cmptmp,?label?%ifcont,?label?%elseelse:?????????????????????????????????????????????;?preds?=?%entry%subtmp?=?fadd?double?%x,?-1.000000e+00%calltmp?=?call?double?@foo(double?%subtmp)%subtmp5?=?fadd?double?%x,?-2.000000e+00%calltmp6?=?call?double?@foo(double?%subtmp5)%addtmp?=?fadd?double?%calltmp,?%calltmp6br?label?%ifcontifcont:???????????????????????????????????????????;?preds?=?%entry,?%else%iftmp?=?phi?double?[?%addtmp,?%else?],?[?1.000000e+00,?%entry?]ret?double?%iftmp }parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:%i?=?alloca?double,?align?8store?double?1.000000e+00,?double*?%i,?align?8br?label?%looploop:?????????????????????????????????????????????;?preds?=?%loop,?%entry%i1?=?phi?double?[?%nextvar,?%loop?],?[?1.000000e+00,?%entry?]%calltmp?=?call?double?@foo(double?%i1)%calltmp2?=?call?double?@printd(double?%calltmp)%nextvar?=?fadd?double?%i1,?1.000000e+00store?double?%nextvar,?double*?%i,?align?8%cmptmp?=?fcmp?ult?double?%nextvar,?1.000000e+01br?i1?%cmptmp,?label?%loop,?label?%afterloopafterloop:????????????????????????????????????????;?preds?=?%loopret?double?0.000000e+00 }1.000000 1.000000 2.000000 3.000000 5.000000 8.000000 13.000000 21.000000 34.000000 0可以看到,新版本的 IR 中已經沒有了 phi node, 接下來我們加入優化器:
g_fpm->add(llvm::createPromoteMemoryToRegisterPass());g_fpm->add(llvm::createInstructionCombiningPass());g_fpm->add(llvm::createReassociatePass());再次得到輸出:
parsed?a?extern declare?double?@printd(double)parsed?a?function?definition define?double?@foo(double?%x)?{ entry:%cmptmp?=?fcmp?ult?double?%x,?3.000000e+00br?i1?%cmptmp,?label?%ifcont,?label?%elseelse:?????????????????????????????????????????????;?preds?=?%entry%subtmp?=?fadd?double?%x,?-1.000000e+00%calltmp?=?call?double?@foo(double?%subtmp)%subtmp5?=?fadd?double?%x,?-2.000000e+00%calltmp6?=?call?double?@foo(double?%subtmp5)%addtmp?=?fadd?double?%calltmp,?%calltmp6br?label?%ifcontifcont:???????????????????????????????????????????;?preds?=?%entry,?%else%iftmp?=?phi?double?[?%addtmp,?%else?],?[?1.000000e+00,?%entry?]ret?double?%iftmp }parsed?a?top?level?expr define?double?@__anon_expr()?{ entry:br?label?%looploop:?????????????????????????????????????????????;?preds?=?%loop,?%entry%i1?=?phi?double?[?%nextvar,?%loop?],?[?1.000000e+00,?%entry?]%calltmp?=?call?double?@foo(double?%i1)%calltmp2?=?call?double?@printd(double?%calltmp)%nextvar?=?fadd?double?%i1,?1.000000e+00%cmptmp?=?fcmp?ult?double?%nextvar,?1.000000e+01br?i1?%cmptmp,?label?%loop,?label?%afterloopafterloop:????????????????????????????????????????;?preds?=?%loopret?double?0.000000e+00 }1.000000 1.000000 2.000000 3.000000 5.000000 8.000000 13.000000 21.000000 34.000000 0可以看到,棧上變量自動地變為寄存器變量,且 phi node 自動地被添加。
11. 完整代碼與參考資料
完整代碼見:
https://zhuanlan.zhihu.com/p/336929719
參考:
https://en.wikipedia.org/wiki/Static_single_assignment_form
https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/index.html
歡迎大家多多交流,共同進步。
總結
以上是生活随笔為你收集整理的使用 LLVM 实现一个简单编译器的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 速抢中秋月饼和红包封面!
- 下一篇: 保持生长不焦虑,非科班程序媛的进击