How to Bring Your Own Codegen to TVM

Published: 2023/11/28

References for this article:
https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html
https://blog.csdn.net/weixin_42164269/article/details/104291635
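Introduction
As the number of hardware devices targeted by deep learning keeps growing, so does the knowledge users need to achieve high performance on each of them. Hardware backend providers either offer libraries such as MKLDNN or cuDNN that contain many commonly used deep learning operators, or offer frameworks such as TensorRT that let users describe their models in a particular way to achieve high performance. However, users then have to learn a new programming interface whenever they try to work with a new library or device. As a result, a unified programming interface is becoming increasingly important, in order to:
1) get all users and hardware backend providers on the same page;
2) provide a feasible solution that lets specialized hardware or libraries support only widely used operators with extremely high performance, while unsupported operators fall back to general devices such as CPU/GPU.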
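This article covers the following:
Contents
Introduction

Generate C code.
Generate any other graph representation.
Implement a C Codegen
Implement 【CodegenC】
Code Generation for Operators
Code Generation for Input Variables
Code Emitting
Implement 【CSourceCodegen】
Implement 【GenCFunc】
Implement 【CreateCSourceModule】
Register Your Codegen
Implement a Codegen for Your Representation
Implement 【ExampleJsonCodeGen】
Implement a Customized Runtime
Implement the Constructor
Implement 【GetFunction】
Implement Run
Implement 【SaveToBinary】 and 【LoadFromBinary】
Summary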
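In this developer guide, we demonstrate how you, as a hardware backend provider, can easily implement your own codegen and register it as a Relay backend compiler to support your hardware device or library. Depending on the graph representation you need, this guide covers two types of codegen:
You want to generate C code.
If your hardware already has a well-optimized C/C++ library, such as Intel CBLAS/MKL for CPUs or NVIDIA CUBLAS for GPUs, this is what you are looking for. Fortunately, the C source code module is fully compatible with the TVM runtime module, which means the generated code can be compiled by any C/C++ compiler with the proper compilation flags. The only tasks are to implement a codegen that generates C code for subgraphs, and a C source module that integrates into the TVM runtime module. The next section demonstrates how to implement a C codegen for your hardware.
You want to generate any other graph representation.
Your hardware may require another form of graph representation, such as JSON. In this case, you need to implement not only a codegen but also a customized TVM runtime module, so that the TVM runtime knows how to execute that graph representation. If you already have a complete graph execution engine for your hardware, such as TensorRT for GPUs, this is a solution to consider.
After you finish the codegen and runtime, your customers can annotate their models with your customized tag to make use of them.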
Implement a C Codegen
In this part, we demonstrate how to implement a codegen that generates C code using pre-implemented operator functions. To keep things simple, the example codegen does not depend on third-party libraries. Instead, two macros are implemented by hand in C:
#define CSOURCE_BINARY_OP_1D(p_ID_, p_OP_, p_DIM1_)         \
  extern "C" void p_ID_(float* a, float* b, float* out) {   \
    for (int64_t i = 0; i < p_DIM1_; ++i) {                 \
      out[i] = a[i] p_OP_ b[i];                             \
    }                                                       \
  }

#define CSOURCE_BINARY_OP_2D(p_ID_, p_OP_, p_DIM1_, p_DIM2_)  \
  extern "C" void p_ID_(float* a, float* b, float* out) {     \
    for (int64_t i = 0; i < p_DIM1_; ++i) {                   \
      for (int64_t j = 0; j < p_DIM2_; ++j) {                 \
        int64_t k = i * p_DIM2_ + j;                          \
        out[k] = a[k] p_OP_ b[k];                             \
      }                                                       \
    }                                                         \
  }
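With these two macros, we can generate binary operators for 1-D and 2-D tensors. For example, consider the subgraph below, and assume all inputs are 2-D tensors of shape (10, 10).
c_compiler_input0
       |
      add <-- c_compiler_input1
       |
    subtract <-- c_compiler_input2
       |
    multiply <-- c_compiler_input3
       |
      out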
The goal is to generate the following compilable code to execute the subgraph:
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/packed_func.h>
#include <dlpack/dlpack.h>
#include <cstdint>
#include <cstdlib>
#include <cstring>

#define GCC_BINARY_OP_1D(p_ID_, p_OP_, p_DIM1_)             \
  extern "C" void p_ID_(float* a, float* b, float* out) {   \
    for (int64_t i = 0; i < p_DIM1_; ++i) {                 \
      out[i] = a[i] p_OP_ b[i];                             \
    }                                                       \
  }

#define GCC_BINARY_OP_2D(p_ID_, p_OP_, p_DIM1_, p_DIM2_)    \
  extern "C" void p_ID_(float* a, float* b, float* out) {   \
    for (int64_t i = 0; i < p_DIM1_; ++i) {                 \
      for (int64_t j = 0; j < p_DIM2_; ++j) {               \
        int64_t k = i * p_DIM2_ + j;                        \
        out[k] = a[k] p_OP_ b[k];                           \
      }                                                     \
    }                                                       \
  }

// Note 1
GCC_BINARY_OP_2D(gcc_0_0, *, 10, 10);
GCC_BINARY_OP_2D(gcc_0_1, -, 10, 10);
GCC_BINARY_OP_2D(gcc_0_2, +, 10, 10);

// Note 2
extern "C" void gcc_0_(float* gcc_input0, float* gcc_input1,
                       float* gcc_input2, float* gcc_input3, float* out) {
  float* buf_0 = (float*)malloc(4 * 100);
  float* buf_1 = (float*)malloc(4 * 100);
  gcc_0_2(gcc_input0, gcc_input1, buf_0);
  gcc_0_1(buf_0, gcc_input2, buf_1);
  gcc_0_0(buf_1, gcc_input3, out);
  free(buf_0);
  free(buf_1);
}

// Note 3
extern "C" int gcc_0_wrapper(DLTensor* arg0, DLTensor* arg1, DLTensor* arg2,
                             DLTensor* arg3, DLTensor* out) {
  gcc_0_(static_cast<float*>(arg0->data), static_cast<float*>(arg1->data),
         static_cast<float*>(arg2->data), static_cast<float*>(arg3->data),
         static_cast<float*>(out->data));
  return 0;
}
TVM_DLL_EXPORT_TYPED_FUNC(gcc_0, gcc_0_wrapper);
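Here we highlight the notes marked in the code above:
Note 1: the function implementations for the three nodes in the subgraph.
Note 2: a function that executes the subgraph by allocating intermediate buffers and invoking the corresponding functions.
Note 3: a TVM runtime compatible wrapper function. It accepts a list of input tensors and one output tensor (the last argument), casts them to the right data type, and invokes the subgraph function described in Note 2. In addition, 【TVM_DLL_EXPORT_TYPED_FUNC】 is a TVM macro that generates another function, 【gcc_0】, with unified function arguments by packing all argument tensors into 【TVMArgs】. As a result, the TVM runtime can invoke 【gcc_0】 directly to execute the subgraph without additional effort. With the code generated above, TVM can compile it together with the rest of the graph and export a single library for deployment.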
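The arithmetic of the Note 2 function can be checked without a TVM build. The following is only a minimal sketch under stated assumptions: it drops extern "C", the 【DLTensor】 wrapper and 【TVM_DLL_EXPORT_TYPED_FUNC】, uses std::vector instead of malloc, and the names 【BINARY_OP_2D】, 【op_add】, 【op_sub】, 【op_mul】 and 【run_subgraph】 are hypothetical stand-ins rather than anything TVM generates.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same loop structure as the 2-D macro above, without extern "C"
// (not needed for this standalone check). Hypothetical name.
#define BINARY_OP_2D(p_ID_, p_OP_, p_DIM1_, p_DIM2_)          \
  void p_ID_(const float* a, const float* b, float* out) {    \
    for (int64_t i = 0; i < p_DIM1_; ++i) {                   \
      for (int64_t j = 0; j < p_DIM2_; ++j) {                 \
        int64_t k = i * p_DIM2_ + j;                          \
        out[k] = a[k] p_OP_ b[k];                             \
      }                                                       \
    }                                                         \
  }

// One function per node of the subgraph (compare Note 1).
BINARY_OP_2D(op_mul, *, 10, 10)
BINARY_OP_2D(op_sub, -, 10, 10)
BINARY_OP_2D(op_add, +, 10, 10)

// Hypothetical stand-in for gcc_0_: computes ((in0 + in1) - in2) * in3
// through two intermediate buffers (compare Note 2).
std::vector<float> run_subgraph(const std::vector<float>& in0,
                                const std::vector<float>& in1,
                                const std::vector<float>& in2,
                                const std::vector<float>& in3) {
  assert(in0.size() == 100 && in1.size() == 100);
  assert(in2.size() == 100 && in3.size() == 100);
  std::vector<float> buf_0(100), buf_1(100), out(100);
  op_add(in0.data(), in1.data(), buf_0.data());
  op_sub(buf_0.data(), in2.data(), buf_1.data());
  op_mul(buf_1.data(), in3.data(), out.data());
  return out;
}
```

Calling 【run_subgraph】 with four 100-element vectors walks the same add, subtract, multiply chain as the subgraph diagram.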
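In the rest of this section, we implement a codegen step by step to generate the code above. Your own codegen sources must be placed in their own subdirectory under src/relay/backend/contrib/. In this example, the codegen is named "codegen_c"; its source is https://github.com/apache/incubator-tvm/blob/master/src/relay/backend/contrib/codegen_c/codegen.cc and you can check that file at any time for the complete implementation.
Specifically, two classes are implemented in this file, and this is how they relate:
                     subgraph                                subgraph
TVM backend -----------------------------> CSourceCodegen -------------> CodegenC
       ^                                       |    ^                       |
       |                                       |    |                       |
       ----------------------------------------      ------------------------
          generated C source runtime module              generated C code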
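When the TVM backend finds a function (subgraph) in a Relay graph that is annotated with a registered compiler tag (【ccompiler】 in this example), it invokes 【CSourceCodegen】 and passes in the subgraph. The member function 【CreateCSourceModule】 of 【CSourceCodegen】 will
1) generate C code for the subgraph, and
2) wrap the generated C code into a C source runtime module for the TVM backend to compile and deploy.
In particular, the C code generation itself is handled by the 【CodegenC】 class, because it provides many useful utilities that simplify the codegen implementation. The following sections implement these two classes in bottom-up order.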
Implement 【CodegenC】
In src/relay/backend/contrib/codegen_c/codegen.cc, we first create a codegen class skeleton under the 【tvm.relay.contrib】 namespace:
#include <tvm/relay/expr_functor.h>
#include <tvm/relay/transform.h>
#include <tvm/relay/type.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/object.h>

#include <fstream>
#include <sstream>

#include "codegen_c.h"

namespace tvm {
namespace relay {
namespace contrib {

class CodegenC : public ExprVisitor, public CodegenCBase {
 public:
  explicit CodegenC(const std::string& id) { this->ext_func_id_ = id; }

  // Skeleton only: the three bodies below are filled in by the following sections.
  void VisitExpr_(const VarNode* node) { ; }
  void VisitExpr_(const CallNode* call) final { ; }
  std::string JIT() { ; }

 private:
  /*! \brief The function id that represents a C source function. */
  std::string ext_func_id_ = "";
  /*! \brief The index of a wrapped C function. */
  int func_idx = 0;
  /*! \brief The index of allocated buffers. */
  int buf_idx_ = 0;
  /*! \brief The arguments of a C compiler compatible function. */
  std::vector<std::string> ext_func_args_;
  /*! \brief The statements of a C compiler compatible function. */
  std::vector<std::string> ext_func_body;
  /*! \brief The declaration statements of a C compiler compatible function. */
  std::vector<std::string> func_decl_;
  /*! \brief The declaration statements of buffers. */
  std::vector<std::string> buf_decl_;
  /*! \brief The name and index pairs for output. */
  std::vector<std::pair<std::string, int>> out_;
};
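The 【CodegenC】 class inherits from two classes: 【ExprVisitor】 provides the ability to traverse the subgraph, collect the required information, and generate subgraph functions such as 【gcc_0_】; 【CodegenCBase】 provides the abilities and utilities to generate wrapper functions such as 【gcc_0】 in the example above. As can be seen, only three functions need to be implemented in this codegen class to make it work.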
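Code Generation for Operators
We first implement 【VisitExpr_(const CallNode* call)】. This function visits all call nodes when traversing the subgraph. Each call node contains an operator that we want to offload to the hardware. As a result, we need to generate the corresponding C code with the correct operators, in topological order. We implement this function step by step as follows.

1. Generate the function declaration
Example result: 【GCC_BINARY_OP_2D(gcc_0_0, *, 10, 10);】
As shown above, to generate the function declaration we need
1) a function name (e.g. gcc_0_0),
2) the type of operator (e.g. *), and
3) the input tensor shape (e.g. (10, 10)).
Fortunately, this information can be obtained easily from the 【CallNode】: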

std::ostringstream macro_stream;
std::ostringstream decl_stream;
std::ostringstream buf_stream;

// Generate a unique function name you like.
std::string func_name = ext_func_id_ + "_" + std::to_string(func_idx++);

// Make function declaration string.
macro_stream << "CSOURCE_BINARY_OP_" << call->args.size() << "D(" << func_name << ", ";

// Check the operator type.
if (IsOp(call, "add")) {
  macro_stream << "+";
} else if (IsOp(call, "subtract")) {
  macro_stream << "-";
} else if (IsOp(call, "multiply")) {
  macro_stream << "*";
} else {
  LOG(FATAL) << "Unrecognized op";
}

// Extract the input tensor shape.
auto in_shape = GetShape(call->args[0]->checked_type());
for (size_t i = 0; i < in_shape.size(); ++i) {
  macro_stream << ", " << in_shape[i];
}
macro_stream << ");";
func_decl_.push_back(macro_stream.str());
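As can be seen, the generated code is pushed to the class member variable 【func_decl_】. This means that after traversing the entire subgraph, all required function declarations have been collected, and the only thing left is to have them compiled by GCC. The rest of the implementation of 【VisitExpr_(const CallNode* call)】 follows the same concept.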
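The string-building part of this step can be tried in isolation. The sketch below is an illustration only, not TVM code: 【OpToken】 and 【MakeDecl】 are hypothetical helpers, the op name and shape are passed in directly instead of being read from a 【CallNode】, and, as an assumption of this sketch, the 1D/2D suffix is taken from the rank of the shape, whereas the snippet above derives it from 【call->args.size()】.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: maps a Relay-style op name to its C operator token.
// Returns an empty string for unsupported ops.
std::string OpToken(const std::string& op) {
  if (op == "add") return "+";
  if (op == "subtract") return "-";
  if (op == "multiply") return "*";
  return "";
}

// Hypothetical helper: builds one macro invocation such as
// CSOURCE_BINARY_OP_2D(gcc_0_0, *, 10, 10);
std::string MakeDecl(const std::string& func_name, const std::string& op,
                     const std::vector<int64_t>& shape) {
  const std::string token = OpToken(op);
  assert(!token.empty() && "Unrecognized op");
  std::ostringstream macro_stream;
  // Assumption of this sketch: the suffix is the tensor rank.
  macro_stream << "CSOURCE_BINARY_OP_" << shape.size() << "D(" << func_name
               << ", " << token;
  for (int64_t dim : shape) {
    macro_stream << ", " << dim;
  }
  macro_stream << ");";
  return macro_stream.str();
}
```

Each returned string corresponds to one entry that the real visitor would push to 【func_decl_】.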
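2. Generate the function call
Example result: 【gcc_0_0(buf_1, gcc_input3, out);】
After generating the function declaration, we need to generate a function call with the proper inputs and output. To know which inputs or buffers to pass when calling this function, we have to visit its arguments:
bool first = true;
decl_stream << func_name << "(";
for (size_t i = 0; i < call->args.size(); ++i) {
  VisitExpr(call->args[i]);  // Note 1
  for (auto out : out_) {
    if (!first) {
      decl_stream << ", ";
    }
    first = false;
    decl_stream << out.first;
  }
}
// Note 2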
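Again, we highlight the notes in the code above:
Note 1: 【VisitExpr(call->args[i])】 is a recursive call that visits the arguments of the current function. An argument can be the output of another node or an input tensor. In the example implementation, every node updates the class variable 【out_】 before leaving the visitor. Here is an illustration:
  arg_node                 arg_node <- Visit arg (Note 1)       arg_node
     |                        |                                    |
 curr_node <- Process      curr_node                            curr_node <- Put "buf_0" as an input buffer

(a) out_ = {}            (b) out_ = {}                   (c) out_ = {("buf_0", 20)}
As the figure shows, the class variable 【out_】 is empty before visiting the argument node, and is filled with the output buffer name and size of 【arg_node】 afterwards. As a result, once the argument node has been visited, we know which input buffer to use by looking at 【out_】. How 【out_】 is updated is shown at the end of this section and in the next section.
Note 2: You may notice that the function call string is not closed in this step. The current function call string looks like 【gcc_0_0(buf_1, gcc_input3】. This is because the last argument (the output) has not been added to the call yet. The output of a function call can be either an allocated temporary buffer or the subgraph output tensor. For simplicity, this example allocates an output buffer for every call node (next step) and copies the result in the very last buffer to the output tensor.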
3. Generate the output buffer
Example result: 【float* buf_0 = (float*)malloc(4 * 100);】
As mentioned in the previous step, in addition to the subgraph input and output tensors, buffers may be needed to hold intermediate results. To generate a buffer, we extract the shape information to determine the buffer type and size:
// This example only supports single output.
auto type_node = call->checked_type().as<TensorTypeNode>();
CHECK(type_node != nullptr && runtime::TypeMatch(type_node->dtype, kDLFloat, 32))
    << "Only support single output tensor with float type";

// Generate a unique buffer name.
std::string out = "buf_" + std::to_string(buf_idx_++);

// Extract the shape to be the buffer size.
auto out_shape = GetShape(call->checked_type());
int out_size = 1;
for (size_t i = 0; i < out_shape.size(); ++i) {
  out_size *= out_shape[i];
}

// Make the buffer allocation and push to the buffer declarations.
buf_stream << "float* " << out << " = (float*)std::malloc(4 * " << out_size << ");";
buf_decl_.push_back(buf_stream.str());
After allocating the output buffer, we can close the function call string and push the generated function call to the class variable 【ext_func_body】.
decl_stream << ", " << out << ");";
ext_func_body.push_back(decl_stream.str());
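4. Update the output buffer
So that the next node, which takes the output of the current call node as its input, knows which buffer to use, we need to update the class variable 【out_】 before leaving this visit function.
out_.clear();
out_.push_back({out, out_size});
Congratulations! You have finished the most difficult function in this guide. The next two sections only fill in some minor missing pieces around this function.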
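To see how steps 2 to 4 cooperate through the shared output variable, here is a small self-contained sketch. It is a toy model under stated assumptions, not the TVM visitor: 【Expr】, 【Leaf】, 【Call】 and 【ToyCodegen】 are hypothetical types, every buffer gets the same size, and generated calls are named f_N rather than derived from 【ext_func_id_】.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy expression: a leaf is a named input, otherwise a call.
struct Expr {
  std::string name;                         // input name (leaf) or op name (call)
  std::vector<std::shared_ptr<Expr>> args;  // empty for leaves
};

std::shared_ptr<Expr> Leaf(const std::string& name) {
  return std::make_shared<Expr>(Expr{name, {}});
}

std::shared_ptr<Expr> Call(const std::string& op, std::shared_ptr<Expr> a,
                           std::shared_ptr<Expr> b) {
  return std::make_shared<Expr>(Expr{op, {std::move(a), std::move(b)}});
}

struct ToyCodegen {
  int func_idx = 0;
  int buf_idx = 0;
  std::vector<std::string> body;                 // generated call statements
  std::vector<std::pair<std::string, int>> out;  // output of the node just visited

  void Visit(const std::shared_ptr<Expr>& e, int size) {
    assert(e != nullptr);
    if (e->args.empty()) {  // plays the role of the VarNode visitor
      out.clear();
      out.push_back({e->name, 0});
      return;
    }
    // Step 1 analogue: pick a unique function name before visiting arguments.
    std::string call = "f_" + std::to_string(func_idx++) + "(";
    bool first = true;
    for (const auto& arg : e->args) {
      Visit(arg, size);  // step 2: afterwards `out` names the argument's buffer
      for (const auto& o : out) {
        if (!first) call += ", ";
        first = false;
        call += o.first;
      }
    }
    // Step 3: allocate an output buffer name and close the call string.
    const std::string buf = "buf_" + std::to_string(buf_idx++);
    body.push_back(call + ", " + buf + ");");
    // Step 4: publish this node's output for whoever consumes it next.
    out.clear();
    out.push_back({buf, size});
  }
};
```

Visiting the add, subtract, multiply chain from the earlier diagram emits the calls innermost first, so each statement only reads buffers that an earlier statement has already written.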
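Code Generation for Input Variables
Recall that we collected the input buffer information by visiting the arguments of a call node (step 2 in the previous section), and handled the case where an argument is another call node (step 4). This section uses 【VarNode】 as an example to show how to handle other kinds of nodes.
【VarNode】 represents an input tensor in a model. The only, but important, piece of information it carries is a name hint (e.g. data, weight). When visiting a 【VarNode】, we simply update the class variable 【out_】 to pass on the name hint, so that the descendant call nodes can generate the correct function call.
void VisitExpr_(const VarNode* node) {
  ext_func_args_.push_back(node->name_hint());
  out_.clear();
  out_.push_back({node->name_hint(), 0});
}
This example assumes that the subgraph to be offloaded contains only call nodes and variable nodes. If your subgraphs contain other node types, such as 【TupleNode】, you need to visit them as well and pass the output buffer information through.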
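Code Emitting
The final part of this codegen class is a 【JIT】 function that emits a C function for the subgraph, using the C code just generated as the function body. Besides the subgraph function generated in the previous sections, a wrapper function with unified arguments is needed so that the TVM runtime can invoke it and pass data. Fortunately, the inherited base class already provides an implementation, 【JitImpl】, to generate that function. For example, 【JitImpl】 can be invoked as follows:
JitImpl("gcc_0" /* Subgraph symbol (ID) */,
        {"gcc_input0", "gcc_input1", "gcc_input2", "gcc_input3"} /* Input arguments */,
        {"float* buf_0 = (float*)malloc(4 * 20)", ...} /* Buffer allocations */,
        {"gcc_0_2(gcc_input0, gcc_input1, buf_0);"} /* Function body */,
        {"out"} /* Output */);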
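The call above generates three functions (one of them comes from the TVM wrapper macro):

The subgraph function 【gcc_0_】 (with one more underscore at the end of the function name), containing all the C code generated to execute the subgraph.
The wrapper function 【gcc_0__wrapper_】, which takes a list of 【DLTensor】 arguments, casts the data to the right type, and invokes 【gcc_0_】.
The TVM runtime compatible function 【gcc_0】, which has TVM unified function arguments, unpacks the TVM packed tensors, and invokes 【gcc_0__wrapper_】.

Accordingly, the only thing needed in the 【JIT】 implementation is to pass all the generated subgraph function code to 【JitImpl】: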
std::string JIT() {
  // Write function macros
  for (auto decl : func_decl_) {
    code_stream_ << decl << "\n";
  }
  return JitImpl(ext_func_id_, ext_func_args_, buf_decl_, ext_func_body, out_);
}
All the variables passed (【ext_func_id_】, etc.) are class variables that were filled in while traversing the subgraph.
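To make the hand-off to the emitter more concrete, the sketch below assembles a subgraph function from the same kinds of pieces that are collected during traversal. It is explicitly not 【JitImpl】: 【EmitFunction】 is a hypothetical helper that only prints the subgraph function, and it leaves out the wrapper function, the final copy to the output tensor, and the release of buffers.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: prints `extern "C" void <id>_(...) { ... }` from the
// argument names, buffer declarations and call statements collected so far.
std::string EmitFunction(const std::string& id,
                         const std::vector<std::string>& args,
                         const std::vector<std::string>& buf_decls,
                         const std::vector<std::string>& body) {
  assert(!id.empty());
  std::ostringstream os;
  os << "extern \"C\" void " << id << "_(";
  for (const auto& arg : args) {
    os << "float* " << arg << ", ";
  }
  os << "float* out) {\n";
  for (const auto& decl : buf_decls) {
    os << "  " << decl << "\n";
  }
  for (const auto& stmt : body) {
    os << "  " << stmt << "\n";
  }
  os << "}\n";
  return os.str();
}
```

Keeping the collected pieces as separate string vectors, as the class above does, is what makes this final assembly a simple concatenation.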
Implement 【CSourceCodegen】
Again, we create a class skeleton and implement the required functions. Note that it inherits from 【CSourceModuleCodegenBase】.
class CSourceCodegen : public CSourceModuleCodegenBase {
 public:
  // Pass a subgraph function, and generate the C code.
  void GenCFunc(const Function& func) { ; }

  // Use GenCFunc to generate the C code and wrap it as a C source module.
  runtime::Module CreateCSourceModule(const NodeRef& ref) override { ; }

 private:
  std::ostringstream code_stream_;
};
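Implement 【GenCFunc】
【GenCFunc】 simply uses the 【CodegenC】 class implemented above to traverse a Relay function (subgraph) and obtain the generated C code. The built-in function 【GetExtSymbol】 retrieves a unique symbol name (e.g. gcc_0) from the Relay function. This name must be used as the C function name, because the symbol is used for the DSO runtime lookup.

void GenCFunc(const Function& func) {
  CHECK(func.defined()) << "Input error: expect a Relay function.";

  // Record the external symbol for runtime lookup.
  auto sid = GetExtSymbol(func);

  CodegenC builder(sid);
  builder.VisitExpr(func->body);
  code_stream_ << builder.JIT();
}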
Implement 【CreateCSourceModule】
This function creates a runtime module for the external library. In this example, we create a 【CSourceModule】 that can be compiled directly and linked together with a TVM-generated DSOModule. Once 【CodegenC】 is implemented, implementing this function is relatively straightforward:
runtime::Module CreateCSourceModule(const NodeRef& ref) override {
  // Create headers
  code_stream_ << "#include <cstdint>\n";
  code_stream_ << "#include <iostream>\n";
  code_stream_ << "#include <cstdlib>\n";
  code_stream_ << "#include <stdio.h>\n";
  code_stream_ << "#include <cstring>\n";
  code_stream_ << "#include <tvm/runtime/c_runtime_api.h>\n";
  code_stream_ << "#include <dlpack/dlpack.h>\n";

  // Append some common macro for operator definition.
  const char* operator_macro = R"op_macro(
  #define CSOURCE_BINARY_OP_1D(p_ID_, p_OP_, p_DIM1_)         \
    extern "C" void p_ID_(float* a, float* b, float* out) {   \
      for (int64_t i = 0; i < p_DIM1_; ++i) {                 \
        out[i] = a[i] p_OP_ b[i];                             \
      }                                                       \
    }

  #define CSOURCE_BINARY_OP_2D(p_ID_, p_OP_, p_DIM1_, p_DIM2_)  \
    extern "C" void p_ID_(float* a, float* b, float* out) {     \
      for (int64_t i = 0; i < p_DIM1_; ++i) {                   \
        for (int64_t j = 0; j < p_DIM2_; ++j) {                 \
          int64_t k = i * p_DIM2_ + j;                          \
          out[k] = a[k] p_OP_ b[k];                             \
        }                                                       \
      }                                                         \
    }
  )op_macro";

  code_stream_ << operator_macro << "\n\n";

  // Generate C code for the subgraph.
  if (ref->IsInstance<FunctionNode>()) {
    GenCFunc(Downcast<Function>(ref));
  } else if (ref->IsInstance<relay::ModuleNode>()) {
    relay::Module mod = Downcast<relay::Module>(ref);
    for (const auto& it : mod->functions) {
      GenCFunc(Downcast<Function>(it.second));
    }
  } else {
    LOG(FATAL) << "The input ref is expected to be a Relay function or module"
               << "\n";
  }

  // Create a CSourceModule
  const auto* pf = runtime::Registry::Get("module.csource_module_create");
  CHECK(pf != nullptr) << "Cannot find csource module to create the external runtime module";
  return (*pf)(code_stream_.str(), "cc");
}
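Register Your Codegen
The last step is to register the codegen with the TVM backend. We first implement a simple function that invokes the codegen and produces a runtime module.
runtime::Module CCompiler(const NodeRef& ref) {
  CSourceCodegen csource;
  return csource.CreateCSourceModule(ref);
}
Finally, we register this function with the TVM backend:
TVM_REGISTER_GLOBAL("relay.ext.ccompiler").set_body_typed(CCompiler);
Here 【ccompiler】 is a customized tag that tells TVM which codegen to use to generate code for, and offload, a subgraph annotated with 【ccompiler】.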
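The idea behind tag-based registration can be illustrated with a plain name-to-function table. This is a conceptual sketch only and not how the TVM registry is implemented: 【CodegenFn】, 【Registry】, 【RegisterCodegen】 and 【LookupCodegen】 are hypothetical names, and the codegen is reduced to a function from a subgraph symbol to a source string.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical codegen signature: subgraph symbol in, generated source out.
using CodegenFn = std::function<std::string(const std::string&)>;

// Hypothetical global table keyed by "relay.ext.<compiler tag>".
std::map<std::string, CodegenFn>& Registry() {
  static std::map<std::string, CodegenFn> table;
  return table;
}

void RegisterCodegen(const std::string& compiler_tag, CodegenFn fn) {
  assert(static_cast<bool>(fn));  // an empty std::function cannot be invoked
  Registry()["relay.ext." + compiler_tag] = std::move(fn);
}

// Returns nullptr when no codegen was registered for the tag, which mirrors
// the nullptr check done on the pointer returned by the registry lookup.
const CodegenFn* LookupCodegen(const std::string& compiler_tag) {
  auto it = Registry().find("relay.ext." + compiler_tag);
  return it == Registry().end() ? nullptr : &it->second;
}
```

The point of the indirection is that the backend only needs the tag found in the annotation to locate the codegen, and never has to know the codegen class itself.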
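Finally, a good practice is to set up a CMake configuration flag so that the compiler is included only for your customers. First create a cmake file, 【cmake/modules/contrib/CODEGENC.cmake】:
if(USE_CODEGENC)
  file(GLOB CSOURCE_RELAY_CONTRIB_SRC src/relay/backend/contrib/codegen_c/codegen.cc)
  list(APPEND COMPILER_SRCS ${CSOURCE_RELAY_CONTRIB_SRC})
endif(USE_CODEGENC)
This way, users can configure whether to include the compiler by adding the following line to 【config.cmake】 when configuring TVM:
set(USE_CODEGENC ON)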
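Implement a Codegen for Your Representation
Although we have demonstrated how to implement a C codegen, your hardware may require another form of graph representation, such as JSON. In that case, you can modify the 【CodegenC】 class implemented above to generate your own graph representation, and implement a customized runtime module so that the TVM runtime knows how to execute that representation.
To keep things simple, this guide defines a graph representation named "ExampleJSON". ExampleJSON is not real JSON; it is merely a simple representation for graphs without control flow. For example, assume we have a subgraph named 【subgraph_0】:
 input0
   |
  add <-- input1
   |
subtract <-- input2
   |
multiply <-- input3
   |
  out
Then the 【ExampleJSON】 of this subgraph looks like this:
subgraph_0
  input 0 10 10
  input 1 10 10
  input 2 10 10
  input 3 10 10
  add 4 inputs: 0 1 shape: 10 10
  sub 5 inputs: 4 2 shape: 10 10
  mul 6 inputs: 5 3 shape: 10 10
The 【input】 keyword declares an input tensor with its ID and shape, while the other statements describe computations with the syntax:
【[op] [output ID] inputs: [input ID] shape: [shape]】
In this section, the goal is to implement the following customized TVM runtime module to execute 【ExampleJSON】 graphs.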
runtime::Module ExampleJsonCompiler(const NodeRef& ref) {
  ExampleJsonCodeGen codegen;
  std::string code = codegen.gen(ref);  // Note 1
  const auto* pf = runtime::Registry::Get("module.examplejson_module_create");  // Note 2
  CHECK(pf != nullptr) << "Cannot find ExampleJson module to create the external runtime module";
  return (*pf)(code);
}
TVM_REGISTER_GLOBAL("relay.ext.examplejsoncompiler").set_body_typed(ExampleJsonCompiler);
Note 1: a customized codegen will be implemented to generate an ExampleJSON code string from a subgraph.
Note 2: this line obtains a pointer to a function that creates the customized runtime module. As you can see, it takes the subgraph code in ExampleJSON format just generated and initializes a runtime module.
The following sections introduce
1) how to implement 【ExampleJsonCodeGen】, and
2) how to implement and register 【examplejson_module_create】.
Implement 【ExampleJsonCodeGen】
Similar to the C codegen, 【ExampleJsonCodeGen】 is derived from 【ExprVisitor】 to make use of the visitor pattern for subgraph traversal. On the other hand, it does not need to inherit from 【CodegenCBase】, because no TVM C++ wrappers are needed. The codegen class is implemented as follows:
#include <tvm/relay/expr_functor.h>
#include <tvm/relay/transform.h>
#include <tvm/relay/type.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/object.h>

#include <fstream>
#include <sstream>

namespace tvm {
namespace relay {
namespace contrib {

class ExampleJsonCodeGen : public ExprVisitor {
 public:
  explicit ExampleJsonCodeGen();

  // Note 1
  void VisitExpr_(const VarNode* node) { /* Skip in this example. */ }
  void VisitExpr_(const CallNode* call) final { /* Skip in this example. */ }

  // Note 2
  std::string gen(const NodeRef& ref) {
    this->code = "";
    if (ref->IsInstance<FunctionNode>()) {
      this->VisitExpr(Downcast<Function>(ref));
    } else if (ref->IsInstance<relay::ModuleNode>()) {
      relay::Module mod = Downcast<relay::Module>(ref);
      for (const auto& it : mod->functions) {
        this->VisitExpr(Downcast<Function>(it.second));
      }
    } else {
      LOG(FATAL) << "The input ref is expected to be a Relay function or module";
    }
    return this->code;
  }

 private:
  /*! \brief The ExampleJSON code generated for the subgraph. */
  std::string code;
};
Note 1: we again implement the corresponding visitor functions to generate ExampleJSON code and store it in the class variable 【code】 (the visitor implementations are skipped in this example because the concept is basically the same as for the C codegen). After finishing the graph visit, 【code】 should contain an ExampleJSON graph.
Note 2: an internal API 【gen】 is defined to take a subgraph and generate ExampleJSON code. This API can have any name you prefer.
The next step is to implement a customized runtime that makes use of the output of 【ExampleJsonCodeGen】.
Implement a Customized Runtime
In this section, we implement a customized TVM runtime step by step and register it with the TVM runtime modules. The customized runtime must be placed in its own subdirectory under src/runtime/contrib/. In this example, the runtime is named "example_ext_runtime" and placed at src/runtime/contrib/example_ext_runtime/example_ext_runtime.cc. Feel free to check that file for the complete implementation.
Again, we first define a customized runtime class as follows. The class has to be derived from TVM's 【ModuleNode】 in order to be compatible with other TVM runtime modules.
#include <dmlc/logging.h>
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/memory.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/object.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>

#include <fstream>
#include <cmath>
#include <map>
#include <sstream>
#include <string>
#include <vector>

namespace tvm {
namespace runtime {
class ExampleJsonModule : public ModuleNode {
 public:
  explicit ExampleJsonModule(std::string graph_json);

  PackedFunc GetFunction(const std::string& name,
                         const ObjectPtr<Object>& sptr_to_self) final;

  const char* type_key() const { return "examplejson"; }

  void SaveToBinary(dmlc::Stream* stream) final;

  static Module LoadFromBinary(void* strm);

  static Module Create(const std::string& path);

  std::string GetSource(const std::string& format = "");

  void Run(int id, const std::vector<int>& inputs, int output);

  void ParseJson(const std::string& json);

 private:
  /*! \brief The json string that represents a computational graph. */
  std::string graph_json_;
  /*! \brief The subgraph that is being processed. */
  std::string curr_subgraph_;
  /*! \brief A simple graph from subgraph id to node entries. */
  std::map<std::string, std::vector<NodeEntry> > graph_;
  /*! \brief A simple pool to contain the tensor for each node in the graph. */
  std::vector<NDArray> data_entry_;
  /*! \brief A mapping from node id to op name. */
  std::vector<std::string> op_id_;
};
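In particular, some functions derived from 【ModuleNode】 must be implemented in 【ExampleJsonModule】:
Constructor: the constructor of this class should accept a subgraph (in your representation), and process and store it in whatever format you like. The saved subgraph can then be used by the following two functions.
【GetFunction】: this is the most important function in this class. When the TVM runtime wants to execute a subgraph with your compiler tag, it invokes this function from your customized runtime module. Given the function name and the runtime arguments, 【GetFunction】 should return a packed function implementation for the TVM runtime to execute.
【SaveToBinary】 and 【LoadFromBinary】: 【SaveToBinary】 serializes the runtime module to a binary format for later deployment. TVM calls this function when users use the 【export_library】 API. On the other hand, because a custom graph representation is now in use, you must make sure that 【LoadFromBinary】 can construct the same runtime module from the serialized binary produced by 【SaveToBinary】.
【GetSource】 (optional): if you would like to inspect the generated 【ExampleJSON】 code, you can implement this function to dump it; otherwise you can skip it.
Other functions and class variables are introduced along with the implementation of the must-have functions above.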
Implement the Constructor
explicit ExampleJsonModule(std::string graph_json) {
  this->graph_json_ = graph_json;
  ParseJson(this->graph_json_);
}
Then we implement 【ParseJson】 to parse a subgraph in ExampleJSON format and construct an in-memory graph for later use. Since subgraphs with branches are not supported in this example, an array is simply used to store every node of the subgraph in order.
void ParseJson(const std::string& json) {
  std::string line;
  std::string curr_subgraph;
  std::stringstream ss(json);

  while (std::getline(ss, line, '\n')) {
    std::stringstream ss2(line);
    std::string token;
    int id = 0;

    ss2 >> token;
    if (token.find("subgraph_") != std::string::npos) {
      curr_subgraph = token;
      continue;
    }

    ss2 >> id;
    if (op_id_.size() <= static_cast<size_t>(id)) {
      op_id_.resize(id + 1);
      data_entry_.resize(id + 1);
    }

    int64_t total_elements = 1;
    std::vector<int64_t> shape;
    if (token == "input") {
      int64_t size = 0;
      while (ss2 >> size) {
        total_elements *= size;
        shape.push_back(size);
      }
    } else {
      op_id_[id] = token;  // Note 1
      bool shape_data = false;
      NodeEntry entry;
      while (ss2 >> token) {
        if (token == "shape:") {
          shape_data = true;
        } else if (shape_data) {
          total_elements *= std::stoll(token);
          shape.push_back(std::stoll(token));
        } else if (token != "inputs:") {
          entry.inputs.push_back(std::stoi(token));
        }
      }
      entry.id = id;
      entry.output = id;
      graph_[curr_subgraph].push_back(entry);  // Note 2
    }
    DLContext ctx;
    ctx.device_type = static_cast<DLDeviceType>(1);
    ctx.device_id = 0;
    data_entry_[id] = NDArray::Empty(shape, DLDataType{kDLFloat, 32, 1}, ctx);  // Note 3
  }
}
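Note 1: the class variable 【op_id_】 maps a subgraph node ID to its operator name (e.g. 【add】), so that the corresponding operator function can be invoked at runtime.
Note 2: the class variable 【graph_】 maps a subgraph name to an array of nodes. 【GetFunction】 queries the graph nodes by subgraph ID at runtime.
Note 3: the class variable 【data_entry_】 maps a subgraph node ID to a tensor data placeholder. Inputs and outputs are put into the corresponding data entries at runtime.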
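The parsing logic can be exercised without TVM types. The sketch below is a simplified stand-in under stated assumptions: 【Node】 and 【ParseExampleJson】 are hypothetical names, no 【NDArray】 placeholders are allocated, and, unlike the version above, input lines are stored as nodes too so that their shapes remain inspectable.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain-data mirror of one ExampleJSON line.
struct Node {
  int id = 0;
  std::string op;              // "input" for graph inputs, else the op name
  std::vector<int> inputs;     // IDs listed after "inputs:"
  std::vector<int64_t> shape;  // dimensions of the tensor this node produces
};

// Parses ExampleJSON text into a map from subgraph name to its nodes in order.
std::map<std::string, std::vector<Node>> ParseExampleJson(const std::string& text) {
  std::map<std::string, std::vector<Node>> graphs;
  std::string curr_subgraph;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string token;
    if (!(fields >> token)) continue;  // skip blank lines
    if (token.find("subgraph_") != std::string::npos) {
      curr_subgraph = token;
      continue;
    }
    assert(!curr_subgraph.empty());  // nodes must follow a subgraph header
    Node node;
    node.op = token;
    fields >> node.id;
    // "input" lines list the shape right after the ID; op lines announce it
    // with the "shape:" keyword.
    bool in_shape = (token == "input");
    while (fields >> token) {
      if (token == "shape:") {
        in_shape = true;
      } else if (token == "inputs:") {
        in_shape = false;
      } else if (in_shape) {
        node.shape.push_back(std::stoll(token));
      } else {
        node.inputs.push_back(std::stoi(token));
      }
    }
    graphs[curr_subgraph].push_back(node);
  }
  return graphs;
}
```

Because the format has no branches, a vector per subgraph is enough to preserve execution order, which is the same design choice as the array used above.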
Implement 【GetFunction】
After construction, the class variables above should be ready. We then implement 【GetFunction】 to provide executable subgraph functions to the TVM runtime:
PackedFunc GetFunction(const std::string& name,
                       const ObjectPtr<Object>& sptr_to_self) final {
  if (this->graph_.find(name) != this->graph_.end()) {
    this->curr_subgraph_ = name;
    return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {

      // Copy input tensors to corresponding data entries.
      for (auto i = 0; i < args.size(); ++i) {
        CHECK(args[i].type_code() == kNDArrayContainer || args[i].type_code() == kArrayHandle)
            << "Expect NDArray or DLTensor as inputs\n";
        if (args[i].type_code() == kArrayHandle) {
          DLTensor* arg = args[i];
          this->data_entry_[i].CopyFrom(arg);
        } else {
          NDArray arg = args[i];
          this->data_entry_[i].CopyFrom(arg);
        }
      }

      // Execute the subgraph.
      for (const auto& it : this->graph_[this->curr_subgraph_]) {
        this->Run(it.id, it.inputs, it.output);
      }
      CHECK_GT(graph_.count(this->curr_subgraph_), 0U);

      // Copy the output from a data entry back to TVM runtime argument.
      auto out_idx = graph_[this->curr_subgraph_].back().output;
      if (args[args.size() - 1].type_code() == kArrayHandle) {
        DLTensor* arg = args[args.size() - 1];
        this->data_entry_[out_idx].CopyTo(arg);
      } else {
        NDArray arg = args[args.size() - 1];
        this->data_entry_[out_idx].CopyTo(arg);
      }
      *rv = data_entry_.back();
    });
  } else {
    LOG(FATAL) << "Unknown subgraph: " << name << "\n";
    return PackedFunc();
  }
}
As can be seen, 【GetFunction】 consists of three major parts. The first part copies data from the TVM runtime arguments to the corresponding data entries allocated in the constructor. The second part executes the subgraph with the 【Run】 function (implemented later) and saves the results to another data entry. The third part copies the results from the output data entry back to the corresponding TVM runtime argument for output.
Implement Run
Now let's implement the 【Run】 function. This function accepts
1) the ID of a subgraph node,
2) a list of input data entry indexes, and
3) an output data entry index.
void Run(int id, const std::vector<int>& inputs, int output) {
  // Make a list of data entry indexes.
  std::vector<int> args(inputs.begin(), inputs.end());
  args.push_back(output);

  // Initialize data holders.
  std::vector<TVMValue> values(args.size());
  std::vector<int> type_codes(args.size());

  // Initialize a TVM arg setter with TVMValue and its type code.
  TVMArgsSetter setter(values.data(), type_codes.data());

  // Set each argument to its corresponding data entry.
  if (op_id_[id] == "add" || op_id_[id] == "sub" || op_id_[id] == "mul") {
    for (size_t i = 0; i < args.size(); i++) {
      setter(i, data_entry_[args[i]]);
    }
  }

  // Invoke the corresponding operator function.
  if (op_id_[id] == "add") {
    Add(values.data(), type_codes.data(), args.size());
  } else if (op_id_[id] == "sub") {
    Sub(values.data(), type_codes.data(), args.size());
  } else if (op_id_[id] == "mul") {
    Mul(values.data(), type_codes.data(), args.size());
  } else {
    LOG(FATAL) << "Unknown op: " << op_id_[id] << "\n";
  }
}
The 【Run】 function has two main parts. The first part allocates a list of 【TVMValue】 and maps the corresponding data entry blocks to it. This becomes the arguments of the operator functions. The second part invokes the operator functions. Although the same C functions as in the previous example are used here, you can replace 【Add】, 【Sub】 and 【Mul】 with your own engine. You only need to make sure your engine stores the results in the last argument so that they can be transferred back to the TVM runtime.
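The dispatch pattern of 【Run】 can be mimicked with plain vectors. The sketch below is an assumption-laden simplification, not the TVM implementation: 【Tensor】 and 【RunNode】 are hypothetical names, the tensor pool is a std::vector of float vectors instead of 【NDArray】 entries, and the operator functions are inlined rather than called through 【TVMValue】 arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Tensor = std::vector<float>;  // hypothetical flat tensor

// Hypothetical executor for one node: reads two entries of the tensor pool,
// applies an elementwise binary op, and stores the result at `output`.
void RunNode(const std::string& op, const std::vector<int>& inputs, int output,
             std::vector<Tensor>& pool) {
  assert(inputs.size() == 2);
  const Tensor& a = pool[inputs[0]];
  const Tensor& b = pool[inputs[1]];
  assert(a.size() == b.size());
  Tensor result(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (op == "add") {
      result[i] = a[i] + b[i];
    } else if (op == "sub") {
      result[i] = a[i] - b[i];
    } else if (op == "mul") {
      result[i] = a[i] * b[i];
    } else {
      assert(false && "Unknown op");
    }
  }
  // The result goes to the last "argument", matching the convention above.
  pool[output] = result;
}
```

Running the three nodes of the ExampleJSON graph in order, with entries 0 to 3 as inputs and 4 to 6 as node outputs, leaves the subgraph result in the last entry.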
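With the functions above implemented, the customized codegen and runtime can now execute subgraphs. The last step is to register an API (【examplejson_module_create】) to create this module:
TVM_REGISTER_GLOBAL("module.examplejson_module_create")
.set_body_typed([](std::string code){
    auto n = make_object<ExampleJsonModule>(code);
    return runtime::Module(n);
  });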
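Implement 【SaveToBinary】 and 【LoadFromBinary】
So far, the main features of the customized runtime have been implemented, and it can be used like any other TVM runtime. However, when users want to save the built runtime to disk for deployment, TVM has no idea how to save it. This is why 【SaveToBinary】 and 【LoadFromBinary】 need to be implemented: they tell TVM how to persist and recover the customized runtime.
We first implement 【SaveToBinary】 to let users save this module to disk.
void SaveToBinary(dmlc::Stream* stream) final {
  stream->Write(this->graph_json_);
}
This function is quite simple. Recall that the only argument taken by the constructor is a subgraph representation, meaning that a subgraph representation is all that is needed to construct or recover this customized runtime module. As a result, 【SaveToBinary】 simply writes the subgraph to an output DMLC stream. That is, when users export the module with the 【export_library】 API, the customized module is stored as the ExampleJSON stream of the subgraph.
Similarly, 【LoadFromBinary】 reads the subgraph stream and reconstructs the customized runtime module:
static Module LoadFromBinary(void* strm) {
  dmlc::Stream* stream = static_cast<dmlc::Stream*>(strm);
  std::string graph_json;
  stream->Read(&graph_json);
  auto n = tvm::runtime::make_object<ExampleJsonModule>(graph_json);
  return Module(n);
}
This function also needs to be registered to enable the corresponding Python API:
TVM_REGISTER_GLOBAL("module.loadbinary_examplejson")
.set_body_typed(ExampleJsonModule::LoadFromBinary);
With the registration above, when users call the 【tvm.runtime.load(lib_path)】 API and the exported library contains an ExampleJSON stream, 【LoadFromBinary】 is invoked to create the same customized runtime module.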
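The save/load contract, namely that loading must rebuild exactly what saving wrote, can be illustrated with standard streams. This sketch does not reproduce the byte layout of the DMLC stream: 【SaveGraph】 and 【LoadGraph】 are hypothetical helpers, and the length-prefixed layout is an assumption chosen only to make the round trip self-describing.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical layout: a 64-bit length followed by the raw bytes of the graph.
void SaveGraph(std::ostream& os, const std::string& graph_json) {
  const uint64_t size = graph_json.size();
  os.write(reinterpret_cast<const char*>(&size), sizeof(size));
  os.write(graph_json.data(), static_cast<std::streamsize>(size));
}

// Reads back exactly what SaveGraph wrote and returns the graph string.
std::string LoadGraph(std::istream& is) {
  uint64_t size = 0;
  is.read(reinterpret_cast<char*>(&size), sizeof(size));
  assert(is.good());
  std::string graph_json(size, '\0');
  is.read(&graph_json[0], static_cast<std::streamsize>(size));
  assert(is.gcount() == static_cast<std::streamsize>(size));
  return graph_json;
}
```

Since the constructor needs nothing but the graph string, a round trip of that one string is sufficient to recover the whole module state.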
In addition, if you want to support module creation directly from an ExampleJSON file, you can implement a simple function and register a Python API as follows:
static Module Create(const std::string& path) {
  std::ifstream filep;
  filep.open(path, std::ios::in);
  std::string graph_json;
  std::string line;
  while (std::getline(filep, line)) {
    graph_json += line;
    graph_json += "\n";
  }
  filep.close();
  auto n = tvm::runtime::make_object<ExampleJsonModule>(graph_json);
  return Module(n);
}

TVM_REGISTER_GLOBAL("module.loadfile_examplejson")
.set_body([](TVMArgs args, TVMRetValue* rv) {
    *rv = ExampleJsonModule::Create(args[0]);
  });
This lets users manually write or modify an ExampleJSON file and use the Python API 【tvm.runtime.load("mysubgraph.examplejson", "examplejson")】 to construct a customized module.

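Summary
In summary, here is a checklist for reference:
1) A codegen class derived from 【ExprVisitor】 and 【CodegenCBase】 (the latter only for C codegen), with the following functions:
   【VisitExpr_(const CallNode* call)】 to collect call node information.
   Other visitor functions needed to collect subgraph information.
   【JIT】 to generate the subgraph code.
   Registration of the codegen.
2) A function to create 【CSourceModule】 (for C codegen).
3) A runtime module class derived from 【ModuleNode】, with the following functions (for your graph representation):
   Constructor.
   【GetFunction】 to generate a TVM runtime compatible 【PackedFunc】.
   【Run】 to execute a subgraph.
   Registration of a runtime creation API.
   【SaveToBinary】 and 【LoadFromBinary】 to serialize and deserialize the customized runtime module.
   Registration of the 【LoadFromBinary】 API to support 【tvm.runtime.load(your_module_lib_path)】.
   (Optional) 【Create】 to support constructing the customized runtime module from a subgraph file in your representation.
4) An annotator that annotates a user's Relay program to make use of your compiler and runtime (TBA).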
