TVM Code Generation (codegen)

Published: 2023/11/28


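Hardware backend providers (e.g., Intel, NVIDIA, ARM) provide kernel libraries such as cuBLAS or cuDNN with many commonly used deep learning kernels, or provide frameworks with a graph engine, such as DNNL or TensorRT, so that users can describe their models in a certain way and achieve high performance. In addition, emerging deep learning accelerators come with their own compilers, kernel libraries, or runtime frameworks.
When users try to work on a new kernel library or device, they have to learn a new programming interface. The demand for a unified programming interface is therefore becoming more and more important, so that all users and hardware backend providers stand on the same page.
To share a programming interface with widely used deep learning frameworks, many hardware device providers have tried to integrate their device backends into TensorFlow. However, since TensorFlow does not provide an official backend interface for new backends, you have to hack TensorFlow for registration, which requires changes to many source files and makes future maintenance difficult.
This article demonstrates how you, as a hardware backend provider, can use the Bring Your Own Codegen (BYOC) framework to integrate the kernel library/compiler/framework of your hardware device into TVM. The most important advantage of the BYOC framework is that all related source files of your device are self-contained, so the codegen/runtime of your device is pluggable into the TVM code base. This means that
1) the TVM code base with your codegen stays upstream compatible, and
2) TVM users can choose to enable the codegen/runtime based on their needs.
The rest of this article first illustrates a scenario in which you may need TVM with BYOC, followed by an overview of the BYOC compilation and runtime flows. It then walks step by step through how to integrate a vendor library or an execution engine into TVM with BYOC, using Intel DNNL (a.k.a. MKL-DNN, oneDNN) as a running example.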
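Bring an ASIC Accelerator to TVM
Let's first set up a scenario to illustrate why you may want to bring your accelerator to TVM and which features you can expect from the BYOC framework.
Imagine that you have just built an edge device platform with an ARM CPU and a fantastic accelerator that achieves great performance for common image classification models. In other words, your accelerator does well on Conv2D, ReLU, GEMM, and other widely used CNN operators.
Unfortunately, object detection models are becoming more and more popular as well, and your customers need to run both image classification and object detection models on your platform. Although your accelerator can execute almost all operators in object detection models, one operator (e.g., non-maximum suppression, NMS) is missing.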
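Let TVM execute unsupported operators
Since TVM has multiple codegens for different backends, it is easy for the open source community to implement new operators on CPU or GPU in a short time. Ideally, if you integrate the compilation flow of your accelerator into TVM with BYOC, TVM performs Relay graph partitioning to offload a part of the graph to your accelerator while keeping the rest on TVM. As a result, you can claim that your platform is capable of running all models without worrying about new operators.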
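Customize graph-level optimization
Your ASIC accelerator must have its own compilation flow. Usually it is one of the following two cases.
Generate a graph representation and feed it to a graph engine: You may have your own graph engine that is capable of executing a graph (or a neural network model) on your accelerator. For example, both Intel DNNL and NVIDIA TensorRT use an engine to run a whole graph or model, so they are able to 1) reduce memory transactions between operators and 2) optimize graph execution with operator fusion.
To achieve these two optimizations, you may need to process the graph during compilation. For example, Conv2D and bias addition are two separate operators in TVM, but they may be one operator (Conv2D with bias addition capability) on your accelerator. In that case, you may want to optimize the graph by replacing the conv2d - add graph pattern with a your_conv2d_with_bias node.
If your compilation flow falls into this case, we recommend reading all the remaining sections of this article but skipping Bring DNNL to TVM: C Source Codegen.
Generate assembly code and compile it to an executable binary: If you do not have an end-to-end execution framework for your platform as in the previous case, you may have a compiler that compiles a program written in the assembly code of your ISA. To feed assembly code to your compiler, you need a codegen that generates and optimizes the assembly code from a Relay graph.
If your compilation flow falls into this case, we recommend reading all the remaining sections of this article but skipping Bring DNNL to TVM: JSON Codegen/Runtime.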
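How BYOC Works
This section briefly explains how the BYOC framework works. In short, given the Relay graph in Figure 1, the BYOC framework performs the following steps:

Figure 1: The original Relay graph.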
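1. Graph Annotation
Taking a user-provided Relay graph, the first step is to annotate the nodes in the graph that could potentially be offloaded to your accelerator. You need to follow Bring DNNL to TVM: Annotation Rules to implement a whitelist of supported operators, or a list of graph patterns for customized composite operators. An example annotation result is shown in Figure 2.

Figure 2: The graph with annotations.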
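The annotation step can be pictured with a small, self-contained sketch. This is a toy model rather than the TVM API: the graph is a plain list of (node name, operator name) pairs, and `SUPPORTED_OPS`, `annotate`, and the target name `your_accelerator` are hypothetical names introduced only for illustration.

```python
# Toy model of graph annotation; none of these names come from TVM.
SUPPORTED_OPS = {"nn.conv2d", "nn.relu", "add"}  # hypothetical whitelist


def annotate(nodes, supported_ops, target="your_accelerator"):
    """Tag each node with the accelerator target if its operator is whitelisted.

    Nodes whose operator is not in the whitelist are tagged "default",
    meaning they stay on the regular TVM flow.
    """
    return [
        (name, op, target if op in supported_ops else "default")
        for name, op in nodes
    ]


graph = [
    ("n0", "nn.conv2d"),
    ("n1", "add"),
    ("n2", "nn.relu"),
    ("n3", "vision.non_max_suppression"),  # the operator the accelerator lacks
]
annotated = annotate(graph, SUPPORTED_OPS)
for row in annotated:
    print(row)
```

Under these assumptions only the NMS node keeps the default target, which matches the scenario described earlier where a single operator is missing on the accelerator.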
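2. Graph Transformation
The second step is to transform and optimize the graph based on the annotations. Specifically, BYOC performs the following transformations.
2.1: Merge compiler regions: As Figure 2 shows, there are now many "regions" in the graph that can be offloaded to your accelerator, and some of them can actually be merged to reduce data transfer and kernel launch overhead. Accordingly, step 2.1 uses a greedy algorithm to merge as many of those regions as possible while guaranteeing functional correctness. The result is shown in Figure 3.

Figure 3: After merging compiler regions.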
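To make the idea of greedy merging concrete, here is a minimal sketch under a strong simplifying assumption: the graph is a linear chain, so two regions can be merged whenever they are adjacent and share a target. A real Relay graph is a DAG, and the actual pass must also make sure a merge does not break data dependencies; that check is not modeled here. The function name `merge_regions` is hypothetical.

```python
def merge_regions(annotated):
    """Greedily merge runs of consecutive nodes that share a target.

    `annotated` is a list of (node_name, target) pairs in execution order.
    Returns a list of (target, [node_names]) regions.
    """
    regions = []
    for name, target in annotated:
        if regions and regions[-1][0] == target:
            regions[-1][1].append(name)       # extend the current region
        else:
            regions.append((target, [name]))  # start a new region
    return regions


chain = [
    ("conv", "accel"),
    ("add", "accel"),
    ("nms", "default"),
    ("relu", "accel"),
    ("dense", "accel"),
]
regions = merge_regions(chain)
for target, names in regions:
    print(target, names)
```

In this toy chain, five single-node regions collapse into three, so the accelerator is entered twice instead of four times.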
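2.2: Partition graph: For each region from the previous step, BYOC creates a Relay function with the attribute Compiler to indicate that this Relay function should be entirely offloaded to your accelerator, as shown in Figure 4.

Figure 4: After graph partitioning.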
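3. Code Generation
Now we know which parts of the Relay graph should be offloaded. In this step, each Relay function with Compiler=your_accelerator is sent in turn to your codegen. Your codegen should compile the Relay function into a form that matches your own compilation flow. It can be C source code or any text format.
Finally, all compiled functions are serialized, together with the other non-offloaded Relay functions, into a single .so file by the TVM export_library Python API. In other words, the user gets only one .so file after running this flow.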
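4. Runtime
You may also need to implement a runtime to initialize your graph engine (if applicable) and execute the compiled functions. During inference, whenever the TVM runtime (i.e., the graph runtime or the VM) encounters the corresponding function call in Figure 4, it uses your runtime to invoke the offloaded function. Your runtime is responsible for launching the compiled function with the given input tensor arrays and filling the results into the output tensor arrays.
The rest of this article uses DNNL as an example to demonstrate how to achieve the above workflow with the BYOC framework. All code and line numbers referenced in this article are based on commit 8a0249c on the master branch of the TVM repository.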
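Bring DNNL to TVM: Annotation Rules
The BYOC framework provides two ways to describe supported operators and patterns, and they can be used together. This section uses DNNL as an example to show how. Put the annotation rules of your codegen in python/tvm/relay/op/contrib/your_codegen_name.py.
Rules for single operators
You can intuitively specify which Relay operators are supported by your accelerator with the BYOC API. For example, the following snippet builds a rule saying that the DNNL codegen supports Conv2D:

import tvm

@tvm.ir.register_op_attr("nn.conv2d", "target.dnnl")
def _dnnl_conv2d_wrapper(attrs, args):
    return True

This registers a new attribute target.dnnl on the Relay nn.conv2d operator. In this way, the BYOC annotation pass can invoke target.dnnl() for every operator in the graph to check whether it is supported by the DNNL codegen.
On the other hand, writing the snippet above for every single operator can be tedious. For the DNNL implementation, a helper function, _register_external_op_helper, makes this more concise: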

def _register_external_op_helper(op_name, supported=True):
    @tvm.ir.register_op_attr(op_name, "target.dnnl")
    def _func_wrapper(attrs, args):
        return supported
    return _func_wrapper

_register_external_op_helper("nn.batch_norm")
_register_external_op_helper("nn.conv2d")
_register_external_op_helper("nn.dense")
_register_external_op_helper("nn.relu")
_register_external_op_helper("add")
_register_external_op_helper("subtract")
_register_external_op_helper("multiply")

The example above specifies the list of operators that the DNNL codegen supports.
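The decorator-based registration above can be mimicked without TVM to see what the helper actually does. The sketch below is an assumption-laden stand-in: `OP_ATTRS`, `register_op_attr`, and `is_supported` are hypothetical names for a plain in-memory registry, not TVM's implementation.

```python
# Hypothetical in-memory stand-in for an operator attribute registry.
OP_ATTRS = {}


def register_op_attr(op_name, attr_key):
    """Decorator factory: store the decorated function under (op_name, attr_key)."""
    def decorator(func):
        OP_ATTRS[(op_name, attr_key)] = func
        return func
    return decorator


def register_external_op_helper(op_name, supported=True):
    """Register a checker for op_name that always answers `supported`."""
    @register_op_attr(op_name, "target.dnnl")
    def _func_wrapper(attrs, args):
        return supported
    return _func_wrapper


for op in ("nn.conv2d", "nn.relu", "add"):
    register_external_op_helper(op)


def is_supported(op_name, attrs=None, args=None):
    """Mimic an annotator: look up the checker for an operator and call it."""
    checker = OP_ATTRS.get((op_name, "target.dnnl"))
    return checker is not None and checker(attrs, args)


print(is_supported("nn.conv2d"))
print(is_supported("vision.non_max_suppression"))
```

Because the checker receives `attrs` and `args`, a rule does not have to be a constant: it could, for instance, accept an operator only for certain layouts or data types.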
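Rules for graph patterns
Your accelerator or compiler may have optimized certain patterns (e.g., Conv2D + add + ReLU) into a single instruction or API. In that case, you can specify a mapping from a graph pattern to your instruction/API. In the case of DNNL, its Conv2D API already includes bias addition and allows the next ReLU to be attached, so DNNL can be called as in the following snippet: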

DNNLConv2d(const bool has_bias = false, const bool has_relu = false) {
  // ... skip ...
  auto conv_desc = dnnl::convolution_forward::desc(
    dnnl::prop_kind::forward_inference,
    dnnl::algorithm::convolution_direct,
    conv_src_md, conv_weights_md, conv_bias_md, conv_dst_md,
    strides_dims, padding_dims_l, padding_dims_r);

  // Attach ReLU
  dnnl::primitive_attr attr;
  if (has_relu) {
    dnnl::post_ops ops;
    ops.append_eltwise(1.f, dnnl::algorithm::eltwise_relu, 0.f, 0.f);
    attr.set_post_ops(ops);
  }

  auto conv2d_prim_desc = dnnl::convolution_forward::primitive_desc(
    conv_desc, attr, engine_);
  // ... skip ...

In this case, besides a single conv2d, we would like to map the graph pattern conv2d+relu to DNNLConv2d(false, true), and conv2d+add+relu to DNNLConv2d(true, true). This can be achieved with the following snippet:

from tvm.relay.dataflow_pattern import is_op, wildcard
from tvm.relay.op.contrib.register import register_pattern_table

def make_pattern(with_bias=True):
    data = wildcard()
    weight = wildcard()
    bias = wildcard()
    conv = is_op('nn.conv2d')(data, weight)
    if with_bias:
        conv_out = is_op('add')(conv, bias)
    else:
        conv_out = conv
    return is_op('nn.relu')(conv_out)

@register_pattern_table("dnnl")
def pattern_table():
    conv2d_bias_relu_pat = ("dnnl.conv2d_bias_relu", make_pattern(with_bias=True))
    conv2d_relu_pat = ("dnnl.conv2d_relu", make_pattern(with_bias=False))
    dnnl_patterns = [conv2d_bias_relu_pat, conv2d_relu_pat]
    return dnnl_patterns

In the DNNL example, two patterns are implemented with different names so that they can be easily recognized in the codegen. Note that the patterns are written in the Relay pattern language.
With the pattern table, a Relay pass can then transform

%1 = nn.conv2d(%data, %weight, ...)
%2 = add(%1, %bias)
%3 = nn.relu(%2)

to

%1 = fn(%input1, %input2, %input3,
        Composite="dnnl.conv2d_bias_relu",
        PartitionedFromPattern="nn.conv2d_add_nn.relu_") {
  %1 = nn.conv2d(%input1, %input2, ...)
  %2 = add(%1, %input3)
  nn.relu(%2)
}
%2 = %1(%data, %weight, %bias)

Thus, the DNNL codegen can get the pattern name conv2d_bias_relu and map %1 to DNNLConv2d(true, true).
The composite function also has an attribute called "PartitionedFromPattern". This can be helpful if your pattern contains wildcard operators. For example, you may have a pattern table entry ("conv2d_with_something", conv2d -> *):

def make_pattern(with_bias=True):
    data = wildcard()
    weight = wildcard()
    conv = is_op('nn.conv2d')(data, weight)
    return wildcard()(conv)

In this case, you get a composite function with Composite=conv2d_with_something, but you cannot tell which graph it actually matched. That is where PartitionedFromPattern comes into play. By checking whether PartitionedFromPattern is nn.conv2d_add_ or nn.conv2d_nn.relu_, you can tell whether the matched graph is conv2d -> add or conv2d -> relu.
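As a rough illustration of how a codegen might read this attribute, the sketch below builds and parses strings of the shape seen in the examples above, where each operator name is followed by an underscore. That format is inferred only from those examples, and `partitioned_from_pattern` and `matched_ops` are hypothetical helpers, not TVM functions.

```python
def partitioned_from_pattern(op_names):
    """Join operator names the way the examples read: each name plus '_'."""
    return "".join(name + "_" for name in op_names)


def matched_ops(attr_value, known_ops):
    """Recover the operator sequence from the attribute string.

    Operator names may themselves contain '_' or '.', so this matches the
    longest known operator name at each position instead of splitting on '_'.
    """
    ops, pos = [], 0
    candidates = sorted(known_ops, key=len, reverse=True)
    while pos < len(attr_value):
        for op in candidates:
            if attr_value.startswith(op + "_", pos):
                ops.append(op)
                pos += len(op) + 1
                break
        else:
            raise ValueError("unknown operator at position %d" % pos)
    return ops


KNOWN = {"nn.conv2d", "add", "nn.relu"}
print(partitioned_from_pattern(["nn.conv2d", "add", "nn.relu"]))
print(matched_ops("nn.conv2d_nn.relu_", KNOWN))
```

A codegen could then choose its kernel from the recovered sequence, for example enabling the bias path only when `add` appears after `nn.conv2d`.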
Bring DNNL to TVM: Relay Graph Transformation
With the annotation rules from the previous step, we can now apply a list of BYOC Relay passes to transform the Relay graph from Figure 1 to Figure 4:

from tvm.relay import transform

mod = create_relay_module_from_model()              # Output: Figure 1
mod = transform.MergeComposite(pattern_table)(mod)
mod = transform.AnnotateTarget(["dnnl"])(mod)       # Output: Figure 2
mod = transform.MergeCompilerRegions()(mod)         # Output: Figure 3
mod = transform.PartitionGraph()(mod)               # Output: Figure 4

As can be seen, each Relay pass maps to a step introduced in How BYOC Works.
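Bring DNNL to TVM: JSON Codegen/Runtime
Now let's implement the DNNL codegen that serializes a Relay graph to a JSON representation, and then implement the DNNL JSON runtime to deserialize and execute the graph. If you are instead trying to implement a codegen that generates a C-compatible program, you may want to jump directly to the next section.
To make the DNNL JSON codegen/runtime work in TVM for this example, make sure DNNL is available on your machine, and build TVM with set(USE_DNNL_CODEGEN ON) in config.cmake.
The DNNL codegen is implemented in src/relay/backend/contrib/dnnl/codegen.cc. Since the DNNL codegen is implemented in two forms in this file, you can focus on the part covered by the USE_JSON_RUNTIME macro when tracing the code.
First, register the codegen with the TVM registration API (L510). This registration makes the TVM compile engine dispatch Relay functions with Compiler=<your codegen> to relay.ext.<your codegen>. Then implement the entry function of the DNNL compiler (L490). Read the comments embedded in the snippet for details: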

runtime::Module DNNLCompiler(const ObjectRef& ref) {
  // "ref" should be the partitioned Relay function with kCompiler=dnnl.
  CHECK(ref->IsInstance<FunctionNode>());
  auto func = Downcast<Function>(ref);

  // Get the function name as the symbol to match in runtime.
  auto func_name = GetExtSymbol(func);

  // Serialize the function to a JSON string (introduced later).
  DNNLJSONSerializer serializer(func_name, func);
  serializer.serialize();
  std::string graph_json = serializer.GetJSON();

  // The constant tensor names that have been bound to the module.
  // All constant tensors will be serialized along with the JSON graph
  // when export_library is invoked.
  auto params = serializer.GetParams();

  // The function to create the DNNL JSON runtime (introduced later).
  const auto* pf = runtime::Registry::Get("runtime.DNNLJSONRuntimeCreate");
  CHECK(pf != nullptr) << "Cannot find JSON runtime module to create";

  // Create a DNNL runtime module that can run the serialized function.
  auto mod = (*pf)(func_name, graph_json, params);
  return mod;
}
TVM_REGISTER_GLOBAL("relay.ext.dnnl").set_body_typed(DNNLCompiler);

Note that each runtime module is responsible for only one Relay function, which means a single .so file may contain multiple DNNL runtime modules.
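The dispatch described above, from a function's Compiler attribute to a registered relay.ext.<name> entry point, can be sketched with a plain dictionary. Everything here is a hypothetical model: `GLOBAL_FUNCS`, `register_global`, and `compile_partitioned` are illustrative names, and functions are represented as simple dictionaries rather than Relay objects.

```python
# Hypothetical global function registry, used only for this illustration.
GLOBAL_FUNCS = {}


def register_global(name):
    """Decorator factory: register a function under a global name."""
    def decorator(func):
        GLOBAL_FUNCS[name] = func
        return func
    return decorator


@register_global("relay.ext.dnnl")
def dnnl_compiler(func):
    # A real codegen would serialize or compile `func`; this one only tags it.
    return {"symbol": func["name"], "backend": "dnnl"}


def compile_partitioned(functions):
    """Send each function carrying a Compiler attribute to relay.ext.<Compiler>."""
    modules = []
    for func in functions:
        compiler = func.get("Compiler")
        if compiler is None:
            continue  # not offloaded; stays on the default flow
        codegen = GLOBAL_FUNCS.get("relay.ext." + compiler)
        if codegen is None:
            raise KeyError("no codegen registered for " + compiler)
        modules.append(codegen(func))  # one runtime module per function
    return modules


funcs = [
    {"name": "dnnl_0", "Compiler": "dnnl"},
    {"name": "main"},
    {"name": "dnnl_1", "Compiler": "dnnl"},
]
modules = compile_partitioned(funcs)
print(modules)
```

Two partitioned functions yield two modules here, which mirrors the note that a single exported library may hold several runtime modules.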
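DNNL JSON Serialization
Next, implement the DNNL JSON serializer (L429). It is derived from the BYOC JSON codegen (src/relay/backend/contrib/codegen_json/codegen_json.h). The special handling in the DNNL JSON serializer serializes a composite function call to a JSON node that the DNNL JSON runtime can interpret. Suppose we have a composite function that matches the pattern dnnl.conv2d_relu; the BYOC JSON codegen then generates the following JSON node:

{
  op: "kernel",
  name: "dnnl.conv2d_relu",
  inputs: [[0, 0, 0], [1, 0, 0]],
  attrs: {
    PartitionedFromPattern: ["nn.conv2d_nn.relu_"],
    shape: [1, 32, 14, 14]
  }
}

The problem is that the Conv2D attributes, such as padding and strides, are still needed at runtime, but the BYOC JSON serializer attaches only the attributes of the composite function, not those of the operators in its body. The customized DNNL JSON serializer, on the other hand, attaches the attributes of the first and only Conv2D in the composite function to generate the following JSON node: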

{
  op: "kernel",
  name: "dnnl.conv2d_relu",
  inputs: [[0, 0, 0], [1, 0, 0]],
  attrs: {
    shape: [1, 32, 14, 14],
    data_layout: ["NCHW"],
    kernel_layout: ["OIHW"],
    strides: [1, 1],
    padding: [1, 1, 1, 1]
  }
}

As the DNNL JSON serializer shows, you can customize the serializer to generate JSON in any form, as long as your JSON runtime can interpret it.
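The difference between the two JSON nodes above comes down to copying the attributes of the inner Conv2D onto the composite node. A minimal sketch of that idea follows; it assumes the body of the composite function is available as a list of (operator name, attributes) pairs, and `serialize_composite` is a hypothetical helper rather than the TVM serializer.

```python
import json


def serialize_composite(name, inputs, shape, body_ops):
    """Build a JSON node for a composite function.

    `body_ops` lists the operators inside the composite function as
    (op_name, attrs) pairs. The attributes of the first nn.conv2d are copied
    onto the node so a runtime can read them without walking the body.
    """
    attrs = {"shape": shape}
    for op_name, op_attrs in body_ops:
        if op_name == "nn.conv2d":
            attrs.update(op_attrs)
            break
    return {"op": "kernel", "name": name, "inputs": inputs, "attrs": attrs}


node = serialize_composite(
    "dnnl.conv2d_relu",
    inputs=[[0, 0, 0], [1, 0, 0]],
    shape=[1, 32, 14, 14],
    body_ops=[
        ("nn.conv2d", {
            "data_layout": ["NCHW"],
            "kernel_layout": ["OIHW"],
            "strides": [1, 1],
            "padding": [1, 1, 1, 1],
        }),
        ("nn.relu", {}),
    ],
)
print(json.dumps(node, indent=2))
```

The printed node carries the same fields as the second listing above, with keys quoted because `json.dumps` emits strict JSON.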
DNNL JSON Runtime
Then implement the DNNL JSON runtime to interpret and execute the serialized JSON graph. It is placed under src/runtime/contrib/dnnl/dnnl_json_runtime.cc.
Again, first register two APIs to create the runtime so that they can be used anywhere. runtime.DNNLJSONRuntimeCreate is used in the previous part after serialization, and runtime.module.loadbinary_dnnl_json is used when loading the .so back.

// Create a DNNL JSON runtime to interpret and execute the given JSON graph.
runtime::Module DNNLJSONRuntimeCreate(String symbol_name, String graph_json,
                                      const Array<String>& const_names) {
  auto n = make_object<DNNLJSONRuntime>(symbol_name, graph_json, const_names);
  return runtime::Module(n);
}
TVM_REGISTER_GLOBAL("runtime.DNNLJSONRuntimeCreate")
    .set_body_typed(DNNLJSONRuntimeCreate);

TVM_REGISTER_GLOBAL("runtime.module.loadbinary_dnnl_json")
    .set_body_typed(JSONRuntimeBase::LoadFromBinary<DNNLJSONRuntime>);

Now let's look at the DNNL JSON runtime implementation. The basic class structure is:

class DNNLJSONRuntime : public JSONRuntimeBase {
 public:
  const char* type_key() const { return "dnnl_json"; }

  void Init(const Array<NDArray>& consts) override {
    // Initialize the DNNL graph engine.
    BuildEngine();

    // Setup constants entries for weights.
    CHECK_EQ(consts.size(), const_idx_.size())
        << "The number of input constants must match the number of required.";
    SetupConstants(consts);
  }

  void Run() override {
    // 1. Fill in the input buffers.
    // 2. Invoke the engine through interpreting the stream.
    // 3. Read and fill output buffers.
  }
};

The Init function is responsible for building the DNNL engine by interpreting the JSON graph string (see BuildEngine at L93) and for filling the constant weights into the corresponding data entry buffers (SetupConstants is implemented in the JSON runtime base class, so you only need to invoke it in Init). Note that this function is called only once, even if inference is run multiple times.
Next, the Run function (L64) first writes the input tensors, which may come from user inputs or constant weights, to the corresponding DNNL memory buffers that were initialized when building the DNNL engine. It then launches the DNNL engine to execute the JSON graph. Finally, it writes the DNNL output memory buffers back to the corresponding output tensors.
Since the rest of the DNNL JSON runtime implementation is DNNL specific, this article does not go into further detail. One point worth emphasizing is that, while the DNNL JSON runtime is a good reference to start with, your JSON runtime can be fully customized to fit your requirements.
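The split between a one-time Init and a repeated Run can be illustrated with a toy interpreter. This sketch is not the DNNL runtime: the JSON layout, the two supported operators, and the class `ToyJSONRuntime` are all assumptions chosen to keep the example short, and scalars stand in for tensors.

```python
import json


class ToyJSONRuntime:
    """Toy analogue of a JSON runtime: initialize once, run many times."""

    def __init__(self, graph_json, const_names):
        self.graph = json.loads(graph_json)
        self.const_names = list(const_names)
        self.consts = None
        self.init_calls = 0

    def init(self, consts):
        # Mirrors the size check in Init: one value per required constant.
        if len(consts) != len(self.const_names):
            raise ValueError("number of constants must match the number required")
        self.consts = dict(zip(self.const_names, consts))
        self.init_calls += 1

    def run(self, inputs):
        if self.consts is None:
            raise RuntimeError("init must be called before run")
        # 1. Fill in the input buffers (user inputs plus constant weights).
        values = dict(inputs)
        values.update(self.consts)
        # 2. Interpret the nodes in order.
        out = None
        for node in self.graph["nodes"]:
            a, b = (values[name] for name in node["inputs"])
            if node["op"] == "multiply":
                out = a * b
            elif node["op"] == "add":
                out = a + b
            else:
                raise NotImplementedError(node["op"])
            values[node["name"]] = out
        # 3. Return the value of the last node as the output.
        return out


graph_json = json.dumps({"nodes": [
    {"name": "t0", "op": "multiply", "inputs": ["x", "w"]},
    {"name": "t1", "op": "add", "inputs": ["t0", "b"]},
]})
rt = ToyJSONRuntime(graph_json, const_names=["w", "b"])
rt.init([2.0, 1.0])          # called once
print(rt.run({"x": 3.0}))    # 3.0 * 2.0 + 1.0
print(rt.run({"x": 4.0}))    # 4.0 * 2.0 + 1.0
```

Keeping weight setup in `init` means repeated calls to `run` only pay for moving inputs and executing nodes, which is the motivation for the Init/Run split described above.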
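Bring DNNL to TVM: C Source Codegen
Now let's implement the DNNL codegen that generates C source code which invokes DNNL APIs to execute the Relay graph. Note that if you are trying to implement a codegen that generates another graph representation, such as JSON, you may want to read Bring DNNL to TVM: JSON Codegen/Runtime and skip this section.
To make the DNNL C source codegen work in TVM for this example, make sure DNNL is available on your machine, and build TVM with set(USE_DNNL_CODEGEN C_SRC) in config.cmake.
The DNNL codegen is implemented in src/relay/backend/contrib/dnnl/codegen.cc. Since the DNNL codegen is implemented in two forms in this file for illustration purposes, you can focus on the part not covered by the USE_JSON_RUNTIME macro when tracing the code.
First, register the codegen with the TVM registration API (L510). This registration makes the TVM compile engine dispatch Relay functions with Compiler=<your codegen> to relay.ext.<your codegen>. Then implement the entry function of the DNNL compiler (L490):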

runtime::Module DNNLCompiler(const ObjectRef& ref) {
  DNNLModuleCodegen dnnl;
  return dnnl.CreateCSourceModule(ref);
}
TVM_REGISTER_GLOBAL("relay.ext.dnnl").set_body_typed(DNNLCompiler);

Note that each runtime module is responsible for only one Relay function, which means a single .so file may contain multiple DNNL runtime modules.
Then, derive CSourceModuleCodegenBase to implement DNNLModuleCodegen (L362). While CSourceModuleCodegenBase takes care of other module-level processes such as serialization, you only need to implement the DNNL codegen in the CreateCSourceModule function (L389):

runtime::Module CreateCSourceModule(const ObjectRef& ref) override {
  // Include headers
  // ...skip...
  code_stream_ << "#include <dnnl/dnnl_kernel.h>\n";
  // ...skip...

  // "ref" should be the partitioned Relay function with kCompiler=dnnl.
  CHECK(ref->IsInstance<FunctionNode>());
  auto res = GenDNNLFunc(Downcast<Function>(ref));

  // "code" is the generated C code with DNNL APIs.
  std::string code = code_stream_.str();

  // "res" is a tuple of constant weights (symbols, values).
  // All constant tensors will be serialized along with the generated C code
  // when export_library is invoked.
  String sym = std::get<0>(res);
  Array<String> variables = std::get<1>(res);

  // Create a CSource module with all above artifacts.
  const auto* pf = runtime::Registry::Get("runtime.CSourceModuleCreate");
  CHECK(pf != nullptr) << "Cannot find csource module to create the external runtime module";
  return (*pf)(code, "c", sym, variables);
}

Next, implement GenDNNLFunc (L365) to generate compilable C code with DNNL APIs, as shown below. See the embedded comments for an explanation of the function interfaces that are compatible with the TVM C source runtime module.

// The example Relay graph: conv2d -> add -> relu.
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/container.h>
#include <tvm/runtime/packed_func.h>
#include <dlpack/dlpack.h>
#include <dnnl/dnnl_kernel.h>
using namespace tvm::runtime;
using namespace tvm::runtime::contrib;

// Execute the conv2d->add->relu graph with DNNL.
extern "C" void dnnl_0_(float* dnnl_0_i0, float* dnnl_0_i1,
                        float* dnnl_0_i2, float* out0) {
  // Allocate intermediate buffers.
  float* buf_0 = (float*)std::malloc(4 * 4608);
  float* buf_1 = (float*)std::malloc(4 * 4608);
  float* buf_2 = (float*)std::malloc(4 * 4608);

  // Pre-implemented op-based DNNL functions.
  dnnl_conv2d(dnnl_0_i0, dnnl_0_i1, buf_0, 1, 32, 14, 14, 32, 1, 0, 0, 3, 3, 1, 1);
  dnnl_add(buf_0, dnnl_0_i2, buf_1, 1, 32, 12, 12);
  dnnl_relu(buf_1, buf_2, 1, 32, 12, 12);

  // Copy the final output to the corresponding buffer.
  std::memcpy(out0, buf_2, 4 * 4608);
  std::free(buf_0);
  std::free(buf_1);
  std::free(buf_2);
}

// The wrapper function with all arguments in DLTensor type.
extern "C" int dnnl_0_wrapper_(DLTensor* arg0,
                               DLTensor* arg1,
                               DLTensor* arg2,
                               DLTensor* out0) {
  // Cast all DLTensor to primitive type buffers and invoke the above
  // execution function.
  dnnl_0_(static_cast<float*>(arg0->data),
          static_cast<float*>(arg1->data),
          static_cast<float*>(arg2->data),
          static_cast<float*>(out0->data));
  return 0;
}

// The TVM macro to generate TVM runtime compatible function "dnnl_0"
// from our generated "dnnl_0_wrapper_".
TVM_DLL_EXPORT_TYPED_FUNC(dnnl_0, dnnl_0_wrapper_);

Note that the pre-implemented op-based DNNL functions are in src/runtime/contrib/dnnl/dnnl.cc.
The rest of the implementation in src/relay/backend/contrib/dnnl/codegen.cc is too DNNL specific to cover in detail in this article. The main idea is to implement a Relay graph visitor (L138) that visits the given Relay function and generates the C code above. As long as your codegen is able to generate C code that is compatible with the TVM runtime, you can fully customize the codegen to fit your requirements.
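To show where numbers such as 4608 and the 12x12 output shape in the generated code can come from, here is a small text-emitting sketch. It assumes the trailing arguments of dnnl_conv2d read as N, C, H, W, O, G, Ph, Pw, Kh, Kw, Sh, Sw, and that every intermediate buffer has the same size, which holds for this example only. The helpers `conv2d_output_shape`, `num_elements`, and `emit_function` are hypothetical and far simpler than a real Relay visitor.

```python
def conv2d_output_shape(n, c, h, w, out_channels, pad_h, pad_w, k_h, k_w, s_h, s_w):
    """Standard convolution output size for symmetric padding, no dilation."""
    out_h = (h + 2 * pad_h - k_h) // s_h + 1
    out_w = (w + 2 * pad_w - k_w) // s_w + 1
    return (n, out_channels, out_h, out_w)


def num_elements(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total


def emit_function(symbol, num_inputs, ops, out_elems):
    """Emit C source for a chain of kernel calls.

    ops: list of (kernel_name, input_exprs, trailing_args); the i-th op writes
    to buf_i. All buffers are assumed to hold `out_elems` 4-byte floats.
    """
    params = ", ".join("float* %s_i%d" % (symbol, i) for i in range(num_inputs))
    lines = ['extern "C" void %s_(%s, float* out0) {' % (symbol, params)]
    for i in range(len(ops)):
        lines.append("  float* buf_%d = (float*)std::malloc(4 * %d);" % (i, out_elems))
    for i, (kernel, inputs, trailing) in enumerate(ops):
        args = list(inputs) + ["buf_%d" % i] + [str(a) for a in trailing]
        lines.append("  %s(%s);" % (kernel, ", ".join(args)))
    lines.append("  std::memcpy(out0, buf_%d, 4 * %d);" % (len(ops) - 1, out_elems))
    for i in range(len(ops)):
        lines.append("  std::free(buf_%d);" % i)
    lines.append("}")
    return "\n".join(lines)


# Input 1x32x14x14, 32 output channels, no padding, 3x3 kernel, stride 1.
shape = conv2d_output_shape(1, 32, 14, 14, 32, 0, 0, 3, 3, 1, 1)
elems = num_elements(shape)
source = emit_function("dnnl_0", 3, [
    ("dnnl_conv2d", ["dnnl_0_i0", "dnnl_0_i1"],
     [1, 32, 14, 14, 32, 1, 0, 0, 3, 3, 1, 1]),
    ("dnnl_add", ["buf_0", "dnnl_0_i2"], list(shape)),
    ("dnnl_relu", ["buf_1"], list(shape)),
], elems)
print(shape, elems)
print(source)
```

With a 3x3 kernel, stride 1, and no padding, a 14x14 input shrinks to 12x12, so each buffer holds 1 * 32 * 12 * 12 = 4608 floats, which is consistent with the allocation sizes in the generated code above.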
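C Source Compilation
As you may have noticed, the output of DNNLCompiler is a module holding the generated C code in text format, which has not yet been compiled by gcc into an executable binary. In fact, the generated C code is compiled when the user calls export_library(mod), as in the following snippet: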

import os

import tvm
from tvm import relay, runtime
from tvm.contrib import graph_runtime, util

def update_lib(lib):
    # Include the path of src/runtime/contrib/dnnl/dnnl.cc
    test_dir = os.path.dirname(os.path.realpath(os.path.expanduser(__file__)))
    source_dir = os.path.join(test_dir, "..", "..", "..")
    contrib_path = os.path.join(source_dir, "src", "runtime", "contrib")

    # Setup the gcc flag to compile DNNL code.
    kwargs = {}
    kwargs["options"] = ["-O2", "-std=c++14", "-I" + contrib_path]
    tmp_path = util.tempdir()
    lib_name = 'lib.so'
    lib_path = tmp_path.relpath(lib_name)

    # The generated C code with DNNL APIs is compiled to a binary lib.so.
    lib.export_library(lib_path, fcompile=False, **kwargs)

    # Load the lib.so back to a runtime module.
    lib = runtime.load_module(lib_path)
    return lib

# mod, target, params, and ctx are assumed to be defined by the caller.
with tvm.transform.PassContext(opt_level=3):
    json, lib, param = relay.build(mod, target=target, params=params)
lib = update_lib(lib)
rt_mod = tvm.contrib.graph_runtime.create(json, lib, ctx)

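Bring DNNL to TVM: Build TVM with DNNL Codegen/Runtime
Finally, create cmake/modules/contrib/DNNL.cmake to include the DNNL codegen when building TVM. For demonstration purposes, the DNNL codegen has two implementations in the same cmake file. You can focus on just one of them based on your needs.
With the cmake file ready, users can now specify set(USE_DNNL_CODEGEN ON) in their build/config.cmake to enable the DNNL codegen.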
