Using Runtime for Inference (C++)
Overview
After a model has been converted by the MindSpore Lite model converter, the inference workflow is completed in the Runtime. This tutorial describes how to write inference code using the C++ API.
The overall Runtime usage flow involves the following components and functions:
- Model: the model used by MindSpore Lite, which instantiates a list of operator prototypes through user-constructed graphs or by directly loading a network.
- Lite Session: provides graph compilation and invokes the graph executor for inference.
- Scheduler: the heterogeneous operator scheduler. Based on the heterogeneous scheduling policy, it selects a suitable kernel for each operator, builds the kernel list, and partitions subgraphs.
- Executor: the graph executor, which runs the kernel list and dynamically allocates and releases tensors.
- Operator: the operator prototype, containing operator attributes and the methods for inferring shape, data type, and format.
- Kernel: the concrete operator implementation provided by the operator library, supplying the forward capability of an operator.
- Tensor: the tensor used by MindSpore Lite, providing functions and interfaces for tensor memory operations.
For more details about the C++ API, see the API documentation.
Reading the Model
In MindSpore Lite, the model file is the .ms file produced by the model conversion tool. Before model inference, the model must be loaded from the file system and parsed; these operations are mainly implemented in Model. A Model instance holds model data such as weight data and operator attributes.
A model is created from in-memory data through the static Import method of the Model class. The Model instance returned by the function is a pointer created with new; when it is no longer needed, the user must release it with delete.
If runtime memory is tightly constrained, the Free interface can be called after the Model has been graph-compiled to reduce memory usage. However, once Free has been called on a Model, that Model can no longer be graph-compiled.
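As a minimal sketch (assuming a model file named test.ms and the ReadFile helper from src/common/file_utils.h, which also appears in the multi-session example later in this tutorial), loading and parsing a model looks like this:
#include "include/model.h"
#include "src/common/file_utils.h"
// Read the .ms file produced by the model conversion tool into a memory buffer.
size_t size = 0;
char *model_buf = mindspore::lite::ReadFile("test.ms", &size);
if (model_buf == nullptr) {
  std::cerr << "Read model file failed" << std::endl;
  return -1;
}
// Parse the buffer into a Model instance. The returned pointer is created with new
// and must be released by the user with delete.
auto model = mindspore::lite::Model::Import(model_buf, size);
delete[] model_buf;
if (model == nullptr) {
  std::cerr << "Import model failed" << std::endl;
  return -1;
}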
Creating a Session
When MindSpore Lite performs inference, LiteSession is the main entry point: through LiteSession we can compile and execute the graph.
Creating a Context
The context stores the basic configuration parameters required by the session and guides graph compilation and graph execution. Its key fields are described below.
MindSpore Lite supports heterogeneous inference. The backend configuration for inference is specified by device_list_ in the Context, which holds the CPU DeviceContext by default. During graph compilation, operators are selected and scheduled according to the backend configurations in device_list_. Currently only two heterogeneous combinations are supported: CPU with GPU, or CPU with NPU. When a GPU DeviceContext is configured, GPU inference is preferred; when an NPU DeviceContext is configured, NPU inference is preferred.
device_list_[0] must be the CPU DeviceContext, and device_list_[1] is either the GPU DeviceContext or the NPU DeviceContext. Configuring all three DeviceContexts (CPU, GPU, and NPU) at the same time is not yet supported.
MindSpore Lite has a built-in, process-shared thread pool. The maximum number of threads in the pool is specified by thread_num_ at inference time; the default is 2, and at most 4 threads are recommended, since more may hurt performance.
MindSpore Lite supports dynamic memory allocation and release. If no allocator is specified, a default allocator is created at inference time; the memory allocator can also be shared across multiple Contexts through the Context's allocator member.
If the user creates the Context with new, it must be released with delete when no longer needed. In general, the Context can be released once the LiteSession has been created.
Creating a Session
There are two ways to create a session:
- The first is to use the Context created in the previous step and call LiteSession's static method static LiteSession *CreateSession(const lite::Context *context). The LiteSession instance returned is a pointer created with new; when it is no longer needed, the user must release it with delete.
- The second is to use the Context created in the previous step, together with a model buffer already read from file and the buffer's size, and call LiteSession's static method static LiteSession *CreateSession(const char *model_buf, size_t size, const lite::Context *context). The LiteSession instance returned is a pointer created with new; when it is no longer needed, the user must release it with delete.
The CreateSession overload used in the second approach is a convenience interface that simplifies the calling flow: it combines the functionality of three interfaces, namely the single-argument CreateSession, Import, and CompileGraph. A sketch of the equivalence follows.
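As a rough sketch (assuming model_buf and model_buf_size have already been read from a .ms file, with error checks omitted for brevity), the convenience overload is equivalent to the following three calls:
// Long form: import the model, create the session, then compile the graph.
auto model = lite::Model::Import(model_buf, model_buf_size);
auto session = session::LiteSession::CreateSession(context);
auto ret = session->CompileGraph(model);
// Convenience form: a single call performs all three steps.
auto session2 = session::LiteSession::CreateSession(model_buf, model_buf_size, context);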
Example
The following sample code demonstrates creating a Context and sharing a memory pool between two LiteSessions:
auto context = new (std::nothrow) lite::Context;
if (context == nullptr) {
  MS_LOG(ERROR) << "New context failed while running " << modelName;
  return RET_ERROR;
}
// CPU device context has default values.
auto &cpu_device_info = context->device_list_[0].device_info_.cpu_device_info_;
// Large cores take priority in the thread/core binding policy. This parameter takes effect in the BindThread interface; for the concrete binding behavior, see the "Core Binding" section.
cpu_device_info.cpu_bind_mode_ = HIGHER_CPU;
// If a GPU device context is set, the preferred backend is GPU: if an operator has a GPU implementation it runs on the GPU first, otherwise it runs on the CPU.
DeviceContext gpu_device_ctx{DT_GPU, {false}};
// The GPU device context must be pushed into device_list_ to take effect.
context->device_list_.push_back(gpu_device_ctx);
// Set the number of worker threads in the thread pool to 2, including the main thread.
context->thread_num_ = 2;
// Allocators can be shared across multiple Contexts.
auto *context2 = new (std::nothrow) lite::Context;
if (context2 == nullptr) {
  MS_LOG(ERROR) << "New context failed while running " << modelName;
  return RET_ERROR;
}
context2->thread_num_ = context->thread_num_;
context2->allocator = context->allocator;
auto &cpu_device_info2 = context2->device_list_[0].device_info_.cpu_device_info_;
cpu_device_info2.cpu_bind_mode_ = cpu_device_info.cpu_bind_mode_;
// Use Context to create a Session.
auto session1 = session::LiteSession::CreateSession(context);
// After the LiteSession is created, the Context can be released.
delete (context);
if (session1 == nullptr) {
  MS_LOG(ERROR) << "CreateSession failed while running " << modelName;
  return RET_ERROR;
}
// session1 and session2 can share one memory pool.
// Assume we have read a buffer named model_buf from a model file, with byte size model_buf_size.
// Use Context, model_buf, and model_buf_size to create a Session.
auto session2 = session::LiteSession::CreateSession(model_buf, model_buf_size, context2);
// After the LiteSession is created, the Context can be released.
delete (context2);
if (session2 == nullptr) {
  MS_LOG(ERROR) << "CreateSession failed while running " << modelName;
  return RET_ERROR;
}
Graph Compilation
Variable Dimensions
When using MindSpore Lite for inference, if the input shape needs to be resized after the session has been created and the graph compiled, you can reset the shape of the input tensors and then call LiteSession's Resize interface.
Some networks do not support variable dimensions and will print an error message and exit abnormally. For example, if the model contains a MatMul operator with one input tensor being a weight and the other being the model input, calling the resize interface makes the shapes of the input tensor and the weight tensor mismatch, and inference ultimately fails.
Example
The following code demonstrates how to resize a MindSpore Lite input:
// Assume we have created a LiteSession instance named session.
auto inputs = session->GetInputs();
std::vector<int> resize_shape = {1, 128, 128, 3};
// Assume the model has only one input; resize the input shape to [1, 128, 128, 3].
std::vector<std::vector<int>> new_shapes;
new_shapes.push_back(resize_shape);
// Resize returns a status code; check it, since some networks do not support resizing.
auto ret = session->Resize(inputs, new_shapes);
if (ret != RET_OK) {
  std::cerr << "Resize failed" << std::endl;
  return -1;
}
Graph Compilation
Before graph execution, LiteSession's CompileGraph interface must be called to compile the graph and further parse the Model instance loaded from the file, mainly performing subgraph partitioning and operator selection and scheduling. This step takes considerable time, so it is recommended to create the LiteSession once, compile once, and run many times.
/// \brief Compile MindSpore Lite model.
///
/// \note CompileGraph should be called before RunGraph.
///
/// \param[in] model Define the model to be compiled.
///
/// \return STATUS as an error code of compiling graph, STATUS is defined in errorcode.h.
virtual int CompileGraph(lite::Model *model) = 0;
Example
The following code demonstrates graph compilation:
// Assume we have created a LiteSession instance named session and a Model instance named model before.
// For how to create the model and the session, refer to the "Reading the Model" and "Creating a Session" sections.
auto ret = session->CompileGraph(model);
if (ret != RET_OK) {
std::cerr << "CompileGraph failed" << std::endl;
// session and model need to be released by users manually.
delete (session);
delete (model);
return ret;
}
model->Free();
Input Data
Obtaining Input Tensors
Before graph execution, the input data must be copied into the model's input tensors.
MindSpore Lite provides two methods to obtain the model's input tensors.
- Use the GetInputsByTensorName method to obtain, by the model input tensor's name, the tensor among the model inputs that is connected to the input node (a sketch follows this list).
  /// \brief Get input MindSpore Lite MSTensors of model by tensor name.
  ///
  /// \param[in] tensor_name Define tensor name.
  ///
  /// \return MindSpore Lite MSTensor.
  virtual mindspore::tensor::MSTensor *GetInputsByTensorName(const std::string &tensor_name) const = 0;
- Use the GetInputs method to obtain a vector of all the model's input tensors directly.
  /// \brief Get input MindSpore Lite MSTensors of model.
  ///
  /// \return The vector of MindSpore Lite MSTensor.
  virtual std::vector<tensor::MSTensor *> GetInputs() const = 0;
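As a brief sketch of the first method (the tensor name "data" is a hypothetical placeholder; the real name depends on your model), obtaining a single input by name looks like this:
// Assume we have created a LiteSession instance named session.
// "data" is a hypothetical input tensor name; query your model for the real one.
auto in_tensor = session->GetInputsByTensorName("data");
if (in_tensor == nullptr) {
  std::cerr << "No input tensor named data" << std::endl;
  return -1;
}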
Copying Data
Once the model inputs have been obtained, the data must be filled into the tensors. Use MSTensor's Size method to get the byte size of the data the tensor expects, the data_type method to get the tensor's data type, and MSTensor's MutableData method to get a writable pointer.
/// \brief Get byte size of data in MSTensor.
///
/// \return Byte size of data in MSTensor.
virtual size_t Size() const = 0;
/// \brief Get the pointer of data in MSTensor.
///
/// \note The data pointer can be used to both write and read data in MSTensor.
///
/// \return The pointer points to data in MSTensor.
virtual void *MutableData() const = 0;
Example
The following sample code demonstrates obtaining the whole-graph input MSTensor from the LiteSession and filling it with model input data:
// Assume we have created a LiteSession instance named session.
auto inputs = session->GetInputs();
// Assume that the model has only one input tensor.
auto in_tensor = inputs.front();
if (in_tensor == nullptr) {
  std::cerr << "Input tensor is nullptr" << std::endl;
  return -1;
}
// It is omitted that users have read the model input file into a memory buffer named input_buf, with byte size data_size.
if (in_tensor->Size() != data_size) {
  std::cerr << "Input data size does not match model input" << std::endl;
  return -1;
}
auto *in_data = in_tensor->MutableData();
if (in_data == nullptr) {
  std::cerr << "Data of in_tensor is nullptr" << std::endl;
  return -1;
}
memcpy(in_data, input_buf, data_size);
// Users need to free input_buf.
// The elements in inputs are managed by MindSpore Lite, so users do not need to free them.
Note the following:
- The data layout in MindSpore Lite model input tensors must be NHWC.
- The model input input_buf is read from disk by the user; after it has been copied to the model input tensor, the user must release input_buf.
- The vectors returned by GetInputs and GetInputsByTensorName do not need to be released by the user.
Graph Execution
Running the Session
After a MindSpore Lite session has compiled the graph, model inference can be run with LiteSession's RunGraph.
virtual int RunGraph(const KernelCallBack &before = nullptr, const KernelCallBack &after = nullptr) = 0;
Core Binding
The built-in thread pool of MindSpore Lite supports binding and unbinding cores: by calling the BindThread interface, the worker threads in the pool can be bound to specific CPU cores, for example for performance analysis. Core binding is tied to the context specified when the LiteSession was created: thread-CPU affinity is set according to the binding policy in that context.
/// \brief Attempt to bind or unbind threads in the thread pool to or from the specified cpu core.
///
/// \param[in] if_bind Define whether to bind or unbind threads.
virtual void BindThread(bool if_bind) = 0;
Note that core binding is an affinity operation, which is affected by system scheduling and does not guarantee that threads end up on the specified CPU cores. Also, the threads should be unbound after the bound code has finished running. Example:
// Assume we have created a LiteSession instance named session.
session->BindThread(true);
auto ret = session->RunGraph();
if (ret != mindspore::lite::RET_OK) {
std::cerr << "RunGraph failed" << std::endl;
delete session;
return -1;
}
session->BindThread(false);
There are two binding policies: large-core-first and mid-core-first.
Large and mid cores are determined by CPU core frequency rather than by CPU architecture, so even CPU architectures without an explicit big/medium/little split can distinguish large and mid cores under this rule.
Large-core-first binds the threads in the thread pool starting from the highest-frequency core: the first thread is bound to the highest-frequency core, the second to the second-highest, and so on.
For mid-core-first, mid cores are defined empirically: by default they are the cores with the third- and fourth-highest frequencies. Under this policy, threads are bound to mid cores first and spill over to small cores when the mid cores are used up. A small configuration sketch follows.
Running with Callbacks
When calling RunGraph, MindSpore Lite can take two KernelCallBack function pointers to run inference with callbacks. Compared with ordinary graph execution, running with callbacks provides extra information during the run that helps developers with performance profiling, debugging, and so on. The extra information includes:
- the name of the node currently being run
- the input and output tensors before the current node is inferred
- the input and output tensors after the current node is inferred
/// \brief CallBackParam defines input arguments for the callback function.
struct CallBackParam {
  std::string name_callback_param; /**< node name argument */
  std::string type_callback_param; /**< node type argument */
};
/// \brief KernelCallBack defines the function pointer for the callback.
using KernelCallBack = std::function<bool(std::vector<tensor::MSTensor *> inputs, std::vector<tensor::MSTensor *> outputs, const CallBackParam &opInfo)>;
Example
The following sample code demonstrates compiling a graph with a LiteSession, defining two callback functions as the before- and after-callback pointers passed to RunGraph, and the compile-once, run-many-times usage pattern:
// Assume we have created a LiteSession instance named session and a Model instance named model before.
// For how to create the model and the session, refer to the "Reading the Model" and "Creating a Session" sections.
auto ret = session->CompileGraph(model);
if (ret != RET_OK) {
std::cerr << "CompileGraph failed" << std::endl;
// session and model need to be released by users manually.
delete (session);
delete (model);
return ret;
}
// Copy input data into the input tensor. Users can refer to the "Input Data" section. We use random data here.
auto inputs = session->GetInputs();
for (auto in_tensor : inputs) {
  if (in_tensor == nullptr) {
    std::cerr << "Input tensor is nullptr" << std::endl;
    return -1;
  }
  // When MutableData is called, the data in the MSTensor is malloced if it has not been allocated yet. After allocation, the data in the MSTensor can be considered random.
  (void) in_tensor->MutableData();
}
// Definition of callback function before forwarding operator.
auto before_call_back_ = [&](const std::vector<mindspore::tensor::MSTensor *> &before_inputs,
const std::vector<mindspore::tensor::MSTensor *> &before_outputs,
const session::CallBackParam &call_param) {
std::cout << "Before forwarding " << call_param.name_callback_param << std::endl;
return true;
};
// Definition of callback function after forwarding operator.
auto after_call_back_ = [&](const std::vector<mindspore::tensor::MSTensor *> &after_inputs,
const std::vector<mindspore::tensor::MSTensor *> &after_outputs,
const session::CallBackParam &call_param) {
std::cout << "After forwarding " << call_param.name_callback_param << std::endl;
return true;
};
// Pass the callback functions when performing model inference.
ret = session->RunGraph(before_call_back_, after_call_back_);
if (ret != RET_OK) {
  MS_LOG(ERROR) << "Run graph failed.";
  return RET_ERROR;
}
// CompileGraph costs much time; a better solution is to call CompileGraph only once and RunGraph many times.
for (size_t i = 0; i < 10; i++) {
  auto ret = session->RunGraph();
  if (ret != RET_OK) {
    MS_LOG(ERROR) << "Run graph failed.";
    return RET_ERROR;
  }
}
// session and model need to be released by users manually.
delete (session);
delete (model);
Obtaining Output
Obtaining Output Tensors
After inference has finished, MindSpore Lite can obtain the model's inference results.
MindSpore Lite provides three methods to obtain the model's output MSTensors.
- Use the GetOutputsByNodeName method to obtain, by output node name, the vector of tensors among the model outputs that are connected to that node.
  /// \brief Get output MindSpore Lite MSTensors of model by node name.
  ///
  /// \param[in] node_name Define node name.
  ///
  /// \return The vector of MindSpore Lite MSTensor.
  virtual std::vector<tensor::MSTensor *> GetOutputsByNodeName(const std::string &node_name) const = 0;
- Use the GetOutputByTensorName method to obtain the corresponding model output MSTensor by tensor name.
  /// \brief Get output MindSpore Lite MSTensors of model by tensor name.
  ///
  /// \param[in] tensor_name Define tensor name.
  ///
  /// \return Pointer of MindSpore Lite MSTensor.
  virtual mindspore::tensor::MSTensor *GetOutputByTensorName(const std::string &tensor_name) const = 0;
- Use the GetOutputs method to obtain a map from the names of all model output MSTensors to the MSTensor pointers directly.
  /// \brief Get output MindSpore Lite MSTensors of model mapped by tensor name.
  ///
  /// \return The map of output tensor name and MindSpore Lite MSTensor.
  virtual std::unordered_map<std::string, mindspore::tensor::MSTensor *> GetOutputs() const = 0;
After obtaining the model's output tensors, you can read the data out of them. Use MSTensor's Size method to get the byte size of the data in the tensor, the data_type method to get the tensor's data type, and MSTensor's MutableData method to get a readable and writable memory pointer.
/// \brief Get byte size of data in MSTensor.
///
/// \return Byte size of data in MSTensor.
virtual size_t Size() const = 0;
/// \brief Get data type of the MindSpore Lite MSTensor.
///
/// \note TypeId is defined in mindspore/mindspore/core/ir/dtype/type_id.h. Only number types in TypeId enum are
/// suitable for MSTensor.
///
/// \return MindSpore Lite TypeId of the MindSpore Lite MSTensor.
virtual TypeId data_type() const = 0;
/// \brief Get the pointer of data in MSTensor.
///
/// \note The data pointer can be used to both write and read data in MSTensor.
///
/// \return The pointer points to data in MSTensor.
virtual void *MutableData() const = 0;
Example
The following sample code demonstrates obtaining output MSTensors with the GetOutputs interface and printing the first ten values, or all values, of each output MSTensor:
// Assume we have created a LiteSession instance named session before.
auto output_map = session->GetOutputs();
// Assume that the model has only one output node.
auto out_node_iter = output_map.begin();
std::string name = out_node_iter->first;
// Assume that the unique output node has only one output tensor.
auto out_tensor = out_node_iter->second;
if (out_tensor == nullptr) {
  std::cerr << "Output tensor is nullptr" << std::endl;
  return -1;
}
// Assume that the data format of the output data is float32.
if (out_tensor->data_type() != mindspore::TypeId::kNumberTypeFloat32) {
  std::cerr << "Output of lenet should be in float32" << std::endl;
  return -1;
}
auto *out_data = reinterpret_cast<float *>(out_tensor->MutableData());
if (out_data == nullptr) {
  std::cerr << "Data of out_tensor is nullptr" << std::endl;
  return -1;
}
// Print the first 10 values, or all values if there are fewer, of the output tensor.
std::cout << "Output data: ";
for (size_t i = 0; i < 10 && i < static_cast<size_t>(out_tensor->ElementsNum()); i++) {
  std::cout << " " << out_data[i];
}
std::cout << std::endl;
// The elements in outputs do not need to be freed by users, because outputs are managed by MindSpore Lite.
Note that the vectors or maps returned by the GetOutputsByNodeName, GetOutputByTensorName, and GetOutputs methods do not need to be released by the user.
The following sample code demonstrates obtaining output MSTensors with the GetOutputsByNodeName interface:
// Assume we have created a LiteSession instance named session before.
// Assume that the model has an output node named output_node_name_0.
auto output_vec = session->GetOutputsByNodeName("output_node_name_0");
// Assume that the output node named output_node_name_0 has only one output tensor.
auto out_tensor = output_vec.front();
if (out_tensor == nullptr) {
  std::cerr << "Output tensor is nullptr" << std::endl;
  return -1;
}
The following sample code demonstrates obtaining output MSTensors with the GetOutputByTensorName interface:
// Assume we have created a LiteSession instance named session before.
// We can use the GetOutputTensorNames method to get the names of all the model's output tensors, in order.
auto tensor_names = session->GetOutputTensorNames();
// Use the output tensor names returned by GetOutputTensorNames as keys.
for (auto tensor_name : tensor_names) {
  auto out_tensor = session->GetOutputByTensorName(tensor_name);
  if (out_tensor == nullptr) {
    std::cerr << "Output tensor is nullptr" << std::endl;
    return -1;
  }
}
Obtaining the Version Number
MindSpore Lite provides a Version method, declared in the include/version.h header, to obtain the version number; calling it returns the version string.
Example
The following code demonstrates how to obtain the MindSpore Lite version number:
#include "include/version.h"
std::string version = mindspore::lite::Version();
Session Parallelism
MindSpore Lite supports parallel inference with multiple LiteSessions, but does not support multiple threads calling the RunGraph interface of a single LiteSession at the same time.
Single-Session Parallelism
MindSpore Lite does not support running inference on a single LiteSession from multiple threads in parallel; otherwise the following error is reported:
ERROR [mindspore/lite/src/lite_session.cc:297] RunGraph] 10 Not support multi-threading
Multi-Session Parallelism
MindSpore Lite supports multiple LiteSessions running inference at the same time; each LiteSession's thread pool and memory pool are independent.
Example
The following code demonstrates creating multiple LiteSessions and running inference with them in parallel:
#include <iostream>
#include <thread>
#include "src/common/file_utils.h"
#include "include/model.h"
#include "include/version.h"
#include "include/context.h"
#include "include/lite_session.h"
mindspore::session::LiteSession *GenerateSession(mindspore::lite::Model *model) {
  if (model == nullptr) {
    std::cerr << "Read model file failed while running" << std::endl;
    return nullptr;
  }
  auto context = new (std::nothrow) mindspore::lite::Context;
  if (context == nullptr) {
    std::cerr << "New context failed while running" << std::endl;
    return nullptr;
  }
  auto session = mindspore::session::LiteSession::CreateSession(context);
  delete (context);
  if (session == nullptr) {
    std::cerr << "CreateSession failed while running" << std::endl;
    return nullptr;
  }
  auto ret = session->CompileGraph(model);
  if (ret != mindspore::lite::RET_OK) {
    std::cerr << "CompileGraph failed while running" << std::endl;
    delete (session);
    return nullptr;
  }
  auto msInputs = session->GetInputs();
  for (auto msInput : msInputs) {
    (void)msInput->MutableData();
  }
  return session;
}
int main(int argc, const char **argv) {
  size_t size = 0;
  char *graphBuf = mindspore::lite::ReadFile("test.ms", &size);
  if (graphBuf == nullptr) {
    std::cerr << "Read model file failed while running" << std::endl;
    return -1;
  }
  auto model = mindspore::lite::Model::Import(graphBuf, size);
  if (model == nullptr) {
    std::cerr << "Import model file failed while running" << std::endl;
    delete[] graphBuf;
    return -1;
  }
  delete[] graphBuf;
  auto session1 = GenerateSession(model);
  if (session1 == nullptr) {
    std::cerr << "Generate session 1 failed" << std::endl;
    delete (model);
    return -1;
  }
  auto session2 = GenerateSession(model);
  if (session2 == nullptr) {
    std::cerr << "Generate session 2 failed" << std::endl;
    delete (session1);
    delete (model);
    return -1;
  }
  model->Free();
  std::thread thread1([&]() {
    auto status = session1->RunGraph();
    if (status != 0) {
      std::cerr << "Inference error " << status << std::endl;
      return;
    }
    std::cout << "Session1 inference success" << std::endl;
  });
  std::thread thread2([&]() {
    auto status = session2->RunGraph();
    if (status != 0) {
      std::cerr << "Inference error " << status << std::endl;
      return;
    }
    std::cout << "Session2 inference success" << std::endl;
  });
  thread1.join();
  thread2.join();
  delete (session1);
  delete (session2);
  delete (model);
  return 0;
}