Ascend PyTorch Operator Adaptation Layer Development

Published: 2023/11/28

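Adaptation method
Find the NPU TBE operator whose function matches the PyTorch operator, compute the size of the output Tensor from the operator's semantics, then construct the input/output/attr that the TBE operator prototype requires and pass them to ACL, which executes the TBE operator.
Note:
The location of the TBE operator implementation sources depends on how the Toolkit development package was installed:
- Installed as root: /usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe/impl/
- Installed as a non-root user: ~/.local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe/impl/
Developers can read the operator implementation sources to determine what an operator does.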
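Storage path and naming format
Adaptation files for NPU TBE operators are stored in the pytorch/aten/src/ATen/native/npu directory. File names use upper camel case in the form <operator name> + KernelNpu.cpp, for example AddKernelNpu.cpp.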
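The naming rule can be illustrated with a small helper. This is only a sketch: adapter_file_name is a hypothetical function, not part of the Ascend code base, and the handling of underscores in operator names is an assumption extrapolated from the AddKernelNpu.cpp example.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: builds the adapter file name for an operator,
// following the AddKernelNpu.cpp example ("add" -> "AddKernelNpu.cpp").
// Assumption: '_' separates words, and each word starts with a capital.
std::string adapter_file_name(const std::string& op_name) {
  assert(!op_name.empty());
  std::string camel;
  bool upper_next = true;  // capitalize the first letter and letters after '_'
  for (char c : op_name) {
    if (c == '_') {
      upper_next = true;
      continue;
    }
    const unsigned char uc = static_cast<unsigned char>(c);
    camel += static_cast<char>(upper_next ? std::toupper(uc) : uc);
    upper_next = false;
  }
  return camel + "KernelNpu.cpp";
}
```

For example, adapter_file_name("add") yields "AddKernelNpu.cpp".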
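Adaptation steps
Notice:
The adaptation code is written in C++.

Include the required header files.
    #include "ATen/native/npu/utils/CalcuOpUtil.h"
    #include "ATen/native/npu/utils/KernelNpuOutputSize.h"
    #include "ATen/native/npu/utils/NpuUtils.h"
Note:
- CalcuOpUtil.h mainly contains functions related to the ACL interfaces.
- KernelNpuOutputSize.h mainly contains the functions that infer operator output shapes.
- NpuUtils.h mainly contains common utility functions.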
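Define the main adaptation functions for the Add operator.
Based on the dispatch definition of the add operator in native_functions.yaml, the adaptation should contain the following functions:
- add_npu_input: constructs the NPUTensorDesc objects for the inputs
- add_npu_output: constructs the NPUTensorDesc objects for the outputs
- add_npu_attr: constructs the attr properties of the NPU TBE Add operator
- add_out_npu: operator adaptation function (the npu dispatch function in the yaml; accepts a caller-supplied output tensor); the other parameter may be a Tensor or a Scalar
- add_npu: operator adaptation function (the npu dispatch function in the yaml); the other parameter may be a Tensor or a Scalar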
Implement the function add_npu_input.
Wrap the inputs of the NPU adaptation function (add_npu_input) in NPUTensorDesc objects.
    // add_npu_input when the inputs are "self": "Tensor" and "other": "Tensor"
    SmallVector<NPUTensorDesc, N> add_npu_input(const Tensor& self, const Tensor& other) {
      bool isSelfWrapped = CalcuOpUtil::is_scalar_wrapped_to_tensor(self);
      bool isOtherWrapped = CalcuOpUtil::is_scalar_wrapped_to_tensor(other);
      auto inputs = CalcuOpUtil::create_npu_input_tensor_desc({self, other});

      // Lets 't + 2' work with any type of tensor, not just LongTensor (which is
      // what integers in Python represent): a wrapped scalar takes the dtype of
      // the other operand.
      if (isSelfWrapped && (!isOtherWrapped)) {
        inputs[0].scalarType = other.scalar_type();
      } else if (isOtherWrapped && (!isSelfWrapped)) {
        inputs[1].scalarType = self.scalar_type();
      }

      return inputs;
    }

    // add_npu_input when the inputs are "self": "Tensor" and "other": "Scalar"
    SmallVector<NPUTensorDesc, N> add_npu_input(const Tensor& self, const Scalar& other) {
      return CalcuOpUtil::create_npu_input_tensor_desc({self});
    }
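The dtype rule for wrapped scalars can be tried out without an NPU. The sketch below mirrors the branch logic of add_npu_input using simplified stand-in types; MockTensor, MockDesc and make_input_descs are hypothetical names introduced for illustration, not Ascend or ATen types.

```cpp
#include <cassert>
#include <vector>

enum class ScalarType { Long, Float, Half };

// Simplified stand-in for a tensor: only the fields the rule needs.
struct MockTensor {
  ScalarType dtype;
  bool wrapped_scalar;  // true when a plain number was wrapped into a tensor
};

// Stand-in for NPUTensorDesc: records the dtype the operator will see.
struct MockDesc {
  ScalarType scalarType;
};

// Mirrors the branch logic of add_npu_input: when exactly one operand is a
// wrapped scalar, its descriptor adopts the dtype of the real tensor.
std::vector<MockDesc> make_input_descs(const MockTensor& self, const MockTensor& other) {
  std::vector<MockDesc> inputs{MockDesc{self.dtype}, MockDesc{other.dtype}};
  if (self.wrapped_scalar && !other.wrapped_scalar) {
    inputs[0].scalarType = other.dtype;
  } else if (other.wrapped_scalar && !self.wrapped_scalar) {
    inputs[1].scalarType = self.dtype;
  }
  return inputs;
}
```

With a Half tensor t and the integer 2 wrapped as a Long tensor, both descriptors end up as Half, so 't + 2' keeps the dtype of t.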
Implement the function add_npu_output.
Wrap the output tensor passed to add_npu_output in an NPUTensorDesc object.
    // add_npu_output when the output is a "Tensor"
    SmallVector<NPUTensorDesc, N> add_npu_output(const Tensor& result) {
      return CalcuOpUtil::create_npu_output_tensor_desc({result});
    }
Note:
In general, an operator's output needs no special handling; calling CalcuOpUtil::create_npu_output_tensor_desc directly is enough.
Implement the function add_npu_attr.
Following the attr specification in the NPU TBE operator prototype, convert the parameters into the attr properties that the prototype requires.
    // add_npu_attr when the inputs are "other": "Tensor" and "alpha": "Scalar"
    SmallVector<NPUAttrDesc, N> add_npu_attr(const Tensor& self, const Tensor& other, Scalar alpha) {
      float value = CalcuOpUtil::get_scalar_float_value(alpha);
      NPUAttrDesc npuAttrScalar = NPUAttrDesc("alpha", value);
      SmallVector<NPUAttrDesc, N> attrs = {npuAttrScalar};
      return attrs;
    }

    // adds_npu_attr when the inputs are "other": "Scalar" and "alpha": "Scalar"
    SmallVector<NPUAttrDesc, N> adds_npu_attr(const Tensor& self, const Scalar& other, const Scalar& alpha) {
      float otherValue = CalcuOpUtil::get_scalar_float_value(other);
      float alphaValue = CalcuOpUtil::get_scalar_float_value(alpha);
      // Fold other and alpha into a single constant
      float value = otherValue * alphaValue;
      NPUAttrDesc npuAttrValue = NPUAttrDesc("value", value);
      SmallVector<NPUAttrDesc, N> attrs = {npuAttrValue};
      return attrs;
    }
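The reason adds_npu_attr multiplies other by alpha is that self + alpha * other, with a scalar other, reduces to adding one constant to every element. The following sketch checks that equivalence on plain vectors. The function names are hypothetical, and add_constant only assumes that the scalar kernel adds the "value" attr to each element; the article does not show that kernel.

```cpp
#include <cassert>
#include <vector>

// Folds the scalar operand and alpha into one constant, as adds_npu_attr does.
float fold_scalar_attr(float other, float alpha) {
  return other * alpha;
}

// Assumed kernel behaviour (hypothetical stand-in for the NPU operator):
// adds one constant to every element.
std::vector<float> add_constant(const std::vector<float>& self, float value) {
  std::vector<float> out;
  out.reserve(self.size());
  for (float x : self) out.push_back(x + value);
  return out;
}

// Reference definition of add with a scalar operand:
// out[i] = self[i] + alpha * other.
std::vector<float> add_scalar_reference(const std::vector<float>& self, float other, float alpha) {
  std::vector<float> out;
  out.reserve(self.size());
  for (float x : self) out.push_back(x + alpha * other);
  return out;
}
```

Folding the two scalars on the host means the kernel receives a single attr instead of two.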
Implement the function add_out_npu.
    Tensor& add_out_npu(Tensor& result, const Tensor& self, const Tensor& other, Scalar alpha) {
      if (other.dim() == 0 && !other.is_npu()) {
        // other is a 0-dim tensor that is not on the NPU: treat it as a scalar
        adds_out_npu(result, self, other.item(), alpha);
      } else if (self.dim() == 0 && !self.is_npu()) {
        // self is a 0-dim tensor that is not on the NPU: swap the operands.
        // If adds_out_npu applies alpha to its scalar argument (as adds_npu_attr
        // does), this matches self + alpha * other only when alpha == 1.
        adds_out_npu(result, other, self.item(), alpha);
      } else {
        // constructs the input and output NPUTensorDesc
        auto inputs = add_npu_input(self, other);
        auto outputs = add_npu_output({result});

        // constructs the attr of the NPUAttrDesc
        auto attrs = add_npu_attr(self, other, alpha);

        // executes the NPU operator
        CalcuOpUtil::execute_npu_operate("Axpy", inputs, outputs, attrs);
      }

      return result;
    }
Note:
The difference between add_out_npu and add_npu is that add_out_npu lets the caller explicitly specify the output tensor and writes the result into it.
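The branch structure of add_out_npu is a three-way dispatch on the operands. A minimal sketch of just that decision, using a hypothetical MockTensor with only the two properties the checks read (number of dimensions and device):

```cpp
#include <cassert>
#include <cstddef>

enum class AddPath { ScalarOther, ScalarSelf, TensorTensor };

// Hypothetical stand-in exposing only what the dispatch inspects.
struct MockTensor {
  std::size_t dim;  // number of dimensions; 0 means a scalar-like tensor
  bool is_npu;      // whether the data lives on the NPU
};

// Mirrors the if / else-if / else chain of add_out_npu.
AddPath select_add_path(const MockTensor& self, const MockTensor& other) {
  if (other.dim == 0 && !other.is_npu) return AddPath::ScalarOther;
  if (self.dim == 0 && !self.is_npu) return AddPath::ScalarSelf;
  return AddPath::TensorTensor;
}
```

Note that a 0-dim tensor that already lives on the NPU takes the tensor-tensor path, since both conditions require the operand to be off the device.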
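Implement the function add_npu.
a. Define and implement the operator's shape inference function, which computes the output size from the input parameters.
Naming convention for shape inference functions:
"NPU adaptation function name" + "_" + "output" + "_" + "size", for example add_npu_output_size().
Note:
- Shape inference functions are declared and implemented under pytorch/aten/src/ATen/native/npu/utils, in KernelNpuOutputSize.h and KernelNpuOutputSize.cpp.
- In KernelNpuOutputSize.h, functions are ordered by name.
    // Shape inference when the inputs are "self": "Tensor" and "other": "Tensor"
    SmallVector<int64_t, SIZE> add_npu_output_size(const Tensor& self, const Tensor& other) {
      return broadcast_ops_npu_output_size(self, other);  // infer the broadcast output shape
    }

    // Shape inference when the inputs are "self": "Tensor" and "other": "Scalar"
    IntArrayRef add_npu_output_size(const Tensor& self, const Scalar& other) {
      return input_same_output_size(self);
    }
Note:
broadcast_ops_npu_output_size works as follows: when the two arguments satisfy PyTorch's broadcasting rules, they are automatically expanded to a common size, which becomes the output size.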
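For readers unfamiliar with the broadcasting rule, here is a standalone sketch of the shape computation. It follows PyTorch's documented broadcasting semantics (align shapes from the trailing dimension; two sizes are compatible when they are equal or one of them is 1). broadcast_shape is a hypothetical name, and this is not the implementation of broadcast_ops_npu_output_size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Computes the broadcast result shape of two shapes, or throws if the
// shapes cannot be broadcast together.
std::vector<std::int64_t> broadcast_shape(const std::vector<std::int64_t>& a,
                                          const std::vector<std::int64_t>& b) {
  const std::size_t ndim = std::max(a.size(), b.size());
  std::vector<std::int64_t> out(ndim, 1);
  for (std::size_t i = 0; i < ndim; ++i) {
    // Walk from the trailing dimension; missing leading dims count as 1.
    const std::int64_t da = i < a.size() ? a[a.size() - 1 - i] : 1;
    const std::int64_t db = i < b.size() ? b[b.size() - 1 - i] : 1;
    if (da != db && da != 1 && db != 1) {
      throw std::invalid_argument("shapes are not broadcastable");
    }
    out[ndim - 1 - i] = (da == 1) ? db : da;
  }
  return out;
}
```

For instance, shapes {4, 1, 3} and {5, 3} broadcast to {4, 5, 3}, while {2, 3} and {4, 3} are rejected.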
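b. Call the corresponding shape inference function to compute the output size.
c. Call at::empty_with_format with the output size to create the output Tensor. The function lets you specify the format of the output Tensor; the default is NCHW.
Note:
The current format-setting rule combines two principles: anchor diffusion from heavy operators, and the continuity rule.
- Heavy operators such as convolution and Matmul support only one specific format. The adaptation explicitly sets the format they require, and that format spreads to the surrounding operators.
- The continuity rule applies to operators that are insensitive to format: the operator's format is simply set to that of its first input tensor.
- Convolution on the NPU supports only the NC1HWC0 format, so NC1HWC0 must be specified explicitly.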
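The two format principles can be sketched as a single selection function. Everything here is illustrative: the Format enum, choose_output_format and the operator name "conv2d" are hypothetical, and only the convolution case is modelled because it is the only heavy operator whose required format the text states.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Format { NCHW, NC1HWC0 };

// Picks the output format for an operator.
// - Heavy operator (here only a hypothetical "conv2d"): fixed required format.
// - Format-insensitive operator: continuity rule, follow the first input.
Format choose_output_format(const std::string& op_name,
                            const std::vector<Format>& input_formats) {
  if (op_name == "conv2d") {
    return Format::NC1HWC0;  // the text states NPU convolution needs NC1HWC0
  }
  assert(!input_formats.empty());
  return input_formats.front();
}
```

An elementwise operator placed after a convolution therefore inherits NC1HWC0 from its first input, which is how the heavy operator's format spreads outward.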
d. Pass the constructed output Tensor and the other parameters to add_out_npu to perform the computation.
    // add_npu when the inputs are "self": "Tensor" and "other": "Tensor"
    Tensor add_npu(const Tensor& self, const Tensor& other, Scalar alpha) {
      // Call the corresponding shape inference function to compute the output size.
      // add_dest_output (not shown here) supplies the tensor whose options and
      // format the result inherits.
      Tensor outputTensor = add_dest_output(self, other);
      auto outputSize = add_npu_output_size(self, other);

      // Create the output Tensor from the output size with at::empty_with_format;
      // the format of the output Tensor can be specified and defaults to NCHW
      Tensor result = at::empty_with_format(outputSize, outputTensor.options(), CalcuOpUtil::get_tensor_npu_format(outputTensor));

      // Pass the constructed output Tensor and the other parameters to add_out_npu
      add_out_npu(result, self, other, alpha);
      return result;
    }

    // add_npu when the inputs are "self": "Tensor" and "other": "Scalar"
    Tensor add_npu(const Tensor& self, Scalar other, Scalar alpha) {
      // Call the corresponding shape inference function to compute the output size
      auto outputSize = add_npu_output_size(self, other);

      // Create the output Tensor from the output size with at::empty_with_format;
      // the format of the output Tensor can be specified and defaults to NCHW
      Tensor result = at::empty_with_format(outputSize, self.options(), CalcuOpUtil::get_tensor_npu_format(self));

      // Pass the constructed output Tensor and the other parameters to adds_out_npu
      adds_out_npu(result, self, other, alpha);
      return result;
    }
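The relationship between add_npu and add_out_npu is the usual pairing of an allocating function with an out variant: the allocating version computes the output size, creates the result, and delegates the arithmetic. A minimal sketch of that pattern on plain vectors, with hypothetical names add_out and add, and equal-length inputs assumed instead of broadcasting:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Out variant: writes self + alpha * other into a caller-provided buffer.
std::vector<float>& add_out(std::vector<float>& result,
                            const std::vector<float>& self,
                            const std::vector<float>& other,
                            float alpha) {
  assert(self.size() == other.size());
  assert(result.size() == self.size());
  for (std::size_t i = 0; i < self.size(); ++i) {
    result[i] = self[i] + alpha * other[i];
  }
  return result;
}

// Allocating variant: infers the output size, creates the output,
// then delegates the computation to the out variant.
std::vector<float> add(const std::vector<float>& self,
                       const std::vector<float>& other,
                       float alpha) {
  std::vector<float> result(self.size());
  add_out(result, self, other, alpha);
  return result;
}
```

Keeping the arithmetic in the out variant means both entry points share one implementation, and callers that already own an output buffer avoid an extra allocation.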
