

Apple's Metal Sample Projects

Published: 2025/3/20


Basic Buffers

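When the data passed to the vertex shader is large (more than 4096 bytes), the setVertexBytes:length:atIndex: method cannot be used; use setVertexBuffer:offset:atIndex: instead, which also performs better.
In that case the argument is an MTLBuffer, a block of memory the GPU can access.
The contents property of _vertexBuffer returns a pointer to that same memory which the CPU can access; in other words, the CPU and GPU share this memory.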
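To make the 4096-byte rule concrete, here is a small sketch in plain C (also valid Objective-C) that converts the byte limit into a vertex count. The `Vertex` layout, `fits_inline`, and `max_inline_vertices` are hypothetical: Apple's sample uses SIMD vector types whose alignment can make the real struct larger. The boundary (4096 bytes allowed, more than that not) follows the text above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vertex layout: 2D position + RGBA color as plain floats.
   Real Metal samples often use SIMD types with stricter alignment. */
typedef struct {
    float position[2];
    float color[4];
} Vertex;

/* Byte limit for setVertexBytes:length:atIndex:, as stated in the text. */
enum { kMaxInlineVertexBytes = 4096 };

/* True when `count` vertices can be passed inline with setVertexBytes;
   otherwise they should live in an MTLBuffer. */
static bool fits_inline(size_t count) {
    return count * sizeof(Vertex) <= kMaxInlineVertexBytes;
}

/* Largest vertex count that still fits inline. */
static size_t max_inline_vertices(void) {
    return kMaxInlineVertexBytes / sizeof(Vertex);
}
```

With this 24-byte layout, 170 vertices (4080 bytes) still fit inline, while 171 would need an MTLBuffer.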

Basic Texturing

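The texture uses the MTLPixelFormatBGRA8Unorm pixel format: each pixel has four 8-bit unsigned normalized components, stored in blue, green, red, alpha order.

(Figure: 2D texture coordinates.) Normalized coordinates run from (0.0, 0.0) at the top-left corner of the texture to (1.0, 1.0) at the bottom-right corner.

Reading a texel (texture element) is also known as sampling.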
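The following plain C sketch (also valid Objective-C) models, on the CPU, roughly what a sampler set to nearest filtering and clamp-to-edge addressing does with a BGRA8Unorm texture: map normalized (u, v) to a texel index, then convert each 8-bit component n to n / 255. The `Color` type and `sample_nearest_bgra8` function are hypothetical illustrations, not Metal API; real GPU sampling also supports linear filtering, mipmaps, and other addressing modes.

```c
#include <stddef.h>
#include <stdint.h>

/* Normalized RGBA color, each channel in [0, 1]. */
typedef struct { float r, g, b, a; } Color;

/* Nearest-neighbor sample at normalized (u, v); (0, 0) is top-left and
   (1, 1) is bottom-right. `texels` holds width * height pixels (both > 0),
   4 bytes each in B, G, R, A order, row by row. */
static Color sample_nearest_bgra8(const uint8_t *texels, size_t width,
                                  size_t height, float u, float v) {
    /* Clamp-to-edge addressing */
    if (u < 0.0f) u = 0.0f;
    if (u > 1.0f) u = 1.0f;
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;

    /* Map normalized coordinates to texel indices (floor) */
    size_t x = (size_t)(u * (float)width);
    size_t y = (size_t)(v * (float)height);
    if (x >= width) x = width - 1;   /* u == 1.0 lands on the last texel */
    if (y >= height) y = height - 1;

    const uint8_t *p = texels + 4 * (y * width + x);
    /* Unorm conversion: 8-bit value n becomes n / 255 */
    Color c = { p[2] / 255.0f, p[1] / 255.0f, p[0] / 255.0f, p[3] / 255.0f };
    return c;
}
```

For a 2x2 texture whose top-left texel is red, sampling at (0.25, 0.25) returns r = 1, g = 0, b = 0.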

Hello Compute

This project shows how to perform data-parallel computations using the GPU.

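Throughout the history of GPUs, the highly parallel architecture has stayed the same, while the processing cores have become more and more programmable. This moved GPUs from a fixed-function pipeline to a programmable pipeline and made general-purpose GPU programming (GPGPU) feasible.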

An MTLComputePipelineState object can be created directly from a kernel function.

```objc
// Create a compute kernel function
id<MTLFunction> kernelFunction = [defaultLibrary newFunctionWithName:@"grayscaleKernel"];

// Create a compute pipeline state from the kernel function
NSError *error = nil;
_computePipelineState = [_device newComputePipelineStateWithFunction:kernelFunction
                                                               error:&error];
```
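As a rough CPU reference for the work one GPU thread does, here is a plain C sketch (also valid Objective-C) of a per-pixel grayscale conversion. It assumes Rec. 709 luma weights and BGRA8 input; the actual `grayscaleKernel` is written in the Metal Shading Language, and its exact weights and formats may differ. `grayscale_thread` is a hypothetical name. The early return mirrors the bounds check a kernel needs when the dispatched grid is larger than the image.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed Rec. 709 luma weights (an assumption, not taken from the sample). */
static const float kLumaR = 0.2126f, kLumaG = 0.7152f, kLumaB = 0.0722f;

/* Models one kernel invocation: thread (gx, gy) converts one BGRA8 pixel
   of `in` to gray and writes it to `out` (same size and layout). */
static void grayscale_thread(const uint8_t *in, uint8_t *out,
                             size_t width, size_t height,
                             size_t gx, size_t gy) {
    /* Threads outside the image do nothing */
    if (gx >= width || gy >= height) return;

    const uint8_t *p = in + 4 * (gy * width + gx);
    /* Weighted sum of R, G, B (bytes 2, 1, 0 in BGRA order) */
    float gray = kLumaR * p[2] + kLumaG * p[1] + kLumaB * p[0];
    uint8_t g = (uint8_t)(gray + 0.5f);   /* round to nearest */

    uint8_t *q = out + 4 * (gy * width + gx);
    q[0] = g; q[1] = g; q[2] = g;         /* B, G, R */
    q[3] = 255;                           /* opaque alpha */
}
```

On the GPU, every pixel gets its own invocation in parallel; on the CPU you would call this function once per (gx, gy).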

The image is split into blocks (threadgroups) that are processed in parallel.

```objc
// Set the compute kernel's threadgroup size to 16x16
_threadgroupSize = MTLSizeMake(16, 16, 1);

// Calculate the number of rows and columns of threadgroups given the width and height of the input image.
// Ensure we cover the entire image (or more) so we process every pixel.
_threadgroupCount.width  = (_inputTexture.width  + _threadgroupSize.width  - 1) / _threadgroupSize.width;
_threadgroupCount.height = (_inputTexture.height + _threadgroupSize.height - 1) / _threadgroupSize.height;

// Since we're only dealing with a 2D data set, set depth to 1
_threadgroupCount.depth = 1;

[computeEncoder dispatchThreadgroups:_threadgroupCount
               threadsPerThreadgroup:_threadgroupSize];
```
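The `(extent + size - 1) / size` expression is ceiling division, so the grid can overshoot the image. This plain C sketch (also valid Objective-C) walks the same grid a `dispatchThreadgroups:threadsPerThreadgroup:` call would launch and counts how many threads fall inside the image and how many land outside and must be skipped by the kernel's bounds check. `Size2D`, `threadgroups_needed`, and `count_threads` are hypothetical helpers for illustration.

```c
#include <stddef.h>

typedef struct { size_t width, height; } Size2D;

/* Threadgroups needed along one axis: ceiling division. */
static size_t threadgroups_needed(size_t extent, size_t group) {
    return (extent + group - 1) / group;
}

/* Enumerates every thread in the dispatched grid and classifies it. */
static void count_threads(Size2D image, Size2D group,
                          size_t *inside, size_t *outside) {
    size_t groups_x = threadgroups_needed(image.width, group.width);
    size_t groups_y = threadgroups_needed(image.height, group.height);
    *inside = 0;
    *outside = 0;
    for (size_t ty = 0; ty < groups_y; ++ty)
        for (size_t tx = 0; tx < groups_x; ++tx)
            for (size_t ly = 0; ly < group.height; ++ly)
                for (size_t lx = 0; lx < group.width; ++lx) {
                    /* Global position, analogous to thread_position_in_grid */
                    size_t gx = tx * group.width + lx;
                    size_t gy = ty * group.height + ly;
                    if (gx < image.width && gy < image.height)
                        (*inside)++;
                    else
                        (*outside)++;
                }
}
```

For a 100x50 image with 16x16 threadgroups, the grid is 7x4 threadgroups (112x64 = 7168 threads), of which 5000 cover pixels and 2168 do nothing.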

CPU and GPU Synchronization

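The CPU and GPU are two asynchronous processors, but they share the same buffers, so while running in parallel they must avoid reading and writing the same data at the same time.

In the first figure above, the CPU and GPU never work at the same time within a frame. This avoids simultaneous access to the data, but it lowers performance.

In the second figure above, the CPU and GPU read and write the same data at the same time, which causes a race condition.

Using multiple buffers solves both problems: performance improves, and the CPU and GPU never access the same buffer at the same time. A semaphore limits how many frames can be in flight, and a completed handler on each command buffer signals it.
The GPU calls this handler after it finishes executing the command buffer, which frees that frame's buffer for the CPU to reuse.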

```objc
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer) {
    dispatch_semaphore_signal(block_sema);
}];
```
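The multiple-buffer scheme can be sketched without Metal in plain C (also valid Objective-C), with a second thread standing in for the GPU. This is a simulation under stated assumptions: POSIX unnamed semaphores replace `dispatch_semaphore_wait`/`dispatch_semaphore_signal` (they work on Linux, but `sem_init` is not supported on macOS, where dispatch semaphores are the natural choice), a second semaphore stands in for committing a command buffer, and `buffers`, `gpu_thread`, and `run_frames` are hypothetical names. The CPU writes frame f into slot f % 3 only after taking a semaphore token, which guarantees frame f - 3 has already been consumed.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define MAX_FRAMES_IN_FLIGHT 3   /* number of buffers in the ring */
#define UNIFORM_FLOATS 64        /* size of each hypothetical per-frame buffer */

static float buffers[MAX_FRAMES_IN_FLIGHT][UNIFORM_FLOATS];
static sem_t in_flight;   /* role of the in-flight dispatch semaphore */
static sem_t submitted;   /* stands in for committing a command buffer */
static int frame_count;
static int corrupted;     /* frames whose data changed before the "GPU" read it */

/* "GPU": consumes frames in order, then signals like the completed handler. */
static void *gpu_thread(void *arg) {
    (void)arg;
    for (int f = 0; f < frame_count; ++f) {
        sem_wait(&submitted);
        const float *buf = buffers[f % MAX_FRAMES_IN_FLIGHT];
        for (int i = 0; i < UNIFORM_FLOATS; ++i)
            if (buf[i] != (float)f) { ++corrupted; break; }
        sem_post(&in_flight);   /* like dispatch_semaphore_signal */
    }
    return NULL;
}

/* CPU: waits for a free buffer, fills it, submits the frame.
   Returns the number of corrupted frames (0 if synchronization holds). */
static int run_frames(int frames) {
    pthread_t gpu;
    frame_count = frames;
    corrupted = 0;
    sem_init(&in_flight, 0, MAX_FRAMES_IN_FLIGHT);
    sem_init(&submitted, 0, 0);
    pthread_create(&gpu, NULL, gpu_thread, NULL);
    for (int f = 0; f < frames; ++f) {
        sem_wait(&in_flight);   /* like dispatch_semaphore_wait before encoding */
        float *buf = buffers[f % MAX_FRAMES_IN_FLIGHT];
        for (int i = 0; i < UNIFORM_FLOATS; ++i)
            buf[i] = (float)f;
        sem_post(&submitted);
    }
    pthread_join(gpu, NULL);
    sem_destroy(&in_flight);
    sem_destroy(&submitted);
    return corrupted;
}
```

Up to three frames can overlap, yet the CPU never overwrites a slot the consumer has not finished reading, so `run_frames` returns 0.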

LOD with Function Specialization

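Level of detail (LOD): the more realistic the detail, the more resources rendering it consumes, so you have to trade off performance against richness of detail.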
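One common way to make this trade-off, assumed here rather than stated in the text, is to choose the LOD from the object's distance to the camera: nearby objects get the detailed model, distant ones the cheap model. This plain C sketch (also valid Objective-C) computes the `highLOD`/`mediumLOD`/`lowLOD` flags used in the branch below; the thresholds, `LodFlags`, and `select_lod` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical distance thresholds, in world units. */
#define HIGH_LOD_MAX_DISTANCE   10.0f
#define MEDIUM_LOD_MAX_DISTANCE 50.0f

typedef struct { bool highLOD, mediumLOD, lowLOD; } LodFlags;

/* Sets exactly one flag based on distance to the camera. */
static LodFlags select_lod(float distance_to_camera) {
    LodFlags f = { false, false, false };
    if (distance_to_camera < HIGH_LOD_MAX_DISTANCE)
        f.highLOD = true;
    else if (distance_to_camera < MEDIUM_LOD_MAX_DISTANCE)
        f.mediumLOD = true;
    else
        f.lowLOD = true;
    return f;
}
```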

```c
if (highLOD) {
    // Render high-quality model
} else if (mediumLOD) {
    // Render medium-quality model
} else if (lowLOD) {
    // Render low-quality model
}
```

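However, writing the code above in a GPU function performs poorly. The number of threads a GPU can run in parallel depends on how many registers are allocated to the function. The GPU compiler must allocate the maximum number of registers the function might use, even for branches that can never execute. Branch statements therefore significantly increase the number of registers required and significantly reduce how many threads the GPU can run in parallel.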
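Function specialization avoids this by fixing the LOD when the pipeline is built, so the compiler only sees one branch. In Metal this is done with function constants (`[[function_constant(n)]]` in the shader, values supplied through `MTLFunctionConstantValues`). The plain C sketch below (also valid Objective-C) is only an analogue of that idea, not Metal's mechanism: each wrapper passes a compile-time constant, so an optimizing compiler can fold the comparison and drop the dead branches. The polynomial "models" and all names are hypothetical stand-ins for rendering work of different cost.

```c
/* LOD levels, fixed per specialized variant. */
enum { LOD_HIGH = 0, LOD_MEDIUM = 1, LOD_LOW = 2 };

/* Generic version: `lod` is a runtime value, so all branches are
   compiled into one function. The polynomials stand in for rendering work. */
static inline float shade(int lod, float x) {
    if (lod == LOD_HIGH)
        return 1.0f + x + x * x / 2.0f + x * x * x / 6.0f;   /* most work */
    else if (lod == LOD_MEDIUM)
        return 1.0f + x + x * x / 2.0f;
    else
        return 1.0f + x;                                     /* least work */
}

/* Specialized variants: the level is a compile-time constant, so after
   inlining the compiler can keep only the matching branch. */
static float shade_high(float x)   { return shade(LOD_HIGH, x); }
static float shade_medium(float x) { return shade(LOD_MEDIUM, x); }
static float shade_low(float x)    { return shade(LOD_LOW, x); }
```

The specialized variants return the same results as the generic function; the benefit is that each compiled variant contains only one path, which is the property that keeps register use low on the GPU.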

Reposted from: https://www.cnblogs.com/huahuahu/p/ping-guo-de-Metal-gong-cheng.html

