Transposing a 4x4 matrix in C: the fastest way to transpose a 4x4 byte matrix
Let me rephrase your question: you're asking for a C- or C++-only solution that is portable. Then:
#include <stdint.h>

void transpose(uint32_t const in[4], uint32_t out[4]) {
    // A B C D    A E I M
    // E F G H    B F J N
    // I J K L    C G K O
    // M N O P    D H L P
    out[0]  = in[0] & 0xFF000000U;          // A . . .
    out[1]  = in[1] & 0x00FF0000U;          // . F . .
    out[2]  = in[2] & 0x0000FF00U;          // . . K .
    out[3]  = in[3] & 0x000000FFU;          // . . . P
    out[1] |= (in[0] << 8) & 0xFF000000U;   // B F . .
    out[2] |= (in[0] << 16) & 0xFF000000U;  // C . K .
    out[3] |= (in[0] << 24);                // D . . P
    out[0] |= (in[1] >> 8) & 0x00FF0000U;   // A E . .
    out[2] |= (in[1] << 8) & 0x00FF0000U;   // C G K .
    out[3] |= (in[1] << 16) & 0x00FF0000U;  // D H . P
    out[0] |= (in[2] >> 16) & 0x0000FF00U;  // A E I .
    out[1] |= (in[2] >> 8) & 0x0000FF00U;   // B F J .
    out[3] |= (in[2] << 8) & 0x0000FF00U;   // D H L P
    out[0] |= (in[3] >> 24);                // A E I M
    out[1] |= (in[3] >> 16) & 0x000000FFU;  // B F J N
    out[2] |= (in[3] >> 8) & 0x000000FFU;   // C G K O
}
I don't see how it could be answered any other way, since then you'd be depending on a particular compiler compiling it in a particular way, etc.
Of course if those manipulations themselves can be somehow simplified, it'd help. So that's the only avenue of further pursuit here. Nothing stands out so far, but then it's been a long day for me.
So far, the cost is 12 shifts, 12 ORs, and 14 ANDs (the two 24-bit shifts need no mask). If the compiler and platform are any good, it can be done in nine 32-bit registers.
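One possible simplification, which is not part of the original answer, is to transpose in two stages: first exchange the off-diagonal 2x2 blocks, then exchange the off-diagonal bytes inside each block. The sketch below assumes the same packing as above (A in the most significant byte of `in[0]`); the name `transpose_swaps` is made up for illustration. It uses 8 shifts, 8 ORs and 12 ANDs, but whether that makes it faster still depends on the compiler and platform.

```c
#include <stdint.h>

/* Two-stage transpose of a 4x4 byte matrix held in four 32-bit row words.
   Row 0 = A B C D with A in the most significant byte (an assumption that
   matches the version above). All inputs are read into locals before any
   output is written, so `in` and `out` may be the same array. */
void transpose_swaps(uint32_t const in[4], uint32_t out[4]) {
    /* Stage 1: exchange the off-diagonal 2x2 blocks (16-bit halves). */
    uint32_t y0 = (in[0] & 0xFFFF0000U) | (in[2] >> 16);   /* A B I J */
    uint32_t y1 = (in[1] & 0xFFFF0000U) | (in[3] >> 16);   /* E F M N */
    uint32_t y2 = (in[0] << 16) | (in[2] & 0x0000FFFFU);   /* C D K L */
    uint32_t y3 = (in[1] << 16) | (in[3] & 0x0000FFFFU);   /* G H O P */

    /* Stage 2: exchange the off-diagonal bytes inside each 2x2 block. */
    out[0] = (y0 & 0xFF00FF00U) | ((y1 >> 8) & 0x00FF00FFU);  /* A E I M */
    out[1] = ((y0 << 8) & 0xFF00FF00U) | (y1 & 0x00FF00FFU);  /* B F J N */
    out[2] = (y2 & 0xFF00FF00U) | ((y3 >> 8) & 0x00FF00FFU);  /* C G K O */
    out[3] = ((y2 << 8) & 0xFF00FF00U) | (y3 & 0x00FF00FFU);  /* D H L P */
}
```

With the ASCII codes of A..P as input, `in = {0x41424344, 0x45464748, 0x494A4B4C, 0x4D4E4F50}`, the result is `{0x4145494D, 0x42464A4E, 0x43474B4F, 0x44484C50}`.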
If the compiler is very sad, or the platform doesn't have a barrel shifter, then some casting could help make explicit the fact that the shifts and masks are just byte extractions:
void transpose(uint8_t const in[16], uint8_t out[16]) {
    // A B C D    A E I M
    // E F G H    B F J N
    // I J K L    C G K O
    // M N O P    D H L P
    out[0]  = in[0];   // A . . .
    out[1]  = in[4];   // A E . .
    out[2]  = in[8];   // A E I .
    out[3]  = in[12];  // A E I M
    out[4]  = in[1];   // B . . .
    out[5]  = in[5];   // B F . .
    out[6]  = in[9];   // B F J .
    out[7]  = in[13];  // B F J N
    out[8]  = in[2];   // C . . .
    out[9]  = in[6];   // C G . .
    out[10] = in[10];  // C G K .
    out[11] = in[14];  // C G K O
    out[12] = in[3];   // D . . .
    out[13] = in[7];   // D H . .
    out[14] = in[11];  // D H L .
    out[15] = in[15];  // D H L P
}
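To see why the shifts and masks amount to byte extraction, here is a minimal sketch in C that spells the word-based transpose out as "get byte, put byte". The helper names `get_byte`, `put_byte` and `transpose_generic` are hypothetical, and the sketch assumes that column 0 sits in the most significant byte of each row word, as in the first version.

```c
#include <stdint.h>

/* Extract the byte in column `col` (0 = most significant) of a row word. */
static uint8_t get_byte(uint32_t row, unsigned col) {
    return (uint8_t)(row >> (24u - 8u * col));
}

/* Return `row` with byte `b` OR-ed into column `col`.
   The target column is expected to be zero beforehand. */
static uint32_t put_byte(uint32_t row, unsigned col, uint8_t b) {
    return row | ((uint32_t)b << (24u - 8u * col));
}

/* Loop form of the transpose: out(r, c) = in(c, r).
   `in` and `out` must not overlap, because out[r] is written
   while later iterations still read from `in`. */
void transpose_generic(uint32_t const in[4], uint32_t out[4]) {
    for (unsigned r = 0; r < 4; ++r) {
        uint32_t word = 0;
        for (unsigned c = 0; c < 4; ++c) {
            word = put_byte(word, c, get_byte(in[c], r));
        }
        out[r] = word;
    }
}
```

This form is meant for reading rather than speed; the unrolled versions are what an optimizing compiler would ideally reduce it to.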
If you really want to shuffle it in-place, then the following would do.
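#include <stdint.h>
#include <utility>  // std::swap (this version is C++ only)

void transpose(uint8_t m[16]) {
    // Swap each element above the diagonal with its mirror below it.
    std::swap(m[1], m[4]);
    std::swap(m[2], m[8]);
    std::swap(m[3], m[12]);
    std::swap(m[6], m[9]);
    std::swap(m[7], m[13]);
    std::swap(m[11], m[14]);
}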
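Since `std::swap` exists only in C++, a plain C equivalent needs its own swap helper. The following is a sketch under that assumption; `swap_bytes` and `transpose_inplace` are illustrative names, not part of the original answer.

```c
#include <stdint.h>

/* Exchange two bytes through a temporary. */
static void swap_bytes(uint8_t *a, uint8_t *b) {
    uint8_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* In-place transpose of a row-major 4x4 byte matrix:
   swap each element above the diagonal with its mirror below it. */
void transpose_inplace(uint8_t m[16]) {
    swap_bytes(&m[1],  &m[4]);
    swap_bytes(&m[2],  &m[8]);
    swap_bytes(&m[3],  &m[12]);
    swap_bytes(&m[6],  &m[9]);
    swap_bytes(&m[7],  &m[13]);
    swap_bytes(&m[11], &m[14]);
}
```

Calling `transpose_inplace` twice on the same array restores the original matrix, which makes for a cheap sanity check.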
The byte-oriented versions may well produce worse code on modern platforms. Only a benchmark can tell.
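As a starting point for such a benchmark, here is a rough timing sketch in C based on the standard `clock()` function. The names `transpose_bytes` and `time_transpose` are hypothetical, and this is not a rigorous harness: it only illustrates feeding each result back into the next input so the optimizer cannot drop the loop.

```c
#include <stdint.h>
#include <time.h>

/* Candidate under test: loop form of the byte-wise transpose. */
static void transpose_bytes(uint8_t const in[16], uint8_t out[16]) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[4 * r + c] = in[4 * c + r];
}

/* Run the candidate `iterations` times and return the elapsed CPU time in
   seconds. A checksum of the results is stored in `*checksum` so that the
   work has an observable effect. */
double time_transpose(unsigned long iterations, uint32_t *checksum) {
    uint8_t a[16], b[16];
    uint32_t sum = 0;
    for (int i = 0; i < 16; ++i) a[i] = (uint8_t)i;

    clock_t start = clock();
    for (unsigned long n = 0; n < iterations; ++n) {
        transpose_bytes(a, b);
        sum += b[n & 15];           /* consume one byte of the result */
        a[n & 15] ^= (uint8_t)sum;  /* make the next input depend on it */
    }
    clock_t end = clock();

    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To compare variants, replace the call to `transpose_bytes` with the version of interest and build with the same optimization flags as the real program; the numbers from a loop like this are only indicative.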