The ac狗 GPU is ridiculously overpowered
2026-05-24 11:07:14
Posted from: Guangdong
The following code was generated by AI:
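#include <bits/stdc++.h>
using namespace std;

int main() {
    const int TOTAL_SIZE = 100000000;  // target: about 1e8 additions
    const int CHUNK_SIZE = 524288;     // 2^19 floats per offloaded chunk
    // Integer division gives 190 chunks, i.e. 99,614,720 additions in total.
    const int LOOPS = TOTAL_SIZE / CHUNK_SIZE;

    vector<float> a(CHUNK_SIZE, 1.0f);
    vector<float> b(CHUNK_SIZE, 2.0f);
    vector<float> c(CHUNK_SIZE, 0.0f);
    // Raw pointers are used for the array sections in the map clauses.
    float* pA = a.data();
    float* pB = b.data();
    float* pC = c.data();

    auto start = chrono::high_resolution_clock::now();
    float final_check = 0.0f;
    for (int i = 0; i < LOOPS; ++i) {
        // When offloaded, every iteration maps a and b to the device and c back,
        // so the measured time includes the data transfers.
        #pragma omp target teams distribute parallel for num_teams(512) thread_limit(1024) \
            map(to: pA[0:CHUNK_SIZE], pB[0:CHUNK_SIZE]) map(from: pC[0:CHUNK_SIZE])
        for (int j = 0; j < CHUNK_SIZE; ++j) {
            pC[j] = pA[j] + pB[j];
        }
        final_check += pC[0];  // read one result back on the host
    }
    auto end = chrono::high_resolution_clock::now();
    chrono::duration<double, std::milli> elapsed = end - start;
    cout << elapsed.count() << "ms";
    // Sanity check on stderr: expect 3 * LOOPS = 570 if every chunk was computed.
    cerr << "\ncheck = " << final_check;
    return 0;
}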
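The code above uses ac狗's GPU to run roughly 1e8 additions (190 chunks × 524288 elements = 99,614,720).
(It took a long time to compile.)
The output is around 30 ms.
In other words, if you squeeze everything out of the GPU on ac狗's judging machine, about 1e8 operations take only around 30 ms (versus ≈100 ms on the CPU).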
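For anyone who wants to reproduce the comparison, here is a rough sketch of two checks that could sit next to the measurement: whether an OpenMP target region really runs on a device instead of silently falling back to the host, and a host-only baseline for the same additions. The function names `visible_devices`, `target_runs_on_host`, and `cpu_baseline_ms` are made up for this example and are not from the post; only `omp_get_num_devices` and `omp_is_initial_device` are standard OpenMP runtime routines. It assumes nothing about how ac狗 configures its compiler, so the OpenMP parts are guarded by `_OPENMP`.

```cpp
#include <chrono>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: number of offload devices the OpenMP runtime reports
// (0 when the program is built without OpenMP).
int visible_devices() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// Hypothetical helper: true if a target region executed on the host,
// false if it was offloaded to a device.
bool target_runs_on_host() {
    int on_host = 1;  // stays 1 when built without OpenMP
#ifdef _OPENMP
    #pragma omp target map(from: on_host)
    { on_host = omp_is_initial_device(); }
#endif
    return on_host != 0;
}

// Hypothetical helper: runs `loops` passes of c[j] = a[j] + b[j] over `chunk`
// floats on the host and returns the elapsed wall-clock time in milliseconds.
double cpu_baseline_ms(int loops, int chunk, float* check_out) {
    std::vector<float> a(chunk, 1.0f), b(chunk, 2.0f), c(chunk, 0.0f);
    float check = 0.0f;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < loops; ++i) {
        for (int j = 0; j < chunk; ++j) {
            c[j] = a[j] + b[j];
        }
        // Use one result per pass; an optimizer may still simplify the
        // repeated passes, so treat the timing as a rough figure only.
        check += c[0];
    }
    auto end = std::chrono::steady_clock::now();
    *check_out = check;
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Calling `cpu_baseline_ms(190, 524288, &check)` from `main` mirrors the chunk count and size used in the post. If `target_runs_on_host()` returns true, the 30 ms figure would not be measuring the GPU at all, so it seems worth printing alongside the timing.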