The ac狗 GPU is way too overpowered

A飞行员

2026-05-24 11:07:14

Posted from: Guangdong

The following code was generated by AI:

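// Needs a compiler with OpenMP GPU offloading support (GCC or Clang built with an offload target);
// without it, the target region below simply runs on the host CPU instead.
#include <chrono>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    const int TOTAL_SIZE = 100000000;             // target number of additions (1e8)
    const int CHUNK_SIZE = 524288;                // 2^19 = 512 * 1024, matching num_teams(512) x thread_limit(1024)
    const int LOOPS = TOTAL_SIZE / CHUNK_SIZE;    // = 190 (integer division rounds down)

    vector<float> a(CHUNK_SIZE, 1.0f);
    vector<float> b(CHUNK_SIZE, 2.0f);
    vector<float> c(CHUNK_SIZE, 0.0f);

    // OpenMP map clauses take raw pointers with array sections
    float* pA = a.data();
    float* pB = b.data();
    float* pC = c.data();

    auto start = chrono::steady_clock::now();     // monotonic clock for measuring intervals

    float final_check = 0.0f;
    for (int i = 0; i < LOOPS; ++i) {
        // Each iteration copies pA and pB to the device, adds them there, and copies pC back,
        // so the measured time includes host <-> device transfers.
        #pragma omp target teams distribute parallel for num_teams(512) thread_limit(1024) \
                    map(to: pA[0:CHUNK_SIZE], pB[0:CHUNK_SIZE]) map(from: pC[0:CHUNK_SIZE])
        for (int j = 0; j < CHUNK_SIZE; ++j) {
            pC[j] = pA[j] + pB[j];
        }
        final_check += pC[0];  // consume the result; printed below as a sanity check
    }

    auto end = chrono::steady_clock::now();
    chrono::duration<double, milli> elapsed = end - start;

    cout << elapsed.count() << "ms\n";
    cout << "final_check = " << final_check << "\n";  // expected: 190 * (1.0f + 2.0f) = 570

    return 0;
}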

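The code above uses the ac狗 judge's GPU to perform 1e8 float additions (precisely 190 × 524288 = 99,614,720, since LOOPS is rounded down by integer division)
(it took quite a while to compile)
The output was around 30ms, and that time also covers copying the arrays to and from the GPU on every iteration
In other words, if you squeeze everything out of the ac狗 judge machine's GPU, 1e8 additions take only about 30ms (vs. ≈100ms on the CPU)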