CANN/mat-chem-sim-pred: PID FOPDT Basis-GEMM Fit Benchmark

Published: 2026/7/4 22:12:33
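PidFopdtBasisGemmFit Test Report

mat-chem-sim-pred targets the industrial domain and focuses on two core scenarios, computational simulation and prediction. It builds a domain computing layer for the process industry that is driven jointly by mechanistic models and data, and it advances the application of AI for Science in materials chemistry. Project URL: https://gitcode.com/cann/mat-chem-sim-pred

Test environment

Device: Ascend910B3, device 3
Machine: node202
Build: cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DSOC_VERSION=Ascend910B3
CPU baseline: 64-thread full fit, including the dot computation (y_centered with basis_t) and the best reduce

Test commands

    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    cd prediction/ProcessControl/PIDModelFit/pid_fopdt_basis_gemm_fit
    cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DSOC_VERSION=Ascend910B3
    cmake --build build -j 2
    export LD_LIBRARY_PATH=$PWD/build:$PWD/build/lib:${LD_LIBRARY_PATH:-}
    ./build/test_aclnn_pid_fopdt_basis_gemm_fit 3
    ./build/benchmark_pid_fopdt_basis_gemm_pipeline 3 64 1024 256 5 2 64

Correctness

The smoke test passed:

    PidFopdtBasisGemmFit smoke
    best_sse=[1, 12]
    best_k=[1.5, 2]
    best_idx=[2, 1]
    PASSED

Pipeline benchmark compared against the CPU reference:

    max_abs_sse=0.00378418
    max_rel_sse=0.00378418
    max_abs_k=1.54972e-06
    idx_diff_count=0

Performance results (B=64, N=1024, M=256)

| Measurement | Time | Speedup vs. CPU 64T |
| --- | --- | --- |
| CPU 64T full fit | 8.74037 ms | 1.00x |
| NPU resident e2e | 0.303587 ms | 28.79x |
| NPU cold e2e | 0.989354 ms | 8.83x |

FOPDT extended sizes

| Configuration | CPU 64T full fit | NPU resident e2e | NPU cold e2e | Resident speedup | Cold speedup |
| --- | --- | --- | --- | --- | --- |
| B=128, N=1024, M=512 | 28.6413 ms | 0.308415 ms | 1.52882 ms | 92.87x | 18.73x |
| B=256, N=1024, M=512 | 55.6340 ms | 0.306237 ms | 1.06366 ms | 181.67x | 52.30x |
| B=128, N=2048, M=512 | 61.8390 ms | 0.371604 ms | 1.00696 ms | 166.41x | 61.41x |

Measurement definitions

resident e2e: the inputs are already on the device; only aclnnMatmul, the custom reduce, and the best-result D2H copy are timed.
cold e2e: the timing covers the input H2D copy, aclnnMatmul, the custom reduce, and the best-result D2H copy.
dot[B, M] stays resident on the device and is never copied back to the host; it is passed directly to the reduce operator as input.

Conclusion

The FOPDT basis-GEMM pipeline is faster than the 64-thread CPU full fit in every tested configuration, under both measurements: 28.79x to 181.67x for resident e2e and 8.83x to 61.41x for cold e2e. It is therefore suitable as the mainline NPU implementation for multi-loop, multi-candidate PID model identification.

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.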
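To make the shape of the computation concrete, here is a minimal NumPy sketch of a basis-GEMM fit for B loops against M candidate FOPDT responses. It is not the operator's actual implementation: the report only states that dot[B, M] is computed from y_centered and basis_t and then reduced to best_sse, best_k, and best_idx. The closed-form least-squares gain, the SSE formula, the candidate grid, and the function names `fopdt_unit_step` and `basis_gemm_fit` are all assumptions made for illustration.

```python
import numpy as np

def fopdt_unit_step(t, tau, theta):
    """Unit-gain FOPDT step response: 0 until dead time theta, then 1 - exp(-(t - theta) / tau)."""
    shifted = np.clip(t - theta, 0.0, None)
    return 1.0 - np.exp(-shifted / tau)

def basis_gemm_fit(y, basis):
    """Hypothetical reference fit. y: [B, N] responses, basis: [M, N] candidates.

    Returns (best_sse[B], best_k[B], best_idx[B]).
    """
    # Center both sides so that a constant offset in y does not affect the gain.
    y_centered = y - y.mean(axis=1, keepdims=True)
    basis_centered = basis - basis.mean(axis=1, keepdims=True)
    basis_t = basis_centered.T                                  # [N, M]

    # The GEMM step: one matrix product scores every loop against every candidate.
    dot = y_centered @ basis_t                                  # [B, M]

    # Assumed closed-form least squares for a single gain k per (loop, candidate).
    basis_energy = np.sum(basis_centered ** 2, axis=1)          # [M], must be nonzero
    y_energy = np.sum(y_centered ** 2, axis=1, keepdims=True)   # [B, 1]
    k = dot / basis_energy                                      # [B, M]
    sse = y_energy - dot ** 2 / basis_energy                    # [B, M]

    # The "best reduce": keep the candidate with the smallest SSE for each loop.
    best_idx = np.argmin(sse, axis=1)
    rows = np.arange(y.shape[0])
    return sse[rows, best_idx], k[rows, best_idx], best_idx

# Synthetic data: 4 candidate (tau, theta) pairs and 2 noise-free loops.
t = np.linspace(0.0, 10.0, 200)
candidates = [(1.0, 0.5), (2.0, 1.0), (3.0, 1.5), (0.5, 2.0)]
basis = np.stack([fopdt_unit_step(t, tau, theta) for tau, theta in candidates])
y = np.stack([1.5 * basis[2] + 4.0, 2.0 * basis[1] - 1.0])

best_sse, best_k, best_idx = basis_gemm_fit(y, basis)
print(best_idx, best_k, best_sse)
```

Because the synthetic loops are exact scaled copies of candidates 2 and 1, the sketch recovers those indices and gains with a residual that is zero up to rounding. In this formulation the only O(B·N·M) work is the single matrix product, which is the part a GEMM kernel would handle; the per-candidate division and the argmin are cheap elementwise and reduction steps over the [B, M] result.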
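The speedup figures in the report can be re-derived from the reported timings, assuming each speedup is simply the CPU 64T time divided by the NPU time and rounded to two decimals. The short check below uses only numbers quoted in the report; the dictionary layout and variable names are illustrative.

```python
# Reported timings in milliseconds: (CPU 64T full fit, NPU resident e2e, NPU cold e2e).
timings_ms = {
    "B=64,N=1024,M=256": (8.74037, 0.303587, 0.989354),
    "B=128,N=1024,M=512": (28.6413, 0.308415, 1.52882),
    "B=256,N=1024,M=512": (55.6340, 0.306237, 1.06366),
    "B=128,N=2048,M=512": (61.8390, 0.371604, 1.00696),
}

# Assumed definition: speedup = CPU time / NPU time.
speedups = {
    config: (round(cpu / resident, 2), round(cpu / cold, 2))
    for config, (cpu, resident, cold) in timings_ms.items()
}

for config, (resident_x, cold_x) in speedups.items():
    print(f"{config}: resident {resident_x:.2f}x, cold {cold_x:.2f}x")
```

The recomputed ratios agree with the reported speedups. They also show that the resident time stays near 0.3 ms across the tested sizes while the CPU time grows, which is why the resident speedup rises with B and N.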