High-performance CUDA kernel generation and benchmarking framework
View the Project on GitHub jasonlarkin/cuda-stencil-benchmark
Stencil Order Verification
Claim
Evidence
1) Coefficient match: the code applies
- (-8.33333333e-2) = −1/12 to (x±2), (y±2), (z±2)
- 1.333333330 = 4/3 to (x±1), (y±1), (z±1)
- r5
This is the standard 4th‑order stencil for ∂²/∂x² etc., then scaled by r2, r3, r4 = 1/h².
2) Convergence test:
analysis/verify_order.py computes the 3D Laplacian of f(x,y,z) = sin(a_x x) + sin(b_y y) + sin(c_z z) using this stencil and compares it to the analytic Laplacian.
Run (two paths)
- python analysis/verify_order.py → analysis/stencil_convergence.png (slope ~4 in double‑precision arithmetic).
- cd cpu_bench && make verify_order && python order_sweep.py --exe ./verify_order
  make verify_order_fp64 && python order_sweep.py --exe ./verify_order_fp64
  Outputs: cpu_bench/stencil_convergence_sweep.png, cpu_bench/stencil_convergence_cpp_fp64.png.
Roundoff error scales as ε/h² (ε being machine epsilon), producing apparent slopes near −2 on practical meshes. Lower frequencies/coarser h push curves toward the 4th‑order asymptote.
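The convergence test described above can be sketched in a few lines of pure Python. This is a minimal illustration, not the contents of analysis/verify_order.py: the wavenumbers A, B, C, the sample points, the mesh spacings, and all function names are hypothetical, and the centre weight −5/2 is the textbook value for the 4th‑order second‑derivative stencil rather than something quoted from the code. The −1/12 and 4/3 weights are the ones listed under the coefficient match.

```python
import math

# 4th-order central-difference weights for d2/dx2, keyed by grid offset.
# -1/12 and 4/3 come from the text; -5/2 is the textbook centre weight
# (an assumption about the code under test).
WEIGHTS = {-2: -1.0 / 12.0, -1: 4.0 / 3.0, 0: -5.0 / 2.0,
           1: 4.0 / 3.0, 2: -1.0 / 12.0}

# Hypothetical wavenumbers for the test function.
A, B, C = 1.0, 2.0, 3.0


def f(x, y, z):
    """Test function: a sum of one sine per axis."""
    return math.sin(A * x) + math.sin(B * y) + math.sin(C * z)


def exact_laplacian(x, y, z):
    """Analytic Laplacian of f."""
    return -(A * A * math.sin(A * x)
             + B * B * math.sin(B * y)
             + C * C * math.sin(C * z))


def stencil_laplacian(x, y, z, h):
    """Apply the 1D stencil along each axis, sum, and scale by 1/h^2."""
    total = 0.0
    for k, w in WEIGHTS.items():
        total += w * (f(x + k * h, y, z)
                      + f(x, y + k * h, z)
                      + f(x, y, z + k * h))
    return total / (h * h)


def max_error(h, points):
    """Largest absolute error over the sample points at spacing h."""
    return max(abs(stencil_laplacian(x, y, z, h) - exact_laplacian(x, y, z))
               for x, y, z in points)


def observed_orders(hs, points):
    """Slope of log(error) against log(h) between consecutive spacings."""
    errs = [max_error(h, points) for h in hs]
    return [math.log(errs[i] / errs[i + 1]) / math.log(hs[i] / hs[i + 1])
            for i in range(len(hs) - 1)]


# Hypothetical sample points and spacings (each spacing halves the last).
POINTS = [(0.3, 0.7, 1.1), (1.0, 0.2, 0.5), (2.0, 1.5, 0.9)]
HS = [0.1 / 2**i for i in range(4)]

if __name__ == "__main__":
    for h, p in zip(HS[1:], observed_orders(HS, POINTS)):
        print(f"h = {h:.4f}  observed order = {p:.2f}")
```

With these spacings in double precision the truncation error stays well above roundoff, so the observed order should sit close to 4. Pushing h much smaller, or repeating the calculation in single precision, is where the ε/h² term would be expected to take over.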