High-performance CUDA kernel generation and benchmarking framework
View the Project on GitHub jasonlarkin/cuda-stencil-benchmark
The framework follows a three-phase workflow:
- tasks/: fctd3d/ (3D stencil), matmul/ (matrix multiplication)
- prompts/
  - system.md: System-level constraints and priorities
  - task_template.md: Task-specific generation template
  - few_shots.md: Example kernels for few-shot learning
- include/
  - dataobj: Multi-dimensional array representation
  - profiler: Timing instrumentation
  - Kernel() (CPU), Kernel_cuda() (CUDA)
- cpu_bench/
- cuda/
- bench_cuda.cpp
- kernels/XXX_name/
- dev_bw.cu
- run_benchmark.sh
- tests/
- analysis/

Task Spec → LLM Prompt → Generated Kernel
↓
Compilation Test
↓
Correctness Test (vs CPU)
↓
Performance Benchmark
↓
Analysis & Feedback
↓
Next Iteration
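
The correctness step in the loop above compares a generated kernel's output with the CPU result. A minimal sketch of such a comparison follows, assuming both results have been copied into flat host-side float buffers. The names max_abs_diff and matches_reference and the default tolerance are hypothetical and are not taken from the repository.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: largest absolute element-wise difference between the
// CPU reference and the kernel output. A size mismatch or a NaN yields
// infinity, so either condition always fails the check.
double max_abs_diff(const std::vector<float>& ref, const std::vector<float>& out) {
    const double inf = std::numeric_limits<double>::infinity();
    if (ref.size() != out.size()) return inf;
    double worst = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = std::fabs(static_cast<double>(ref[i]) - static_cast<double>(out[i]));
        if (std::isnan(d)) return inf;  // std::max would silently ignore NaN
        worst = std::max(worst, d);
    }
    return worst;
}

// Hypothetical pass/fail wrapper; the tolerance is an assumed default.
bool matches_reference(const std::vector<float>& ref,
                       const std::vector<float>& out,
                       double tol = 1e-5) {
    return max_abs_diff(ref, out) <= tol;
}
```

An absolute tolerance is the simplest choice. If the stencil output spans several orders of magnitude, a relative tolerance may be more appropriate.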
- tasks/cuda/kernels/ (once CUDA infrastructure is migrated)
- Kernel_cuda() function
- analysis/
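
A new Kernel_cuda() implementation is checked against a CPU version, so it can help to picture what such a baseline might look like. The sketch below is purely illustrative. Grid3D is a hypothetical stand-in for dataobj, and stencil_cpu is a generic 7-point average, not the actual fctd3d stencil or the repository's Kernel() signature.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for dataobj: a dense 3D array stored in one flat
// buffer, with x the slowest-varying index and z the fastest.
struct Grid3D {
    std::size_t nx, ny, nz;
    std::vector<float> data;

    Grid3D(std::size_t nx_, std::size_t ny_, std::size_t nz_)
        : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, 0.0f) {}

    float& at(std::size_t i, std::size_t j, std::size_t k) {
        return data[(i * ny + j) * nz + k];
    }
    float at(std::size_t i, std::size_t j, std::size_t k) const {
        return data[(i * ny + j) * nz + k];
    }
};

// Illustrative CPU baseline: each interior point becomes the average of
// itself and its six face neighbours. Boundary points are left untouched.
void stencil_cpu(const Grid3D& in, Grid3D& out) {
    for (std::size_t i = 1; i + 1 < in.nx; ++i)
        for (std::size_t j = 1; j + 1 < in.ny; ++j)
            for (std::size_t k = 1; k + 1 < in.nz; ++k)
                out.at(i, j, k) = (in.at(i, j, k) +
                                   in.at(i - 1, j, k) + in.at(i + 1, j, k) +
                                   in.at(i, j - 1, k) + in.at(i, j + 1, k) +
                                   in.at(i, j, k - 1) + in.at(i, j, k + 1)) / 7.0f;
}
```

Keeping z as the fastest index means the innermost loop walks contiguous memory, which is also the layout a GPU port would typically want for coalesced accesses.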