High-performance CUDA kernel generation and benchmarking framework
View the Project on GitHub jasonlarkin/cuda-stencil-benchmark
CUDA Stencil Benchmark is a framework for systematically generating, validating, and optimizing CUDA kernels for 3D finite-difference stencil computations using LLM-guided code generation.
LLM-Guided Kernel Generation: Use language models to generate optimized CUDA kernels from task specifications and reference implementations.
Correctness-First Validation: Ensure numerical parity between generated CUDA kernels and CPU reference implementations before performance optimization.
Systematic Performance Analysis: Characterize kernel performance with the roofline model to determine whether each kernel is memory-bound or compute-bound.
Iterative Optimization Loop: Establish a feedback-driven workflow in which the results of kernel generation, correctness testing, and performance analysis inform each subsequent optimization attempt.
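The correctness and roofline goals above can be illustrated with a minimal sketch. It is not the framework's actual API: the function names `passes_parity` and `roofline_bound`, the tolerances, and the per-point FLOP and byte counts are all assumptions chosen for illustration. The sketch checks a candidate result against a reference within a tolerance, then classifies a kernel as memory-bound or compute-bound by comparing its arithmetic intensity with the machine's ridge point.

```python
import math


def passes_parity(reference, candidate, rtol=1e-5, atol=1e-6):
    """Return True if every candidate value matches the reference.

    Hypothetical helper: the tolerances are illustrative defaults,
    not values taken from the project.
    """
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rtol, abs_tol=atol)
        for r, c in zip(reference, candidate)
    )


def roofline_bound(flops_per_point, bytes_per_point, peak_gflops, peak_gbs):
    """Return (arithmetic intensity, attainable GFLOP/s, regime).

    Standard roofline model: attainable performance is the smaller of
    the compute peak and (arithmetic intensity * memory bandwidth).
    """
    intensity = flops_per_point / bytes_per_point   # FLOP per byte
    ridge = peak_gflops / peak_gbs                  # intensity where the roofs meet
    attainable = min(peak_gflops, intensity * peak_gbs)
    regime = "memory-bound" if intensity < ridge else "compute-bound"
    return intensity, attainable, regime


# Assumed example: a 7-point single-precision stencil counted as 8 FLOPs
# per grid point, with ideal caching so that each point costs one 4-byte
# read and one 4-byte write. The hardware peaks are made-up round numbers.
intensity, attainable, regime = roofline_bound(
    flops_per_point=8,
    bytes_per_point=8,
    peak_gflops=10_000.0,
    peak_gbs=1_000.0,
)
print(f"intensity={intensity:.2f} FLOP/B, "
      f"attainable={attainable:.0f} GFLOP/s, {regime}")
```

Under these assumed counts the stencil sits well below the ridge point, which is the usual reason stencil optimization focuses on memory traffic rather than arithmetic. Running the parity check before the roofline step mirrors the correctness-first ordering described above.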