Project Goal

Overview

CUDA Stencil Benchmark is a framework for systematically generating, validating, and optimizing CUDA kernels for 3D finite-difference stencil computations using LLM-guided code generation.
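
For readers new to the term, the following is a minimal sketch of a CPU reference for one 3D finite-difference stencil sweep. It assumes a 7-point stencil with two coefficients, an x-fastest memory layout, and boundaries copied unchanged; the source does not specify any of these, and the names idx and stencil7_reference are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: x is the fastest-varying index, then y, then z.
inline std::size_t idx(std::size_t x, std::size_t y, std::size_t z,
                       std::size_t nx, std::size_t ny) {
    return (z * ny + y) * nx + x;
}

// One sweep of an assumed 7-point stencil over the interior of an
// nx * ny * nz grid:
//   out = c0 * center + c1 * (sum of the six face neighbors)
// Boundary cells keep their input values.
void stencil7_reference(const std::vector<float>& in, std::vector<float>& out,
                        std::size_t nx, std::size_t ny, std::size_t nz,
                        float c0, float c1) {
    assert(in.size() == nx * ny * nz);
    out = in;  // copy first so boundary cells are preserved
    if (nx < 3 || ny < 3 || nz < 3) return;  // no interior points
    for (std::size_t z = 1; z + 1 < nz; ++z) {
        for (std::size_t y = 1; y + 1 < ny; ++y) {
            for (std::size_t x = 1; x + 1 < nx; ++x) {
                const float neighbors =
                    in[idx(x - 1, y, z, nx, ny)] + in[idx(x + 1, y, z, nx, ny)] +
                    in[idx(x, y - 1, z, nx, ny)] + in[idx(x, y + 1, z, nx, ny)] +
                    in[idx(x, y, z - 1, nx, ny)] + in[idx(x, y, z + 1, nx, ny)];
                out[idx(x, y, z, nx, ny)] =
                    c0 * in[idx(x, y, z, nx, ny)] + c1 * neighbors;
            }
        }
    }
}
```

With c0 = -6 and c1 = 1 this reduces to an unscaled discrete Laplacian, so a constant input field yields zero at every interior point. A generated CUDA kernel would typically map one thread to each (x, y, z) iteration of the loop nest.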

Primary Objectives

LLM-Guided Kernel Generation: Use language models to generate optimized CUDA kernels from task specifications and reference implementations.
Correctness-First Validation: Ensure numerical parity between generated CUDA kernels and CPU reference implementations before performance optimization.
Systematic Performance Analysis: Characterize kernel performance with the roofline model to determine whether each kernel is memory-bound or compute-bound.
Iterative Optimization Loop: Establish a feedback-driven workflow where kernel generation, correctness testing, and performance analysis inform subsequent optimization attempts.
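
The roofline classification mentioned in the objectives can be sketched as follows. This is an illustrative helper, not the framework's actual analysis code: the Roofline struct and function names are hypothetical, and the peak throughput and bandwidth figures must come from the target GPU's specifications or measurements.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical description of a device's two roofline ceilings.
struct Roofline {
    double peak_gflops;  // peak compute throughput, GFLOP/s
    double peak_gbps;    // peak memory bandwidth, GB/s
};

// Arithmetic intensity in FLOP/byte for a kernel that performs `flops`
// floating-point operations while moving `bytes` to and from memory.
double arithmetic_intensity(double flops, double bytes) {
    assert(bytes > 0.0);
    return flops / bytes;
}

// Roofline bound: performance is capped by either the compute ceiling
// or the bandwidth ceiling scaled by arithmetic intensity.
double attainable_gflops(const Roofline& r, double ai) {
    assert(ai >= 0.0);
    return std::min(r.peak_gflops, ai * r.peak_gbps);
}

// A kernel is memory-bound when the bandwidth ceiling is the lower one,
// i.e. its arithmetic intensity lies below the ridge point
// peak_gflops / peak_gbps.
bool is_memory_bound(const Roofline& r, double ai) {
    return ai * r.peak_gbps < r.peak_gflops;
}
```

As a worked example under stated assumptions: a 7-point stencil doing 8 FLOPs per point while ideally moving 8 bytes per point (one float read, one float written) has an arithmetic intensity of 1 FLOP/byte. On a placeholder device with 10000 GFLOP/s and 1000 GB/s the ridge point is 10 FLOP/byte, so such a kernel would be memory-bound with an attainable ceiling of 1000 GFLOP/s.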

Target Applications

3D finite-difference time-domain (FDTD) stencil computations
High-performance scientific computing kernels
GPU acceleration of memory-bound numerical methods
Research into LLM-assisted code generation for HPC

Success Criteria

Correctness: Generated kernels match the CPU reference output within a tolerance of 1e-5
Performance: Generated kernels achieve a 5-10× speedup over the CPU baseline
Reproducibility: Automated testing and benchmarking framework
Documentation: Clear workflow and methodology documentation
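
A minimal sketch of the correctness criterion follows. It assumes the 1e-5 tolerance is a maximum absolute difference over all grid points; the source does not say whether the tolerance is absolute or relative, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Largest absolute element-wise difference between the CPU reference
// output and the GPU output (copied back to the host).
// Returns +infinity if any difference is NaN so that NaNs never pass.
float max_abs_diff(const std::vector<float>& ref,
                   const std::vector<float>& test) {
    assert(ref.size() == test.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const float d = std::fabs(ref[i] - test[i]);
        if (std::isnan(d)) return std::numeric_limits<float>::infinity();
        worst = std::max(worst, d);
    }
    return worst;
}

// Assumption: the tolerance is absolute, defaulting to 1e-5.
bool within_tolerance(const std::vector<float>& ref,
                      const std::vector<float>& test,
                      float tol = 1e-5f) {
    return max_abs_diff(ref, test) <= tol;
}
```

An absolute tolerance is the simplest reading, but it becomes strict for fields with large magnitudes in single precision; a relative or mixed criterion may be more appropriate if the reference values vary widely in scale.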

Non-Goals

General-purpose CUDA code generation: the framework targets stencil computations only
PyTorch integration: the implementation is pure C++/CUDA
Production deployment: this is a research and benchmarking framework