CUDA Stencil Benchmark
A framework for generating and benchmarking high-performance CUDA kernels with LLM-guided code generation. Each generated kernel is validated for correctness and characterized for performance.
Quick Navigation
Core Documentation
- GOAL.md: Project objectives and success criteria
- DESIGN.md: System architecture and design decisions
- WORKFLOW.md: LLM-guided kernel generation workflow
- RESULTS.md: Performance results and analysis
Technical Documentation
Overview
This framework demonstrates a systematic approach to generating and optimizing CUDA kernels using language models, with emphasis on:
- Correctness-First Validation: Automated numerical parity verification
- Performance Analysis: Roofline methodology for quantitative characterization
- Iterative Optimization: Feedback-driven workflow for continuous improvement
- Extensible Design: Support for multiple computational patterns
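As an illustration of the correctness-first idea, numerical parity can be checked by comparing a kernel's output (copied back to the host) against a trusted reference, element by element, under a mixed absolute/relative tolerance. The sketch below is not the project's validator: `ParityResult`, `check_parity`, and the default tolerances `atol` and `rtol` are hypothetical names and placeholder values chosen for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result record for this sketch; not taken from the project.
struct ParityResult {
    bool ok;                     // true if every element is within tolerance
    std::size_t first_mismatch;  // index of the first failing element, or size() if none
    double max_abs_err;          // largest finite absolute error observed
};

// Compare a candidate output against a reference of the same length.
// An element passes when |out - ref| <= atol + rtol * |ref|.
// The tolerance defaults are placeholders, not the project's settings.
ParityResult check_parity(const std::vector<float>& ref,
                          const std::vector<float>& out,
                          double atol = 1e-5, double rtol = 1e-4) {
    assert(ref.size() == out.size());
    ParityResult r{true, ref.size(), 0.0};
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double expected = static_cast<double>(ref[i]);
        const double actual = static_cast<double>(out[i]);
        const double err = std::fabs(actual - expected);
        const double tol = atol + rtol * std::fabs(expected);
        if (err > r.max_abs_err) r.max_abs_err = err;
        // Written as !(err <= tol) so that a NaN error also counts as a failure.
        if (!(err <= tol)) {
            if (r.ok) r.first_mismatch = i;
            r.ok = false;
        }
    }
    return r;
}
```

Reporting the first mismatching index alongside the pass/fail flag makes it easier to tell a boundary-handling bug (failures at the edges of the grid) from a general arithmetic difference.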
Key Results
- Baseline Performance: ~70.4 GFLOP/s on a T4 GPU, about 92% of the roofline ceiling
- Correctness Validation: Automated parity testing identified optimization issues
- Iterative Development: Multiple kernel attempts demonstrate the systematic workflow
- Workflow Generality: Extended to matrix multiplication to explore broader applicability
For detailed results and analysis, see RESULTS.md.
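To make the roofline figure in Key Results concrete: in the standard roofline model, attainable throughput is the smaller of peak compute and memory bandwidth multiplied by arithmetic intensity (FLOPs per byte moved). The helpers below are a minimal sketch of that arithmetic. The function names are hypothetical, and the hardware inputs the project actually used are not stated here, so none are assumed.

```cpp
#include <algorithm>
#include <cassert>

// Roofline ceiling in GFLOP/s: min(peak compute, bandwidth * arithmetic intensity).
// peak_gflops    : peak compute throughput in GFLOP/s
// bandwidth_gbs  : memory bandwidth in GB/s
// flops_per_byte : arithmetic intensity of the kernel
double roofline_ceiling(double peak_gflops, double bandwidth_gbs,
                        double flops_per_byte) {
    return std::min(peak_gflops, bandwidth_gbs * flops_per_byte);
}

// Fraction of the ceiling that a measured throughput reaches.
double roofline_fraction(double measured_gflops, double ceiling_gflops) {
    assert(ceiling_gflops > 0.0);
    return measured_gflops / ceiling_gflops;
}

// Inverse of the above: the ceiling implied by a measurement and its fraction.
double implied_ceiling(double measured_gflops, double fraction) {
    assert(fraction > 0.0);
    return measured_gflops / fraction;
}
```

For example, `implied_ceiling(70.4, 0.92)` gives roughly 76.5 GFLOP/s, which is the ceiling implied by the approximate baseline numbers above.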
Getting Started
See the main README.md for build instructions and the quick start guide.