High-performance CUDA kernel generation and benchmarking framework
View the Project on GitHub jasonlarkin/cuda-stencil-benchmark
main (stable, production-ready)
↑ merge
develop (integration branch)
↑ merge
feature/cuda-kernels (CUDA kernel development)
git checkout develop
git checkout -b feature/cuda-kernels
cuda/kernels/XXX_name/kernel.cu
kernel-XXX: [description]
git checkout develop
git merge --no-ff feature/cuda-kernels
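The path pattern cuda/kernels/XXX_name/kernel.cu and the commit prefix kernel-XXX share the same kernel index, so both can be derived from one number. The following is a minimal sketch of that idea, not part of the repository: the function names kernel_id, kernel_path, and kernel_subject are hypothetical, and the three-digit zero padding is an assumption based on the example directories 000_baseline, 001_tiled, and 002_sliding.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the repository): derive the kernel
# path and the commit subject from a kernel index and a short name.

# Zero-pad the index to three digits (assumed from 000_baseline etc.).
kernel_id() {
    # Strip leading zeros first so printf does not parse "008" as octal.
    n=$(printf '%s' "$1" | sed 's/^0*//')
    printf '%03d' "${n:-0}"
}

# Build the source path for a kernel, e.g. index 2 and name "sliding".
kernel_path() {
    printf 'cuda/kernels/%s_%s/kernel.cu' "$(kernel_id "$1")" "$2"
}

# Build a commit subject in the "kernel-XXX: [description]" form.
kernel_subject() {
    printf 'kernel-%s: %s' "$(kernel_id "$1")" "$2"
}
```

With these helpers sourced, a commit for a new kernel could be written as git commit -m "$(kernel_subject 2 'Add sliding window kernel (LLM-generated)')", which keeps the directory number and the commit prefix in step.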
Kernel additions:
kernel-XXX: Add [description] kernel (LLM-generated)
Fixes:
kernel-XXX: Fix [issue type] - [description]
Validation:
kernel-XXX: [Status] - [result summary]
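The three message formats above all start with the same kernel-XXX: prefix, which makes the convention easy to check mechanically. As a sketch only, a commit-msg hook along the following lines could reject subjects that do not match. The function name valid_subject is hypothetical, the three-digit index is an assumption taken from the kernel-000 style examples, and allowing "Merge ..." subjects is an assumption made so that git merge --no-ff commits still pass.

```shell
#!/bin/sh
# Sketch of a commit-msg hook (assumed location: .git/hooks/commit-msg,
# marked executable). Not part of the repository.

# Return 0 if the subject follows "kernel-XXX: [description]".
valid_subject() {
    case "$1" in
        # Three digits, a colon and space, then a non-empty description.
        kernel-[0-9][0-9][0-9]": "?*) return 0 ;;
        # Assumption: default merge subjects are let through unchanged.
        "Merge "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Git passes the path of the commit message file as the first argument.
if [ -n "${1:-}" ] && [ -f "$1" ]; then
    subject=$(head -n 1 "$1")
    if ! valid_subject "$subject"; then
        echo "commit subject must look like 'kernel-XXX: [description]'" >&2
        exit 1
    fi
fi
```

The check covers only the prefix. It does not distinguish between additions, fixes, and validation messages, since those differ only in free-form wording.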
git checkout develop
git checkout -b feature/cuda-kernels
# Kernel 000: Baseline
git add cuda/kernels/000_baseline/
git commit -m "kernel-000: Add baseline z-coalesced CUDA kernel (LLM-generated)"
git commit -m "kernel-000: Fix compilation errors"
git commit -m "kernel-000: Fix correctness, passes parity test"
git commit -m "kernel-000: Performance analysis - near roofline, 9x speedup"
# Kernel 001: Tiled attempt
git add cuda/kernels/001_tiled/
git commit -m "kernel-001: Add x-y tiled kernel (LLM-generated, optimization attempt)"
git commit -m "kernel-001: Correctness issues identified, slower than baseline"
# Kernel 002: Sliding window
git add cuda/kernels/002_sliding/
git commit -m "kernel-002: Add sliding window kernel (LLM-generated, refined approach)"
git commit -m "kernel-002: Passes correctness, performance analysis complete"
# Merge to develop
git checkout develop
git merge --no-ff feature/cuda-kernels