FlashInfer

I recently participated in the MLSys 2026 - NVIDIA Track: FlashInfer AI Kernel Generation Contest (FlashInfer Contest, 2026a). This post is not a tutorial on CUDA kernel optimization, and I am not an expert in GPU operator development. My main goal was to use a highly verifiable task environment with clear feedback to study how coding agents can keep producing high-quality GPU kernels in a closed-loop workflow. The full materials are split into two reports: Harness Engineering for LLM-Driven GPU Kernel Generation (Shui et al., 2026) and Full-Agent Kernel Generation for FlashInfer (Ma et al., 2026). The code is available in mlsys26-flashinfer-contest. ...