How to Choose the Right Agent Architecture?
Context management for long-horizon tasks and coordination across multiple execution units are among the central problems in current Agent research. In LLM Agents I introduced the three core modules of an Agent: planning, memory, and tool use. In Self-Evolving Agents and the FlashInfer Contest Summary I discussed the paradigm of Harness Engineering, in which humans design the constraints, feedback, and evaluation, while the Agent iterates inside a controlled closed loop to produce verifiable results. This post focuses on a more concrete layer of the problem: when tasks grow longer and more complex, what architecture should we use to organize an Agent? ...
GPU Kernel Generation and Optimization with Coding Agents: MLSys 2026 FlashInfer Contest Summary
I recently participated in the MLSys 2026 - NVIDIA Track: FlashInfer AI Kernel Generation Contest (FlashInfer Contest, 2026a). This post is not a tutorial on CUDA kernel optimization, and I am not an expert in GPU kernel development. My main goal was to use a highly verifiable task environment with clear feedback to study how coding agents can continuously produce high-quality GPU kernels in a closed-loop workflow. The full materials are split into two reports: Harness Engineering for LLM-Driven GPU Kernel Generation (Shui et al., 2026) and Full-Agent Kernel Generation for FlashInfer (Ma et al., 2026). The code is available in mlsys26-flashinfer-contest. ...
Self-Evolving Agents
A structural shift is underway in AI: the core capability of agents is moving from one-shot answer generation to continually producing verifiable, self-improving results in closed-loop systems. A representative milestone is DeepMind’s release of AlphaEvolve, an LLM-driven evolutionary coding agent that has delivered breakthroughs in mathematics, algorithm design, and engineering optimization, in several cases improving upon best-known human-designed baselines. Under this paradigm, the division of labor between humans and agents is clearly reconfigured: ...
DeepSeek-V3.2 Series
By introducing DeepSeek Sparse Attention (DSA), a scalable reinforcement learning framework, and a large-scale agentic task synthesis pipeline, DeepSeek-V3.2 (DeepSeek-AI, 2025) achieves reasoning capabilities and agent performance comparable to GPT-5. [Fig. 1. Benchmark of DeepSeek-V3.2 and its counterparts. (Image source: DeepSeek-AI, 2025)] DeepSeek Sparse Attention: [Fig. 2. Attention architecture of DeepSeek-V3.2, where DSA is instantiated under MLA. (Image source: DeepSeek-AI, 2025)] ...
Scaling Laws
The evolution of the GPT series has gradually shown researchers that as long as model parameters, training data, and compute resources are continuously scaled up, the performance of large models improves along a stable and predictable path. This predictability is characterized by Scaling Laws, which provide the theoretical foundation and practical confidence for high-cost pre-training. As model scale, alignment techniques, and inference-time compute co-evolve, the boundaries of AI capabilities keep expanding. Scaling laws are not only the foundation for building next-generation models but also a key methodology for continuously improving model capabilities under compute constraints. ...