👋 Welcome to Yue’s blog

Hi, this is Yue Shui. I’m currently working on LLM algorithms, with a focus on Agentic RL. My past experience includes researching and applying LLMs in fields such as finance, audit, and code generation. This blog is where I document and share insights from my work and learning journey. The grammar mistakes in the posts might give you a hint about ChatGPT’s involvement 😉 (let me know if you spot any!). My interests include model training, RAG, and LLM agents. Feel free to connect!

How to Choose the Right Agent Architecture?

Context management for long-horizon tasks and coordination across multiple execution units are among the central problems in current Agent research. In LLM Agents I introduced the three core modules of an Agent—planning, memory, and tool use; in Self-Evolving Agents and the FlashInfer Contest Summary I discussed the paradigm of Harness Engineering: humans design the constraints, feedback, and evaluation, while the Agent iterates inside a controlled closed loop to produce verifiable results. This post focuses on a more concrete layer of the problem: when tasks grow longer and more complex, what architecture should we use to organize an Agent? ...

Created: 2026-06-21 · Updated: 2026-06-21 · 21 min · 4450 words · Yue Shui

GPU Kernel Generation and Optimization with Coding Agents: MLSys 2026 FlashInfer Contest Summary

I recently participated in the MLSys 2026 - NVIDIA Track: FlashInfer AI Kernel Generation Contest (FlashInfer Contest, 2026a). This post is not a tutorial on CUDA kernel optimization, and I am not an expert in GPU operator development. My main goal was to use a highly verifiable task environment with clear feedback to study how coding agents can continuously produce high-quality GPU kernels in a closed-loop workflow. The full materials are split into two reports: Harness Engineering for LLM-Driven GPU Kernel Generation (Shui et al., 2026) and Full-Agent Kernel Generation for FlashInfer (Ma et al., 2026). The code is available in mlsys26-flashinfer-contest. ...

Created: 2026-05-18 · Updated: 2026-05-25 · 10 min · 2046 words · Yue Shui

Self-Evolving Agents

A structural shift is underway in AI: the core capability of agents is moving from one-shot answer generation to continually producing verifiable, self-improving results in closed-loop systems. A representative milestone is DeepMind’s release of AlphaEvolve, an LLM-driven evolutionary coding agent that has delivered breakthroughs in mathematics, algorithm design, and engineering optimization, in several cases improving upon best-known human-designed baselines. Under this paradigm, the division of labor between humans and agents is clearly reconfigured: ...

Created: 2026-02-20 · Updated: 2026-03-16 · 14 min · 2785 words · Yue Shui

DeepSeek-V3.2 Series

By introducing DeepSeek Sparse Attention (DSA), a scalable reinforcement learning framework, and a large-scale agentic task synthesis pipeline, DeepSeek-V3.2 (DeepSeek-AI, 2025) achieves reasoning capabilities and agent performance comparable to GPT-5.

Fig. 1. Benchmark of DeepSeek-V3.2 and its counterparts. (Image source: DeepSeek-AI, 2025)

DeepSeek Sparse Attention

Fig. 2. Attention architecture of DeepSeek-V3.2, where DSA is instantiated under MLA. (Image source: DeepSeek-AI, 2025) ...

Created: 2025-12-31 · Updated: 2025-12-31 · 14 min · 2917 words · Yue Shui

Scaling Laws

From the evolution of the GPT series, researchers have gradually realized that as long as model parameters, training data, and compute resources are continuously scaled up, the performance of large models improves along a stable and predictable path. This predictability is characterized by Scaling Laws, which provide the theoretical foundation and practical confidence for high-cost pre-training. As model scale, alignment techniques, and inference-time compute co-evolve, the boundaries of AI capabilities are being systematically pushed. Scaling laws are not only the foundation for building next-generation models but also a key methodology for continuously improving model capabilities under compute constraints. ...

Created: 2025-11-19 · Updated: 2025-12-03 · 12 min · 2365 words · Yue Shui