👋 Welcome to Yue’s blog

Hi, this is Yue Shui. I'm currently working on LLM algorithms, with a focus on Agentic RL. My past experience includes researching and applying LLMs in fields such as finance, auditing, and code generation. This blog is where I document and share insights from my work and learning journey. Any grammar mistakes in the posts might hint at ChatGPT's involvement 😉; let me know if you spot any! My interests include model training, RAG, and LLM agents. Feel free to connect!

Agentic RL

As Large Language Models (LLMs) achieve breakthroughs in natural language processing, their applications continue to expand. However, they also exhibit limitations such as knowledge cutoffs, hallucinations, and weaknesses in complex computation and logical reasoning. To address these challenges, Agentic RL, which combines agents with Reinforcement Learning (RL), is emerging as a key research direction. Agentic RL equips LLMs with capabilities such as autonomous planning, decision-making, tool use, and environmental interaction by closing the loop with the external world (e.g., search engines, code interpreters, databases, browsers) and continuously optimizing against reward signals. In practice, such an agent not only understands requirements and plans autonomously but also keeps correcting and improving itself within an execution-feedback loop. ...
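
A minimal sketch of that closed loop, where `llm_policy`, `search_tool`, and the reward logic are hypothetical stand-ins for a real model, tool, and training setup, not the post's actual implementation:

```python
# Minimal sketch of an agentic RL interaction loop (illustrative only).
# `llm_policy`, `search_tool`, and the reward logic are hypothetical stand-ins.

def search_tool(query: str) -> str:
    """Pretend external tool, e.g., a search engine or code interpreter."""
    return f"results for: {query}"

def llm_policy(history: list[str]) -> str:
    """Pretend LLM: maps the interaction history to the next action."""
    return "SEARCH: latest RL papers" if len(history) < 3 else "ANSWER: done"

def rollout(max_turns: int = 5) -> tuple[list[str], float]:
    history, reward = [], 0.0
    for _ in range(max_turns):
        action = llm_policy(history)
        history.append(action)
        if action.startswith("ANSWER:"):
            reward = 1.0  # terminal reward, e.g., answer correctness
            break
        # Tool call: execute in the environment, feed the observation back.
        observation = search_tool(action.removeprefix("SEARCH: "))
        history.append(observation)
    return history, reward

# Trajectories like this would be scored and used to update the policy
# with an RL algorithm such as PPO or GRPO.
trajectory, reward = rollout()
print(reward, trajectory)
```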

Created: 2025-09-30 · Updated: 2025-09-30 · 24 min · 5072 words · Yue Shui

gpt-oss & GPT-5

In August 2025, the AI field witnessed a period of intensive releases from OpenAI. Following GPT-2 (OpenAI, 2019), OpenAI once again contributed to the open-source community with gpt-oss (OpenAI, 2025), its first open-weight large language model series since then, available in 120B and 20B sizes. Shortly after, the highly anticipated next-generation flagship model, GPT-5 (OpenAI, 2025), was officially launched. This series of releases not only marks a new high for open-weight models in reasoning and agentic capabilities but also reveals OpenAI's latest advances in model architecture, training methodology, and safety alignment. ...

Created: 2025-08-24 · Updated: 2025-08-24 · 12 min · 2541 words · Yue Shui

Large Language Model Inference

In recent years, Large Language Models (LLMs) have achieved revolutionary breakthroughs in fields such as natural language processing, code generation, and even multimodal interaction. However, the powerful capabilities of these models come at the cost of enormous computational and memory overhead, especially during the inference stage. Efficiently deploying and running these models, which have billions or even trillions of parameters, has become a core challenge in scaling LLM technology for real-world applications. ...
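
To give a rough sense of that memory overhead, here is a back-of-envelope estimate of KV cache size for a hypothetical 7B-class decoder; the config values are assumptions for illustration, not any specific model:

```python
# Back-of-envelope KV cache estimate for a hypothetical 7B-class decoder.
# All config values below are illustrative assumptions.
n_layers, n_kv_heads, head_dim = 32, 32, 128
seq_len, batch_size = 4096, 8
bytes_per_elem = 2  # fp16/bf16

# Per token, each layer stores one K and one V vector per KV head.
kv_bytes = (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # -> 16.0 GiB
```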

Created: 2025-06-29 · Updated: 2025-06-29 · 43 min · 9025 words · Yue Shui

vLLM: High-Throughput, Memory-Efficient LLM Serving

As the parameters of Large Language Models (LLMs) continue to grow, deploying and serving these models presents significant challenges. vLLM is an open-source library designed for fast, convenient, and cost-effective LLM inference and online serving. Its core is the PagedAttention algorithm, which efficiently manages the KV cache of the attention mechanism. To evaluate the performance of LLM inference and serving engines, we primarily focus on the following metrics: ...
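
A toy sketch of the paging idea behind PagedAttention: KV cache entries live in fixed-size physical blocks, and each sequence keeps a block table of indices instead of one contiguous buffer. This is a simplified illustration under assumed names, not vLLM's actual implementation:

```python
# Toy illustration of paged KV cache management (not vLLM's real code).
BLOCK_SIZE = 16  # tokens per block

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))      # physical block pool
        self.block_tables: dict[int, list[int]] = {}    # seq_id -> block ids

    def append_token(self, seq_id: int, pos: int) -> int:
        """Return the physical block for token `pos`, allocating on demand."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos % BLOCK_SIZE == 0:  # previous block is full (or first token)
            table.append(self.free_blocks.pop())
        return table[pos // BLOCK_SIZE]

    def free(self, seq_id: int) -> None:
        """Finished sequences return their blocks to the pool immediately."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))

cache = PagedKVCache(num_blocks=64)
for pos in range(40):  # a 40-token sequence occupies ceil(40/16) = 3 blocks
    cache.append_token(seq_id=0, pos=pos)
print(cache.block_tables[0])  # three possibly non-contiguous physical blocks
cache.free(seq_id=0)
```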

Created: 2025-05-17 · Updated: 2025-05-17 · 20 min · 4204 words · Yue Shui

Multimodal Large Language Models

Humans interact with the world through multiple senses (vision, hearing, touch, etc.), and each sensory channel has unique advantages in representing and communicating particular concepts. This multimodal interaction underpins our deep understanding of the world. A core goal of artificial intelligence is to develop general-purpose assistants that can effectively follow multimodal instructions (e.g., visual and linguistic) and perform a wide range of real-world tasks the way humans do. In recent years, with the release of models such as GPT-4o (OpenAI, 2024), Gemini 2.5 Pro (DeepMind, 2025), and o3/o4-mini (OpenAI, 2025), Multimodal Large Language Models (MLLMs) have made significant progress: they can not only understand information from multiple modalities such as images, videos, and audio, but also perform complex reasoning and generation. ...

Created: 2025-05-04 · Updated: 2025-05-04 · 48 min · 10182 words · Yue Shui