Scaling Laws

The evolution of the GPT series has gradually shown researchers that, as long as model parameters, training data, and compute keep scaling up, the performance of large models improves along a stable and predictable path. This predictability is captured by Scaling Laws, which provide the theoretical foundation and practical confidence for high-cost pre-training. As model scale, alignment techniques, and inference-time compute co-evolve, the boundaries of AI capabilities are being systematically pushed outward. Scaling laws are not only the foundation for building next-generation models but also a key methodology for continuously improving model capabilities under compute constraints. ...
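As a concrete instance of such a predictable path, a commonly cited parametric form is the compute-optimal loss fit of Hoffmann et al. (2022); the notation below follows that paper rather than this post:

$$
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

where $N$ is the number of parameters, $D$ the number of training tokens, and $E$ the irreducible loss; the reported fits give roughly $\alpha \approx 0.34$ and $\beta \approx 0.28$.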

Created: 2025-11-19 · Updated: 2025-12-03 · 12 min · 2365 words · Yue Shui

Agentic RL

As Large Language Models (LLMs) achieve breakthroughs in natural language processing, their applications continue to expand. However, they still exhibit limitations such as knowledge cutoffs, hallucinations, and weaknesses in complex computation and logical reasoning. To address these challenges, Agentic RL, which combines agents with Reinforcement Learning (RL), is emerging as a key research direction. Agentic RL equips LLMs with capabilities such as autonomous planning, decision-making, tool use, and environmental interaction by forming a closed interaction loop with the external world (e.g., search engines, code interpreters, databases, browsers) and continuously optimizing against reward signals. In practice, such an agent not only understands requirements and plans autonomously but also continually corrects and improves itself within an execution-feedback loop. ...
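To make the closed loop concrete, here is a minimal, self-contained sketch in which a simple epsilon-greedy bandit update stands in for a full RL algorithm; the tool names and the reward rule are hypothetical:

```python
import random

# Toy agentic loop: pick a tool (action), get environment feedback,
# and update the policy from the reward signal.
TOOLS = ["search", "code_interpreter", "database"]

def run_tool(tool: str, task: str) -> float:
    """Stand-in environment: the tool named in the task description
    earns full reward; any other tool earns little."""
    return 1.0 if tool in task else random.uniform(0.0, 0.3)

values = {t: 0.0 for t in TOOLS}  # running value estimate per tool
counts = {t: 0 for t in TOOLS}

task = "use search to answer the user question"  # hypothetical task
for step in range(500):
    # epsilon-greedy action selection
    if random.random() < 0.1:
        tool = random.choice(TOOLS)
    else:
        tool = max(values, key=values.get)
    reward = run_tool(tool, task)  # act and observe feedback
    counts[tool] += 1
    # incremental mean update: the closed-loop optimization step
    values[tool] += (reward - values[tool]) / counts[tool]

print({t: round(v, 2) for t, v in values.items()})  # 'search' should score highest
```

The point of the sketch is the loop structure (act, observe, receive reward, update), which Agentic RL scales up with real tools and learned policies.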

Created: 2025-09-30 · Updated: 2025-09-30 · 24 min · 5072 words · Yue Shui

gpt-oss & GPT-5

In August 2025, the AI field witnessed a period of intensive releases from OpenAI. For the first time since GPT-2 (OpenAI, 2019), OpenAI has again contributed to the open-source community with an open-weight large language model series, gpt-oss (OpenAI, 2025), available in 120B and 20B sizes. Shortly after, the highly anticipated next-generation flagship model, GPT-5 (OpenAI, 2025), was also officially launched. This series of releases not only marks a new high for open-weight models in reasoning and agent capabilities but also reveals OpenAI’s latest advances in model architecture, training methodology, and safety alignment. ...

Created: 2025-08-24 · Updated: 2025-08-24 · 12 min · 2541 words · Yue Shui

Large Language Model Inference

In recent years, Large Language Models (LLMs) have achieved revolutionary breakthroughs in fields such as natural language processing, code generation, and even multimodal interaction. However, the powerful capabilities of these models come at the cost of enormous computational and memory overhead, especially during the inference stage. Efficiently deploying and running these models, which have billions or even trillions of parameters, has become a core challenge in scaling LLM technology for real-world applications. ...
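To see where the memory overhead comes from, here is a back-of-the-envelope KV cache estimate; the layer, head, and dimension values below are an assumed Llama-7B-like configuration, not figures from the post:

```python
# KV cache size for a decoder-only transformer, illustrating why
# inference memory grows linearly with batch size and context length.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    # 2x for keys and values, stored per layer, per head, per token
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

size = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                      seq_len=4096, batch=8, dtype_bytes=2)  # fp16/bf16
print(f"{size / 2**30:.1f} GiB")  # 16.0 GiB for this assumed configuration
```

At this scale the cache alone can rival the model weights in memory footprint, which is why serving systems focus so heavily on managing it.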

Created: 2025-06-29 · Updated: 2025-06-29 · 43 min · 9025 words · Yue Shui

vLLM: High-Throughput, Memory-Efficient LLM Serving

As the parameters of Large Language Models (LLMs) continue to grow, deploying and serving these models presents significant challenges. vLLM is an open-source library designed for fast, convenient, and cost-effective LLM inference and online serving. Its core lies in the PagedAttention algorithm, which efficiently manages the KV Cache in the attention mechanism. Evaluation Metrics: To evaluate the performance of LLM inference and serving engines, we primarily focus on the following metrics: ...
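As a rough illustration of the paging idea (a conceptual sketch only, not vLLM's actual data structures or API):

```python
# Paged KV cache in the spirit of PagedAttention: the cache is split
# into fixed-size blocks, and a per-sequence block table maps logical
# token positions to physical blocks allocated on demand.

BLOCK_SIZE = 16  # tokens per physical block (illustrative value)

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))  # physical block pool
        self.block_tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id: int, pos: int) -> tuple[int, int]:
        """Return (physical_block, offset) for the token at logical pos,
        allocating a new block only when the previous one is full."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE >= len(table):
            table.append(self.free_blocks.pop())  # allocate on demand
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

cache = PagedKVCache(num_blocks=64)
for pos in range(40):  # a 40-token sequence needs only ceil(40/16) = 3 blocks
    block, offset = cache.append_token(seq_id=0, pos=pos)
print(cache.block_tables[0])  # e.g. [63, 62, 61]: no large contiguous buffer
```

Because blocks are allocated on demand, memory is bounded by the tokens actually generated rather than by a pre-reserved maximum context, which is the source of PagedAttention's memory efficiency.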

Created: 2025-05-17 · Updated: 2025-05-17 · 20 min · 4204 words · Yue Shui