Agentic RL
As Large Language Models (LLMs) achieve breakthroughs in natural language processing, their applications continue to expand. However, they also exhibit limitations such as knowledge cutoffs, hallucinations, and deficiencies in complex computation and logical reasoning. To address these challenges, Agentic RL, which combines agents with Reinforcement Learning (RL), is emerging as a key research direction. Agentic RL enables LLMs to possess capabilities like autonomous planning, decision-making, tool use, and environmental interaction by creating a closed-loop interaction with the external world (e.g., search engines, code interpreters, databases, browsers) and continuously optimizing through reward signals. In practical applications, it not only understands requirements and plans autonomously but also constantly corrects and optimizes within an execution-feedback loop. ...
gpt-oss & GPT-5
In August 2025, the AI field witnessed a period of intensive releases from OpenAI. Following GPT-2 (OpenAI, 2019) in 2019, OpenAI has once again contributed to the open-source community with its first open-weight large language model series, gpt-oss (OpenAI, 2025), available in 120B and 20B sizes. Shortly after, the highly anticipated next-generation flagship model, GPT-5 (OpenAI, 2025), was also officially launched. This series of releases not only marks a new high for open-source models in reasoning and agent capabilities but also reveals OpenAI’s latest advancements in model architecture, training methodologies, and safety alignment. ...
Large Language Model Inference
In recent years, Large Language Models (LLMs) have achieved revolutionary breakthroughs in fields such as natural language processing, code generation, and even multimodal interaction. However, the powerful capabilities of these models come at the cost of enormous computational and memory overhead, especially during the inference stage. Efficiently deploying and running these models, which have billions or even trillions of parameters, has become a core challenge in scaling LLM technology for real-world applications. ...
vLLM: High-Throughput, Memory-Efficient LLM Serving
As the parameters of Large Language Models (LLMs) continue to grow, deploying and serving these models presents significant challenges. vLLM is an open-source library designed for fast, convenient, and cost-effective LLM inference and online serving. Its core lies in the PagedAttention algorithm, which efficiently manages the KV Cache in the attention mechanism. Evaluation Metrics To evaluate the performance of LLM inference and serving engines, we primarily focus on the following metrics: ...
Multimodal Large Language Models
Humans interact with the world through multiple senses (vision, hearing, touch, etc.), with each sensory channel offering unique advantages in representing and communicating specific concepts. This multimodal interaction fosters our deep understanding of the world. One of the core goals in the field of artificial intelligence is to develop general-purpose assistants that can effectively follow multimodal instructions (such as visual and linguistic ones), enabling them to perform various real-world tasks like humans. In recent years, with the release of models like GPT-4o (OpenAI, 2024), Gemini 2.5 Pro (DeepMind, 2025), and o3/o4-mini (OpenAI, 2025), Multimodal Large Language Models (MLLMs) have made significant progress. They can not only understand information from multiple modalities like images, videos, and audio but also perform complex reasoning and generation. ...