gpt-oss & GPT-5

August 2025 brought a burst of releases from OpenAI. The company returned to the open-source community with gpt-oss (OpenAI, 2025), its first open-weight large language model series since GPT-2 (OpenAI, 2019), available in 120B and 20B sizes. Shortly after, the highly anticipated next-generation flagship model, GPT-5 (OpenAI, 2025), was officially launched. These releases not only mark a new high for open-source models in reasoning and agent capabilities but also reveal OpenAI’s latest advances in model architecture, training methodologies, and safety alignment. ...

Created: 2025-08-24 · Updated: 2025-08-24 · 12 min · 2541 words · Yue Shui

Multimodal Large Language Models

Humans interact with the world through multiple senses (vision, hearing, touch, etc.), with each sensory channel offering unique advantages in representing and communicating specific concepts. This multimodal interaction fosters our deep understanding of the world. One of the core goals of artificial intelligence is to develop general-purpose assistants that can effectively follow multimodal instructions (such as visual and linguistic ones) and so perform a variety of real-world tasks as humans do. In recent years, with the release of models like GPT-4o (OpenAI, 2024), Gemini 2.5 Pro (DeepMind, 2025), and o3/o4-mini (OpenAI, 2025), Multimodal Large Language Models (MLLMs) have made significant progress. They can not only understand information from multiple modalities such as images, video, and audio but also perform complex reasoning and generation. ...

Created: 2025-05-04 · Updated: 2025-05-04 · 48 min · 10182 words · Yue Shui