Lab Notebook · Vol. 03 / Architectures

The Open-Weight Decoder Stack, 2024–26

Fact sheets for 38 open language models: scale, attention mechanism, decoder type, and key design decisions, from Llama 3 to the late-2026 MoE/SSM hybrids.

All diagrams and base specs come from Sebastian Raschka's LLM Architecture Gallery; his Big LLM Architecture Comparison remains the canonical reference. Go support his work.

Closed Models: what we (think we) know

Reverse-engineered notes on the architectures of frontier proprietary models.

GPT-4

CLOSED

OpenAI · 2023-03

Rumored MoE with ~1.8T total params (8×220B experts). First GPT model to accept image input; vision access rolled out to users later in 2023. Architecture never officially disclosed.

MoE (rumored) Multimodal ~1.8T params
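
The rumored GPT-4 figure is plain multiplication, and a minimal sketch makes the arithmetic explicit. The function name and the choice of 2 active experts per token are assumptions for illustration only; the source gives just the 8×220B rumor. The estimate is also naive: it treats each expert as a standalone 220B block and ignores any shared attention or embedding parameters.

```python
def moe_param_estimate(num_experts: int, params_per_expert: float,
                       active_experts: int) -> tuple[float, float]:
    """Return (total, active-per-token) parameter counts for a naive MoE estimate.

    Naive means: every expert is counted in full and shared layers are ignored.
    """
    if not 1 <= active_experts <= num_experts:
        raise ValueError("active_experts must be between 1 and num_experts")
    total = num_experts * params_per_expert
    active = active_experts * params_per_expert
    return total, active


# Rumored configuration: 8 experts of 220B each.
# active_experts=2 is a hypothetical routing setting, not a disclosed value.
total, active = moe_param_estimate(8, 220e9, active_experts=2)
print(f"total ~ {total / 1e12:.2f}T params, active ~ {active / 1e9:.0f}B params per token")
```

Under these assumptions the total comes to 1.76T, which is where the rounded "~1.8T" comes from. The gap between total and active parameters is the reason MoE models can be large on disk yet comparatively cheap per token.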

GPT-4o

CLOSED

OpenAI · 2024-05

Dense or small MoE. Natively multimodal — text, vision, audio in a single end-to-end model. Faster inference than GPT-4. Exact architecture undisclosed.

MoE (possible) Natively Multimodal 128K context

Claude 3 Opus

CLOSED

Anthropic · 2024-03

Trained with Constitutional AI. Exact architecture undisclosed, likely a dense decoder. 200K-token context window. At release, Anthropic reported it ahead of GPT-4 on most of the benchmarks it published.

Likely Dense 200K context Constitutional AI

Gemini Ultra

CLOSED

Google · 2024-02

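Multimodal from the ground up: handles text, image, audio, and video natively. Architecture not officially described as MoE; the confirmed MoE design and 1M-token context window belong to Gemini 1.5 Pro, announced the same month. Gemini 1.0 Ultra itself shipped with a 32K-token context.

Architecture undisclosed Multimodal 32K context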

Gemini 2.5 Pro

CLOSED

Google · 2025-06

Confirmed MoE. Thinking mode (extended reasoning). At or near the top of many public benchmarks in mid-2025. Built-in support for Deep Research and agentic tasks.

MoE (confirmed) Thinking mode #1 benchmarks

Grok-3

CLOSED

xAI · 2025-02

Confirmed MoE. Trained on X (Twitter) data at massive scale. 128K context. Think mode for extended reasoning. Competes directly with GPT-4o and Claude Opus 4.

MoE (confirmed) Think mode 128K context

GPT-5

CLOSED

OpenAI · 2025-08

Architecture undisclosed, likely MoE. Strong agentic reasoning, improved tool use. Integrates extended thinking. Positioned as OpenAI's flagship 2025 model.

Likely MoE Agentic 400K context