Lab Notebook · Vol. 03 / Architectures

The Open-Weight Decoder Stack, 2024–26

Fact sheets for 38 open language models: scale, attention mechanism, decoder type, and key design decisions. From Llama 3 to the late-2026 MoE/SSM hybrids. Click any card for the architecture diagram.

All diagrams and base specs come from Sebastian Raschka's LLM Architecture Gallery; his Big LLM Architecture Comparison remains the canonical reference. Go support his work.

Models: 38
Categories: Dense, Sparse MoE, Hybrid
Earliest: Apr 2024
Latest: Mar 2026
// For AI Agents
https://iamsupersocks.com/llm-architectures.md
https://iamsupersocks.com/ai-signal.md
// Prompt examples
Fetch https://iamsupersocks.com/llm-architectures.md and summarize the latest MoE architectures
Fetch https://iamsupersocks.com/ai-signal.md and give me today's most important AI news
Release Timeline
Every major LLM release, 2020 → 2026: open vs closed, dense vs MoE.
Closed Models: what we (think we) know
Reverse-engineered notes on the architectures of frontier proprietary models.
GPT-4
CLOSED
OpenAI · 2023-03
Rumored MoE with ~1.8T total parameters (8×220B experts). OpenAI's first multimodal GPT model, with image input (Vision). Architecture never officially disclosed.
MoE (rumored) · Multimodal · ~1.8T params
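The rumored GPT-4 figures can be sanity-checked with simple arithmetic. The sketch below multiplies the rumored expert count by the rumored per-expert size to show where "~1.8T" comes from. The `top_k = 2` routing value and the helper names are hypothetical, chosen only for illustration; OpenAI has confirmed none of these numbers, and shared (non-expert) parameters are ignored.

```python
# Rumored figures from the GPT-4 card; none are confirmed by OpenAI.
NUM_EXPERTS = 8
PARAMS_PER_EXPERT = 220e9  # 220B parameters per expert (rumored)


def total_expert_params(num_experts: int, params_per_expert: float) -> float:
    """Naive total: expert count times per-expert size, ignoring shared layers."""
    return num_experts * params_per_expert


def active_expert_params(top_k: int, params_per_expert: float) -> float:
    """Expert parameters used per token if the router selects top_k experts."""
    return top_k * params_per_expert


total = total_expert_params(NUM_EXPERTS, PARAMS_PER_EXPERT)

# top_k = 2 is an assumption for illustration, not a disclosed value.
active = active_expert_params(2, PARAMS_PER_EXPERT)

print(f"total:  {total / 1e12:.2f}T")
print(f"active: {active / 1e9:.0f}B (assuming top-2 routing)")
```

The naive product is 1.76T, which rounds to the "~1.8T" quoted on the card. The gap between total and active parameters is the usual reason MoE models are cheaper to run per token than a dense model of the same total size.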
GPT-4o
CLOSED
OpenAI · 2024-05
Dense or small MoE. Natively multimodal: text, vision, and audio in a single end-to-end model. Faster inference than GPT-4. Exact architecture undisclosed.
MoE (possible) · Natively Multimodal · 128K context
Claude 3 Opus
CLOSED
Anthropic · 2024-03
Constitutional AI training. Exact architecture undisclosed, likely a dense decoder. 200K-token context window. Topped benchmarks at release, surpassing GPT-4.
Likely Dense · 200K context · Constitutional AI
Gemini Ultra
CLOSED
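Google · 2024-02
Multimodal from the ground up: handles text, image, audio, and video natively. The confirmed MoE design and 1M-token context window belong to Gemini 1.5 Pro, announced the same month; Google has not published either for Ultra 1.0, whose reported context is 32K tokens.
MoE (confirmed for Gemini 1.5) · Multimodal · 1M context (Gemini 1.5)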
Gemini 2.5 Pro
CLOSED
Google · 2025-06
Confirmed MoE. Thinking mode (extended reasoning). Ranked #1 on most benchmarks in mid-2025. Built-in support for Deep Research and agentic tasks.
MoE (confirmed) · Thinking mode · #1 benchmarks
Grok-3
CLOSED
xAI · 2025-02
Confirmed MoE. Trained on X (Twitter) data at massive scale. 128K context. Think mode for extended reasoning. Competes directly with GPT-4o and Claude Opus 4.
MoE (confirmed) · Think mode · 128K context
GPT-5
CLOSED
OpenAI · 2025-08
Architecture undisclosed, likely MoE. Strong agentic reasoning, improved tool use. Integrates extended thinking. Positioned as OpenAI's flagship 2025 model.
Likely MoE · Agentic · 400K context