Weekly Extract

The LLM week, compressed.

A focused weekly brief of the AI model, research, safety, and product updates worth reading. Built from LLMgram's canonical AI Signal pipeline and ranked by source quality, event relevance, and usefulness to builders. Click any item to open its full AI Signal card without leaving LLMgram.

10 signals selected · 7-day ranking window
Generated May 14, 2026 · 20:16 UTC · source refreshed May 14, 2026 · 20:11 UTC (fresh AI Signal data)
Top 10 This Week
01
Simon Willison · model · May 14, 2026

datasette-ip-rate-limit 0.1a0

Release: datasette-ip-rate-limit 0.1a0 The datasette.io site was being hammered by poorly-behaved crawlers, so I had Codex (GPT-5.5 xhigh) build a configurable rate-limiting plugin to block IPs that were hammering specific areas of the sit…

02
OpenAI · model · May 12, 2026

How NVIDIA engineers and researchers build with Codex

Teams use Codex with GPT-5.5 to ship production systems and turn research ideas into runnable experiments.

03
X/@aaron_epstein · other · May 12, 2026

@aaron_epstein: New model just released that beats sonnet 4.6, gemini 3 flash, and gpt 5.4 mini on OCR, vision, and

New model just released that beats sonnet 4.6, gemini 3 flash, and gpt 5.4 mini on OCR, vision, and STT tasks @interfaze_ai

04
arXiv cs.CL · model · May 14, 2026

HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model

arXiv:2605.11255v1 Announce Type: new Abstract: We present Hebatron, a Hebrew-specialized open-weight large language model built on the NVIDIA Nemotron-3 sparse Mixture-of-Experts architecture. Training employs a three-phase easy-to-hard c…

05
Bindu Reddy (X) · model · May 14, 2026

Gemini 3.2 Flash - Capitalizing on DeepMind's clever distillation techniques... Rumors are that benchmarks show it's hitting 92% of GPT 5.5's performance on co…

Gemini 3.2 Flash - Capitalizing on DeepMind's clever distillation techniques... Rumors are that benchmarks show it's hitting 92% of GPT 5.5's performance on coding and reasoning tasks while being 15-20x cheaper on inference costs. The late…

06
arXiv cs.LG · model · May 12, 2026

Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI

arXiv:2605.08137v1 Announce Type: new Abstract: Weight pruning is widely advocated for deploying Large Language Models on resource-constrained IoT and edge devices, yet its impact on model fairness remains poorly understood. We conduct a c…

07
Cursor (X) · model · May 12, 2026

Fast mode for Claude Opus 4.7 is now available in Cursor! It's 2.5x the speed at 6x the cost. For most tasks, we recommend using the standard speed.

08
Simon Willison · model · May 12, 2026

llm 0.32a2

Release: llm 0.32a2 A bunch of useful stuff in this LLM alpha, but the most important detail is this one: Most reasoning-capable OpenAI models now use the /v1/responses endpoint instead of /v1/chat/completions. This enables interleaved re…

09
arXiv cs.CL · model · May 14, 2026

Predicting Psychological Well-Being from Spontaneous Speech using LLMs

arXiv:2605.11303v1 Announce Type: new Abstract: We investigate the use of Large Language Models (LLMs) for zero-shot prediction of Ryff Psychological Well-Being (PWB) scores from spontaneous speech. Using a few minutes of voice recordings…

10
Bindu Reddy (X) · model · May 12, 2026

Opus 4.7 is released in fast mode... Will pass on it - it's NOT a great model and is insanely expensive In the meantime, we finally have DeepSeek flash working…

Opus 4.7 is released in fast mode... Will pass on it - it's NOT a great model and is insanely expensive In the meantime, we finally have DeepSeek flash working on a real-world use-case