Model Watch: Mercury 2 Diffusion

Technical Note · May 9, 2026 · 2 min read

Tags: ai, llm, diffusion, model-watch, mercury, inception

I don't typically hop on the "new shiny model" train, but Mercury 2 is absolutely worth mentioning. It's a diffusion model that can reach up to 1,000 tokens per second (TPS) with reliable function calling, and it's surprisingly affordable.

I've been running it on one of my OpenPaw agents for the last few days, and I'm blown away by both the speed and the cost.

Mercury 2 is roughly 5–10x faster in tokens/sec than typical fast autoregressive models, while also being cheaper per token than many of them. (source: atalupadhyay.wordpress)

Tokens-per-second speeds

Exact numbers for other vendors vary by hardware and benchmark. The table below uses representative public benchmark figures, and ratios where specific t/s numbers are not directly published, with Claude 4.5 Haiku and GPT 5 Mini as speed-optimized baselines. (source: geeky-gadgets)

| Model | Type | Approx. speed (tokens/sec) |
| --- | --- | --- |
| Mercury 2 | Diffusion reasoning LLM | ~1,000 t/s |
| Claude 4.5 Haiku | Speed-optimized LLM | ~90 t/s |
| GPT 5 Mini | Speed-optimized LLM | ~70 t/s |

These values are meant for relative comparison rather than exact vendor guarantees, but they illustrate that Mercury 2 sits in a distinct speed class.
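To make the speed gap concrete, here is a quick back-of-envelope latency comparison using the ballpark throughputs from the table above. The figures are the article's rough numbers, not vendor guarantees, and the calculation ignores network and prompt-processing (prefill) time.

```python
# Rough decode-time comparison at the approximate throughputs above.
# These t/s values are the article's ballpark figures, not guarantees.
SPEEDS_TPS = {
    "Mercury 2": 1000,
    "Claude 4.5 Haiku": 90,
    "GPT 5 Mini": 70,
}

def generation_seconds(tokens: int, tps: float) -> float:
    """Seconds to decode `tokens` at `tps` tokens/sec (ignores prefill/network)."""
    return tokens / tps

for model, tps in SPEEDS_TPS.items():
    print(f"{model}: {generation_seconds(500, tps):.1f}s for a 500-token reply")
```

At these rates a 500-token response takes about half a second on Mercury 2 versus five-plus seconds on the speed-optimized baselines, which is exactly the difference you feel in an interactive agent loop.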

Input/output pricing

Mercury 2’s pricing is explicitly documented. Competing “fast” models are typically more expensive per million tokens, though public price cards for every named model are not always broken out by variant in the same level of detail. (source: artificialanalysis)

| Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
| --- | --- | --- |
| Mercury 2 | $0.25 | $0.75 |
| Claude 4.5 Haiku | Typically higher | Typically higher |
| GPT 5 Mini | Typically higher | Typically higher |
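Those per-million-token rates are easier to feel as a workload cost. A minimal sketch, using Mercury 2's documented prices from the table above; the 10,000-call agent workload with 4k-token prompts is a made-up illustration, not a measured figure:

```python
# Back-of-envelope cost at Mercury 2's published rates.
MERCURY_INPUT_PER_M = 0.25   # USD per 1M input tokens
MERCURY_OUTPUT_PER_M = 0.75  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at Mercury 2's rates."""
    return (input_tokens * MERCURY_INPUT_PER_M
            + output_tokens * MERCURY_OUTPUT_PER_M) / 1_000_000

# Hypothetical workload: 10,000 agent calls, 4k-token prompts, 500-token replies.
total = 10_000 * request_cost(4_000, 500)
print(f"${total:.2f}")  # → $13.75
```

At these rates, ten thousand fairly chunky agent calls land under fifteen dollars, which is what makes Mercury 2 interesting for always-on agents.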

Definitely keep an eye on Inception Labs and the Mercury series!