Model Watch: Mercury 2 Diffusion

Technical Note · May 9, 2026 · 2 min read

Tags: ai, llm, diffusion, model-watch, mercury, inception

I don't typically hop on the "new shiny model" train, but Mercury 2 is absolutely worth mentioning. It's a diffusion model that can reach up to 1,000 tokens per second (TPS) with reliable function calling, and it's surprisingly affordable.

I've been running it on one of my OpenPaw agents for the last few days, and I'm blown away by both the speed and the cost.

Mercury 2 is roughly 5–10x faster in tokens/sec than typical fast autoregressive models, while also being cheaper per token than many of them. (source: atalupadhyay.wordpress)

Tokens-per-second speeds

Exact numbers for other vendors vary by hardware and benchmark. The table below uses representative public benchmark figures, and ratios where specific t/s numbers are not directly published, with Claude 4.5 Haiku and GPT 5 Mini as speed-optimized baselines. (source: geeky-gadgets)

| Model | Type | Approx. speed (tokens/sec) |
| --- | --- | --- |
| Mercury 2 | Diffusion reasoning LLM | ~1,000 t/s |
| Claude 4.5 Haiku | Speed-optimized LLM | ~90 t/s |
| GPT 5 Mini | Speed-optimized LLM | ~70 t/s |

These values are meant for relative comparison rather than exact vendor guarantees, but they illustrate that Mercury 2 sits in a distinct speed class.
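To make the speed gap concrete, here is a quick back-of-envelope latency comparison using the ballpark throughputs from the table above. The figures are the article's rough numbers, not vendor guarantees, and the calculation ignores network and prompt-processing (prefill) time.

```python
# Rough decode-time comparison at the approximate throughputs above.
# These t/s values are the article's ballpark figures, not guarantees.
SPEEDS_TPS = {
    "Mercury 2": 1000,
    "Claude 4.5 Haiku": 90,
    "GPT 5 Mini": 70,
}

def generation_seconds(tokens: int, tps: float) -> float:
    """Seconds to decode `tokens` at `tps` tokens/sec (ignores prefill/network)."""
    return tokens / tps

for model, tps in SPEEDS_TPS.items():
    print(f"{model}: {generation_seconds(500, tps):.1f}s for a 500-token reply")
```

At these rates a 500-token response takes about half a second on Mercury 2 versus five-plus seconds on the speed-optimized baselines, which is exactly the difference you feel in an interactive agent loop.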

Input/output pricing

Mercury 2’s pricing is explicitly documented. Competing “fast” models are typically more expensive per million tokens, though public price cards for every named model are not always broken out by variant in the same level of detail. (source: artificialanalysis)

| Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
| --- | --- | --- |
| Mercury 2 | $0.25 | $0.75 |
| Claude 4.5 Haiku | Typically higher | Typically higher |
| GPT 5 Mini | Typically higher | Typically higher |
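Those per-million-token rates are easier to feel as a workload cost. A minimal sketch, using Mercury 2's documented prices from the table above; the 10,000-call agent workload with 4k-token prompts is a made-up illustration, not a measured figure:

```python
# Back-of-envelope cost at Mercury 2's published rates.
MERCURY_INPUT_PER_M = 0.25   # USD per 1M input tokens
MERCURY_OUTPUT_PER_M = 0.75  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at Mercury 2's rates."""
    return (input_tokens * MERCURY_INPUT_PER_M
            + output_tokens * MERCURY_OUTPUT_PER_M) / 1_000_000

# Hypothetical workload: 10,000 agent calls, 4k-token prompts, 500-token replies.
total = 10_000 * request_cost(4_000, 500)
print(f"${total:.2f}")  # → $13.75
```

At these rates, ten thousand fairly chunky agent calls land under fifteen dollars, which is what makes Mercury 2 interesting for always-on agents.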

Definitely keep an eye on Inception Labs and the Mercury series!