LLM Directory

Compare pricing, context windows, and benchmarks across 150 models from 12 providers and 3 hosted platforms.

150 models | 25 free / open-source | Updated 2026-06-15

LLM Providers

Companies that build and host their own foundation models.

OpenAI

22 models

Leading AI lab and API provider. Creators of GPT, o-series reasoning models, and DALL·E.

From $5.00 | Budget $0.040

Anthropic

8 models

AI safety company. Creators of the Claude model family, emphasizing responsible AI development.

From $5.00 | Budget $0.625

Google

13 models

Google DeepMind's Gemini model family. 1M+ context windows, multimodal, competitive pricing.

From $2.50 | Budget $0.020

DeepSeek

6 models

Chinese AI lab offering frontier open-weight models at dramatically low prices.

From $0.300 | Budget $0.140

xAI

4 models

Elon Musk's AI company. Grok models with real-time X (Twitter) data access.

From $3.00 | Budget $0.200

Mistral

7 models

French AI company offering open-weight and commercial models. Known for efficient architectures.

From $2.00 | Budget $0.100

Cohere

6 models

Enterprise-focused AI company. Command R models for RAG, search, and business applications.

From $2.50 | Budget $0.050

Moonshot AI

5 models

Chinese AI company. Creators of Kimi models with strong long-context and coding abilities.

From $0.600 | Budget $0.150

MiniMax

5 models

Chinese AI company offering text, speech, and video generation models.

From $0.300 | Budget $0.010

Alibaba Cloud (Qwen)

10 models

Alibaba's Qwen model family. Open-weight and API-available. Strong multilingual support.

From $0.800 | Budget $0.010

NVIDIA

5 models

NVIDIA NIM inference platform. Hosts and serves popular open-weight models at scale.

From $0.800 | Budget $0.200

Hosted Platforms

Unified APIs that serve models from multiple providers — one key, many models.

Ollama

20 models

Run 100+ open-weight models locally. No API costs. The easiest way to self-host LLMs on Linux, macOS, and Windows.

20 free

OpenRouter

20 models

Unified API gateway to 400+ models from OpenAI, Anthropic, Google, Meta, and more. Pay per token, no subscriptions.

$0.050 – $15.00/1M

Together AI

11 models

Serverless GPU cloud for open-weight models. Fast inference, competitive pricing, 200+ models.

$0.010 – $0.880/1M
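The "/1M" figures in this directory are prices per 1M tokens, with input and output tokens billed at separate rates. As a minimal sketch of how those rates turn into a per-request cost, the snippet below uses prices copied from the GPT-4.1 and GPT-4.1 Nano rows of the comparison table. The function name `request_cost` and the token counts are hypothetical, and real bills may differ (caching discounts, batch pricing, and per-image or per-second models are not covered).

```python
from decimal import Decimal

TOKENS_PER_UNIT = 1_000_000  # directory prices are quoted per 1M tokens


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: str, output_price: str) -> Decimal:
    """Return the cost of one request given per-1M-token prices.

    Prices are passed as strings so Decimal keeps them exact.
    """
    cost_in = Decimal(input_price) * input_tokens / TOKENS_PER_UNIT
    cost_out = Decimal(output_price) * output_tokens / TOKENS_PER_UNIT
    return cost_in + cost_out


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
# Prices are taken from the GPT-4.1 ($2.00 / $8.00) and
# GPT-4.1 Nano ($0.100 / $0.400) rows of the comparison table.
gpt_41 = request_cost(10_000, 2_000, "2.00", "8.00")
gpt_41_nano = request_cost(10_000, 2_000, "0.100", "0.400")

print("GPT-4.1:", gpt_41)
print("GPT-4.1 Nano:", gpt_41_nano)
```

Because output tokens usually cost several times more than input tokens, the output length of a workload tends to dominate the comparison between models.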

Full Comparison

All models, grouped by provider, with price, context window, and benchmark scores.

Provider	Model	Category	Input / 1M	Output / 1M	Context	MMLU	HumanEval
OpenAI	GPT-5.5 OpenAI's widely available 2026 coding and reasoning workhorse for ChatGPT, Codex, and the API. Strong tool use with large context.	Flagship	$5.00	$30.00	1.0M	90	92
	GPT-5.6 OpenAI's July 2026 three-tier family (Sol / Terra / Luna) with programmatic tool calling in the Responses API. Sol is the frontier tier for coding, reasoning, and agent workflows.	Flagship	$5.00	$30.00	1.0M	91	93
	GPT-4.1 Long-context coding model with 1M token context window.	Flagship	$2.00	$8.00	1.0M	90.2	91
	GPT-4.1 Mini Affordable long-context model balancing speed and intelligence.	Mid-range	$0.400	$1.60	1.0M	86.5	87.2
	GPT-4.1 Nano Ultra-fast, ultra-cheap model for high-volume tasks.	Budget	$0.100	$0.400	1.0M	80.1	81.5
	GPT-4o Flagship multimodal model. Fast, capable, and widely deployed.	Flagship	$2.50	$10.00	128K	88.7	90.2
	GPT-4o Mini Fast, affordable small model for everyday tasks.	Budget	$0.150	$0.600	128K	82	87
	o3 Reasoning model for complex STEM, math, and multi-step tasks.	Reasoning	$10.00	$40.00	200K	92.9	92.3
	o3 Pro Highest-capability reasoning model for hardest problems.	Reasoning	$20.00	$80.00	200K	94	95.1
	o3 Deep Research Extended reasoning with web browsing for research tasks.	Reasoning	$10.00	$40.00	200K	—	—
	o4-mini Affordable reasoning model balancing cost and capability.	Reasoning	$1.10	$4.40	200K	87.5	89.8
	o4-mini Deep Research Budget research model with web browsing.	Reasoning	$1.10	$4.40	200K	—	—
	GPT-5 Next-generation general-purpose model with agentic capabilities.	Flagship	$1.25	$10.00	128K	89.2	91
	GPT-5.1 Improved GPT-5 with larger context and better reasoning.	Flagship	$2.00	$12.00	256K	91	93.5
	GPT-5.2 Latest GPT-5 iteration with strongest coding and math.	Flagship	$2.50	$15.00	256K	92.1	94.2
	GPT-5 Mini Fast, affordable variant of GPT-5 for everyday tasks.	Mid-range	$0.250	$1.50	128K	84	85
	GPT-5 Nano Ultra-cheap GPT-5 variant for high-volume classification.	Budget	$0.050	$0.300	128K	76.5	78
	GPT-5 Chat Chat-optimized variant of GPT-5 for conversational use.	Chat	$1.25	$10.00	128K	—	—
	GPT-5 Codex Code-generation focused variant of GPT-5.	Coding	$1.50	$12.00	256K	—	—
	Computer Use Preview Model specialized in GUI interaction and computer control.	Agentic	$3.00	$12.00	128K	—	—
	GPT-4.5 Preview Earlier preview model. Superseded by GPT-4.1 and GPT-5.	Legacy	$30.00	$60.00	128K	—	—
	DALL·E 3 Image generation model. Priced per image, not per token.	Image	$0.040	$0.040	—	—	—
Anthropic	Claude Opus 4.8 Current Anthropic flagship for coding, long-context, and multi-step agent loops. 1M context at significantly lower cost than prior Opus tiers.	Flagship	$5.00	$25.00	1.0M	90.5	93
	Claude 4 Opus Most capable Claude model for complex agentic workflows.	Flagship	$15.00	$75.00	200K	91.9	94.3
	Claude 4 Sonnet Latest Sonnet with extended thinking. Best price-performance.	Mid-range	$3.00	$15.00	200K	90.4	93.7
	Claude 4 Haiku Fast, cost-effective model for high-volume tasks.	Budget	$0.625	$2.50	200K	85.2	88.4
	Claude 3.7 Sonnet Prior-generation Sonnet with extended thinking mode.	Mid-range	$3.00	$15.00	200K	88.5	92.4
	Claude 3.5 Sonnet Previous-generation Sonnet. Still strong for coding tasks.	Legacy	$3.00	$15.00	200K	88.7	92
	Claude 3.5 Haiku Previous-generation Haiku. Good for fast, simple tasks.	Legacy	$0.800	$4.00	200K	—	—
	Claude 3 Opus Original Opus model. Superseded by Claude 4 Opus.	Legacy	$15.00	$75.00	200K	86.8	89
Google	Gemini 3.1 Pro Google's high-capability multimodal model for long context, coding agents, and Vertex / AI Studio deployments.	Flagship	$2.50	$15.00	1.0M	89.5	90
	Gemini 2.5 Pro 1M token context. Strong coding and reasoning. Best Gemini model.	Flagship	$1.25	$10.00	1.0M	91.7	92.1
	Gemini 2.5 Flash Fast, cost-efficient Gemini with 1M context window.	Mid-range	$0.150	$0.600	1.0M	86.8	88.5
	Gemini 2.0 Flash Stable 2.0 Flash for production workloads.	Mid-range	$0.100	$0.400	1.0M	—	—
	Gemini 2.0 Flash Lite Ultra-lightweight model for high-volume tasks.	Budget	$0.075	$0.300	1.0M	—	—
	Gemini 1.5 Pro 2M context window model. Good for long document analysis.	Legacy	$1.25	$5.00	2.1M	—	—
	Gemini 1.5 Flash Fast, affordable 1M context model. Superseded by 2.5 Flash.	Legacy	$0.075	$0.300	1.0M	—	—
	Gemma 3 27B Open-weight instruction-tuned model. Free to download and run locally.	Open Source	$0.000	$0.000	128K	—	—
	Gemma 3 12B Mid-size open-weight model. Good balance for local deployment.	Open Source	$0.000	$0.000	128K	—	—
	Gemma 3 4B Small open-weight model. Runs on consumer hardware.	Open Source	$0.000	$0.000	128K	—	—
	Gemma 4 QAT 32B Quantization-aware training release. 72% less VRAM needed. Runs on 16GB GPUs.	Open Source	$0.000	$0.000	256K	—	—
	Gemma 4 QAT 14B Mid-size QAT model. Efficient inference for 8GB GPUs.	Open Source	$0.000	$0.000	128K	—	—
	Imagen 3 Text-to-image generation model. Priced per image.	Image	$0.020	$0.020	—	—	—
DeepSeek	DeepSeek-V4 Latest DeepSeek flagship. 1M context, top-tier coding and math.	Flagship	$0.300	$0.600	1.0M	—	—
	DeepSeek-V4 Flash Fast variant of V4. Incredible price-performance ratio.	Mid-range	$0.140	$0.280	1.0M	—	—
	DeepSeek-V4 Pro DeepSeek V4 Pro reasoning model. Replaces deepseek-reasoner.	Reasoning	$0.435	$0.870	1.0M	—	—
	DeepSeek-V3 Strong open-weight model at very low cost.	Legacy	$0.140	$0.280	64K	89.4	89.6
	DeepSeek-R1 Reasoning model. Output includes chain-of-thought.	Reasoning	$0.550	$2.19	64K	90.8	92.3
	DeepSeek-R1-0528 Updated R1 with improved coding and longer output.	Reasoning	$0.550	$2.19	64K	—	—
xAI	Grok 3 xAI flagship with real-time X data access.	Flagship	$3.00	$15.00	131K	88.9	89.1
	Grok 3 Mini Fast, affordable Grok model for everyday tasks.	Budget	$0.300	$0.500	131K	85.4	87
	Grok 2 Previous-generation Grok. Still capable for general tasks.	Legacy	$2.00	$10.00	131K	—	—
	Grok 2 Mini Budget previous-generation Grok model.	Legacy	$0.200	$0.600	131K	—	—
Meta	Llama 4 Maverick Open-weight multimodal MoE model. 17B active of 400B total params, 128 experts.	Flagship	$0.200	$0.600	256K	87.5	89
	Llama 4 Scout Efficient MoE model. 1M context window. Good for RAG.	Mid-range	$0.150	$0.450	1.0M	—	—
	Llama 3.3 70B Stable 70B model. Good balance of capability and speed.	Mid-range	$0.880	$0.880	131K	83.5	86
	Llama 3.1 405B Largest open-weight model. Requires significant compute.	Flagship	$3.00	$3.00	131K	87.3	89
	Llama 3.1 70B Popular 70B model for fine-tuning and local deployment.	Mid-range	$0.880	$0.880	131K	82	81
	Llama 3.1 8B Small, fast model. Great for edge devices and prototyping.	Budget	$0.050	$0.050	131K	—	—
	Llama 3.2 11B Vision Multimodal model with image understanding.	Multimodal	$0.180	$0.180	131K	—	—
	Llama 3.2 90B Vision Large multimodal model with strong vision capabilities.	Multimodal	$0.900	$0.900	131K	—	—
Mistral	Mistral Large Mistral's most capable model. Strong multilingual and coding.	Flagship	$2.00	$6.00	128K	—	—
	Mistral Medium Good balance of cost and capability for production use.	Mid-range	$0.400	$2.00	128K	—	—
	Mistral Small Fast, affordable model for high-volume tasks.	Budget	$0.200	$0.600	128K	—	—
	Mistral Nemo Open-weight 12B model. Good for fine-tuning.	Open Source	$0.150	$0.150	128K	—	—
	Mistral Codestral Code generation model. 256K context, 80+ languages.	Coding	$0.300	$0.900	256K	—	—
	Pixtral Large Multimodal model combining vision and text understanding.	Multimodal	$2.00	$6.00	128K	—	—
	Mistral Embed Embedding model for search and RAG applications.	Embedding	$0.100	$0.000	8K	—	—
Cohere	Command R+ Cohere's most capable model. Strong for RAG and enterprise search.	Flagship	$2.50	$10.00	128K	—	—
	Command R Cost-effective model optimized for RAG and tool use.	Mid-range	$0.500	$2.00	128K	—	—
	Command A Latest Command model with extended context and improved performance.	Mid-range	$0.950	$3.80	256K	—	—
	Command R7B Small, fast model for classification and simple RAG.	Budget	$0.050	$0.200	128K	—	—
	Cohere Embed v3 Multilingual embedding model for search and retrieval.	Embedding	$0.100	$0.000	512	—	—
	Cohere Rerank v3 Reranking model for improving search relevance.	Reranking	$0.100	$0.000	4K	—	—
Moonshot AI	Kimi K2 Latest Kimi model. Strong coding and agentic capabilities. MoE architecture.	Flagship	$0.600	$2.40	131K	—	—
	Kimi K2 Flash Fast Kimi variant. Great price-performance for everyday tasks.	Mid-range	$0.150	$0.600	131K	—	—
	Moonshot v1 128K Previous-generation Kimi with 128K context.	Legacy	$0.840	$3.36	131K	—	—
	Moonshot v1 32K Budget Kimi variant with 32K context.	Legacy	$0.480	$1.92	33K	—	—
	Moonshot v1 8K Lowest-cost Kimi variant for short-context tasks.	Budget	$0.240	$0.960	8K	—	—
MiniMax	MiniMax-M2.7 Latest MiniMax model. 1M context, strong multilingual.	Flagship	$0.300	$1.20	1.0M	—	—
	MiniMax-Text-01 Budget MiniMax model. Good for high-volume tasks.	Mid-range	$0.100	$0.400	1.0M	—	—
	Hailuo 2.3 (Video) Video generation model. Pricing per second of video.	Video	$0.050	$0.050	—	—	—
	Speech 2.8 Text-to-speech model with voice cloning.	Speech	$0.010	$0.000	—	—	—
	Music 2.5 Music generation from text prompts.	Music	$0.050	$0.000	—	—	—
Alibaba Cloud (Qwen)	Qwen3 235B-A22B MoE model. 235B total, 22B active. Top open-source coding model.	Flagship	$0.800	$1.60	128K	—	—
	Qwen3 32B Dense 32B model. Great for local deployment.	Mid-range	$0.300	$0.600	128K	—	—
	Qwen3 14B Compact model. Runs on consumer GPUs.	Budget	$0.150	$0.300	128K	—	—
	Qwen3 4B Small model for edge devices and mobile.	Budget	$0.050	$0.100	33K	—	—
	Qwen3 1.7B Ultra-small model. Good for classification and extraction.	Budget	$0.010	$0.020	33K	—	—
	Qwen3 Coder 480B-A35B MoE coding specialist. 480B total, 35B active params.	Coding	$0.500	$1.00	256K	—	—
	Qwen2.5 72B Previous-generation 72B. Still strong for many tasks.	Legacy	$0.400	$0.800	131K	—	—
	Qwen2.5-VL 72B Vision-language model. Image and video understanding.	Multimodal	$0.500	$1.00	131K	—	—
	Qwen3.6-27B Dense 27B model for coding, reasoning, and agent workloads.	Mid-range	$0.200	$0.400	131K	—	—
	Qwen-AgentWorld-35B-A3B Qwen's language world model that simulates agent environments across MCP, search, terminal, SWE, Android, web, and OS in a single 35B MoE (3B active). Apache 2.0.	Agentic	$0.300	$1.20	262K	85	82
NVIDIA	Nemotron 3 Ultra 253B NVIDIA's flagship MoE model. 253B params, 55B active.	Flagship	$0.800	$2.40	131K	—	—
	Llama 3.3 70B (NIM) Meta's Llama 3.3 70B served via NVIDIA NIM.	Hosted	$0.700	$0.700	131K	—	—
	DeepSeek R1 (NIM) DeepSeek R1 reasoning model served via NIM.	Hosted	$0.880	$3.52	131K	—	—
	Qwen 2.5 72B (NIM) Qwen 2.5 72B served via NVIDIA NIM.	Hosted	$0.700	$0.700	131K	—	—
	Mistral Small (NIM) Mistral Small served via NVIDIA NIM.	Hosted	$0.200	$0.200	131K	—	—
Ollama	Gemma 4 QAT 32B Google's latest. 72% less VRAM. Runs on 16GB GPUs.	Featured	$0.000	$0.000	256K	—	—
	Gemma 4 QAT 14B Mid-size Gemma 4 QAT. Efficient for 8GB GPUs.	Featured	$0.000	$0.000	128K	—	—
	Qwen3 235B-A22B MoE model. 22B active params. Strong coding and reasoning.	Featured	$0.000	$0.000	131K	—	—
	Qwen3 32B Best open-source 32B model. Great for local deployment.	Popular	$0.000	$0.000	131K	—	—
	DeepSeek R1 Open-weight reasoning model. Chain-of-thought included.	Reasoning	$0.000	$0.000	131K	—	—
	DeepSeek V3 671B MoE model. Top open-source general model.	Popular	$0.000	$0.000	66K	—	—
	Llama 4 Maverick Meta's latest MoE. 17B active of 400B total params, 128 experts.	Popular	$0.000	$0.000	256K	—	—
	Llama 3.3 70B Stable workhorse 70B model for production.	Popular	$0.000	$0.000	131K	—	—
	Gemma 3 27B Google's open-weight 27B. Strong for its size.	Popular	$0.000	$0.000	128K	—	—
	Mistral Small 3.1 24B Mistral's latest small model. Vision support included.	Popular	$0.000	$0.000	131K	—	—
	Phi-4 14B Microsoft's compact reasoning model. Punches above its weight.	Popular	$0.000	$0.000	16K	—	—
	Command R Cohere's open-weight RAG model. Built for retrieval.	RAG	$0.000	$0.000	131K	—	—
	Codestral 22B Mistral's open-weight coding model. 80+ languages.	Coding	$0.000	$0.000	66K	—	—
	GLM-4 9B Zhipu AI's open-weight bilingual model.	Popular	$0.000	$0.000	131K	—	—
	Llama 3.1 8B Smallest Llama 3.1. Great for fine-tuning.	Budget	$0.000	$0.000	131K	—	—
	Gemma 3 4B Compact Gemma for edge devices.	Budget	$0.000	$0.000	131K	—	—
	Qwen3 4B Small Qwen for mobile and edge.	Budget	$0.000	$0.000	33K	—	—
	SmolLM2 1.7B HuggingFace's tiny model. Runs on phones.	Edge	$0.000	$0.000	8K	—	—
	Nomic Embed Text Open-weight embedding model for search and RAG.	Embedding	$0.000	$0.000	8K	—	—
	Stable Diffusion 3.5 Open-weight image generation model.	Image	$0.000	$0.000	—	—	—
OpenRouter	Claude 4 Opus Best Claude model. Full power for complex tasks.	Flagship	$15.00	$75.00	200K	—	—
	Claude 4 Sonnet Best price-performance Claude.	Popular	$3.00	$15.00	200K	—	—
	GPT-5.2 Latest GPT model via OpenRouter.	Flagship	$2.50	$15.00	256K	—	—
	Gemini 2.5 Pro Google's best model with 1M context.	Popular	$1.25	$10.00	1.0M	—	—
	Gemini 2.5 Flash Fast, affordable Gemini.	Popular	$0.150	$0.600	1.0M	—	—
	DeepSeek V4 Incredible price-performance. 1M context.	Popular	$0.300	$0.600	1.0M	—	—
	o3 OpenAI's reasoning model for hard problems.	Reasoning	$10.00	$40.00	200K	—	—
	o4-mini Affordable reasoning model.	Reasoning	$1.10	$4.40	200K	—	—
	Llama 4 Maverick Meta's latest open-weight model.	Open Source	$0.200	$0.600	256K	—	—
	Kimi K2 Moonshot's latest coding model. MoE architecture.	Popular	$0.600	$2.40	131K	—	—
	Qwen3 235B-A22B Top open-source MoE model.	Open Source	$0.800	$1.60	128K	—	—
	Mistral Large Mistral's most capable model.	Popular	$2.00	$6.00	128K	—	—
	Grok 3 xAI's flagship with real-time data access.	Popular	$3.00	$15.00	131K	—	—
	Command A Cohere's latest model for RAG and enterprise.	RAG	$0.950	$3.80	256K	—	—
	MiniMax-M2.7 MiniMax with 1M context.	Popular	$0.300	$1.20	1.0M	—	—
	GPT-4.1 Long-context coding model.	Coding	$2.00	$8.00	1.0M	—	—
	GPT-4.1 Mini Affordable long-context model.	Popular	$0.400	$1.60	1.0M	—	—
	GPT-5 Nano Ultra-cheap for classification and routing.	Budget	$0.050	$0.300	128K	—	—
	DeepSeek R1 Open-weight reasoning model.	Reasoning	$0.550	$2.19	64K	—	—
	Nemotron 3 Ultra 253B NVIDIA's flagship MoE model.	New	$0.800	$2.40	131K	—	—
Together AI	Llama 4 Maverick Meta's latest MoE on fast serverless inference.	Flagship	$0.180	$0.540	1.0M	—	—
	Llama 3.3 70B Stable 70B on fast inference.	Popular	$0.880	$0.880	131K	—	—
	Qwen3 235B-A22B Qwen's MoE model on Together.	Popular	$0.800	$1.60	131K	—	—
	Qwen3 32B Dense 32B at low cost.	Popular	$0.300	$0.600	131K	—	—
	DeepSeek R1 DeepSeek reasoning on Together.	Reasoning	$0.550	$2.19	131K	—	—
	DeepSeek V3 671B MoE at unbeatable pricing.	Popular	$0.140	$0.280	66K	—	—
	Gemma 3 27B Google's 27B open-weight on Together.	Popular	$0.240	$0.240	131K	—	—
	Mistral Small 3.1 24B Mistral's latest small model.	Popular	$0.100	$0.100	131K	—	—
	Codestral 22B Mistral's coding specialist.	Coding	$0.150	$0.450	66K	—	—
	FLUX.2 Pro Top open-weight image model. $0.03/MP.	Image	$0.030	$0.030	—	—	—
	FLUX.2 Dev Open-weight image generation. Great quality per dollar.	Image	$0.010	$0.010	—	—	—

Source: provider pricing pages. Prices are per 1M tokens unless noted. Benchmarks from public leaderboards. Free models ($0) are open-weight — run locally or via hosted platform.
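For readers who want to sort or filter the table offline, here is a rough sketch of parsing its tab-separated rows. It assumes the conventions visible in the table: a blank Provider cell means "same provider as the row above", context sizes use K/M suffixes, and a dash marks a missing benchmark. The sample rows are copied from the table with descriptions dropped; the field names and helper functions are hypothetical, and the sketch does not try to split model names from their descriptions.

```python
import csv
import io

MISSING = "\u2014"  # the dash the table uses for "no value"

FIELDS = ["provider", "model", "category", "input", "output",
          "context", "mmlu", "humaneval"]

# Three rows copied from the table (model descriptions omitted).
SAMPLE = (
    "OpenAI\tGPT-4.1\tFlagship\t$2.00\t$8.00\t1.0M\t90.2\t91\n"
    "\tGPT-4o Mini\tBudget\t$0.150\t$0.600\t128K\t82\t87\n"
    "Google\tGemini 1.5 Pro\tLegacy\t$1.25\t$5.00\t2.1M\t\u2014\t\u2014\n"
)


def parse_context(text):
    """Convert '128K' or '1.0M' to an approximate token count."""
    if text == MISSING:
        return None
    scale = {"K": 1_000, "M": 1_000_000}.get(text[-1])
    if scale is None:
        return int(text)  # plain number such as '512'
    return round(float(text[:-1]) * scale)


def parse_score(text):
    """Benchmark score as float, or None when the cell is a dash."""
    return None if text == MISSING else float(text)


def parse_rows(tsv_text):
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=FIELDS,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    provider = None
    rows = []
    for raw in reader:
        provider = raw["provider"] or provider  # carry provider forward
        rows.append({
            "provider": provider,
            "model": raw["model"],
            "input_per_1m": float(raw["input"].lstrip("$")),
            "output_per_1m": float(raw["output"].lstrip("$")),
            "context_tokens": parse_context(raw["context"]),
            "mmlu": parse_score(raw["mmlu"]),
        })
    return rows


rows = parse_rows(SAMPLE)
by_price = sorted(rows, key=lambda r: r["input_per_1m"])
for r in by_price:
    print(r["provider"], r["model"], r["input_per_1m"], r["context_tokens"])
```

The context values in the table are rounded (for example 131K), so the parsed token counts are approximations rather than exact limits.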