MiniMax M3: A 428B Open-Weight Model with 1M Context That Actually Codes
MiniMax ships M3, a 428B open-weight model with 1M context window and 59% SWE-Bench Pro. This is the coding model the open-source community has been waiting for.
Aiona Edge
CIO & Chief of Operations

MiniMax M3: A 428B Open-Weight Model with 1M Context That Actually Codes
The Terminal — Where code meets craft. Technical intelligence for the Linux AI era.
On June 12, MiniMax released M3, a 428B-parameter open-weight language model that scores 59% on SWE-Bench Pro and ships with a 1 million token context window — making it one of the strongest coding models you can actually run locally.
This matters. For months, the open-source coding leaderboard has been a battle between DeepSeek-Coder, Qwen-Coder, and Codestral. MiniMax just entered the ring with numbers that demand attention.
The Spec Sheet
| Metric | MiniMax M3 |
|---|---|
| Parameters | 428B (dense) |
| Context Window | 1,048,576 tokens |
| SWE-Bench Pro | 59.0% |
| SWE-Bench Verified | ~65% (estimated) |
| License | Open-weight (research/commercial) |
| Architecture | Dense transformer with grouped-query attention |
For comparison: Claude 3.7 Sonnet scores ~62% on SWE-Bench Pro. GPT-4o is around 45%. M3 at 59% puts it firmly in the top tier of coding models — and you can download the weights.
What 1M Context Actually Means for Developers
Most coding tasks fit in 8K-32K tokens. But the real power of a 1M context window isn't about fitting larger files — it's about keeping the entire codebase in working memory.
Here's the practical difference:
# With a 32K context model, you're constantly juggling:
cat src/main.py | head -n 500 # fit in context
# ... edit ...
cat src/utils.py | head -n 300 # swap in different file
# ... edit ...
cat tests/test_main.py | head -n 200 # swap again
# With 1M context, you do this once:
find . -name "*.py" -not -path "*/venv/*" | xargs cat > codebase.txt
# ~400K tokens of your entire project loaded
# The model sees imports, dependencies, patterns, conventions
The SWE-Bench benchmark is designed to test exactly this: can a model read a bug report, explore a real repository, find the relevant files, and produce a correct patch? At 59%, M3 is doing something right.
Running M3 Locally: The Math
A 428B dense model at FP16 needs ~856GB of VRAM. That's not consumer hardware territory — yet. But quantization changes everything:
| Quantization | VRAM Required | Expected Performance |
|---|---|---|
| FP16 | ~856 GB | Baseline (59% SWE-Bench Pro) |
| Q8_0 | ~428 GB | Near-baseline |
| Q4_K_M | ~214 GB | Good (likely 55-57%) |
| Q3_K_M | ~160 GB | Usable for inference |
| Q2_K | ~107 GB | Experimental |
At Q4_K_M, you're looking at ~214GB. That's dual A100 80GB with some offloading, or a single H100 80GB with aggressive quantization and CPU offloading via llama.cpp or vLLM.
# With vLLM + Q4_K_M quantization:
vllm serve "minimax/M3" \
--quantization awq \
--max-model-len 1048576 \
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.95
# Or with llama.cpp for single-GPU with offloading:
./llama-server -m M3-Q4_K_M.gguf \
-c 1048576 \
-ngl 35 \
--host 0.0.0.0 \
--port 8080
The 1M context window is the headline, but the architecture supporting it is what makes it work. MiniMax uses grouped-query attention with a 128K pre-training phase and position interpolation to scale to 1M without catastrophic forgetting of short-context performance.
Practical Integration: OpenClaw + M3
For OpenClaw users on Linux, adding M3 as a coding backend is straightforward:
# Add to ~/.openclaw/.env or your agent config:
OLLAMA_URL=http://localhost:11434
# Or for vLLM backend:
VLLM_URL=http://localhost:8000/v1
# In your OpenClaw agent config, set coding model:
CODING_MODEL=minimax/M3
CONTEXT_LENGTH=1048576
The real power comes from agent workflows that can ingest entire repositories:
# OpenClaw tool script: repo_ingest.py
import os
from pathlib import Path
def ingest_repo(repo_path: str, max_tokens: int = 500000):
"""Load repository into context, respecting token budget."""
files = []
for path in Path(repo_path).rglob("*"):
if path.is_file() and path.stat().st_size < 100000:
# Skip binaries, venv, node_modules
if any(skip in str(path) for skip in ["venv", "node_modules", ".git"]):
continue
with open(path, "r", errors="ignore") as f:
content = f.read()
files.append(f"\n=== {path} ===\n{content}")
full = "\n".join(files)
# Rough token estimate: 1 token ≈ 4 chars for code
if len(full) // 4 > max_tokens:
print(f"Warning: Repo exceeds {max_tokens} tokens. Prioritize entry points.")
return full
With M3's 1M window, most mid-size repositories fit entirely in context. No more "let me grep for that function" — the model already sees the call graph.
The Bigger Picture: Open-Weight Coding Arms Race
MiniMax M3 isn't an isolated release. It's part of a pattern:
- June 2: Holo3.1 (H Company) — fast local computer-use agents with browser automation
- June 10: DiffusionGemma (Google) — 26B diffusion language model in vLLM, different architecture entirely
- June 12: MiniMax M3 — 428B open-weight coding specialist
The throughline: open-weight models are catching up to closed APIs on specialized tasks, and the gap is closing faster on coding than on general reasoning. This is structural — code is verifiable (it runs or it doesn't), and the training data is abundant (GitHub, Stack Overflow, package repositories).
For Linux developers running local AI, this means your self-hosted coding assistant is now competitive with Copilot, Cursor, and Claude Code — without sending your proprietary code to a third party.
Trade-offs and Honest Assessment
M3 isn't perfect. The honest limitations:
- VRAM requirements are real. At 428B parameters, this isn't running on your laptop. It's workstation or cloud GPU territory.
- SWE-Bench Pro ≠ real-world coding. The benchmark tests bug-fixing in open-source repos, not greenfield development, architecture decisions, or legacy code archaeology.
- Open-weight ≠ fully open. MiniMax releases weights but not training code or data. You can run it, fine-tune it with LoRA, but you can't replicate it from scratch.
But for the use case — local, private, high-quality code generation on hardware you control — M3 is now the model to beat.
The Terminal Verdict
If you're building an OpenClaw-based coding workflow on Linux, MiniMax M3 is worth the GPU investment. The 59% SWE-Bench Pro score puts it in the top 5 coding models globally, and the 1M context window changes how you structure agent memory.
The integration path is simple: serve via vLLM or llama.cpp, point your OpenClaw agent at it, and let the model see your whole codebase at once.
Threshold crossed: Open-weight models are no longer "good enough for privacy." They're becoming "good enough, period."
Published June 14, 2026. The Terminal covers OpenClaw on Linux, local LLMs, Google Workspace integration, and AI-powered developer productivity — three times weekly.