# Surf Inference

> OpenAI-compatible LLM inference API with x402 or MPP micropayments.

Base URL: https://inference.surf
Payment: USDC on Base, Solana, or Tempo via x402/MPP. No API keys needed.
Marketplace: https://surf.cascade.fyi

## Overview

POST `/v1/chat/completions` with `{ model, messages }` for LLM inference.

Available models: moonshotai/kimi-k2.5, minimax/minimax-m2.5, qwen/qwen-2.5-7b-instruct, anthropic/claude-sonnet-4.5, anthropic/claude-sonnet-4.6, anthropic/claude-opus-4.5, anthropic/claude-opus-4.6, minimax/minimax-m2.7, z-ai/glm-5, x-ai/grok-4.1-fast, x-ai/grok-4.20-beta, x-ai/grok-4.20-multi-agent-beta, x-ai/grok-4.1-fast:online, x-ai/grok-4.20-beta:online, x-ai/grok-4.20-multi-agent-beta:online.

Supports SSE streaming with `stream: true`. Flat models charge a fixed price per request; dynamic models charge per token. Accepts x402 USDC payment (Solana or Base) or MPP sessions (Tempo).

## Payment Flow

All paid endpoints return HTTP 402 with payment instructions on the first request. Use `npx x402-proxy` to handle payment automatically:

1. Install: `npm i -g x402-proxy` (or use `npx` directly)
2. Check wallet: `npx x402-proxy wallet`
3. Make requests - payment is handled transparently on 402 responses (a minimal sketch of this 402 flow appears under Client Sketches below)

Supported networks: Base (EVM), Solana, and Tempo (MPP). Force a specific network with `npx x402-proxy --network base ...` or `--network solana`.

## Try It

```sh
npx x402-proxy -X POST -H "Content-Type: application/json" -d '{"model":"moonshotai/kimi-k2.5","messages":[{"role":"user","content":"Hello"}]}' https://inference.surf/v1/chat/completions
```

Streaming:

```sh
npx x402-proxy -X POST -H "Content-Type: application/json" -d '{"model":"moonshotai/kimi-k2.5","messages":[{"role":"user","content":"Hello"}],"stream":true}' https://inference.surf/v1/chat/completions
```

## Endpoints

### POST /v1/chat/completions

LLM chat completion (streaming supported via SSE; see the streaming sketch under Client Sketches below).

### Model Pricing

Pricing is per-request for flat models and per-token for dynamic models. Dynamic models charge based on input tokens, output tokens, and cache hits. The rates below are the final charged rates in USD per million tokens.

| Model | Type | Flat Price | Input $/M | Output $/M | Cache $/M |
| --- | --- | --- | --- | --- | --- |
| moonshotai/kimi-k2.5 | dynamic | - | $0.59 | $2.86 | $0.29 |
| minimax/minimax-m2.5 | dynamic | - | $0.26 | $1.52 | $0.13 |
| qwen/qwen-2.5-7b-instruct | flat | $0.001 | - | - | - |
| anthropic/claude-sonnet-4.5 | dynamic | - | $3.90 | $19.50 | $0.39 |
| anthropic/claude-sonnet-4.6 | dynamic | - | $3.90 | $19.50 | $0.39 |
| anthropic/claude-opus-4.5 | dynamic | - | $6.50 | $32.50 | $0.65 |
| anthropic/claude-opus-4.6 | dynamic | - | $6.50 | $32.50 | $0.65 |
| minimax/minimax-m2.7 | dynamic | - | $0.39 | $1.56 | $0.08 |
| z-ai/glm-5 | dynamic | - | $1.04 | $3.33 | $0.21 |
| x-ai/grok-4.1-fast | dynamic | - | $0.26 | $0.65 | $0.07 |
| x-ai/grok-4.20-beta | dynamic | - | $2.60 | $7.80 | $0.65 |
| x-ai/grok-4.20-multi-agent-beta | dynamic | - | $2.60 | $7.80 | $0.65 |
| x-ai/grok-4.1-fast:online | dynamic | - | $1.05 | $0.75 | $0.83 |
| x-ai/grok-4.20-beta:online | dynamic | - | $3.75 | $9.00 | $1.50 |
| x-ai/grok-4.20-multi-agent-beta:online | dynamic | - | $3.75 | $9.00 | $1.50 |

For flat models, the listed price is charged per request regardless of token count. For dynamic models, since rates are quoted per million tokens, the final price is `(input_tokens * input_rate + output_tokens * output_rate) / 1,000,000`. Cached input tokens are charged at the cache rate instead of the input rate, reducing cost for repeated prompts.
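To make the dynamic formula concrete, here is a minimal Python sketch. The `estimate_cost` helper and the two-model rate slice are illustrative, not part of the API; the rates themselves come from the table above.

```python
# Illustrative rate slice (USD per million tokens) from the pricing table above.
RATES = {
    "moonshotai/kimi-k2.5": {"input": 0.59, "output": 2.86, "cache": 0.29},
    "x-ai/grok-4.1-fast": {"input": 0.26, "output": 0.65, "cache": 0.07},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Dynamic-model charge: cached input tokens bill at the cache rate,
    the remaining input at the input rate, and output at the output rate."""
    r = RATES[model]
    uncached = input_tokens - cached_tokens
    return (uncached * r["input"]
            + cached_tokens * r["cache"]
            + output_tokens * r["output"]) / 1_000_000

# 10,000 input tokens (2,000 of them cached) plus 1,000 output tokens on
# kimi-k2.5: 8,000*0.59 + 2,000*0.29 + 1,000*2.86 = 8,160 per million tokens,
# i.e. about $0.00816 for the request.
print(f"${estimate_cost('moonshotai/kimi-k2.5', 10_000, 1_000, 2_000):.5f}")
```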
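## Client Sketches

The sketches in this section are illustrative, not official client code. The first shows the 402-first flow described under Payment Flow: an unpaid request is expected to come back as HTTP 402 with payment instructions in the body. The exact payload shape is not documented here, so the sketch simply prints it and defers settlement to a payer such as x402-proxy.

```python
import json
import urllib.error
import urllib.request

BODY = json.dumps({
    "model": "moonshotai/kimi-k2.5",
    "messages": [{"role": "user", "content": "Hello"}],
}).encode()

req = urllib.request.Request(
    "https://inference.surf/v1/chat/completions",
    data=BODY,
    headers={"Content-Type": "application/json"},
    method="POST",
)

try:
    with urllib.request.urlopen(req) as resp:
        # Only reached if payment was already settled for this request.
        print(resp.status, resp.read().decode())
except urllib.error.HTTPError as e:
    if e.code == 402:
        # Payment required: the body carries the x402/MPP payment
        # instructions. Hand these to a payer such as x402-proxy rather
        # than settling by hand.
        print("402 payment instructions:", e.read().decode())
    else:
        raise
```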
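For streaming, here is a reader sketch. It assumes payment is already settled (for example, requests are fronted by x402-proxy or an open MPP session) and that the stream uses the usual OpenAI-compatible SSE framing: `data: {json}` chunks terminated by `data: [DONE]`. The chunk field names (`choices[0].delta.content`) are assumptions based on the API's OpenAI-compatibility claim.

```python
import json
import urllib.request

req = urllib.request.Request(
    "https://inference.surf/v1/chat/completions",
    data=json.dumps({
        "model": "moonshotai/kimi-k2.5",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    }).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    for raw_line in resp:
        line = raw_line.decode("utf-8").strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and any keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI SSE format
        chunk = json.loads(payload)
        # Assumed OpenAI-compatible chunk shape: incremental text lives
        # in choices[0].delta.content.
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
print()
```

Without a payment layer in place, this request fails with the same 402 response shown in the previous sketch.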
## Links

- [OpenAPI Spec](https://inference.surf/openapi.json)
- [API Reference](https://inference.surf/docs)
- [Surf Marketplace](https://surf.cascade.fyi)
- [x402 Protocol](https://x402.org)