# Surf Inference

> OpenAI-compatible LLM inference API with x402 or MPP micropayments.

Base URL: https://inference.surf
Payment: USDC on Base, Solana, or Tempo via x402/MPP. No API keys needed.
Marketplace: https://surf.cascade.fyi

## Overview

POST `/v1/chat/completions` with `{ model, messages }` for LLM inference.

Available models: moonshotai/kimi-k2.5, minimax/minimax-m2.5, qwen/qwen-2.5-7b-instruct, anthropic/claude-sonnet-4.5, anthropic/claude-sonnet-4.6, anthropic/claude-opus-4.5, anthropic/claude-opus-4.6, minimax/minimax-m2.7, z-ai/glm-5, x-ai/grok-4.1-fast, x-ai/grok-4.20-beta, x-ai/grok-4.20-multi-agent-beta, x-ai/grok-4.1-fast:online, x-ai/grok-4.20-beta:online, x-ai/grok-4.20-multi-agent-beta:online.

Supports SSE streaming with `stream: true`. Flat models charge a fixed price per request; dynamic models charge per token. Accepts x402 USDC payment (Solana or Base) or MPP sessions (Tempo).

## Payment Flow

All paid endpoints return HTTP 402 with payment instructions on the first request. Use `npx x402-proxy` to handle payment automatically:

1. Install: `npm i -g x402-proxy` (or use `npx` directly)
2. Check wallet: `npx x402-proxy wallet`
3. Make requests - payment is handled transparently on 402 responses (a minimal sketch of this 402 flow appears under Client Sketches below)

Supported networks: Base (EVM), Solana, and Tempo (MPP). Force a specific network with `npx x402-proxy --network base ...` or `--network solana`.

## Try It

```sh
npx x402-proxy -X POST -H "Content-Type: application/json" -d '{"model":"moonshotai/kimi-k2.5","messages":[{"role":"user","content":"Hello"}]}' https://inference.surf/v1/chat/completions
```

Streaming:

```sh
npx x402-proxy -X POST -H "Content-Type: application/json" -d '{"model":"moonshotai/kimi-k2.5","messages":[{"role":"user","content":"Hello"}],"stream":true}' https://inference.surf/v1/chat/completions
```

## Endpoints

### POST /v1/chat/completions

LLM chat completion (streaming supported via SSE; see the streaming sketch under Client Sketches below).

### Model Pricing

Pricing is per-request for flat models and per-token for dynamic models. Dynamic models charge based on input tokens, output tokens, and cache hits. The rates below are the final charged rates in USD per million tokens.

| Model | Type | Flat Price | Input $/M | Output $/M | Cache $/M |
| --- | --- | --- | --- | --- | --- |
| moonshotai/kimi-k2.5 | dynamic | - | $0.59 | $2.86 | $0.29 |
| minimax/minimax-m2.5 | dynamic | - | $0.26 | $1.52 | $0.13 |
| qwen/qwen-2.5-7b-instruct | flat | $0.001 | - | - | - |
| anthropic/claude-sonnet-4.5 | dynamic | - | $3.90 | $19.50 | $0.39 |
| anthropic/claude-sonnet-4.6 | dynamic | - | $3.90 | $19.50 | $0.39 |
| anthropic/claude-opus-4.5 | dynamic | - | $6.50 | $32.50 | $0.65 |
| anthropic/claude-opus-4.6 | dynamic | - | $6.50 | $32.50 | $0.65 |
| minimax/minimax-m2.7 | dynamic | - | $0.39 | $1.56 | $0.08 |
| z-ai/glm-5 | dynamic | - | $1.04 | $3.33 | $0.21 |
| x-ai/grok-4.1-fast | dynamic | - | $0.26 | $0.65 | $0.07 |
| x-ai/grok-4.20-beta | dynamic | - | $2.60 | $7.80 | $0.65 |
| x-ai/grok-4.20-multi-agent-beta | dynamic | - | $2.60 | $7.80 | $0.65 |
| x-ai/grok-4.1-fast:online | dynamic | - | $1.05 | $0.75 | $0.83 |
| x-ai/grok-4.20-beta:online | dynamic | - | $3.75 | $9.00 | $1.50 |
| x-ai/grok-4.20-multi-agent-beta:online | dynamic | - | $3.75 | $9.00 | $1.50 |

For flat models, the listed price is charged per request regardless of token count. For dynamic models, since rates are quoted per million tokens, the final price is `(input_tokens * input_rate + output_tokens * output_rate) / 1,000,000`. Cached input tokens are charged at the cache rate instead of the input rate, reducing cost for repeated prompts.
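To make the dynamic formula concrete, here is a minimal Python sketch. The `estimate_cost` helper and the two-model rate slice are illustrative, not part of the API; the rates themselves come from the table above.

```python
# Illustrative rate slice (USD per million tokens) from the pricing table above.
RATES = {
    "moonshotai/kimi-k2.5": {"input": 0.59, "output": 2.86, "cache": 0.29},
    "x-ai/grok-4.1-fast": {"input": 0.26, "output": 0.65, "cache": 0.07},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Dynamic-model charge: cached input tokens bill at the cache rate,
    the remaining input at the input rate, and output at the output rate."""
    r = RATES[model]
    uncached = input_tokens - cached_tokens
    return (uncached * r["input"]
            + cached_tokens * r["cache"]
            + output_tokens * r["output"]) / 1_000_000

# 10,000 input tokens (2,000 of them cached) plus 1,000 output tokens on
# kimi-k2.5: 8,000*0.59 + 2,000*0.29 + 1,000*2.86 = 8,160 per million tokens,
# i.e. about $0.00816 for the request.
print(f"${estimate_cost('moonshotai/kimi-k2.5', 10_000, 1_000, 2_000):.5f}")
```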
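## Client Sketches

The sketches in this section are illustrative, not official client code. The first shows the 402-first flow described under Payment Flow: an unpaid request is expected to come back as HTTP 402 with payment instructions in the body. The exact payload shape is not documented here, so the sketch simply prints it and defers settlement to a payer such as x402-proxy.

```python
import json
import urllib.error
import urllib.request

BODY = json.dumps({
    "model": "moonshotai/kimi-k2.5",
    "messages": [{"role": "user", "content": "Hello"}],
}).encode()

req = urllib.request.Request(
    "https://inference.surf/v1/chat/completions",
    data=BODY,
    headers={"Content-Type": "application/json"},
    method="POST",
)

try:
    with urllib.request.urlopen(req) as resp:
        # Only reached if payment was already settled for this request.
        print(resp.status, resp.read().decode())
except urllib.error.HTTPError as e:
    if e.code == 402:
        # Payment required: the body carries the x402/MPP payment
        # instructions. Hand these to a payer such as x402-proxy rather
        # than settling by hand.
        print("402 payment instructions:", e.read().decode())
    else:
        raise
```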
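For streaming, here is a reader sketch. It assumes payment is already settled (for example, requests are fronted by x402-proxy or an open MPP session) and that the stream uses the usual OpenAI-compatible SSE framing: `data: {json}` chunks terminated by `data: [DONE]`. The chunk field names (`choices[0].delta.content`) are assumptions based on the API's OpenAI-compatibility claim.

```python
import json
import urllib.request

req = urllib.request.Request(
    "https://inference.surf/v1/chat/completions",
    data=json.dumps({
        "model": "moonshotai/kimi-k2.5",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    }).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    for raw_line in resp:
        line = raw_line.decode("utf-8").strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and any keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI SSE format
        chunk = json.loads(payload)
        # Assumed OpenAI-compatible chunk shape: incremental text lives
        # in choices[0].delta.content.
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
print()
```

Without a payment layer in place, this request fails with the same 402 response shown in the previous sketch.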
## Links

- [OpenAPI Spec](https://inference.surf/openapi.json)
- [API Reference](https://inference.surf/docs)
- [Surf Marketplace](https://surf.cascade.fyi)
- [x402 Protocol](https://x402.org)