Clear-lane scheduling
Capacity is sold up to the calm operating range of each host. Your first tokens land quickly even when traffic surges elsewhere.
Perchy gives every active app a temporary clear lane on a host running below its congestion point, so first-token latency stays flat while latency on a shared, open API climbs.
Drag traffic up. Watch a shared API degrade while Perchy holds your lane.
Three simple primitives. Every active app gets one clear position; the meter pauses when you go quiet; hosts earn when their idle GPU serves a position.
Per-second metering. Stop sending and the lane returns to the market — your bill pauses with it.
Connect a spare GPU over an outbound-only connection; no public IP address is needed. Earn while your machine has spare capacity, and reclaim it whenever you need it.
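The per-second metering primitive can be made concrete with a small calculation. The following is a minimal sketch under our own assumptions, not Perchy's documented billing rule: a lane is held from your first request until idle_timeout_ms after your last one, any gap longer than the timeout releases the lane (a later request re-acquires one), and the total held time is rounded up to whole seconds. The held_seconds function is a hypothetical helper for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate billable lane-seconds.
# Usage: held_seconds IDLE_TIMEOUT_MS T1 T2 ...   (request times in ms, ascending)
# Assumption: the lane is released IDLE_TIMEOUT_MS after the last request in a burst.
held_seconds() {
  local timeout=$1
  shift
  local total_ms=0 start="" last="" t
  for t in "$@"; do
    if [[ -z $start ]]; then
      # First request: acquire a lane.
      start=$t
    elif (( t - last > timeout )); then
      # Gap exceeded the idle timeout: the previous lane was released at
      # last + timeout, so bill that interval and start a new one.
      total_ms=$(( total_ms + last + timeout - start ))
      start=$t
    fi
    last=$t
  done
  if [[ -n $start ]]; then
    # Close the final interval at last + timeout.
    total_ms=$(( total_ms + last + timeout - start ))
  fi
  # Round the total up to whole seconds (assumed billing granularity).
  echo $(( (total_ms + 999) / 1000 ))
}
```

For example, with a 3000 ms timeout and requests at 0, 1000, 10000, and 11500 ms, the lane is held twice (4.0 s and then 4.5 s), so held_seconds 3000 0 1000 10000 11500 prints 9 under these assumptions.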
Point your existing OpenAI-compatible client at https://api.perchy.ai/v1; there is no SDK to install and no proprietary client. The optional lane field tells us how long to hold your position.
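If your client is built on the official openai Python or Node SDK, one way to repoint it without touching code is through environment variables, which those SDKs read when no base URL or API key is passed explicitly. This is a sketch that assumes your Perchy key is stored in PERCHY_API_KEY, as in the request below; it does not set the optional lane field, which is part of the request body.

```shell
# Sketch: route an existing openai-SDK-based client to Perchy.
# The official openai Python and Node SDKs fall back to these variables
# when no base URL or API key is given in code.
export OPENAI_BASE_URL="https://api.perchy.ai/v1"
export OPENAI_API_KEY="$PERCHY_API_KEY"
```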
# No SDK required: call the HTTP API directly.
curl https://api.perchy.ai/v1/chat/completions \
  -H "Authorization: Bearer $PERCHY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "stream": true,
    "lane": { "mode": "clear", "idle_timeout_ms": 3000 },
    "messages": [
      { "role": "user", "content": "Write the launch email." }
    ]
  }'
Real-time view of the Qwen 3.6 27B fleet running on RTX PRO 6000 (Blackwell, 96 GB). New requests queue and auto-route the moment a lane releases.
qwen/qwen3.6-27b
A shared API is a single shared lane that becomes a queue under load; in the current scenario, its first tokens land at 1101 ms.
A Perchy lane is reserved while you are active and released after you go idle. Under the same load, its first tokens land at 191 ms, and you pay only for the seconds you held it.
Your simulated fleet of 4 spare GPUs could earn $6,121.27/hr at this load.
Free dev tier. Embedded Stripe payments for plans and usage. No card required to explore.