Perchy
Production AI without traffic panic

Find your perch.
Steady AI under any load.

Perchy gives every active app a clear temporary lane on hosts running below their congestion point — so first-token latency stays flat while the open API spirals.

SOC-grade billing · OpenAI-compatible HTTP API · Bring your own compute

Live latency simulator

Drag traffic up. Watch a shared API degrade while Perchy holds your lane.

[Chart: Shared API vs. Perchy lane as traffic increases]

Shared API: 1101 ms first token · 69 tok/s stream
With Perchy: 191 ms first token · 90 tok/s stream
Difference: 910 ms saved on first tokens
Lane stayed clear · fresh position found · meter paused
First token p95 under load · Predictable streaming throughput · Pay only while present · Marketplace of spare GPUs

How it works

Built for production AI workloads.

Three simple primitives. Every active app gets one clear position; the meter pauses when you go quiet; hosts earn when their idle GPU serves a position.

Clear-lane scheduling

Capacity is sold up to the calm operating range of each host. Your first tokens land quickly even when traffic surges elsewhere.
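
One way to picture clear-lane scheduling is as admission control with a per-host cap. The sketch below is illustrative only: the `Host` type, the `calm_lanes` cap, and the most-headroom rule in `pick_host` are assumptions made for the example, not Perchy's actual scheduler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    calm_lanes: int    # hypothetical cap: lanes sold before the host nears congestion
    occupied: int = 0  # lanes currently held by active apps

    def free_lanes(self) -> int:
        return self.calm_lanes - self.occupied


def pick_host(hosts: list[Host]) -> Optional[Host]:
    """Claim a lane on the host with the most headroom, or return None if all are full."""
    candidates = [h for h in hosts if h.free_lanes() > 0]
    if not candidates:
        return None  # nothing is sold past the cap, so the caller has to wait
    best = max(candidates, key=lambda h: h.free_lanes())
    best.occupied += 1
    return best
```

Because a host never accepts more than `calm_lanes` apps, a surge elsewhere can fill the remaining hosts but cannot push an occupied host past its calm range.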

Pay for presence

Per-second metering. Stop sending and the lane returns to the market — your bill pauses with it.
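
As a rough illustration of per-second metering, the sketch below totals the seconds a lane would be held for a list of activity spans. It assumes the idle window itself is billed and that each hold is rounded up to a whole second; the page does not state either rule, and `billed_seconds` is a hypothetical helper, not part of the API.

```python
import math


def billed_seconds(activity, idle_timeout_s=3.0):
    """Total whole seconds a lane is held for (start, end) activity spans in seconds.

    Assumption: the lane stays held for idle_timeout_s after the last activity,
    and activity that resumes inside that window extends the same hold.
    """
    total = 0
    hold_start = hold_end = None
    for start, end in sorted(activity):
        if hold_end is not None and start <= hold_end:
            # Traffic resumed before the lane was released: extend the hold.
            hold_end = max(hold_end, end + idle_timeout_s)
        else:
            if hold_end is not None:
                total += math.ceil(hold_end - hold_start)
            hold_start, hold_end = start, end + idle_timeout_s
    if hold_end is not None:
        total += math.ceil(hold_end - hold_start)
    return total
```

With a 3-second timeout, activity at 0-10 s and 11-20 s counts as one 23-second hold, and a later burst at 60-65 s adds 8 seconds, so `billed_seconds([(0, 10), (11, 20), (60, 65)])` returns 31. The 37 quiet seconds in between are not billed.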

Marketplace of compute

Connect a spare GPU outbound — no public address. Earn while your machine has capacity, reclaim it whenever you need it.

Direct HTTP API

Familiar shape. Different guarantees.

Point your existing OpenAI-compatible client at api.perchy.ai/v1 — no SDK install, no proprietary client. The optional lane field tells us how long to hold your position.

shell — api.perchy.ai
# No SDK required: call the HTTP API directly.
# -N turns off curl's output buffering so streamed tokens print as they arrive.
curl -N https://api.perchy.ai/v1/chat/completions \
  -H "Authorization: Bearer $PERCHY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "stream": true,
    "lane": { "mode": "clear", "idle_timeout_ms": 3000 },
    "messages": [
      { "role": "user", "content": "Write the launch email." }
    ]
  }'
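
The same request can be sketched in Python using only the standard library. The URL, headers, model name, and `lane` field are copied from the curl call above; the helper names `build_request` and `stream_lines` are made up for this example, and the framing of the streamed response is not documented here, so the sketch yields raw lines without parsing them.

```python
import json
import urllib.request

API_URL = "https://api.perchy.ai/v1/chat/completions"


def build_request(api_key, prompt, idle_timeout_ms=3000):
    """Build the POST request, including the optional lane field."""
    payload = {
        "model": "anthropic/claude-sonnet-4-6",
        "stream": True,
        "lane": {"mode": "clear", "idle_timeout_ms": idle_timeout_ms},
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_lines(request):
    """Yield non-empty response lines as they arrive (response format assumed)."""
    with urllib.request.urlopen(request) as response:
        for raw in response:
            line = raw.decode("utf-8").rstrip("\r\n")
            if line:
                yield line
```

To send it, read the key from the `PERCHY_API_KEY` environment variable, pass it to `build_request` with a prompt, and loop over `stream_lines(...)` to print each line.
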
Live network · qwen/qwen3.6-27b

3 hosts online · all lanes occupied.

Real-time view of the Qwen 3.6 27B fleet running on RTX PRO 6000 (Blackwell, 96 GB). New requests queue and auto-route the moment a lane releases.
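
The queue-then-route behavior can be sketched as a first-in, first-out waiting list that hands a released lane straight to the next app. `LaneRouter` and its methods are hypothetical names for illustration; the real routing policy is not described on this page.

```python
from collections import deque


class LaneRouter:
    """Toy router: fixed lanes per host, FIFO queue when every lane is busy."""

    def __init__(self, lanes_per_host):
        self.free = dict(lanes_per_host)  # host name -> free lane count
        self.waiting = deque()            # apps waiting for a lane, oldest first
        self.assigned = {}                # app -> host currently serving it

    def request(self, app):
        """Return the host that takes the app, or None if it had to queue."""
        for host, count in self.free.items():
            if count > 0:
                self.free[host] -= 1
                self.assigned[app] = host
                return host
        self.waiting.append(app)
        return None

    def release(self, app):
        """Free the app's lane; return the queued app that inherits it, if any."""
        host = self.assigned.pop(app)
        if self.waiting:
            next_app = self.waiting.popleft()
            self.assigned[next_app] = host  # lane passes on without going idle
            return next_app
        self.free[host] += 1
        return None
```

With a router built as `LaneRouter({"host-iad-2": 3, "host-fra-7": 3, "host-nrt-4": 3})`, a tenth concurrent app gets `None` from `request` and is routed as soon as any of the nine holders calls `release`.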

All lanes busy
9/9 lanes occupied

host-iad-2 · Busy
RTX PRO 6000 · 96 GB · Blackwell
us-east-1 · Ashburn, VA
qwen/qwen3.6-27b
Lane occupancy: 3/3
TTFT: 184 ms
Stream: 127 tok/s
Held: 4m 12s
KV cache: 71%

host-fra-7 · Busy
RTX PRO 6000 · 96 GB · Blackwell
eu-central-1 · Frankfurt
qwen/qwen3.6-27b
Lane occupancy: 3/3
TTFT: 196 ms
Stream: 119 tok/s
Held: 2m 18s
KV cache: 64%

host-nrt-4 · Busy
RTX PRO 6000 · 96 GB · Blackwell
ap-northeast-1 · Tokyo
qwen/qwen3.6-27b
Lane occupancy: 3/3
TTFT: 211 ms
Stream: 124 tok/s
Held: 1m 29s
KV cache: 58%

Shared API today

A single shared lane that becomes a queue under load. First tokens land at 1101ms in the current scenario.

Perchy lane

Reserved while active, released after idle. First tokens at 191ms under the same load — and you only pay for the seconds you held.

Host earnings

Your simulated fleet of 4 spare GPUs could earn $6,121.27/hr at this load.

Apps request → Perchy matches → Clear position → Spare GPU earns

Ready to ship steady AI?

Free dev tier. Embedded Stripe payments for plans and usage. No card required to explore.