Production AI without traffic panic

Find your perch.
Steady AI under any load.

Perchy gives every active app a clear temporary lane on hosts running below their congestion point, so first-token latency stays flat while a shared API slows down under load.

Get started

SOC-grade billing · OpenAI-compatible HTTP API · Bring your own compute

Live latency simulator

Drag traffic up. Watch a shared API degrade while Perchy holds your lane.

Shared API: 1101 ms first token, 69 tok/s stream

With Perchy: 191 ms first token, 90 tok/s stream

Difference: 910 ms saved on first tokens

lane stayed clear

fresh position found

meter paused

First token p95 under load · Predictable streaming throughput · Pay only while present · Marketplace of spare GPUs

People online: 36 · Idle hold: 3 s · Spare machines: 4x

How it works

Built for production AI workloads.

Three simple primitives. Every active app gets one clear position; the meter pauses when you go quiet; hosts earn when their idle GPU serves a position.

Clear-lane scheduling

Each host sells capacity only up to its calm operating range, below its congestion point. Your first tokens land quickly even when traffic surges elsewhere.

Pay for presence

Per-second metering. Stop sending and the lane returns to the market — your bill pauses with it.
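As a rough illustration of per-second metering, the sketch below totals the seconds a lane is held for a series of request windows. It assumes the idle hold after each request counts as held time and that a lane released after the idle timeout starts a fresh hold on the next request; the function names and the rate argument are hypothetical, not Perchy's published billing rules.

```python
import math


def billed_seconds(request_windows, idle_timeout_s=3.0):
    """Return the whole seconds a lane was held.

    request_windows: (start, end) times in seconds during which requests ran.
    Assumption: the lane stays held for idle_timeout_s after each request
    ends and is released if no new request arrives before then.
    """
    held = 0.0
    hold_start = hold_end = None
    for start, end in sorted(request_windows):
        if hold_end is not None and start <= hold_end:
            # The next request arrived before the idle hold expired,
            # so the same hold simply continues.
            hold_end = max(hold_end, end + idle_timeout_s)
        else:
            if hold_end is not None:
                held += hold_end - hold_start  # close the previous hold
            hold_start, hold_end = start, end + idle_timeout_s
    if hold_end is not None:
        held += hold_end - hold_start
    return math.ceil(held)


def lane_cost(request_windows, rate_per_second, idle_timeout_s=3.0):
    """Cost of the held seconds at a hypothetical per-second rate."""
    return billed_seconds(request_windows, idle_timeout_s) * rate_per_second
```

With windows of 0 to 10 s, 11 to 20 s, and 60 to 65 s and a 3 s idle hold, the first two requests share one hold (0 to 23 s) and the third starts a new one (60 to 68 s), so the quiet stretch in between is not counted.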

Marketplace of compute

Connect a spare GPU over an outbound connection, with no public address needed. Earn while your machine has capacity, and reclaim it whenever you need it.

Direct HTTP API

Familiar shape. Different guarantees.

Point your existing OpenAI-compatible client at api.perchy.ai/v1 — no SDK install, no proprietary client. The optional lane field tells us how long to hold your position.

shell — api.perchy.ai

# No SDK required — call the HTTP API directly.
curl https://api.perchy.ai/v1/chat/completions \
  -H "Authorization: Bearer $PERCHY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "stream": true,
    "lane": { "mode": "clear", "idle_timeout_ms": 3000 },
    "messages": [
      { "role": "user", "content": "Write the launch email." }
    ]
  }'
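The same call can be sketched in Python with only the standard library. The request body mirrors the curl example above; the helper names are hypothetical, and the stream parser assumes the response uses OpenAI-style server-sent events ("data: {...}" lines ending with "data: [DONE]"), which this page implies but does not spell out.

```python
import json
import os
import urllib.request

API_URL = "https://api.perchy.ai/v1/chat/completions"


def build_request(api_key, prompt, idle_timeout_ms=3000):
    """Build the POST request shown in the curl example."""
    body = {
        "model": "anthropic/claude-sonnet-4-6",
        "stream": True,
        # Optional lane field: how long to hold the position while idle.
        "lane": {"mode": "clear", "idle_timeout_ms": idle_timeout_ms},
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_text(lines):
    """Yield text pieces from OpenAI-style event lines (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        if choices:
            text = (choices[0].get("delta") or {}).get("content")
            if text:
                yield text


def demo():
    """Send the request and print tokens as they arrive (needs a real key)."""
    req = build_request(os.environ["PERCHY_API_KEY"], "Write the launch email.")
    with urllib.request.urlopen(req) as resp:
        for piece in stream_text(resp):
            print(piece, end="", flush=True)
```

Calling demo() performs the network request; build_request and stream_text can be exercised offline.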

Live network · qwen/qwen3.6-27b

3 hosts online · all lanes occupied.

Real-time view of the Qwen 3.6 27B fleet running on RTX PRO 6000 (Blackwell, 96 GB). New requests queue and auto-route the moment a lane releases.
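The queue-and-route behavior described above can be sketched as a small lane pool. This is a minimal model under stated assumptions: first-fit host selection and a first-in, first-out wait queue are guesses for illustration, and the LanePool class is hypothetical rather than Perchy's scheduler.

```python
from collections import deque


class LanePool:
    """Toy model: fixed lanes per host, FIFO queue when all are taken."""

    def __init__(self, lanes_per_host):
        # lanes_per_host maps a host name to its number of sellable lanes.
        self.free = dict(lanes_per_host)
        self.waiting = deque()
        self.assigned = {}  # app name -> host name

    def request(self, app):
        """Give the app a lane now, or queue it. Returns the host or None."""
        for host, free in self.free.items():
            if free > 0:
                self.free[host] -= 1
                self.assigned[app] = host
                return host
        self.waiting.append(app)
        return None

    def release(self, app):
        """Release the app's lane.

        If another app is waiting, the lane goes straight to the oldest
        waiter and (next_app, host) is returned; otherwise the lane
        becomes free again and None is returned.
        """
        host = self.assigned.pop(app)
        if self.waiting:
            next_app = self.waiting.popleft()
            self.assigned[next_app] = host
            return next_app, host
        self.free[host] += 1
        return None
```

With three hosts of three lanes each, the tenth request waits in the queue and is routed as soon as any of the nine holders releases its lane.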

All lanes busy

9/9 lanes occupied

host-iad-2 (Busy)
RTX PRO 6000 · 96 GB · Blackwell
us-east-1 · Ashburn, VA
qwen/qwen3.6-27b
Lane occupancy: 3/3
TTFT: 184 ms
Stream: 127 tok/s
Held: 4m 12s
KV cache: 71%

host-fra-7 (Busy)
RTX PRO 6000 · 96 GB · Blackwell
eu-central-1 · Frankfurt
qwen/qwen3.6-27b
Lane occupancy: 3/3
TTFT: 196 ms
Stream: 119 tok/s
Held: 2m 18s
KV cache: 64%

host-nrt-4 (Busy)
RTX PRO 6000 · 96 GB · Blackwell
ap-northeast-1 · Tokyo
qwen/qwen3.6-27b
Lane occupancy: 3/3
TTFT: 211 ms
Stream: 124 tok/s
Held: 1m 29s
KV cache: 58%

Shared API today

A single shared lane that becomes a queue under load. First tokens land at 1101ms in the current scenario.

Perchy lane

Reserved while active, released after idle. First tokens at 191ms under the same load — and you only pay for the seconds you held.

Host earnings

Your simulated fleet of 4 spare GPUs could earn $6,121.27/hr at this load.

Apps request → Perchy matches → Clear position → Spare GPU earns

Ready to ship steady AI?

Free dev tier. Embedded Stripe payments for plans and usage. No card required to explore.

Open the console · See pricing
