About

We're building steady AI infrastructure for the next million apps.

Perchy started as a side project to keep first-token latency flat through a launch spike, and grew into a routing layer that gives every active app a clear lane on a marketplace of GPU hosts. We're a small team in San Francisco that cares about correctness, billing transparency, and predictable behavior under load.

Mission

Make production AI feel boring. Reliable first tokens, honest billing, no lock-in. The fun should be the application, not the infrastructure.

Approach

A thin routing layer over many providers. Open weights and frontier proprietary models, side by side. Per-second metering so you never pay for idle.

Team

Eight engineers and one ops generalist. Backgrounds in latency-sensitive trading, distributed systems, and developer tooling. Hiring on /careers.

Backers

Seed-stage, backed by angels in the open-source AI community. We optimize for revenue, not vanity metrics.

Want to build with us?

We're hiring senior systems engineers and a designer who likes data-dense product surfaces. Or just say hi.

Get in touch
Read the blog