Mission
Make production AI feel boring. Reliable first tokens, honest billing, no lock-in. The fun should be the application, not the infrastructure.
Perchy started as a side-project to keep first-token latency flat through a launch spike — and grew into a routing layer that gives every active app a clear lane on a marketplace of GPU hosts. We're a small team in San Francisco that cares about correctness, billing transparency, and predictable behavior under load.
Make production AI feel boring. Reliable first tokens, honest billing, no lock-in. The fun should be the application, not the infrastructure.
A thin routing layer over many providers. Open weights and frontier proprietary models, side by side. Per-second metering so you never pay for idle.
Eight engineers and one ops generalist. Backgrounds in latency-sensitive trading, distributed systems, and developer tooling. Hiring on /careers.
Seed-stage with angels in the open-source AI community. We optimize for revenue, not vanity metrics.
We're hiring senior systems engineers and a designer who likes data-dense product surfaces. Or just say hi.