Beam is an ultrafast AI inference platform. We built a serverless runtime that launches GPU-backed containers in less than 1 second and quickly scales out to thousands of GPUs. Developers use our platform to serve apps to millions of users around the globe. We're backed by Y Combinator, Tiger Global, and prominent developer-tool founders, including the founder of Snyk and the former CTO of GitHub.
Our team works in-person in New York City, but we welcome remote applicants who are exceptionally qualified.
About the Role
In this role, you'll optimize inference performance for a wide range of models running on our platform: minimizing latency, maximizing throughput, and continuously experimenting to achieve industry-leading performance.
Your work will directly impact millions of users worldwide.
Skills & Experience
- Experience with state-of-the-art inference stacks (e.g., PyTorch, TensorRT, vLLM)
- Familiarity with modern AI workflows, such as ComfyUI and LoRA adapters for fine-tuning
- Deep understanding of model compilation, quantization, and serving architectures
- Familiarity with GPU architectures and comfort diving into kernel-level optimizations to resolve performance bottlenecks
- Experience programming with CUDA, Triton, or similar low-level accelerator frameworks
Benefits
- Work on challenging and impactful engineering problems
- Competitive salary and meaningful equity
- Get in on the ground floor of a fast-growing pre-Series A company
- Health, dental, and vision benefits with 90% coverage for employees and 50% for dependents
- Opportunities to participate in events across the cloud-native and AI communities
- Fitness stipend, learning budget, and much more