Senior Platform Engineer, Voice AI
Company: Together AI
Location: San Francisco
Posted on: April 2, 2026
Job Description:
About the Role
Together AI is building the best inference
infrastructure for voice applications. Our Voice AI platform powers
production-grade, real-time voice agents and applications — serving
speech-to-text and text-to-speech models with best-in-class latency
and reliability. We're looking for a Senior Platform Engineer to
own the API and infrastructure layer for voice workloads. You'll
build the real-time WebSocket and HTTP APIs that developers use to
ship voice experiences, design autoscaling for latency-sensitive
streaming workloads, and ensure our multi-provider voice platform
is reliable enough for production voice agents handling millions of
calls. This is a foundational hire on a small, high-impact team.
Voice APIs have fundamentally different infrastructure requirements
than text-based inference — bidirectional audio streaming, stateful
connections, tight latency SLOs, and complex multi-model routing.
You'll define how developers interact with Together's voice
platform as we grow from early customers to the default
infrastructure for voice AI.

Own the real-time API layer (WebSocket
and HTTP streaming) that powers Together's voice platform. Design
autoscaling and orchestration for voice workloads running on tens
of thousands of GPUs. Build the developer experience — APIs,
observability, and tooling — for a fast-growing product area. Work
with production voice customers (contact centers, AI agents,
communication platforms) to ship what they actually need. Join a
small, early-stage team with outsized impact on a new product line.

Responsibilities
Build and harden real-time WebSocket and HTTP
streaming APIs for STT and TTS — including connection lifecycle
management, backpressure, error handling, and reconnection, at the
reliability bar needed for production voice agents. Design and ship
autoscaling for voice model endpoints that handles bursty,
real-time traffic patterns — accounting for concurrent connection
limits, streaming state, and hard latency ceilings. Implement
voice-specific API features: word-level alignment, real-time speaker
diarization, audio format flexibility (g711/mulaw for
telephony, PCM, WebRTC formats), pronunciation controls, and
multi-context WebSocket support. Build voice-specific observability
— latency breakdowns, audio quality signals, and dashboards that
help both the team and customers debug issues. Own multi-model
normalization across our model partners (Cartesia, Deepgram, Rime,
and others), ensuring consistent API behavior regardless of the
underlying provider. Collaborate with the ML engineering side of
the team on the interface between the API layer and the model
serving stack, ensuring latency and reliability requirements are
met end-to-end. Contribute to developer experience: API design,
documentation, integration cookbooks, a playground, and showcases of how
best-in-class voice agents are built. Lay the groundwork for
multiple new products down the line.

Requirements
5 years of
experience building large-scale, real-time distributed systems and
API services. Deep expertise in real-time streaming infrastructure
— WebSocket server architecture, Server-Sent Events, bidirectional
streaming, connection multiplexing, and stateful protocol design.
Expert-level programming in TypeScript and Python; experience with
Rust is a plus. Strong distributed systems fundamentals: load
balancing, autoscaling, rate limiting, and traffic shaping for
latency-sensitive workloads. Experience with Kubernetes — including
custom autoscalers, resource management, and health checking for
stateful services. Strong product sense — you care about API
ergonomics and think about what developers building voice apps
actually need. Comfort working on a small, early-stage team where
you'll wear multiple hats and move fast. Experience with audio or
media protocols (WebRTC, g711, PCM encoding) is a strong plus.
Familiarity with ML model serving infrastructure and how inference
engines work is a plus — you'll interface with the serving layer
regularly. Full-stack experience (React, Next.js) is a nice-to-have
for contributing to developer-facing tooling. Bachelor's or
Master's degree in Computer Science, Computer Engineering, or
related field, or equivalent practical experience.

About Together AI
Together AI is a research-driven artificial intelligence
company. We believe open and transparent AI systems will drive
innovation and create the best outcomes for society, and together
we are on a mission to significantly lower the cost of modern AI
systems by co-designing software, hardware, algorithms, and models.
We have contributed to leading open-source research, models, and
datasets to advance the frontier of AI, and our team has been
behind technological advancements such as FlashAttention, Hyena,
FlexGen, and RedPajama. We invite you to join a passionate group of
researchers and engineers on our journey to build
next-generation AI infrastructure.

Compensation
We offer competitive
compensation, startup equity, health insurance and other
competitive benefits. The US base salary range for this full-time
position is $200,000 - $260,000 + equity + benefits. Our salary ranges
are determined by location, level and role. Individual compensation
will be determined by experience, skills, and job-related
knowledge.

Equal Opportunity
Together AI is an Equal Opportunity
Employer and is proud to offer equal employment opportunity to
everyone regardless of race, color, ancestry, religion, sex,
national origin, sexual orientation, age, citizenship, marital
status, disability, gender identity, veteran status, and more.
Please see our privacy policy at
https://www.together.ai/privacy