
Cartesia
Low-latency voice AI models and a platform for real-time voice agents
Last reviewed 2026-06-19
Cartesia is a voice AI infrastructure company that builds real-time speech and transcription foundation models exposed as APIs and SDKs. Its differentiator is architectural: the founding team helped invent state space models (S4, Mamba), which enable ultra-low latency, long context, and efficient on-device operation. Its core models are Sonic (text-to-speech, marketed as the fastest and most emotive, across dozens of languages) and Ink (streaming speech-to-text with native turn detection), plus voice cloning, voice changing, and localization. Founded in 2023 out of the Stanford AI Lab by Karan Goel, Albert Gu, Arjun Desai, and Brandon Yang, Cartesia is based in San Francisco. In 2025 it moved up the stack with Line, a voice agent development platform with a Playground builder, CLI and GitHub deploy, and an SDK that plugs in third-party LLMs. It raised a reported $100M Series B in October 2025 (investors include Kleiner Perkins, Index, Lightspeed, and NVIDIA). Its models are assistant-level building blocks. Line agents are supervised: they are governed by observability and eval tooling rather than by enforced human-in-the-loop review.
What it can do
Synthesize speech (Sonic)
Assistant: Low-latency, expressive text-to-speech via the Sonic model family across dozens of languages.
Transcribe speech (Ink)
Assistant: Streaming speech-to-text via the Ink model with native turn detection.
Clone and manage voices
Assistant: Instant and pro voice cloning, voice changing, localization, and infill.
Power real-time voice agents (Line)
Supervised: The Line platform builds and deploys voice agents that run autonomously in-call, bounded by prompts and eval tooling rather than by enforced human-in-the-loop review.
Strengths
- Genuinely differentiated state-space-model tech with best-in-class latency and on-device efficiency
- Full stack (TTS, STT, cloning, and the Line agent platform) plus deep ecosystem integrations and self-hosted/VPC options
- Strong technical credibility and capital, including NVIDIA backing
Limitations
- Younger and less battle-tested than ElevenLabs and Deepgram; the Line agent platform launched only in 2025
- Closed, proprietary models (no open weights for production Sonic/Ink), creating lock-in
- Two-axis pricing (usage credits plus prepaid agent dollars) is hard to forecast at scale
Overview
Cartesia is a voice AI infrastructure company that builds real-time speech and transcription foundation models exposed as APIs and SDKs, with an architectural edge from state space models (S4, Mamba).
What it does
Its core models are Sonic (low-latency, expressive TTS) and Ink (streaming STT with turn detection), plus voice cloning, voice changing, and localization. In 2025 it added Line, a voice agent development platform with a Playground builder, CLI and GitHub deploy, and an SDK that plugs in third-party LLMs.
Integrations & setup
Cartesia offers an official LiveKit plugin plus integrations with Twilio, Pipecat, and Vapi, along with Python and JS/TS SDKs. Self-hosted on-prem or VPC deployment is available via Docker or Kubernetes.
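For readers who want a feel for the API surface before picking an SDK, here is a minimal sketch that builds (but does not send) a text-to-speech HTTP request using only the Python standard library. Everything specific in it is an assumption that this page does not state and that should be checked against the Cartesia documentation: the endpoint URL, the `X-API-Key` and `Cartesia-Version` header names, and the body fields `model_id`, `transcript`, `voice`, and `output_format`. The helper name `build_tts_request` is hypothetical, and the API version, model ID, and voice ID are left as caller-supplied arguments rather than guessed.

```python
import json
import urllib.request

# Assumed endpoint; verify against the Cartesia documentation before use.
TTS_BYTES_URL = "https://api.cartesia.ai/tts/bytes"


def build_tts_request(api_key: str, api_version: str, model_id: str,
                      voice_id: str, transcript: str) -> urllib.request.Request:
    """Build a POST request for speech synthesis without sending it.

    Field and header names are assumptions, not confirmed by this page.
    """
    body = {
        "model_id": model_id,          # e.g. a Sonic model identifier
        "transcript": transcript,      # text to be spoken
        "voice": {"mode": "id", "id": voice_id},
        "output_format": {
            "container": "wav",
            "encoding": "pcm_s16le",
            "sample_rate": 44100,
        },
    }
    return urllib.request.Request(
        TTS_BYTES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-API-Key": api_key,
            "Cartesia-Version": api_version,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

If the assumptions hold, passing the returned object to `urllib.request.urlopen` and writing the response bytes to a `.wav` file would complete the call. In practice the official Python or JS/TS SDK is likely the better starting point, since it tracks the current API version.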
Pricing
Credits-based freemium: a free tier (20K credits/mo), Pro at $5/mo, Startup at $49/mo, Scale at $299/mo, and a custom Enterprise tier, plus prepaid agent dollars.
Best for / not for
Best for developers and enterprises that need very low latency voice or on-device/VPC deployment. Less suited to buyers wanting the most battle-tested incumbent or open model weights.
Traction
Cartesia raised a reported $27M seed (Dec 2024), $64M Series A (Mar 2025), and $100M Series B (Oct 2025) from investors including Kleiner Perkins, Index, Lightspeed, and NVIDIA, for a reported ~$191M total.
Alternatives
Deepgram is the closest voice-model infrastructure competitor; ElevenLabs and Play.ai overlap on TTS and voice agents.
FAQ
Is Cartesia an AI agent?
Its speech models are infrastructure building blocks (assistant-level), not agents. Its Line platform builds voice agents that run autonomously in-call, bounded by prompts and eval tooling rather than by enforced human-in-the-loop review, so Line agents count as supervised.
What makes Cartesia's models different?
The founding team helped invent state space models (S4, Mamba). Cartesia builds Sonic and Ink on this architecture rather than on a Transformer-based design, which gives them very low latency and efficient, on-device-capable operation.
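The latency argument can be illustrated with a generic discrete-time linear state space recurrence, x_k = A x_{k-1} + B u_k and y_k = C x_k. The sketch below is a textbook toy, not Cartesia's Sonic or Ink implementation; the class name `LinearSSM`, the helper `run`, and the small matrices in the usage note are hypothetical. What it shows is that each new input updates a fixed-size state, so the work per step does not grow with the amount of audio or text already processed, whereas attention in a Transformer looks back over the whole context at each step.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix by a vector using plain Python lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LinearSSM:
    """Toy single-input, single-output linear state space model."""

    def __init__(self, A: Matrix, B: Vector, C: Vector) -> None:
        self.A, self.B, self.C = A, B, C
        self.state: Vector = [0.0] * len(A)  # fixed-size hidden state

    def step(self, u: float) -> float:
        """Consume one input sample and emit one output sample.

        Cost depends only on the state size, not on how many
        samples have been seen so far.
        """
        ax = matvec(self.A, self.state)
        self.state = [a + b * u for a, b in zip(ax, self.B)]
        return sum(c * x for c, x in zip(self.C, self.state))


def run(model: LinearSSM, inputs: Sequence[float]) -> List[float]:
    """Stream inputs through the model one sample at a time."""
    return [model.step(u) for u in inputs]
```

For example, with `A = [[0.5, 0.0], [0.0, 0.25]]`, `B = [1.0, 1.0]`, and `C = [1.0, 1.0]`, feeding the impulse `[1.0, 0.0, 0.0]` through `run` yields `[2.0, 0.75, 0.3125]`: the state decays step by step while the per-step work stays constant. Production models such as S4 and Mamba add learned, structured, or input-dependent parameters on top of this basic recurrence.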
Sources
- Cartesia (official site) · accessed 2026-06-19
- Cartesia documentation · accessed 2026-06-19
- Cartesia pricing · accessed 2026-06-19
- Cartesia raises $64M Series A (Fortune) · accessed 2026-06-19