Cartesia

Low-latency voice AI models and a platform for real-time voice agents

Agent Platform · Assistant

Last reviewed 2026-06-19

Cartesia is a voice AI infrastructure company that builds real-time speech and transcription foundation models and exposes them as APIs and SDKs. Its differentiator is architectural: the founding team helped develop the S4 and Mamba state space models, an architecture suited to low latency, long context, and efficient on-device operation. Its core models are Sonic (text-to-speech, marketed as the fastest and most emotive, across dozens of languages) and Ink (streaming speech-to-text with native turn detection), alongside voice cloning, voice changing, and localization. Cartesia was founded in 2023 out of the Stanford AI Lab by Karan Goel, Albert Gu, Arjun Desai, and Brandon Yang and is based in San Francisco. In 2025 it moved up the stack with Line, a voice agent development platform with a Playground builder, CLI and GitHub deploy, and an SDK that plugs in third-party LLMs. It raised a reported $100M Series B in October 2025 (investors include Kleiner Perkins, Index, Lightspeed, and NVIDIA). Its models are assistant-level building blocks; Line agents are supervised, with observability and eval tooling rather than an enforced human-in-the-loop step.

What it can do

Synthesize speech (Sonic)
Assistant
Low-latency, expressive text-to-speech via the Sonic model family across dozens of languages.
Transcribe speech (Ink)
Assistant
Streaming speech-to-text via the Ink model with native turn detection.
Clone and manage voices
Assistant
Instant and pro voice cloning, voice changing, localization, and infill.
Power real-time voice agents (Line)
Supervised
The Line platform builds and deploys voice agents that run autonomously in-call, bounded by prompts and eval tooling rather than an enforced human-in-the-loop step.

Strengths

+ Genuinely differentiated state-space-model tech with best-in-class latency and on-device efficiency
+ Full stack (TTS, STT, cloning, and the Line agent platform) plus deep ecosystem integrations and self-hosted/VPC options
+ Strong technical credibility and capital, including NVIDIA backing

Limitations

− Younger and less battle-tested than ElevenLabs and Deepgram; the Line agent platform is barely a year old
− Closed, proprietary models (no open weights for production Sonic/Ink), creating lock-in
− Two-axis credits-plus-prepaid-agents pricing is hard to forecast at scale

Overview

Cartesia is a voice AI infrastructure company that builds real-time speech and transcription foundation models exposed as APIs and SDKs, with an architectural edge from state space models (S4, Mamba).

What it does

Its core models are Sonic (low-latency, expressive TTS) and Ink (streaming STT with turn detection), plus voice cloning, voice changing, and localization. In 2025 it added Line, a voice agent development platform with a Playground builder, CLI and GitHub deploy, and an SDK that plugs in third-party LLMs.

Integrations & setup

Cartesia offers an official LiveKit plugin plus Twilio, Pipecat, and Vapi integrations, Python and JS/TS SDKs, and self-hosted on-prem or VPC deployment via Docker/Kubernetes.

Pricing

Credits-based freemium: a free tier (20K credits/mo), Pro at $5/mo, Startup at $49/mo, Scale at $299/mo, and a custom Enterprise tier, plus prepaid agent dollars.
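
As a rough budgeting aid, the two pricing axes can be sketched in Python. The plan fees are the ones listed above. Everything else is an assumption made for illustration: the function names are hypothetical, the plan fee and prepaid agent dollars are treated as simply additive, and per-plan credit allotments, overage rates, and Enterprise pricing are not modeled.

```python
# Monthly plan fees in USD, copied from the pricing summary above.
# Enterprise is custom-priced, so it is deliberately left out.
PLAN_FEES_USD = {"Free": 0, "Pro": 5, "Startup": 49, "Scale": 299}


def monthly_budget(plan: str, prepaid_agent_usd: float = 0.0) -> float:
    """Return the plan fee plus prepaid agent dollars for one month.

    Assumption: the two axes are additive. Credit allotments and
    overage charges are not modeled.
    """
    if plan not in PLAN_FEES_USD:
        raise ValueError(f"unknown or custom-priced plan: {plan}")
    if prepaid_agent_usd < 0:
        raise ValueError("prepaid_agent_usd must be non-negative")
    return PLAN_FEES_USD[plan] + prepaid_agent_usd


def annual_budget(plan: str, prepaid_agent_usd: float = 0.0) -> float:
    """Twelve identical months of monthly_budget (no annual discount assumed)."""
    return 12 * monthly_budget(plan, prepaid_agent_usd)


if __name__ == "__main__":
    # Hypothetical scenario: Startup plan plus $100 of prepaid agent dollars.
    print(monthly_budget("Startup", 100.0))
    print(annual_budget("Scale"))
```

The prepaid amount in the example is a placeholder; real forecasts would need the credit and agent rates from Cartesia's pricing page.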

Best for / not for

Best for developers and enterprises that need very low latency voice or on-device/VPC deployment. Less suited to buyers wanting the most battle-tested incumbent or open model weights.

Traction

Cartesia raised a reported $27M seed (Dec 2024), $64M Series A (Mar 2025), and $100M Series B (Oct 2025) from investors including Kleiner Perkins, Index, Lightspeed, and NVIDIA, for a reported ~$191M total.

Alternatives

Deepgram is the closest voice-model infrastructure competitor; ElevenLabs and PlayAI overlap on TTS and voice agents.


FAQ

Is Cartesia an AI agent?

Partly. Its speech models are infrastructure building blocks (assistant-level), not agents. Its Line platform builds voice agents that run autonomously in-call, bounded by prompts and eval tooling rather than an enforced human-in-the-loop step, so those agents are classed as supervised.

What makes Cartesia's models different?

The founding team helped develop the S4 and Mamba state space models. Its Sonic and Ink models build on that architecture rather than a Transformer-based design, which gives them very low latency and efficient, on-device-capable operation.

Sources

Cartesia (official site) · accessed 2026-06-19
Cartesia documentation · accessed 2026-06-19
Cartesia pricing · accessed 2026-06-19
Cartesia raises $64M Series A (Fortune) · accessed 2026-06-19


Alternatives & related

Deepgram

Voice AI infrastructure: speech-to-text, text-to-speech, and a voice agent API

ElevenLabs Agents

Platform for building real-time voice and chat AI agents

PlayAI

Voice AI platform for human-like speech, voice cloning, and voice agents

Vapi

Developer platform for voice AI agents that handle phone calls