Together AI

by Together Computer, Inc.

Cloud for running, fine-tuning, and serving open-source AI models

Agent PlatformAssistant

Last reviewed 2026-06-20

Together AI is an AI cloud platform for running open-source and open-weight models in production. It exposes hundreds of chat, reasoning, vision, image, audio, and video models through an OpenAI-compatible serverless inference API, and also offers dedicated endpoints, fine-tuning, batch inference, a code-execution sandbox, and on-demand GPU clusters (H100/H200/B200) for teams that want more control. The pitch is performance and cost: the company says its inference research lets it run open models faster and cheaper than naive serving, and it positions itself as 'The AI Native Cloud.' Together AI does not sell its own proprietary frontier model; it serves third-party open models (Llama, DeepSeek, Qwen, Mistral, MiniMax, FLUX, Whisper, and others) plus models you fine-tune or bring yourself. In this directory it is a developer platform and inference layer, not an agent: by default it returns model outputs to your application, which owns any agent orchestration. It supports tool/function calling, structured (JSON) outputs, and a Python code sandbox that developers use to build agent workflows on top of the API.

What it can do

Serverless inference for open-source models
Assistant
Runs hundreds of open and open-weight models (chat, reasoning, vision, image, audio, video) on demand with pay-per-token pricing and no infrastructure to manage, marketed for fast throughput from its in-house inference engine.
source
OpenAI-compatible API and SDKs
Assistant
Exposes inference through an OpenAI-compatible endpoint with Python and TypeScript SDKs, so existing OpenAI SDK code can be pointed at Together with minimal changes.
source
Fine-tuning on custom data
Assistant
Provides fine-tuning (including LoRA and full fine-tuning) of open models on user data to improve accuracy and control behavior, with per-token training pricing and hosting of the resulting model.
source
Dedicated endpoints and GPU clusters
Assistant
Launches dedicated single-model inference endpoints on reserved H100/H200/B200 hardware, and on-demand or reserved GPU clusters scaling to thousands of GPUs for training and custom workloads.
source
Tool calling, structured outputs, and code sandbox
Supervised
Supports function/tool calling and structured (JSON-schema) outputs, plus a sandbox to run Python safely alongside model calls. Together documents building agent workflows on top of these primitives; the orchestration and approvals live in the developer's application.
source
Batch inference for asynchronous workloads
Assistant
A batch inference API processes large-scale asynchronous jobs at lower cost than on-demand serverless calls, per the product pages.
source

Strengths

+Large catalog of open models across text, image, audio, and video
+OpenAI-compatible API makes migration nearly drop-in
+Full ladder from serverless to dedicated endpoints to raw GPU clusters
+Fine-tuning plus model hosting in one platform
+Built-in code sandbox and tool calling for agent backends

Limitations

−Serves open and bring-your-own models; no proprietary frontier model of its own
−It is an inference and compute layer, not an end-to-end agent: orchestration is on you
−Per-token and per-GPU-hour costs can add up at scale and require monitoring
−Model availability shifts as open-weight releases come and go

Overview

Together AI is an AI cloud founded in June 2022 by Vipul Ved Prakash (CEO), Ce Zhang (CTO), and co-founders including Percy Liang and Chris Re. It positions itself as "The AI Native Cloud": a full-stack platform for running, customizing, and scaling open-source and open-weight AI models in production. The product most developers touch is its serverless inference API, but the platform also spans fine-tuning, dedicated endpoints, batch jobs, a code sandbox, and raw GPU clusters.

Crucially, Together AI does not sell its own proprietary frontier model. It serves other organizations' open models (Llama, DeepSeek, Qwen, Mistral, MiniMax, FLUX, Whisper, and more) plus models you fine-tune or bring yourself, competing on inference performance, cost, and breadth of compute. So in this directory it is a platform / inference layer, not an agent. Its honest autonomy level is assistant: by default it returns model outputs to a request. It gains agentic surface (tool calling, structured outputs, the code sandbox) when a developer wires it into an application, but the orchestration and approvals stay in that application.

What it does

Together AI runs LLMs, reasoning models, vision models, and image, audio, and video models, with several deployment shapes per the product pages and docs:

Serverless inference on hundreds of open models, pay-per-token, marketed for high throughput from the company's in-house inference engine.
OpenAI-compatible API with Python and TypeScript SDKs, so existing OpenAI SDK code can be pointed at Together with minimal changes.
Fine-tuning (including LoRA and full fine-tuning) of open models on your data, with hosting of the resulting model.
Dedicated endpoints and GPU clusters: reserved single-model endpoints on H100/H200/B200 hardware, plus on-demand or reserved GPU clusters scaling to thousands of GPUs.
Tool/function calling, structured (JSON-schema) outputs, and a code sandbox to run Python safely alongside model calls, the primitives Together documents for building agent workflows.
Batch inference for large asynchronous jobs at lower cost than on-demand serverless.

Integrations & setup

Because the API is OpenAI-compatible, integration is close to a drop-in: swap the base URL and key in an OpenAI SDK call, or use ecosystem libraries such as LangChain, LlamaIndex, and the Vercel AI SDK. Models can also be pulled from the open ecosystem (for example Hugging Face) and served or fine-tuned on the platform. For teams that need consistent performance and control, Together offers dedicated endpoints and full GPU clusters rather than only shared serverless capacity.

Pricing

Together AI uses usage-based pricing. Reported figures from its pricing page (which change over time):

Serverless inference: per million tokens, with input pricing roughly from $0.03 and output up to several dollars depending on model size (for example, a small model versus DeepSeek-class models).
Image, video, and audio: per-image (cents and up), per-video, and per-minute transcription rates (for example Whisper Large v3 around $0.0015 per minute).
Fine-tuning: per million training tokens, scaling with model size, plus premium rates for specialized models.
Dedicated endpoints: roughly $6-12 per GPU-hour for single H100/H200/B200 instances.
GPU clusters: on-demand roughly $4.79-$8.19 per GPU-hour, reserved from about $3.29 per GPU-hour depending on commitment.

The page states teams can start for free and scale on demand. Check https://www.together.ai/pricing for current rates.

Best for / not for

Best for developers and teams building on open models who want one platform that covers fast serverless inference, fine-tuning, dedicated endpoints, and GPU compute behind a familiar OpenAI-style API, plus tool calling and a code sandbox for agent backends.

Not for teams that need a managed, end-to-end agent out of the box (Together gives you the engine and the compute, not the autopilot), or that specifically require a proprietary frontier model rather than open weights.

Alternatives

The closest comparisons are other open-model inference and AI-cloud providers: Fireworks AI, Groq (custom-silicon inference), OpenRouter (a routing aggregator), Replicate, and Hugging Face's hosted inference. Against general-purpose API providers like OpenAI, the tradeoff is open models, fine-tuning control, and raw GPU access versus a proprietary frontier model.

What people are saying

We aggregate real LinkedIn discussion into sentiment for the agents people search most. Together AI isn't tracked yet, want it added? Request tracking.

FAQ

Does Together AI make its own AI models?+

Together AI primarily serves third-party open and open-weight models (Llama, DeepSeek, Qwen, Mistral, MiniMax, FLUX, Whisper, and others) plus models you fine-tune or bring yourself. It has contributed to open research and datasets, but it competes mainly on inference speed, fine-tuning, and GPU compute rather than a proprietary frontier model.

Is Together AI an AI agent?+

Not on its own. Together AI is a developer platform and inference layer. It provides the model serving, tool/function calling, structured outputs, and a code sandbox that developers use to build agents. The agent logic, orchestration, and human approvals live in your application.

Is the Together AI API OpenAI-compatible?+

Yes. Together AI offers an OpenAI-compatible API with Python and TypeScript SDKs, so code written for the OpenAI SDK can usually be pointed at Together by changing the base URL and API key.

How is Together AI priced?+

Usage-based. Serverless inference is per million tokens (input and output), with separate per-image, per-video, and per-minute rates for media and audio models. Fine-tuning is priced per million training tokens, and dedicated endpoints and GPU clusters are billed per GPU-hour. The page states teams can start for free and scale on demand; see https://www.together.ai/pricing for current rates.

Can I fine-tune and deploy custom models on Together AI?+

Yes. Together AI supports fine-tuning open models (including LoRA and full fine-tuning) on your data, then hosting the resulting model for inference via serverless or dedicated endpoints.

Sources

Together AI homepage · accessed 2026-06-20
Together AI serverless inference · accessed 2026-06-20
Together AI dedicated model inference · accessed 2026-06-20
Together AI pricing · accessed 2026-06-20
Together AI documentation introduction · accessed 2026-06-20
Together AI announces $305M Series B (company blog) · accessed 2026-06-20
Together AI raises $305M Series B (PR Newswire) · accessed 2026-06-20

Last reviewed 2026-06-20

Alternatives & related

Fireworks AI

Fast inference and fine-tuning platform for open-source AI models

Groq

Fast, low-cost LLM inference on custom LPU silicon via GroqCloud

OpenRouter

One OpenAI-compatible API for 400+ LLMs across 70+ providers, with routing and fallbacks

Replicate

Run and fine-tune open-source AI models with a cloud API, billed per second

Hugging Face

Open-source AI platform: model hub, datasets, inference, and the smolagents framework

Mistral AI

European AI lab: open and proprietary LLMs, the Vibe assistant, and the La Plateforme API

What it can do

Serverless inference for open-source models

OpenAI-compatible API and SDKs

Fine-tuning on custom data

Dedicated endpoints and GPU clusters

Tool calling, structured outputs, and code sandbox

Batch inference for asynchronous workloads