Groq

by Groq, Inc.

Fast, low-cost LLM inference on custom LPU silicon via GroqCloud

Agent PlatformAssistant

Last reviewed 2026-06-20

Groq is an AI inference company that runs open-weight large language models and speech models on its own custom silicon, the LPU (Language Processing Unit), a chip purpose-built for inference rather than training. Its main product, GroqCloud, exposes that hardware as an OpenAI-compatible API and console so developers can call models like Llama, Qwen, GPT-OSS, Kimi, and Whisper at high token-throughput and per-token prices. Groq does not build its own foundation models; it serves other organizations' open models on LPU hardware, competing on inference speed and cost rather than model quality. It targets developers and teams building latency-sensitive applications (chat, voice, agents) and also offers on-premises hardware (GroqRack / LPX) for regulated or air-gapped deployments. In December 2025 Groq entered a non-exclusive technology licensing agreement with Nvidia; the company reportedly continues to operate GroqCloud independently.

What it can do

High-throughput LLM inference on LPU hardware
Assistant
Serves open-weight models (Llama, Qwen, GPT-OSS, Kimi, and others) on Groq's custom LPU chips, marketed for fast token throughput and low, linear per-token pricing.
source
OpenAI-compatible API
Assistant
Exposes inference through an OpenAI-compatible endpoint (base URL https://api.groq.com/openai/v1) so existing OpenAI SDK code can be pointed at Groq with minimal changes.
source
Tool use, function calling, and structured outputs
Copilot
Supports function calling, structured (JSON) outputs, and Groq built-in tools such as web search, website visits, code execution, and Wolfram Alpha, per the docs.
source
Speech-to-text and text-to-speech
Assistant
Runs Whisper-family speech recognition and text-to-speech models (including Orpheus) for real-time voice and transcription workloads.
source
Compound (agentic) systems with remote tools and MCP
Supervised
Offers a Compound agentic feature plus remote tools and MCP support (including Google Workspace connectors for Gmail, Calendar, and Drive) so models can call external tools during inference. The platform provides the inference layer; orchestration and approvals are left to the developer's application.
source
Batch API for asynchronous workloads
Assistant
A Batch API processes large-scale asynchronous jobs at a reported 50% lower cost than on-demand inference.
source

Strengths

+Marketed for very fast inference at low, linear per-token pricing
+OpenAI-compatible API makes migration nearly drop-in
+Free tier plus on-demand, batch, and on-prem (GroqRack/LPX) options
+Broad open-model catalog (Llama, Qwen, GPT-OSS, Kimi, Whisper)

Limitations

−Serves open models only; no proprietary frontier models of its own
−It is an inference layer, not an end-to-end agent: orchestration is on you
−Model availability changes as open-weight releases come and go
−Future direction uncertain after the December 2025 Nvidia licensing deal

Overview

Groq is an AI inference company founded in 2016 by Jonathan Ross. Its core bet is custom silicon: the LPU (Language Processing Unit), described by the company as the first chip purpose-built for inference rather than training. The product most developers touch is GroqCloud, which exposes that hardware as an OpenAI-compatible API and web console. Groq's positioning is blunt: "fast, low cost inference."

Crucially, Groq does not build its own foundation models. It serves other people's open-weight models (Llama, Qwen, GPT-OSS, Kimi, Mistral, Whisper, and more) on LPU hardware, competing on speed and price. So in this directory it is a platform / inference layer, not an agent. Its honest autonomy level is assistant: by default it responds to a request and returns tokens. It gains agentic surface (tool use, MCP, the Compound feature) when a developer wires it into an application, but the orchestration and approvals stay in that application.

What it does

GroqCloud runs LLM, speech-to-text, text-to-speech, and image-to-text models on LPU chips. Key capabilities per the docs and pricing page:

High-throughput inference on open models, marketed for fast token output and low, linear per-token cost.
OpenAI-compatible API at https://api.groq.com/openai/v1, so existing OpenAI SDK code can be pointed at Groq with minimal changes.
Function calling, structured outputs, and built-in tools (web search, website visits, code execution, Wolfram Alpha).
Speech models: Whisper-family transcription and text-to-speech (including Orpheus) for real-time voice.
Compound (agentic) systems plus remote tools and MCP, including Google Workspace connectors (Gmail, Calendar, Drive).
Batch API for asynchronous jobs at a reported 50% discount versus on-demand.

Groq reports serving roughly 3 million developers and teams, with named customers including Dropbox, Vercel, Canva, Robinhood, and the McLaren F1 Team (per its homepage).

Integrations & setup

Because the API is OpenAI-compatible, integration is close to a drop-in: swap the base URL and key in an OpenAI SDK call, or use ecosystem libraries like LangChain and the Vercel AI SDK. The platform supports MCP and remote tools (with Google Workspace connectors) for tool-augmented inference. For regulated or air-gapped environments, Groq offers on-premises hardware (GroqRack, and the LPX rack introduced with Nvidia at GTC 2026).

Pricing

Groq uses usage-based, tokens-as-a-service pricing it describes as "linear and predictable" with no idle-infrastructure fees. Reported figures from its pricing page:

Free tier to start (free API key).
On-demand per-token pricing, e.g. Llama 3.1 8B at about $0.05 input / $0.08 output per million tokens, up to Llama 3.3 70B at about $0.59 / $0.79.
Batch API at a reported ~50% discount for asynchronous workloads.
Speech: text-to-speech around $22-40 per million characters; speech recognition around $0.04-0.11 per hour transcribed.
Enterprise / on-prem custom pricing.

Exact prices and the model catalog change over time; check https://groq.com/pricing.

Best for / not for

Best for developers building latency-sensitive apps (chat, voice, agent backends) who want fast, cheap inference on open models behind a familiar OpenAI-style API, and teams that need an on-prem inference option.

Not for teams that need a managed, end-to-end agent out of the box (Groq gives you the engine, not the autopilot), or that require a proprietary frontier model. Buyers should also weigh the uncertainty introduced by the December 2025 Nvidia licensing deal, which is under regulatory scrutiny even though GroqCloud reportedly continues operating independently.

Alternatives

The closest comparisons are other fast / open-model inference providers: Together AI, Fireworks AI, Cerebras (another custom-silicon inference play), OpenRouter (a routing aggregator), and Replicate. Against general-purpose API providers like OpenAI, the tradeoff is open models and inference speed versus proprietary frontier models.

What people are saying

We aggregate real LinkedIn discussion into sentiment for the agents people search most. Groq isn't tracked yet, want it added? Request tracking.

FAQ

Does Groq make its own AI models?+

No. Groq runs open-weight models from others (such as Llama, Qwen, GPT-OSS, Kimi, and Whisper) on its custom LPU hardware. It competes on inference speed and cost, not on proprietary model quality.

Is Groq an AI agent?+

Not on its own. Groq is an inference platform: it provides the fast model-serving layer (plus function calling, MCP, and a Compound agentic feature) that developers use to build agents. The agent logic, orchestration, and human approvals live in your application.

Is Groq the same as Grok?+

No. Groq is the LPU-based inference company founded in 2016. Grok is xAI's chatbot and model family. The names are unrelated.

Does Groq have a free tier?+

Yes. GroqCloud offers a free API tier to get started, then on-demand pay-as-you-go per-token pricing, a discounted Batch API, and enterprise/on-prem options, per its pricing page.

What happened with Nvidia and Groq?+

In December 2025 Groq entered a non-exclusive technology licensing agreement with Nvidia (reported around a $20B value). Per the announcement, GroqCloud continues to operate without interruption as a standalone company. The deal has drawn regulatory scrutiny.

Sources

Groq homepage · accessed 2026-06-20
GroqCloud product page · accessed 2026-06-20
Groq pricing · accessed 2026-06-20
GroqCloud documentation overview · accessed 2026-06-20
Groq raises $750M at $6.9B valuation (Groq newsroom) · accessed 2026-06-20
Nvidia AI chip challenger Groq raises $750M, hits $6.9B valuation (TechCrunch) · accessed 2026-06-20
Groq and Nvidia enter non-exclusive inference licensing agreement (Groq newsroom) · accessed 2026-06-20

Last reviewed 2026-06-20

Alternatives & related

Together AI

Cloud for running, fine-tuning, and serving open-source AI models

Fireworks AI

Fast inference and fine-tuning platform for open-source AI models

OpenRouter

One OpenAI-compatible API for 400+ LLMs across 70+ providers, with routing and fallbacks

Replicate

Run and fine-tune open-source AI models with a cloud API, billed per second

Vercel AI SDK

Open-source TypeScript toolkit for building AI apps and agents

LangChain

Open-source framework and platform for building and deploying LLM agents

What it can do

High-throughput LLM inference on LPU hardware

OpenAI-compatible API

Tool use, function calling, and structured outputs

Speech-to-text and text-to-speech

Compound (agentic) systems with remote tools and MCP

Batch API for asynchronous workloads