Ollama Turbo

Turbo (Preview)

Supercharge models with faster hardware
$20/mo
Turbo lets you:
- Speed up model inference: Run models on datacenter-grade hardware, returning responses much faster.
- Run larger models: Run on the newest hardware, making it possible to use larger models.
- Privacy first: Ollama does not retain your data, ensuring privacy and security.
- Save battery life: Take the load of running models off your Mac, Windows, or Linux computer, giving performance back to your other apps.
Frequently asked questions
- What is Turbo?
  Turbo is a new way to run open models using datacenter-grade hardware. Many new models are too large to fit on widely available GPUs, or run very slowly. Ollama Turbo provides a way to run these models fast while using Ollama's App, CLI, and API.
- Which models are available in Turbo?
  While in preview, the gpt-oss-20b and gpt-oss-120b models are available.
- Does Turbo work with Ollama's CLI?
  Yes, Ollama's CLI works with Turbo mode. See the docs for more information.
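As a rough illustration of the CLI flow, the sketch below points the CLI at the hosted endpoint and runs a Turbo model. The exact commands and host value are assumptions to verify against the docs:

```shell
# Hedged sketch; confirm the exact flow in Ollama's docs.
export OLLAMA_HOST=ollama.com   # assumed hosted Turbo endpoint

# Sign in and run a Turbo-hosted model (requires the ollama CLI and a Turbo subscription):
# ollama signin
# ollama run gpt-oss:120b "Why is the sky blue?"
```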
- Does Turbo work with Ollama's API and JavaScript/Python libraries?
  Yes, Ollama's API and JavaScript/Python libraries work with Turbo mode. See the docs for more information.
- What data do you retain in Turbo mode?
  Ollama does not log or retain any queries made via Turbo mode.
- Where is the hardware that powers Turbo located?
  All hardware is located in the United States.
- What are the usage limits for Turbo?
  Turbo includes hourly and daily limits to avoid capacity issues. Usage-based pricing will be available soon, allowing models to be consumed in a metered fashion.