📄️ SambaNova Cloud
SambaNova Cloud is a platform for running large AI models, with world-record Llama 3.1 70B/405B inference performance. You can sign up here, copy your API key from the initial welcome screen, and then hit the play button on any model in the model list.
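As a sketch, the corresponding config.json entry might look like the following (the `sambanova` provider id and the `llama3.1-405b` model name are assumptions; substitute any model from the list):

```json
{
  "models": [
    {
      "title": "SambaNova Llama 3.1 405B",
      "provider": "sambanova",
      "model": "llama3.1-405b",
      "apiKey": "YOUR_SAMBANOVA_API_KEY"
    }
  ]
}
```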
📄️ Ask Sage
To get an Ask Sage API key, log in to the Ask Sage platform (if you don't have an account, you can create one here) and follow the instructions in the Ask Sage docs: Ask Sage API Key.
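Once you have a key, an entry along these lines should work (the `askSage` provider id, the model name, and the server URL are all assumptions; check the Ask Sage docs for the exact values):

```json
{
  "models": [
    {
      "title": "Ask Sage GPT-4o",
      "provider": "askSage",
      "model": "gpt-4o",
      "apiKey": "YOUR_ASK_SAGE_API_KEY",
      "apiBase": "YOUR_ASK_SAGE_SERVER_URL"
    }
  ]
}
```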
📄️ Cerebras Inference
Cerebras Inference uses specialized silicon to provide fast inference.
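A minimal config.json sketch, assuming the `cerebras` provider id and a Llama 3.1 model id from the Cerebras catalog:

```json
{
  "models": [
    {
      "title": "Cerebras Llama 3.1 70B",
      "provider": "cerebras",
      "model": "llama3.1-70b",
      "apiKey": "YOUR_CEREBRAS_API_KEY"
    }
  ]
}
```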
📄️ Cloudflare Workers AI
Cloudflare Workers AI can be used for both chat and tab autocompletion in Continue. To set up Cloudflare Workers AI, add the following to your config.json file:
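A sketch covering both uses (the `cloudflare` provider id, the `accountId` field, and the Workers AI model names are assumptions; swap in any models from the Workers AI catalog):

```json
{
  "models": [
    {
      "title": "Workers AI Llama 3 8B",
      "provider": "cloudflare",
      "accountId": "YOUR_CLOUDFLARE_ACCOUNT_ID",
      "apiKey": "YOUR_WORKERS_AI_API_TOKEN",
      "model": "@cf/meta/llama-3-8b-instruct"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Coder 6.7B",
    "provider": "cloudflare",
    "accountId": "YOUR_CLOUDFLARE_ACCOUNT_ID",
    "apiKey": "YOUR_WORKERS_AI_API_TOKEN",
    "model": "@hf/thebloke/deepseek-coder-6.7b-base-awq"
  }
}
```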
📄️ Cohere
Before using Cohere, visit the Cohere dashboard to create an API key.
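With a key in hand, a config.json entry might look like the following (the `cohere` provider id and the `command-r-plus` model name are assumptions; pick any Cohere model you prefer):

```json
{
  "models": [
    {
      "title": "Cohere Command R+",
      "provider": "cohere",
      "model": "command-r-plus",
      "apiKey": "YOUR_COHERE_API_KEY"
    }
  ]
}
```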
📄️ DeepInfra
DeepInfra provides inference for open-source models at very low cost. To get started with DeepInfra, obtain your API key here. Then find the model you want to use here and copy the model name. Continue can then be configured to use the DeepInfra LLM class, like the example here:
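A sketch of such an entry (the `deepinfra` provider id is an assumption; the model name is whichever id you copied from DeepInfra):

```json
{
  "models": [
    {
      "title": "DeepInfra Llama 3.1 70B",
      "provider": "deepinfra",
      "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
      "apiKey": "YOUR_DEEPINFRA_API_KEY"
    }
  ]
}
```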
📄️ Flowise
Flowise is a low-code/no-code drag-and-drop tool that aims to make it easy to visualize and build LLM apps. Continue can then be configured to use the Flowise LLM class, like the example here:
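A sketch, assuming the `flowise` provider id and that apiBase points at your chatflow's prediction URL (both are assumptions; the chatflow id and model title are placeholders):

```json
{
  "models": [
    {
      "title": "Flowise",
      "provider": "flowise",
      "model": "flowise-chatflow",
      "apiBase": "http://localhost:3000/api/v1/prediction/YOUR_CHATFLOW_ID"
    }
  ]
}
```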
📄️ Free Trial
The "free-trial" provider lets new users quickly try out the best experience in Continue using our API keys through a secure proxy server. To prevent abuse, we will ask you to sign in with GitHub, which you can read more about below.
📄️ Groq
Groq provides the fastest available inference for open-source language models, including the entire Llama 3.1 family.
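A sketch of a config.json entry (the `groq` provider id is an assumption; `llama-3.1-70b-versatile` is one of Groq's published model ids, but substitute any model you prefer):

```json
{
  "models": [
    {
      "title": "Groq Llama 3.1 70B",
      "provider": "groq",
      "model": "llama-3.1-70b-versatile",
      "apiKey": "YOUR_GROQ_API_KEY"
    }
  ]
}
```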
📄️ HuggingFace Inference Endpoints
Hugging Face Inference Endpoints are an easy way to set up instances of open-source language models on any cloud. Sign up for an account and add billing here, access the Inference Endpoints here, click "New endpoint", fill out the form (e.g., select a model like WizardCoder-Python-34B-V1.0), and then deploy your model by clicking "Create Endpoint". Change ~/.continue/config.json to look like this:
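A sketch, assuming the `huggingface-tgi` provider id (Inference Endpoints typically serve models via TGI); the endpoint URL and token are placeholders taken from your endpoint's overview page:

```json
{
  "models": [
    {
      "title": "Hugging Face Inference Endpoint",
      "provider": "huggingface-tgi",
      "model": "WizardCoder-Python-34B-V1.0",
      "apiBase": "YOUR_ENDPOINT_URL",
      "apiKey": "YOUR_HUGGING_FACE_TOKEN"
    }
  ]
}
```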
📄️ IPEX-LLM
IPEX-LLM is a PyTorch library for running LLMs on Intel CPUs and GPUs (e.g., a local PC with an iGPU, or a discrete GPU such as Arc A-Series, Flex, or Max) with very low latency.
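IPEX-LLM can serve models behind an Ollama-compatible endpoint, so one plausible setup is to point Continue's `ollama` provider at it (the port and the AUTODETECT model value below are assumptions):

```json
{
  "models": [
    {
      "title": "IPEX-LLM",
      "provider": "ollama",
      "model": "AUTODETECT",
      "apiBase": "http://localhost:11434"
    }
  ]
}
```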
📄️ Kindo
Kindo offers centralized control over your organization's AI operations, ensuring data protection and compliance with internal policies while supporting various commercial and open-source models. To get started, sign up here, create your API key on the API keys page, and choose a model from the list of supported models in the plugins tab.
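A sketch of the resulting entry (the `kindo` provider id and the model name are assumptions; use whichever model you enabled in the plugins tab):

```json
{
  "models": [
    {
      "title": "Kindo Claude 3.5 Sonnet",
      "provider": "kindo",
      "model": "claude-3-5-sonnet",
      "apiKey": "YOUR_KINDO_API_KEY"
    }
  ]
}
```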
📄️ LlamaCpp
Run the llama.cpp server binary to start the API server. If running on a remote server, be sure to set the host to 0.0.0.0. Continue can then be pointed at the server:
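A sketch, assuming the `llama.cpp` provider id and the server's default port of 8080 (the model name is a placeholder; the server serves whatever model it was started with):

```json
{
  "models": [
    {
      "title": "Llama CPP",
      "provider": "llama.cpp",
      "model": "MODEL_NAME",
      "apiBase": "http://localhost:8080"
    }
  ]
}
```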
📄️ Llamafile
A llamafile is a self-contained binary that can run an open-source LLM. You can configure this provider in your config.json as follows:
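A sketch (the `llamafile` provider id is an assumption; the model name is a placeholder for whichever llamafile you are running):

```json
{
  "models": [
    {
      "title": "Llamafile",
      "provider": "llamafile",
      "model": "mistral-7b"
    }
  ]
}
```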
📄️ LM Studio
LM Studio is an application for Mac, Windows, and Linux that makes it easy to run open-source models locally and comes with a great UI. To get started with LM Studio, download it from the website, use the UI to download a model, and then start the local inference server. Continue can then be configured to use the LMStudio LLM class:
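A sketch of such a config (the `lmstudio` provider id is an assumption; since LM Studio's server listens on localhost by default, no apiBase should be needed):

```json
{
  "models": [
    {
      "title": "LM Studio",
      "provider": "lmstudio",
      "model": "MODEL_NAME"
    }
  ]
}
```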
📄️ Msty
Msty is an application for Windows, Mac, and Linux that makes it really easy to run online as well as local open-source models, including Llama-2, DeepSeek Coder, etc. No need to fiddle with your terminal or run a command. Just download the app from the website, click a button, and you are up and running. Continue can then be configured to use the Msty LLM class:
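A sketch (the `msty` provider id and the model name are assumptions; pick whichever model you downloaded in Msty):

```json
{
  "models": [
    {
      "title": "Msty DeepSeek Coder",
      "provider": "msty",
      "model": "deepseek-coder:6.7b"
    }
  ]
}
```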
📄️ NVIDIA
View the docs to learn how to get an API key.
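Once you have a key, an entry along these lines should work (the `nvidia` provider id and the model id are assumptions; use any model available through the NVIDIA API catalog):

```json
{
  "models": [
    {
      "title": "NVIDIA Llama 3.1 70B",
      "provider": "nvidia",
      "model": "meta/llama-3.1-70b-instruct",
      "apiKey": "YOUR_NVIDIA_API_KEY"
    }
  ]
}
```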
📄️ OpenRouter
OpenRouter is a unified interface for commercial and open-source models, giving you access to the best models at the best prices. You can sign up here, create your API key on the keys page, and then choose a model from the list of supported models.
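A sketch of a config.json entry (the `openrouter` provider id and apiBase are assumptions; the model id follows OpenRouter's vendor/model naming, so substitute any model from the list):

```json
{
  "models": [
    {
      "title": "OpenRouter Llama 3.1 70B",
      "provider": "openrouter",
      "model": "meta-llama/llama-3.1-70b-instruct",
      "apiBase": "https://openrouter.ai/api/v1",
      "apiKey": "YOUR_OPENROUTER_API_KEY"
    }
  ]
}
```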
📄️ ReplicateLLM
Replicate is a great option for newly released language models or models that you've deployed through their platform. Sign up for an account here, copy your API key, and then select any model from the Replicate Streaming List. Change ~/.continue/config.json to look like this:
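A sketch (the `replicate` provider id is an assumption; the model id below is one example of Replicate's owner/model naming, so substitute the model you chose):

```json
{
  "models": [
    {
      "title": "Replicate Llama 3 70B",
      "provider": "replicate",
      "model": "meta/meta-llama-3-70b-instruct",
      "apiKey": "YOUR_REPLICATE_API_KEY"
    }
  ]
}
```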
📄️ AWS SageMaker
SageMaker can be used for both chat and embedding models. Chat models are supported for endpoints deployed with LMI, and embedding models are supported for endpoints deployed with HuggingFace TEI.
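A sketch, assuming the `sagemaker` provider id, that `model` holds the SageMaker endpoint name, and that AWS credentials come from the default credential chain (all assumptions; the endpoint names are placeholders):

```json
{
  "models": [
    {
      "title": "SageMaker Chat Model",
      "provider": "sagemaker",
      "model": "YOUR_LMI_ENDPOINT_NAME"
    }
  ],
  "embeddingsProvider": {
    "provider": "sagemaker",
    "model": "YOUR_TEI_ENDPOINT_NAME"
  }
}
```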
📄️ TextGenWebUI
TextGenWebUI is a comprehensive, open-source language model UI and local server. You can set it up with an OpenAI-compatible server plugin, and then configure it in your config.json like this:
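A sketch (the `text-gen-webui` provider id and the port of the OpenAI-compatible plugin are assumptions; the model name is a placeholder):

```json
{
  "models": [
    {
      "title": "Text Generation WebUI",
      "provider": "text-gen-webui",
      "model": "MODEL_NAME",
      "apiBase": "http://localhost:5000"
    }
  ]
}
```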
📄️ Together
The Together API is a cloud platform for running large AI models. You can sign up here, copy your API key on the initial welcome screen, and then hit the play button on any model from the Together Models list. Change ~/.continue/config.json to look like this:
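A sketch (the `together` provider id is an assumption; the model id follows Together's naming, so substitute any model from the list):

```json
{
  "models": [
    {
      "title": "Together Llama 3.1 70B",
      "provider": "together",
      "model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
      "apiKey": "YOUR_TOGETHER_API_KEY"
    }
  ]
}
```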
📄️ vLLM
Run vLLM's OpenAI-compatible server with vllm serve. See their server documentation and the engine arguments documentation.
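Because vLLM speaks the OpenAI protocol, one plausible setup is to reuse Continue's `openai` provider with an apiBase override (port 8000 is vLLM's default; the model name must match what you passed to vllm serve, and the dummy apiKey is there only because the field is commonly expected):

```json
{
  "models": [
    {
      "title": "vLLM",
      "provider": "openai",
      "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
      "apiBase": "http://localhost:8000/v1",
      "apiKey": "EMPTY"
    }
  ]
}
```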
📄️ IBM watsonx
watsonx, developed by IBM, offers a variety of pre-trained AI foundation models that can be used for natural language processing (NLP), computer vision, and speech recognition tasks.