Free LLM API resources
This list covers services that provide free access to, or credits towards, API-based LLM usage.
Note
Please don't abuse these services, or we might lose them.
Warning
This list explicitly excludes services that are not legitimate (e.g., ones that reverse engineer an existing chatbot).
Free Providers
OpenRouter
Limits: 20 requests/minute, 50 requests/day (up to 1,000 requests/day with a $10 lifetime top-up)
Models share a common quota.
- Gemma 3 12B Instruct
- Gemma 3 27B Instruct
- Gemma 3 4B Instruct
- Hermes 3 Llama 3.1 405B
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct
- Mistral Small 3.1 24B Instruct
- arcee-ai/trinity-large-preview:free
- arcee-ai/trinity-mini:free
- cognitivecomputations/dolphin-mistral-24b-venice-edition:free
- google/gemma-3n-e2b-it:free
- google/gemma-3n-e4b-it:free
- liquid/lfm-2.5-1.2b-instruct:free
- liquid/lfm-2.5-1.2b-thinking:free
- nvidia/nemotron-3-nano-30b-a3b:free
- nvidia/nemotron-nano-12b-v2-vl:free
- nvidia/nemotron-nano-9b-v2:free
- openai/gpt-oss-120b:free
- openai/gpt-oss-20b:free
- qwen/qwen3-4b:free
- qwen/qwen3-coder:free
- qwen/qwen3-next-80b-a3b-instruct:free
- stepfun/step-3.5-flash:free
- z-ai/glm-4.5-air:free
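To stay within limits like the ones listed above (20 requests/minute and 50 requests/day, shared across models), a client can throttle itself before sending. The following is a minimal sketch using only the Python standard library. The class name `SlidingWindowLimiter` and its methods are hypothetical, and the provider's own enforcement may count requests differently.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle for several (max_requests, window_seconds) limits."""

    def __init__(self, limits, clock=time.monotonic):
        # limits: iterable of (max_requests, window_seconds) pairs.
        self._limits = [(n, w, deque()) for n, w in limits]
        self._clock = clock

    def wait_time(self):
        """Seconds to wait before the next request fits in every window."""
        now = self._clock()
        wait = 0.0
        for max_requests, window, stamps in self._limits:
            # Drop timestamps that have aged out of this window.
            while stamps and now - stamps[0] >= window:
                stamps.popleft()
            if len(stamps) >= max_requests:
                wait = max(wait, stamps[0] + window - now)
        return wait

    def try_acquire(self):
        """Record a request and return True if it is allowed right now."""
        if self.wait_time() > 0:
            return False
        now = self._clock()
        for _, _, stamps in self._limits:
            stamps.append(now)
        return True


# Limits taken from the list above: 20 requests/minute, 50 requests/day.
limiter = SlidingWindowLimiter([(20, 60), (50, 24 * 60 * 60)])
```

Call `limiter.try_acquire()` before each request; when it returns `False`, sleep for `limiter.wait_time()` seconds and try again. Because the quota is shared across models, one limiter instance should cover all of them.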
Google AI Studio
Data is used for training when the API is used outside the UK/CH/EEA/EU.
| Model Name | Model Limits |
|---|---|
| Gemini 3 Flash | 250,000 tokens/minute<br>20 requests/day<br>5 requests/minute |
| Gemini 3.1 Flash-Lite | 250,000 tokens/minute<br>500 requests/day<br>15 requests/minute |
| Gemini 2.5 Flash | 250,000 tokens/minute<br>20 requests/day<br>5 requests/minute |
| Gemini 2.5 Flash-Lite | 250,000 tokens/minute<br>20 requests/day<br>10 requests/minute |
| Gemma 3 27B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |
| Gemma 3 12B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |
| Gemma 3 4B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |
| Gemma 3 1B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |
NVIDIA NIM
Phone number verification required.
Models tend to be context window limited.
Limits: 40 requests/minute
Mistral (La Plateforme)
- Free tier (Experiment plan) requires opting into data training
- Requires phone number verification.
Limits (per-model): 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month
Mistral (Codestral)
- Currently free to use
- Monthly subscription based
- Requires phone number verification
Limits: 30 requests/minute, 2,000 requests/day
- Codestral
HuggingFace Inference Providers
HuggingFace Serverless Inference is limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB.
Limits: $0.10/month in credits
- Various open models across supported providers
Vercel AI Gateway
Routes to various supported providers.
Limits: $5/month
Cerebras
| Model Name | Model Limits |
|---|---|
| gpt-oss-120b | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| Qwen 3 235B A22B Instruct | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| Llama 3.3 70B | 30 requests/minute<br>64,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| Qwen 3 32B | 30 requests/minute<br>64,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| Llama 3.1 8B | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| Z.ai GLM-4.6 | 10 requests/minute<br>60,000 tokens/minute<br>100 requests/hour<br>100,000 tokens/hour<br>100 requests/day<br>1,000,000 tokens/day |
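When several windows apply at once, as in the table above, the tightest one determines sustained throughput. The following is a rough sketch of that arithmetic for the gpt-oss-120b row. The function name `daily_ceiling` is hypothetical, and the calculation assumes each limit is enforced independently over its own window.

```python
SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def daily_ceiling(limits):
    """Return the largest per-day amount allowed by a set of windowed limits.

    limits maps a window name ("minute", "hour", "day") to the cap for that
    window. Each cap is scaled up to a full day; the smallest result binds.
    """
    return min(
        cap * SECONDS["day"] // SECONDS[window] for window, cap in limits.items()
    )


# gpt-oss-120b row from the table above.
token_limits = {"minute": 60_000, "hour": 1_000_000, "day": 1_000_000}
request_limits = {"minute": 30, "hour": 900, "day": 14_400}

tokens_per_day = daily_ceiling(token_limits)      # 1,000,000: the daily cap binds
requests_per_day = daily_ceiling(request_limits)  # 14,400: the daily cap binds
avg_tokens_per_request = tokens_per_day // requests_per_day  # 69
```

Under these assumptions the daily token cap is the practical constraint: spending all 14,400 daily requests would leave roughly 69 tokens per request, so realistic workloads will run out of tokens long before they run out of requests.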
Groq
| Model Name | Model Limits |
|---|---|
| Allam 2 7B | 7,000 requests/day<br>6,000 tokens/minute |
| Llama 3.1 8B | 14,400 requests/day<br>6,000 tokens/minute |
| Llama 3.3 70B | 1,000 requests/day<br>12,000 tokens/minute |
| Llama 4 Maverick 17B 128E Instruct | 1,000 requests/day<br>6,000 tokens/minute |
| Llama 4 Scout Instruct | 1,000 requests/day<br>30,000 tokens/minute |
| Whisper Large v3 | 7,200 audio-seconds/minute<br>2,000 requests/day |
| Whisper Large v3 Turbo | 7,200 audio-seconds/minute<br>2,000 requests/day |
| canopylabs/orpheus-arabic-saudi | |
| canopylabs/orpheus-v1-english | |
| groq/compound | 250 requests/day<br>70,000 tokens/minute |
| groq/compound-mini | 250 requests/day<br>70,000 tokens/minute |
| meta-llama/llama-guard-4-12b | 14,400 requests/day<br>15,000 tokens/minute |
| meta-llama/llama-prompt-guard-2-22m | |
| meta-llama/llama-prompt-guard-2-86m | |
| moonshotai/kimi-k2-instruct | 1,000 requests/day<br>10,000 tokens/minute |
| moonshotai/kimi-k2-instruct-0905 | 1,000 requests/day<br>10,000 tokens/minute |
| openai/gpt-oss-120b | 1,000 requests/day<br>8,000 tokens/minute |
| openai/gpt-oss-20b | 1,000 requests/day<br>8,000 tokens/minute |
| openai/gpt-oss-safeguard-20b | 1,000 requests/day<br>8,000 tokens/minute |
| qwen/qwen3-32b | 1,000 requests/day<br>6,000 tokens/minute |
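With per-model limits like those above, a busy client will sometimes be rejected. A common pattern is to retry with exponential backoff rather than resend immediately, which also avoids hammering a free service. The sketch below is an illustration under stated assumptions, not any provider's SDK: `RateLimited` and `call_with_backoff` are hypothetical names, and how a rejection is signalled (typically an HTTP 429 response, sometimes with a `Retry-After` header) depends on the provider.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error your client raises when a quota is exceeded."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry with exponential backoff on RateLimited."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Prefer the server's hint; otherwise double the delay each time
            # and add jitter so parallel clients do not retry in lockstep.
            delay = err.retry_after
            if delay is None:
                delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

Here `request` is any zero-argument callable that performs one API call and raises `RateLimited` when the provider reports an exhausted quota. Backoff helps with per-minute limits only; once a daily quota is spent, retrying will not succeed until the window resets.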
Cohere
Limits: 20 requests/minute, 1,000 requests/month
Models share a common monthly quota.
- c4ai-aya-expanse-32b
- c4ai-aya-vision-32b
- command-a-03-2025
- command-a-reasoning-08-2025
- command-a-translate-08-2025
- command-a-vision-07-2025
- command-r-08-2024
- command-r-plus-08-2024
- command-r7b-12-2024
- command-r7b-arabic-02-2025
- tiny-aya-earth
- tiny-aya-fire
- tiny-aya-global
- tiny-aya-water
GitHub Models
Extremely restrictive input/output token limits.
Limits: Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)
- AI21 Jamba 1.5 Large
- Codestral 25.01
- Cohere Command A
- Cohere Command R 08-2024
- Cohere Command R+ 08-2024
- DeepSeek-R1
- DeepSeek-R1-0528
- DeepSeek-V3-0324
- Grok 3
- Grok 3 Mini
- Llama 4 Maverick 17B 128E Instruct FP8
- Llama 4 Scout 17B 16E Instruct
- Llama-3.2-11B-Vision-Instruct
- Llama-3.2-90B-Vision-Instruct
- Llama-3.3-70B-Instruct
- MAI-DS-R1
- Meta-Llama-3.1-405B-Instruct
- Meta-Llama-3.1-8B-Instruct
- Ministral 3B
- Mistral Medium 3 (25.05)
- Mistral Small 3.1
- OpenAI GPT-4.1
- OpenAI GPT-4.1-mini
- OpenAI GPT-4.1-nano
- OpenAI GPT-4o
- OpenAI GPT-4o mini
- OpenAI Text Embedding 3 (large)
- OpenAI Text Embedding 3 (small)
- OpenAI gpt-5
- OpenAI gpt-5-chat (preview)
- OpenAI gpt-5-mini
- OpenAI gpt-5-nano
- OpenAI o1
- OpenAI o1-mini
- OpenAI o1-preview
- OpenAI o3
- OpenAI o3-mini
- OpenAI o4-mini
- Phi-4
- Phi-4-mini-instruct
- Phi-4-mini-reasoning
- Phi-4-multimodal-instruct
- Phi-4-reasoning
Cloudflare Workers AI
Limits: 10,000 neurons/day
- @cf/aisingapore/gemma-sea-lion-v4-27b-it
- @cf/ibm-granite/granite-4.0-h-micro
- @cf/openai/gpt-oss-120b
- @cf/openai/gpt-oss-20b
- @cf/qwen/qwen3-30b-a3b-fp8
- @cf/zai-org/glm-4.7-flash
- DeepSeek R1 Distill Qwen 32B
- Deepseek Coder 6.7B Base (AWQ)
- Deepseek Coder 6.7B Instruct (AWQ)
- Deepseek Math 7B Instruct
- Discolm German 7B v1 (AWQ)
- Falcon 7B Instruct
- Gemma 2B Instruct (LoRA)
- Gemma 3 12B Instruct
- Gemma 7B Instruct
- Gemma 7B Instruct (LoRA)
- Hermes 2 Pro Mistral 7B
- Llama 2 13B Chat (AWQ)
- Llama 2 7B Chat (FP16)
- Llama 2 7B Chat (INT8)
- Llama 2 7B Chat (LoRA)
- Llama 3 8B Instruct
- Llama 3 8B Instruct (AWQ)
- Llama 3.1 8B Instruct (AWQ)
- Llama 3.1 8B Instruct (FP8)
- Llama 3.2 11B Vision Instruct
- Llama 3.2 1B Instruct
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct (FP8)
- Llama 4 Scout Instruct
- Llama Guard 3 8B
- Mistral 7B Instruct v0.1
- Mistral 7B Instruct v0.1 (AWQ)
- Mistral 7B Instruct v0.2
- Mistral 7B Instruct v0.2 (LoRA)
- Mistral Small 3.1 24B Instruct
- Neural Chat 7B v3.1 (AWQ)
- OpenChat 3.5 0106
- OpenHermes 2.5 Mistral 7B (AWQ)
- Phi-2
- Qwen 1.5 0.5B Chat
- Qwen 1.5 1.8B Chat
- Qwen 1.5 14B Chat (AWQ)
- Qwen 1.5 7B Chat (AWQ)
- Qwen 2.5 Coder 32B Instruct
- Qwen QwQ 32B
- SQLCoder 7B 2
- Starling LM 7B Beta
- TinyLlama 1.1B Chat v1.0
- Una Cybertron 7B v2 (BF16)
- Zephyr 7B Beta (AWQ)
Google Cloud Vertex AI
Very stringent payment verification for Google Cloud.
| Model Name | Model Limits |
|---|---|
| Llama 3.2 90B Vision Instruct | 30 requests/minute<br>Free during preview |
| Llama 3.1 70B Instruct | 60 requests/minute<br>Free during preview |
| Llama 3.1 8B Instruct | 60 requests/minute<br>Free during preview |
Providers with trial credits
Fireworks
Credits: $1
Models: Various open models
Baseten
Credits: $30
Models: Any supported model - pay by compute time
Nebius
Credits: $1
Models: Various open models
Novita
Credits: $0.5 for 1 year
Models: Various open models
AI21
Credits: $10 for 3 months
Models: Jamba family of models
Upstage
Credits: $10 for 3 months
Models: Solar Pro/Mini
NLP Cloud
Credits: $15
Requirements: Phone number verification
Models: Various open models
Alibaba Cloud (International) Model Studio
Credits: 1 million tokens/model
Models: Various open and proprietary Qwen models
Modal
Credits: $5/month upon sign up, $30/month with payment method added
Models: Any supported model - pay by compute time
Inference.net
Credits: $1, $25 for responding to an email survey
Models: Various open models
Hyperbolic
Credits: $1
Models:
- DeepSeek V3
- DeepSeek V3 0324
- Llama 3.1 405B Base
- Llama 3.1 405B Instruct
- Llama 3.1 70B Instruct
- Llama 3.1 8B Instruct
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct
- Pixtral 12B (2409)
- Qwen QwQ 32B
- Qwen2.5 72B Instruct
- Qwen2.5 Coder 32B Instruct
- Qwen2.5 VL 72B Instruct
- Qwen2.5 VL 7B Instruct
- deepseek-ai/deepseek-r1-0528
- openai/gpt-oss-120b
- openai/gpt-oss-120b-turbo
- openai/gpt-oss-20b
- qwen/qwen3-235b-a22b
- qwen/qwen3-235b-a22b-instruct-2507
- qwen/qwen3-coder-480b-a35b-instruct
- qwen/qwen3-next-80b-a3b-instruct
- qwen/qwen3-next-80b-a3b-thinking
SambaNova Cloud
Credits: $5 for 3 months
Models:
- E5-Mistral-7B-Instruct
- Llama 3.1 8B
- Llama 3.3 70B
- Llama-4-Maverick-17B-128E-Instruct
- Qwen/Qwen3-235B
- Qwen/Qwen3-32B
- Whisper-Large-v3
- deepseek-ai/DeepSeek-R1-0528
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- deepseek-ai/DeepSeek-V3-0324
- deepseek-ai/DeepSeek-V3.1
- deepseek-ai/DeepSeek-V3.1-Terminus
- deepseek-ai/DeepSeek-V3.2
- openai/gpt-oss-120b
- tbd
Scaleway Generative APIs
Credits: 1,000,000 free tokens
Models:
- BGE-Multilingual-Gemma2
- DeepSeek R1 Distill Llama 70B
- Gemma 3 27B Instruct
- Llama 3.1 8B Instruct
- Llama 3.3 70B Instruct
- Mistral Nemo 2407
- Pixtral 12B (2409)
- Whisper Large v3
- devstral-2-123b-instruct-2512
- gpt-oss-120b
- holo2-30b-a3b
- mistral-small-3.2-24b-instruct-2506
- qwen3-235b-a22b-instruct-2507
- qwen3-coder-30b-a3b-instruct
- qwen3-embedding-8b
- voxtral-small-24b-2507