Can I Run This LLM?

Enter your hardware specs to see which AI models you can run locally. Supports 35+ models including Gemma 4, Llama 4, Qwen 3, Mistral, and DeepSeek.

Your Hardware

Quick presets:

Compatible Models

Showing models for your hardware configuration

Fast — fits fully in VRAM/unified memory
Good — fits with quantization
Slow — partial offload / CPU
Won't Run — insufficient memory

Frequently Asked Questions

How much RAM do I need to run an LLM locally?
It depends on the model size. Small models (1-3B parameters) need 4-8 GB RAM. Medium models (7-13B) need 16-32 GB. Large models (30-70B) need 32-64 GB. Quantized versions (Q4_K_M, Q5_K_M) reduce requirements by 50-75%, making larger models accessible on consumer hardware.
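The figures above follow from simple arithmetic: memory is roughly parameters times bytes per weight, plus runtime overhead. Here is a minimal sketch of that estimate; the bytes-per-weight values and the 20% overhead factor are ballpark assumptions, not exact numbers for any specific model or runtime.

```python
# Rough RAM/VRAM estimate for running an LLM locally.
# Assumption: memory ~= parameter count x bytes-per-weight, plus ~20% overhead
# for the KV cache and runtime buffers. Actual usage varies by tool and context length.

BYTES_PER_WEIGHT = {
    "fp16": 2.0,  # full 16-bit precision
    "q8": 1.0,    # 8-bit quantization
    "q4": 0.5,    # 4-bit quantization (ballpark; Q4_K_M is slightly higher in practice)
}

def estimate_ram_gb(params_billion: float, quant: str = "q4", overhead: float = 1.2) -> float:
    """Estimated memory in GB for a model of the given size and quantization."""
    weight_bytes = params_billion * 1e9 * BYTES_PER_WEIGHT[quant]
    return weight_bytes * overhead / 1e9

for size in (3, 7, 13, 70):
    print(f"{size}B  fp16: {estimate_ram_gb(size, 'fp16'):.0f} GB   "
          f"q4: {estimate_ram_gb(size, 'q4'):.1f} GB")
```

For example, this puts a 7B model at roughly 17 GB in fp16 but only about 4 GB at 4-bit, which is why quantized 7B models fit on ordinary laptops.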
Can I run Gemma 4 on my laptop?
Gemma 4 comes in multiple sizes. The 4B model runs on laptops with 8GB+ RAM. The 12B model needs 16GB+. The 27B model needs 32GB+ RAM. With Q4 quantization, requirements drop significantly — the 12B Q4 runs comfortably on 8GB. Apple Silicon Macs are particularly good for local LLMs due to unified memory.
Do I need a GPU to run LLMs locally?
No, but it helps enormously. CPU-only inference works but is 5-20x slower than GPU. An NVIDIA GPU with 6GB+ VRAM can run small models at good speed. For 70B models, you need 24GB+ VRAM (RTX 4090) or you can split the model across CPU and GPU. Apple Silicon Macs use unified memory, making them excellent for local LLMs without a discrete GPU.
What is quantization and how does it reduce requirements?
Quantization reduces model precision from 16-bit to 4-bit or 8-bit, cutting memory usage by 50-75% with minimal quality loss. Q4_K_M is the most popular balance of quality and size. Q5_K_M offers slightly better quality at ~10% more memory. Tools like Ollama, LM Studio, and llama.cpp support quantized GGUF models natively.
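The 50-75% figure follows directly from the bit widths. A quick illustration, using approximate effective bits-per-weight for common GGUF quantization levels (these are ballpark values, since K-quants store some metadata alongside the 4- or 5-bit weights):

```python
# Memory saving from quantization relative to 16-bit weights.
# Effective bits-per-weight below are approximate (assumption): K-quants carry
# per-block scales, so Q4_K_M lands near 4.85 bits rather than exactly 4.

FP16_BITS = 16.0
QUANTS = {"Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.85}

def saving_pct(bits_per_weight: float) -> float:
    """Percent memory saved versus fp16 at the given bits-per-weight."""
    return (1 - bits_per_weight / FP16_BITS) * 100

for name, bits in QUANTS.items():
    print(f"{name}: ~{saving_pct(bits):.0f}% smaller than fp16")
```

So a pure 8-bit model is half the size of fp16, and Q4_K_M comes out roughly 70% smaller, which is how a 27B model can drop from ~54 GB to under 20 GB.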
Ollama vs LM Studio vs llama.cpp — which should I use?
Ollama is simplest — one command to download and run any model. LM Studio has a GUI with a built-in model browser and chat interface. llama.cpp offers the most control and best performance tuning. All three support quantized GGUF models and work on Mac, Windows, and Linux. For beginners, start with Ollama or LM Studio.