Compatible Models
Fast — fits fully in VRAM/unified memory
Good — fits with quantization
Slow — partial offload / CPU
Won't Run — insufficient memory
Frequently Asked Questions
How much RAM do I need to run an LLM locally?
It depends on the model size. Small models (1-3B parameters) need 4-8 GB RAM. Medium models (7-13B) need 16-32 GB. Large models (30-70B) need 32-64 GB. Quantized versions (Q4_K_M, Q5_K_M) reduce requirements by 50-75%, making larger models accessible on consumer hardware.
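As a rough rule of thumb, memory needed is parameters times bytes per weight, plus some overhead for the KV cache and runtime. A minimal sketch of that arithmetic (the 20% overhead factor is an illustrative assumption, not a measured constant):

```python
def estimate_ram_gb(params_billions: float, bits_per_weight: int = 16,
                    overhead: float = 1.2) -> float:
    """Rough RAM estimate: weight memory plus ~20% for KV cache and runtime.

    The 1.2 overhead factor is an assumption for illustration; real usage
    varies with context length and the inference runtime.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9  # decimal gigabytes

# A 7B model: ~16.8 GB at 16-bit, but only ~4.2 GB at 4-bit.
print(round(estimate_ram_gb(7), 1))
print(round(estimate_ram_gb(7, bits_per_weight=4), 1))
```

This is why a 7B model that needs a 16 GB machine at full precision becomes laptop-friendly once quantized.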
Can I run Gemma 4 on my laptop?
Gemma 4 comes in multiple sizes. The 4B model runs on laptops with 8 GB+ RAM, the 12B needs 16 GB+, and the 27B needs 32 GB+. With Q4 quantization, requirements drop significantly: the 12B at Q4 fits in roughly 8 GB. Apple Silicon Macs are particularly well suited to local LLMs thanks to unified memory.
Do I need a GPU to run LLMs locally?
No, but it helps enormously. CPU-only inference works but is typically 5-20x slower than GPU inference. An NVIDIA GPU with 6 GB+ of VRAM can run small models at good speed. For 70B models you need 24 GB+ of VRAM (e.g., an RTX 4090), or you can split layers between CPU and GPU. Apple Silicon Macs use unified memory, making them excellent for local LLMs without a discrete GPU.
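When a model doesn't fit entirely in VRAM, runners such as llama.cpp let you offload only some transformer layers to the GPU and keep the rest on the CPU. A toy calculation of how many layers fit in a given VRAM budget (the uniform-layer assumption and 1 GB reserve are illustrative, not exact):

```python
def layers_on_gpu(n_layers: int, model_gb: float, vram_gb: float,
                  reserve_gb: float = 1.0) -> int:
    """How many transformer layers fit in VRAM.

    Assumes layers are roughly equal in size and reserves some VRAM
    for the KV cache and scratch buffers -- both simplifications.
    """
    per_layer = model_gb / n_layers
    budget = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(budget / per_layer))

# A ~40 GB quantized 70B model with 80 layers on an 8 GB GPU:
# only a fraction of the layers fit, so the rest run on CPU.
print(layers_on_gpu(80, 40.0, 8.0))
```

The more layers stay on the CPU, the slower generation gets, which is where the 5-20x slowdown range comes from.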
What is quantization and how does it reduce requirements?
Quantization reduces model precision from 16-bit to 4-bit or 8-bit, cutting memory usage by 50-75% with minimal quality loss. Q4_K_M is the most popular balance of quality and size. Q5_K_M offers slightly better quality at ~10% more memory. Tools like Ollama, LM Studio, and llama.cpp support quantized GGUF models natively.
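The savings follow directly from the bit widths. A quick sketch of the reduction relative to 16-bit weights (this treats Q4/Q5/Q8 as exactly 4/5/8 bits per weight; real GGUF K-quants use slightly more because of scale metadata):

```python
def reduction_vs_fp16(bits: float) -> float:
    """Percent of weight memory saved relative to 16-bit precision."""
    return (1 - bits / 16) * 100

# Q8 halves the weights; Q4 cuts them to a quarter -- the 50-75% range.
for name, bits in [("Q8_0", 8), ("Q5_K_M", 5), ("Q4_K_M", 4)]:
    print(f"{name}: {reduction_vs_fp16(bits):.0f}% smaller")
```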
Ollama vs LM Studio vs llama.cpp — which should I use?
Ollama is simplest — one command to download and run any model. LM Studio has a GUI with a built-in model browser and chat interface. llama.cpp offers the most control and best performance tuning. All three support quantized GGUF models and work on Mac, Windows, and Linux. For beginners, start with Ollama or LM Studio.
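For orientation, typical invocations look roughly like the following (model names and flags change over time, so check each tool's documentation; LM Studio is GUI-driven and has no equivalent one-liner):

```shell
# Ollama: downloads the model on first run, then opens an interactive chat
ollama run llama3.2

# llama.cpp: run a local GGUF file, offloading as many layers as possible
# to the GPU via -ngl (number of GPU layers)
./llama-cli -m ./models/model-q4_k_m.gguf -p "Hello" -ngl 99
```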