What you'll have at the end

  • A local AI chat server that speaks any language — running 100% on your Mac
  • Separate accounts for every family member or team member
  • AI image generation with FLUX.1 (DALL-E 3 quality)
  • Access from iPhone, iPad, or any device on your network
  • Automatic model routing — fast or smart, depending on the question
  • €60–90/month in saved subscriptions

Before You Start — Do These First

These accounts take a few minutes but will block you mid-setup if you skip them:

  1. Hugging Face account (free) — go to huggingface.co, sign up, then go to FLUX.1-schnell → accept terms. Then Settings → Access Tokens → New Token → copy it. You'll need this later.
  2. Docker Hub account (optional) — hub.docker.com. Not needed for this guide; create it only if you want it for later.

Hardware Requirements

Spec                 | Minimum             | Recommended
---------------------|---------------------|----------------
Chip                 | Apple Silicon M1/M2 | M3 Max / M4 Max
RAM (Unified Memory) | 16 GB               | 64 GB+
Storage (free)       | 100 GB              | 500 GB+
macOS                | Ventura 13+         | Sequoia 15+

With 16 GB RAM you can run small models (7B). For Qwen 3.5 35B you need 32 GB minimum. For simultaneous LLM + image generation, 64 GB+.
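Not sure what your Mac has? Terminal can tell you (standard macOS commands, so run these on the Mac itself):

```shell
# Chip model, e.g. "Apple M3 Max"
sysctl -n machdep.cpu.brand_string

# Unified memory in GB (hw.memsize reports bytes)
echo "$(($(sysctl -n hw.memsize) / 1073741824)) GB RAM"

# Free space on the system volume
df -h /

# macOS version
sw_vers -productVersion
```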

The Three Tools — What Each One Does

Before diving into setup, here's how the three components relate to each other:

LM Studio — The engine. It downloads AI models (Qwen, Llama, Mistral) into your Mac's RAM and runs them locally. Instead of sending your prompts to ChatGPT, you send them to your own model on your own hardware. It also runs as an API server on port 1234. Think of it as the car's engine — nothing moves without it.

Open WebUI — The steering wheel. A ChatGPT-like browser interface that connects to LM Studio in the background. Gives you chat history, system prompts, separate accounts, knowledge bases — everything that makes it feel like ChatGPT but entirely local. Without LM Studio it does nothing. Without Open WebUI, LM Studio has no face.

ComfyUI — The image studio. Completely separate from the chat stack. Loads Stable Diffusion / FLUX models and generates images and video through a node-based interface. Run it independently — it does a completely different job.

Boot order matters: LM Studio first → Open WebUI second → ComfyUI anytime (it's independent).

Mac (Apple Silicon)
│
├── LM Studio          ← runs AI models (LLM)
│     └── API port 1234
│
├── Docker Desktop
│     └── Open WebUI   ← ChatGPT-style interface (port 3000)
│           ├── account: admin (you)
│           ├── account: user1
│           └── account: user2, user3
│
└── ComfyUI            ← images / video (port 8188)
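Once everything is installed, you can verify the whole stack from Terminal with a quick loop over the three ports above (assuming you kept the default ports; every URL is local, so nothing leaves your Mac):

```shell
# Ping each service; "up" means the port answered, "down" means it didn't.
for svc in "LM Studio:1234" "Open WebUI:3000" "ComfyUI:8188"; do
  name="${svc%:*}"    # text before the last colon
  port="${svc##*:}"   # text after the last colon
  if curl -s -o /dev/null "http://localhost:$port"; then
    echo "$name (port $port): up"
  else
    echo "$name (port $port): down"
  fi
done
```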

Subscription replaced | Cost/month   | Replaced by
----------------------|--------------|---------------------------
ChatGPT Plus          | €20          | Open WebUI + local model
Claude Pro            | €19          | Open WebUI + Qwen 3.5 35B
Perplexity Pro        | €20          | Open WebUI + web search
DALL-E / Midjourney   | €10–30       | ComfyUI + FLUX.1
Total                 | €59–89/month | €0

Step 1 — Install Docker Desktop

Docker runs Open WebUI in a contained environment. Think of it as a box that keeps everything organized. You don't need to understand it — just install it like any Mac app.

  1. Download from docker.com/products/docker-desktop — Apple Silicon version.
  2. Install normally. Accept all permission requests.

After installing, two critical settings:

  • Settings → General: Enable "Start Docker Desktop when you sign in." Without this, Open WebUI won't load after a reboot.
  • Settings → Resources: Disable "Resource Saver." If left on, Docker pauses after inactivity and appears broken.

Check the menu bar: You should see the Docker whale icon top-right. If it's not there, Docker isn't running.

Step 2 — Install LM Studio and Download Models

Download from lmstudio.ai and install. Open it — you'll see a Discover tab (like an app store for AI models).

If you have 64 GB+ RAM

Model                             | Size   | Use
----------------------------------|--------|----------------------------------------------------------------
Qwen3.5-35B-A3B-Uncensored (Q6_K) | ~29 GB | Main model. Multilingual, excellent reasoning. Does everything.
Devstral-Small-2-24B (Q4_K_XL)    | ~16 GB | Code specialist. Skip if you don't write code.
Meltemi-7B-v1 (Q8_0)              | ~7 GB  | Fast, lightweight. For quick simple questions.

If you have 16–32 GB RAM

Model                | Size  | Use
---------------------|-------|--------------------------------------------------
Meltemi-7B-v1 (Q8_0) | ~7 GB | Fast and multilingual — best option for less RAM
Qwen2.5-14B (Q4_K_M) | ~9 GB | General purpose, strong multilingual

Search each model name in the Discover tab, hit Download, and wait (they're large — download overnight on slow connections).

Once downloaded: go to the Developer tab → Start Server. You'll see Server running on port 1234. Then enable "Serve on Local Network" so other devices on your WiFi can access it.
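LM Studio's server speaks the OpenAI API format, so you can sanity-check it from Terminal before wiring anything else up. The model id in the second command is an example; use whatever id the first command returns:

```shell
# List the models the server currently exposes
curl http://localhost:1234/v1/models

# Send a test prompt (replace the model id with one from the list above)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meltemi-7b-v1",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```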

Step 3 — Install Open WebUI

Open Terminal (find it with Spotlight — Cmd+Space → "Terminal") and run this single command:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL="" \
  -e OPENAI_API_BASE_URL="http://host.docker.internal:1234/v1" \
  -e OPENAI_API_KEY="lm-studio" \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

What this does: tells Docker to download Open WebUI, run it on port 3000, connect it to LM Studio on port 1234, and auto-restart if anything crashes.

Wait 2–3 minutes. Then open your browser and go to http://localhost:3000. You'll see a ChatGPT-style interface. Create your first account — the first user automatically becomes Administrator.
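If the page doesn't load, check the container itself before anything else:

```shell
# Is the container running? (STATUS should say "Up ...")
docker ps --filter name=open-webui

# Follow the startup logs; the first boot takes a couple of minutes
docker logs -f open-webui
```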

"Permission denied" error: Docker isn't running. Open Docker Desktop first, wait for the whale icon, then retry.
"Port 3000 already in use": Change -p 3000:8080 to -p 3001:8080 and use http://localhost:3001 instead.

Step 4 — Set Up Accounts

Each person gets their own account with separate chat history. Go to Settings (gear icon) → Admin Panel → Users → + Add User:

Account          | Role  | Who
-----------------|-------|------------------------------------
admin@home.local | Admin | You — can see and change everything
user1@home.local | User  | Partner, colleague, etc.
user2@home.local | User  | Second user

Emails don't need to be real — they're just unique identifiers. Nothing gets sent anywhere.

Step 5 — Create Model Profiles

Instead of exposing model names like qwen3.5-35b-a3b-uncensored to users, create friendly named profiles. Go to Workspace → Models → + New Model:

  • 🤖 Assistant — Base: Qwen 3.5 35B. System prompt: "You are a helpful assistant. Answer clearly, concisely, and accurately."
  • 💻 Code — Base: Devstral Small 24B. System prompt: "You are an expert developer. Write clean code and explain your reasoning."
  • ⚡ Quick — Base: Meltemi 7B. System prompt: "Answer briefly and directly. No unnecessary explanation."

Set 🤖 Assistant as the default in Admin Panel → Settings → Default Model. Users open the app, type, and get an answer — no model-picking required.
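A profile is just a base model plus a pinned system prompt, the same thing you would get by sending the prompt yourself in an OpenAI-style request. For example, the ⚡ Quick profile roughly corresponds to this call (the model id shown is illustrative; use the id LM Studio reports for your download):

```shell
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meltemi-7b-v1",
        "messages": [
          {"role": "system", "content": "Answer briefly and directly. No unnecessary explanation."},
          {"role": "user", "content": "What is unified memory?"}
        ]
      }'
```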

Step 6 — iPhone and iPad Access

Every device on your network can use the AI. First, find your Mac's local IP:

ipconfig getifaddr en0

You'll get something like 192.168.1.100. On any iPhone or iPad on the same WiFi, open Safari and go to http://192.168.1.100:3000.
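If en0 comes back empty, your Wi-Fi may sit on a different interface; checking the first few is quick (macOS command, and interface names vary by model):

```shell
# Print the IP (if any) on each of the first three interfaces
for i in en0 en1 en2; do
  echo "$i: $(ipconfig getifaddr $i)"
done
```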

Make it a proper app: tap the Share button → "Add to Home Screen." It opens fullscreen, exactly like a native app. Nobody will know it's running locally on your Mac.

Tip: If your Mac's IP changes after router reboots, set a DHCP reservation in your router settings for your Mac's MAC address. It'll always get the same IP.

Step 7 — ComfyUI for Image Generation

Skip this step if you only need chat. ComfyUI is a separate tool that handles AI image and video generation — completely independent from the chat stack.

Open Terminal and run these commands in order:

brew install python@3.11

(If brew isn't installed, go to brew.sh first.)

cd ~ && git clone https://github.com/comfyanonymous/ComfyUI.git
cd ~/ComfyUI && pip3.11 install -r requirements.txt
cd ~/ComfyUI && python3.11 main.py --force-fp16
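If pip3.11 refuses to install into the system Python (newer Homebrew Pythons often do), a virtual environment sidesteps it. This is standard Python tooling, not something ComfyUI-specific:

```shell
cd ~/ComfyUI
python3.11 -m venv .venv          # create an isolated environment
source .venv/bin/activate         # activate it for this shell session
pip install -r requirements.txt   # pip now targets the venv
python main.py --force-fp16       # same launch command as above
```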

Look for Device: mps in the output — this confirms it's using your Apple Silicon GPU. Then open http://localhost:8188.
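If you see Device: cpu instead, check whether the PyTorch that requirements.txt installed can actually see the GPU (torch.backends.mps is PyTorch's Apple Silicon backend):

```shell
python3.11 -c "import torch; print(torch.backends.mps.is_available())"
```

On Apple Silicon with a working install, this prints True; False means PyTorch was installed without Metal support and needs reinstalling.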

Step 8 — Download FLUX.1 (Image Model)

Log in to Hugging Face (use the token you created in "Before You Start"):

python3.11 -c "from huggingface_hub import login; login('YOUR_TOKEN_HERE')"

Download the FLUX.1 quantized model (~7 GB instead of ~24 GB, with output quality close to the full model):

cd ~/ComfyUI/models/checkpoints
python3.11 -c "
from huggingface_hub import hf_hub_download
hf_hub_download(
    repo_id='city96/FLUX.1-schnell-gguf',
    filename='flux1-schnell-Q4_K_S.gguf',
    local_dir='.'
)
print('Done!')
"

Then in the ComfyUI interface, open the "1.1 Starter – Text to Image" template → click "See Errors" → "Download all." This auto-downloads ~8 GB of helper files (text encoders + VAE).

Downloads are large. Run them overnight on slow connections.

What You're Saving

Service             | Monthly   | Yearly
--------------------|-----------|------------
ChatGPT Plus (×2)   | €40       | €480
Claude Pro          | €19       | €228
Perplexity Pro      | €20       | €240
Midjourney / DALL-E | €10       | €120
Total               | €89/month | €1,068/year

The hardware (Mac) pays for itself in under two years in saved subscriptions alone — before accounting for privacy, speed, and no rate limits.

What's Next (Phase 2)

  • Video generation: Wan2.1 or CogVideoX-5B via ComfyUI
  • Music generation: MusicGen Large (Meta) via ComfyUI
  • Browser automation: OpenClaw + local LLM for automated tasks
  • Workflow automation: n8n for morning digests, alerts, and automations
  • RAG on your own notes: Ask the AI about your own documents via Open WebUI RAG

This article is based on a real setup running on a MacBook Pro M3 Max 128 GB. Every step has been tested and verified to work.
Mike Mingos

COO and co-founder of Tictac SA. Cybersecurity entrepreneur, AI builder, and speaker. Runs a local AI stack on M3 Max 128 GB. Writes at mikemingos.gr.