What you'll have at the end

  • A local AI chat server that speaks any language — running 100% on your Mac
  • Separate accounts for every family member or team member
  • AI image generation with FLUX.1 (DALL-E 3 quality)
  • Access from iPhone, iPad, or any device on your network
  • Automatic model routing — fast or smart, depending on the question
  • €60–90/month in saved subscriptions

Before You Start — Do These First

These accounts take a few minutes but will block you mid-setup if you skip them:

  1. Hugging Face account (free) — go to huggingface.co, sign up, then go to FLUX.1-schnell → accept terms. Then Settings → Access Tokens → New Token → copy it. You'll need this later.
  2. Docker Hub account (optional) — hub.docker.com. Not needed for this guide; create it only if you want it for later.

Hardware Requirements

Spec                 | Minimum             | Recommended
---------------------|---------------------|----------------
Chip                 | Apple Silicon M1/M2 | M3 Max / M4 Max
RAM (Unified Memory) | 16 GB               | 64 GB+
Storage (free)       | 100 GB              | 500 GB+
macOS                | Ventura 13+         | Sequoia 15+

With 16 GB RAM you can run small models (7B). For Qwen 3.5 35B you need 32 GB minimum. For simultaneous LLM + image generation, 64 GB+.
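Not sure what your Mac has? Terminal can tell you (standard macOS commands, so run these on the Mac itself):

```shell
# Chip model, e.g. "Apple M3 Max"
sysctl -n machdep.cpu.brand_string

# Unified memory in GB (hw.memsize reports bytes)
echo "$(($(sysctl -n hw.memsize) / 1073741824)) GB RAM"

# Free space on the system volume
df -h /

# macOS version
sw_vers -productVersion
```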

The Three Tools — What Each One Does

Before diving into setup, here's how the three components relate to each other:

LM Studio — The engine. It downloads AI models (Qwen, Llama, Mistral) into your Mac's RAM and runs them locally. Instead of sending your prompts to ChatGPT, you send them to your own model on your own hardware. It also runs as an API server on port 1234. Think of it as the car's engine — nothing moves without it.

Open WebUI — The steering wheel. A ChatGPT-like browser interface that connects to LM Studio in the background. Gives you chat history, system prompts, separate accounts, knowledge bases — everything that makes it feel like ChatGPT but entirely local. Without LM Studio it does nothing. Without Open WebUI, LM Studio has no face.

ComfyUI — The image studio. Completely separate from the chat stack. Loads Stable Diffusion / FLUX models and generates images and video through a node-based interface. Run it independently — it does a completely different job.

Boot order matters: LM Studio first → Open WebUI second → ComfyUI anytime (it's independent).

Mac (Apple Silicon)
│
├── LM Studio          ← runs AI models (LLM)
│     └── API port 1234
│
├── Docker Desktop
│     └── Open WebUI   ← ChatGPT-style interface (port 3000)
│           ├── account: admin (you)
│           ├── account: user1
│           └── account: user2, user3
│
└── ComfyUI            ← images / video (port 8188)
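Once everything is installed, you can verify the whole stack from Terminal with a quick loop over the three ports above (assuming you kept the default ports; every URL is local, so nothing leaves your Mac):

```shell
# Ping each service; "up" means the port answered, "down" means it didn't.
for svc in "LM Studio:1234" "Open WebUI:3000" "ComfyUI:8188"; do
  name="${svc%:*}"    # text before the last colon
  port="${svc##*:}"   # text after the last colon
  if curl -s -o /dev/null "http://localhost:$port"; then
    echo "$name (port $port): up"
  else
    echo "$name (port $port): down"
  fi
done
```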

Subscription replaced | Cost/month   | Replaced by
----------------------|--------------|---------------------------
ChatGPT Plus          | €20          | Open WebUI + local model
Claude Pro            | €19          | Open WebUI + Qwen 3.5 35B
Perplexity Pro        | €20          | Open WebUI + web search
DALL-E / Midjourney   | €10–30       | ComfyUI + FLUX.1
Total                 | €59–89/month | €0

Step 1 — Install Docker Desktop

Docker runs Open WebUI in a contained environment. Think of it as a box that keeps everything organized. You don't need to understand it — just install it like any Mac app.

  1. Download from docker.com/products/docker-desktop — Apple Silicon version.
  2. Install normally. Accept all permission requests.

After installing, two critical settings:

  • Settings → General: Enable "Start Docker Desktop when you sign in." Without this, Open WebUI won't load after a reboot.
  • Settings → Resources: Disable "Resource Saver." If left on, Docker pauses after inactivity and appears broken.

Check the menu bar: You should see the Docker whale icon top-right. If it's not there, Docker isn't running.

Step 2 — Install LM Studio and Download Models

Download from lmstudio.ai and install. Open it — you'll see a Discover tab (like an app store for AI models).

If you have 64 GB+ RAM

Model                             | Size   | Use
----------------------------------|--------|----------------------------------------------------------------
Qwen3.5-35B-A3B-Uncensored (Q6_K) | ~29 GB | Main model. Multilingual, excellent reasoning. Does everything.
Devstral-Small-2-24B (Q4_K_XL)    | ~16 GB | Code specialist. Skip if you don't write code.
Meltemi-7B-v1 (Q8_0)              | ~7 GB  | Fast, lightweight. For quick simple questions.

If you have 16–32 GB RAM

Model                | Size  | Use
---------------------|-------|--------------------------------------------------
Meltemi-7B-v1 (Q8_0) | ~7 GB | Fast and multilingual — best option for less RAM
Qwen2.5-14B (Q4_K_M) | ~9 GB | General purpose, strong multilingual

Search each model name in the Discover tab, hit Download, and wait (they're large — download overnight on slow connections).

Once downloaded: go to the Developer tab → Start Server. You'll see Server running on port 1234. Then enable "Serve on Local Network" so other devices on your WiFi can access it.
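LM Studio's server speaks the OpenAI API format, so you can sanity-check it from Terminal before wiring anything else up. The model id in the second command is an example; use whatever id the first command returns:

```shell
# List the models the server currently exposes
curl http://localhost:1234/v1/models

# Send a test prompt (replace the model id with one from the list above)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meltemi-7b-v1",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```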

Step 3 — Install Open WebUI

Open Terminal (find it with Spotlight — Cmd+Space → "Terminal") and run this single command:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL="" \
  -e OPENAI_API_BASE_URL="http://host.docker.internal:1234/v1" \
  -e OPENAI_API_KEY="lm-studio" \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

What this does: tells Docker to download Open WebUI, run it on port 3000, connect it to LM Studio on port 1234, and auto-restart if anything crashes.

Wait 2–3 minutes. Then open your browser and go to http://localhost:3000. You'll see a ChatGPT-style interface. Create your first account — the first user automatically becomes Administrator.
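If the page doesn't load, check the container itself before anything else:

```shell
# Is the container running? (STATUS should say "Up ...")
docker ps --filter name=open-webui

# Follow the startup logs; the first boot takes a couple of minutes
docker logs -f open-webui
```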

"Permission denied" error: Docker isn't running. Open Docker Desktop first, wait for the whale icon, then retry.
"Port 3000 already in use": Change -p 3000:8080 to -p 3001:8080 and use http://localhost:3001 instead.

Step 4 — Set Up Accounts

Each person gets their own account with separate chat history. Go to Settings (gear icon) → Admin Panel → Users → + Add User:

Account          | Role  | Who
-----------------|-------|------------------------------------
admin@home.local | Admin | You — can see and change everything
user1@home.local | User  | Partner, colleague, etc.
user2@home.local | User  | Second user

Emails don't need to be real — they're just unique identifiers. Nothing gets sent anywhere.

Step 5 — Create Model Profiles

Instead of exposing model names like qwen3.5-35b-a3b-uncensored to users, create friendly named profiles. Go to Workspace → Models → + New Model:

  • 🤖 Assistant — Base: Qwen 3.5 35B. System prompt: "You are a helpful assistant. Answer clearly, concisely, and accurately."
  • 💻 Code — Base: Devstral Small 24B. System prompt: "You are an expert developer. Write clean code and explain your reasoning."
  • ⚡ Quick — Base: Meltemi 7B. System prompt: "Answer briefly and directly. No unnecessary explanation."

Set 🤖 Assistant as the default in Admin Panel → Settings → Default Model. Users open the app, type, and get an answer — no model-picking required.
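A profile is just a base model plus a pinned system prompt, the same thing you would get by sending the prompt yourself in an OpenAI-style request. For example, the ⚡ Quick profile roughly corresponds to this call (the model id shown is illustrative; use the id LM Studio reports for your download):

```shell
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meltemi-7b-v1",
        "messages": [
          {"role": "system", "content": "Answer briefly and directly. No unnecessary explanation."},
          {"role": "user", "content": "What is unified memory?"}
        ]
      }'
```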

Step 6 — iPhone and iPad Access

Every device on your network can use the AI. First, find your Mac's local IP:

ipconfig getifaddr en0

You'll get something like 192.168.1.100. On any iPhone or iPad on the same WiFi, open Safari and go to http://192.168.1.100:3000.
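If en0 comes back empty, your Wi-Fi may sit on a different interface; checking the first few is quick (macOS command, and interface names vary by model):

```shell
# Print the IP (if any) on each of the first three interfaces
for i in en0 en1 en2; do
  echo "$i: $(ipconfig getifaddr $i)"
done
```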

Make it a proper app: tap the Share button → "Add to Home Screen." It opens fullscreen, exactly like a native app. Nobody will know it's running locally on your Mac.

Tip: If your Mac's IP changes after router reboots, set a DHCP reservation in your router settings for your Mac's MAC address. It'll always get the same IP.

Step 7 — ComfyUI for Image Generation

Skip this step if you only need chat. ComfyUI is a separate tool that handles AI image and video generation — completely independent from the chat stack.

Open Terminal and run these commands in order:

brew install python@3.11

(If brew isn't installed, go to brew.sh first.)

cd ~ && git clone https://github.com/comfyanonymous/ComfyUI.git
cd ~/ComfyUI && pip3.11 install -r requirements.txt
cd ~/ComfyUI && python3.11 main.py --force-fp16
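If pip3.11 refuses to install into the system Python (newer Homebrew Pythons often do), a virtual environment sidesteps it. This is standard Python tooling, not something ComfyUI-specific:

```shell
cd ~/ComfyUI
python3.11 -m venv .venv          # create an isolated environment
source .venv/bin/activate         # activate it for this shell session
pip install -r requirements.txt   # pip now targets the venv
python main.py --force-fp16       # same launch command as above
```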

Look for Device: mps in the output — this confirms it's using your Apple Silicon GPU. Then open http://localhost:8188.
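If you see Device: cpu instead, check whether the PyTorch that requirements.txt installed can actually see the GPU (torch.backends.mps is PyTorch's Apple Silicon backend):

```shell
python3.11 -c "import torch; print(torch.backends.mps.is_available())"
```

On Apple Silicon with a working install, this prints True; False means PyTorch was installed without Metal support and needs reinstalling.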

Step 8 — Download FLUX.1 (Image Model)

Log in to Hugging Face (use the token you created in "Before You Start"):

python3.11 -c "from huggingface_hub import login; login('YOUR_TOKEN_HERE')"

Download the FLUX.1 quantized model (~7 GB instead of ~24 GB, with output quality close to the full model):

cd ~/ComfyUI/models/checkpoints
python3.11 -c "
from huggingface_hub import hf_hub_download
hf_hub_download(
    repo_id='city96/FLUX.1-schnell-gguf',
    filename='flux1-schnell-Q4_K_S.gguf',
    local_dir='.'
)
print('Done!')
"

Then in the ComfyUI interface, open the "1.1 Starter – Text to Image" template → click "See Errors" → "Download all." This auto-downloads ~8 GB of helper files (text encoders + VAE).

Downloads are large. Run them overnight on slow connections.

What You're Saving

Service             | Monthly   | Yearly
--------------------|-----------|------------
ChatGPT Plus (×2)   | €40       | €480
Claude Pro          | €19       | €228
Perplexity Pro      | €20       | €240
Midjourney / DALL-E | €10       | €120
Total               | €89/month | €1,068/year

The hardware (Mac) pays for itself in under two years in saved subscriptions alone — before accounting for privacy, speed, and no rate limits.

What's Next (Phase 2)

  • Video generation: Wan2.1 or CogVideoX-5B via ComfyUI
  • Music generation: MusicGen Large (Meta) via ComfyUI
  • Browser automation: OpenClaw + local LLM for automated tasks
  • Workflow automation: n8n for morning digests, alerts, and automations
  • RAG on your own notes: Ask the AI about your own documents via Open WebUI RAG

This article is based on a real setup running on a MacBook Pro M3 Max 128 GB. Every step has been tested and verified to work.
Mike Mingos

COO and co-founder of Tictac SA. Cybersecurity entrepreneur, AI builder, and speaker. Runs a local AI stack on M3 Max 128 GB. Writes at mikemingos.gr.