For the curious (and the engineers)

Under the hood.

A quick tour of what Huduko actually is — the models, the search pipeline, the storage, and what "fully local" means in practice on your Mac.

Architecture

One binary. A sidecar.
SQLite in the middle.

Huduko is a Tauri 2 desktop app with a frozen Python backend as a sidecar. The two processes talk over HTTP on localhost. Everything else — models, embeddings, index — lives on disk under ~/.huduko/.

┌─────────────────────────────────────────────────────────────────┐ │ Tauri Shell (Rust) ←→ React + Tailwind UI │ │ ↕ HTTP fetch │ │ FastAPI server (localhost:8765) │ │ ↕ │ │ SearchEngine ←────────── SQLite + sqlite‑vec + FTS5 │ │ ↑ ↑ │ │ mxbai CLIP (Apple MLX, Metal GPU) │ │ ↑ │ │ IngestionQueue ← FileWatcher (watchdog) │ └─────────────────────────────────────────────────────────────────┘

The Tauri shell spawns the backend binary on launch and kills it on quit. The backend is a PyInstaller --onefile bundle — a single aarch64 executable that contains Python, MLX, ONNX Runtime, PyMuPDF, Tesseract, and the model code. Shipped as a single DMG, signed with a Developer ID certificate and notarized by Apple.

Ingestion

How files become searchable.

A background pipeline keeps the index in sync with your filesystem. It collapses rapid saves, parses by extension, and batches embeddings through the GPU.

01

Watch

watchdog observes your configured directories using FSEvents on macOS. Create, modify, delete, and move events all fire callbacks into the queue.

02

Debounce

An asyncio.Queue with a 1s debounce collapses rapid saves. Editors that write-and-rename don't generate duplicate index jobs.

03

Parse & chunk

Routed by extension: PyMuPDF for PDFs (with OCR fallback on image-only pages), python-docx for Word, openpyxl for Excel, python-pptx for PowerPoint, plain-text readers for code/markdown. Text is chunked to ~400 words with 50-word overlap.

04

Embed & upsert

Chunks batched through the text encoder (mxbai, 32 per batch by default); images through CLIP (16 per batch). Embeddings upserted into SQLite alongside FTS5 full-text entries in a single transaction.

Embedding models

Two models.
Two spaces. Zero cloud.

Huduko ships with two embedding models — one for text, one for images — and runs both on your Mac's Apple Silicon GPU via Apple MLX. Total on-disk weight: ~1.2 GB.

mxbai‑embed‑large‑v1

1024-d text embeddings

Best-in-class English retrieval at its size on MTEB. Asymmetric usage: a query prefix ("Represent this sentence…") is prepended at query time but not at document ingest time — this matches how the model was trained.

Runs on Apple MLX on M-series Macs. On Windows, fastembed via ONNX Runtime / DirectML. 639 MB on disk (fp16, MLX).

CLIP ViT‑B/32 (LAION‑2B)

512-d shared text+image space

CLIP maps natural-language descriptions and image pixels into the same vector space. Trained on LAION-2B (2 billion image-text pairs). A text query is embedded once and matched against image embeddings — no filename, OCR, or caption required for basic visual search.

Runs on Apple MLX on M-series Macs. On Windows, fastembed via ONNX Runtime. 578 MB on disk (fp16, MLX).

Why MLX? On Apple Silicon, MLX gives us Metal GPU execution with dramatically better cold-start and memory behavior than PyTorch/MPS. Model load is fast, inference is stable, and the memory footprint is small enough to coexist with whatever else is running on your Mac.

Both models are downloaded from Hugging Face on first launch, cached under ~/.huduko/models/, and never fetched again unless you wipe the cache.

Storage

Why SQLite
(and not a vector DB).

One file at ~/.huduko/db/huduko.db holds everything: document text, image metadata, Apple Photos entries, 1024-d text vectors, 512-d image vectors, and the full-text index. Opened with any SQLite client.

sqlite-vec

A tiny C extension (<1 MB) that adds KNN over float32 vectors. Ships as a virtual table alongside the normal SQL tables. Contrast with LanceDB's 314 MB bundle.

FTS5

SQLite's built-in full-text search with BM25 scoring. No extension needed. Same row-id space as the vector index, so updates stay atomic.

One transaction

A file's text, vectors, and FTS5 entry all commit in a single SQLite transaction. No eventual consistency, no vector/keyword drift.

Why not Pinecone, Weaviate, Qdrant, or LanceDB? Every one of those assumes either a server to run or a heavier embedded library than we want shipping inside a desktop app. SQLite + sqlite-vec + FTS5 is smaller, simpler, and maps cleanly to Huduko's single-user, single-node model. If you outgrow this (you probably won't — the format scales comfortably to millions of rows on consumer hardware), swapping out the storage layer is a localized change.

Privacy, in practice

What "local" actually means.

Plenty of apps say "local." Here's the concrete version.

Stack

The full tech stack.

No AI frameworks, no orchestration layers, no cloud SDKs. Boring, concrete components that each do one thing.

Desktop shell Tauri 2 (Rust) — single-binary cross-platform, spawns the backend sidecar and kills it on quit
UI React + Tailwind — dark / light / system theme via CSS variables
Local API FastAPI + uvicorn on localhost:8765 — async, no auth (loopback only)
Backend bundle PyInstaller --onefile — ~148 MB aarch64 binary, Developer ID signed
Text embeddings mxbai-embed-large-v1 — 1024-d, Apple MLX on Apple Silicon
Image embeddings CLIP ViT-B/32 (LAION-2B) — 512-d, Apple MLX on Apple Silicon
File watcher watchdog — FSEvents on macOS, ReadDirectoryChangesW on Windows
Document parsing PyMuPDF, python-docx, openpyxl, python-pptx; Tesseract OCR fallback for image-only PDF pages
Vector store SQLite + sqlite-vec — <1 MB C extension, virtual table for KNN
Full-text index SQLite FTS5 — BM25 scoring, built into SQLite
Photo enrichment (macOS) PhotoKit, Vision (labels + OCR), CLGeocoder (GPS → place)
Telemetry transport stdlib urllib → Vercel serverless → Neon Postgres. Zero pip deps.
Try it

The fastest way to see
what this actually does.

$9.99 on the Mac App Store. One-time purchase, runs locally, and you can uninstall with a single drag. No accounts, no signups.

Want to try before buying? Email support@huduko.io for free TestFlight access.