Is it the year of on-device vector databases yet? Or at least of on-device AI?
A year ago, interest in “on-device vector databases” (also “local vector DB” or “edge vector DB”) was mostly theoretical or experimental. While we saw SLMs appearing and rapidly dropping in size while gaining in capability, the market overall was not ready, even though vector database technology was already powering Apple Intelligence and shipped to every iPhone.
Now the market is finally ready for a real change towards local AI on mobile, IoT, and other embedded devices: NPUs and AI-capable silicon are becoming widespread on mobile, embedding models have shrunk, binary quantization has improved, and last but not least, new regulations and cloud-cost pressure are both pushing vector management off the server.
The only thing “missing” is the on-device memory layer: vector databases engineered for phones, ECUs, and other restricted edge devices. OK, it is not entirely missing, but it is a genuinely small field of products. We are going to take a deeper look at that market in this article.
Note: Many products that claim to be “on-device” are, in practice, running on a high-end developer laptop that is off Wi-Fi. This article focuses primarily on local AI on more restricted devices such as smartphones, ECUs, and PoS systems.
Why now?
Well, a lot of things happened in parallel, finally allowing for on-device AI on a larger scale:
- Devices now have the power: Since 2025, flagship phones (“Gen AI smartphones”) typically come with a dedicated neural engine (e.g. Apple A18/A19 Pro, Snapdragon 8 Elite, Google Tensor G4/G5). NPUs are no longer a differentiator but a standard, and they specifically accelerate the dot products and matrix operations vector search depends on.
- AI models got smaller and better (embedding models, SLMs, task-specific models), for example:
  - EmbeddingGemma (308M) and Qwen3 Embedding 0.6B are MTEB-competitive and fit in a fraction of mobile RAM.
  - SLMs like Gemma 3n, Phi-3, and Qwen 2.5 now run usefully under ~4 GB. Retrieval can therefore be paired with on-device generation without destroying the device’s memory budget.
  - For restricted devices, depending on the use case, task-specific models often make the most sense anyway: they are typically much smaller than general-purpose models and better at their specific task.
- Vectors got smaller: A 1,536-dimensional float32 vector is 6 KB. Quantized to 1 bit per dimension it is 192 bytes – a ~32× memory reduction with typical recall loss in the ~5–10% range depending on the model and reranking strategy.
- The cloud cost conundrum[1] became real: Both Gartner and IDC report rapidly growing costs for cloud and AI infrastructure and expect further increases. IDC FutureScape 2026 warns that Global 1000 organisations will underestimate their AI infrastructure costs by ~30% through 2027.
- Privacy regulations came into effect: The EU AI Act’s Article 5 prohibitions have applied since early February 2025 and its general-purpose AI obligations since August 2025; full enforcement is scheduled for 2 August 2026.
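The vector-size arithmetic from the list above can be checked in a few lines. The following is a minimal sketch, not any vendor's implementation: it assumes the simplest form of binary quantization (keep only the sign of each dimension) and uses random stand-in vectors rather than real embeddings. The function names `binarize` and `hamming` are made up for illustration.

```python
import random

DIM = 1536  # dimensionality from the example above

def binarize(vec):
    """Keep only the sign of each dimension and pack the bits into one int."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    """Count the bit positions in which two packed vectors differ."""
    return bin(a ^ b).count("1")

# Storage arithmetic: float32 uses 4 bytes per dimension, binary uses 1 bit.
float32_bytes = DIM * 4
binary_bytes = DIM // 8
reduction = float32_bytes // binary_bytes
print(float32_bytes, binary_bytes, reduction)  # 6144 192 32

# Random stand-ins for embeddings (assumption: real embeddings behave similarly).
rng = random.Random(42)
base = [rng.gauss(0, 1) for _ in range(DIM)]
near = [x + rng.gauss(0, 0.1) for x in base]  # slightly perturbed copy of base
far = [rng.gauss(0, 1) for _ in range(DIM)]   # unrelated vector

d_near = hamming(binarize(base), binarize(near))
d_far = hamming(binarize(base), binarize(far))
print(d_near < d_far)  # the perturbed copy stays much closer in Hamming distance
```

In practice, the Hamming-distance pass is typically used to shortlist candidates, which are then reranked with higher-precision vectors; that reranking step is what keeps the recall loss in check.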
What “on-device” actually means
A “real” on-device / edge (or mobile) vector DB for Edge AI persists data locally, supports vector plus metadata (hybrid) search, exposes mobile-usable SDKs (Java / Swift / Kotlin / Flutter) on mobile and C / C++ APIs for other embedded devices, handles incremental CRUD, uses RAM and storage predictably and efficiently, has a small footprint, works offline, and ideally supports selective data sync. The ANN indexing math is the easy part – the hard part is the mobile lifecycle, thermal throttling, encrypted storage, and sync of derived data when source content changes. Faiss, for example, is a solid library and good for some use cases, but it is not a database. Let’s look at what’s out there and which criteria the options currently meet.
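To make the “vector plus metadata” criterion concrete, here is a minimal sketch of a filtered search: records are first narrowed down by a metadata predicate and the survivors are then ranked by cosine similarity. It is a brute-force illustration under simplifying assumptions (tiny 3-dimensional vectors, in-memory dicts, hypothetical record IDs), not how any of the products below implement it; a real database would combine the filter with an ANN index such as HNSW and persist everything on disk.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(records, query_vec, predicate, k=2):
    """Pre-filter on metadata, then rank the remaining records by similarity."""
    candidates = [r for r in records if predicate(r["meta"])]
    candidates.sort(key=lambda r: cosine(r["vec"], query_vec), reverse=True)
    return [r["id"] for r in candidates[:k]]

# Hypothetical records: id, metadata, and a toy embedding.
records = [
    {"id": "manual-1", "meta": {"lang": "en", "type": "manual"}, "vec": [0.9, 0.1, 0.0]},
    {"id": "manual-2", "meta": {"lang": "de", "type": "manual"}, "vec": [0.8, 0.2, 0.0]},
    {"id": "faq-1", "meta": {"lang": "en", "type": "faq"}, "vec": [0.1, 0.9, 0.0]},
    {"id": "faq-2", "meta": {"lang": "en", "type": "faq"}, "vec": [0.7, 0.3, 0.1]},
]

# Only English records are considered; "manual-2" is excluded before ranking.
hits = filtered_search(records, [1.0, 0.0, 0.0], lambda m: m["lang"] == "en")
print(hits)  # ['manual-1', 'faq-2']
```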
What is an Edge Database?
Edge databases are a type of database optimised for local data storage on restricted devices, such as embedded, mobile, and IoT devices. Because they run on-device, they need to be especially resource-efficient (e.g. with regard to battery use, CPU consumption, memory, and footprint). The term “edge database” is becoming more widely used every year, especially in the IoT industry. In IoT, the difference between cloud-based databases and ones that run locally (and therefore support Edge Computing) is crucial.
What is a Mobile Database?
We look at mobile databases as a subset of edge databases that run on mobile devices. The difference between the two terms lies mainly in the supported operating systems / types of devices. Unless Android and iOS are supported, an edge database is not really suited for the mobile device / smartphone market. In this article, we will use the term “mobile database” only as “database that runs locally on a mobile (edge) device and stores data on the device”. Therefore, we also refer to it as an “on-device” database.
Vendor Map
We only cover options that can plausibly run on resource-constrained devices here. You can find more on general vector databases here, though that review is from 2024 and, given how AI and search have developed since, we did not find it worthwhile to update. The on-device vector database market is worth covering because it is only now taking shape and still lacks broad coverage. Footprints shown are approximate; always verify on your target hardware.
| Segment | Vector Database | Approx. footprint | Sync | ACID | Metadata filter | Benchmarks / Efficiency | Status |
|---|---|---|---|---|---|---|---|
| Dedicated mobile / embedded DBs with vectors (vector search) | ObjectBox – HNSW for ANN; mobile, IoT, embedded, offline | <8MB binary, KB-class dynamic RAM | Yes – built-in, ACID-compliant, push-based, offline-first | Yes (full ACID) | Yes – vectors + regular object data | Vendor: ~0.25–0.27 ms/query, up to ~4,000 QPS on a 5-yr-old LG G8S (selected datasets); vectors need not all stay in RAM | Production |
| | Couchbase Lite – Hybrid Vector Search; Sync needs Couchbase Edge Server | Compact mobile SDK; LiteCore native lib (single-arch ~10–15MB on Android) | Yes – Sync Gateway + peer-to-peer | Local only – inBatch() local-ACID; no ACID guarantee across sync | Yes – full document/JSON; hybrid SQL++ filters | None found from official sources; verify on target hardware | Production |
| SQLite ecosystem | sqlite-vec – SQLite extension; brute-force only (no ANN yet) | ~2MB | None built-in | Yes – inherits SQLite ACID | Lighter-weight than a full DB | Author: 32× storage reduction with binary vectors; “fast brute-force” focus; benchmarks shown on M1 Mac mini, not phones | Pre-v1 |
| | SQLite-Vector – SQLite extension; commercial license required for production | ~30MB default | Not built-in, but can be paired with CRDT-based SQLite-table-Sync (no vector-native sync!) | Yes – inherits SQLite ACID | Vectors in normal SQL tables alongside other columns | Vendor: “30MB by default” and “query millions of vectors in milliseconds” | Per-vendor; commercial license |
| | libSQL (Turso) – SQLite fork | SQLite-class | Embedded Replicas (writes to remote primary; replicas sync via WAL frames) | Yes – SQLite-class ACID | Full SQL with native vector indexes | No official sources found; SQLite-class baseline | Likely production |
| | Turso Database – same company's in-process Rust rewrite of SQLite (WIP) | Not yet quantified | Experimental | MVCC-based | SQL-compatible (target) | Pre-production; no published benchmarks | Pre-production |
Note: The following were excluded due to size, minimum requirements, or availability:
- Qdrant Edge: announced in July 2025 as a re-architected in-process variant (private beta, partner-curated) and not publicly available; the publicly distributed Qdrant is a server (~900MB compiled binary).
- Milvus Lite: Python binary, Linux/macOS only; the broader Milvus is typically provisioned with multiple GB of RAM.
- DuckDB VSS: analytics-class; ≥125MB RAM per thread minimum, 1–4GB per thread for optimal performance.
- SQL Server 2025: server-class; ≥1GB RAM (Express) / ≥4GB (other editions), ≥6GB disk, x64 only.
Why “edge vector database” tech is different from cloud
Most of the columns above probably look familiar from any other database review. The reason this category is genuinely different from typical databases, and cloud / server vector databases in particular, comes down to four things:
- Strict resource limits. In the cloud, performance problems can often be solved by scaling horizontally, adding memory, or moving to a larger instance. On a physical device, compute, RAM, and flash are fixed. That changes the underlying architecture and the diligence required in development: indexing, query execution, persistence, and sync all need to be efficient by design, because you cannot compensate by “throwing resources at the problem”.
- Energy budgets matter. On battery-powered devices, every query, write, compaction, sync, and re-embedding job also competes with the user experience, thermal limits, and battery life – constraints a cloud database usually does not face directly (there, inefficiency shows up as higher costs instead).
- The edge is fragmented. “Edge” can mean a smartphone, an ECU, a PoS terminal, a Linux gateway, an industrial PC, or a microcontroller-class device. These systems vary widely in operating system, CPU architecture, storage, available RAM, update model, security requirements, and connectivity. A credible edge vector database therefore needs more than ANN search; it needs predictable behavior across constrained and heterogeneous environments.
- Sync is hard. I would say harder than search. Vectors are derived data. When source content changes, permissions change, or the embedding model is upgraded, old embeddings may become stale. An edge vector database therefore needs to handle not only local search, but also updates, deletes, re-indexing, and selective sync between device and cloud. This is where a database matters more than a standalone ANN library.
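The point about vectors being derived data can be sketched as follows. This is an illustrative approach under stated assumptions, not a description of how any product above handles it: each stored embedding carries a hash of its source text and the version of the model that produced it, so the device can work out which documents need re-embedding and which vectors should be deleted. `StoredEmbedding`, `plan_reembedding`, and the model version strings are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class StoredEmbedding:
    doc_id: str
    content_hash: str   # hash of the source text the vector was derived from
    model_version: str  # embedding model that produced the vector

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reembedding(sources, stored, current_model):
    """Return (to_embed, to_delete): doc ids that are new or stale, and orphaned vectors."""
    to_embed = []
    for doc_id, text in sources.items():
        entry = stored.get(doc_id)
        if (entry is None
                or entry.content_hash != content_hash(text)
                or entry.model_version != current_model):
            to_embed.append(doc_id)
    to_delete = [doc_id for doc_id in stored if doc_id not in sources]
    return sorted(to_embed), sorted(to_delete)

# Current source documents on the device (hypothetical content).
sources = {"a": "boiler manual v2", "b": "pump guide", "c": "new valve notes"}
stored = {
    "a": StoredEmbedding("a", content_hash("boiler manual v1"), "embed-v1"),  # text changed
    "b": StoredEmbedding("b", content_hash("pump guide"), "embed-v1"),        # still fresh
    "d": StoredEmbedding("d", content_hash("retired doc"), "embed-v1"),       # source deleted
}

to_embed, to_delete = plan_reembedding(sources, stored, "embed-v1")
print(to_embed, to_delete)  # ['a', 'c'] ['d']

# After a model upgrade, every remaining document has to be re-embedded.
after_upgrade, _ = plan_reembedding(sources, stored, "embed-v2")
print(after_upgrade)  # ['a', 'b', 'c']
```

Permission changes, which the text also mentions, are not covered here; they would need an additional check against whatever access model the app uses.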
Do you actually need an on-device vector database? When?
As always: it depends. Use an on-device vector DB when you need Edge Computing, that is, when:
- you have privacy requirements; data is personal; you face compliance needs
- the app needs to work when offline, or reliably under flaky network conditions
- you want speed (think UX) or you need guaranteed response times (QoS)
- you need to cut networking and cloud costs to make the economics work
Let’s look at some cases where on-device vector databases are truly needed.
Mobile Apps
The strongest mobile use case currently isn’t generic “AI on phones,” but private assistant memory and context for RAG-based apps: AI chats or assistants that can answer questions using personal, app-specific, or domain-specific knowledge, for example in travel, product support, field service, or maintenance.
Notes, messages, files, photos, app activity, preferences, and location-specific history are already on the device. An on-device vector database lets an assistant embed that context locally, retrieve it instantly, and sync only selected data when needed. That makes the experience faster and more private, while keeping the app useful even when connectivity is poor.
Domain-specific knowledge is often not publicly available to a general-purpose AI model. It may only exist inside an app, a downloaded manual, a product catalog, or a company’s technical documentation. In those cases, the app can use this semantic context through a local vector database. For example, a maintenance assistant could store heating-system technical docs on the phone, identify a part or problem from a photo, retrieve the relevant repair instructions, and suggest targeted fixing steps. Added benefit: it still works in the cellar.
Vehicles / ECUs
Vehicles are a strong fit because software-defined vehicles need cloud-scale learning, but in-car execution cannot depend on perfect connectivity. McKinsey says automotive software and electronics are moving toward zonal and central compute architectures for OTA updates, connectivity, and gen AI, with the market reaching $519 billion by 2035. The vector DB role is a compact local memory layer for in-cabin assistants, offline manual search, driver personalization, predictive diagnostics, and retrieval over vehicle logs or VSS-normalized signals. McKinsey’s edge-AI survey reinforces the hybrid stance: stakeholders cited offline availability (39%), latency (35%), privacy/security (20%), and network data cost (6%) as main factors for moving AI onboard; they also flagged SoC constraints (46%) and energy consumption (35%) as limits on what can run in the vehicle. So the answer is not cloud vs. edge; it is local-first retrieval and selective sync to the backend. This is the same position as the ObjectBox / MongoDB architecture: ObjectBox handles low-latency local operations and bi-directional sync connects selected data to MongoDB Atlas for storage, analytics, retraining, and coordination.
Point-of-sale systems
PoS systems often run on premises with flaky network conditions. Offline and hybrid payment models improve payment resilience by accepting cash and card payments offline and uploading them after reconnection. A local vector layer makes sense when the PoS should improve service and customer experience with AI features, e.g. semantic lookup over products, promotions, policies, allergens, and prior orders. 67% of retail executives expect AI-driven personalization capabilities in 2026, and McKinsey’s 2026 retail research says AI is reshaping discovery and purchase behavior as stores remain important. The pattern is local operations first, cloud analytics later: answer routine queries instantly in-store, then sync selected sales, stock, customer, and personalization data when the network is available.
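The “local operations first, cloud analytics later” pattern can be sketched as a simple outbox: changes are recorded locally, only selected kinds are queued, and the queue is flushed in order once the network is back. This is a minimal in-memory illustration, not a vendor sync protocol; the `Outbox` class, the `send` callable, and the change kinds are assumptions made up for the example, and a real system would persist the queue and handle conflicts.

```python
from collections import deque

class Outbox:
    """Queue selected local changes while offline; flush them in order once online."""

    def __init__(self, send, should_sync):
        self._send = send                # hypothetical upload callable; returns True on success
        self._should_sync = should_sync  # predicate that selects which changes leave the device
        self._pending = deque()

    def record(self, change):
        if self._should_sync(change):
            self._pending.append(change)

    def flush(self):
        """Try to upload queued changes; stop at the first failure and keep the rest."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # still offline or upload failed: preserve order, retry later
            self._pending.popleft()
            sent += 1
        return sent

online = {"up": False}  # stand-in for real connectivity detection
uploaded = []

def send(change):
    if not online["up"]:
        return False
    uploaded.append(change)
    return True

outbox = Outbox(send, should_sync=lambda c: c["kind"] in {"sale", "stock"})
outbox.record({"kind": "sale", "id": 1})
outbox.record({"kind": "debug_log", "id": 2})  # not selected for sync
outbox.record({"kind": "stock", "id": 3})

sent_offline = outbox.flush()  # network down: nothing leaves the device
online["up"] = True
sent_online = outbox.flush()   # network back: queued changes go out in order
print(sent_offline, sent_online, [c["id"] for c in uploaded])  # 0 2 [1, 3]
```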
Bottom line
The bottom line: on-device vector databases are moving from “interesting idea” to a much-needed enabler for local AI. Not every app needs one, and many workloads will stay cloud-first, but a hybrid AI approach combining the best of the edge and the cloud is often beneficial. Whenever data is private, latency-sensitive, cost-sensitive, or needed offline, pushing vector search to the device makes a lot of sense. On top of that, finding the right balance between on-device AI and cloud AI helps save costs and energy, and is therefore economically and environmentally the most sustainable option. The hard part is not the ANN search itself, which a small dedicated library can easily do; it is efficient persistence, updates, deletes, metadata filtering, sync, footprint, and predictable behavior under real device constraints. If we predict the future from the past, shrinking large server / cloud vector databases to work on edge devices will not work. Instead, this market needs dedicated and highly optimized solutions. We therefore believe it will be won by databases actually engineered for the edge.