The on-device Vector Database for Android and Java

The on-device Vector Database for Android and Java

ObjectBox 4.0 is an on-device vector database allowing Android and Java developers to enhance their apps with local AI capabilities. A vector database facilitates advanced vector data processing and analysis, such as measuring semantic similarities across different document types like images, audio files, and texts. A classic use case would be to enhance a Large Language Model (LLM), or a small language model (like e.g. the Phi-3), with your domain expertise, your proprietary knowledge, and / or your private data. Combining the power of AI models with a specific knowledge base empowers high-quality, perfectly matching results a generic model simply cannot provide. This is called “retrieval-augmented generation” (RAG). Because ObjectBox works on-device, you can now do on-device RAG with data that never leaves the device and therefore stays 100% private. This is your chance to explore this technology on-device.

Vector Search (Similarity Search)

With this release it is possible to create a scalable vector index on floating point vector properties. It’s a very special index that uses an algorithm called HNSW. It’s scalable because it can find relevant data within millions of entries in a matter of milliseconds.

We pick up the example used in our vector search documentation. In short, we use cities with a location vector to perform proximity search. Here is the City entity and how to define a HNSW index on the location:

Vector objects are inserted as usual (the indexing is done automatically behind the scenes):

To perform a nearest neighbor search, use the new nearestNeighbors(queryVector, maxResultCount) query condition and the new “find with scores” query methods (the score is the distance to the query vector). For example, let’s find the 2 closest cities to Madrid:

Vector Embeddings

In the cities example above, the vectors were straight forward: they represent latitude and longitude. Maybe you already have vector data as part of your data. But often, you don’t. So where do you get the vector emebeddings of texts, images, video, audio files from?

For most AI applications, vectors are created by a embedding model. There are plenty of embedding models to choose from, but first you have to decide if it should run in the cloud or locally. Online embeddings are the easier way to get started and great for first testing; you can set up an account at your favorite AI provider and create embeddings online (only).

Depending on how much you care about privacy, you can also run embedding models locally and create your embeddings on your own device. There are a couple of choices for desktop / server hardware, e.g. check these on-device embedding models. For Android, MediaPipe is a good start as it has embedders for text and images.

Updated open source benchmarks 2024 (CRUD)

A new release is also a good occasion to update our open source benchmarks. The Android performance benchmark app provides many more options, but here are the key results:

CRUD is short for the basic operations a database does: create, read, update and delete. It’s an important metric for the general efficiency of a database.

Disclaimer 1: our focus is the “Object” performance (you may find a hint for that in our product name 🙂); so e.g. relational systems may perform a bit better when you directly work with raw columns and rows.

Disclaimer 2: ObjectBox delete performance was cut off at 800k per second to keep the Y axis within reasonable bounds. The actually measured value was 2.5M deletes per second.

Disclaimer 3: there cannot be enough disclaimers on any performance benchmark. It’s a complex topic where details matter. It’s best if you make your own picture for your own use case. We try to give a fair “arena” with our open source benchmarks, so it could be a starting point for you.

Feedback and Outlook: On-device vector search Benchmarks coming soon

We’re still working on a lot of stuff (as always 😉 and with on-device / local vector search being a completely new technology for Android, we need your feedback, creativity and support more than ever. We’ll also soon release benchmarks on the vector search. Follow us on LinkedIn, GitHub, or Twitter to keep up-to-date.

Vector search: making sense of search queries

Vector search: making sense of search queries

Today, finding the most valuable information to your search is more complicated than finding a needle in a haystack. Traditional search engines match keywords and favor SEO-optimized content, but what if there was a way for search engines to truly understand the meaning behind our queries? Enter vector search – a powerful technology that is transforming how we navigate information, not just for users, but also for applications performing background searches. In this article, we will discuss what vector search is and how it works.

What is a vector search and why should you care?

Example Results with a traditional search for “Simple Fruit Cake”.

Vector search, which is also known as semantic search, is a technology that improves search accuracy by understanding the meaning (semantics) of the data and relations between its parts. Unlike traditional search, vector search efficiently handles synonyms, typos, ambiguous language, and broad or fuzzy queries. This is because it focuses on meaning, not just keywords.

Imagine that you are searching for a dessert to cook during the weekend. In a traditional search engine, the “simple fruit cake” query will reveal only websites that include these keywords. However, a vector search engine is able to provide results like “apple pie in 20 minutes” or “easy summer desserts”, which capture the essence of the query and align with your desire for a straightforward dessert option, providing more valuable results to you. 

At its core, vector search uses Large Language models (LLMs), like GPT, to transform data into mathematical vectors, also known as vector embeddings

What is a vector embedding?

2D Vector Space Representation. “Easy apple pie” is close to “simple fruit cake” as they are both simple and have fruit as an ingredient. “Easy chocolate mousse” shares simplicity but does not contain fruit. “Fancy plum cake” has fruit but is not simple to make. And “extravagant chocolate mousse” does not share either simplicity or fruit as an ingredient. Thus, it is the farthest from “simple fruit cake”.

A vector or vector embedding is a numerical representation of any kind of unstructured data (e.g. texts, images, videos, audio). It captures its meaning while being easy and efficient to compute with. Think of it like this: imagine you have a collection of cake recipes. You can convert each recipe into a vector embedding, which is like a unique numerical code that represents the recipe’s characteristics (ingredients, cooking methods, flavors, etc.).

Once all the recipes are encoded into embeddings, we can perform a similarity search. This means we can compare the vectors to see how similar the recipes are. For example, the vector for an easy apple pie recipe would be close to the vector for a simple fruit cake recipe because they share similar characteristics (e.g. simplicity, fruitiness). On the other hand, the vector for an extravagant chocolate mousse cake would be farther away because it involves different ingredients and methods.

How to compare vectors?

Vector similarity is a measure of how similar two vectors are (see ep. 4 of ObjectBox Bites). There are three ways to compare vectors: Jaccard Similarity, Cosine Similarity, and L2 Distance (also known as Euclidean distance). Jaccard Similarity calculates the ratio of elements that are common to both vectors divided by the total number of elements in both vectors. Cosine Similarity calculates the cosine of the angle between two vectors. The last method is the L2 distance. It calculates the straight-line distance between two points in space represented by the vectors. This is the most frequently used method in AI applications. It is important to note that the choice of vector comparison method does not affect the mechanics of similarity search.

What is a vector database and how is it related to vector search?

A vector database is a specialized database designed to store, manage, and search vectors efficiently. This efficiency is crucial for handling large datasets and performing fast vector similarity searches. Also, with a vector database, the knowledge of AI models can be improved, adapted, and updated. Therefore, today, most AI apps use a vector database.

Imagine having an AI that knows your habits, your preferences, your health data, maybe even what’s in your fridge, and can use this knowledge to suggest recipes that fit your lifestyle and individual preferences. A standard AI model doesn’t have that data and wouldn’t learn that way, but with a vector database it can. Now, when you search for a “fruit cake recipe”, using this data, it can suggest a “simple fruit cake” without sugar if you usually prefer quick, easy, and healthy recipes, or a “fancy plum cake” if you enjoy more challenging baking projects and don’t like apples. Or, a vegan option, if you have neither milk nor eggs left in the fridge.

This technique is called Retrieval-Augmented Generation (RAG). It enhances the capabilities of LLMs with additional data (e.g. personal data, company data, fresh data) stored in a vector database.

When you query a vector database, it uses the query’s vector representation to find the nearest neighbors in the database.

Nearest Neighbor Search

How do we find the nearest neighbor to our query vector? The most straightforward approach is a brute-force search. It calculates the distance between our query vector and all other vectors in the database, one by one. Any metrics discussed in “How to compare vectors” can be used. However, this brute-force approach has a time complexity of O(N*d), where N is the number of vectors and d is the dimensionality. This becomes computationally expensive for large datasets.

Since exact nearest neighbor search can be slow for massive datasets, we often turn to approximate nearest neighbor (ANN) algorithms. These algorithms prioritize efficiency by finding neighbors that are very close (but not necessarily the absolute closest) to the query vector, significantly reducing search time. 

Continuing with the cooking assistant app example, imagine you’re searching for a “fruit cake recipe”. Assume that in our database, the real closest recipe is “simple apple pie”. With a massive database, an exact nearest neighbor search might take a long time to find the perfect match. However, an ANN algorithm can quickly find a recipe that is very similar to what you’re looking for, such as a “simple fruit cake” or a “basic apple pie”, even if it might not be the exact closest match. This efficiency ensures you get relevant and useful recipe suggestions promptly, enhancing your overall experience without a noticeable compromise in quality.

Approximate Nearest Neighbour Search

Now, let’s delve into the world of Approximate Nearest Neighbor (ANN) algorithms. The way you search for nearest neighbors depends on how the data is stored in the vector database. One of the earliest ANN algorithms, established in 1975, is called k-d trees. These trees work by recursively splitting the data space using hyperplanes, making the search process more efficient (see ep. 5 of ObjectBox Bites). However, k-d trees, like many exact nearest neighbor algorithms, suffer from the dimensionality curse. This means that as the number of dimensions (features) in your data increases, the distance between points becomes less meaningful, making searching very slow in high-dimensional spaces like those used in vector databases. 

For instance, consider simple fruit recipes. With a few features, such as cooking time and number of ingredients, finding similar recipes would be relatively straightforward. However, if we also include many other features like sweetness level, calorie count, fruit type, all specific ingredients, preparation complexity, and user ratings, the number of dimensions increases significantly. In such high-dimensional spaces, the traditional k-d tree method becomes inefficient because the distances between points (recipes) become less distinct and meaningful.

To overcome this challenge, ANN algorithms leverage two main approaches: indexing methods and sketching methods. Indexing methods work by creating a hierarchical data structure that allows for faster exploration of the search space. Imagine a well-organized library with categorized sections instead of just randomly placed books.  Sketching methods, on the other hand,  don’t search the entire dataset directly.  Instead, they create compressed versions (sketches) of the data that are faster to compare with the query vector. This reduces the search time significantly. Often, these two approaches are combined for optimal performance.


A popular example of an ANN search implementation for high-dimensional data is the Hierarchical Navigable Small World (HNSW) algorithm (e.g. implemented in Azure AI). HNSW relies on graph-based indexing to efficiently navigate the data space and find nearest neighbors. For more details watch episodes 6, 7, and 8 of ObjectBox Bites miniseries, where we describe the fundamentals of HNSW.

Take-away notes

To sum up, vector search offers a significant leap forward in how we search for information. By understanding the meaning and relationships behind data, it delivers more relevant and accurate results, even for unstructured data and complex queries. This technology has the potential to revolutionize various fields, from enhancing search engines to empowering AI applications. As vector search continues to evolve, we can expect even more exciting possibilities for navigating the ever-growing ocean of information and unlocking its full potential. This includes operating with data directly on the devices it was created on, reducing cloud costs, eliminating the reliance on an internet connection, and opening up using your private data without it ever being shared (100% private). If you’re interested in other AI and vector database-related topics, check out the ObjectBox mini-series. Stay tuned for more articles in the future.

Evolution of search: traditional vs vector search

Evolution of search: traditional vs vector search

Introduction

In today’s digital landscape, searching for information is integral to our daily lives, whether for education, research, work, or shopping. However, as the volume and complexity of data kept growing, traditional search methods faced more and more challenges in providing accurate and relevant results. That’s where vector search comes in. We’re already seeing Google changing its search engine to empower the vector search (RankBrain, BERT, Neural matching), and expecting even greater incorporation of AI tools to improve search experience. Let’s explore the differences between traditional (keyword) search and vector search to understand how these technologies are shaping our search experiences, and how this impacts the discoverability of any content you might produce.

Traditional (keyword) search

Traditional search performs exact keyword matching from user queries to the data to retrieve relevant results. For example, searching for “programming languages” with traditional search will list every source containing those words. A more advanced version can also incorporate additional rules to enhance search results, such as:

  • keyword frequency (how often the term “programming languages” is used within the result text),
  • the presence of related terms (e.g. “Java”, “Python”, “C++” versus “cooking”, “gardening”),
  • or location (results closer to your location are favored).

While this approach has served us well, it struggles with ambiguous language, synonyms as well as the impact of SEO strategies, often resulting in less accurate or less valuable search results. This can be especially frustrating for businesses who are trying to get their content seen by the right people. For example, a business that publishes a blog post about “sustainable fashion tips” might miss out on potential customers who are searching for “eco-friendly clothing recommendations” or “green clothing ideas,” simply because their keywords don’t exactly match.

Vector (semantic) search

2D Vector Space Representation. In this space, “Python” and “Java” are close to each other as well as to the “Programming language” query we are searching for because they are similar (they share high values in their features).


On the other hand, vector search takes a different approach by seeking out related objects that share similar characteristics or semantics. You can think of it as finding results based on meaning or understanding rather than just exact wording. For example, searching for “programming languages” with vector search will not only find sources mentioning those exact words but also identify specific languages like “Python” or “Java” as well as related concepts such as “coding tutorials” or“development frameworks”.

To do a vector search, first of all, the content, such as texts, images, audio files, or videos, needs to be represented as vectors/embeddings (also often called vector embeddings) by AI models. These embeddings represent data in a multidimensional vector space. Vectors capture the essence or semantics of the data they represent while remaining computationally efficient.

Once these vector representations are generated, they are basically sets of numbers, and therefore easy to compute with. For instance, instead of searching for a specific word in text, we aim to find the closest vector (from the text embeddings) to the query vector (representing the word we’re searching for). This process relies on well-established vector computing methods, such as calculating the distance between vectors or minimizing the angle between them (Cosine similarity).

Comparison

Let’s now compare different aspects of searching to understand the main differences between traditional search and vector search (also summarized in Table). 

  • Search Approach
    In traditional search, the approach relies on matching keywords directly from the user query to the content. Vector search uses vector embeddings to catch the semantics of the data to perform a meaning-based approach.
  • Ambiguity handling
    Therefore, vector search shines when it comes to handling ambiguity. It is superior for handling synonyms, ambiguous language, and broad or fuzzy queries compared to traditional search. This also automatically influences the relevance of the search results.
  • Search relevance calculation
    The metrics used for search relevance calculations are different. The traditional search uses term frequency-inverse document frequency (TF-IDF) and BM25, while vector search uses Jaccard Similarity, Cosine Similarity, and L2 Distance (or Euclidean).
  • Speed and Implementation
    Traditional search is easy to implement, straightforward in usage, and fast for simple queries. Vector search may be slower for simple queries and more complicated to implement, once it comes to huge datasets. However, the implementation of approximate nearest neighbor techniques (ANN) allows to significantly speed up the process.
  • Scalability
    Continuous expansion of content, challenges the scalability of traditional search, while for vector search scalability is one of the advantages. 
  • Cost
    While traditional search may have lower computational requirements, the superior performance and accuracy of vector search often justify the investment in additional computing power. Furthermore, the computational costs for vector search can be significantly reduced with the use of ANN.
Comparison table between keyword search and vector search

Conclusions

In summary, both traditional search and vector search offer distinct advantages and drawbacks. Vector search excels in handling ambiguity, correcting typos, enhancing relevance, and managing extensive datasets. Traditional search remains advantageous for straightforward queries, exact matches, or smaller datasets. Historically, limited computational resources, particularly for on-device computation (i.e. Edge Computing), favored traditional search. However, the landscape is evolving rapidly with the introduction of the first edge vector database solution by ObjectBox. This innovation promises to revolutionize the scenario by optimizing vector search for devices with constrained resources, extending the benefits of semantic search to the Edge.

On-device Vector Database for Dart/Flutter

On-device Vector Database for Dart/Flutter

ObjectBox 4.0 introduces the first on-device vector database for the Dart/Flutter platform, allowing Dart developers to enhance their apps with AI in ways previously not possible. A vector database facilitates advanced data processing and analysis, such as measuring semantic similarities across different document types like images, audio files, and texts. If you want to go all-in with on-device AI, combine the vector search with a large language model (LLM) and make the two interact with individual documents. You may have heard of this as “retrieval-augmented generation” (RAG). This is your chance to explore this as one of the first Dart developers.

Vector Search for Dart/Flutter

Now, let’s look into the Dart specifics! With this release, it is possible to create a scalable vector index on floating point vector properties. It’s a very special index that uses an algorithm called HNSW. It’s highly scalable and can find relevant data within millions of entries in a matter of milliseconds.

Let’s have a deeper look into the example used in our vector search documentation. In this example, we use cities with a location vector to perform proximity search. Here is the City entity and how to define a HNSW index on the location (it would also need additional properties like an ID and a name, of course):

Vector objects are inserted as usual (the indexing is done automatically behind the scenes):

To perform a nearest neighbor search, use the new nearestNeighborsF32(queryVector, maxResultCount) query condition and the new “find with scores” query methods (the score is the distance to the query vector). For example, to find the 2 closest cities:

Vector Embeddings

In the cities example above, the vectors were straight forward: they represent latitude and longitude. Maybe you already have vector data as part of your data, but often, you don’t. So where do you get the vectors from?

For most AI applications, vectors are created by a so-called embedding model. There are plenty of embedding models to choose from, but first you have to decide if it should run in the cloud or locally. Online embeddings are the easier way to get started. Just set up an account at your favorite AI provider and create embeddings online. Alternatively, you can also run your embedding model locally on device. This might require some research. A good starting point for that may be TensorFlow lite, which also has a Flutter package. If you want to use really good embedding models (starting at around 90 MB), you can also check these on-device embedding models. These might require a more capable inference runtime though. E.g. if you are targeting desktops, you could use ollama (e.g. using this package).

CRUD benchmarks 2024

A new release is also a good occasion to refresh our open source benchmarks. Have a look:

CRUD is short for the basic operations a database does: Create, Read, Update and Delete. It’s an important metric for the general efficiency of a database.

What’s next?

We are excited to see what you will build with the new vector search. Let us know! And please give us feedback. It’s the very first release of an on-device vector database ever – and the more feedback we get on it, the better the next version will be.

The first On-Device Vector Database: ObjectBox 4.0

The first On-Device Vector Database: ObjectBox 4.0

The new on-device vector database enables advanced AI applications on small restricted devices like mobile phones, Raspberry Pis, medical equipment, IoT gadgets and all the smart things around you. It is the missing piece to a fully local AI stack and the key technology to enable AI language models to interact with user specific data like text and images without an Internet connection and cloud services.

An AI Technology Enabler

Recent AI language models (LLMs) demonstrated impressive capabilities while being small enough to run on e.g. mobile phones. Recent examples include Gemma, Phi3 and OpenELM. The next logical step from here is to use these LLMs for advanced AI applications that go beyond a mere chat. A new generation of apps is currently evolving. These apps create “flows” with user specific data and multiple queries to the LLM to perform complex tasks. This is also known as RAG (retrieval augmented generation), which, in its simplest form, allows one to chat with your documents. And now, for the very first time, this will be possible to do locally on restricted devices using a fully fledged embedded database.

What is special about ObjectBox Vector Search?

We know restricted devices. Where others see limitations, we see the potential and we have repeatedly demonstrated creating superefficient software for these. And thus maximizing speed, minimizing resource use, saving battery life and CO2. With this knowledge, we approached vector search in a unique way.

Efficient memory management is the key. The challenge with vector data is that on the one hand, it consumes a lot of memory – while on the other hand, relevant vectors must be present in memory to compute distances between vectors efficiently. For this, we introduced a special multi-layered caching that gives the best performance for the full range of devices; from memory-constrained small devices to large machines that can keep millions of vectors in memory. This worked out so well that we saw ObjectBox outperform several vector databases built for servers (open source benchmarks coming soon). This is no small feat given that ObjectBox still holds up full ACID properties, e.g. caching must be transaction-aware.

Also, keep in mind that ObjectBox is a fully capable database that allows you to store complex data objects along with vectors. From an ObjectBox data model point of view, a vector is “just” another property type. This allows you to store all your data (vectors along with objects) in a single database. This “one database” approach also includes queries. You can already combine vector search with other conditions. Note that some limitations still apply with this initial release. Full hybrid search is close to being finished and will be part of one of the next releases.

In short, the following features make ObjectBox a unique vector database:

  • Embedded Database that runs inside your application without latency
  • Vector search based is state-of-the-art HNSW algorithm that scales very well with growing data volume
  • HNSW is tightly integrated within our internal database. Vector Search doesn’t just run “on top of database persistence”.
  • With this deep integration we do not need to keep all vectors in memory.
  • Multi-layered caching: if a vector is not in-memory, ObjectBox fetches it from disk.
  • Not just a vector database: you can store any data in ObjectBox, not just vectors. You won’t need a second database.
  • Low minimum hardware requirements: e.g. an old Raspberry Pi comfortably runs ObjectBox smoothly.
  • Low memory footprint: ObjectBox itself just takes a few MB of memory. The entire binary is only about 3 MB (compressed around 1 MB).
  • Scales with hardware: efficient resource usage is also an advantage when running on more capable devices like the latest phones, desktops and servers.
  • ObjectBox additionally offers commercial editions, e.g. a Server Cluster mode, GraphQL, and of course, ObjectBox Sync, our data synchronization solution.

Why is this relevant? AI anywhere & anyplace

With history repeating itself, we think AI is in a “mainframe era” today. Just like clunky computers from decades before, AI is restricted to big and very expensive machines running far away from the user. In the future, AI will become decentralized, shifting to the user and their local devices. To support this shift, we created the ObjectBox vector database. Our vision is a future where AI can assist everyone, anytime, and anywhere, with efficiency, privacy, and sustainability at its core.

What do we launch today?

Today, we are releasing ObjectBox 4.0 with Vector Search for a variety of languages:

*) We acknowledge Python’s popularity within the AI community and thus have invested significantly in our Python binding over the last months to make it part of this initial release. Since we still want to smooth out some rough edges with Python, we decided to label Python an alpha release. Expect Python to quickly catch up and match the comfort of our more established language bindings soon (e.g. automatic ID and model handling).

Let’s get you started right away? Check our Vector Search documentation to see how to use it!

One more thing: ObjectBox Open Source Database (OSS)

We are also very happy to announce that we will fully open source the core of ObjectBox. As a company we follow the open core model. Since we still have some cleaning up to do, this will happen in one of the next releases, likely 4.1.

“Release week”

With today’s initial releases, we are far from done yet. Starting next Tuesday, you can  expect additional announcements from us. Follow us to get the news as soon as it is released.

What’s next?

This is our very first version of a “vector database”. And while we are very happy with this release, there are still so many things to do! For example, we will optimize vector search by adding vector quantization and integrate it more tightly with our data synchronization. We are also focusing on expanding our solution’s reach through strategic partnerships. If you think you are a good fit, let us know. And as always, we are very eager to get some feedback from you! Take care.

SQLite and SQLite alternatives – a comprehensive overview

SQLite and SQLite alternatives – a comprehensive overview

SQLite and SQLite alternatives - databases for the Mobile and IoT edge

Overview of SQLite and SQLite alternatives as part of the mobile / edge database market with a comprehensive comparison matrix (updated summer 2023)

Digitalization is still on the rise, as is the number of connected devices (from 13 billion connected IoT devices + 15 billion mobile devices operating in 2021 already). Data volumes are growing accordingly ( 3.5 quintillion bytes of data is produced daily in 2023), and centralised (typically cloud-based) computing canbot support all the current needs. This has led to a shift from the cloud to the edge

Therefore, there is a renewed need for on-device databases like SQLite and SQLite alternatives to persist and manage data on edge devices. On top, due to the distributed nature of the edge, there is a need to manage data flows to / from and between edge devices. This can be done with Edge Databases that provide a Data Sync functionality (SQLite alternatives only, as SQLite doesn’t support this).  Below, we’ll take a close look at SQLite and its alternatives with consideration of today’s needs.

Databases for the Edge

While being quite an established market with many players, the database market is still growing consistently and significantly. The reason is that databases are at the core of almost any digital solution, and directly impact business value and therefore never going out of fashion. With the rapid evolvements in the tech industry, however, databases evolve too. This, in turn, yields new database types and categories. We have seen the rise of NoSQL databases in the last 20 years, and more recently some novel database technologies, like graph databases and time-series databases, and vector databases.

With AI and accordingly vector databases being all the hype since 2022/2023, the database market is indeed experiencing fresh attention. Due to the speed with which AI is evolving, we’re however already leaving the “mainframe era of AI” and entering the distributed Edge AI space. With SQLite not supporting vector search and related vector database functions, this adds a new dimension to this ever-present topic.  There is a new, additional need for local, on-device vector databases to support on-device AI that’s independent of an Internet connection, reliably fast, and keeps data on the device (100% private). 

We’re expecting vector databases that run locally on a wide variety of devices (aka Edge Vector Databases) to become the next big thing, surpassing even what we have seen happening in the server vector database space. And we wouldn’t be astonished if the synchronizing of vector data is a game changer for Edge AI. Time will tell 😉


Both, the shift back from a centralised towards a decentralised paradigm, and the growing number of restricted devices call for a “new type” of an established database paradigm. SQLite has been around for more than 20 years and for good reason, but the current market shift back to decentralized computing happens in a new environment with new requirements. Hence, the need for a “new” database type, based on a well-established database type: “Edge databases”. Accordingly, a need for SQLite alternatives that consider the need for decentralized data flows and AI functionalities (depending on the use case of course; after all SQLite is a great database).

database-evolution-towards-edge-vector-databases
What is an Edge Database?

Edge databases are a type of databases that are optimised for local data storage on restricted devices, like embedded devices, Mobile, and IoT. Because they run on-device, they need to be especially resource-efficient (e.g. with regards to battery use, CPU consumption, memory, and footprint). The term “edge database” is becoming more widely-used every year, especially in the IoT industry. In IoT, the difference between cloud-based databases and ones that run locally (and therefore support Edge Computing) is crucial.

What is a Mobile Database?

We look at mobile databases as a subset of edge databases that run on mobile devices. The difference between the two terms lies mainly in the supported operating systems / types of devices. Unless Android and iOS are supported, an edge database is not really suited for the mobile device / smartphone market. In this article, we will use the term “mobile database” only as “database that runs locally on a mobile (edge) device and stores data on the device”. Therefore, we also refer to it as an “on-device” database.

What are the advantages and disadvantages of working with SQLite?

SQLite is a relational database that is clearly the most established database suitable to run on edge devices. Moreover, it is probably the only “established” mobile database. It was designed in 2000 by Richard Hipp and has been embedded with iOS and Android since the beginning. Now let’s have a quick look at its main advantages and disadvantages:

Advantages  Disadvantages
  • 20+ years old (should be stable ;))
  • Toolchain, e.g. DB browser
  • No dependencies, is included with Android and iOS
  • Developers can define exactly the data schema they want
  • Full control, e.g. handwritten SQL queries
  • SQL is a powerful and established query language, and SQLite supports most of it
  • Debuggable data: developers can grab the database file and analyse it
  • 20+ years old ( less state-of-the-art tech)
  • Using SQLite means a lot of boilerplate code and thus inefficiencies ( maintaining long running apps can be quite painful)
  • No compile time checks (e.g. SQL queries)
  • SQL is another language to master, and can impact your app’s efficiency / performance significantly…
  • The performance of SQLite is unreliable
  • SQL queries can get long and complicated
  • Testability (how to mock a database?)
  • Especially when database views are involved, maintainability may suffer with SQLite

 

What are the SQLite alternatives?

There are a bunch of options for making your life easier, if you want to use SQLite. You can use an object abstraction on top of it, an object-Relational-Mapper (ORM), for instance greenDAO, to avoid writing lots of SQL. However, you will typically still need to learn SQL and SQLite at some point. So what you really want is a full blown database alternative, like any of these: Couchbase Lite, Interbase, LevelDB, ObjectBox, Oracle Berkeley DB, Mongo Realm, SnappyDB, SQL Anywhere, or UnQLite.

While SQLite really is designed for small devices, people do run it on the server / cloud too. Actually, any database that runs efficiently locally, will be highly efficient on big servers too, making them a sustainable lightweight choice for some scenarios. However, for server / cloud databases, there are a lot of alternatives you can use as a replacement like e.g. MySQL, MongoDB, or Cloud Firestore.

Bear in mind that, if you are looking to host your database in the cloud with apps running on small distributed devices (e.g. mobile apps, IoT apps, any apps on embedded devices etc.), there are some difficulties. Firstly, this will result in higher latency, i.e. slow response-rates. Secondly, the offline capabilities will be highly limited or absent. As a result, you might have to deal with increased networking costs, which is not only reflected in dollars, but also CO2 emissions. On top, it means all the data from all the different app users is stored in one central place. This means that any kind of data breach will affect all your and your users’ data. Most importantly, you will likely be giving your cloud / database provider rights to that data. (Consider reading the general terms diligently). If you care about privacy and data ownership, you might therefore want to consider a local database option, as in an Edge Database. This way you can decide, possibly limit, what data you sync to a central instance (like the cloud or an on-premise server).

SQLite alternatives Comparison Matrix

To give you an overview, we have compiled a comparison table including SQLite and SQLite alternatives. In this matrix we look at databases that we believe are apt to run on edge devices. Our rule of thumb is the databases’ ability to run on Raspberry Pi type size devices. If you’re reading this on mobile, click here to view the full matrix.

Edge Database Short description License / business model Android / iOS* Type of data stored Central Data Sync P2P Data Sync Offline Sync (Edge) Data level encryption Flutter / Dart support Vector Database (AI support) Minimum Footprint size Company
SQLite C programming library; probably still 90% market share in the small devices space (personal assumption) Public domain embedded on iOS and Android Relational No No No No, but option to use SQLCipher to encrypt SQLite Flutter plugins (ORMs) for SQLite, but nothing from Hwaci No, but various early & unofficial extensions are available < 1 MB Hwaci
Couchbase Mobile / Lite Embedded / portable database with P2P and central synchronization (sync) support; pricing upon request; some restrictions apply for the free version. Secure SSL. Partly proprietary, partly open-source, Couchbase Lite is BSL 1.1 Android / iOS JSON Documents / NoSQL db Yes Yes No Database encryption with SQLCipher (256-bit AES) Unofficial Flutter plugin for Couchbase Lite Community Edition No < 3,5 MB Couchbase
InterBase ToGo / IBLite Embeddable SQL database. Proprietary Android / iOS Relational No No No 256 bit AES strength encryption No No < 1 MB Embarcadero
LevelDB Portable lightweight key-value store, NoSQL, no index support; benchmarks from 2011 have been removed unfortunately New BSD Android / iOS Key-value pairs / NoSQL db No No No No Unofficial client that is very badly rated No < 1 MB LevelDB Team
LiteDB A .Net embedded NoSQL database MIT license Android / iOS (with Xamarin only) NoSQL document store, fully wirtten in .Net No No No Salted AES No No < 1 MB LiteDB team
Mongo Realm / Realm DB (acquired by Mongo in 2019) Embedded object database SSPL Android / iOS Object Database Yes (Mongo Atlas), tied to using Mongo DB No No Yes Officially released a Flutter binding in spring 2023 (See Flutter databases) No 5 MB+ MongoDB Inc.
ObjectBox NoSQL Edge Vector Database with out-of-the-box Data Sync for Mobile and IoT; fully ACID compliant; benchmarks available as open source. Open Core (plus Apache 2.0 bindings) Android / iOS / Linux / Windows / any POSIX Object-oriented NoSQL edge database for high-performance on Mobile, IoT, and embedded devices Yes WIP Yes transport encryption; additional encryption upon request Yes First local vector database fo on-device Edge AI released May 2024 < 1 MB ObjectBox
Oracle Database Lite Portable with P2P and central sync support as well as support for sync with SQLite Proprietary Android / iOS Relational Yes Yes No 128-bit AES Standard encrytion No No < 1 MB Oracle Corporation
SQL Anywhere Embedded / portable database with central snyc support with a stationary database, pricing now available here Proprietary Android / iOS Relational Yes, tied to using other SAP tech though (we believe) No No AES-FIPS cipher encryption for full database or selected tables No No   SAP (originally Sybase)
UnQLite Portable lightweight embedded db; self-contained C library without dependency. 2-Clause BSD Android / iOS Key-value pairs / JSON store / NoSQL db No No No 128-bit or 256-bit AES standard encryption not yet; might be coming though; there was a 0.0.1 released some time ago No ~ 1.5 MB Symisc systems
extremeDB Embedded relational database Proprietary iOS In-memory relational DB, hybrid persistence No No No AES encryption No No < 1 MB McObject LLC
redis DB High-performance in-memory Key Value store with optional durability Three clause BSD license, RSAL and Proprietary No K/V in-memory store, typically used as cache No No No TLS/SSL-based encryption can be enabled for data in motion. Unofficial redis Dart client available No on-device vector database, but cloud vector support An empty instance uses ~ 3MB of memory redislabs (the original author of redis left in 2020)
Azure SQL Edge  Designed as a SQL database for the IoT edge; however, due to the footprint it is no Edge Database Proprietary No Relational DB for IoT No No No will provide encryption No Not on-device 500 MB+ Microsoft

If you are interested in an indication of the diffusion rate of databases, check out the following database popularity ranking: http://db-engines.com/en/ran. If you are interested to learn more about SQLite, there is a great Podcast interview with Richard Hipp that is worthwhile listening to.

Is there anything we’ve missed? What do you agree and disagree with? Please share your thoughts with us via Twitter or email us on contact[at]objectbox.io. 

Make sure to check out the ObjectBox Database & try out ObjectBox Sync. You can get started in minutes and it’s perfect if you are using an object-oriented programming language, as it empowers you to work with your objects within the database. More than 1,000,000 developers already use this Edge Database designed specifically for high performance on small, connected, embedded devices.