Data Sync Alternatives: Offline vs. Online Solutions

by Anastasia | Feb 5, 2025 | Data Sync, Edge Database, Mobile Database, ObjectBox, Open Source, SQLite

Ever waited to order or pay with a waiter holding their ordering device in the air for a signal? These moments show why offline-first Data Sync is essential. With more and more services relying on the availability of on-device apps and the IoT market projected to hit $1.1 trillion by 2026, choosing the right solution – particularly online-only or offline-first data sync – is more crucial than ever. In this blog, we discuss their differences and highlight common Data Sync alternatives.

What is Data Sync?

Data synchronization (Sync) aligns data between two or more devices to maintain consistency over time. It is an essential component in applications ranging from IoT and mobile apps to cloud computing. Challenges in data synchronization include asynchrony, conflicts, and managing data across flaky networks.

Data Sync vs. Data Replication

Data Synchronization is often confused with Data Replication. Nevertheless, they serve different purposes:

Data Replication: A unidirectional process (works in one direction only) that duplicates data across storage locations to ensure availability and prevent loss. It is simple, but limited in its application and efficiency, and it lacks conflict management.
Data Synchronization: A bidirectional process that harmonizes all or a subset of data between two or more devices. It ensures consistency across devices and entails conflict resolution. It is inherently more complex but also more flexible.
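
To make the difference concrete, here is a minimal Python sketch. The record layout (a value plus a timestamp), the function names `replicate` and `synchronize`, and the last-write-wins rule are assumptions chosen for illustration; real sync solutions typically use more elaborate conflict handling and also track deletions.

```python
from typing import Dict, Tuple

# Hypothetical record layout: key -> (value, last_modified_timestamp)
Store = Dict[str, Tuple[str, int]]


def replicate(source: Store, target: Store) -> None:
    """Unidirectional: the target simply becomes a copy of the source."""
    target.clear()
    target.update(source)


def synchronize(a: Store, b: Store) -> None:
    """Bidirectional: both sides end up with the newest version of each record.

    Conflicts (same key changed on both sides) are resolved with a simple
    last-write-wins rule, which is only one of many possible strategies.
    """
    for key in set(a) | set(b):
        rec_a, rec_b = a.get(key), b.get(key)
        if rec_a is None:
            a[key] = rec_b          # record only exists on side b
        elif rec_b is None:
            b[key] = rec_a          # record only exists on side a
        elif rec_a[1] >= rec_b[1]:
            b[key] = rec_a          # side a has the newer (or equal) version
        else:
            a[key] = rec_b          # side b has the newer version


phone = {"order-1": ("2x coffee", 10), "order-2": ("1x tea", 12)}
tablet = {"order-1": ("3x coffee", 15), "order-3": ("1x cake", 11)}
synchronize(phone, tablet)
print(phone == tablet)  # True: both devices now hold the same three orders
```

Replication would have overwritten one device with the other; synchronization keeps the changes made on both sides and only has to decide what happens to "order-1", which was edited on both devices.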

Online vs Offline Solutions: Why Offline Sync Matters

Online-only synchronization solutions rely entirely on cloud infrastructure, requiring a stable internet connection to function. While these tools offer simplicity and scalability, their dependency on constant cloud connectivity brings limitations: Online Data Sync solutions cannot guarantee response times, and their speed varies depending on the network. They do not work when offline or in on-premise settings. Using an Online Sync solution often entails sharing data with the provider and might not comply with data privacy requirements, so do read the terms and conditions.

Offline-first solutions (offline Sync) focus on local data storage and processing, ensuring the app remains fully functional even without an internet connection. When a network is available, the app synchronizes seamlessly with a server, the cloud, or other devices as needed. These solutions are ideal for on-premise scenarios with unreliable or no internet access, mission-critical applications that must always operate, real-time and high-performance use cases, as well as situations requiring high data privacy and data security compliance.
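
The offline-first pattern described above can be sketched in a few lines of Python. The class `OfflineFirstStore`, its `send` callback, and the `online` flag are hypothetical names used for illustration only; this is not the API of any of the products discussed here.

```python
from typing import Callable, Dict, List, Tuple


class OfflineFirstStore:
    """Hypothetical local store that queues changes until a network is available."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self._local: Dict[str, str] = {}            # always available on the device
        self._pending: List[Tuple[str, str]] = []   # changes not yet pushed
        self._send = send                           # assumed upload function

    def put(self, key: str, value: str) -> None:
        # Writes go to local storage first, so the app works without a connection.
        self._local[key] = value
        self._pending.append((key, value))

    def get(self, key: str) -> str:
        # Reads never touch the network.
        return self._local[key]

    def flush(self, online: bool) -> int:
        """Push queued changes if a connection exists; return how many were sent."""
        if not online:
            return 0
        sent = 0
        while self._pending:
            key, value = self._pending[0]
            self._send(key, value)   # if this raises, the change stays queued
            self._pending.pop(0)
            sent += 1
        return sent


server: Dict[str, str] = {}                 # stands in for a remote backend
store = OfflineFirstStore(send=server.__setitem__)
store.put("table-7", "2x espresso")         # works while offline
sent_offline = store.flush(online=False)    # nothing is sent, nothing is lost
sent_online = store.flush(online=True)      # queued change is pushed
```

The key design choice is that the local store is the source of truth for the app, and the network is only used to drain the queue when it happens to be available.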

A less discussed, but in our view also relevant, point is sustainability. While there might be exceptions depending on the use case, for most applications offline-first solutions use fewer resources and are therefore more sustainable. If CO2 footprint or battery usage is of concern to you, you might want to look into offline-first Data Sync alternatives.

Now, let’s have a look at current options:

Data Sync Alternatives


Solution

Company

Type

Offline Support

Self-hosted Sync

Decentralized Sync

Database

Type of DB

OS/Platforms

Languages

Open-Source Component

License

Other Considerations

Country

Firebase

Google
(Firebase was acquired by Google in 2014)

Online

Local cache only, no persistence, syncs when online

❌

Cloud: Firebase Realtime Database; Edge: Only caching, no DB (called Firestore)

Document store

iOS, Android, Web

Java
JavaScript
Objective-C
Swift
Kotlin
C++
Dart
C#
Python, Go, Node.js

❌

proprietary

Tied to Google Cloud, requires internet connectivity

🇺🇸

Supabase

Online

Limited

✅

❌

Cloud DB: PostgreSQL

Relational document store

Primarily a cloud solution

JavaScript/TypeScript
Flutter/Dart
C#
Swift
Kotlin
Python

✅

Apache License 2.0

Supabase is mainly designed as a SaaS, for use cases with constant connectivity

🇸🇬

ObjectBox Sync

ObjectBox

Offline-first

✅

In development

ObjectBox

Object-oriented embedded NoSQL DB

Android, Linux, Ubuntu,
Windows,
macOS, iOS,
QNX, Raspbian,
any POSIX system really,
any cloud (e.g. AWS/Azure/Google Cloud),
bare metal

C
C++
Java
Kotlin
Swift
Go
Flutter / Dart
Python

✅

DB: Open source bindings, Apache 2.0, closed core

Highly efficient (saves CPU, Memory, battery, and bandwidth); fully offline-first, supports on-premise settings, 100% cloud optional

🇩🇪

Couchbase (Lite + Couchbase Sync Gateway)

Couchbase (a merger of Couch One and Membase)

Online

✅

The CE Sync is a bare minimum and typically not usable; Self-hosted Sync with Couchbase Servers is available as part of their Enterprise offering

✅ as part of the Enterprise offering; gets expensive quickly

Edge: Couchbase Lite; Server: Couchbase

Multi-model NoSQL document-oriented database

Couchbase Lite: iOS, Android, macOS, Linux, Windows, Raspbian and Raspberry Pi OS

Couchbase Sync Gateway: Red Hat Enterprise Linux (RHEL) 9.x, Alma Linux 9.x, Rocky Linux 9.x, Ubuntu, Debian (11.x, 12.x), Windows Server 2022

.Net
C
Go
Java
JavaScript
Kotlin
PHP
Python
Ruby
Scala

✅

Couchbase Lite is available under different licenses; the open source Community Edition does not get regular updates and misses many features especially around Sync (e.g. it does not include Delta Sync making it slow and expensive)

Typically requires Couchbase servers, quickly gets expensive

🇺🇸

MongoDB Realm + Atlas Device Sync

MongoDB
(Realm was acquired by MongoDB in 2019)

Offline-First

✅

Cloud-based sync only

❌

Cloud: MongoDB, Edge: Mongo Realm DB

MongoDB: NoSQL document store; RealmDB: Embedded NoSQL DB

MongoDB: Linux
OS X
Solaris
Windows
Mongo Realm DB:
Android, iOS

more than 20 languages, e.g. Java, C, C#, C++

✅

MongoDB changed its license from open source (AGPL) to MongoDB Inc.’s Server Side Public License (SSPL) in 2018. RealmDB is open source under the Apache 2.0 License. The Data Sync was proprietary.

Deprecated (in Sep 2024); End-of-life in Sep 2025; ObjectBox offers a migration option

🇺🇸

While SQLite does not offer a sync solution out-of-the-box, various vendors have built something on top, or integrated with SQLite giving them offline persistence.

Key Considerations for Choosing a Data Sync Solution

When selecting a synchronization solution, consider:

Connectivity Requirements: Will the application function in offline environments; how will it work with flaky network conditions; how is the user experience when there is intermittent connectivity?
Data Privacy & Security: How critical is it to ensure sensitive data remains local? Which data compliance requirements apply? How important is it to protect the data against breaches?
Scalability and Performance: What are the expected data loads and network constraints? How important is speed for the users? Is there any need to guarantee QoS parameters? How much will the cloud and networking costs be?
Conflict Resolution: How does the solution handle data conflicts?
Delta Sync: Does the solution always synchronize all data or only changes (data delta)? Can a subset of data be synchronized? How efficient is the Sync protocol (affecting costs and speed)?
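
To illustrate the Delta Sync point, the following Python sketch compares the last synchronized state with the current state and transfers only the difference. The function names and the convention of marking deletions with `None` are assumptions made for this example; actual sync protocols encode changes in their own formats.

```python
from typing import Dict, Optional

Snapshot = Dict[str, str]
Delta = Dict[str, Optional[str]]


def compute_delta(last_synced: Snapshot, current: Snapshot) -> Delta:
    """Return only what changed since the last sync.

    New or changed keys map to their new value; deleted keys map to None.
    """
    delta: Delta = {}
    for key, value in current.items():
        if last_synced.get(key) != value:
            delta[key] = value
    for key in last_synced:
        if key not in current:
            delta[key] = None
    return delta


def apply_delta(target: Snapshot, delta: Delta) -> None:
    """Bring a replica up to date using only the delta."""
    for key, value in delta.items():
        if value is None:
            target.pop(key, None)
        else:
            target[key] = value


last_synced = {"a": "1", "b": "2", "c": "3"}
current = {"a": "1", "b": "20", "d": "4"}

delta = compute_delta(last_synced, current)   # 3 entries instead of the full data set
replica = dict(last_synced)
apply_delta(replica, delta)
```

Here the unchanged record "a" is never transferred. With large data sets and small changes, this is what keeps bandwidth use and cloud costs down.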

The Shift Towards Edge Computing

The trend toward Edge Computing highlights the growing preference for offline-first solutions. By processing and storing data closer to its source, Edge Computing reduces cloud dependency, enhances privacy, and improves efficiency. Data synchronization plays an important role in this shift, ensuring seamless operation across decentralized networks.

Offline and online synchronization solutions each have their merits, but the rise of edge computing and data privacy concerns has propelled offline Sync to the forefront. Developers must assess their application’s unique requirements to select the most appropriate synchronization method. As the industry evolves, hybrid and offline-first solutions are going to dominate, offering the best balance of functionality, privacy, and performance.

The first On-Device Vector Database: ObjectBox 4.0

by Markus | May 16, 2024 | Edge AI, Edge Computing, Edge Database, Mobile Database, ObjectBox, Open Source

The new on-device vector database enables advanced AI applications on small restricted devices like mobile phones, Raspberry Pis, medical equipment, IoT gadgets and all the smart things around you. It is the missing piece to a fully local AI stack and the key technology to enable AI language models to interact with user specific data like text and images without an Internet connection and cloud services.

An AI Technology Enabler

Recent AI language models (LLMs) demonstrated impressive capabilities while being small enough to run on e.g. mobile phones. Recent examples include Gemma, Phi3 and OpenELM. The next logical step from here is to use these LLMs for advanced AI applications that go beyond a mere chat. A new generation of apps is currently evolving. These apps create “flows” with user specific data and multiple queries to the LLM to perform complex tasks. This is also known as RAG (retrieval augmented generation), which, in its simplest form, allows you to chat with your documents. And now, for the very first time, this will be possible to do locally on restricted devices using a fully fledged embedded database.

What is special about ObjectBox Vector Search?

We know restricted devices. Where others see limitations, we see potential, and we have repeatedly demonstrated that we can create superefficient software for these devices, maximizing speed, minimizing resource use, and saving battery life and CO2. With this knowledge, we approached vector search in a unique way.

Efficient memory management is the key. The challenge with vector data is that on the one hand, it consumes a lot of memory – while on the other hand, relevant vectors must be present in memory to compute distances between vectors efficiently. For this, we introduced a special multi-layered caching that gives the best performance for the full range of devices; from memory-constrained small devices to large machines that can keep millions of vectors in memory. This worked out so well that we saw ObjectBox outperform several vector databases built for servers (open source benchmarks coming soon). This is no small feat given that ObjectBox still holds up full ACID properties, e.g. caching must be transaction-aware.
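
The general idea of keeping only the relevant vectors in memory can be illustrated with a small Python sketch. This is not ObjectBox's implementation: the class `VectorCache`, the least-recently-used eviction policy, and the dictionary standing in for disk storage are all assumptions made for this example, and transaction awareness is left out entirely.

```python
from collections import OrderedDict
from typing import Dict, List

Vector = List[float]


class VectorCache:
    """Hypothetical two-layer lookup: a small in-memory LRU cache over a slower store."""

    def __init__(self, disk: Dict[int, Vector], capacity: int) -> None:
        self._disk = disk                    # stands in for on-disk storage
        self._memory: "OrderedDict[int, Vector]" = OrderedDict()
        self._capacity = capacity
        self.disk_reads = 0                  # counts how often the slow layer was hit

    def get(self, vector_id: int) -> Vector:
        if vector_id in self._memory:
            self._memory.move_to_end(vector_id)   # mark as recently used
            return self._memory[vector_id]
        self.disk_reads += 1
        vector = self._disk[vector_id]
        self._memory[vector_id] = vector
        if len(self._memory) > self._capacity:
            self._memory.popitem(last=False)      # evict the least recently used vector
        return vector


disk = {1: [0.1, 0.2], 2: [0.3, 0.4], 3: [0.5, 0.6]}
cache = VectorCache(disk, capacity=2)
for vector_id in (1, 2, 1, 3, 1):
    cache.get(vector_id)
print(cache.disk_reads)  # 3: ids 1, 2 and 3 were each read from "disk" once
```

A device with little memory would use a small capacity and accept more disk reads, while a large machine could set the capacity high enough to keep every vector in memory.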

Also, keep in mind that ObjectBox is a fully capable database that allows you to store complex data objects along with vectors. From an ObjectBox data model point of view, a vector is “just” another property type. This allows you to store all your data (vectors along with objects) in a single database. This “one database” approach also includes queries. You can already combine vector search with other conditions. Note that some limitations still apply with this initial release. Full hybrid search is close to being finished and will be part of one of the next releases.
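
Conceptually, combining vector search with other conditions looks like the following Python sketch. It is a brute-force illustration with a hypothetical `City` entity and a hypothetical `nearest` helper, not the ObjectBox query API; please refer to the Vector Search documentation for the actual syntax.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class City:
    """Hypothetical entity: regular properties plus one vector property."""
    name: str
    country: str
    location: List[float]   # the vector is "just" another property


def nearest(cities: List[City], query: List[float], country: str, k: int) -> List[City]:
    """Return the k cities closest to `query` among those matching the condition."""
    candidates = [c for c in cities if c.country == country]        # regular condition
    candidates.sort(key=lambda c: math.dist(c.location, query))     # vector distance
    return candidates[:k]


cities = [
    City("Berlin", "DE", [52.52, 13.40]),
    City("Munich", "DE", [48.14, 11.58]),
    City("Vienna", "AT", [48.21, 16.37]),
    City("Hamburg", "DE", [53.55, 9.99]),
]

names = [c.name for c in nearest(cities, query=[48.0, 12.0], country="DE", k=2)]
print(names)  # ['Munich', 'Berlin']
```

Vienna is closer to the query point than Berlin, but it is excluded by the country condition. Because objects and vectors live in the same database, such a combined query does not need a second system.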

In short, the following features make ObjectBox a unique vector database:

Embedded Database that runs inside your application without latency
Vector search is based on the state-of-the-art HNSW algorithm that scales very well with growing data volume
HNSW is tightly integrated within our internal database. Vector Search doesn’t just run “on top of database persistence”.
With this deep integration we do not need to keep all vectors in memory.
Multi-layered caching: if a vector is not in-memory, ObjectBox fetches it from disk.
Not just a vector database: you can store any data in ObjectBox, not just vectors. You won’t need a second database.

Low minimum hardware requirements: e.g. an old Raspberry Pi runs ObjectBox smoothly.
Low memory footprint: ObjectBox itself just takes a few MB of memory. The entire binary is only about 3 MB (compressed around 1 MB).
Scales with hardware: efficient resource usage is also an advantage when running on more capable devices like the latest phones, desktops and servers.
ObjectBox additionally offers commercial editions, e.g. a Server Cluster mode, GraphQL, and of course, ObjectBox Sync, our data synchronization solution.

Why is this relevant? AI anywhere & anyplace

With history repeating itself, we think AI is in a “mainframe era” today. Just like clunky computers from decades before, AI is restricted to big and very expensive machines running far away from the user. In the future, AI will become decentralized, shifting to the user and their local devices. To support this shift, we created the ObjectBox vector database. Our vision is a future where AI can assist everyone, anytime, and anywhere, with efficiency, privacy, and sustainability at its core.

What do we launch today?

Today, we are releasing ObjectBox 4.0 with Vector Search for a variety of languages:

Python*: GitHub, blog post follows today
Android/Java: GitHub
Dart/Flutter: GitHub
C: GitHub

*) We acknowledge Python’s popularity within the AI community and thus have invested significantly in our Python binding over the last months to make it part of this initial release. Since we still want to smooth out some rough edges with Python, we decided to label Python an alpha release. Expect Python to quickly catch up and match the comfort of our more established language bindings soon (e.g. automatic ID and model handling).

Want to get started right away? Check our Vector Search documentation to see how to use it!

One more thing: ObjectBox Open Source Database (OSS)

We are also very happy to announce that we will fully open source the core of ObjectBox. As a company we follow the open core model. Since we still have some cleaning up to do, this will happen in one of the next releases, likely 4.1.

“Release week”

With today’s initial releases, we are far from done yet. Starting next Tuesday, you can expect additional announcements from us. Follow us to get the news as soon as it is released.

What’s next?

This is our very first version of a “vector database”. And while we are very happy with this release, there are still so many things to do! For example, we will optimize vector search by adding vector quantization and integrate it more tightly with our data synchronization. We are also focusing on expanding our solution’s reach through strategic partnerships. If you think you are a good fit, let us know. And as always, we are very eager to get some feedback from you! Take care.

Vector databases – a look at the AI database market with a comprehensive comparison matrix

by Vivien | May 30, 2023 | AI, Open Source, vector database


⭐ What are vector databases? ⭐ What do you need them for? ⭐ Who is in the market? (Updated Oct 2024)

Includes a comparison matrix of vector database options like Pinecone, Milvus, Vespa, Vald, Chroma, Marqo AI, Weaviate, and Qdrant

In 2023 we saw record funding rounds for vector database players. Since then, almost every general-purpose database (like MongoDB, elastic, Oracle MySQL etc.) has added vector search and related features, basically making all of them vector databases too. There is an ongoing discussion about whether pure players are superior, but as always, the right answer is: “it depends”. Anyway, the vector database market is still very hot in Q4 of 2024 🔥

Of course, everyone, not just investors, is interested in the booming AI market. While AI applications have dominated the news for quite some time, the infrastructure software that supports these applications, such as vector databases, has finally gained more spotlight. In the following, we’ll have a look at why vector databases are gaining attention and compare current vector database alternatives.

What is a vector database?

A vector database stores vectors, or more precisely vector embeddings. A vector database therefore is a specialised type of database designed to store and manage large sets of vectors efficiently. However, the challenge and value are not derived from simply being able to store vectors. The value is created by the type of computations that can be run over the stored vector data and the speed with which these computations can be run, e.g. similarity searches.

Vector databases are essentially an important piece of the AI tech stack. They can be used e.g. to give LLMs (Large Language Models) – or more broadly speaking, AI applications – a long-term memory and faster search and querying capabilities. Another important use case is RAG (Retrieval-Augmented Generation).

To give some context: The most traditional databases, SQL databases, store data in rows and columns; graph databases store graphs and object databases store objects.

Because Large Language Models and AI applications rely on vector embeddings, vector databases are especially apt at supporting AI applications.

Accordingly, vector databases are becoming a critical layer in the AI tech stack; they are sometimes also called “AI databases”. However, databases tend to converge over time, meaning that many databases support several different database models.

What is a vector embedding?

A vector embedding is a list of numbers that represents objects and relationships, allowing unstructured data (such as images) to be searched and used. Typically, Large Language Models (more precisely the underlying Machine Learning (ML) algorithms) are used to create these vectors. The ML algorithms analyse large amounts of data to learn how to represent complex / unstructured data in a lower dimensional space (as vectors).

What do vector databases have to do with nearest neighbour search?

Searchability (making unstructured data usable) is at the heart of this concept. The distance between vector embeddings expresses the similarity of the vectors (and thus of the represented objects). Because you are searching for the most similar data, the so-called “nearest neighbour search” is a key concept in vector databases, and the time required to find the nearest neighbours is essential.
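
As a minimal sketch of what a nearest neighbour search computes, the following Python example compares a query vector with every stored embedding using cosine distance. The three-dimensional embeddings are made up for illustration (real embeddings come from an ML model and have hundreds of dimensions), and the exhaustive comparison shown here is exactly what vector databases try to avoid at scale.

```python
import math
from typing import Dict, List


def cosine_distance(a: List[float], b: List[float]) -> float:
    """0.0 means same direction (very similar); larger values mean less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbours(embeddings: Dict[str, List[float]],
                       query: List[float], k: int) -> List[str]:
    """Exact (brute-force) search: compare the query with every stored vector."""
    scored = [(cosine_distance(vector, query), label)
              for label, vector in embeddings.items()]
    scored.sort()
    return [label for _, label in scored[:k]]


# Made-up embeddings, for illustration only
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
}

result = nearest_neighbours(embeddings, query=[1.0, 0.0, 0.0], k=2)
print(result)  # ['cat', 'kitten']
```

The cost of this approach grows linearly with the number of stored vectors, which is why the time to find the nearest neighbours becomes the central performance question.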

Do we need special vector databases?

There is already a discussion going on about whether special vector databases are needed and warrant a new category in the database landscape. Instead, vector extensions of traditional databases could support the AI market. Both are reasonable expectations, and time will tell. Notable databases that have already added a vector extension include e.g. redis and elasticsearch. Additionally, more and more databases now allow storing vector types.

What does the vector database landscape look like?

To have a look at the current market situation, we are comparing the choices with the most traction, but excluding established players that have added vector capabilities to their existing database offering. Generally speaking, we see a lot of very young companies, some companies that pivoted from their original specialization, and massive funding rounds. Please note: the table is not optimized to be readable on mobile or small screens (there just is a trade-off between providing the information and making it readable on every device).


Name

Open Source

License

GitHub stars

Developed in (language)

Summary

Business Model

Embeds / Uses

founding date / first released date

In-memory support

Sharding

Index Types

Consistency Model

Benchmarks (Performance?)

Approximate Nearest Neighbor (ANN) Vector Databases

Funding

Who's behind it

HQ in

ObjectBox

Apache-2.0

C++, supports native language APIs in Java, Flutter / Dart, Swift, Python, GoLang, and C++

ObjectBox is an on-device vector database for Edge AI on Mobile, IoT, Embedded and other commodity devices

Free to use; paid Data Sync

HNSW built and optimized from scratch for efficiency / speed on devices with limited resources

development of the initial on-device database started in 2015; released the vector search to become the first on-device vector database for productive use early in 2024

HNSW

Transactionally safe, ACID

Seed in 2018

ObjectBox

🇪🇺

Marqo AI

Apache-2.0

2.8k ⭐

Python

A tensor-based cloud-native commercial Open Source search and analytics engine.

Open SaaS

Tensor-based

❔

HNSW

undisclosed preseed in May 2022

S2Search Australia Pty Ltd

🇦🇺

Weaviate

BSD

5.6k ⭐

Assembly, C++, GoLang

Weaviate is a commercial Open Source cloud-native vector database that stores both objects and vectors.

Open SaaS

❔

started in 2018 as a traditional graph database, first released in 2019

Y, static sharding

a custom HNSW PQ algorithm that supports CRUD

Eventual Consistency

not comparative, just evaluating their own performance

Y (multiple ANN algorithms as long as they support full CRUD)

67.7M USD, series B

SeMI Technologies

🇪🇺

Chroma

Apache-2.0

4.4k ⭐

Python & Typescript

Chroma is a Commercial Open Source vector database

Preparing a (Partly Open) SaaS model* [Commercial Open Source]

HNSW lib, DuckDB; based on ClickHouse

looks like 2022

Dynamic segment placement

20.3M USD, seed

Chroma Inc.

🇺🇸

Qdrant

Apache-2.0

6.6k ⭐

Rust

Qdrant is a Commercial Open Source vector similarity search engine and vector database

Open SaaS

RocksDB

first released: 2021

Y, static sharding

HNSW (SQ & PQ)

Eventual Consistency, tunable consistency

compares to weaviate, milvus, elastic (note: redis took too long to complete)

9.8M €

Qdrant Solutions GmbH

🇪🇺

Milvus

Apache-2.0

18k ⭐

GoLang & Python

Milvus is a cloud-native Commercial Open Source vector database

(Partly Open) SaaS* [Commercial Open Source]

Initial blog post from them said SQLite, but meanwhile they said RocksDB - exchanged?
they also have a ChatGPT-Cache that is built on SQLite
and say "Milvus uses SQLite or MySQL to manage metadata"

founded 2017, first released: 2019

Dynamic segment placement

ANNOY; HNSW; IVF_PQ; IVF_SQ8; IVF_FLAT; FLAT; IVF_SQ8_H; RNSG

Strong, bounded staleness, session, and eventually. The default consistency level in Milvus is bounded staleness.

not comparative

113M USD, series B

Zilliz

🇺🇸

Vespa

Apache-2.0

4.4k ⭐

Java & C++

Vespa is a Commercial Open Source vector database by Yahoo! It is a search engine which supports vector search, lexical search, and search in structured data

Open SaaS

❔

Originally a web search engine (alltheweb), acquired by Yahoo! in 2003 and later open sourced as Vespa in 2017; since Oct 2023 a spinoff, raised series A in Nov 2023

maintains disk and memory structures for documents

Custom HNSW (Multi-vector hybrid HNSW-IF)

Eventual Consistency

not comparative

Spinoff from Yahoo! in Oct 2023, then raised a 31M USD series A

Yahoo!

🇺🇸

Vald

Apache-2.0

1.2k ⭐

GoLang

Vald is a cloud-native Open Source distributed approximate nearest neighbor (ANN) dense vector search engine

Community project, currently looks like no commercial interests are pursued

uses the vector search engine NGT

Technology incubation at Yahoo! Japan Corporation, development was started in 2019

❔

N/A

not comparative, but Vald performance only

Y (NGT)

Yusuke Kato (Yahoo Japan Corporation), Kiichiro Yukawa (Yahoo Japan Corporation)

🇯🇵

Pinecone

Proprietary

Pinecone is a fully managed vector database that specializes in enabling semantic search capabilities

SaaS

built on top of Faiss

first released in 2019

proprietary

Eventual Consistency

more programming language comparison for vector databases

Y (proprietary), plus KNN (with Faiss)

138M, series B

Pinecone Systems Inc

🇺🇸

Want to know more about the vector database market?

Here are some more questions answered for anyone interested

What is an "Open SaaS" business model?

Software as a service (SaaS) refers to software that is managed / hosted for the client and is essentially “rented.” The open in Open SaaS refers to the open source software that is being offered as such a service.

This frequently implies that not all code is open source, particularly that which is part of the managed service / hosting and associated value-adding features. Note: The open source software offered in this manner may or may not be provided by the company providing the software as a service. This has caused some friction in the open source community, as original creators often struggle to make a living, and/or maintainers struggle to keep maintaining the software – while other companies profit. Most famously, huge cloud providers have taken advantage of this option, leading to new licenses that keep the source open but restrict others from hosting as a service without donating the whole source code back to the community.

Why should I care about index types?

Indexes are essentially a way to speed up searching a database. There are several established index types for vector databases and they affect the performance of the database, e.g. the time it takes a query to complete.
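
To show why an index speeds up a search, here is a deliberately simple Python sketch of an inverted-file style index: vectors are grouped into buckets around centroids, and a query only scans the bucket of its closest centroid. The class `BucketIndex`, the hand-picked centroids, and the data are assumptions for illustration; this is not HNSW and not the implementation of any database in the comparison above.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, float]


def closest(points: List[Vector], query: Vector) -> int:
    """Position of the point in `points` that is nearest to `query`."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))


class BucketIndex:
    """Hypothetical inverted-file style index with fixed, hand-picked centroids."""

    def __init__(self, centroids: List[Vector]) -> None:
        self.centroids = centroids
        self.buckets: Dict[int, List[Vector]] = {i: [] for i in range(len(centroids))}

    def add(self, vector: Vector) -> None:
        # Each vector is stored in the bucket of its nearest centroid.
        self.buckets[closest(self.centroids, vector)].append(vector)

    def search(self, query: Vector) -> Tuple[Vector, int]:
        """Return (approximate nearest vector, number of distance computations).

        Assumes the selected bucket is not empty.
        """
        bucket = self.buckets[closest(self.centroids, query)]
        comparisons = len(self.centroids) + len(bucket)
        return bucket[closest(bucket, query)], comparisons


index = BucketIndex(centroids=[(0, 0), (10, 10)])
vectors = [(x, y) for x in range(3) for y in range(3)]
vectors += [(10 + x, 10 + y) for x in range(3) for y in range(3)]
for vector in vectors:
    index.add(vector)

result, comparisons = index.search((10.2, 11.1))
print(result, comparisons, len(vectors))  # (10, 11) 11 18
```

The indexed search needs 11 distance computations instead of 18 for a full scan. The trade-off is that the result is approximate: a true nearest neighbour sitting just across a bucket boundary can be missed, which is why index type and tuning affect both speed and result quality.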

What about benchmarks?

You will see, if you review the benchmarks listed in the comparison table above, that results typically vary. Benchmarks are difficult to do and neutral benchmarks even more so. Certain use cases may favor certain solutions. Therefore, ideally you benchmark based on your specific use case. As a first evaluation, though, try to understand the basic influencing factors and have a look at a handful of benchmarks and explanations. Having said all this: There is a benchmarking tool available for approximate nearest neighbor (ANN) search algorithms. If you use it, you can compare the performance of different databases (with regard to the ANN search) for the same setup, based on the same approach. Also: The underlying libs often used by databases (like NGT and HNSW, see above) have already been benchmarked with it, and you can compare to these directly.
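
When you run your own ANN benchmark, speed alone is not enough, because approximate search can return wrong neighbours. A common quality measure is recall at k, sketched below in Python. The function name and the example ID lists are made up for illustration; they are not results from any of the databases above.

```python
from typing import List


def recall_at_k(exact: List[int], approximate: List[int], k: int) -> float:
    """Fraction of the true k nearest neighbours that the approximate search found."""
    true_ids = set(exact[:k])
    found_ids = set(approximate[:k])
    return len(true_ids & found_ids) / k


# Made-up result IDs for a single query
exact_ids = [3, 7, 1, 9, 4]        # from an exhaustive (brute-force) search
approximate_ids = [3, 1, 8, 7, 2]  # from the index under test

recall = recall_at_k(exact_ids, approximate_ids, k=5)
print(recall)  # 0.6
```

Comparing query times only makes sense at a similar recall level, so it helps to report both numbers together for every setup you test.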

Why is the market so hot, how can companies raise so much money?

AI is hot, everyone agrees that data and its management will be key to future success, and the database market is interesting: It is a long-established market with many players, yet it still demonstrates consistently good growth (e.g. 17% in 2020). And the history of the database market shows that from time to time a new type of database comes up, and with it, a new market category is created. In such a market, typically the market creator “takes all” (not quite literally, but such a significant share, definitely the vast majority, that all other players are not attractive from a VC perspective). Such a market could easily be worth 100M+ in ARR. Examples from the last 20 years: MongoDB (NoSQL databases), Cockroach (NewSQL databases), Neo4J (Graph databases), Influx (Time-Series databases). So, VCs are looking to find the next new type of database that can create a market… Maybe it will be vector databases? However, the database market has also shown that it can take 10+ years for players to become profitable, so expect a long-term game. The race is still on for Edge Databases, we think 🙂

Want to know more about the database market?

We recommend checking out db-engines. The website compares all relevant systems and has tons of data from the last 20 years. Note: They only add databases once they have some traction and notability, not every hobby project. Accordingly, not all databases in the above comparison have been added to the website yet.

Building a Business on Open Source

by Vivien | Jul 1, 2021 | Insights, Open Source, Startup

What is open source software?

For the sake of unambiguity: Open source software (OSS) primarily means that the source code of the software is accessible and users are free to use the code as they please. Depending on the license, you might be expected to attribute the source code to the authors and / or commit code enhancements back. Note: It’s “free” as in “freedom” not as in “free beer”.

Open Source and Commercialisation?

The origins of open source did not entail thoughts of commercialization. However, in the last 20 years a lot of things have changed, and open source projects have seen commercial successes – though not always by the creators and maintainers… For many people, open source is at its core tied to a philosophy and a set of values. Simplified: For the developer community by and large, open source is considered to be “good”, whereas proprietary source code is considered to be “evil”.

In any case, open source is one way to keep up an active, vibrant developer ecosystem that empowers individual developers as well as startups and smaller players. Open source is actually one piece of the IT ecosystem that helps balance Big Tech and drive overall innovation. However, we also believe the open source ecosystem needs more balance to be successful long term. If widely used open source repos cannot even sustain the half or full developer resource needed to maintain them, then there might well be a flaw in the system. If startups cannot build a business around their widely used open source code to sustain it long term, it is to the disadvantage of the community, especially for the individual developers and SMEs. And likely, the lesson at some point will be to keep the source closed instead.

In the following, we will share why we believe now is a unique opportunity to add fairness and balance for the value creators to the open source ecosystem, to keep that ecosystem thriving and successful long term.

What do we mean with “building a business on open source”?

In many talks with many people, we found there are at least two diametrically opposed conceptions of building a business on open source:

1) using open source software for free and building something around it to earn money
2) developing a solution and open sourcing it or parts of it as part of the business model

In this article, we mean the latter and it inherently entails contributing a useful part of a solution to open source. For some open source enthusiasts a company needs to open source everything to be an open source company, and that’s ok. It is just our definition for this article.

A look at the market – the struggle of open source businesses

The Open Source Gold Rush: Success Stories

In recent years there have been many open source success stories, e.g. MongoDB, elastic, and Cloudera all went public very successfully. There seemingly is a lot of money in open source businesses, e.g. a study by Fraunhofer concluded that “the EU economy is hugely benefiting from global OSS.” [1] Also, companies and big corporations are far more open to working with open source software; indeed, 2020 was the first year in which open source databases were on par with closed-source databases with regard to corporate adoption (see chart). [2]

And a recent (2021) report showed that across 17 industries, 98% of 1,546 codebases contained open source code. [3] There even is a bit of a hype that open source is the path to success. Now that it’s clear that it is possible to build a business with open source software, VCs also are more open to funding open source businesses. An Andreessen Horowitz report reveals that OSS companies have raised over $10B in capital with a trend towards bigger and bigger deals. [4] Annual invested capital in open-source and related dev tools has increased at around 10% CAGR over the last 5 years. [5] In the years 2018 and 2019, acquisitions, mergers, and IPOs of open-source companies generated over USD 80 billion in liquidity value according to Bessemer Venture Partners. [6]

The struggle of turning Open Source into a Business

Historically, open source companies have struggled with turning open source adoption into monetary success, “less than a decade ago open source was considered almost impossible to monetize.” [7] Sadly, that’s still a reality today for many open source maintainers and companies alike. Lots of open source maintainers with widely used open source code (“successful open source”), cannot get enough financial support to maintain the code. Of course, there are some successes, but in the end that might also be a question of ratios. For example, in 2020 GitHub reported having more than 190 million repositories. Even if only 10% of those do want to build a business on top of their code, how many of those see a financial reward? Gut feel: Far less than typical startup success odds. On top: What looks successful from the outside, might not really be a viable self-sustained business. Despite its many users, MongoDB spent $100M on development, and it took them more than 10 years to become profitable according to their own statements. [8]

A lot of tech companies struggle with – and spend a lot of time on – all the decisions around an open source business model. It isn’t easy: read up on how GitLab struggled with finding a business model, or look closer into the MySQL story and the MariaDB journey (MariaDB is a MySQL fork by the founders and original authors of MySQL); look at blog posts from CockroachDB, MongoDB, or elastic on open source – and what you see is a constant re-positioning of open source strategies.

As Mike Volpi from Index Ventures noted at the Index Open Source Summit (2021): “It took Mongo DB 10 years to derive the business model they run now and monetize successfully…” Wow, 10 years to somewhat successful monetization – and that is one of the major open source success stories.

Open sourcing your main technology as a strategy

In this article, we take a deeper look at open source as a pro-active business strategy.

Open Source to Build Traction

Traction is the most obvious reason to open source your product. It works like Freemium in the Mobile Games market – or more generally the Mobile Apps market. It’s a great way to evaluate product-market-fit and build traction. When you have that, you can think about monetization.

However, there is a big difference between giving something away for free and open sourcing it. If we stay in the mobile app world: Would open sourcing the app help with traction? Would it jeopardize the business model? Unless the main target users are developers, the answer to both is likely no, at least in the beginning; in any case, open sourcing would help less than making the app / game available for free. However, once the app grows at an amazing pace, open source availability could become a challenge in several respects.

The most obvious would be fast followers entering with that same game and potentially much bigger marketing budgets and better customer access (e.g. on the app store). Think what would have happened if WhatsApp had open sourced all its code from day 1 on top of giving the app away for free. It is a legitimate hypothesis that a fast follower could have captured some of the market, changing the whole story. On the other hand, if they open sourced their whole code base now, how much would it harm them? At some point, it became all about the traction, brand, and customer access, so I would think it wouldn’t harm them at all at this point. So, driving traction with open source is probably only a viable idea if you address developers or engineers. It’s clearly a phenomenon of the developer-led landscape, and acts as a developer distribution channel. That being said, the price of open source traction is commercialization. It’s a straightforward trade-off: The more open and free your license is, the harder it is to monetize later on.

Open Source to Build Trust

Trust is something that is likely more important for certain software types (e.g. B2B and core tech).

ObjectBox is a database and, as such, a data-centric “core technology” / software infrastructure, sitting at the heart of a company’s solution. Anything that gets used at the heart of other companies or their solutions needs a lot of trust. Trust is easier to come by with size, “no one was ever fired for choosing SAP.” Being a small startup lies at the opposite end of that spectrum for many decision makers. Open Source can be a way to overcome this specific challenge and build trust in three ways:

Transparency: The freedom to verify what the code enables; the internal developer team can check the code and vouch for the solution
Risk-reduction: The freedom to change and maintain the code oneself gives independence from the authors and the success of the solution
Quality: If an open source solution is actively used by a large number of developers, quality tends to go up

So, if you are looking for adoption from big players in heavily regulated or security-concerned industries, e.g. medical, manufacturing, automotive, anything with mission-critical networks, open source can help you overcome many of the adoption hurdles you are facing.

Open Source as an IP Strategy

Seems counter-intuitive, right? Well, if you are not aiming to patent your technology, you still might not want someone else (who has been working on the same problem) to patent the same technology, harming your freedom to operate. You can protect yourself from that risk by open sourcing it. This can come in the form of a copyleft license, designed to encourage further innovation and advancements to the benefit of all, but also limiting the commercial exploitation opportunities for everyone. Or, you can choose a more permissive license, allowing people with commercial interests to keep any advancements they make to themselves.

Note: Open source code is not a blueprint with exact instructions; there are no obligations to provide clear docs or explanations. While a majority of open source projects strive to deliver a code base that is readable by others, this is not enforced. So, while open sourcing a technology prevents patenting it, unfortunately, one way to still protect it is to make it hard to understand. A patent, on the other hand, must include an extensive explanation. This makes the invention easily repeatable by others in the future, after the end of the patent protection, or usable as a basis for further research (and ways to tweak it in a novel enough way).

Although it often feels like open source is at the opposite end of the spectrum from patents, a patent has a limited timeframe and people can learn from it even before it expires. The deal is basically an exchange of knowledge (to be used in the future) for protection (for commercially exploiting it). Keeping it a trade secret has other risks, but could mean that an invention wouldn’t be shared with others for a truly long time. And of course the protection encourages big companies to invest big budgets in R&D too. Delayed open source actually has many similarities with a patent: in both cases the tech is only made available for advancements and unrestricted use after a certain time frame has ended.

Open Source for the sake of it

There are a lot of ideas floating around open source, and some pressure from the developer community to open source everything. Among developers, open sourcing is considered to be good, social, fair, transparent, and worthy. While there are many advantages in open source, it has turned into a kind of “political tool”, and that’s a downside – and probably the opposite of the original idea.

Consideration 1: How is great software supposed to be maintained and advanced without anyone providing funds? When MMOGs (Massive Multiplayer Online Games) became a thing, people understood that there was a constant cost associated with them and were willing to switch from a one-off fee to monthly payments. Software typically needs to be maintained too. So, there are ongoing development costs associated with a piece of software, even if it is not hosted. So, who benefits from open source in the end, if the original creators cannot keep up their work (assuming they need to eat and sleep)? Before pushing everyone to open source, maybe read up on open source maintainers struggling under the pressure and dealing with burnout. On the flip side, if a company markets itself heavily as an “open source company”, they should give considerable parts of their own value-creating solution back to the community. Using open source tools and building on top of open source code (and even committing back to these solutions) does not mean you are an open source company: If you want to reap the marketing benefits of calling yourself an “open source company” then you should truly be one and commit your value back to open source.

Consideration 2: Who benefits if another company pulls the repo, adds “sparkles”, maybe even some “missing features”, or merely a big “brand name”, or the “marketing budget” and makes a ton of money selling the solution? This is of course assuming a permissive license was used. Well, from an open source perspective that is perfectly fine, and part of the intention of open source. So, it’s great, right? We think it is easy to understand that some authors who have put all their “free time” / unpaid time into that code struggle to accept when this happens, especially if they have a hard time supporting themselves. But we also understand that big companies with investors (stakeholders…) that have invested heavily in R&D and might or might not yet have reached profitability don’t really like to see this happen. Unless you are really in it for the fun, driven by altruism, and in perfect harmony with other people using your code to make money, you should look closely at whether and how you want to open source your code.

Open Source to save development costs

There is the idea floating around that you can develop your project for free using the open source community. We doubt it works out for many. Of course, if Google maintains a repo that is a base technology used by many developers, developers might want to commit something (anything really) for fame, to be part of it, maybe to get noticed. However, the “anything really” is already a problem: Someone needs to review the submission, respond, potentially rework it and so on… Most other repos will probably not get too many commit requests (let alone from the best tech talent around). Even then, onboarding a large community of unknown developers and letting them commit to your code has its challenges – especially if you are quality-conscious and / or trying to build a business. It creates a lot of work to review commits and reject / merge them. And on top of that, from a legal perspective you need to have a watertight contributor license agreement signed by anyone committing. There clearly is some work involved in the process, sometimes maybe more than it is worth.

Also consider this: Most successful open source projects that turned into a business success have limited contributors and / or only internal (contracted) contributors. For example, 99% of the SQLite code was written by Richard Hipp (author and founder of SQLite), and MongoDB stated that about 98-99% of its code was written internally. Redis was almost exclusively coded by Salvatore Sanfilippo. In a presentation from Index Ventures (one of the most renowned open source VCs), one criterion for potentially successful open source businesses was that at least 90% of the code base was developed internally – and of course that the team owned all the IP. If you are after cheap development and external help with your project, maybe take a closer look at whether open source is the right path.

What open source business models exist?

The following open source business models are common, but typically used in combination and not as pure models, e.g. most open source companies offer paid support, but rarely only paid support. Note: With time the examples may become wrong/outdated, because once you look into it, you will notice that companies adapt / change their model regularly. If you need to understand one specific company’s model you need to dig into it individually at that time.

There are three basic types of open source licenses to be distinguished: permissive, weak copyleft and copyleft.

A quick high-level note on the major license effects

Copyleft – major point is that derived works must be open sourced with a compatible copyleft license, meaning any advancements and changes to the work will be contributed back to the community and freely available for unrestricted use.

Weak Copyleft – the weaker copyleft refers to licenses where not all derived works inherit the just described copyleft effect; typically used in software libraries, e.g. a database library used in app development, so the library can be used in a mobile app without needing to contribute the whole app to open source; only changes to the database library itself would carry the copyleft effect.

Permissive – a permissive open source license allows you to do anything with the source code including keeping derived works to yourself and commercialising them.

Model	Description	Examples	Note
Paid Support	Providing paid support, trainings, certificates	RedHat	Where has this approach been working – as a pure paid support approach – ever since Red Hat?
Open Core	The core product is free and open source, extra features are paid; have an open-source core and sell closed-source features on top of it	SugarCRM, MySQL	It is basically the widely successful freemium model just with open source; typically you expect the large majority of users to use it for free. The open source part of course enables anyone to build the same features as you
Dual Licensing	The free open source software uses a copyleft license, whereas the paid license is a commercial license without copyleft effects	MySQL, elastic	This kind of licensing lets you monetize your commercial (typically bigger) users while still enabling the community to expand the product landscape and innovate based on the code base
Delayed Open Source	All code will be fully open sourced with a time delay (details and timings vary)	MariaDB, Cockroach DB	The effect depends also on the licenses used, but typically it protects you from competition for a given time frame, so only you can exploit your development commercially and gain market share / develop an advantage based on market entry time. At the same time it reduces the risk for adopters, because they know the code will become available to them
Open SaaS	Offering the software open source and hosted as a service (SaaS), which is the primary source of revenue allowing anyone to do the same with the software with a permissive license (self-host or host for others)	WordPress, Sharetribe, MySQL, MariaDB	This model has been the major point of discussion in the last 3 years and is seen by many as the holy grail for monetizing open source software; it also triggered many companies to move away from an open source licensing model as large cloud providers can easily host an open source product at better rates
“Closed SaaS”	Strictly speaking / officially not “open source”. Offering the solution open source and hosting it as a service (SaaS) while NOT allowing anyone else to host it, often unless they contribute the whole solution back to open source (copyleft effect)	MongoDB, elastic, Cockroach DB	The first license that built this specific copyleft effect into its terms was MongoDB’s SSPL. The license has since been adopted by e.g. elastic, …. Since then similar licenses have been developed. OSI did not approve the license as an official open source license.
“Ad model”	For lack of a better name, I called it “Ad model”; it’s really having so much reach and traction that companies pay for customer access through your solution or similar co-operations	AdBlock Plus, Firefox	Can take many variations: For instance, the open-source application AdBlock Plus gets paid by Google for letting whitelisted acceptable Ads bypass the browser ad remover. Or, in 2014 Yahoo struck a deal with the Mozilla Corporation to make Yahoo the default search engine in Firefox

A look at the open source market

Name	Founding Year	Funding Summary	Started with Open Source (license)	Open Source Evolvement	Devtool	Open to contributions / CLA	HQ*	Notes / Story synopsis
MongoDB	2007	6 funding rounds with a total of $311M; IPO was in autumn 2017; valuation $1.6B	started with AGPL	Created SSPL in 2018 causing much debate in the community. SSPL is not an open source license	Database	“we own 100% of the IP”; 99.9% developed in house and the few contributions accepted were from people who signed a CLA	US-based	According to statements from MongoDB, adoption went up after the license change (15 million downloads, more than in the prior 10 years together). In 2016 they launched their database-as-a-service offering, which is considered the game changer with regard to building a business. Until Oct 2017 MongoDB downloads were >30M with 10M from the prior 21 months.
Databricks	2013	Total funding $1.9B; last round: Series G; Feb 2021 $1B	proprietary PaaS	their main service is proprietary, but they use a lot of open source software and have a strong footprint in the open source community	Backend	NA	US-based	“Databricks is the original creator of some of the world’s most popular Open Source data technologies” – open source is a large part of their positioning and marketing. However, it seems their main offering, while based on open source, is proprietary. So, not an open source business as defined here.
elastic	predecessor released in 2004; first elasticsearch released in 2010; incorporation only in 2012	Total funding $162M; last round was a series D; elastic did IPO in autumn 2018	started with Apache 2 for elastic search (which was the original main product)	Last license change in 2021: You can now choose between the proprietary elastic license or SSPL; so strictly speaking not open source anymore	Devtool	CLA	US-based	2018: elastic IPO –> shares doubled the first day. Note: With so many different products (not a single product company), the open source strategy is harder to grasp.
Confluent	2011	Total Funding Amount $455.9M, last round: series E	Unlike Apache Kafka which is available under the Apache 2.0 license, the Confluent Community License is not open source and has a few restrictions	Kafka is open source, Confluent isn’t	Devtool	NA	US-based	“Founded by the team that originally created Apache Kafka” – the team behind Confluent contributed a lot to open source prior to Confluent, but the Confluent code itself isn’t open source as far as we understand. They heavily rely on other open source software for their tech stack though.
RealmDB	2011, before the founders did “TightDB” on which the Realm DB was based	4 investment rounds. Then MongoDB acquired them for $39M on Apr 24, 2019	started out closed; then open sourced the database and went for the open core model, then subsequently open sourced the Sync solution too, going for the hosted (SaaS) model	from closed to open core to open SaaS; acquired by Mongo to push their backend offerings and complement with an edge and sync (serving Mobile and IoT better)	Database	looks like they accepted contributions	Started in Europe, but HQ went to the US when joining YC 2014; it was since bought by MongoDB	The founders both left the company the year before it was acquired by MongoDB. The acquisition price was a little less than what Realm had raised in the years before. The Sync solution is now tied to using the Mongo servers / cloud and a huge part of their push for the IoT market.
SQLite	2000	Bootstrapped	Public Domain, which we always considered one of the most “open source” ways to open source but in the light of recent discussions around the SSPL license, strictly speaking it is at least not OSI-approved	Public Domain, mainly monetize big corporates for being in a Consortium; also offers services and since xxxx? encryption (basically paid feature); our guess is that this is not really a repeatable business model	Database	Richard Hipp owns all IP, 99% is developed by himself; very limited outside support (2 part-time freelancers that we are aware of, both don’t have any rights to the IP)	US-based (privately held by Hipp, Wyrick & Company, Inc (author: Richard Hipp and all stock held by his wife G. Wyrick; both work for the company)), HQ	The company has always been and still is run by Richard Hipp and his wife; from a development perspective it is a one-man-show. Richard wrote SQLite himself, as far as we are aware they have no other employees apart from 2-3 part-time supporters for specific versions; very special Open Source Story.
Couchbase Lite	2009 – Couchbase, Inc. is a merger of Membase + CouchOne in 02.2011; both former companies were started 2009 and had funding	251 million USD total funding; 8 rounds with latest Series G for $105 million	Apache 2	Delayed Open Source	Database		US-based (both entities were US-based already before the merger)	Couchbase now mainly sells Couchbase Servers; Couchbase Lite is the smallest part of their business; in 2020 there seemed to be a shift towards the Sync Gateway and Edge Computing market in communication; however, the main business still seems to be on the server side and based on cloud lock-in.
redis	2009	Total Funding Amount $246.6M	redis the database itself is and always was BSD; redislabs is the company that has secured certain rights for redis and sells extensions and add-ons under several licenses, they changed from AGPL to Apache 2.0 with Commons Clause to a proprietary license called “Redis Source Available License”	redis itself is BSD but features / extensions around it from RedisLabs are licensed under proprietary licenses	Database	Any contribution needs a CLA that is provided by redislabs; we believe anything committed under this CLA could also be used in redislabs proprietary products (which typically is the same for anything committed under a permissive license, but which has attracted some criticism from the OSS community)	Redislabs is US-based. Salvatore Sanfilippo (antirez) was always based in Europe; redislabs originated in Israel	RedisLabs is the commercial entity that markets redis; redis was largely developed by Salvatore Sanfilippo. He left redis as a maintainer in 2020.
RedHat	1993	bought by IBM in 2019 for $34 billion; before that they had raised $240.7M	Linux, which was the core of the success of RedHat, is GPL (though of course not the company’s decision)	RedHat is a huge company, definitely not a single product company, and thus also does not really fit into this matrix; however, it is THE example for successful commercialisation of open source and we feel the matrix would be lacking without it	Backend / Data centric	we believe you can contribute to most (all?) projects without a CLA	US-based	It has been argued that there will never be another Red Hat (and that there is no “Red Hat Model”). Note that of course the Red Hat founders did not write Linux (on which the majority of their success is based), but at the very least they (as well as VA Linux) gave option shares to Linus Torvalds out of gratitude (at least not out of obligation). When both companies successfully IPOd, Linus made 20 million USD (in total) from both sales.
MySQL	1995 (development started already in 1994)	Total Funding Amount $39.8M, sold to Sun in 2008 for $1 billion	started out with AGPL; several license adaptions and changes in the open source business model over the years, e.g. for a long time they had a 2 year delay for the open source version, but changed that to no delay at some point.	Dual Licensing and Paid Support	Database	Yes, even though called OCA (Oracle Contributor Agreement)	Swedish company until it was acquired by Sun Microsystems in 2008 (who then were acquired by Oracle)	The founders forked the latest MySQL version when Oracle acquired it. Most of the original database code base was developed by Michael Widenius; with regard to database technologies a pattern emerges: Often the core / most of the base technology is developed by one person. As building a database is a rather huge endeavor, that’s kind of striking, isn’t it? BTW: MySQL is named after Monty Widenius’s daughter (“My”)
Hyper	2010 (academic research project at TUM)	undisclosed	proprietary, not open source	None	Database	NA	EU-based; German “university spinoff” acquired by Tableau very early	2016: HyPer acquired by Tableau. Terms of the deal were undisclosed
ParStream	2011	acquired by Cisco on November 3, 2015	proprietary, not open source	NA	Database	NA	Originally EU-based (German), then moved to US in 2012, acquired by Cisco in 2015	Cisco ParStream is no longer offered as a stand-alone product. The functionality of Cisco ParStream is now part of Cisco Kinetic.
Cockroach DB	2015	Series E in Jan 2021 for $160M	Apache 2.0, plus a proprietary license for enterprise features	Started as open core, now a form of closed SaaS with delayed open source: They changed to a proprietary license in 2019, called BSL, which prohibits users from offering CockroachDB as a service (DBaaS, SaaS), and each release converts to an open source license after three years. CockroachDB is therefore officially not considered open source anymore	Database	CockroachDB received significant contributions from the community (“we have had over 1590 commits from over 320 external contributors across all our open source repositories” (2020)), CLA: Yes	US-based	In June 2019, Cockroach Labs announced that CockroachDB would change its license from the free software license Apache License 2.0 to their own proprietary license, known as the Business Source License (BSL), which forbids “offer[ing] a commercial version of CockroachDB as a service without buying a license”, while remaining free for community use.
Berkeley DB	1994	Acquired by Oracle in 2006	BSD and Sleepycat Public License (a permissive OSS license)	Oracle changed to dual licensing with AGPL and a commercial license	Database	NA	US-based	It is still used in many routers and our gut feel is that the market share in that specific area is good. Unfortunately, no numbers are available.
GitHub	2008	In 2018 Microsoft bought GitHub for $7.5 billion.	proprietary, not open source	NA	Backend / Data centric	NA	US-based	Microsoft bought GitHub for the developer access; that would not have changed had it been open source, and I do wonder what would have happened to GitHub had it been open source; one thing is for sure: GitLab wouldn’t have been able to position themselves as the open source alternative; however, the closed source model worked well for them, even though it is a developer tool.
GitLab	development started in 2011; incorporated only in 2014	$434.2M Series E	completely open source (MIT license)	Now: Open Core Model; Community Edition: MIT License; Enterprise Edition: Source-available proprietary software	Backend / Data centric	Originally CLA, now dropped and instead the code must be committed under the same license as the feature is (mainly Apache 2.0) plus a DCO	US-based (development was started in Europe, the founders incorporated in the US in 2014 when joining YC)	GitLab used being open source as a strong positioning factor against GitHub (which was never open source). It was an odyssey to find a sustainable business model (and it seems it is not SaaS). Note: The pure service model and the donation model did not work for them. Again: The code base of the core system was by and large developed by one person.
MariaDB	2009	Total Funding Amount $123.2M	Dual licensing with GPL license, version 2 and a proprietary source available license for some parts	They evolved their dual licensing approach to using the proprietary source available license (BSL)	Database	Yes, and the CLA is shared under a creative commons license that allows you to use it as you like https://mariadb.com/kb/en/mca/	Swedish company	10 years after it was forked, MariaDB has 20M users, a fast growing database business and has >€100m backing. Note: The pure service model as well as the donation model did not work for them.

Building an Open Source Business: Exec Summary – TL;DR

There is a lot of evidence that open source companies struggle with open source models and licenses – this is also true for successful companies
There is no “Red Hat Model” – just selling services has rarely worked
The donation model typically hasn’t worked for open source companies, e.g. GitLab and MariaDB, so it is not astonishing that GitHub sponsorships don’t work out great for most maintainers. Also note: GitHub sponsorships may put you in a bad legal position depending on where you are based
There is a trend from successful open source companies towards Source Available licenses instead of “official Open Source licenses”, e.g. MongoDB, elastic, CockroachDB, …
There is an indication that successful open source companies are US-based (even if founded / started in Europe), which we believe is due to the funding opportunities provided in the US: 1) the US generally provides more funding (more and bigger funding opportunities; there is lots of market research on that), 2) US VCs and Silicon Valley have a reputation for also funding at earlier stages, e.g. idea stage, and for funding companies with traction (instead of revenue), investing with a long-term perspective. Traditionally, European investors don’t.
Public domain is strictly speaking also not considered to be an open source license 😮 (at least not if it needs OSI-approval; does it? 🤔)
While Open and Closed SaaS seem at this moment to have been the most successful models, they are no holy grail and definitely do not work for everyone, e.g. SaaS didn’t work as the sole business model for GitLab

Conclusion

The open source market lacks flexibility and transparency from a licencing / legal perspective, and ever more Source Available licenses don’t help: A “license stack” with building blocks like the Creative Commons would be helpful to mark software easily and clearly with regards to the main terms, e.g. “source available”, “free for commercial use”, “attribution necessary” etc. It would help maintainers and users alike, but needs bigger entities to drive this (like an OSI).

The open source market also needs more balance, at the very least more understanding and “love” towards maintainers. More financial support, as well as other ways of giving back to demonstrate appreciation of well-maintained repos and great free software, will keep the ecosystem healthy and thriving. That’s a community effort; everyone can contribute.

Sources

1. https://openforumeurope.org/event/accelerating-the-eu-economy/
2. https://db-engines.com/en/ranking_osvsc
3. https://www.synopsys.com/blogs/software-security/open-source-trends-ossra-report/
4. https://agcpartners.com/wp-content/uploads/2021/03/AGC-Open-Source-Mar-2021.pdf
5. https://agcpartners.com/wp-content/uploads/2021/03/AGC-Open-Source-Mar-2021.pdf
6. https://www.bvp.com/atlas/roadmap-open-source
7. https://www.bvp.com/atlas/roadmap-open-source
8. Index Open Source Summit (2021)

Flutter databases – Hive, ObjectBox, sqflite, Isar and Moor (e.g. Drift, floor)

by Vivien | Feb 3, 2021 | Android, Mobile Database, ObjectBox, Open Source, SQlite, Sync

Flutter, the renowned cross-platform mobile framework, has been gaining immense popularity among developers worldwide. In 2024, Flutter had over 1 million monthly active developers, was behind nearly 30% of new iOS apps, and continued to be the most popular framework for cross-platform development. Dart, the programming language behind Flutter, was first released in 2011 and already made it to spot 28 on the Tiobe index as of February 2025.

This growth comes from a strong community, with more than 1,400 contributors, 10,000 package publishers, and over 50,000 available packages. As the Flutter community expands, the demand for efficient Flutter databases is also increasing. Developers now have access to a range of Flutter database options that cater to various needs and preferences.

In this article, we’ll focus specifically on local storage solutions, as these are essential for enabling offline functionality, improving performance, ensuring data persistence, enhancing data privacy and security, and supporting edge computing capabilities. Furthermore, local data storage is needed to promote sustainability. Let’s dive into the current local database landscape for Flutter and compare the most popular options.

Flutter databases / Flutter Dart data persistence

While the database market is huge and dynamic, there are only a few options to choose from if you are a Flutter / Dart app developer. Before we dive into the Flutter database options and their advantages and disadvantages, we’re taking a very quick look at databases to make sure we share common ground.

What is a database?

A database is a piece of software that allows the storage and systematic use of digital information, in other words: data persistence. As opposed to mere caching, data is reliably stored and available to work with unless actively deleted. A database typically allows developers to store, access, search, update, query, and otherwise manipulate data in the database via a developer language or API. These types of operations are done within an application, in the background, typically hidden from end users. Many applications need a database as part of their technology stack. The most typical database operations are CRUD: Create, Read, Update, Delete.
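To make the four CRUD operations concrete, here is a minimal sketch. It is written in Python with the standard-library sqlite3 module rather than Dart, purely so that it runs without any extra packages; the `notes` table and the `crud_demo` function are hypothetical names chosen for illustration, and a Flutter app would use one of the Dart packages discussed in this article instead.

```python
import sqlite3


def crud_demo():
    """Run one Create, Read, Update and Delete cycle on an in-memory database."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing touches disk
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")

    # Create: insert a new row and remember the id the database assigned
    cur = conn.execute("INSERT INTO notes (text) VALUES (?)", ("buy milk",))
    note_id = cur.lastrowid

    # Read: fetch the row back by its id
    created = conn.execute(
        "SELECT text FROM notes WHERE id = ?", (note_id,)
    ).fetchone()[0]

    # Update: change the stored text, then read it again
    conn.execute("UPDATE notes SET text = ? WHERE id = ?", ("buy oat milk", note_id))
    updated = conn.execute(
        "SELECT text FROM notes WHERE id = ?", (note_id,)
    ).fetchone()[0]

    # Delete: remove the row and count what is left
    conn.execute("DELETE FROM notes WHERE id = ?", (note_id,))
    remaining = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]

    conn.close()
    return created, updated, remaining
```

Calling `crud_demo()` returns the text after the insert, the text after the update, and the number of rows left after the delete. With a file path instead of `":memory:"`, the same calls would persist the data across app restarts, which is the difference from mere caching mentioned above.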

What are the major types of databases?

There are many types of databases. For our purpose, the most important differentiations are non-relational (NoSQL) versus relational databases (SQL), cloud databases versus edge databases, and maybe embedded versus in-memory. However, databases can be further distinguished by additional criteria e.g. the data types they support, or the way they scale – and definitions can vary.

What is an ORM?

An Object-Relational Mapper (ORM) is not a database. We’re bringing this up mainly because we often see the two confused. An ORM is a layer that sits on top of a database and makes it easier to use. This is especially relevant when the database is a relational database (SQL) and the programming language used is object-oriented. As noted above, Dart is an object-oriented programming language.
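The core job of such a mapping layer can be sketched in a few lines of C++. This is a simplified illustration, not how Drift or Floor are implemented: the `Task` struct, the `Row` alias, and the functions `toRow` and `fromRow` are hypothetical names, and a real ORM would also generate SQL, handle types, and manage schema changes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Object side: what the application code works with.
struct Task {
    std::uint64_t id;
    std::string text;
};

// Relational side: one row of a hypothetical "task" table, as column values.
using Row = std::vector<std::string>;

// Object -> row: flatten the fields into the columns (id, text).
Row toRow(const Task& task) {
    return {std::to_string(task.id), task.text};
}

// Row -> object: rebuild the object from the column values.
Task fromRow(const Row& row) {
    assert(row.size() == 2);  // expects exactly the columns (id, text)
    return Task{static_cast<std::uint64_t>(std::stoull(row[0])), row[1]};
}
```

An object database skips this translation step because it stores the objects directly.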

The Flutter local data persistence landscape

There are several Flutter databases that provide offline support, offering the ability to store and access data locally even without an internet connection. Here are some of the notable options:

Hive is a lightweight key-value database written in Dart for Flutter applications, inspired by Bitcask.
ObjectBox DB is a highly performant lightweight NoSQL database with an integrated Data Sync. It stores objects.
sqflite is a wrapper around SQLite, which is a relational database without direct support for Dart objects.
Drift is a reactive persistence library for Flutter and Dart, built on top of SQLite.
Floor is another ORM on top of SQLite.

What is the best offline Flutter Dart database?

This of course depends… Make up your own mind with the following comparison matrix as a starting point. Note: With very few options to choose from, the following overview is sometimes a bit like comparing apples 🍎 and pears 🍐.

Data persistence	Description	Primary Model	Data Sync	Language	License	Fun Fact	"Headquarter"
Drift	ORM on top of SQLite	relational	❌	SQL	SQLite is public domain, Drift is MIT	Formerly known as Moor	🇩🇪
Floor	ORM on top of SQLite	relational	❌	SQL	SQLite is public domain, Floor is Apache 2.0	Developed by a mobile app agency, not an individual author	🇳🇱
Isar	Lightweight NoSQL database	NoSQL	❌	Dart	Apache 2.0	By the same author as Hive; neither library is maintained anymore	🇩🇪
Hive	Predecessor of Isar	NoSQL	❌	Dart	Apache 2.0	By the same author as Isar; neither library is maintained anymore	🇩🇪
ObjectBox	Lightweight NoSQL database with integrated Data Sync	NoSQL	✅	Dart	Bindings are Apache 2.0	It is used in BMW cars 😮	🇩🇪
Realm	NoSQL database acquired by MongoDB in spring 2019; Flutter binding came in 2023, now deprecated	NoSQL	Deprecated, end of life in Sep 2025; closest substitute is ObjectBox	Dart	Apache 2.0	Originally developed in Denmark; MongoDB stopped Realm support and the Sync is deprecated	🇺🇸
Sembast	NoSQL database, fully document-based	NoSQL	❌	Dart	BSD-3-Clause	By the same author as sqflite	🇫🇷
sqflite	SQLite plugin for Flutter	relational	❌	SQL	SQLite is public domain, sqflite lib is MIT	Not an ORM	🇫🇷


Flutter Database performance benchmarks

As with any benchmark, you need to take a look at the details. We take benchmarking very seriously and strive to get accurate results. Therefore, we always open source the benchmarking code and encourage you to check it out. If you notice anything that does not add up in your opinion, do let us know. We have a long history of continually updating and improving our benchmarks and are happy to take any recommendations.

Performance Benchmark Test Setup

We used an Android 10 device with a Kirin 980 CPU to run the benchmarks as a Flutter app. The app executed all operations (ops) in batches of 10,000 objects. Each batch formed a single transaction. We ran each test 50 times. The results you see in the diagram are averages across all runs. We set it up that way to ensure that neither the Virtual Machine warmup during the first run nor garbage collection affects the overall result significantly.
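The measurement scheme described here (repeat a batch several times, then average) can be sketched as follows. This is not the actual benchmark code, which is a Flutter app; it is a minimal C++ illustration, and the names `timeRuns` and `average` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Calls `batchOp(batchSize)` once per run (one batch corresponds to one
// transaction in the setup above) and returns the wall-clock duration of
// each run in milliseconds.
std::vector<double> timeRuns(std::size_t runs, std::size_t batchSize,
                             const std::function<void(std::size_t)>& batchOp) {
    std::vector<double> millis;
    millis.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        batchOp(batchSize);
        const auto end = std::chrono::steady_clock::now();
        millis.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    return millis;
}

// Arithmetic mean across all runs; averaging dampens one-off effects such
// as warmup during the first run.
double average(const std::vector<double>& values) {
    assert(!values.empty());
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}
```

With the numbers from the setup above, a caller would use `timeRuns(50, 10000, op)` and report `average(...)` of the returned durations, where `op` is whatever database operation is being measured.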

Flutter Databases CRUD Performance Results

Summary of the Flutter Dart DB Benchmarks

Hive and ObjectBox clearly outperform sqflite across all CRUD operations. The results show ObjectBox achieving a speedup of up to 70 times for create and update operations. When comparing Hive and ObjectBox, the results vary more. Hive can be faster at reading objects than ObjectBox. However, strictly speaking it’s not a fair comparison, because in Hive, the high read numbers result from Dart objects already cached in memory. If the objects are fetched from disk using the async API, the numbers drop by a factor of 1,000.

Drift and Floor were not part of the benchmarking as they are ORMs. However, it is very likely they will perform similarly to sqflite, reflecting primarily the performance of SQLite.

Flutter Data persistence – Conclusion

Recently, the Flutter database landscape has experienced significant growth and diversification. With Flutter’s increasing popularity, developers now have a number of database options available. In this article, we focused on the best local databases, comparing their features in a comprehensive matrix, and showcasing performance benchmarks. In the end, the best choice depends on the specific needs of each project. The Flutter database landscape in 2025 is a thriving ecosystem, continuously evolving to meet the changing needs of Flutter app development. One upcoming change that we can see is the rise of vector databases for AI. So, we encourage you to keep an eye on the lively market of Flutter databases so that you do not miss any important updates.

If you want to get started learning how to use a database, we suggest you check out this video tutorial series that teaches you how to build a Flutter app with ObjectBox from scratch.

Introducing: ObjectBox Generator, plus C++ API [Request for Feedback!]

by Markus | Jun 16, 2020 | Edge Computing, IoT, Open Source, Release

We are introducing the ObjectBox Generator today to simplify ObjectBox development for more programming languages, starting with C/C++. Additionally, we are releasing a brand new C++ API that goes hand in hand with the new generator. Historically, our C API was rather low level as it was focused on providing the foundation for our Swift and Go APIs. With this release we want to provide C/C++ developers with ObjectBox convenience and ease of use.

ObjectBox Generator takes over the burden of writing the binding code and data model declaration. Based on a single input file, it generates the code for you, so you can focus on the actual application logic.

Generator Example

ObjectBox lets you handle data as FlatBuffers. For example, you can put and get data objects as FlatBuffers-encoded bytes. To work with FlatBuffers, you need to define a FlatBuffers schema file (.fbs). This file is also the input for ObjectBox Generator. This way, everything is defined in a single location.

Let’s say we have a FlatBuffers schema file “task.fbs” with the following content:

table Task {
    id: ulong;
    text: string;
    date_created: ulong;
    date_finished: ulong;
}

Now, we can tell ObjectBox Generator to use this file to generate C++ sources:

./objectbox-generator -cpp task.fbs

This makes ObjectBox Generator generate the following files:

objectbox-model.h: source code to build the internal data model, that you need to pass when creating a store.
objectbox-model.json: keeps track of internal schema IDs; you don’t need to worry about this except that you should put it in your source control.
task-cpp.obx.h: the C++ value structs (data objects), binding code for FlatBuffers and the new Box class.

C++ API Example

Now, let’s use the previously generated code and the new C++ API around the Store and Box classes. A simple CRUD application boils down to a few lines:

#include <memory>  // std::unique_ptr

#include "objectbox-cpp.h"
#include "objectbox-model.h"  // provides create_obx_model()
#include "task-cpp.obx.h"     // provides Task struct and bindings

int main(int argc, char* args[]) {
    obx::Store store(create_obx_model());
    obx::Box<Task> box(store);

    obx_id id = box.put({.text = "Buy milk"});  // Create
    std::unique_ptr<Task> task = box.get(id);   // Read
    if (task) {
        task->text += " & some bread";
        box.put(*task);                         // Update
        ...
        box.remove(id);                         // Delete
    }
    ...
}

Note that the generated code is header-only and compatible with the existing ObjectBox C-API, allowing both to be used from the same application. The C and C++ APIs both have unique advantages: the C++ API uses RAII so you do not need to worry about cleaning up, while the C API has additional features, e.g. queries.
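To illustrate what RAII buys you in general, here is a minimal sketch that wraps a C-style handle in a C++ class. The functions `demo_store_open` and `demo_store_close` and the class `ScopedStore` are hypothetical and exist only for this example; they are not part of the ObjectBox C or C++ API.

```cpp
#include <cassert>

// Hypothetical C-style API (illustration only, not the ObjectBox C API):
// the caller must close every handle explicitly.
struct DemoStore { int unused; };
static int g_openStores = 0;  // number of handles that are currently open

DemoStore* demo_store_open() {
    ++g_openStores;
    return new DemoStore{0};
}

void demo_store_close(DemoStore* store) {
    delete store;
    --g_openStores;
}

// RAII wrapper: the constructor acquires the handle and the destructor
// releases it, so leaving the scope (normally or via an exception) cleans up.
class ScopedStore {
public:
    ScopedStore() : handle_(demo_store_open()) { assert(handle_ != nullptr); }
    ~ScopedStore() { demo_store_close(handle_); }

    // Non-copyable: exactly one owner per handle, so it is closed exactly once.
    ScopedStore(const ScopedStore&) = delete;
    ScopedStore& operator=(const ScopedStore&) = delete;

    DemoStore* get() const { return handle_; }

private:
    DemoStore* handle_;
};
```

With plain C, every code path that leaves a function has to remember the matching close call; with the wrapper, declaring a `ScopedStore` in a block is enough, and the handle is released when the block ends.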

Open Source, Docs

ObjectBox Generator is open source and available on GitHub. The repository comes with a readme file that also serves as documentation. Among other things, you will find ObjectBox-specific annotations there, which are used in fbs files to express ObjectBox-specific concerns. For example, in the definition of Task above, we used ulong as a FlatBuffers type to store dates. However, FlatBuffers does not know what a date is, so we use ObjectBox annotations to express this:

/// objectbox: date
date_created: ulong;

For our initial release of ObjectBox Generator and the public C++ API, we decided to label it as version 0.9. Although we are already very close to a 1.0, we wanted to gather some feedback before our first major release. As we can still change the API or smooth out any rough edges you may find, we cannot stress enough how much we welcome and appreciate your feedback at this point. Thank you!
