The first On-Device Vector Database: ObjectBox 4.0

The new on-device vector database enables advanced AI applications on small restricted devices like mobile phones, Raspberry Pis, medical equipment, IoT gadgets and all the smart things around you. It is the missing piece to a fully local AI stack and the key technology to enable AI language models to interact with user specific data like text and images without an Internet connection and cloud services.

An AI Technology Enabler

Recent AI language models (LLMs) demonstrated impressive capabilities while being small enough to run on e.g. mobile phones. Recent examples include Gemma, Phi3 and OpenELM. The next logical step from here is to use these LLMs for advanced AI applications that go beyond a mere chat. A new generation of apps is currently evolving. These apps create “flows” with user specific data and multiple queries to the LLM to perform complex tasks. This is also known as RAG (retrieval augmented generation), which, in its simplest form, allows one to chat with your documents. And now, for the very first time, this will be possible to do locally on restricted devices using a fully fledged embedded database.
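To make such a “flow” more tangible, here is a minimal sketch of the retrieval step of RAG in Python. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `retrieve` is a brute-force search, and `build_prompt` only assembles the text that would be handed to a local LLM. None of these names are part of ObjectBox.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors (Counter objects)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(question, documents, k=2):
    """Return the k documents whose embeddings are most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine_similarity(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_docs):
    """Assemble the prompt that would be sent to a (local) LLM."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

documents = [
    "The invoice for the garden furniture is due on Friday",
    "Grandma's apple pie recipe needs six apples",
    "The car insurance renews in March",
]
question = "When is the invoice due"
top = retrieve(question, documents, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real app, the toy `embed` function would be replaced by an on-device embedding model, the brute-force `retrieve` by a vector database query, and the prompt would be passed to the local LLM.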

What is special about ObjectBox Vector Search?

We know restricted devices. Where others see limitations, we see potential, and we have repeatedly demonstrated that we can build superefficient software for these devices: maximizing speed, minimizing resource use, and saving battery life and CO2. With this knowledge, we approached vector search in a unique way.

Efficient memory management is the key. The challenge with vector data is that on the one hand, it consumes a lot of memory – while on the other hand, relevant vectors must be present in memory to compute distances between vectors efficiently. For this, we introduced a special multi-layered caching that gives the best performance for the full range of devices; from memory-constrained small devices to large machines that can keep millions of vectors in memory. This worked out so well that we saw ObjectBox outperform several vector databases built for servers (open source benchmarks coming soon). This is no small feat given that ObjectBox still holds up full ACID properties, e.g. caching must be transaction-aware.
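To illustrate the general idea of layered caching (not ObjectBox's actual implementation, which is not shown in this post), here is a minimal Python sketch. It assumes a small in-memory LRU layer in front of a slower store; the class name `TwoTierVectorCache` and the dict standing in for disk are hypothetical, and transaction awareness is left out entirely.

```python
from collections import OrderedDict

class TwoTierVectorCache:
    """Small LRU cache in memory, backed by a slower store (a dict standing in for disk)."""

    def __init__(self, disk_store, capacity):
        self.disk_store = disk_store   # hypothetical stand-in for on-disk storage
        self.capacity = capacity       # maximum number of vectors kept in memory
        self.memory = OrderedDict()    # id -> vector, ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, vector_id):
        if vector_id in self.memory:
            self.memory.move_to_end(vector_id)  # mark as most recently used
            self.hits += 1
            return self.memory[vector_id]
        self.misses += 1
        vector = self.disk_store[vector_id]     # slow path: load from "disk"
        self.memory[vector_id] = vector
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)     # evict the least recently used vector
        return vector

disk = {i: [float(i), float(i) + 0.5] for i in range(5)}
cache = TwoTierVectorCache(disk, capacity=2)
for vector_id in [0, 1, 0, 2, 1]:
    cache.get(vector_id)
print(cache.hits, cache.misses, list(cache.memory))
```

With a capacity of two vectors, only the repeated access to vector 0 is served from memory; the other lookups go to the slower store. A larger capacity on a more capable device would shift more lookups to the fast path.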

Also, keep in mind that ObjectBox is a fully capable database that allows you to store complex data objects along with vectors. From an ObjectBox data model point of view, a vector is “just” another property type. This allows you to store all your data (vectors along with objects) in a single database. This “one database” approach also includes queries. You can already combine vector search with other conditions. Note that some limitations still apply with this initial release. Full hybrid search is close to being finished and will be part of one of the next releases.
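The “vector as just another property” idea can be sketched in plain Python. This is not the ObjectBox API; the `City` class, the `nearest` helper and the sample data are assumptions made up for illustration, and treating coordinates as a flat Euclidean space is a simplification.

```python
import math
from dataclasses import dataclass

@dataclass
class City:
    name: str
    country: str
    location: list  # the vector is just another property of the object

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(cities, query_vector, k, condition=lambda c: True):
    """Brute-force nearest neighbour search, restricted to objects matching a condition."""
    candidates = [c for c in cities if condition(c)]
    candidates.sort(key=lambda c: euclidean(c.location, query_vector))
    return candidates[:k]

cities = [
    City("Berlin", "DE", [52.52, 13.40]),
    City("Munich", "DE", [48.14, 11.58]),
    City("Vienna", "AT", [48.21, 16.37]),
    City("Zurich", "CH", [47.37, 8.54]),
]
query = [48.0, 12.0]
closest_overall = nearest(cities, query, k=2)
closest_german = nearest(cities, query, k=2, condition=lambda c: c.country == "DE")
print([c.name for c in closest_overall], [c.name for c in closest_german])
```

The second call combines the vector search with an ordinary condition on another property, which is the kind of combined query described above.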

In short, the following features make ObjectBox a unique vector database:

  • Embedded Database that runs inside your application without latency
  • Vector search is based on the state-of-the-art HNSW algorithm, which scales very well with growing data volume
  • HNSW is tightly integrated within our internal database. Vector Search doesn’t just run “on top of database persistence”.
  • With this deep integration we do not need to keep all vectors in memory.
  • Multi-layered caching: if a vector is not in-memory, ObjectBox fetches it from disk.
  • Not just a vector database: you can store any data in ObjectBox, not just vectors. You won’t need a second database.
  • Low minimum hardware requirements: e.g. an old Raspberry Pi runs ObjectBox smoothly.
  • Low memory footprint: ObjectBox itself just takes a few MB of memory. The entire binary is only about 3 MB (compressed around 1 MB).
  • Scales with hardware: efficient resource usage is also an advantage when running on more capable devices like the latest phones, desktops and servers.
  • ObjectBox additionally offers commercial editions, e.g. a Server Cluster mode, GraphQL, and of course, ObjectBox Sync, our data synchronization solution.

Why is this relevant? AI anywhere & anyplace

With history repeating itself, we think AI is in a “mainframe era” today. Just like clunky computers from decades before, AI is restricted to big and very expensive machines running far away from the user. In the future, AI will become decentralized, shifting to the user and their local devices. To support this shift, we created the ObjectBox vector database. Our vision is a future where AI can assist everyone, anytime, and anywhere, with efficiency, privacy, and sustainability at its core.

What do we launch today?

Today, we are releasing ObjectBox 4.0 with Vector Search for a variety of languages:

*) We acknowledge Python’s popularity within the AI community and thus have invested significantly in our Python binding over the last months to make it part of this initial release. Since we still want to smooth out some rough edges with Python, we decided to label Python an alpha release. Expect Python to quickly catch up and match the comfort of our more established language bindings soon (e.g. automatic ID and model handling).

Want to get started right away? Check our Vector Search documentation to see how to use it!

One more thing: ObjectBox Open Source Database (OSS)

We are also very happy to announce that we will fully open source the core of ObjectBox. As a company we follow the open core model. Since we still have some cleaning up to do, this will happen in one of the next releases, likely 4.1.

“Release week”

With today’s initial releases, we are far from done. Starting next Tuesday, you can expect additional announcements from us. Follow us to get the news as soon as it is released.

What’s next?

This is our very first version of a “vector database”. And while we are very happy with this release, there are still so many things to do! For example, we will optimize vector search by adding vector quantization and integrate it more tightly with our data synchronization. We are also focusing on expanding our solution’s reach through strategic partnerships. If you think you are a good fit, let us know. And as always, we are very eager to get some feedback from you! Take care.

Vector databases – a look at the AI database market with a comprehensive comparison matrix

⭐ What are vector databases? ⭐ What do you need them for? ⭐ Who is in the market?

Includes a comparison matrix of vector database options like Pinecone, Milvus, Vespa, Vald, Chroma, Marqo AI, Weaviate, and Qdrant

With 350M+ USD invested in AI / vector databases in the last months, one thing is clear: The vector database market is hot 🔥 Everyone, not just investors, is interested in the booming AI market. While AI applications have dominated the news for quite some time, the infrastructure software that supports these applications, such as vector databases, is finally gaining attention too. In the following, we’ll look at why vector databases are gaining attention and compare current vector database alternatives.

What is a vector database? 

A vector database stores vectors, or more precisely vector embeddings. A vector database therefore is a specialised type of database designed to store and manage large sets of vectors efficiently. However, the challenge and value are not derived from simply being able to store vectors. The value is created by the type of computations that can be run over the stored vector data and the speed with which these computations can be run, e.g. similarity searches. 

Vector databases are essentially an important piece of the AI tech stack. They can be used e.g. to give LLMs (Large Language Models) – or more broadly speaking, AI applications – a long-term memory and faster search and querying capabilities. Another important use case is RAG (Retrieval-Augmented Generation).

To give some context: The most traditional databases, SQL databases, store data in rows and columns; graph databases store graphs and object databases store objects.

Because Large Language Models and AI applications rely on vector embeddings, vector databases are especially apt at supporting AI applications. 

Accordingly, vector databases are becoming a critical layer in the AI tech stack; they are sometimes also called “AI databases”. However, databases tend to converge over time, meaning that many databases support several different database models.

What is a vector embedding?

A vector embedding is a list of numbers that represent objects and relationships, allowing unstructured data (such as images) to be searched and used. Typically, Large Language Models (more precisely the underlying Machine Learning (ML) algorithms) are used to create these vectors. The ML algorithms analyse large amounts of data to learn how to represent complex / unstructured data in a lower dimensional space (as vectors).

What does it have to do with nearest neighbour search?

Searchability (making unstructured data usable) is at the heart of this concept. The distance between vector embeddings expresses the similarity of the vectors (and thus of the represented objects). Therefore, when you are searching for the most similar data, you are performing a so-called “nearest neighbour search”, and the time required to find the nearest neighbours is essential.
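A minimal Python sketch of an exact nearest neighbour search may help here. The three-dimensional “embeddings” are made-up toy values (real embeddings have hundreds of dimensions), and cosine distance is just one common choice of distance.

```python
import heapq
import math

def cosine_distance(a, b):
    """0 means the vectors point in the same direction; larger values mean less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def k_nearest(query, embeddings, k):
    """Exact (brute-force) search: compare the query with every stored vector."""
    return heapq.nsmallest(
        k, embeddings, key=lambda name: cosine_distance(query, embeddings[name])
    )

# Made-up toy embeddings: similar objects get similar vectors.
embeddings = {
    "cat":    [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "car":    [0.1, 0.2, 0.95],
    "truck":  [0.15, 0.1, 0.9],
}
neighbours = k_nearest(embeddings["cat"], embeddings, k=2)
print(neighbours)
```

Brute force compares the query against every stored vector, so the search time grows linearly with the amount of data. This is why vector databases rely on approximate nearest neighbour (ANN) indexes.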

Do we need special vector databases?

There is an ongoing discussion about whether specialised vector databases are needed and warrant a new category in the database landscape, or whether vector extensions of traditional databases will serve the AI market instead. Both are reasonable expectations, and time will tell. Notable databases that have already added a vector extension include Redis and Elasticsearch. Additionally, more and more databases now allow storing vector types.

What does the vector database landscape look like?

To get a picture of the current market situation, we compare the choices with the most traction, excluding established players that have added vector capabilities to their existing database offering. Generally speaking, we see a lot of very young companies, some companies that pivoted from their original specialization, and massive funding rounds. Please note: the comparison is extensive and not optimized to be readable on mobile or small screens (there simply is a trade-off between providing the information and making it readable on every device).

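Comparison matrix, listed per database. Fields without an entry are omitted. Queries per second and latency refer to the nytimes-256-angular dataset; latency is given in ms (recall / 95th percentile).

Marqo AI
  • Open source: Y
  • License: Apache-2.0
  • GitHub stars: 2.8k ⭐
  • Developed in: Python
  • Summary: A tensor-based cloud-native commercial Open Source search and analytics engine.
  • Business model: Open SaaS
  • Embeds / uses: Tensor-based
  • Sharding: Y
  • Index types: HNSW
  • Queries per second: -
  • Funding: undisclosed preseed in May 2022
  • Who's behind it: S2Search Australia Pty Ltd
  • HQ in: 🇦🇺

Weaviate
  • Open source: Y
  • License: BSD
  • GitHub stars: 5.6k ⭐
  • Developed in: Assembly, C++, GoLang
  • Summary: Weaviate is a commercial Open Source cloud-native vector database that stores both objects and vectors.
  • Business model: Open SaaS
  • Founded / first released: started in 2018 as a traditional graph database, first released in 2019
  • In-memory support: N
  • Sharding: Y, static sharding
  • Index types: a custom HNSW PQ algorithm that supports CRUD
  • Consistency model: Eventual Consistency
  • Benchmarks: not comparative, just evaluating their own performance
  • Queries per second: 791
  • Latency: 2
  • Approximate Nearest Neighbor (ANN): Y (multiple ANN algorithms as long as they support full CRUD)
  • Funding: 67.7M USD, series B
  • Who's behind it: SeMI Technologies
  • HQ in: 🇪🇺

Chroma
  • Open source: Y
  • License: Apache-2.0
  • GitHub stars: 4.4k ⭐
  • Developed in: Python & Typescript
  • Summary: Chroma is a Commercial Open Source vector database
  • Business model: Preparing a (Partly Open) SaaS model* [Commercial Open Source]
  • Embeds / uses: HNSW lib, DuckDB; based on ClickHouse
  • Founded / first released: looks like 2022
  • In-memory support: N
  • Sharding: Dynamic segment placement
  • Approximate Nearest Neighbor (ANN): Y
  • Funding: 20.3M USD, seed
  • Who's behind it: Chroma Inc.
  • HQ in: 🇺🇸

Qdrant
  • Open source: Y
  • License: Apache-2.0
  • GitHub stars: 6.6k ⭐
  • Developed in: Rust
  • Summary: Qdrant is a Commercial Open Source vector similarity search engine and vector database
  • Business model: Open SaaS
  • Embeds / uses: RocksDB
  • Founded / first released: first released: 2021
  • In-memory support: Y
  • Sharding: Y, static sharding
  • Index types: HNSW (SQ & PQ)
  • Consistency model: Eventual Consistency, tunable consistency
  • Benchmarks: compares to weaviate, milvus, elastic (note: redis took too long to complete)
  • Queries per second: 326
  • Latency: 4
  • Approximate Nearest Neighbor (ANN): Y
  • Funding: 9.8M €
  • Who's behind it: Qdrant Solutions GmbH
  • HQ in: 🇪🇺

Milvus
  • Open source: Y
  • License: Apache-2.0
  • GitHub stars: 18k ⭐
  • Developed in: GoLang & Python
  • Summary: Milvus is a cloud-native Commercial Open Source vector database
  • Business model: (Partly Open) SaaS* [Commercial Open Source]
  • Embeds / uses: An initial blog post from them said SQLite, but meanwhile they said RocksDB (exchanged?); they also have a ChatGPT-Cache that is built on SQLite and say "Milvus uses SQLite or MySQL to manage metadata"
  • Founded / first released: founded 2017, first released: 2019
  • In-memory support: N
  • Sharding: Dynamic segment placement
  • Index types: ANNOY; HNSW; IVF_PQ; IVF_SQ8; IVF_FLAT; FLAT; IVF_SQ8_H; RNSG
  • Consistency model: Strong, bounded staleness, session, and eventually. The default consistency level in Milvus is bounded staleness.
  • Benchmarks: not comparative
  • Queries per second: 2406
  • Latency: 1
  • Approximate Nearest Neighbor (ANN): Y
  • Funding: 113M USD, series B
  • Who's behind it: Zilliz
  • HQ in: 🇺🇸

Vespa
  • Open source: Y
  • License: Apache-2.0
  • GitHub stars: 4.4k ⭐
  • Developed in: Java & C++
  • Summary: Vespa is a Commercial Open Source vector database by Yahoo! It is a search engine which supports vector search, lexical search, and search in structured data
  • Business model: Open SaaS
  • Founded / first released: Originally a web search engine (alltheweb), acquired by Yahoo! in 2003 and later open sourced as Vespa in 2017; since Oct 2023 spinoff, raised series A in Nov 2023
  • In-memory support: maintains disk and memory structures for documents
  • Sharding: Y
  • Index types: Custom HNSW (Multi-vector hybrid HNSW-IF)
  • Consistency model: Eventual Consistency
  • Benchmarks: not comparative
  • Approximate Nearest Neighbor (ANN): Y
  • Funding: Spinoff from Yahoo! in Oct 2023, then raised a 31M USD series A
  • Who's behind it: Yahoo!
  • HQ in: 🇺🇸

Vald
  • Open source: Y
  • License: Apache-2.0
  • GitHub stars: 1.2k ⭐
  • Developed in: GoLang
  • Summary: Vald is a cloud-native Open Source distributed approximate nearest neighbor (ANN) dense vector search engine
  • Business model: Community project, currently looks like no commercial interests are pursued
  • Embeds / uses: uses the vector search engine NGT
  • Founded / first released: Technology incubation at Yahoo! Japan Corporation, development was started in 2019
  • In-memory support: N/A
  • Sharding: N/A
  • Index types: N/A
  • Benchmarks: not comparative, but Vald performance only
  • Approximate Nearest Neighbor (ANN): Y (NGT)
  • Funding: -
  • Who's behind it: Yusuke Kato (Yahoo Japan Corporation), Kiichiro Yukawa (Yahoo Japan Corporation)
  • HQ in: 🇯🇵

Pinecone
  • Open source: N
  • License: Proprietary
  • GitHub stars: NA
  • Summary: Pinecone is a fully managed vector database that specializes in enabling semantic search capabilities
  • Business model: SaaS
  • Embeds / uses: built on top of Faiss
  • Founded / first released: first released in 2019
  • In-memory support: N
  • Sharding: Y
  • Index types: proprietary
  • Consistency model: Eventual Consistency
  • Benchmarks: more programming language comparison for vector databases
  • Queries per second: 150 (for p2, but more pods can be added)
  • Latency: 1 (batched search, 0.99 recall, 200k SBERT)
  • Approximate Nearest Neighbor (ANN): Y (proprietary), plus KNN (with Faiss)
  • Funding: 138M, series B
  • Who's behind it: Pinecone Systems Inc
  • HQ in: 🇺🇸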

Want to know more about the vector database market?

Here are some more questions answered for anyone interested

What is an "Open SaaS" business model?

Software as a service (SaaS) refers to software that is managed / hosted for the client and is essentially “rented.” The open in Open SaaS refers to the open source software that is being offered as such a service.

This frequently implies that not all code is open source, particularly that which is part of the managed service / hosting and associated value-adding features. Note: The open source software offered in this manner may or may not be provided by the company providing the software as a service. This has caused some friction in the open source community, as original creators often struggle to make a living, and/or maintainers struggle to keep maintaining the software – while other companies profit. Most famously, huge cloud providers have taken advantage of this option, leading to new licenses that keep the source open but restrict others from hosting as a service without donating the whole source code back to the community.

Why should I care about index types?

Indexes are essentially a way to speed up searching a database. There are several established index types for vector databases and they affect the performance of the database, e.g. the time it takes a query to complete.
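To give a feeling for how an index speeds up a search, here is a deliberately simplified Python sketch of an inverted-file style index. It is not HNSW and not how any of the databases above implement their indexes; the `BucketIndex` class and the fixed centroids are assumptions chosen to keep the example short.

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class BucketIndex:
    """Simplified inverted-file index: each vector is filed under its nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = {i: [] for i in range(len(centroids))}

    def _nearest_centroid(self, vector):
        return min(range(len(self.centroids)),
                   key=lambda i: distance(vector, self.centroids[i]))

    def add(self, vector):
        self.buckets[self._nearest_centroid(vector)].append(vector)

    def search(self, query):
        """Approximate search: only scan the bucket of the closest centroid.

        Assumes that bucket is not empty; a real index would handle that case.
        """
        bucket = self.buckets[self._nearest_centroid(query)]
        best = min(bucket, key=lambda v: distance(query, v))
        return best, len(bucket)  # also report how many vectors were compared

index = BucketIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
vectors = [[0.5, 1.0], [1.0, 0.2], [9.0, 9.5], [10.5, 10.0], [11.0, 9.0], [0.1, 0.3]]
for v in vectors:
    index.add(v)

best, compared = index.search([9.2, 9.4])
print(best, compared)
```

The query is compared against only three of the six stored vectors. The price for this speed-up is that the result is approximate: a true nearest neighbour sitting in a different bucket would be missed, which is the kind of speed versus accuracy trade-off that distinguishes index types.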

What about benchmarks?

If you review the benchmarks listed in the comparison above, you will see that results typically vary. Benchmarks are difficult to do, and neutral benchmarks even more so. Certain use cases may favor certain solutions. Ideally, therefore, you benchmark based on your specific use case. As a first evaluation, though, try to understand the basic influencing factors and have a look at a handful of benchmarks and explanations. Having said all this: there is a benchmarking tool available for approximate nearest neighbor (ANN) search algorithms. If you use it, you can compare the performance of different databases (with regard to ANN search) for the same setup, based on the same approach. Also, the underlying libraries often used by databases (like NGT and HNSW, see above) have already been benchmarked with it, and you can compare against these directly.
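Two numbers that ANN benchmarks typically report are recall (how many of the true nearest neighbours were found) and queries per second. A minimal Python sketch of both measurements, assuming made-up result IDs and a placeholder search function rather than any real benchmark data:

```python
import time

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true k nearest neighbours that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def queries_per_second(search_fn, queries):
    """Run every query once and divide the number of queries by the elapsed time."""
    start = time.perf_counter()
    for q in queries:
        search_fn(q)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed if elapsed > 0 else float("inf")

# Made-up result IDs: what an exact search and an approximate search returned.
exact = [4, 8, 15, 16, 23]
approx = [4, 15, 8, 42, 23]
recall = recall_at_k(approx, exact, k=5)

# Placeholder search function; a real benchmark would call the database here.
qps = queries_per_second(lambda q: sorted(q), [[3.0, 1.0, 2.0]] * 1000)
print(recall, qps > 0)
```

Both numbers need to be read together: an index can usually be tuned to answer more queries per second at the cost of lower recall, which is one reason why results from different benchmarks are hard to compare.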

Why is the market so hot, how can companies raise so much money?

AI is hot, everyone agrees that data and its management will be key to future success, and the database market is interesting: It is a long-established market with many players, yet it still shows consistently good growth (e.g. 17% in 2020). And the history of the database market shows that from time to time a new type of database comes up, and with it, a new market category is created. In such a market, typically the market creator “takes all” (not quite literally, but such a significant share, definitely the vast majority, that all other players are not attractive from a VC perspective). Such a market could easily be worth 100M+ in ARR. Examples from the last 20 years: MongoDB (NoSQL databases), Cockroach (NewSQL databases), Neo4J (Graph databases), Influx (Time-Series databases). So, VCs are looking to find the next new type of database that can create a market… Maybe it will be vector databases? However, the database market has also shown that it can take 10+ years for players to become profitable, so expect a long-term game. The race is still on for Edge Databases, we think 🙂

Want to know more about the database market?

We recommend checking out db-engines. The website compares all relevant systems and has tons of data from the last 20 years. Note: They only add databases once they have some traction and notability, not every hobby project. Accordingly, not all databases in the above comparison have been added to the website yet.

Building a Business on Open Source

What is open source software?

For the sake of unambiguity: Open source software (OSS) primarily means that the source code of the software is accessible and users are free to use the code as they please. Depending on the license, you might be expected to attribute the source code to the authors and / or commit code enhancements back. Note: It’s “free” as in “freedom” not as in “free beer”. 

Open Source and Commercialisation?

The origins of open source did not entail thoughts of commercialization. However, in the last 20 years a lot of things have changed, and open source projects have seen commercial successes, though not always for the creators and maintainers… For many people, open source is at its core tied to a philosophy and a set of values. Simplified: by and large, the developer community considers open source to be “good” and proprietary source code to be “evil”.

In any case, open source is one way to keep up an active, vibrant developer ecosystem that empowers individual developers as well as startups and smaller players. Open Source is actually one piece of the IT ecosystem that helps balance Big Tech and drive overall innovation. However, we also believe the open source ecosystem needs more balance to be successful long-term. If widely used open source repos cannot even sustain the half or full developer resource needed to maintain them, then there might well be a flaw in the system. If startups cannot build a business around their widely used open source code to sustain it long-term, it is to the disadvantage of the community, especially individual developers and SMEs. And likely, the lesson at some point will be to keep the source closed instead.

In the following, we share why we believe now is a unique opportunity to add fairness and balance for the value creators to the open source ecosystem, to keep that ecosystem thriving and successful long-term.

What do we mean with “building a business on open source”?

In many talks with many people, we found there are at least two opposing conceptions of building a business on open source:

1) using open source software for free and building something around it to earn money
2) developing a solution and open sourcing it or parts of it as part of the business model

In this article, we mean the latter and it inherently entails contributing a useful part of a solution to open source. For some open source enthusiasts a company needs to open source everything to be an open source company, and that’s ok. It is just our definition for this article.

A look at the market – the struggle of open source businesses

The Open Source Gold Rush: Success Stories

In the last years there have been many open source success stories, e.g. MongoDB, elastic, Cloudera all IPOd very successfully. There seemingly is a lot of money in open source businesses, e.g. a study by Fraunhofer concluded that “the EU economy is hugely benefiting from global OSS.” [1] Also, companies and big corporations are way more open to work with open source software, indeed 2020 was the first year where open source databases were on par with closed-source databases with regards to corporate adoption (see chart). [2]

And a recent (2021) report showed that, across 17 industries, 98% of 1,546 codebases contained open source code. [3] There even is a bit of a hype that open source is the path to success. Now that it’s clear that it is possible to build a business with open source software, VCs are also more open to funding open source businesses. An Andreessen Horowitz report reveals that OSS companies have raised over $10B in capital, with a trend towards bigger and bigger deals. [4] Annual invested capital in open source and related dev tools has increased at around 10% CAGR over the last 5 years. [5] In 2018 and 2019, acquisitions, mergers, and IPOs of open source companies generated over USD 80 billion in liquidity value according to Bessemer Venture Partners. [6]

The struggle of turning Open Source into a Business

Historically, open source companies have struggled with turning open source adoption into monetary success: “less than a decade ago open source was considered almost impossible to monetize.” [7] Sadly, that’s still a reality today for many open source maintainers and companies alike. Lots of open source maintainers with widely used open source code (“successful open source”) cannot get enough financial support to maintain the code. Of course, there are some successes, but in the end that might also be a question of ratios. For example, in 2020 GitHub reported having more than 190 million repositories. Even if only 10% of those want to build a business on top of their code, how many of those see a financial reward? Gut feel: Far less than typical startup success odds. On top: What looks successful from the outside might not really be a viable, self-sustained business. Despite its many users, MongoDB spent $100M on development, and it took them more than 10 years to become profitable according to their own statements. [8]

A lot of tech companies struggle with, and spend a lot of time on, all the decisions around an open source business model. It isn’t easy: read up on how GitLab struggled with finding a business model, or look closer into the MySQL story and the MariaDB journey (MariaDB is a MySQL fork by the founders and original authors of MySQL); look at blog posts from CockroachDB, MongoDB, or elastic on open source, and what you see is a constant re-positioning of open source strategies.

As Mike Volpi from Index Ventures noted at the Index Open Source Summit (2021): “It took Mongo DB 10 years to derive the business model they run now and monetize successfully…” Wow, 10 years to somewhat successful monetization – and that is one of the major open source success stories.

Open sourcing your main technology as a strategy

In this article, we take a deeper look at open source as a pro-active business strategy.

Open Source to Build Traction

Traction is the most obvious reason to open source your product. It works like Freemium in the Mobile Games market – or more generally the Mobile Apps market. It’s a great way to evaluate product-market-fit and build traction. When you have that, you can think about monetization.

However, there is a big difference between giving something away for free and open sourcing it. If we stay in the mobile app world: Would open sourcing the app help with traction? Would it jeopardize the business model? Unless the main target users are developers, at least in the beginning likely not – less than making the app / game available for free in any case. However, once the app grows at amazing pace, open source availability could become a challenge in several respects.  

The most obvious would be fast followers entering with that same game and potentially much bigger marketing budgets and better customer access (e.g. on the app stores). Think about what would have happened if WhatsApp had open sourced all its code from day 1, on top of giving the app away for free. It is a legitimate hypothesis that a fast follower could have captured some of the market, changing the whole story. On the other hand, if they open sourced their whole code base now, how much would it harm them? At some point, it became all about traction, brand and customer access, so I would think it wouldn’t harm them at all at this point. So, driving traction with open source is probably only a viable idea if you address developers or engineers. It’s clearly a phenomenon of the developer-led landscape, and acts as a developer distribution channel. That said, the price of open source traction is commercialization. It’s a straightforward trade-off: The more open and free your license is, the harder it is to monetize later on.

Open Source to Build Trust

Trust is something that is likely more important for certain software types (e.g. B2B and core tech).

ObjectBox is a database and with that it is a data-centric “core technology” / software infrastructure, sitting at the heart of a company’s solution. Anything that gets used at the heart of other companies or their solutions needs a lot of trust. Trust is easier to come by with size: “no one was ever fired for choosing SAP.” For many decision makers, a small startup sits at the opposite end of that spectrum. Open Source can be a way to overcome this specific challenge and build trust in three ways:

  1. Transparency: The freedom to verify what the code enables; the internal developer team can check the code and vouch for the solution 
  2. Risk-reduction: The freedom to change and maintain the code oneself gives independence from the authors and the success of the solution
  3. Quality: If an open source solution is actively used by a large number of developers, quality inevitably goes up

So, if you are looking for adoption from big players in heavily regulated or security-concerned industries, e.g. medical, manufacturing, automotive, anything with mission-critical networks, open source can help you overcome many of the adoption hurdles you are facing.

Open Source as an IP Strategy

Seems counter-intuitive, right? Well, if you are not aiming to patent your technology, you still might not want someone else (who has been working on the same problem) to patent the same technology harming your freedom to operate. You can protect yourself from that risk by open sourcing it. This can come in the form of a copyleft license, designed to encourage further innovation advancements to the benefit of all, but also limiting the commercial exploitation opportunities for everyone. Or, you can choose a more permissive license, allowing people with commercial interests to keep any advancements they make to themselves. 

Note: Open source code is not a blueprint with exact instructions; there are no obligations to provide clear docs or explanations. While a majority of open source projects strive to deliver a code base that is readable by others, this is not enforced. So, while open sourcing a technology rules out patenting it, one way to still protect it is, unfortunately, to make it hard to understand. A patent, on the other hand, must contain an extensive explanation. This makes it easily repeatable by others in the future, after the end of the patent protection, or usable as a basis for further research (and for finding ways to tweak it in a novel enough way).

Although it often feels like open source is at the opposite end of the spectrum from patents, a patent has a limited timeframe and people can learn from it even before it expires. The deal is basically an exchange of knowledge (to be used in the future) for protection (for commercially exploiting it). Keeping it a trade secret has other risks, but could mean that an invention wouldn’t be shared with others for a truly long time. And of course the protection encourages big companies to invest big budgets in R&D too. Delayed open source actually has many similarities with a patent: in both cases the tech is only made available for advancements and unrestricted use after a certain time frame has ended.

Open Source for the sake of it

There are a lot of ideas floating around open source, and some pressure from the developer community to open source everything. Among developers, open sourcing is considered to be good, social, fair, transparent, and worthy. While there are many advantages in open source, it has turned into a kind of “political tool”, and that’s a downside – and probably the opposite of the original idea. 

Consideration 1: How is great software supposed to be maintained and advanced without anyone providing funds? When MMOGs (Massive Multiplayer Online Games) became a thing, people understood that there was a constant cost associated with them and were willing to switch from a one-off fee to monthly payments. Software typically needs to be maintained too, so there are ongoing development costs associated with a piece of software, even if it is not hosted. So, who benefits from open source in the end if the original creators cannot keep up their work (assuming they need to eat and sleep)? Before pushing everyone to open source, maybe read about open source maintainers struggling under the pressure and dealing with burnout. On the flip side, if a company markets itself heavily as an “open source company”, it should give considerable parts of its own value-creating solution back to the community. Using open source tools and building on top of open source code (and even committing back to these solutions) does not mean you are an open source company: If you want to reap the marketing benefits of calling yourself an “open source company”, then you should truly be one and commit your value back to open source.

Consideration 2: Who benefits if another company pulls the repo, adds “sparkles”, maybe even some “missing features”, or merely a big “brand name”, or the “marketing budget” and makes a ton of money selling the solution? This is of course assuming a permissive license was used. Well, from an open source perspective that is perfectly fine, and part of the intention of open source. So, it’s great, right? We think, it is easy to understand that some authors who have put all their “free time” / unpaid time into that code struggle to accept when this happens, especially if they have a hard time supporting themselves. But we also understand that big companies with investors (stakeholders…) that have invested heavily in R&D and might or might not yet have reached profitability, don’t really like to see this happen. Unless you are really in it for the fun and driven by altruism and will be in perfect harmony with other people using your code to make money, you should look closely if and how you want to open source your code.

Open Source to save development costs

There is the idea floating around that you can develop your project for free using the open source community. We doubt it works out for many. Of course, if Google maintains a repo that is a base technology used by many developers, developers might want to commit something (anything really) for fame, to be part of it, maybe to get noticed. However, the “anything really” is already a problem: someone needs to review the submission, respond, potentially rework it and so on… Most other repos will probably not get too many contributions (let alone from the best tech talent around). Even if they do, onboarding a large community of unknown developers and letting them commit to your code has its challenges, especially if you are quality-conscious and / or trying to build a business. Reviewing commits and rejecting / merging them creates a lot of work. On top of that, from a legal perspective you need a watertight contributor license agreement (CLA) signed by anyone committing. There clearly is some work involved in the process, sometimes maybe more than it is worth.

Also consider this: Most successful open source projects that turned into a business success have limited contributors and / or only internal (contracted) contributors. For example, 99% of the SQLite code was written by Richard Hipp (author and founder of SQLite), and MongoDB stated that about 98-99% of its code was developed internally. Redis was almost exclusively coded by Salvatore Sanfilippo. In a presentation from Index Ventures (one of the most renowned open source VCs), one criterion for potentially successful open source businesses was that at least 90% of the code base was developed internally, and of course that the team owned all the IP. If you are after cheap development and external help with your project, maybe take a closer look at whether open source is the right path.

What open source business models exist? 

The following open source business models are common, but typically used in combination and not as pure models, e.g. most open source companies offer paid support, but rarely only paid support. Note: With time the examples may become wrong/outdated, because once you look into it, you will notice that companies adapt / change their model regularly. If you need to understand one specific company’s model you need to dig into it individually at that time.

Three basic types of open source licenses can be distinguished: permissive, weak copyleft and copyleft.

A quick high-level note on the major license effects

Copyleft – major point is that derived works must be open sourced with a compatible copyleft license, meaning any advancements and changes to the work will be contributed back to the community and freely available for unrestricted use.

Weak Copyleft – the weaker copyleft refers to licenses where not all derived works inherit the just described copyleft effect; typically used in software libraries, e.g. a database library used in app development, so the library can be used in a mobile app without needing to contribute the whole app to open source; only changes to the database library itself would carry the copyleft effect.

Permissive – a permissive open source license allows you to do anything with the source code, including keeping derived works to yourself and commercialising them.

| Model | Description | Examples | Note |
|---|---|---|---|
| Paid Support | Providing paid support, trainings, certificates | Red Hat | Where has this approach worked, as a pure paid support approach, ever since Red Hat? |
| Open Core | The core product is free and open source, extra features are paid; have an open-source core and sell closed-source features on top of it | SugarCRM, MySQL | It is basically the widely successful freemium model, just with open source; typically you expect the large majority of users to use it for free. The open source part of course enables anyone to build the same features as you |
| Dual Licensing | The free open source software uses a copyleft license, whereas the paid license is a commercial license without copyleft effects | MySQL, elastic | This kind of licensing enables you to monetize your commercial (typically bigger) users and still enables the community to expand the product landscape and innovate based on the code base |
| Delayed Open Source | All code will be fully open sourced with a time delay (details and timings vary) | MariaDB, Cockroach DB | The effect also depends on the licenses used, but typically it protects you from competition for a given time frame, so only you can exploit your development commercially and gain market share / develop an advantage based on market entry time. At the same time it reduces the risk for adopters, because they know the code will become available to them |
| Open SaaS | Offering the software open source and hosted as a service (SaaS), which is the primary source of revenue, while allowing anyone to do the same with the software under a permissive license (self-host or host for others) | WordPress, Sharetribe, MySQL, MariaDB | This model has been the major point of discussion in the last 3 years and is seen by many as the holy grail for monetizing open source software; it also triggered many companies to move away from an open source licensing model, as large cloud providers can easily host an open source product at better rates |
| “Closed SaaS” | Strictly speaking / officially not “open source”. Offering the solution open source and hosting it as a service (SaaS) while NOT allowing anyone else to host it, oftentimes unless they contribute the whole solution back to open source (copyleft effect) | MongoDB, elastic, Cockroach DB | The first license that built this specific copyleft effect into its terms was MongoDB’s (SSPL). The license has since been adopted by e.g. elastic, …. Since then similar licenses have been developed. OSI did not approve the license as an official open source license. |
| “Ad model” | For lack of a better name, I called it “Ad model”; it’s really having so much reach and traction that companies pay for customer access through your solution or similar co-operations | AdBlock Plus, Firefox | Can take many variations: For instance, the open-source application AdBlock Plus gets paid by Google for letting whitelisted acceptable ads bypass the browser ad remover. Or, in 2014 Yahoo struck a deal with the Mozilla Corporation to make Yahoo the default search engine in Firefox |

 

A look at the open source market

| Name | Founding Year | Funding Summary | Started with Open Source (license) | Open Source Evolution | Devtool | Open to contributions / CLA | HQ* | Notes / Story synopsis |
|---|---|---|---|---|---|---|---|---|
| MongoDB | 2007 | 6 funding rounds with a total of $311M; IPO was in autumn 2017; valuation $1.6B | started with AGPL | Created SSPL in 2018, causing much debate in the community. SSPL is not an open source license | Database | “we own 100% of the IP”; 99.9% developed in house and the few contributions accepted were from people who signed a CLA | US-based | According to statements from MongoDB, adoption went up after the license change (15 million downloads, more than in the prior 10 years together). In 2016 they launched their database-as-a-service offering, which is considered the game changer with regard to building a business. Until Oct 2017 MongoDB downloads were >30M with 10M from the prior 21 months. |
| Databricks | 2013 | Total funding $1.9B; last round: Series G; Feb 2021 $1B | proprietary PaaS | their main service is proprietary, but they use a lot of open source software and have a strong footprint in the open source community | Backend | NA | US-based | “Databricks is the original creator of some of the world’s most popular Open Source data technologies”: open source is a large part of their positioning and marketing. However, it seems their main offering, while based on open source, is proprietary. So, not an open source business as defined here. |
| elastic | predecessor released in 2004; first elasticsearch released in 2010; incorporation only in 2012 | Total funding $162M; last round was a Series D; elastic did its IPO in autumn 2018 | started with Apache 2 for elasticsearch (which was the original main product) | Last license change in 2021: You can now choose between the proprietary elastic license or SSPL; so strictly speaking not open source anymore | Devtool | CLA | US-based | 2018: elastic IPO; shares doubled on the first day. Note: With so many different products (not a single product company), the open source strategy is harder to grasp. |
| Confluent | 2011 | Total Funding Amount $455.9M, last round: Series E | Unlike Apache Kafka, which is available under the Apache 2.0 license, the Confluent Community License is not open source and has a few restrictions | Kafka is open source, Confluent isn’t | Devtool | NA | US-based | “Founded by the team that originally created Apache Kafka”: the team behind Confluent contributed a lot to open source prior to Confluent, but the Confluent code itself isn’t open source as far as we understand. They heavily rely on other open source software for their tech stack though. |
| RealmDB | 2011; before that the founders did “TightDB”, on which the Realm DB was based | 4 investment rounds. Then MongoDB acquired them for $39M on Apr 24, 2019 | started out closed; then open sourced the database and went for the open core model, then subsequently open sourced the Sync solution too, going for the hosted (SaaS) model | from closed to open core to open SaaS; acquired by Mongo to push their backend offerings and complement them with edge and sync (serving Mobile and IoT better) | Database | looks like they accepted contributions | Started in Europe, but HQ went to the US when joining YC in 2014; it has since been bought by MongoDB | The founders both left the company the year before it was acquired by MongoDB. The acquisition price was a little less than what Realm had raised in the years before. The Sync solution is now tied to using the Mongo servers / cloud and a huge part of their push for the IoT market. |
| SQLite | 2000 | Bootstrapped | Public Domain, which we always considered one of the most “open source” ways to open source, but in the light of recent discussions around the SSPL license, strictly speaking it is at least not OSI-approved | Public Domain, mainly monetize big corporates for being in a Consortium; also offers services and since xxxx? encryption (basically a paid feature); our guess is that this is not really a repeatable business model | Database | Richard Hipp owns all IP, 99% is developed by himself; very limited outside support (2 part-time freelancers that we are aware of, both don’t have any rights to the IP) | US-based (privately held by Hipp, Wyrick & Company, Inc (author: Richard Hipp and all stock held by his wife G. Wyrick; both work for the company)) | The company has always been and still is run by Richard Hipp and his wife; from a development perspective it is a one-man show. Richard wrote SQLite himself; as far as we are aware they have no other employees apart from 2-3 part-time supporters for specific versions; very special Open Source Story. |
| Couchbase Lite | 2009; Couchbase, Inc. is a merger of Membase + CouchOne in 02.2011; both former companies were started in 2009 and had funding | 251 million USD total funding; 8 rounds with latest Series G for $105 million | Apache 2 | Delayed Open Source | Database | | US-based (both entities were US-based already before the merger) | Couchbase now mainly sells Couchbase Servers; Couchbase Lite is the smallest part of their business; in 2020 there seemed to be a shift towards the Sync Gateway and Edge Computing market in communication; however, the main business still seems to be on the server side and based on cloud lock-in. |
| redis | 2009 | Total Funding Amount $246.6M | redis the database itself is and always was BSD; redislabs is the company that has secured certain rights for redis and sells extensions and add-ons under several licenses; they changed from AGPL to Apache 2.0 with Commons Clause to a proprietary license called “Redis Source Available License” | redis itself is BSD, but features / extensions around it from RedisLabs are licensed under proprietary licenses | Database | Any contribution needs a CLA that is provided by redislabs; we believe anything committed under this CLA could also be used in redislabs’ proprietary products (which typically is the same for anything committed under a permissive license, but which has attracted some criticism from the OSS community) | Redislabs is US-based. Salvatore Sanfilippo (antirez) was always based in Europe; redislabs originated in Israel | RedisLabs is the commercial entity that markets redis; redis was largely developed by Salvatore Sanfilippo. He left redis as a maintainer in 2020. |
| RedHat | 1993 | bought by IBM in 2019 for $34 billion; before that they had raised $240.7M | Linux, which was the core of the success of RedHat, is GPL (though of course not the company’s decision) | RedHat is a huge company, definitely not a single product company, and thus also does not really fit into this matrix; however, it is THE example for successful commercialisation of open source and we feel the matrix would be lacking without it | Backend / Data centric | we believe you can contribute to most (all?) projects without a CLA | US-based | Read here why there will never be another Red Hat (and there is no “Red Hat Model”). Note that of course the Red Hat founders did not write Linux (on which the majority of their success is based), but at the very least they (as well as VA Linux) gave option shares to Linus Torvalds out of gratitude (at least not out of obligation). When both companies successfully IPOed, Linus made 20 million USD (in total) from both sales. |
| MySQL | 1995 (development started already in 1994) | Total Funding Amount $39.8M, sold to Sun in 2008 for USD 1 billion | started out with AGPL; several license adaptations and changes in the open source business model over the years, e.g. for a long time they had a 2 year delay for the open source version, but changed that to no delay at some point. | Dual Licensing and Paid Support | Database | Yes, even though called OCA (Oracle Contributor Agreement) | Swedish company until it was acquired by Sun Microsystems in 2008 (who then were acquired by Oracle) | The founders forked the latest MySQL version when Oracle acquired it. Most of the original database code base was developed by Michael Widenius; with regard to database technologies a pattern emerges: Often the core / most of the base technology is developed by one person. As building a database is a rather huge endeavor, that’s kind of striking, isn’t it? BTW: MySQL is named after Monty Widenius’s daughter (“My”) |
| Hyper | 2010 (academic research project at TUM) | undisclosed | proprietary, not open source | None | Database | NA | EU-based; German “university spinoff” acquired by Tableau very early | 2016: HyPer acquired by Tableau. Terms of the deal were undisclosed |
| ParStream | 2011 | acquired by Cisco on November 3, 2015 | proprietary, not open source | NA | Database | NA | Originally EU-based (German), then moved to the US in 2012, acquired by Cisco in 2015 | Cisco ParStream is no longer offered as a stand-alone product. The functionality of Cisco ParStream is now part of Cisco Kinetic. |
| Cockroach DB | 2015 | Series E in Jan 2021 for $160M | Apache 2.0, plus a proprietary license for enterprise features | Started as open core, now a form of closed SaaS with delayed open source: They changed to a proprietary license in 2019, called BSL, which prohibits users from offering CockroachDB as a service (DBaaS, SaaS), and each release converts to an open source license after three years. CockroachDB is therefore officially not considered open source anymore | Database | CockroachDB received significant contributions from the community (“we have had over 1590 commits from over 320 external contributors across all our open source repositories” (2020)), CLA: Yes | US-based | In June 2019, Cockroach Labs announced that CockroachDB would change its license from the free software license Apache License 2.0 to their own proprietary license, known as the Business Source License (BSL), which forbids “offer[ing] a commercial version of CockroachDB as a service without buying a license”, while remaining free for community use. |
| Berkeley DB | 1994 | Acquired by Oracle in 2006 | BSD and Sleepycat Public License (a permissive OSS license) | Oracle changed to dual licensing with AGPL and a commercial license | Database | NA | US-based | It is still used in many routers and our gut feeling is that the market share in that specific area is good. Unfortunately, no numbers are available. |
| GitHub | 2008 | In 2018 Microsoft bought GitHub for $7.5 billion. | proprietary, not open source | NA | Backend / Data centric | NA | US-based | Microsoft bought GitHub for the developer access; that would not have changed if it had been open source, and I do wonder what would have happened to GitHub if it had been open source; one thing is for sure: GitLab wouldn’t have been able to position themselves as the open source alternative; however, the closed source model worked well for them, even though it is a developer tool. |
| GitLab | development started in 2011; incorporated only in 2014 | $434.2M Series E | completely open source (MIT license) | Now: Open Core Model; Community Edition: MIT License; Enterprise Edition: Source-available proprietary software | Backend / Data centric | Originally CLA, now dropped; instead the code must be committed under the same license as the feature (mainly Apache 2.0) plus a DCO | US-based (development was started in Europe, the founders incorporated in the US in 2014 when joining YC) | GitLab used being open source as a strong positioning factor against GitHub (which was never open source). It was an odyssey to find a sustainable business model (and it seems it is not SaaS). Note: The pure service model and the donation model did not work for them. Again: The code base of the core system was by and large developed by one person. |
| MariaDB | 2009 | Total Funding Amount $123.2M | Dual licensing with GPL license, version 2, and a proprietary source available license for some parts | They evolved their dual licensing approach to using the proprietary source available license (BSL) | Database | Yes, and the CLA is shared under a creative commons license that allows you to use it as you like https://mariadb.com/kb/en/mca/ | Swedish company | 10 years after it was forked, MariaDB has 20M users, a fast growing database business and >€100m backing. Note: The pure service model as well as the donation model did not work for them. |

Building an Open Source business Exec Summary – TL; DR

  • There is a lot of evidence that open source companies struggle with open source models and licenses – this is also true for successful companies
  • There is no “Red Hat Model” – just selling services has rarely worked
  • The donation model typically hasn’t worked for open source companies, e.g. GitLab and MariaDB, so it is not astonishing that GitHub sponsorships don’t work out great for most maintainers. Also note: GitHub sponsorships may put you in a bad legal position depending on where you are based
  • There is a trend from successful open source companies towards Source Available licenses instead of “official Open Source licenses”, e.g. MongoDB, elastic, CockroachDB, …
  • There is an indication that successful open source companies are US-based (even if founded / started in Europe), which we believe is due to the funding opportunities provided in the US: 1) the US generally provides more funding (more and bigger funding opportunities; there is lots of market research on that), 2) US VCs and Silicon Valley have a reputation for also funding at earlier stages, e.g. idea stage, and for funding companies with traction (instead of revenue), investing with a long-term perspective. Traditionally, European investors don’t.
  • Public domain is strictly speaking also not considered to be an open source license 😮 (at least not if it needs OSI-approval; does it? 🤔) 
  • While Open and Closed SaaS seem at this moment to have been the most successful models, they are no holy grail and definitely do not work for everyone, e.g. SaaS didn’t work as the sole business model for GitLab

Conclusion

The open source market lacks flexibility and transparency from a licencing / legal perspective, and ever more Source Available licenses don’t help: A “license stack” with building blocks like the Creative Commons would be helpful to mark software easily and clearly with regards to the main terms, e.g. “source available”, “free for commercial use”, “attribution necessary” etc. It would help maintainers and users alike, but needs bigger entities to drive this (like an OSI).

The open source market also needs more balance, at the very least more understanding and “love” towards maintainers. More financial support, as well as other ways of giving back to demonstrate appreciation of well-maintained repos and great free software, will keep the ecosystem healthy and thriving. That’s a community effort; everyone can contribute.

Flutter databases –  hive, ObjectBox, sqflite (+ Drift, floor)


Flutter, the renowned cross-platform mobile framework, has been gaining immense popularity among developers worldwide. As the Flutter community expands, the demand for efficient Flutter databases is also increasing. Developers now have access to a range of Flutter database options that cater to various needs and preferences.

In this article, we’ll focus specifically on local storage solutions, as these are essential for enabling offline functionality, improving performance, ensuring data persistence, enhancing data privacy and security, and supporting edge computing capabilities. Furthermore, local data storage is needed to promote sustainability. Let’s dive into the current local database landscape for Flutter and compare the most popular options.

Flutter databases / Flutter Dart data persistence

While the database market is huge and dynamic, there are only a few options to choose from if you are a Flutter / Dart app developer. Before we dive into the Flutter database options and their advantages and disadvantages, we’re taking a very quick look at databases to make sure we share a common ground.

What is a database?

A database is a piece of software that allows the storage and systematic use of digital information, in other words: data persistence. As opposed to mere caching, data is reliably stored and available to work with unless actively deleted. A database typically allows developers to store, access, search, update, query, and otherwise manipulate data in the database via a developer language or API. These types of operations are done within an application, in the background, typically hidden from end users. Many applications need a database as part of their technology stack. The most typical database operations are CRUD: Create, Read, Update, Delete.
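To make the four CRUD operations concrete, here is a minimal sketch using Python’s built-in sqlite3 module with an in-memory database. The `notes` table and its columns are made up for illustration; other databases expose the same four operations through their own APIs.

```python
import sqlite3

# In-memory database; the "notes" table is a made-up example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, text TEXT)")

# Create: insert a row and remember its generated id.
note_id = db.execute("INSERT INTO notes (text) VALUES (?)", ("buy milk",)).lastrowid

# Read: fetch the row back by id.
row = db.execute("SELECT id, text FROM notes WHERE id = ?", (note_id,)).fetchone()

# Update: change the stored text, then read it again.
db.execute("UPDATE notes SET text = ? WHERE id = ?", ("buy oat milk", note_id))
updated = db.execute("SELECT text FROM notes WHERE id = ?", (note_id,)).fetchone()[0]

# Delete: remove the row; the table is empty afterwards.
db.execute("DELETE FROM notes WHERE id = ?", (note_id,))
remaining = db.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```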

What are the major types of databases?

There are many types of databases. For our purpose, the most important differentiations are non-relational (NoSQL) versus relational databases (SQL), cloud databases versus edge databases, and maybe embedded versus in-memory. However, databases can be further distinguished by additional criteria e.g. the data types they support, or the way they scale – and definitions can vary.

What is an ORM?

An object-relational mapper (ORM) is not a database. We’re bringing this up mainly because we often see the two confused. It is a layer that sits on top of a database and makes it easier to use. This is especially relevant when the database is a relational database (SQL) and the programming language used is object-oriented. Dart is an object-oriented programming language.
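As a rough illustration of what such a layer does, the following sketch maps a plain object to a SQL row and back. It uses Python and its built-in sqlite3 module purely for brevity; `Task` and `TaskMapper` are hypothetical names, and real ORMs such as Drift or Floor generate this kind of mapping code rather than having you write it by hand.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    text: str
    done: bool = False
    id: Optional[int] = None  # assigned by the database on first save


class TaskMapper:
    """Toy object-relational mapper: translates Task objects to and from SQL rows."""

    def __init__(self, db: sqlite3.Connection):
        self.db = db
        db.execute(
            "CREATE TABLE IF NOT EXISTS task "
            "(id INTEGER PRIMARY KEY, text TEXT, done INTEGER)"
        )

    def save(self, task: Task) -> Task:
        if task.id is None:
            # Object -> row: a new object becomes an INSERT.
            cur = self.db.execute(
                "INSERT INTO task (text, done) VALUES (?, ?)",
                (task.text, int(task.done)),
            )
            task.id = cur.lastrowid
        else:
            # An already stored object becomes an UPDATE of its row.
            self.db.execute(
                "UPDATE task SET text = ?, done = ? WHERE id = ?",
                (task.text, int(task.done), task.id),
            )
        return task

    def get(self, task_id: int) -> Optional[Task]:
        # Row -> object: rebuild a Task from the column values.
        row = self.db.execute(
            "SELECT id, text, done FROM task WHERE id = ?", (task_id,)
        ).fetchone()
        if row is None:
            return None
        return Task(id=row[0], text=row[1], done=bool(row[2]))
```

The application code only ever sees `Task` objects; the SQL statements stay inside the mapper. An object database skips this translation step because it stores the objects directly.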

The Flutter local data persistence landscape

There are several Flutter databases that provide offline support, offering the ability to store and access data locally even without an internet connection. Here are some of the notable options:

  • Hive is a lightweight key-value database written in Dart for Flutter applications, inspired by Bitcask.
  • ObjectBox DB is a highly performant lightweight NoSQL database with an integrated Data Sync. It stores objects.
  • sqflite is a wrapper around SQLite, which is a relational database without direct support for Dart objects. 
  • Drift is a reactive persistence library for Flutter and Dart, built on top of SQLite.
  • Floor is another ORM on top of SQLite.
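For readers unfamiliar with Bitcask, the design that inspired Hive, the core idea is an append-only log combined with an in-memory index that points at the latest value of each key. The sketch below shows that idea in Python; it is not Hive’s implementation, `TinyLogStore` is a hypothetical name, and an in-memory buffer stands in for the file on disk.

```python
import io
import struct


class TinyLogStore:
    """Toy Bitcask-style store: append-only log plus an in-memory key index."""

    HEADER = struct.Struct("<II")  # key length, value length (little-endian uint32)

    def __init__(self):
        self.log = io.BytesIO()  # stands in for an append-only file on disk
        self.index = {}          # key -> (offset of value, length of value)

    def put(self, key: str, value: bytes) -> None:
        k = key.encode("utf-8")
        self.log.seek(0, io.SEEK_END)  # never overwrite, always append
        start = self.log.tell()
        self.log.write(self.HEADER.pack(len(k), len(value)) + k + value)
        # The index always points at the newest record for a key.
        self.index[key] = (start + self.HEADER.size + len(k), len(value))

    def get(self, key: str):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        self.log.seek(offset)  # a single seek and read per lookup
        return self.log.read(length)
```

Overwriting a key leaves the old record in the log, which is why stores of this kind need an occasional compaction step; that part is omitted here.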

 

What is the best offline Flutter Dart database?

This of course depends… Make up your own mind with the following comparison matrix as a starting point. Note: With very few options to choose from, the following overview is sometimes a bit like comparing apples 🍎 and pears 🍐.

| Name | Description | Primary Model | Language | License | Data Sync |
|---|---|---|---|---|---|
| Hive | Lightweight key-value database | NoSQL | Dart | Apache 2.0 | |
| ObjectBox | Lightweight NoSQL database with integrated Data Sync | NoSQL | Dart | Bindings are Apache 2.0 | |
| Drift | ORM on top of SQLite | relational | SQL | SQLite is public domain, Drift is MIT | |
| Floor | ORM on top of SQLite | relational | SQL | SQLite is public domain, Floor is Apache 2.0 | |
| sqflite | SQLite plugin for Flutter | relational | SQL | SQLite is public domain, sqflite lib is MIT | |

Flutter Database performance benchmarks

As with any benchmark, you need to take a look at the details. We take benchmarking very seriously and strive to get accurate results. Therefore, we also always open source the benchmarking code and encourage you to check it out. If you notice anything that does not add up in your opinion, do let us know. We have a long history of continually updating and improving our benchmarks and are happy to take any recommendations.

Performance Benchmark Test Setup

We used an Android 10 device with a Kirin 980 CPU to run the benchmarks as a Flutter app. The app executed all operations (ops) in batches of 10,000 objects. Each batch formed a single transaction. We ran each test 50 times. The results you see in the diagram are averages across all runs. We set it up that way to ensure that neither the Virtual Machine warmup during the first run nor garbage collections affect the overall result significantly.
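The measurement scheme described here (fixed batch size, repeated runs, averaged result) can be sketched in a few lines. This is not the actual benchmark code, which is a Flutter app; `benchmark` and `fake_insert` are hypothetical names, and averaging the per-run rates rather than the per-run times is an assumption.

```python
import time
from statistics import mean


def benchmark(operation, batch_size=10_000, runs=50):
    """Run `operation(batch_size)` `runs` times and return average objects/second.

    `operation` is a hypothetical callable that processes one batch
    (for example, inserts `batch_size` objects in a single transaction).
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        operation(batch_size)
        # Guard against a zero reading on very coarse timers.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(batch_size / elapsed)
    # Averaging over many runs dampens one-off effects such as warmup or GC pauses.
    return mean(rates)


def fake_insert(count):
    """Stand-in workload: append `count` small objects to a list."""
    store = []
    for i in range(count):
        store.append({"id": i})
```

Calling `benchmark(fake_insert)` returns a throughput in objects per second for the stand-in workload; a real benchmark would pass a function that writes one batch to the database under test.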

Flutter Databases CRUD Performance Results

Summary of the Flutter Dart DB Benchmarks

Hive and ObjectBox clearly outperform sqflite across all CRUD operations. The results show ObjectBox performing create and update operations up to 70 times faster. When comparing Hive and ObjectBox, the results vary more. Hive can be faster at reading objects than ObjectBox. However, strictly speaking it’s not a fair comparison, because in Hive the high read numbers result from Dart objects already cached in memory. If the objects are fetched from disk using the async API, the numbers drop by a factor of 1000.

Drift and Floor were not part of the benchmarking as they are ORMs. However, it is very likely they will perform similarly to sqflite, reflecting primarily the performance of SQLite.

Flutter Data persistence – Conclusion

Recently, the Flutter database landscape has experienced significant growth and diversification. With Flutter’s increasing popularity, developers now have a number of database options available. In this article, we focused on the best local databases, comparing their features in a comprehensive matrix and showcasing performance benchmarks. In the end, the best choice depends on the specific needs of each project. The Flutter database landscape in 2023 is a thriving ecosystem, continuously evolving to meet the changing needs of Flutter app development. One upcoming change that we can see is the rise of vector databases for AI. So, we encourage you to keep an eye on the lively market of Flutter databases so you don’t miss any important updates.

If you want to get started learning how to use a database, we suggest you check out this video tutorial series that teaches you how to build a Flutter app with ObjectBox from scratch.

 

Introducing: ObjectBox Generator, plus C++ API [Request for Feedback!]


We are introducing the ObjectBox Generator today to simplify ObjectBox development for more programming languages, starting with C/C++. Additionally, we are releasing a brand new C++ API that goes hand in hand with the new generator. Historically, our C API was rather low level as it was focused on providing the foundation for our Swift and Go APIs. With this release we want to provide C/C++ developers with ObjectBox convenience and ease of use. 

ObjectBox Generator takes over the burden of writing the binding code and data model declaration. Based on a single input file, it generates the code for you, so you can focus on the actual application logic.

Generator Example

ObjectBox lets you handle data as FlatBuffers. For example, you can put and get data objects as FlatBuffers-encoded bytes. To work with FlatBuffers, you need to define a FlatBuffers schema file (.fbs). This file is also the input for ObjectBox Generator. This way, everything is defined in a single location.

Let’s say we have a FlatBuffers schema file “task.fbs” with the following content:

Now, we can tell ObjectBox Generator to use this file to generate C++ sources:

This makes ObjectBox Generator generate the following files:

  • objectbox-model.h: source code to build the internal data model, that you need to pass when creating a store.
  • objectbox-model.json: keeps track of internal schema IDs; you don’t need to worry about this except that you should put it in your source control.
  • task-cpp.obx.h: the C++ value structs (data objects), binding code for FlatBuffers and the new Box class.
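To give a feel for what “generating value structs from a schema” means, here is a toy sketch written in Python. It is not the ObjectBox Generator and produces far less than the files listed above; the function name, the type mapping, and the example schema are assumptions made for illustration only.

```python
import re

# Assumed mapping from a few FlatBuffers scalar types to C++ types.
CPP_TYPES = {"ulong": "uint64_t", "string": "std::string", "bool": "bool", "int": "int32_t"}


def generate_struct(schema: str) -> str:
    """Turn one `table Name { field: type; ... }` definition into a C++ struct."""
    match = re.search(r"table\s+(\w+)\s*\{(.*?)\}", schema, re.S)
    if match is None:
        raise ValueError("no table definition found")
    name, body = match.group(1), match.group(2)
    lines = [f"struct {name} {{"]
    for field, ftype in re.findall(r"(\w+)\s*:\s*(\w+)\s*;", body):
        lines.append(f"    {CPP_TYPES[ftype]} {field};")
    lines.append("};")
    return "\n".join(lines)


# Hypothetical schema for illustration; not necessarily the article's task.fbs.
EXAMPLE_SCHEMA = """
table Task {
    id: ulong;
    text: string;
    date_created: ulong;
}
"""
```

Running `print(generate_struct(EXAMPLE_SCHEMA))` emits a `struct Task` with one member per schema field. The real generator additionally emits the binding code and the data model, and tracks schema IDs in objectbox-model.json.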

C++ API Example

Now, let’s use the previously generated code and the new C++ API around the Store and Box classes. A simple CRUD application boils down to a few lines:

Note that the generated code is header-only and compatible with the existing ObjectBox C-API, allowing both to be used from the same application. The C and C++ APIs both have unique advantages: the C++ API uses RAII so you do not need to worry about cleaning up, while the C API has additional features, e.g. queries.

Open Source, Docs

ObjectBox Generator is open source and available on GitHub. The repository comes with a readme file that also serves as documentation. Among other things, you will find ObjectBox-specific annotations there, which are used in fbs files to express ObjectBox-specific concerns. For example, in the definition of Task above, we used ulong as a FlatBuffers type to store dates. However, FlatBuffers does not know what a date is, so we use ObjectBox annotations to express this:

For our initial release of ObjectBox Generator and the public C++ API, we decided to label it version 0.9. Although we are already very close to a 1.0, we wanted to gather some feedback before our first major release. As we can still change the API or smooth out any rough edges you may find, we cannot stress enough how much we welcome and appreciate your feedback at this point. Thank you!

ObjectBox EdgeX v1.1 – database with ARM32 support


With EdgeX Foundry just reaching v1.1, we continue to provide ObjectBox as an embedded high-performance database backend so you can start using ObjectBox EdgeX v1.1 right away. If you need data reliability and high-speed database operations, ObjectBox is for you. Additionally, starting with ObjectBox EdgeX 1.1, you can use it on 32-bit ARM devices.

Combining the speed and size advantages of ObjectBox on the EdgeX platform, we empower companies to analyze more data locally on the machine, enabling new use cases.

With ObjectBox-backed EdgeX we’re bringing the efficiency, performance and small footprint of the ObjectBox database to all EdgeX applications. It is fully compatible, so you can use it as a drop-in replacement: you call against the same REST and Go EdgeX APIs. As simple as that; no need to change any code.

Performance comparison of EdgeX database backends

EdgeX Foundry comes with a choice of two database engines: MongoDB and Redis. ObjectBox EdgeX brings a third option to the table. Because ObjectBox is an embedded database, optimized for high speed and ease of use while also delivering data reliability, it enables a new set of use cases. As we all know, benchmarks are hard to do. This is why all our benchmarks are open source and we invite you to check them out for yourself. To give you a quick impression of how you could benefit from using ObjectBox, let’s have a look at how each database compares in basic operations on “Device Readings”, one of the most performance-intensive data points.

ObjectBox EdgeX

Note: All CRUD (Create, Read, Update, Delete) operations are measured in objects per second. The benchmarks test the performance of the internal EdgeX database layer, not the throughput of the REST APIs.

These benchmarks give a good indication of why you should consider ObjectBox with EdgeX. The benchmark sources are publicly available in the ObjectBox EdgeX GitHub repo.

So, why is ObjectBox EdgeX faster?

First of all, you are probably aware of the phrase “Lies, damned lies, and statistics”, which applies just as well to benchmarks. Of course, you should look at performance for yourself and judge it against the needs of your specific use case. That’s why we make our benchmarks available as open source. It is a good starting point.

To make it easier to assess ObjectBox (in addition to our open source benchmarks), here are some of the high-level “unfair advantages” that make ObjectBox fast:

  • Objects: As you can derive from its name, ObjectBox is all about objects. It’s highly optimized for persisting objects. The EdgeX architecture and Go sources are a great fit here, as they put Go’s objects (structs) at the center of the interface. This means we can omit costly transformations, which helps with speed.
  • Embedded database: Redis and MongoDB are client/server databases running in separate processes. ObjectBox, however, runs in the same process as EdgeX itself (each EdgeX microservice, to be precise). This has definite efficiency advantages, but it also comes with a restriction: whereas you can put Redis/MongoDB in separate Docker containers or on separate machines, this option is not available for ObjectBox yet.
  • Transaction merging: ObjectBox can execute individual write operations in a common database transaction. This reduces the number of costly transactions needed for a series of write operations. It is a great way to gain performance, at the price of delaying the transaction end by single-digit milliseconds.

Get started with ObjectBox EdgeX

The simplest way to get started is to fetch the latest docker-compose.yml and start the containers:

You can check the status of your running services by going to http://localhost:8500/. At this point, you have the REST services running at their respective ports, available to access from your EdgeX applications.
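As a quick check from Go, you could ping one of the services. The following is a minimal sketch under assumptions that are not taken from this article: it assumes core-data listens on port 48080 and exposes the EdgeX v1 route `/api/v1/ping`. The helper names `pingURL` and `ping` are made up for this example; adjust host, port and route to match your docker-compose.yml.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// pingURL builds the ping endpoint for a service base URL.
// The "/api/v1/ping" route is an assumption (EdgeX v1 REST API).
func pingURL(base string) string {
	return strings.TrimRight(base, "/") + "/api/v1/ping"
}

// ping calls the ping endpoint and returns the trimmed response body.
func ping(client *http.Client, base string) (string, error) {
	resp, err := client.Get(pingURL(base))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1024))
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	return strings.TrimSpace(string(body)), nil
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	// Port 48080 for core-data is an assumed default, not taken from the article.
	reply, err := ping(client, "http://localhost:48080")
	if err != nil {
		fmt.Println("core-data not reachable:", err)
		return
	}
	fmt.Println("core-data replied:", reply)
}
```

Because ObjectBox EdgeX is a drop-in replacement, a check like this should behave the same regardless of which database backend the service uses.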

Find more details, instructions for ARM32, and sources in our GitHub repo at https://github.com/objectbox/edgex-objectbox.

If you’re new to EdgeX, find out all about the open source IoT Edge Platform here. The EdgeX project is led by the Linux Foundation and supported by many industry players, including Dell, IBM, and Fujitsu.

We love to hear from you

We’re very interested to hear about the challenges you are facing on the edge and in IoT. As performance experts, we always embrace a tough challenge. Reach out to us to set up a pilot project.

Is there something you are missing? Please do reach out to us. We want to make ObjectBox the best edge data persistence layer available, and we love to receive your feedback.

What next?

To find out more about ObjectBox EdgeX and get started, go directly to GitHub or download the snap from Snapcraft.