SQLite and SQLite alternatives – a comprehensive overview

SQLite and SQLite alternatives - databases for the Mobile and IoT edge

Overview of SQLite and SQLite alternatives as part of the mobile / edge database market with a comprehensive comparison matrix (last updated autumn 2024)

Digitalization is still on the rise, as is the number of connected devices (over 13 billion connected IoT devices and 15 billion mobile devices were already in operation in 2021). Data volumes are growing accordingly (around 3.5 quintillion bytes of data were produced daily in 2023), and centralised (typically cloud-based) computing cannot support all the current needs. This has led to a shift from the cloud to the edge.

Therefore, there is a renewed need for on-device databases like SQLite and SQLite alternatives to persist and manage data on edge devices. On top of that, due to the distributed nature of the edge, there is a need to manage data flows to, from, and between edge devices. This can be done with Edge Databases that provide Data Sync functionality (SQLite alternatives only, as SQLite doesn't support this). Below, we'll take a close look at SQLite and its alternatives with today's needs in mind.

Databases for the Edge

While it is quite an established market with many players, the database market is still growing consistently and significantly. The reason is that databases are at the core of almost any digital solution and directly impact business value, so they never go out of fashion. As the tech industry evolves rapidly, databases evolve too, which in turn yields new database types and categories. We have seen the rise of NoSQL databases over the last 20 years, and more recently novel database technologies such as graph databases, time-series databases, and vector databases.

With AI, and accordingly vector databases, being all the hype since 2022/2023, the database market is indeed experiencing fresh attention. Due to the speed with which AI is evolving, however, we're already leaving the "mainframe era of AI" and entering the distributed Edge AI space. Since SQLite supports neither vector search nor related vector database functions, this adds a new dimension to this ever-present topic. There is a need for local, on-device vector databases to support on-device AI that is independent of an Internet connection, reliably fast, and keeps data on the device (100% private).

We expect vector databases that run locally on a wide variety of devices (aka Edge Vector Databases) to become the next big thing, surpassing even what we have seen in the server vector database space. And we wouldn't be astonished if synchronizing vector data turns out to be a game changer for Edge AI. Time will tell 😉


Both the shift back from a centralised towards a decentralised paradigm and the growing number of restricted devices call for a "new type" of an established database paradigm. SQLite has been around for more than 20 years, and for good reason, but the current shift back to decentralized computing is happening in a new environment with new requirements. Hence the need for a "new" database type based on a well-established one: "Edge Databases". Accordingly, there is a need for SQLite alternatives that take decentralized data flows and AI functionalities into account (depending on the use case, of course; after all, SQLite is a great database).

Database evolution towards Edge Vector Databases
What is an Edge Database?

Edge databases are a type of database optimised for local data storage on restricted devices, such as embedded, mobile, and IoT devices. Because they run on-device, they need to be especially resource-efficient (e.g. with regard to battery use, CPU consumption, memory, and footprint). The term "edge database" is becoming more widely used every year, especially in the IoT industry. In IoT, the difference between cloud-based databases and ones that run locally (and therefore support Edge Computing) is crucial.

What is a Mobile Database?

We look at mobile databases as a subset of edge databases that run on mobile devices. The difference between the two terms lies mainly in the supported operating systems / types of devices. Unless Android and iOS are supported, an edge database is not really suited for the mobile device / smartphone market. In this article, we use the term "mobile database" only as "a database that runs locally on a mobile (edge) device and stores data on the device". Therefore, we also refer to it as an "on-device" database.

What are the advantages and disadvantages of working with SQLite?

SQLite is a relational database and clearly the most established database suitable for running on edge devices. Moreover, it is probably the only "established" mobile database. It was designed in 2000 by Richard Hipp and has been embedded in iOS and Android since their beginning. Now let's have a quick look at its main advantages and disadvantages:

Advantages:

  • 20+ years old (should be stable ;))
  • Toolchain, e.g. DB Browser
  • No dependencies; included with Android and iOS
  • Developers can define exactly the data schema they want
  • Full control, e.g. handwritten SQL queries
  • SQL is a powerful and established query language, and SQLite supports most of it
  • Debuggable data: developers can grab the database file and analyse it

Disadvantages:

  • 20+ years old (less state-of-the-art tech)
  • Using SQLite means a lot of boilerplate code and thus inefficiencies (maintaining long-running apps can be quite painful)
  • No compile-time checks (e.g. of SQL queries)
  • SQL is another language to master, and SQL usage can impact your app's efficiency / performance significantly
  • The performance of SQLite can be unreliable
  • SQL queries can get long and complicated
  • Testability (how do you mock a database?)
  • Especially when database views are involved, maintainability may suffer with SQLite

 

What are the SQLite alternatives?

There are a bunch of options for making your life easier if you want to stick with SQLite. You can use an object abstraction on top of it, an Object-Relational Mapper (ORM) such as greenDAO, to avoid writing lots of SQL. However, you will typically still need to learn SQL and SQLite at some point. So what you really want is a full-blown database alternative, like any of these: Couchbase Lite, InterBase, LevelDB, ObjectBox, Oracle Berkeley DB, MongoDB Realm, SnappyDB, SQL Anywhere, or UnQLite.

While SQLite really is designed for small devices, people do run it on servers and in the cloud too. Actually, any database that runs efficiently locally will also be highly efficient on big servers, making such databases a sustainable, lightweight choice for some scenarios. However, for server / cloud databases there are plenty of alternatives you can use as a replacement, e.g. MySQL, MongoDB, or Cloud Firestore.

Bear in mind that, if you are looking to host your database in the cloud with apps running on small distributed devices (e.g. mobile apps, IoT apps, or any apps on embedded devices), there are some difficulties. Firstly, this results in higher latency, i.e. slow response rates. Secondly, offline capabilities will be highly limited or absent. As a result, you might have to deal with increased networking costs, which is reflected not only in dollars but also in CO₂ emissions. On top of that, it means all the data from all the different app users is stored in one central place, so any kind of data breach affects all your and your users' data. Most importantly, you will likely be giving your cloud / database provider rights to that data (consider reading the general terms diligently). If you care about privacy and data ownership, you might therefore want to consider a local database option, as in an Edge Database. This way you can decide, and possibly limit, what data you sync to a central instance (like the cloud or an on-premise server).

SQLite alternatives Comparison Matrix

To give you an overview, we have compiled a comparison table including SQLite and SQLite alternatives. In this matrix we look at databases that we believe are apt to run on edge devices; our rule of thumb is the database's ability to run on Raspberry Pi-sized devices.

Edge Database Short description License / business model Android / iOS* Type of data stored Central Data Sync P2P Data Sync Offline Sync (Edge) Data level encryption Flutter / Dart support Vector Database (AI support) Minimum Footprint size Company
SQLite C programming library; probably still 90% market share in the small devices space (personal assumption) Public domain embedded on iOS and Android Relational No No No No, but option to use SQLCipher to encrypt SQLite Flutter plugins (ORMs) for SQLite, but nothing from Hwaci No, but various early & unofficial extensions are available < 1 MB Hwaci
Couchbase Mobile / Lite Embedded / portable database with P2P and central synchronization (sync) support; pricing upon request; some restrictions apply for the free version. Secure SSL. Partly proprietary, partly open source; Couchbase Lite is BSL 1.1 Android / iOS JSON Documents / NoSQL db Yes Yes No Database encryption with SQLCipher (256-bit AES) Unofficial Flutter plugin for Couchbase Lite Community Edition No < 3.5 MB Couchbase
InterBase ToGo / IBLite Embeddable SQL database. Proprietary Android / iOS Relational No No No 256 bit AES strength encryption No No < 1 MB Embarcadero
LevelDB Portable lightweight key-value store, NoSQL, no index support; benchmarks from 2011 have been removed unfortunately New BSD Android / iOS Key-value pairs / NoSQL db No No No No Unofficial client that is very badly rated No < 1 MB LevelDB Team
LiteDB A .NET embedded NoSQL database MIT license Android / iOS (with Xamarin only) NoSQL document store, fully written in .NET No No No Salted AES No No < 1 MB LiteDB team
Realm DB Embedded object database Apache 2.0 Android / iOS Object Database Deprecated No Deprecated Yes Yes No 5 MB+ Acquired by MongoDB in 2019; Data Sync deprecated in 2024; DB still available as open source, but not maintained
ObjectBox NoSQL Edge Vector Database with out-of-the-box Data Sync for Mobile and IoT; fully ACID compliant; benchmarks available as open source. Open Core (plus Apache 2.0 bindings) Android / iOS / Linux / Windows / any POSIX Object-oriented NoSQL edge database for high performance on edge devices in Mobile and IoT Yes WIP Yes Transport encryption; additional encryption upon request Yes First local vector database for on-device Edge AI released May 2024 < 1 MB ObjectBox
Oracle Database Lite Portable with P2P and central sync support as well as support for sync with SQLite Proprietary Android / iOS Relational Yes Yes No 128-bit AES standard encryption No No < 1 MB Oracle Corporation
SQL Anywhere Embedded / portable database with central sync support with a stationary database; pricing now available here Proprietary Android / iOS Relational Yes, tied to using other SAP tech though (we believe) No No AES-FIPS cipher encryption for full database or selected tables No No n/a SAP (originally Sybase)
UnQLite Portable lightweight embedded db; self-contained C library without dependency. 2-Clause BSD Android / iOS Key-value pairs / JSON store / NoSQL db No No No 128-bit or 256-bit AES standard encryption not yet; might be coming though; there was a 0.0.1 released some time ago No ~ 1.5 MB Symisc systems
extremeDB Embedded relational database Proprietary iOS In-memory relational DB, hybrid persistence No No No AES encryption No No < 1 MB McObject LLC
redis DB High-performance in-memory Key Value store with optional durability Three clause BSD license, RSAL and Proprietary No K/V in-memory store, typically used as cache No No No TLS/SSL-based encryption can be enabled for data in motion. Unofficial redis Dart client available No on-device vector database, but cloud vector support An empty instance uses ~ 3MB of memory redislabs (the original author of redis left in 2020)
Azure SQL Edge Designed as a SQL database for the IoT edge; however, due to its footprint it is no Edge Database Proprietary No Relational DB for IoT No No No Will provide encryption No Not on-device 500 MB+ Microsoft

If you are interested in an indication of the diffusion rate of databases, check out the following database popularity ranking: http://db-engines.com/en/ran. If you are interested in learning more about SQLite, there is a great podcast interview with Richard Hipp that is worth listening to.

Is there anything we've missed? What do you agree and disagree with? Please share your thoughts with us via Twitter or email us at contact[at]objectbox.io.

Make sure to check out the ObjectBox Database & try out ObjectBox Sync. You can get started in minutes, and it's perfect if you are using an object-oriented programming language, as it empowers you to work with your objects within the database. More than 1,000,000 developers already use this Edge Database designed specifically for high performance on small, connected, embedded devices.

In-Memory Database Use Cases

ObjectBox was a purely disk-based database until now. Today, we added in-memory storage as a non-persistent alternative. This enables additional use cases requiring temporary in-process data. It's also great for testing.

ObjectBox In-Memory Database

Disk + In-memory: simply use the best of both worlds

When opening a new database, you can now choose whether the database is stored on disk or in memory. Because this is a per-database option, it is possible to use both types in your application. It's very simple to use: when opening the store, instead of providing an actual directory, provide a pseudo-directory as a string with the prefix "memory:". After the prefix, you pick a name for the database to address it, e.g. "memory:myApp".
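A minimal sketch in Java of what this could look like, assuming the generated MyObjectBox class is present; whether your binding takes the pseudo-directory through its regular directory option or offers a dedicated in-memory builder method may differ per language, so check the docs for your binding:

```java
import io.objectbox.BoxStore;
import java.io.File;

public class StoreFactory {

    // Disk-based store: data survives app restarts.
    public static BoxStore openOnDisk(File dbDirectory) {
        return MyObjectBox.builder().directory(dbDirectory).build();
    }

    // In-memory store: instead of a real path, pass a pseudo-directory with the
    // "memory:" prefix plus a name of your choice. Nothing is written to disk;
    // the data is gone once it is explicitly deleted or the process exits.
    public static BoxStore openInMemory(String name) {
        return MyObjectBox.builder().directory(new File("memory:" + name)).build();
    }
}
```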

Note: in-memory databases are kept after closing a store; they have to be explicitly deleted, or they are automatically deleted when the creating process exits.

So, what are typical in-memory database use cases?

Caching and temporary data

If data is short-lived, it may not make sense to involve the disk with persistent storage. Unlike programming language containers like maps and hash tables, caches built on in-memory databases have advanced querying capabilities and support complex object graphs. For example, databases allow lookups by more than one key (e.g. ID, name, and URL), or deleting certain entries using a query. As ObjectBox is closely integrated with programming languages, putting and getting an object are typically just "one-liners", similar to map and hash table containers.
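For illustration, a hedged Java sketch with a hypothetical CacheEntry entity (the entity class and its constructor are assumptions, not from this post):

```java
import io.objectbox.Box;
import io.objectbox.BoxStore;

public class CacheExample {

    // Assumes a generated entity class CacheEntry with an @Id field and a
    // (url, payload) constructor, plus a store opened as shown above.
    static void demo(BoxStore store) {
        Box<CacheEntry> cache = store.boxFor(CacheEntry.class);

        long id = cache.put(new CacheEntry("https://example.com", "payload")); // insert: one-liner
        CacheEntry hit = cache.get(id);                                        // lookup by ID: one-liner
        cache.remove(id);                                                      // evict a single entry
    }
}
```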

Bringing "online-only" and "offline-first" apps closer together

Let's say you want to start simple by creating an application that always fetches its data from the cloud. You can put that data in an in-memory database (similar to the caching approach above). The data is then available ("cached") for all app components via a common Box-based API, which is already great. But let's say that later on you want to go "offline-first" with your app, to respond more quickly to user requests and to save cloud and/or mobile network operator (MNO) costs. Since you are already using the Box-based API, you simply "turn on persistence" by using a disk-based database instead.

Performance and app speed

Shouldn't this be the first point in the list? Well, ObjectBox already operated at "in-memory speed" for mostly-read scenarios, even though it used a disk-based approach. So do not expect huge improvements for reads. Writes (create, update, delete) are different, though: to fully support ACID, a disk-based database must wait for the disk to fully complete the operation. In contrast, an in-memory database can immediately start the next transaction.

Diskless devices

Some small devices, e.g. sensors, may not have a disk or an accessible file system. This update makes it possible to run ObjectBox there too. This can be an interesting combination with ObjectBox Sync, automatically getting data from another device.

Testing

For example, in unit tests you can now spin up ObjectBox databases even faster than before, e.g. opening and closing a store in less than a millisecond.

"Transactional memory"

In concurrent (multi-threaded) scenarios, you may want to provide transactionally consistent views (or "checkpoints") of your data. Let's say bringing the data from one consistent view to another is a rather complex operation involving the modification of several objects. In such cases locking may be a concern (complex or blocking), so an in-memory database may be a nice alternative. It "naturally" offers transactions and thus a transactionally safe view on the data. Thus, you can always read consistent data without worrying about data being modified at the same time. Also, you never have to wait for a modifying thread to finish.
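A hedged Java sketch of such a consistent read: both boxes are read inside one read transaction, so the result reflects a single snapshot even while other threads write (Order and OrderItem are made-up entity classes):

```java
import io.objectbox.BoxStore;
import java.util.List;

public class ConsistentViewExample {

    // Reads two boxes inside one read transaction, so both lists reflect the
    // same transactional snapshot of the data.
    static void printSnapshot(BoxStore store) {
        store.runInReadTx(() -> {
            List<Order> orders = store.boxFor(Order.class).getAll();
            List<OrderItem> items = store.boxFor(OrderItem.class).getAll();
            System.out.println(orders.size() + " orders / " + items.size() + " items");
        });
    }
}
```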

What’s next?

This is only the first version of our in-memory store. Consider it a starting point for more to come:

  • Performance: to ship early, we made rather big performance trade-offs. At this point, starting a new write transaction copies all data internally, which of course is not great for performance. A future version will be a lot smarter than that.
  • Persistence: while this version is purely in-memory without persistence, we want to add persistence gradually. This will include a write-ahead log (WAL) and snapshots. This combination may even become preferable to the default disk-based store for some scenarios.

We are currently rolling out the in-memory feature to all languages supported by ObjectBox.

Let us know your thoughts

ā¤ļø

Data Viewer for Objects – announcing ObjectBox Admin

ObjectBox Admin (Docker container) allows you to analyze ObjectBox databases that run on desktop and server machines. Releasing ObjectBox Admin as a standalone Docker image makes it possible to run Admin on a larger number of platforms.

ObjectBox Admin is available as a Linux x86_64 Docker image, which runs on all common platforms including Windows and macOS. We offer a convenience script (objectbox-admin.sh), but it's also simple enough to run it via plain Docker. See the docs for details, or get started by following this short tutorial.

Data Browser

The ObjectBox Admin Web App comprises a menu on the left (Data, Schema, Status, GraphQL…) and the corresponding content pane on the right-hand side.

ObjectBox Admin Web App (Data, Schema, Status, GraphQL...)

The data browser provides a table of objects of a specific type. By clicking on Type, we can select an entity type to view its objects.


Next to the type selection is a small filter icon (the dashed triangle to the right of the type selection). When selected, a query editor pops up that allows you to filter data by adding a Property/Operator/Value expression.

ObjectBox Admin Filtering

When finished, click the check mark, and the data table gets updated with an active filter.

Data Filter

At the bottom, you will find a download link that exports the objects of the currently viewed box in JSON format.


Schema Browser

You can get a detailed list of elements that make up an object type in the “Schema” pane.

Schema pane

As in the "Data" pane, you can click on Type to select the schema of a specific entity type of your database.

Status

Basic database and ObjectBox Admin information can be viewed on the "Status" pane.

Status pane

GraphQL

The Docker version of ObjectBox Admin offers a pane to query the database using GraphQL.

GraphQL Data Browser

Vector Databases for Edge AI

The intersection of AI and Edge Computing is where Edge AI happens, and it needs databases that support AI and can run on the edge (for lack of a better term, "Edge Vector Databases", also referred to as on-device vector databases or local vector databases). Vector databases are the databases for AI and an important piece of the AI tech stack. Edge databases are databases that can run on edge devices.

Edge Vector Databases are the basis for Edge AI

Edge Vector Databases – the intersection of Edge Computing and AI needs a database

In 2023, the Edge Computing market is estimated at $53B,[1] while the AI market is expected to reach a whopping $87B.[2] Both markets are expected to grow dramatically in the coming years, with the two technologies enhancing each other. For many use cases it is advantageous, and often necessary, to use Edge Computing and AI in conjunction. This is what is called Edge AI, and Gartner predicted its plateau within 2023-2024.[3] In fact, Gartner recently named Edge AI one of the breakthrough technologies of 2023 due to the growing demand for real-time AI solutions and the need for decentralized data processing.[4] The global Edge AI market size was valued at roughly $14.5B in 2022, with expected CAGRs of 20-30% from 2023 to 2030.[5] In this article we will take a closer look at the use of vector databases in Edge AI.

The AI market: AI model trainings vs. using trained AI models

To understand the AI market, it is important to distinguish between generating and initially training AI models on the one hand, and using these trained models on the other.

AI model training

AI model training is the most resource-intensive and costly part of AI, and it is considered hard.[6] The larger the model, the higher the costs and the more time it needs. Also, with models getting larger, training fails more often and must be restarted, adding to duration and costs. Initially, AI models were trained for one specific task only. Now, however, a new type of AI model, the foundation model, has evolved. Foundation models are general-purpose models, and this development was a main driver of the current AI boom.

Foundation models

Foundation models (also called base models) are large-scale AI systems trained on vast quantities of data in such a way that they can be used for a wide variety of tasks. The costs of training such large foundation models from scratch (e.g. GPT-4) are typically in the millions of USD, and large model training costs are expected to rise to 500 million USD by 2030.[7] Foundation models can be "fine-tuned" (further trained) with specific data sets, on top of the already trained model, to adapt them to specific environments.[8] For example, GitHub Copilot is based on GPT-3.5 Turbo and was tweaked as well as additionally trained on code from GitHub repos.[9] Fine-tuning is typically far less costly than training a specific one-task model from scratch (let alone the initial foundation model training). Examples of foundation models include the GPT models from OpenAI, LLaMA, and Gemini. LLMs (Large Language Models) are basically a specific subset of foundation models.

Using trained AI models

Using trained foundation models (like GPT-4 or LLaMA), and additionally tuning them with specific data sets, is what powers most current AI tools / apps and the innovations we are seeing around them. Basically, at this moment a handful of popular foundation models power a whole, and currently thriving, ecosystem. As AI models work with vector embeddings, using trained models can be supported and enhanced with a vector database.[10] We'll dive into this a bit more below.

📈 Side note on the market

From a market perspective, this means there is a relevant market entry barrier to training foundation models, and it is therefore reasonable to expect only a few companies to do it (as opposed to many startups ;)). Yet training new models is likely where the greatest innovation potential and most significant advancements lie. Given that the current AI market largely depends on large trained models (mainly foundation models) being deployed as free (often open source) models, there is a real risk of a few tech giants owning the market later on, while the market that was built on those models starts stalling.[11] For example, OpenAI no longer open-sourced GPT-4, because they felt it could impede their business interests; Sam Altman even went as far as saying it was wrong to ever open it – arguably they should change their name.[12]

At the same time, Sam Altman went lobbying around the world to regulate AI (model training) and raise entry barriers. The most reasonable explanation I have read is that they are trying to ensure corporate dominance; likewise, the movement by other actors in the space to pause AI model training is a move to avoid an OpenAI monopoly and get skin in the game quickly.[13]

Vector Databases and their important role in AI

Many machine learning and deep learning algorithms, as well as the AI models described above, depend on vector embeddings (often just called "embeddings"). The increasing demand for creating, storing, and managing these embeddings has led to the emergence of vector databases.

What are vector embeddings?

Vector embeddings represent (multimodal) data as n-dimensional vectors, which essentially boils down to arrays of numbers. This is what makes them easy and efficient to compute with. The power of vector embeddings lies in their ability to capture the essence of the data they represent.

Vector embeddings basically are "the output of the process of learning embeddings", which is done by feeding raw input data, like texts, images, or words, into a trained AI model.[14]

In this process, the input data is translated into a lower-dimensional space. The result is a set of n-dimensional vectors in an embedding space.


The process of embedding [15]

The embedding space is specific to the data on which the embeddings were trained, but it can be transferred to other tasks and domains via transfer learning. Two embeddings that are close together in the embedding space indicate that the data they represent is similar.[16]
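To make "close together in the embedding space" concrete, here is a minimal, dependency-free Java helper computing cosine similarity, a commonly used closeness measure (values near 1 mean the underlying data is similar):

```java
public final class VectorMath {

    // Cosine similarity between two embeddings of equal length:
    // dot(a, b) / (|a| * |b|), ranging from -1 (opposite) to 1 (very similar).
    public static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```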

Once generated, the vector embeddings can be stored for efficient retrieval and use by AI / ML apps.[17] We have visualized the process below.


Vector Database initial preparation [26]

How are embeddings generated?

Creating vector embeddings used to be a time-consuming process requiring domain experts and manual work. Today, however, there are many specialized models available for generating embeddings:

  • For text data, for example: Word2Vec, GloVe, and BERT. These models translate the semantic meaning of words and phrases into numerical form.
  • For image data, for example: VGG and Inception. These models capture visual characteristics of images and translate them into numerical form.

–> The Hugging Face Model Hub offers many models that can create embeddings for different types of data. Anyone can do it with very little to no coding know-how.

But… how are embeddings generated?

There is no easy answer to this; at least I didn't find one. If you really want to know, you will need to spend the time and effort to dig very deep. Today's large language models were built over decades by many brilliant minds. They entail many fundamental concepts for converting different types of data (like words or images) into numerical representations. The following three fundamental concepts seem to be part of most LLMs and are worth having heard of:[18]

  • Encoding – Non-numerical, multimodal data needs to be transformed into numbers so that models can be created out of it.
  • Vectors – In order to store the encoded data and efficiently perform mathematical operations on it, encodings are stored as vectors (typically floating-point arrays).
  • Lookup matrices – Also known as lookup or hash tables; such a table maps data so that one can quickly jump from numerical to word representations (and back) across large chunks of text (see the sketch below).
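To make the last concept a bit more tangible, here is a toy lookup table in Java that maps tokens to rows of an embedding matrix and back (purely illustrative; real models use large, learned embedding matrices):

```java
import java.util.HashMap;
import java.util.Map;

public class EmbeddingLookup {
    private final Map<String, Integer> tokenToIndex = new HashMap<>(); // word -> row
    private final Map<Integer, String> indexToToken = new HashMap<>(); // row -> word
    private final float[][] embeddingMatrix;                           // one vector per row

    public EmbeddingLookup(String[] vocabulary, float[][] embeddingMatrix) {
        this.embeddingMatrix = embeddingMatrix;
        for (int i = 0; i < vocabulary.length; i++) {
            tokenToIndex.put(vocabulary[i], i);
            indexToToken.put(i, vocabulary[i]);
        }
    }

    // Word -> numerical representation (its embedding vector).
    public float[] vectorFor(String token) {
        return embeddingMatrix[tokenToIndex.get(token)];
    }

    // Numerical index -> word.
    public String tokenAt(int index) {
        return indexToToken.get(index);
    }
}
```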

From the database perspective, the primary interest is in working efficiently with the generated embeddings.[19] Once a piece of information (a sentence, a document, an image) is a vector embedding and stored in a database, it's time to get creative.

Where are vector embeddings used?

One of the most common applications of vector embeddings is in recommendation systems and search engines: e.g., Google Search uses embeddings to match text to text and text to images; Snapchat uses them to "serve individual ads depending on the user and time"; and Meta (Facebook) uses them for social search.[20] But the use cases are endless: embeddings can also be used in chatbots, fraud detection, predictive maintenance, and autonomous driving. The ability to convert complex data into numerical form opens up endless applications. Generative AI applications typically work with vector embeddings. All of these applications benefit from using a vector database to enhance speed and efficiency.

How are vector databases used?

While a vector database can sometimes return responses directly, it typically returns results in conjunction with an LLM, as depicted below. The vector database improves the accuracy of the LLM's responses by providing domain-specific data from the database. More specifically, the vector search gives you the most relevant data for a specific query to provide to the LLM.[21] In both cases, the vector database helps reduce the number of queries to the LLM, which are costly, and also speeds up response times.


Vector databases: In use / production [26]

Accordingly, apart from efficiently handling vectors (as a data type), nearest-neighbour search (primarily Approximate Nearest Neighbour (ANN) search) is the most important feature of a vector database. The ANN algorithm finds the most similar vectors quickly. Additional filtering optimizes the result further, making queries even more efficient. Using the retrieved vectors (for context) and the initial query, an LLM generates the response. Typically, the response is stored as a response embedding in the database, so that over time the database can serve more questions directly and / or further improve the accuracy of the answers. This is depicted in the image above.
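To make the retrieval step tangible, here is a hedged, dependency-free Java sketch: a brute-force (exact) nearest-neighbour search standing in for the ANN index a real vector database would use. It reuses the cosineSimilarity helper sketched earlier:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NaiveVectorSearch {

    public static class Document {
        public final String text;
        public final float[] embedding;

        public Document(String text, float[] embedding) {
            this.text = text;
            this.embedding = embedding;
        }
    }

    // Returns the k documents whose embeddings are most similar to the query
    // embedding. A real vector database replaces this linear scan with an ANN
    // index (e.g. HNSW) to stay fast on large collections.
    public static List<Document> topK(List<Document> docs, float[] queryEmbedding, int k) {
        List<Document> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble(
                d -> -VectorMath.cosineSimilarity(d.embedding, queryEmbedding)));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }
}
```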

Why use a vector database?

Vector databases enhance the efficiency and accuracy of AI applications, especially those that rely heavily on similarity searches, e.g. recommendation systems, natural language processing, or computer vision. Vector databases are also essential for scaling AI applications, as efficiency and speed start to matter more. Through this, vector databases also bring down costs and increase the sustainability of an AI application. On top of that, developers benefit from the additional functionality a database offers for managing data, especially its querying capabilities, making any AI application more adaptable and therefore future-proof.


Edge vector databases – giving AI an edge

Edge Computing in a gist

Since data is produced and used everywhere (i.e. decentrally), using cloud computing for storage and processing is inefficient, wasteful, and often impossible. To unlock the value of decentralized data and drive digitization, you need to compute at the edge of the network (i.e. locally, closer to where data is generated). Gartner emphasizes Edge Computing's importance for digital transformation.[22] To fully utilize Edge Computing, we need edge-specific infrastructure technologies, or "to make the edge as easy as the cloud" for developers. Edge databases are one such piece of core infrastructure software. They enable rapid implementation of edge solutions by providing fast local data persistence (on the edge) and the capability to control and direct decentralized data flows (on the edge as well as in conjunction with a cloud).

Edge computing and edge databases can unlock decentralized data’s full potential, drive digital transformation, and create a sustainable and efficient data management landscape.

What is Edge AI?

Edge AI is the implementation of AI applications at the edge of the network, without using a cloud: the necessary AI computations are performed directly on the edge, where the data is produced, e.g. in the car (onboard AI), on a mobile device, or simply within a specific location like a shop floor. Local Edge AI enables making decisions reliably within milliseconds, also when offline, and at much lower cost. At this moment, this is particularly interesting for mission-critical use cases, offline scenarios, and applications with high data security / privacy requirements. To run AI models directly on the edge, they need to be optimized for edge devices. The good news is that several AI models optimized for small devices are available, e.g. Google's Gecko, which is open source.[23] "Gecko is so lightweight that it can work on mobile devices and is fast enough for great interactive applications on-device, even when offline."[24] Edge AI applications benefit tremendously from using a local vector database; only few use cases could do without one.

Edge AI basically offers the advantages of Edge Computing, whereas the disadvantages are more specific to AI.

Advantages of Edge AI (vs. cloud) [25]:

  • Edge AI is faster, can guarantee QoS requirements, and works offline
  • Edge AI saves Internet bandwidth as well as cloud and networking costs (e.g. MNO costs)
  • Edge AI helps ensure data privacy, data security, and data ownership
  • Edge AI is more sustainable to run (less wasteful data traversal means less energy use, lower energy costs, and fewer CO₂ emissions)
  • Edge AI is a young market and therefore holds opportunities to capture market share and competitive advantages

Disadvantages of Edge AI (vs. cloud) [25]:

  • Decentralized data access can be challenging and needs specific skills
  • The initial setup for the ongoing training of decentralized Edge AIs is more complex and therefore costly (while a cloud setup is quick and easy)
  • Edge AI needs specific skills (including "old-school programming skills"); dev talent is hard to attract and expensive to keep
  • The heterogeneity of the edge makes it difficult to develop solutions for a wide range of devices
  • Edge AI is a young market and still lacks infrastructure software

Edge AI setup / architecture

There are generally two setups for Edge AI applications: a full edge setup, running the AI model directly on the edge devices, or a hybrid approach, using the cloud or a central server for the AI model.


Edge AI: general setup / architecture options

The edge / cloud approach has the advantage that (re-)training and enhancing the AI model happens centrally, in the cloud, without additional effort. On the other hand, in the full edge setup you have all the advantages of the edge (offline, cost-effective, private, …) and you can use the power of all edge devices, making it even more affordable and fast. However, the caveat lies in the challenge of organizing the learning: the individual local models diverge, and you need to get these learnings distributed to all devices. This is done in a "global AI model" (centrally).

Depending on the details, this can be done locally on a central server, in the cloud, or even on an edge device. Also, depending on the number of edge devices, connectivity, and the need for speedy updates, the distribution can be organized in a more decentralized way, fully using the power of the edge. Once all devices are harmonized, any edge device that has the necessary hardware capabilities could host the global model, combining all updates and sending them out. This offers great advantages with regard to availability and resilience. You can find more about decentralized Edge AI setups under the term federated learning.
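As a toy illustration of how such a "global AI model" can combine the updates, here is a minimal federated-averaging sketch in Java: each edge device contributes its locally updated weight vector, and a coordinator averages them into a new global model (real federated learning adds client weighting, secure aggregation, and much more):

```java
import java.util.List;

public class FederatedAveraging {

    // Averages locally trained weight vectors (all of equal length) into a
    // new global model that is then redistributed to the edge devices.
    public static float[] average(List<float[]> deviceWeights) {
        int length = deviceWeights.get(0).length;
        float[] global = new float[length];
        for (float[] weights : deviceWeights) {
            for (int i = 0; i < length; i++) {
                global[i] += weights[i] / deviceWeights.size();
            }
        }
        return global;
    }
}
```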


Edge AI – decentralized setup, using the full power of the edge

Summary: Vector Databases for the edge

According to our research and experience, no "Edge Vector Database" exists yet. The Edge Database market has always been limited to fewer players, and it is certainly not as crowded as the central server / cloud database space. However, cloud / server databases cannot be used on the edge (big to small just doesn't work ;)), whereas Edge Databases can run anywhere and can sometimes be a good choice for a server / cloud setup too.

Opinions differ on whether there is a need for specific, dedicated vector databases, or whether general databases will evolve to include vector support and become the go-to solution. In any case, the vector database space is hot and has recently been added as a category on DB-Engines; adding a new database category is a rare occurrence for DB-Engines, the established database ranking platform. We can see that both types of databases are converging towards each other, and we firmly believe that in the future all databases will support vectors.

So, when it comes to Edge Databases, there are the same two options: either someone implements a dedicated edge vector database, or Edge Databases evolve to support vectors. Because so far we have seen neither, we have extended the ObjectBox Edge Database with vector support. The huge advantage this brings is that ObjectBox already offers an excellent, highly efficient, and battle-tested out-of-the-box Sync that takes care of the "hard stuff" of decentralized data management for developers. As a developer tool that is self-hosted and can be used on all kinds of edge devices, and certainly on premises, it offers companies the flexibility to implement a myriad of applications and reduce costs (especially cloud and networking costs), while not jeopardizing data ownership in any way.

References and Notes

  1. https://www.marketsandmarkets.com/Market-Reports/edge-computing-market-133384090.html
  2. https://www.forbes.com/advisor/business/ai-statistics/
  3. https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2022-gartner-hype-cycle
  4. https://www.gartner.com/en/articles/4-emerging-technologies-you-need-to-know-about
  5. https://www.grandviewresearch.com/industry-analysis/edge-ai-market-report
  6. https://www.technologyreview.com/2023/05/12/1072950/open-source-ai-google-openai-eleuther-meta/
  7. https://mpost.io/ai-model-training-costs-are-expected-to-rise-from-100-million-to-500-million-by-2030/
  8. https://en.wikipedia.org/wiki/Foundation_models
  9. https://en.wikipedia.org/wiki/GitHub_Copilot, https://github.com/features/preview/copilot-x
  10. While this is great for the current landscape of AI-driven tools and apps, and thus the consumers, these are incremental innovations and will not take the foundation of AI forwards. So, there is a rightful fear that big corporations will discontinue open sourcing advancements, once they feel protecting their business interests outweighs the benefits they have from open sourcing.
  11. https://www.technologyreview.com/2023/05/12/1072950/open-source-ai-google-openai-eleuther-meta/
  12. https://www.theverge.com/2023/3/15/23640180/openai-gpt-4-launch-closed-research-ilya-sutskever-interview
  13. https://analyticsindiamag.com/openai-has-stopped-caring-about-open-ai-altogether/
  14. An AI training model is the initial version that goes through the training process, while the AI model refers to the trained and optimized version that is ready for deployment and inference in real-world applications.
  15. Adapted from https://vickiboykis.com/what_are_embeddings/
  16. If you are interested in understanding distances in vector databases more, read this post; it explains the most important commonly used distance algorithms in a straightforward and highly understandable way.
  17. Note: Traditional databases can offer vector support too. Our guess is that there will be a consolidation of databases, with traditional databases moving towards vector support and vector databases moving towards supporting other (more traditional) database functionalities. In fact, we cannot imagine a future where any database will not support AI applications (as in: Support vectors, nearest neighbour search etc.).
  18. Based on: https://vickiboykis.com/what_are_embeddings/
  19. If you are interested in diving deeper, read this article, which is written in a highly understandable and comprehensive way, explaining everything you would need to know to develop a basic understanding.
  20. https://huggingface.co/blog/getting-started-with-embeddings
  21. https://hackernoon.com/how-llms-and-vector-search-have-revolutionized-building-ai-applications
  22. https://www.gartner.com/en/documents/4263499
  23. Though the merit of it being open source is unclear and an ongoing discussion in the more legally oriented open source community.
  24. https://blog.google/technology/ai/google-palm-2-ai-large-language-model/
  25. https://builtin.com/artificial-intelligence/edge-ai, https://objectbox.io/what-is-edge-computing/, https://objectbox.io/what-is-an-edge-database-and-why-do-you-need-one/, https://www.marketsandmarkets.com/Market-Reports/edge-ai-software-market-70030817.html, https://www.run.ai/guides/machine-learning-operations/edge-ai#What-Are-the-Benefits-Of-Edge-AI, https://cambrian-ai.com/wp-content/uploads/edd/2023/07/Large-Language-Models-On-Edge-Publication-FINAL.pdf
  26. Adapted from https://www.swirlai.com/

Vector types (aka arrays) added with ObjectBox Java 3.6 release

Vector embeddings (multi-dimensional vectors) are a central building block for AI applications. Accordingly, the ability to store vectors to add long-term memory to your AI applications (e.g. via vector databases) is gaining importance. Sounds fancy, but for the basic use cases this simply boils down to "arrays of floats" for developers. And this is exactly what the ObjectBox database now supports natively. If you want to use vectors on the edge, e.g. in a mobile app or on an embedded device, offline, independent of an Internet connection, and without unpredictable network latency, try it…

See the release notes for all new features this release brings.

Code Examples

Let's start with a simple example: let's assume some shapes that use a palette of RGB colors. An entity for this might look like this:
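A hedged sketch in Java (class and field names are assumed):

```java
import io.objectbox.annotation.Entity;
import io.objectbox.annotation.Id;

// Hypothetical entity: a shape with a palette of RGB colors.
@Entity
public class Shape {
    @Id public long id;
    public String name;
    public int[] palette; // RGB colors; scalar arrays are stored natively since 3.6
}
```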

We can now create a query to find all shapes that use a certain color:
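As a hedged stand-in that keeps it simple, the sketch below loads the shapes and filters in plain Java (the query API also offers dedicated conditions; see the release notes):

```java
import io.objectbox.Box;
import java.util.List;
import java.util.stream.Collectors;

public class ShapeQueries {

    // Finds all shapes whose palette contains the given RGB color.
    static List<Shape> shapesUsingColor(Box<Shape> box, int rgb) {
        return box.getAll().stream()
                .filter(shape -> {
                    for (int color : shape.palette) {
                        if (color == rgb) return true;
                    }
                    return false;
                })
                .collect(Collectors.toList());
    }
}
```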

Another typical use case is embedding certain types of data, like text, audio, or images, as vector coordinates. To store such a vector embedding, in the following example we store the floating-point coordinates that were computed by a machine learning model for an image, together with a reference to the actual image:
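Again a hedged sketch, with names assumed:

```java
import io.objectbox.annotation.Entity;
import io.objectbox.annotation.Id;

// Hypothetical entity pairing an image reference with its embedding vector.
@Entity
public class ImageEmbedding {
    @Id public long id;
    public String imageUrl;     // reference to the actual image
    public float[] coordinates; // vector embedding computed by an ML model
}
```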

Ready to go?

To update to this release, change the version of objectbox-gradle-plugin to 3.6.0.

To add the ObjectBox database to your JVM or Android project, read our Getting Started guide.
As always, we look forward to your feedback on GitHub or via our anonymous feedback form and hope you have a great time building apps with ObjectBox! ❤️

Green Coding: Developing Sustainable Software for a Greener Future

Digitization helps to save CO₂ – many experts agree on that. But things are not that simple, because the creation of software and its use contribute to greenhouse gas emissions too. All code creates a carbon footprint. Software development and use affect the environment, from the energy consumed while running to the associated electronic device waste. Choosing a sustainable software architecture matters, but every developer can also make a difference by applying green coding principles.

This article will explore the importance of green software development and its main principles.

Green Software Development: Balancing Digitization and Environmental Sustainability

In this section, we’ll first define some important terms in the topic of environmentally conscious software development. Then, we’ll discuss why it is relevant and discussing the broader benefits of adopting green coding practices.

What does sustainability in software development mean?

In our view, sustainability in software development (also "green software development") means developing and maintaining software in a way that is not only environmentally, but also socially and economically responsible. So what really counts is the long-term bottom-line value from a general societal perspective, not an "individual balance sheet".

There are many trade-offs in such an ambition, and therefore sustainable software development is more a set of guiding principles than a list of hands-on measures that are the same for everyone. Let's dive a bit into how sustainable software development can contribute to all three aspects:

Environmental aspects

Since software is a significant source of greenhouse gas emissions, it is becoming more important to create software that reduces resource use as much as possible. As the world becomes more reliant on technology, the energy consumption and carbon footprint of software will continue to grow. By adopting green software development practices, developers can help mitigate these environmental impacts.


Broader Economic contribution

If a piece of software uses less energy and fewer resources to accomplish the same tasks as another, its users can reduce their operating costs and improve their bottom line. Increasing the longevity of hardware (less wear, but also lower hardware requirements that extend the usability of existing hardware) also yields direct economic savings for software users (companies as well as individuals). On a broader level, this compounds significantly across the number of users and over time, and thus contributes to economic welfare. What sounds like a small contribution adds up tremendously in the end.

Social impact

Sustainable software development includes responsibility for the social impact of the software created. As a result, sustainable software aims to be transparent, inclusive, and offer data sovereignty. By giving individuals and organizations greater control over their own data, software empowers them and protects their privacy. At the same time, it promotes greater accountability and transparency in data-driven decision-making.

Overall, sustainability in software development involves taking a holistic approach. On top of that, sustainable software companies take steps to minimize negative impacts and promote positive ones over the long term.

This is why it has been one of our core values since we started ObjectBox:

Be sustainable in every respect – we apply sustainability to our technology, as well as to our people and small everyday decisions. ObjectBox aims to be the most resourceful data management solution for connected devices. We strive to save resources (energy, CO₂, bandwidth, time, etc.), but also always choose the sustainable path (recycled paper, saving energy, etc.), and support our employees in leading balanced and sustainable lives.

What is green coding / green software development?

Recently, the term “green coding” has emerged to describe the practice of creating and writing code (aka software) in a way that minimizes its environmental impact. This can involve using efficient code that consumes less energy, optimizing data usage, and reducing electronic waste.

What is the difference between Green IT and Green Coding?

Green IT is primarily about hardware and the optimization of data centers; today it is often actually about optimizing cloud usage. The code decides whether this hardware is used efficiently. Green coding, by contrast, is about making the code itself more efficient, so that running the code (e.g. using an app on a smartphone, or using an email program) uses fewer resources and less electricity, thus producing less CO₂.

Why is it time for developers to prioritize environmental sustainability?

Various studies estimate the carbon footprint of the digital economy to be between 2.3 and 3.7% of global CO₂ emissions 😱 [1]. Although the impact of software on the environment may not yet be as dramatic as that of manufacturing, it keeps growing rapidly each year. By making sustainable decisions in software development, we can make software part of the carbon solution of the future.

Every line of code – scaled up to hundreds, thousands, or even millions of devices (desktops, smartphones, tablets…) worldwide – has the potential to significantly reduce energy consumption and CO₂ emissions.

How to put sustainable software development into practice?

We believe two key aspects of developing sustainable software that creates bottom-line value are:

  • Minimize the resource consumption of software, especially during operation, where most resources are consumed – be diligent about this; it compounds.
  • Keep data as much as possible where it is produced, used, and belongs (e.g. with the end users), and avoid unnecessary data transfers, superfluous cloud use, and unnecessarily storing data in the cloud.

Both measures have significant environmental, social, and economic impact, short- and long-term.

It’s time we as developers start thinking about our impact on the planet and make sustainability a part of our everyday coding mindset. We can make a difference by incorporating sustainability into every action and decision we take when developing software. Careful measuring and optimizing the resource along the way is also important. The welcome side effect: fast software that is cheap to run and fun to use šŸ™‚

For example, at ObjectBox we're all about maximizing the use of computing resources and minimizing the resource waste of every line of code (LOC). This makes ObjectBox not only environmentally sustainable, but at the same time superfast, usable on low-end devices with modest hardware requirements, and cheap in operational costs 🤯

💚 Responsible development practices pay off in several respects, and we really cannot see a huge trade-off. All it costs is spending more time and brainpower on optimizations, benchmarking, and diligently applying this approach to every line of code.

💚 As a developer tool, our impact is broader than an individual app developer's impact on end users. So we're committed to using resources efficiently and reducing waste at every stage of the game.

Guidelines to start making your code more sustainable

Some more tips on how to put sustainable software development into practice:

  • Energy efficiency: Developing energy-efficient software helps to reduce its environmental impact by minimizing the amount of energy required to run it.
  • Responsible sourcing: Using responsibly sourced hardware, software, and other materials can help to reduce the environmental impact of software development.
  • Longevity: Developing software that is designed to last can help to reduce waste and promote sustainability by reducing the need for frequent updates and replacements.
  • Accessibility: Making software accessible to a wide range of users can help to promote social sustainability by ensuring that everyone has access to the benefits of technology.
  • Data sovereignty, privacy and security: Protecting user data and maintaining strong cybersecurity measures can help to promote sustainability by preventing data breaches and other security incidents that can have negative social and economic impacts.

Examples of sustainable coding: More impactful than you would expect

1. How can a millisecond be worth 2 days?

Real-world example: by reducing the resolution of images in a banking app with 500,000 users, who on average opened it daily, developers saved more than 2 days of total operational time (uptime) [2]. That is roughly what a single millisecond saved per app start adds up to over a year: 1 ms × 500,000 daily starts × 365 days ≈ 2.1 days.

2. How can 2 grams of CO₂ savings per hour be worth 330,000 t of CO₂?

Theoretical consideration: Netflix states that streaming its content produces 55 grams of CO₂ per hour [3]. That comes to roughly 40 kilograms of CO₂ per year for daily streaming of two hours per person [4]. With around 230 million Netflix users, any reduction has an enormous scaling factor [5]. Assuming a Netflix developer reduces those 55 grams to 53 grams, the 2 grams saved per streaming hour add up to roughly 2 g × 2 h × 365 days × 230 million users ≈ 336,000 t, i.e. about 330 kt of CO₂ in potential savings per year. Note: this is a highly theoretical example, just to demonstrate the thinking.
Anyway: individuals can't save that much as easily. That's the impact you as a programmer have!

3. How much CO₂ can local storage save in 1 million cars?

Sending and storing 1 GB of data in the cloud needs about 5 kWh of electricity, while local storage only needs about 0.000005 kWh, i.e. a million times less. Making the switch to local storage in 1 million cars would save 905 kg of CO₂ every second. If you want to know what that actually means, you can translate it into equivalents using a CO₂ equivalencies list or a CO₂ calculator.

👉 These examples clearly illustrate the potential impact of shifting towards an environmentally conscious mindset when developing software. Now that we know the why, it's time to discuss the how.

Sustainable Edge Data Management with ObjectBox – a ready-made developer tool

ObjectBox is a free Edge Database that can help reduce the environmental impact of apps. It is optimized for computing resource efficiency and empowers developers to store and use data locally and create offline-first apps. Unless the data is really needed in the cloud, this is far more energy-efficient and sustainable than a cloud setup. On top of that, it works independently of an Internet connection and is superfast while saving battery, making it an ideal choice for apps that prioritize sustainability.

What is an Edge Database?

An Edge Database is a type of database that is used on the “edge” of a network, closer to the data sources and devices generating data. Traditional databases, on the other hand, are usually set up in centralized data centers or in the cloud.

Edge databases are essential when devices need to work offline, when response times must be guaranteed, when speed is of the essence, when Internet connectivity is limited, in mission-critical scenarios, or when handling high-frequency data. By processing data locally on the edge, edge databases can reduce latency and improve performance while also reducing the amount of data transferred over the network.

Edge databases have a small footprint and are designed to run on restricted devices such as routers, IoT gateways, mobile phones, and other embedded systems. They typically incorporate features needed in distributed systems, such as data synchronization, caching, and offline support to ensure that data remains available even in the event of network outages or other disruptions.

ObjectBox Sync is a highly efficient and sustainable data synchronization solution. It reduces the amount of energy used by keeping the overhead of sending data as small as possible, combined with solid compression, avoiding data transformations, and only syncing data changes instead of sending all data to the cloud all the time. Developers have control over what data is synced when.

Overall, ObjectBox DB + Sync is a powerful tool for building fast apps that consume less energy and save device resources. By storing data locally and only syncing when and where needed, developers can ensure that their apps are as sustainable as possible, and save on cloud costs along the way.