Can Small Language Models (SLMs) really do more with less? In this article, we discuss the unique strengths of SLMs, learn about the top SLMs, local vector databases, and how SLMs + local vector databases are shaping the future of AI,prioritizing privacy, immediacy, and sustainability.
Now, Small Language Models (SLMs) are stepping into the spotlight, enabling sophisticated AI to run directly on devices (local AI) like your phone, laptop, or even a smart home assistant. These models not only reduce costs and energy consumption but also bring the power of AI closer to the user, ensuring privacy and real-time performance.
What Are Small Language Models (SLMs)?
LLMs are designed to understand and generate human language. Small Language Models (SLMs) are compact versions of LLMs. So, the key difference between SLMs and LLMs is their size. While LLMs like GPT-4 are designed with hundreds of billions of parameters, SLMs use only a fraction of that. There is no strict definition of SLM vs. LLM yet. At this moment, SLM sizes can be as small as single-digit million parameters and go up to several billion parameters. Some authors suggest 8B parameters as the limit for SLMs. However, in our view that opens up the question if we need a definition for Tiny Language Models (TLMs)?
Advantages and disadvantages of SLM
According to Deloitte’s latest tech trends report, SLMs are gaining increasing importance in the AI landscape due to their cost-effectiveness, efficiency, and privacy advantages. Small Language Models (SLMs) bring a range of benefits, particularly for local AI applications, but they also come with trade-offs.
Benefits of SLMs
Privacy: By running on-device, SLMs keep sensitive information local, eliminating the need to send data to the cloud.
Offline Capabilities: Local AI powered by SLMs functions seamlessly without internet connectivity.
Speed: SLMs require less computational power, enabling faster inference and smoother performance.
Sustainability: With lower energy demands for both training and operation, SLMs are more environmentally friendly.
Accessibility: Affordable training and deployment, combined with minimal hardware requirements, make SLMs accessible to users and businesses of all sizes.
Limitations of SLMs
The main disadvantage is the flexibility and quality of SLM responses: SLMs typically cannot tackle the same broad range of tasks as LLMs in the same quality. However, in certain areas, they already match their larger counterparts. For example, Artificial Analysis AI Review 2024 highlights that GPT-4o-mini (July 2024) has a similar Quality Index to GPT-4 (March 2023), while being 100x cheaper in price.
Small Language Models vs LLMs
A recent study comparing various SLMs highlights the growing competitiveness of these models, demonstrating that in specific tasks, SLMs can achieve performance levels comparable to much larger models.
Overcoming limitations of SLMs
A combination of SLMs with local vector databases is a game-changer. With a local vector database, the variety of tasks and the quality of answers cannot only be enhanced but also for the areas that are actually relevant to the use case you are solving. E.g. you can add internal company knowledge, specific product manuals, or personal files to the SLM. In short, you can provide the SLM with context and additional knowledge that has not been part of its training via a local vector database. In this combination, an SLM can already today be as powerful as an LLM for your specific case and context (your tasks, your apps, your business). We’ll dive into this a bit more later.
In the following, we’ll have a look at the current landscape of SLMs – including the top SLMs – in a handy comparison matrix.
"The Gemma performs well on the Open LLM leaderboard. But if we compare Gemma-2b (2.51 B) with PHI-2 (2.7 B) on the same benchmarks, PHI-2 easily beats Gemma-2b."
iPhone 14: Phi-3-mini processing speed of 12 tokens per second. From the H2O Danube3 benchmarks you can see that the Phi-3 model shows top performance compared to similar size models, oftentimes beating the Danube3
OpenELM
270M, 450M, 1.1B, 3B
Apple
Apple License, but pretty much reads like you can do as much with it as a permissive oss license (of course not use their logo)
OpenELM 1.1 B shows 1.28% (Zero Shot Tasks), 2.36% (OpenLLM Leaderboard), and 1.72% (LLM360) higher accuracy compared to OLMo 1.2 B, while using 2× less pretraining data
"competitive performance compared to popular models of similar size across a wide variety of benchmarks including academic benchmarks, chat benchmarks, as well as fine-tuning benchmarks"
GPT-4o mini scores 82% on MMLU and currently outperforms GPT-4 on chat preferences in LMSYS leaderboard. GPT-4o mini surpasses GPT-3.5 Turbo and other small models on academic benchmarks across both textual intelligence and multimodal reasoning, and supports the same range of languages as GPT-4o
Smaller and faster variant of 1.5 Flash features half the price, twice the rate limits, and lower latency on small prompts compared to its forerunner. Nearly matches 1.5 Flash on many key benchmarks.
MMLU score of 69.4% and a Quality Index across evaluations of 53. Faster compared to average, with a output speed of 157.7 tokens per second. Low latency (0.37s TTFT), small context window (128k).
MMLU score 60.1%. Mistral 7B significantly outperforms Llama 2 13B on all metrics, and is on par with Llama 34B (since Llama 2 34B was not released, we report results on Llama 34B). It is also vastly superior in code and reasoning benchmarks. Was the best model for its size in autumn 2023.
Claimed (by Mistral) to be the world's best Edge models.
Ministral 3B has MMLU score of 58% and Quality index across evaluations of 51. Ministral 8B has MMLU score of 59% and Quality index across evaluations of 53.
Granite 3.0 8B Instruct matches leading similarly-sized open models on academic benchmarks while outperforming those peers on benchmarks for enterprise tasks and safety.
Quality Index across evaluations of 77, MMLU 85%, Supports a 16K token context window, ideal for long-text processing. Outperforms Phi3 and outperforms on many metrices or is comparable with Qwen 2.5 , and GPT-4o-mini
SLM Use Cases – best choice for running local AI
SLMs are perfect for on-device or local AI applications. On-device / local AI is needed in scenarios that involve hardware constraints, demand real-time or guaranteed response rates, require offline functionality or need to comply with strict data privacy and security needs. Here are some examples:
Mobile Applications: Chatbots or translation tools that work seamlessly on phones even when not connected to the internet.
IoT Devices: Voice assistants, smart appliances, and wearable tech running language models directly on the device.
Healthcare: Embedded in medical devices, SLMs allow patient data to be analyzed locally, preserving privacy while delivering real-time diagnostics.
Industrial Automation: SLMs process language on edge devices, increasing uptime and reducing latency in robotics and control systems.
By processing data locally, SLMs not only enhance privacy but also ensure reliable performance in environments where connectivity may be limited.
On-device Vector Databases and SLMs: A Perfect Match
Imagine a digital assistant on your phone that goes beyond generic answers, leveraging your company’s (and/or your personal) data to deliver precise, context-aware responses – without sharing this private data with any cloud or AI provider. This becomes possible when Small Language Models are paired with local vector databases. Using a technique called Retrieval-Augmented Generation (RAG), SLMs access the additional knowledge stored in the vector database, enabling them to provide personalized, up-to-date answers. Whether you’re troubleshooting a problem, exploring business insights, or staying informed on the latest developments, this combination ensures tailored and relevant responses.
Key Benefits of using a local tech stack with SLMs and a local vector database
Privacy. SLMs inherently provide privacy advantages by operating on-device, unlike larger models that rely on cloud infrastructure. To maintain this privacy advantage when integrating additional data, a local vector database is essential. ObjectBox is a leading example of a local database that ensures sensitive data remains local.
Personalization. Vector databases give you a way to enhance the capabilities of SLMs and adapt them to your needs. For instance, you can integrate internal company data or personal device information to offer highly contextualized outputs.
Quality. Using additional context-relevant knowledge reduces hallucinations and increases the quality of the responses.
Traceability. As long as you store your metadata alongside the vector embeddings, all the knowledge you use from the local vector database can give the sources.
Offline-capability. Deploying SLMs directly on edge devices removes the need for internet access, making them ideal for scenarios with limited or no connectivity.
Cost-Effectiveness. By retrieving and caching the most relevant data to enhance the response of the SLM, vector databases reduce the workload of the SLM, saving computational resources. This makes them ideal for edge devices, like smartphones, where power and computing resources are limited.
Use case: Combining SLMs and local Vector Databases in Robotics
Imagine a warehouse robot that organizes inventory, assists workers, and ensures smooth operations. By integrating SLMs with local vector databases, the robot can process natural language commands, retrieve relevant context, and adapt its actions in real time – all without relying on cloud-based systems.
For example:
A worker says, “Can you bring me the red toolbox from section B?”
The SLM processes the request and consults the vector database, which stores information about the warehouse layout, inventory locations, and specific task history.
Using this context, the robot identifies the correct toolbox, navigates to section B, and delivers it to the worker.
The future of AI is – literally – in our hands
AI is becoming more personal, efficient, and accessible, and Small Language Models are driving this transformation. By enabling sophisticated local AI, SLMs deliver privacy, speed, and adaptability in ways that larger models cannot. Combined with technologies like vector databases, they make it possible to provide affordable, tailored, real-time solutions without compromising data security. The future of AI is not just about doing more – it’s about doing more where it matters most: right in your hands.
Centralized data centers use a lot of energy and water, emit a lot of CO2, and generate a lot of electronic waste. In fact, cloud data centers are already responsible for around 300 Mt of CO2-eq greenhouse gas emissions [1]. And the energy consumption of data centers is increasing at an exponential rate [2].
This challenge is further compounded by the exploding demand for AI workloads. With AI adoption accelerating, the demand for data center capacity is projected to grow by over 20% annually, potentially reaching ~300 GW by 2030. Remarkably, 70% of this capacity will be dedicated to hosting AI workloads. Gartner predicts that without sustainable AI practices, AI alone could consume more energy than the human workforce by 2025, significantly undermining carbon-zero initiatives.
While more data centers are switching to green energy [3], this approach is not nearly enough to solve the problem. A more sustainable approach is to reduce unnecessary cloud traffic, central computation, and storage as much as possible by shifting computation to the edge. In our experience, just reducing data overhead and unnecessary data traversals can easily cut 60-90% of data traffic and thus significantly impact the CO2 footprint of an application, as well as costs.
Edge Computing stores and uses data on or near the device on which it was created. This reduces the amount of traffic sent to the cloud and, on a large scale, has a significant impact on energy consumption and carbon emissions.
Why do Digitization projects need to think about sustainability now?
Given the gravity of the climate crisis, every industry needs to assess its potential environmental impact and find ways to reduce its carbon footprint. The digital world, and its most valuable commodity, data, should not be any different. The digital transformation is ongoing and with it electronic devices and IT usage numbers are exploding. Thus, new apps must consider their carbon footprint throughout their lifecycle, especially resource use in operation and at scale [4].
Also, think about this: The share of global electricity used by data centers is already estimated to be around 1-1.5% [1] and data centers generate 2% of worldwide CO2 emissions (on par with the aviation industry) [5]. Recent analyses by Gardian suggests that the greenhouse gas emissions from the in-house data centers of major tech companies—Google, Microsoft, Meta, and Apple—are likely about 7.62 times higher than their official reports indicate. [6]. On top of this, providing and maintaining cloud infrastructure (manufacturing, shipping of hardware, buildings and lines) also consumes a huge amount of greenhouse gasses [7] and produces a lot of abnormal waste (e.g. toxic coolants) at the end of life [8].
Bearing that in mind, the growth rate for data center demand is concerning. The steady increase in data processing, storage, and traffic in the future, comes with a forecasted electricity consumption by data centers to grow by 10% a year [9]. In fact, estimations expect the communications industry to use 20% of all the world’s electricity by 2025 [10].
Shifting to green energy is a good step. However, a more effective and ultimately longer term solution requires looking at the current model of data storage, filtering, processing and transferal. By implementing Edge Computing, we can reduce the amount of useless and wasteful data traversing to and from the cloud as much as possible, thus reducing overall energy requirements in the long term. Of course, everyone can make a difference with their daily behavior and for developers that is especially true: Applying green coding principles helps producing applications that produce lower CO2 emissions over the whole app lifetime.
What is Edge Computing?
Until recently 90% of enterprise data was sent to the cloud, but this is changing rapidly. In fact, this number is dropping to only 25% by 2025,according to Gartner. By then, most of the data will be stored and used locally, on the device it was created on, e.g. on smartphones, cars, trains, machines, watches. This is Edge Computing, and it is an inherently decentralized computing paradigm (as opposed to the centralized cloud computing approach). Accordingly, every edge device needs the same technology stack (just in a much smaller format) as a cloud server. This means: An operating system, a data storage / persistence layer (database), a networking layer, security functionalities etc. that run efficiently on restricted hardware.
As you can only use the devices’ resources, which can be pretty limited, inefficient applications can push a device to its limits, leading to slow response rates, crashes, and battery drain.
EDGE DEVICE ARCHITECTURE
Edge Computing is much more than some simple data pre-processing, which takes advantage of only a small portion of the computing that is possible on the edge. AnEdge Database is a prerequisite for meaningful Edge Computing. With an Edge Database, data can be stored and processed on the devices directly (the so-called edge). Only useful data is sent to the server and saved there, reducing the networking traffic and computing power used in data centers tremendously, while also making use of the computing resources of devices which are already in use. This greatly reduces bandwidth and energy required by data centers. On top, Edge Computing also provides the flexibility to operate independently from an Internet connection, enables fast real time response rates, and cuts cloud costs.
Why is Edge Computing sustainable?
Edge Computing reduces network traffic and data center usage
With Edge Computing the amount of data traversing the network can be reduced greatly, freeing up bandwidth. Bandwidth is a measure of the quantity / size of data a network can transfer in a given time frame. Bandwidth is shared among users. Accordingly, the more data is supposed to be sent via the network at a given moment, the slower the network speed. Data on the edge is also much more likely to be useful and indeed used on the edge, in context of its environment. Instead of constantly sending data strems to the cloud, it therefore makes sense to work with the data on the edge and only send that data to the cloud that really is of use there (e.g. results, aggregated data etc.).
Edge computing is optimized for efficiency
Edge “data centers” are typically more efficient than cloud data centers. As described above, resources on edge devices are restricted. Therefore, and as opposed to cloud infrastructure, edge devices do not scale horizontally. That is one reason why every piece of the edge tech stack is – typically and ideally – highly optimized for resource efficiency. Any computing done more efficiently helps reduce energy consumption. Taking into account the huge number of devices already deployed , the worldwide impact of reducing resource use for the same operations is significant.
Edge Computing uses available hardware
There is a realm of edge devices already deployed that is currently underused. Many existing devices are capable of data persistence, and some even for fairly complex computing. When these devices – instead – send all of their data to the cloud, an opportunity is lost. Edge Computing enables companies to use existing hardware and infrastructure (retrofitting), taking advantage of the available computing power. If these devices continue to be underused, we will need to build bigger and bigger central data centers, simultaneously burdening existing network infrastructure and reducing bandwidth for senselessly sending everything to the cloud.
Cloud versus Edge: an Example
Today, many projects are built based on cloud computing. Especially in first prototypes or pilots, cloud computing offers an easy and fast start. However, with scale, cloud computing often becomes too slow, expensive, and unreliable. In a typical cloud setup, data is gathered on edge devices and forwarded to the cloud for computation and storage. Often a computed result is sent back. In this design, the edge devices are dumb devices that are dependent upon a working internet connection and a working cloud server; they do not have any intelligence or logic of their own. In a smart home cloud example, data would be sent from devices in the home, e.g. a thermostat, the door, the TV etc. to the cloud, where it is saved and used.
If the user would want to make changes via a cloud-based mobile app when in the house, the changes would be sent to the cloud, changed there and then from there be sent to the devices. When the Internet connection is down or the server is not working, the application will not work.
With Edge Computing, data stays where it is produced, used and where it belongs – without traversing the network unnecessarily. This way, cloud infrastructure needs are reduced in three ways: Firstly, less network traffic, secondly, less central storage and thirdly less computational power. Rather, edge computing makes use of all the capable hardware already deployed in the world. E.g. in a smart home, all the data could stay within the house and be used on site. Only the small part of the data truly needed accessible from anywhere would be synchronized to the cloud.
Take for example a thermostat in such a home setting: it might produce 1000s of temperature data points per minute. However, minimal changes typically do not matter and data updates aren’t necessary every millisecond. On top, you really do not need all this data in the cloud and accessible from anywhere.
With Edge Computing, this data can stay on the edge and be used within the smart home as needed. Edge Computing enables the smart home to work fast, efficiently, and autonomous from a working internet connection. In addition, the smart home owner can keep the private data to him/herself and is less vulnerable to hacker attacks.
How does ObjectBox make Edge Computing even more sustainable?
ObjectBox improves the sustainability of Edge Computing with high performance and efficiency: our 10X speed advantage translates into less use of CPU and battery / electricity. With ObjectBox, devices compute 10 times as much data with equivalent power. Due to the small size and efficiency, ObjectBox runs on restricted devices allowing application developers to utilize existing hardware longer and/or to do more instead of existing infrastructure / hardware.
ObjectBox’ Sync solution takes care of making data available where needed when needed. It allows synchronization in an offline setting and / or to the cloud. Based on efficient syncing principles, ObjectBox Sync aims to reduce unnecessary data traffic as much as possible and is therefore perfectly suited for efficient, useful, and sustainable Edge Computing. Even when syncing the same amount of data, ObjectBox Sync reduces the bandwidth needed and thus cloud networking usage, which incidentally reduces cloud costs.
Finally, ObjectBox’ Time Series feature, provides users an intuitive dashboard to see patterns behind the data, further helping users to track thousands of data points/second in real-time.
How Edge Computing enables new use cases that help make the world more sustainable
As mentioned above, there are a variety of IoT applications that help reduce waste of all kinds. These applications can have a huge impact on creating a more sustainable world, assuming the applications themselves are sustainable. Three powerful examples to demonstrate the huge impact IoT applications can have on the world:
Reducing Food Waste
From farm to kitchen, IoT applications can help to reduce food waste across the food chain. Sensors used to monitor the cold chain, from field to supermarket, can ensure that food maintains a certain temperature, thus guaranteeing that products remain food safe and fresh longer, reducing food waste. In addition, local storage can be used to power apps that fight household waste (you can learn how to build a food sharing app yourself in Flutter with this tutorial).
Smart City Lighting
Smart City Lighting: Chicago has implemented a system which allows them to save approx. 10 million USD / year and London estimates it can save up to 70% of current electricity use and costs as well as maintenance costs through smart public lighting systems [10].
Reducing Water Waste
Many homes and commercial building landscapes are still watered manually or on a set schedule. This is an inexact method of watering, which does not take into account weather, soil moistness, or the water levels needed by the plant. Using smart IoT water management solutions, landscape irrigation can be reduced, saving water and improving landscape health.
These positive effects are all the more powerful when the applications themselves are sustainable.
Sustainable digitization needs an edge
The benefits of cloud computing are broad and powerful, however there are costs to this technology. A combination of green data centers and Edge Computing helps to resolve these often unseen costs. With Edge Computing we can reduce the unnecessary use of bandwidth and server capacity (which comes down to infrastructure, electricity and physical space) while simultaneously taking advantage of underused device resources. Also with AI growing in popularity, Edge Computing will become very relevant for sustainable AI applications. AI applications are very resource intensive and Edge AI will help to distribute workloads in a resourceful manner, lowering the resource-use. One example of this is an efficient local vector database. ObjectBox amplifies these benefits, with high performance on small devices and efficient data synchronization – making edge computing an even more sustainable solution.
After 6 years and 21 incremental “zero dot” releases, we are excited to announce the first major release of ObjectBox, the high-performance embedded database for C++ and C. As a faster alternative to SQLite, ObjectBox delivers more than just speed – it’s object-oriented, highly efficient, and offers advanced features like data synchronization and vector search. It is the perfect choice for on-device databases, especially in resource-constrained environments or in cases with real-time requirements.
What is ObjectBox?
ObjectBox is a free embedded database designed for object persistence. With “object” referring to instances of C++ structs or classes, it is built for objects from scratch with zero overhead — no SQL or ORM layer is involved, resulting in outstanding object performance.
The ObjectBox C++ database offers advanced features, such as relations and ACID transactions, to ensure data consistency at all times. Store your data privately on-device across a wide range of hardware, from low-profile ARM platforms and mobile devices to high-speed servers. It’s a great fit for edge devices, iOS or Android apps, and server backends. Plus, ObjectBox is multi-platform (any POSIX will do, e.g. iOS, Android, Linux, Windows, or QNX) and multi-language: e.g., on mobile, you can work with Kotlin, Java or Swift objects. This cross-platform compatibility is no coincidence, as ObjectBox Sync will seamlessly synchronize data across devices and platforms.
Why should C and C++ Developers care?
ObjectBox deeply integrates with C and C++. Persisting C or C++ structs is as simple as a single line of code, with no need to interact with unfamiliar database APIs that disrupt the natural flow of C++. There’s also no data transformation (e.g. SQL, rows & columns) required, and interacting with the database feels seamless and intuitive.
As a C or C++ developer, you likely value performance. ObjectBox delivers exceptional speed (at least we haven’t tested against a faster DB yet). Having several 100,000s CRUD operations per second on commodity hardware is no sweat. Our unique advantage is that, if you want to, you can read raw objects from “mmapped” memory (directly from disk!). This offers true “zero copy” data access without any throttling layers between you and the data.
Finally, CMake support makes integration straightforward, starting with FetchContent support so you can easily get the library. But there’s more: we offer code generation for entity structs, which takes only a single CMake command.
“ObjectBox++”: A quick Walk-Through
Once ObjectBox is set up for CMake, the first step is to define the data model using FlatBuffers schema files. FlatBuffers is a building block within ObjectBox and is also widely used in the industry. For those familiar with Protocol Buffers, FlatBuffers are its parser-less (i.e., faster) cousin. Here’s an example of a “Task” entity defined in a file named “task.fbs”:
1
2
3
4
tableTask{
id:ulong;
text:string;
}
And with that file, you can generate code using the following CMake command:
Among other things, code generation creates a C++ struct for Task data, which is used to interact with the ObjectBox API. The struct is a straightforward C++ representation of the data model:
1
2
3
4
structTask{
obx_id id;// uint64_t
std::stringtext;
};
The code generation also provides some internal “glue code” including the method create_obx_model() that defines the data model internally. With this, you can open the store and insert a task object in just three lines of code:
1
2
3
obx::Store store(create_obx_model());// Create the database
obx::Box<Task>box(store);// Main API for a type
obx_id id=box.put({.text="Buy milk"});// Object is persisted
And that’s all it takes to get a database running in C++. This snippet essentially covers the basics of the getting started guide and this example project on GitHub.
Vector Embeddings for C++ AI Applications
Even if you don’t have an immediate use case, ObjectBox is fully equipped for vectors and AI applications. As a “vector database,” ObjectBox is ready for use in high-dimensional vector similarity searches, employing the HNSW algorithm for highly scalable performance beyond millions of vectors.
Vectors can represent semantics within a context (e.g. objects in a picture) or even documents and paragraphs to “capture” their meaning. This is typically used for RAG (Retrieval-Augmented Generation) applications that interact with LLMs. Basically, RAG allows AI to work with specific data, e.g. documents of a department or company and thus individualizes the created content.
To quickly illustrate vector search, imagine a database of cities including their location as a 2-dimensional vector. To enable nearest neighbor search, all you need to do is to define a HNSW index on the location property, which enables the nearestNeighbors query condition used like this:
This release marks an important milestone for ObjectBox, delivering significant improvements in speed, usability, and features. We’re excited to see how these enhancements will help you create even better, feature-rich applications.
As always, we’re here to listen to your feedback and are committed to continually evolving ObjectBox to meet your needs. Don’t hesitate to reach out to us at any time.
P.S. Are you looking for a new job? We have a vacant C++ position to build the future of ObjectBox with us. We are looking forward to receiving your application! 🙂
What is Edge AI?Edge AI (also: “on-device AI”, “local AI”) brings artificial intelligence to applications at the network’s edge, such as mobile devices, IoT, and other embedded systems like, e.g., interactive kiosks. Edge AI combines AI with Edge Computing, a decentralized paradigm designed to bring computing as close as possible to where data is generated and utilized.
What is Cloud AI? As opposed to this, cloud AI refers to an architecture where applications rely on data and AI models hosted on distant cloud infrastructure. The cloud offers extensive storage and processing power.
An Edge for Edge AI: The Cloud
Example: Edge-Cloud AI setup with a secure, two-way Data Sync architecture
Today, there is a broad spectrum of application architectures combining Edge Computing and Cloud Computing, and the same applies to AI. For example, “Apple Intelligence” performs many AI tasks directly on the phone (on-device AI) while sending more complex requests to a private, secure cloud. This approach combines the best of both worlds – with the cloud giving an edge to the local AI rather than the other way around. Let’s have a look at the advantages on-device AI brings to the table.
Faster Response Rates. Processing data locally cuts down travel time for data, speeding up responses.
Increased Availability. On-device processing makes apps fully offline-capable. Operations can continue smoothly during internet or data center disruptions.
Sustainability/costs. Keeping data where it is produced and used minimizes data transfers, cutting networking costs and reducing energy consumption—and with it, CO2 emissions.
Challenges of Local AI on the Edge
Data Storage and Processing: Local AI requires an on-device database that runs on a wide variety of edge devices (Mobile,IoT, Embedded) and performs complex tasks such as vector search locally on the device with minimal resource consumption.
Data Sync: It’s vital to keep data consistent across devices, necessitating robust bi-directional Data Sync solutions. Implementing such a solution oneself requires specialized tech talent, is non-trivial and time-consuming, and will be an ongoing maintenance factor.
Small Language Models:Small Language Models (SLMs) like Phi-2 (Microsoft Research), TinyStories (HuggingFace), and Mini-Giants (arXiv) are efficient and resource-friendly but often need enhancement with local vector databases for better response accuracy. An on-device vector database allows on-device semantic search with private, contextual information, reducing latency while enabling faster and more relevant outputs. For complex queries requiring larger models, a database that works both on-device and in the cloud (or a large on-premise server) is perfect for scalability and flexibility in on-device AI applications.
On-device AI Use Cases
On-device AI is revolutionizing numerous sectors by enabling real-time data processing wherever and whenever it’s needed. It enhances security systems, improves customer experiences in retail, supports predictive maintenance in industrial environments, and facilitates immediate medical diagnostics. On-device AI is essential for personalizing in-car experiences, delivering reliable remote medical care, and powering personal AI assistants on mobile devices—always keeping user privacy intact.
Personalized In-Car Experience: Features like climate control, lighting, and entertainment can be adjusted dynamically in vehicles based on real-time inputs and user habits, improving comfort and satisfaction. Recent studies, such as one by MHP, emphasize the increasing consumer demand for these AI-enabled features. This demand is driven by a desire for smarter, more responsive vehicle technology.
Remote Care: In healthcare, on-device AI enables on-device data processing that’s crucial for swift diagnostics and treatment. This secure, offline-capable technology aligns with health regulations like HIPAA and boosts emergency response speeds and patient care quality.
Personal AI Assistants: Today’s personal AI assistants often depend on the cloud, raising privacy issues. However, some companies, including Apple, are shifting towards on-device processing for basic tasks and secure, anonymized cloud processing for more complex functions, enhancing user privacy.
ObjectBox for On-Device AI – an edge for everyone
The continuum from Edge to Cloud
ObjectBox supports AI applications from Edge to cloud. It stands out as the first on-device vector database, enabling powerful Edge AI on mobile, IoT, and other embedded devices with minimal hardware needs. It works offline and supports efficient, private AI applications with a seamless bi-directional Data Sync solution, completely on-premise, and optional integration with MongoDB for enhanced backend features and cloud AI.
Interested in extending your AI to the edge? Let’s connect to explore how we can transform your applications.
In today’s fast-paced, decentralized world valuable data is generated by everything, everywhere, and all at once. To harness the vast opportunities offered by this data for data-driven organizations and AI applications, you need to be able to access the data and seamlessly distribute it to when and where it’s needed.
The key to achieving this lies in efficient, offline-first on-device data storage, reliable bi-directional data sync, and a scalable data management backend in the cloud. In other words, you need the infrastructure to manage data flows bi-directionally to tap into fresh data throughout your organization, processes, and applications at the right time.
Together, MongoDB and ObjectBox provide developers with a robust solution to empower seamless workload and data flows on the edge and from the edge to the cloud. ObjectBox seamlessly syncs data bi-directionally across devices even without Internet and syncs back to the cloud and MongoDB when connected. With ObjectBox devices stay in sync also in environments with intermittent connectivity, high latency, or flaky networks. Capture and unlock the value of all your data, anytime, anywhere, without relying on a constant Internet connection, with MongoDB + ObjectBox.
Seamless Offline-First Data Sync for Edge Devices
Maintaining service continuity is essential, even when devices are offline. Your customers, users, operations, and employees need to be able to rely on essential data at all times. That’s where ObjectBox comes in. It comprises of two key components: the ObjectBox Database and ObjectBox Data Sync.
The ObjectBox Database is a lightweight, on-device solution that is highly resource-efficient and fast on restricted hardware like mobile, IoT, and embedded devices, and even in the cloud.
ObjectBox Data Sync enables seamless bi-directional data synchronization between devices. By handling only incremental changes in a compressed binary format, ObjectBox Sync ensures minimal data transfer, automatic conflict resolution, and a seamless user experience even in fluctuating network conditions. This approach effectively simplifies the development process by offering complex sync logic via easy native-language APIs, allowing developers to focus on core app functionality.
Once a connection is available, ObjectBox Data Sync instantly synchronizes changes with MongoDB, providing real-time, bi-directional data sync between edge devices and MongoDB’s robust cloud backend.
The Benefits of Offline-First and Real-Time Data Sync with MongoDB and ObjectBox:
Resource-efficiency & Highspeed: ObjectBox excels at consuming minimal computational resources (CPU, power, memory, …) while delivering data persistence speed that is typically on-par with in-memory caches for read operations.
Offline-First Operation: Ensure continuous app performance, even with no internet connection. ObjectBox stores and syncs data bi-directionally on the edge and additionally with MongoDB once connected.
Real-Time Data Sync: Get reliable, bi-directional data synchronization across devices and MongoDB, enabling real-time updates and data consistency.
Scalable Edge: Easily handle 100k operations / second on a single device. Host the Sync server on any edge device (like a phone) and easily handle 3M clients with a three-node cluster.
Scalable Cloud Backend: With MongoDB, businesses can scale their applications to handle growing data and performance demands, seamlessly syncing data between millions of devices and the cloud.
Flexible Setup Scenarios: Tailor Data Sync to Your Needs
ObjectBox and MongoDB offer flexible setup scenarios to meet different application needs. The two main setup options are the central sync and the edge sync setup.
The Central Sync Setup syncs data between edge devices and MongoDB in the cloud, providing centralized data management while retaining offline-first functionality. The ObjectBox Sync Server runs in the cloud or on-premise.
The Edge Sync Setup allows devices to operate and sync data efficiently offline between ObjectBox instances within an edge, e.g. within one location, or within a car. When reconnected, changes are synchronized back to MongoDB making it ideal for environments with intermittent connectivity or distributed devices that need to function independently while syncing back to the cloud when possible.
This structure offers a flexible approach to integrating edge and cloud systems, empowering organizations to choose the setup that best fits their specific use case. More details.
Use Cases for MongoDB + ObjectBox :
Data-Driven Organizations: In a data-driven organization, every decision relies on access to relevant, up-to-date data. ObjectBox enables real-time data collection and synchronization from edge devices, ensuring access to critical data, even when devices are intermittently connected. This streamlines operations, improves decision-making, and enhances analysis across distributed teams and IoT systems. With MongoDB’s scalable cloud infrastructure, decentralized data integrates seamlessly with the cloud backend for efficient management.
Point-of-Sale (PoS) & Retail Edge Computing: A seamless customer experience and the ability to keep selling and never lose a transaction, even during internet outages, are essential for PoS systems / in retail. ObjectBox enables offline-first data storage and syncing for PoS systems, allowing transactions to be processed locally, even without internet connectivity. When connectivity returns, ObjectBox syncs transaction data back to MongoDB in real time, ensuring data consistency across multiple locations. Retailers can then leverage MongoDB’s analytics to gain insights into customer behavior and optimize inventory management.
Software-Defined Vehicle (SDV) & Connected Cars: Modern vehicles generate vast amounts of data from sensors and onboard systems. ObjectBox enables efficient on-device storage and processing, providing real-time access to data for navigation, diagnostics, and infotainment systems. ObjectBox Data Sync ensures that local data is synced back to MongoDB when connectivity is available, supporting centralized analytics, fleet management, and predictive maintenance, optimizing performance and safety while enhancing the user experience.
Manufacturing & Smart Shopfloor Apps: In smart factories, machines and sensors continuously generate data that must be analyzed and processed in real time. ObjectBox enables local data storage and fast data sync on-premise without the necessity for an Internet connection, ensuring that critical systems that are not connected to the Internet can run smoothly on-site. With a connected instance, ObjectBox takes care of synchronizing this data with the cloud and MongoDB for further analysis and central dashboards.
AI-Applications with On-device Vector Search: ObjectBox is the first and only on-device vector database, empowering developers to run AI locally on mobile, IoT, embedded, and other commodity devices (Edge AI). In combination with a Small Language Model (SLM), this allows developers to build local AI applications (e.g. RAG, genAI) that run directly on the device—without needing a cloud connection. By syncing with MongoDB, businesses can combine the power of on-device AI with centralized cloud data for even greater insights and performance. This is especially beneficial in scenarios requiring real-time decision-making, such as personalized customer experiences and predictive maintenance.
In today’s data-driven world, a data-first strategy requires seamless integration between edge and cloud data management. The combination of MongoDB and ObjectBox unlocks the full potential of your data. MongoDB’s powerful cloud platform, together with ObjectBox’s efficient on-device database and offline-first capabilities, is ideal for capturing the value of your data from anywhere, including distributed edge devices where valuable data is generated all the time. This partnership empowers businesses to seamlessly handle decentralized data, enabling fast and reliable operations at the edge while syncing back to the cloud for centralized management. Whether on IoT devices, mobile, embedded systems, or commodity hardware, ObjectBox and MongoDB ensure optimal performance everywhere. From remote areas to bad networks, our joint solution keeps data flowing reliably between the edge and the MongoDB backend, even when connectivity or nodes are lost.
We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will assume that you are happy with it.