First on-device Vector Database (aka Semantic Index) for iOS

First on-device Vector Database (aka Semantic Index) for iOS

Easily empower your iOS and macOS apps with fast, private, and sustainable AI features. All you need is a Small Language Model (SLM; aka “small LLM”) and ObjectBox – our on-device vector database built for Swift apps. This gives you a local semantic index for fast on-device AI features like RAG or GenAI that run without an internet connection and keep data private.

The recently demonstrated “Apple Intelligence” features are precisely that: a combination of on-device AI models and a vector database (semantic index). Now, ObjectBox Swift enables you to add the same kind of AI features easily and quickly to your iOS apps right now.


Not developing with Swift? We also have a Flutter / Dart binding (works on iOS, Android, desktop), a Java / Kotlin binding (works on Android and JVM), or one in C++ for embedded devices.

Enabling Advanced AI Anywhere, Anytime

Typical AI apps use data (e.g. user-specific data, or company-specific data) and multiple queries to enhance and personalize the quality of the model’s response and perform complex tasks. And now, for the very first time, with the release of ObjectBox 4.0, this will be possible locally on restricted devices.

Local AI Tech Stack Example for on-device RAG

Swift on-device Vector Database and search for iOS and MacOS

With the ObjectBox Swift 4.0 release, it is possible to create a scalable vector index on floating point vector properties. It’s a very special index that uses an algorithm called HNSW. It’s scalable because it can find relevant data within millions of entries in a matter of milliseconds.
Let’s pick up the cities example from our vector search documentation. Here, we use cities with a location vector and want to find the closest cities (a proximity search). The Swift class for the City entity shows how to define an HNSW index on the location:

Inserting City objects with a float vector and HNSW index works as usual, the indexing happens behind the scenes:

To then find cities closest to a location, we do a nearest neighbor search using the new query condition and “find with scores” methods. The nearest neighbor condition accepts a query vector, e.g. the coordinates of Madrid, and a count to limit the number of results of the nearest neighbor search, here we want at max 2 cities. The find with score methods are like a regular find, but in addition return a score. This score is the distance of each result to the query vector. In our case, it is the distance of each city to Madrid.

The ObjectBox on-device vector database empowers AI models to seamlessly interact with user-specific data — like texts and images — directly on the device, without relying on an internet connection. With ObjectBox, data never needs to leave the device, ensuring data privacy.

Thus, it’s the perfect solution for developers looking to create smarter apps that are efficient and reliable in any environment. It enhances everything from personalized banking apps to robust automotive systems.

ObjectBox: Optimized for Resource Efficiency

At ObjectBox, we specialize on efficiency that comes from optimized code. Our hearts beat for creating highly efficient and capable software that outperforms alternatives on small and big hardware. ObjectBox maximizes speed while minimizing resource use, extending battery life, and reducing CO2 emissions.

With this expertise, we took a unique approach to vector search. The result is not only a vector database that runs efficiently on constrained devices but also one that outperforms server-side vector databases (see first benchmark results; on-device benchmarks coming soon). We believe this is a significant achievement, especially considering that ObjectBox still upholds full ACID properties (guaranteeing data integrity).

Cloud/server vector databases vs. On-device/Edge vector databases

Also, keep in mind that ObjectBox is a fully capable database. It allows you to store complex data objects along with vectors. Thus, you have the full feature set of a database at hand. It empowers hybrid search, traceability, and powerful queries.

Use Cases / App ideas

ObjectBox can be used for a million different things, from empowering generative AI features in mobile apps to predictive maintenance on ECUs in cars to AI-enhanced games. For iOS apps, we expect to see the following on-device AI use cases very soon:

  • Across all categories we’ll see Chat-with-files apps:
    • Travel: Imagine chatting to your favorite travel guide offline, anytime, anywhere. No need to carry bulky paper books, or scroll through a long PDF on your mobile.
    • Research: Picture yourself chatting with all the research papers in your field. Easily compare studies and findings, and quickly locate original quotes.
  • Lifestyle:
    • Health: Apps offering personalized recommendations based on scientific research, your preferences, habits, and individual health data. This includes data tracked from your device, lab results, and doctoral diagnosis.  
  • Productivity: Personal assistants for all areas of life.
    • Family Management: Interact with assistants tailored to specific roles. Imagine a parent’s assistant that monitors school channels, chat groups, emails, and calendars. Its goal is to automatically add events like school plays, remind you about forgotten gym bags, and even suggest birthday gifts for your child’s friends.
    • Professional Assistants: Imagine being a busy sales rep on the go, juggling appointments and travel. A powerful on-device sales assistant can do more than just automation. It can prepare contextual and personalized follow-ups instantly. For example, by summarizing talking points, attaching relevant company documents, and even suggesting who to CC in your emails.
  • Educational:
    • Educational apps featuring “chat-with-your-files” functionality for learning materials and research papers. But going beyond that, they generate quizzes and practice questions to help people solidify knowledge.

Run the local AI Stack with a Language Model (SLM, LLM)

Recent Small Language Models (SMLs) already demonstrate impressive capabilities while being small enough to run on e.g. mobile phones. To run the model on-device of an iPhone or a macOS computer, you need a model runtime. On Apple Silicone the best choice in terms of performance typically MLX – a framework brought to you by Apple machine learning research. It supports the hardware very efficiently by supporting CPU/GPU and unified memory.

To summarize, you need these three components to run on-device AI with an semantic index:

  • ObjectBox: vector database for the semantic index
  • Models: choose an embedding model and a language model to matching your requirements
  • MLX as the model runtime

Start building next generation on-device AI apps today! Head over to our vector search documentation and Swift documentation for details.

Retrieval Augmented Generation (RAG) with vector databases: Expanding AI Capabilities

Retrieval Augmented Generation (RAG) with vector databases: Expanding AI Capabilities

What is RAG?

Retrieval Augmented Generation (RAG) is a technique to enhance the intelligence of large language models (LLMs) with additional knowledge, such as reliable facts from specific sources, private or personal information not available to others, or just fresh news to improve their answers. Typically, the additional knowledge is provided to the model from a vector database. For example, you can add internal data from your company, the latest news or the data from your personal devices to get responses that use your context. It can truly help you like an expert instead of giving generalized answers. This technique also reduces hallucinations. 

Why RAG?

Let’s take a look at the key benefits that RAG in general offers:

  • Customization and Adaptation: RAG helps LLMs to tailor responses to specific domains or use cases by using vector databases to store and retrieve domain-specific information. It turns general intelligence into expert intelligence.
  • Contextual Relevance: By incorporating information retrieved from a large corpus of text, RAG models can generate contextually relevant responses. It improves the quality of generated responses compared to traditional generation models.
  • Accuracy and diversity: Incorporation of external information also helps to generate more informative and accurate responses and keep LLM up-to-date. This also helps to avoid repetitive or generic responses and allows for more diverse and interesting conversations.
  • Cost-effective implementation: RAG requires less task-specific training data compared to fine-tuning the foundation models. When we compare retrieval augmented generation vs fine-tuning, RAG’s ability to use external knowledge stands out. While fine-tuning requires lots of labeled data, RAG can rely on external sources. This can be particularly beneficial in scenarios where annotated training data is limited or expensive to obtain, thus, providing a cost-effective implementation. 
  • Transparency: RAG models provide transparency in their responses by explicitly indicating the source of retrieved information. This allows users to understand how the model arrived at its response and helps enhance trust in the generated output.

Therefore, RAG is suitable for applications where access to a vast amount of specialized data is necessary. For example, a customer support bot that pulls details from FAQs and generates coherent, conversational responses. Another example is an email drafting tool that fetches information about recent meetings and generates a personalized summary.

How retrieval augmented generation works

Let’s discuss the mechanics of how RAG operates with databases, covering its main stages from dataset creation to response generation (see figure).

This image has an empty alt attribute; its file name is RAG.png
Retrieval augmented generation diagram


  • DB creation: Creation of external dataset

Before the real use, the vector database should be created. The new data, that lies outside the training dataset of LLM, should be identified and added to the dataset (e.g. up-to-date information or specific information). This dataset is then transferred into vector embeddings via an AI model (embedding language models) and is stored in the vector database. 

  • DB in use: Retrieval of relevant information
    Once a query comes in, it is also transferred into a vector / embedding. It is used then to retrieve the most relevant result from the database. To achieve this, RAG uses semantic search techniques also known as vector search to understand the user’s query and/or context, retrieving contextually relevant information from a large dataset. Vector search goes beyond keyword matching and focuses on semantic relationships, improving the quality of the retrieved information and the overall performance of the RAG system in generating contextually relevant responses. 
  • DB in use: Augmentation
    At this stage, the user’s query is augmented by adding the relevant data retrieved in the previous stage. Often, only the top responses from vector search are considered as relevant data. Many databases have additional filtering techniques in place here.
  • Generation
    The augmented query is sent to the LLM to generate an accurate answer.

The Role of Long Context Windows

The rise of the new LLMs with long (1+million tokens) context windows, like Gemini 1.5, raised the discussion on whether long context windows will replace RAG. A long context window enables users to directly incorporate huge amounts of data into a query. Thus, it increases context to the LLM to improve its efficiency. 

Long context length and RAG have pros and cons, and neither will kill the other. Rather than being mutually exclusive, large context windows and RAG can be complemented. Large context windows can enhance RAG applications by expanding the margin of precision and accommodating vast amounts of data. However, the capability of the model to take a long context does not mean that it can efficiently leverage all the information. If the relevant information is located in the middle of the context window, LLM’s ability to recall it is worse than the one located in the beginning. In order to use RAG with the long context window, the reranking (e.g. Cross-Encoder) should be used. The reranking model first calculates a matching score between a given query and vectors in the database (e.g. representing documents). And then it rearranges vector search results so that the most relevant ones are prioritized.

Future Directions of RAG

While RAG offers numerous benefits, there are still opportunities for improvement. Researchers are exploring ways to enhance RAG by combining it with other techniques. These include fine-tuning (RAFT) or the long context window (in combination with reranking). Another direction of research is expanding RAG capabilities by advancing data handling (including multimodal data), evaluation methodologies, and scalability. Finally, RAG is also affected by the new advances in optimizing LLMs to run locally on restricted devices (mobile, IoT), along with the emergence of the first on-device vector database. Now, RAG can be performed directly on your mobile device, prioritizing privacy, low latency, and offline capabilities.

The on-device Vector Database for Android and Java

The on-device Vector Database for Android and Java

ObjectBox 4.0 is an on-device vector database allowing Android and Java developers to enhance their apps with local AI capabilities. A vector database facilitates advanced vector data processing and analysis, such as measuring semantic similarities across different document types like images, audio files, and texts. A classic use case would be to enhance a Large Language Model (LLM), or a small language model (like e.g. the Phi-3), with your domain expertise, your proprietary knowledge, and / or your private data. Combining the power of AI models with a specific knowledge base empowers high-quality, perfectly matching results a generic model simply cannot provide. This is called “retrieval-augmented generation” (RAG). Because ObjectBox works on-device, you can now do on-device RAG with data that never leaves the device and therefore stays 100% private. This is your chance to explore this technology on-device.

Vector Search (Similarity Search)

With this release it is possible to create a scalable vector index on floating point vector properties. It’s a very special index that uses an algorithm called HNSW. It’s scalable because it can find relevant data within millions of entries in a matter of milliseconds.

We pick up the example used in our vector search documentation. In short, we use cities with a location vector to perform proximity search. Here is the City entity and how to define a HNSW index on the location:

Vector objects are inserted as usual (the indexing is done automatically behind the scenes):

To perform a nearest neighbor search, use the new nearestNeighbors(queryVector, maxResultCount) query condition and the new “find with scores” query methods (the score is the distance to the query vector). For example, let’s find the 2 closest cities to Madrid:

Vector Embeddings

In the cities example above, the vectors were straight forward: they represent latitude and longitude. Maybe you already have vector data as part of your data. But often, you don’t. So where do you get the vector emebeddings of texts, images, video, audio files from?

For most AI applications, vectors are created by a embedding model. There are plenty of embedding models to choose from, but first you have to decide if it should run in the cloud or locally. Online embeddings are the easier way to get started and great for first testing; you can set up an account at your favorite AI provider and create embeddings online (only).

Depending on how much you care about privacy, you can also run embedding models locally and create your embeddings on your own device. There are a couple of choices for desktop / server hardware, e.g. check these on-device embedding models. For Android, MediaPipe is a good start as it has embedders for text and images.

Updated open source benchmarks 2024 (CRUD)

A new release is also a good occasion to update our open source benchmarks. The Android performance benchmark app provides many more options, but here are the key results:

CRUD is short for the basic operations a database does: create, read, update and delete. It’s an important metric for the general efficiency of a database.

Disclaimer 1: our focus is the “Object” performance (you may find a hint for that in our product name 🙂); so e.g. relational systems may perform a bit better when you directly work with raw columns and rows.

Disclaimer 2: ObjectBox delete performance was cut off at 800k per second to keep the Y axis within reasonable bounds. The actually measured value was 2.5M deletes per second.

Disclaimer 3: there cannot be enough disclaimers on any performance benchmark. It’s a complex topic where details matter. It’s best if you make your own picture for your own use case. We try to give a fair “arena” with our open source benchmarks, so it could be a starting point for you.

Feedback and Outlook: On-device vector search Benchmarks coming soon

We’re still working on a lot of stuff (as always 😉 and with on-device / local vector search being a completely new technology for Android, we need your feedback, creativity and support more than ever. We’ll also soon release benchmarks on the vector search. Follow us on LinkedIn, GitHub, or Twitter to keep up-to-date.

Vector search: making sense of search queries

Vector search: making sense of search queries

Today, finding the most valuable information to your search is more complicated than finding a needle in a haystack. Traditional search engines match keywords and favor SEO-optimized content, but what if there was a way for search engines to truly understand the meaning behind our queries? Enter vector search – a powerful technology that is transforming how we navigate information, not just for users, but also for applications performing background searches. In this article, we will discuss what vector search is and how it works.

What is a vector search and why should you care?

Example Results with a traditional search for “Simple Fruit Cake”.

Vector search, which is also known as semantic search, is a technology that improves search accuracy by understanding the meaning (semantics) of the data and relations between its parts. Unlike traditional search, vector search efficiently handles synonyms, typos, ambiguous language, and broad or fuzzy queries. This is because it focuses on meaning, not just keywords.

Imagine that you are searching for a dessert to cook during the weekend. In a traditional search engine, the “simple fruit cake” query will reveal only websites that include these keywords. However, a vector search engine is able to provide results like “apple pie in 20 minutes” or “easy summer desserts”, which capture the essence of the query and align with your desire for a straightforward dessert option, providing more valuable results to you. 

At its core, vector search uses Large Language models (LLMs), like GPT, to transform data into mathematical vectors, also known as vector embeddings

What is a vector embedding?

2D Vector Space Representation. “Easy apple pie” is close to “simple fruit cake” as they are both simple and have fruit as an ingredient. “Easy chocolate mousse” shares simplicity but does not contain fruit. “Fancy plum cake” has fruit but is not simple to make. And “extravagant chocolate mousse” does not share either simplicity or fruit as an ingredient. Thus, it is the farthest from “simple fruit cake”.

A vector or vector embedding is a numerical representation of any kind of unstructured data (e.g. texts, images, videos, audio). It captures its meaning while being easy and efficient to compute with. Think of it like this: imagine you have a collection of cake recipes. You can convert each recipe into a vector embedding, which is like a unique numerical code that represents the recipe’s characteristics (ingredients, cooking methods, flavors, etc.).

Once all the recipes are encoded into embeddings, we can perform a similarity search. This means we can compare the vectors to see how similar the recipes are. For example, the vector for an easy apple pie recipe would be close to the vector for a simple fruit cake recipe because they share similar characteristics (e.g. simplicity, fruitiness). On the other hand, the vector for an extravagant chocolate mousse cake would be farther away because it involves different ingredients and methods.

How to compare vectors?

Vector similarity is a measure of how similar two vectors are (see ep. 4 of ObjectBox Bites). There are three ways to compare vectors: Jaccard Similarity, Cosine Similarity, and L2 Distance (also known as Euclidean distance). Jaccard Similarity calculates the ratio of elements that are common to both vectors divided by the total number of elements in both vectors. Cosine Similarity calculates the cosine of the angle between two vectors. The last method is the L2 distance. It calculates the straight-line distance between two points in space represented by the vectors. This is the most frequently used method in AI applications. It is important to note that the choice of vector comparison method does not affect the mechanics of similarity search.

What is a vector database and how is it related to vector search?

A vector database is a specialized database designed to store, manage, and search vectors efficiently. This efficiency is crucial for handling large datasets and performing fast vector similarity searches. Also, with a vector database, the knowledge of AI models can be improved, adapted, and updated. Therefore, today, most AI apps use a vector database.

Imagine having an AI that knows your habits, your preferences, your health data, maybe even what’s in your fridge, and can use this knowledge to suggest recipes that fit your lifestyle and individual preferences. A standard AI model doesn’t have that data and wouldn’t learn that way, but with a vector database it can. Now, when you search for a “fruit cake recipe”, using this data, it can suggest a “simple fruit cake” without sugar if you usually prefer quick, easy, and healthy recipes, or a “fancy plum cake” if you enjoy more challenging baking projects and don’t like apples. Or, a vegan option, if you have neither milk nor eggs left in the fridge.

This technique is called Retrieval-Augmented Generation (RAG). It enhances the capabilities of LLMs with additional data (e.g. personal data, company data, fresh data) stored in a vector database.

When you query a vector database, it uses the query’s vector representation to find the nearest neighbors in the database.

Nearest Neighbor Search

How do we find the nearest neighbor to our query vector? The most straightforward approach is a brute-force search. It calculates the distance between our query vector and all other vectors in the database, one by one. Any metrics discussed in “How to compare vectors” can be used. However, this brute-force approach has a time complexity of O(N*d), where N is the number of vectors and d is the dimensionality. This becomes computationally expensive for large datasets.

Since exact nearest neighbor search can be slow for massive datasets, we often turn to approximate nearest neighbor (ANN) algorithms. These algorithms prioritize efficiency by finding neighbors that are very close (but not necessarily the absolute closest) to the query vector, significantly reducing search time. 

Continuing with the cooking assistant app example, imagine you’re searching for a “fruit cake recipe”. Assume that in our database, the real closest recipe is “simple apple pie”. With a massive database, an exact nearest neighbor search might take a long time to find the perfect match. However, an ANN algorithm can quickly find a recipe that is very similar to what you’re looking for, such as a “simple fruit cake” or a “basic apple pie”, even if it might not be the exact closest match. This efficiency ensures you get relevant and useful recipe suggestions promptly, enhancing your overall experience without a noticeable compromise in quality.

Approximate Nearest Neighbour Search

Now, let’s delve into the world of Approximate Nearest Neighbor (ANN) algorithms. The way you search for nearest neighbors depends on how the data is stored in the vector database. One of the earliest ANN algorithms, established in 1975, is called k-d trees. These trees work by recursively splitting the data space using hyperplanes, making the search process more efficient (see ep. 5 of ObjectBox Bites). However, k-d trees, like many exact nearest neighbor algorithms, suffer from the dimensionality curse. This means that as the number of dimensions (features) in your data increases, the distance between points becomes less meaningful, making searching very slow in high-dimensional spaces like those used in vector databases. 

For instance, consider simple fruit recipes. With a few features, such as cooking time and number of ingredients, finding similar recipes would be relatively straightforward. However, if we also include many other features like sweetness level, calorie count, fruit type, all specific ingredients, preparation complexity, and user ratings, the number of dimensions increases significantly. In such high-dimensional spaces, the traditional k-d tree method becomes inefficient because the distances between points (recipes) become less distinct and meaningful.

To overcome this challenge, ANN algorithms leverage two main approaches: indexing methods and sketching methods. Indexing methods work by creating a hierarchical data structure that allows for faster exploration of the search space. Imagine a well-organized library with categorized sections instead of just randomly placed books.  Sketching methods, on the other hand,  don’t search the entire dataset directly.  Instead, they create compressed versions (sketches) of the data that are faster to compare with the query vector. This reduces the search time significantly. Often, these two approaches are combined for optimal performance.


A popular example of an ANN search implementation for high-dimensional data is the Hierarchical Navigable Small World (HNSW) algorithm (e.g. implemented in Azure AI). HNSW relies on graph-based indexing to efficiently navigate the data space and find nearest neighbors. For more details watch episodes 6, 7, and 8 of ObjectBox Bites miniseries, where we describe the fundamentals of HNSW.

Take-away notes

To sum up, vector search offers a significant leap forward in how we search for information. By understanding the meaning and relationships behind data, it delivers more relevant and accurate results, even for unstructured data and complex queries. This technology has the potential to revolutionize various fields, from enhancing search engines to empowering AI applications. As vector search continues to evolve, we can expect even more exciting possibilities for navigating the ever-growing ocean of information and unlocking its full potential. This includes operating with data directly on the devices it was created on, reducing cloud costs, eliminating the reliance on an internet connection, and opening up using your private data without it ever being shared (100% private). If you’re interested in other AI and vector database-related topics, check out the ObjectBox mini-series. Stay tuned for more articles in the future.

Python on-device Vector and Object Database for Local AI

Python on-device Vector and Object Database for Local AI

Python developers can now use the very first on-device object/vector database for AI applications that run everywhere, locally. With its latest release, the battle-tested ObjectBox database has extended its Python support. This embedded database conveniently stores and manages Python objects and vectors, offering highly performant vector search alongside CRUD operations for objects.

What is ObjectBox?

ObjectBox is a lightweight embedded database for objects and vectors. Note that “objects” here refers to programming language objects, e.g. instances of a Python class. And because it was built for this purpose, ObjectBox is typically the fastest database option in this category. In terms of performance, it easily beats wrappers and ORMs running on top of SQL databases. This is because middle layers like SQL and row/column mapping simply do not exist.

ObjectBox is also a vector database storing high dimensional vector data and offering a highly scalable vector search algorithm (HNSW). Even with millions of documents, ObjectBox is capable of finding nearest neighbors within milliseconds on commodity hardware. And for ObjectBox, a vector is “just another” property type and thus, you can combine vector data with regular data using your own data model.

The ObjectBox API

Note: for an interactive version of the example, check our vector search Jupyter notebook on Google Colab, or one of the two vector-search-city examples in our repository.

Having an easy-to-use API is a top priority for ObjectBox. The following example uses a City entity, which has a name and a location. The latter is a two dimensional vector of latitude and longitude. We create a Store (aka the database) with default options, and use a Box to insert a list of Cities:

With cities stored in the database, let’s do a simple search for cities starting with “Be”:

Vector search follows the same pattern. This query locates the nearest neighbors to a given location:

LangChain Integration

ObjectBox is integrated as a Vector Database in LangChain via the langchain-objectbox package:

pip install langchain-objectbox --upgrade

Then, create an ObjectBox VectorStore using e.g. one of the from_ class methods e.g. from_texts class method:

from langchain_objectbox.vectorstores import ObjectBox
obx_vectorstore = ObjectBox.from_texts(texts, embeddings, ...)

We will look into details in one of our next blog posts.

Vector Search Performance

While ObjectBox is a small database, you can expect great performance. We ran a quick benchmark on using the popular and independent ANN benchmark open source suite. First results indicate that ObjectBox’ vector search is quite fast and that it can even compete with vector databases built for servers and the cloud. For more details, we will have a special ANN benchmark post that goes in more detail soon (follow us to stay up-to-date: LinkedIn, Twitter).

From Zero to 4: our first stable Python Release

We jumped directly to version 4.0 to align with our “core” version. The core of ObjectBox is written in high-performance C++ and with the release of vector search, we updated its version to 4.0. Thus you already get all the robustness you would expect from a 4.0 version of a product that has been battle tested for years. By aligning the major version, it’s also easy to tell that all ObjectBox bindings with version 4 include vector search.

What’s next?

There are a lot of features still in the queue. For example our Python binding does not support relations yet. Also we would like to do further benchmarks and performance work specific to Python. We are also open for contributions, check our GitHub repository

Evolution of search: traditional vs vector search

Evolution of search: traditional vs vector search

Introduction

In today’s digital landscape, searching for information is integral to our daily lives, whether for education, research, work, or shopping. However, as the volume and complexity of data kept growing, traditional search methods faced more and more challenges in providing accurate and relevant results. That’s where vector search comes in. We’re already seeing Google changing its search engine to empower the vector search (RankBrain, BERT, Neural matching), and expecting even greater incorporation of AI tools to improve search experience. Let’s explore the differences between traditional (keyword) search and vector search to understand how these technologies are shaping our search experiences, and how this impacts the discoverability of any content you might produce.

Traditional (keyword) search

Traditional search performs exact keyword matching from user queries to the data to retrieve relevant results. For example, searching for “programming languages” with traditional search will list every source containing those words. A more advanced version can also incorporate additional rules to enhance search results, such as:

  • keyword frequency (how often the term “programming languages” is used within the result text),
  • the presence of related terms (e.g. “Java”, “Python”, “C++” versus “cooking”, “gardening”),
  • or location (results closer to your location are favored).

While this approach has served us well, it struggles with ambiguous language, synonyms as well as the impact of SEO strategies, often resulting in less accurate or less valuable search results. This can be especially frustrating for businesses who are trying to get their content seen by the right people. For example, a business that publishes a blog post about “sustainable fashion tips” might miss out on potential customers who are searching for “eco-friendly clothing recommendations” or “green clothing ideas,” simply because their keywords don’t exactly match.

Vector (semantic) search

2D Vector Space Representation. In this space, “Python” and “Java” are close to each other as well as to the “Programming language” query we are searching for because they are similar (they share high values in their features).


On the other hand, vector search takes a different approach by seeking out related objects that share similar characteristics or semantics. You can think of it as finding results based on meaning or understanding rather than just exact wording. For example, searching for “programming languages” with vector search will not only find sources mentioning those exact words but also identify specific languages like “Python” or “Java” as well as related concepts such as “coding tutorials” or“development frameworks”.

To do a vector search, first of all, the content, such as texts, images, audio files, or videos, needs to be represented as vectors/embeddings (also often called vector embeddings) by AI models. These embeddings represent data in a multidimensional vector space. Vectors capture the essence or semantics of the data they represent while remaining computationally efficient.

Once these vector representations are generated, they are basically sets of numbers, and therefore easy to compute with. For instance, instead of searching for a specific word in text, we aim to find the closest vector (from the text embeddings) to the query vector (representing the word we’re searching for). This process relies on well-established vector computing methods, such as calculating the distance between vectors or minimizing the angle between them (Cosine similarity).

Comparison

Let’s now compare different aspects of searching to understand the main differences between traditional search and vector search (also summarized in Table). 

  • Search Approach
    In traditional search, the approach relies on matching keywords directly from the user query to the content. Vector search uses vector embeddings to catch the semantics of the data to perform a meaning-based approach.
  • Ambiguity handling
    Therefore, vector search shines when it comes to handling ambiguity. It is superior for handling synonyms, ambiguous language, and broad or fuzzy queries compared to traditional search. This also automatically influences the relevance of the search results.
  • Search relevance calculation
    The metrics used for search relevance calculations are different. The traditional search uses term frequency-inverse document frequency (TF-IDF) and BM25, while vector search uses Jaccard Similarity, Cosine Similarity, and L2 Distance (or Euclidean).
  • Speed and Implementation
    Traditional search is easy to implement, straightforward in usage, and fast for simple queries. Vector search may be slower for simple queries and more complicated to implement, once it comes to huge datasets. However, the implementation of approximate nearest neighbor techniques (ANN) allows to significantly speed up the process.
  • Scalability
    Continuous expansion of content, challenges the scalability of traditional search, while for vector search scalability is one of the advantages. 
  • Cost
    While traditional search may have lower computational requirements, the superior performance and accuracy of vector search often justify the investment in additional computing power. Furthermore, the computational costs for vector search can be significantly reduced with the use of ANN.
Comparison table between keyword search and vector search

Conclusions

In summary, both traditional search and vector search offer distinct advantages and drawbacks. Vector search excels in handling ambiguity, correcting typos, enhancing relevance, and managing extensive datasets. Traditional search remains advantageous for straightforward queries, exact matches, or smaller datasets. Historically, limited computational resources, particularly for on-device computation (i.e. Edge Computing), favored traditional search. However, the landscape is evolving rapidly with the introduction of the first edge vector database solution by ObjectBox. This innovation promises to revolutionize the scenario by optimizing vector search for devices with constrained resources, extending the benefits of semantic search to the Edge.