Python developers can now use the very first on-device object/vector database for AI applications that run everywhere, locally. With its latest release, the battle-tested ObjectBox database has extended its Python support. This embedded database conveniently stores and manages Python objects and vectors, offering highly performant vector search alongside CRUD operations for objects.
What is ObjectBox?
ObjectBox is a lightweight embedded database for objects and vectors. Note that “objects” here refers to programming language objects, e.g. instances of a Python class. And because it was built for this purpose, ObjectBox is typically the fastest database option in this category. In terms of performance, it easily beats wrappers and ORMs running on top of SQL databases. This is because middle layers like SQL and row/column mapping simply do not exist.
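To make the "row/column mapping" point concrete, here is a minimal sketch of the hand-written mapping layer that a SQL-backed approach typically needs. It uses only Python's standard sqlite3 module and has nothing to do with ObjectBox internals; the City dataclass, the table layout, and the helper names save_city and load_cities are assumptions made up for this illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class City:
    name: str
    latitude: float
    longitude: float


def save_city(conn, city):
    # Object -> row: every attribute is mapped to a column by hand.
    conn.execute(
        "INSERT INTO city (name, latitude, longitude) VALUES (?, ?, ?)",
        (city.name, city.latitude, city.longitude),
    )


def load_cities(conn, prefix):
    # Row -> object: each result row is turned back into a City instance.
    rows = conn.execute(
        "SELECT name, latitude, longitude FROM city "
        "WHERE name LIKE ? ORDER BY name",
        (prefix + "%",),
    )
    return [City(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, latitude REAL, longitude REAL)")
for city in [City("Berlin", 52.52, 13.405), City("London", 51.5072, -0.1276)]:
    save_city(conn, city)

found = load_cities(conn, "Be")
print(found)
```

Both helper functions exist only to translate between objects and rows. An object database stores the object directly, so this translation step, and the SQL parsing behind it, is the kind of middle layer the paragraph above refers to.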
ObjectBox is also a vector database that stores high-dimensional vector data and offers a highly scalable vector search algorithm (HNSW). Even with millions of documents, ObjectBox can find nearest neighbors within milliseconds on commodity hardware. For ObjectBox, a vector is “just another” property type, so you can combine vector data with regular data in your own data model.
The ObjectBox API
Note: for an interactive version of the example, check our vector search Jupyter notebook on Google Colab, or one of the two vector-search-city examples in our repository.
Having an easy-to-use API is a top priority for ObjectBox. The following example uses a City entity, which has a name and a location. The latter is a two-dimensional vector of latitude and longitude. We create a Store (aka the database) with default options and use a Box to insert a list of cities:
```python
from objectbox import *

@Entity()
class City:
    id = Id()
    name = String()
    location = Float32Vector(index=HnswIndex(dimensions=2))

# In the code example, there are 213 capitals of the world
cities = [City(name="Berlin", location=[52.5200, 13.4050]),
          City(name="London", location=[51.5072, -0.1276])]
store = Store()
box = store.box(City)
box.put(cities)
```
With cities stored in the database, let’s do a simple search for cities starting with “Be”:
```python
query = box.query(City.name.starts_with("Be")).build()
results = query.find()

for city in results:
    print(f"{city.name:>10s} {city.location}")
```
Vector search follows the same pattern. This query locates the nearest neighbors to a given location:
```python
query_location = [51.0, 12.0]  # Somewhere in Germany, south-west of Berlin

query = box.query(City.location.nearest_neighbor(query_location, 15)).build()
results = query.find_with_scores()
```
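To illustrate what a nearest-neighbor query computes, here is a brute-force sketch in plain Python that ranks the two example cities by distance to the query location. It is not ObjectBox's implementation: HNSW is an approximate index that avoids scanning every vector, and the distance metric and score values ObjectBox reports may differ. The nearest_neighbors helper is hypothetical, and treating latitude and longitude as flat 2D coordinates is a simplifying assumption.

```python
import math

# The two example cities from above, as plain name -> [latitude, longitude].
cities = {
    "Berlin": [52.5200, 13.4050],
    "London": [51.5072, -0.1276],
}


def nearest_neighbors(query, vectors, k):
    """Return up to k (name, distance) pairs, nearest first (exact, O(n) scan)."""
    scored = [(name, math.dist(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


query_location = [51.0, 12.0]
neighbors = nearest_neighbors(query_location, cities, k=2)

for name, distance in neighbors:
    print(f"{name:>10s} {distance:.2f}")
```

A linear scan like this is fine for a handful of vectors but grows with the size of the data set, which is the cost an index such as HNSW is designed to avoid.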
LangChain Integration
ObjectBox is integrated as a Vector Database in LangChain via the langchain-objectbox package:
```shell
pip install langchain-objectbox --upgrade
```
Then create an ObjectBox VectorStore using one of the from_ class methods, e.g. the from_texts class method:
```python
from langchain_objectbox.vectorstores import ObjectBox

obx_vectorstore = ObjectBox.from_texts(texts, embeddings, ...)
```
We will look into details in one of our next blog posts.
Vector Search Performance
While ObjectBox is a small database, you can expect great performance. We ran a quick benchmark using the popular and independent open-source ANN benchmark suite. First results indicate that ObjectBox's vector search is quite fast and can even compete with vector databases built for servers and the cloud. We will cover the results in more detail in a dedicated ANN benchmark post soon (follow us to stay up-to-date: LinkedIn, Twitter).
From Zero to 4: our first stable Python Release
We jumped directly to version 4.0 to align with our “core” version. The core of ObjectBox is written in high-performance C++ and with the release of vector search, we updated its version to 4.0. Thus you already get all the robustness you would expect from a 4.0 version of a product that has been battle tested for years. By aligning the major version, it’s also easy to tell that all ObjectBox bindings with version 4 include vector search.
What’s next?
There are a lot of features still in the queue. For example, our Python binding does not support relations yet. We would also like to do further benchmarks and performance work specific to Python. We are open to contributions too; check our GitHub repository.