HelixDB Launches New Graph-Vector Database with Claims of 1000x Performance Over Neo4j

BigGo Editorial Team

The open-source database landscape has a new contender with the launch of HelixDB, a graph-vector database written in Rust that's specifically designed for RAG (Retrieval Augmented Generation) and AI applications. What's catching the community's attention are the bold performance claims and its unique approach to combining graph and vector functionalities.

This GitHub page for HelixDB showcases its structure as an open-source graph-vector database for AI applications

Performance Claims Raise Eyebrows

HelixDB's developers claim the database is 1000x faster than Neo4j and 100x faster than TigerGraph, while performing on par with Qdrant for vector workloads. These assertions prompted community members to ask for evidence, with one user directly requesting benchmarks to support them. The HelixDB team said it had run the benchmarks but had not published them before announcing the project, and promised to add detailed performance data to its documentation.

Vector Capabilities and Dimensions

The database appears to have robust vector support, with developers confirming there's currently no cap on vector dimensions. They mentioned they'll likely implement a cap around 64,000 dimensions in the future, similar to other vector databases like Qdrant and Pinecone. The team also revealed plans to implement binary quantization in the coming months to improve performance with higher-dimensional vectors, showing an awareness of the performance trade-offs involved in vector operations.
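
To make the trade-off concrete: binary quantization, in its simplest form, keeps only the sign of each vector component, so a float vector shrinks to one bit per dimension and comparisons become cheap Hamming distances. The sketch below illustrates that general idea in Python. It is not HelixDB's implementation, which the team describes as still planned, and the function names are hypothetical.

```python
from typing import List


def binarize(vec: List[float]) -> int:
    """Pack the sign of each component into one bit of a Python int.

    Bit i is set when component i is positive (an assumed convention).
    """
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two binary codes differ."""
    return bin(a ^ b).count("1")


def nearest(query: List[float], codes: List[int]) -> int:
    """Return the index of the stored code closest to the query in Hamming distance."""
    q = binarize(query)
    return min(range(len(codes)), key=lambda i: hamming(q, codes[i]))
```

The cost of this compression is precision: two vectors with the same sign pattern become indistinguishable, which is why systems that use it commonly re-score a shortlist of candidates against the original float vectors.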

Graph-Vector Integration Sets It Apart

What distinguishes HelixDB from competitors like KuzuDB is its approach to integrating graph and vector functionalities. According to the developers, HelixDB supports incremental indexing on vectors, allowing updates without requiring a complete re-indexing of all vectors. This addresses a pain point with some existing solutions where the vector index is completely separate from the graph structure, requiring full re-indexing when updates occur.

"Pretty much the same way you would with any graph DB, with the added benefit of being able to treat a vector as a node by creating those explicit relationships between them."
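
As a rough illustration of what treating a vector as a node can look like, the Python sketch below keeps vectors and edges in one in-memory store, so a similarity search can be followed directly by a graph traversal. Everything here is hypothetical: the class, method names, and edge labels are invented for the example and are not HelixDB's API or query language. A brute-force scan stands in for a real vector index, which also makes each insert trivially incremental.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TinyGraphVectorStore:
    """Toy store in which vectors are addressable as graph nodes."""

    def __init__(self) -> None:
        self.vectors: Dict[str, List[float]] = {}
        self.edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_vector(self, node_id: str, vec: List[float]) -> None:
        # Only this entry is touched; existing vectors are not re-indexed.
        self.vectors[node_id] = vec

    def add_edge(self, src: str, label: str, dst: str) -> None:
        self.edges[src].append((label, dst))

    def neighbors(self, node_id: str, label: str) -> List[str]:
        """Follow outgoing edges with the given label."""
        return [dst for lbl, dst in self.edges[node_id] if lbl == label]

    def search(self, query: List[float], k: int = 1) -> List[str]:
        """Return the ids of the k vector nodes most similar to the query."""
        ranked = sorted(self.vectors,
                        key=lambda n: cosine(query, self.vectors[n]),
                        reverse=True)
        return ranked[:k]
```

In a RAG-style flow under these assumptions, a caller would run `search` to find the closest chunk, then call `neighbors(chunk_id, "PART_OF")` to reach the document that chunk belongs to, without leaving the store.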

Custom Query Language Sparks Discussion

HelixDB's custom query language has generated mixed reactions. Some users expressed concern about having to learn a new domain-specific language (DSL), and in particular about whether LLMs would be able to generate queries in it. The developers defended the choice, explaining that no existing language properly encapsulates both graph and vector functionality, and that they wanted a type-safe query language. They mentioned they're working on integrating their grammar into llama.cpp so that LLMs can generate grammatically correct queries in their language.

Browser Compatibility and Embedded Use

Several users inquired about running HelixDB in the browser via WebAssembly (WASM) for privacy-focused applications and about using it as an embedded database similar to SQLite. The team acknowledged that LMDB, their current storage engine, is a roadblock for browser compatibility, but mentioned they have plans to develop their own storage engine with WASM support. For now, HelixDB cannot run as an embedded database, which limits some potential use cases.

Future Development and Roadmap

The HelixDB team has outlined several upcoming features, including sparse search using BM25, with some community members suggesting consideration of SPLADE models for enhanced search capabilities. Their roadmap also includes expanding vector capabilities, enhancing the query language, implementing a test suite, building a deterministic simulation testing engine, and eventually developing their own graph-vector storage engine to replace LMDB.
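
For readers unfamiliar with BM25, it ranks documents by how often query terms appear in them, discounting terms that are common across the collection and normalizing for document length. The Python sketch below implements one widely used variant of the formula over pre-tokenized documents. It is a generic illustration, not HelixDB code, and the parameter defaults (`k1=1.5`, `b=0.75`) are conventional values assumed here.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query terms with BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs

    # Document frequency: the number of documents containing each term.
    df: Counter = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sparse scores like these depend on exact term overlap, which is the gap that learned sparse models such as SPLADE, suggested by community members, aim to narrow.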

As HelixDB enters the increasingly competitive space of vector and graph databases, its performance claims and unique approach to combining these functionalities have certainly captured attention. The community seems cautiously optimistic, with many expressing interest in trying the database and providing feedback. How HelixDB will differentiate itself in the long term from established players and other newcomers remains to be seen, but its focus on developer experience and performance for AI applications appears to be resonating with potential users.

Reference: HelixDB/helix-db