Onyx's Deep Research Capabilities Impress Users with Hybrid Search Approach

BigGo Editorial Team

In the rapidly evolving landscape of enterprise search and knowledge management, Onyx (formerly known as Danswer) has emerged as a noteworthy solution that is drawing attention from technical users. The platform combines open-source generative AI with enterprise search, allowing organizations to connect their internal documents, applications, and people into a unified knowledge system.

What's particularly interesting about Onyx isn't just its feature set, but the technical architecture that powers its search capabilities, which has become a focal point of community discussion.


[Image: The GitHub repository for Onyx, showing its codebase and development structure]

Hybrid Indexing Approach

At the core of Onyx's effectiveness is its hybrid document indexing system that combines keyword frequencies with vector embeddings. Unlike solutions that rely on the native search capabilities of individual applications, Onyx builds a comprehensive document index across all connected sources. This approach addresses several key challenges in enterprise search, including team-specific terminology, natural language queries, and non-exact matching.

The document index is a hybrid index of keyword frequencies and vectors. The keyword component addresses issues like team-specific terminology, and the vector component allows for natural language queries and non-exact matching.

This architecture allows Onyx to process documents prior to query time, creating LLM-friendly representations that enable fast inference. The system also incorporates additional signals such as document recency, applying time-based weighting to prioritize more up-to-date information across all sources.
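To make the idea concrete, here is a minimal Python sketch of how a keyword signal, a vector signal, and a recency weight could be blended into one ranking score. It is an illustration only, not Onyx's actual scoring code: the function names, the 50/50 blend (alpha), the term-overlap stand-in for a real keyword ranker, and the 180-day half-life are all assumptions made for this example.

```python
import math
from typing import Sequence


def keyword_score(query: str, doc_text: str) -> float:
    """Fraction of query terms found in the document (a crude stand-in for BM25)."""
    query_terms = query.lower().split()
    doc_terms = set(doc_text.lower().split())
    if not query_terms:
        return 0.0
    return sum(1 for term in query_terms if term in doc_terms) / len(query_terms)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recency_weight(age_days: float, half_life_days: float = 180.0) -> float:
    """Exponential decay: a document loses half its weight every half_life_days (assumed value)."""
    return 0.5 ** (age_days / half_life_days)


def hybrid_score(
    query: str,
    query_vec: Sequence[float],
    doc_text: str,
    doc_vec: Sequence[float],
    age_days: float,
    alpha: float = 0.5,  # assumed blend between keyword and vector signals
) -> float:
    """Blend keyword and vector relevance, then scale by document recency."""
    relevance = alpha * keyword_score(query, doc_text) + (1 - alpha) * cosine(query_vec, doc_vec)
    return relevance * recency_weight(age_days)


fresh = hybrid_score("deploy runbook", [1.0, 0.0], "the deploy runbook for prod", [1.0, 0.0], age_days=0)
stale = hybrid_score("deploy runbook", [1.0, 0.0], "the deploy runbook for prod", [1.0, 0.0], age_days=365)
print(fresh > stale)
```

With identical text and embeddings, the fresher document ranks higher, which is the behavior the time-based weighting is meant to produce.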

Deep Research vs. Traditional RAG

Many community members have questioned how Onyx's deep research capabilities differ from standard Retrieval-Augmented Generation (RAG) systems. The distinction lies in how the agent interacts with the underlying search infrastructure. While RAG serves as the foundational tool, Onyx's deep research agent can perform multiple searches, reflect on previous results, and generate chain-of-thought outputs to explore information more thoroughly.

The agent can decide which questions to explore further, similar to how a human researcher might follow different threads of inquiry when investigating a complex topic. This creates a more dynamic and thorough research process compared to single-query RAG implementations.
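The difference from single-query RAG can be sketched as a loop: search, look at what came back, decide whether a follow-up question is worth asking, and repeat. The Python below is a simplified illustration under stated assumptions, not Onyx's agent. In a real system, `search` would hit the document index and `propose_followups` would be an LLM call; here both are replaced by small deterministic stand-ins (`toy_search`, `toy_followups`) over a made-up three-document corpus so the control flow can be run as-is.

```python
from typing import Callable, List, Tuple

# Hypothetical mini-corpus used only for this illustration.
CORPUS = [
    "The outage was caused by an expired TLS certificate.",
    "Certificate renewal is handled by the platform team.",
    "The platform team on-call rotation is listed in the wiki.",
]


def toy_search(query: str) -> List[str]:
    """Stand-in retriever: match documents containing any query word longer than 4 letters."""
    terms = [t.strip("?.,").lower() for t in query.split()]
    terms = [t for t in terms if len(t) > 4]
    return [doc for doc in CORPUS if any(t in doc.lower() for t in terms)]


def toy_followups(query: str, results: List[str]) -> List[str]:
    """Stand-in for LLM reflection: ask about certificates once results mention them."""
    text = " ".join(results).lower()
    if "certificate" in text and "certificate" not in query.lower():
        return ["Who handles certificate renewal?"]
    return []


def deep_research(
    question: str,
    search: Callable[[str], List[str]],
    propose_followups: Callable[[str, List[str]], List[str]],
    max_rounds: int = 3,
) -> Tuple[List[str], List[str]]:
    """Run several search rounds, following up on earlier results. Returns (findings, queries asked)."""
    asked: List[str] = []
    findings: List[str] = []
    frontier = [question]
    for _ in range(max_rounds):
        next_frontier: List[str] = []
        for query in frontier:
            if query in asked:
                continue  # never repeat a query
            asked.append(query)
            results = search(query)
            for doc in results:
                if doc not in findings:
                    findings.append(doc)
            next_frontier.extend(propose_followups(query, results))
        if not next_frontier:
            break  # nothing left worth exploring
        frontier = next_frontier
    return findings, asked


findings, asked = deep_research("What caused the outage?", toy_search, toy_followups)
print(asked)
print(findings)
```

The first query only surfaces the outage note. The follow-up question, generated from that result, is what brings in the document about who owns certificate renewal, something a single retrieval pass would have missed.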

Permissions Management

A significant challenge for enterprise knowledge systems is handling the complex permission models of different applications. Onyx addresses this by mapping external objects and their associated users and groups into a unified representation within the platform.

The system runs asynchronous jobs that check for permission updates at configurable intervals, with defaults tailored to each external source type. This approach maintains security while enabling cross-application search, always defaulting to the least permissive access model to prevent unauthorized information exposure.
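As a rough illustration of what a unified permission representation might look like, the Python sketch below stores one access record per document and filters search results against it. Everything here is hypothetical: the `DocAccess` class, the source-prefixed group IDs such as `google_drive:product`, and the rule that a document with no synced access record is hidden are assumptions chosen to show the "least permissive by default" idea, not Onyx's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class DocAccess:
    """Hypothetical unified access record for one document, whatever its source app."""
    user_emails: FrozenSet[str] = frozenset()
    group_ids: FrozenSet[str] = frozenset()  # e.g. "google_drive:product" (assumed naming)
    is_public: bool = False


def can_read(user_email: str, user_groups: Set[str], access: Optional[DocAccess]) -> bool:
    """Return True only when access is explicitly granted."""
    if access is None:
        # Permissions not synced yet: fall back to the least permissive choice and deny.
        return False
    if access.is_public:
        return True
    return user_email in access.user_emails or bool(user_groups & access.group_ids)


def visible_docs(
    user_email: str,
    user_groups: Set[str],
    acl: Dict[str, Optional[DocAccess]],
) -> List[str]:
    """Filter document IDs down to those the user is allowed to see."""
    return [doc_id for doc_id, access in acl.items() if can_read(user_email, user_groups, access)]


acl = {
    "handbook": DocAccess(is_public=True),
    "roadmap": DocAccess(group_ids=frozenset({"google_drive:product"})),
    "salaries": DocAccess(user_emails=frozenset({"cfo@example.com"})),
    "unsynced": None,  # the periodic sync job has not produced a record yet
}
print(visible_docs("dev@example.com", {"google_drive:product"}, acl))
```

In a design like this, the periodic sync jobs would only need to refresh the access records; the query path stays a simple set-membership check.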

Performance and Evaluation

In internal evaluations using a dataset of typical enterprise content (Slack messages, technical documentation, etc.), Onyx reports strong retrieval results. On a 10,000-document test set, the system achieved over 94% recall at 4,000 tokens, and recall stayed above 90% when the corpus was expanded to hundreds of thousands of documents with added noise.
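The article's source does not spell out how "recall at 4,000 tokens" is computed. One plausible reading, sketched below in Python, is the share of relevant chunks that fit into the context window when ranked results are added in order until the token budget is used up. The function name, the chunk IDs, and the token counts are invented for illustration, and the stop-at-first-overflow rule is an assumption.

```python
from typing import Iterable, Set, Tuple


def recall_at_token_budget(
    ranked_chunks: Iterable[Tuple[str, int]],  # (chunk_id, token_count) in ranked order
    relevant_ids: Set[str],
    token_budget: int,
) -> float:
    """Share of relevant chunks that fit in the budget when filling it in rank order."""
    if not relevant_ids:
        return 0.0
    used = 0
    found = set()
    for chunk_id, n_tokens in ranked_chunks:
        if used + n_tokens > token_budget:
            break  # assumed rule: stop at the first chunk that would overflow the budget
        used += n_tokens
        if chunk_id in relevant_ids:
            found.add(chunk_id)
    return len(found) / len(relevant_ids)


ranking = [("a", 1500), ("b", 1500), ("c", 1500)]
print(recall_at_token_budget(ranking, {"a", "c"}, token_budget=4000))
print(recall_at_token_budget(ranking, {"a", "c"}, token_budget=5000))
```

Under this reading, the metric rewards a retriever for ranking relevant chunks early enough that they fit into the context actually passed to the model, which is stricter than plain top-k recall when chunks vary in length.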

The platform was primarily developed against GPT-4o but has been tuned to work effectively with other recent models, including Claude 3.5, Gemini, and DeepSeek.

Future Directions

Looking ahead, Onyx is exploring several advanced information retrieval methods, including customized LLM-based knowledge graphs inspired by approaches like LightGraphRAG. Other planned features include personalized search, organizational understanding with expert suggestion capabilities, code search, and structured query language support.

For organizations seeking to improve knowledge discovery and utilization across their digital ecosystem, Onyx represents an interesting open-source option that can be deployed locally, on-premises, or in the cloud. The community edition is freely available under the MIT (Expat) license, while an enterprise edition with additional features targeted at larger organizations is also available.

As AI-powered enterprise search continues to evolve, Onyx's approach of combining deep research capabilities with a unified document index demonstrates how the gap between disparate information sources can be bridged effectively, potentially reducing the time and effort required for knowledge workers to find and synthesize information.

Reference: Open Source Gen-AI + Enterprise Search