The reason why your RAG solution may have poor performance

Zalwert
4 min readJun 14, 2024
low vs high dimensional space

TL;DR: Approximate nearest neighbors may get lost in high-dimensional space. This is especially important in RAG solutions where small differences in embeddings matter for exact match.

While HNSW (and other proxies, see: benchmarking nearest neighbors) are designed to handle high-dimensional data efficiently compared to brute-force methods, it’s important to consider their limitations in higher dimensions.

As dimensionality increases, the algorithm’s ability to maintain accurate and efficient nearest neighbor search may diminish due to the curse of dimensionality, increased computational complexity, and challenges in maintaining local connectivity.

Let’s see the example of HNSW algorithm.

The HNSW (Hierarchical Navigable Small World) algorithm is a data structure algorithm used for approximate nearest neighbor search in high-dimensional spaces.

How HNSW algorithm works:

Graph Structure:

  • HNSW organizes data points (vectors) into a hierarchical graph structure where each node represents a data point.
  • Nodes are connected to other nodes based on their proximity in the vector space.

--

--

Zalwert

Experienced in building data-intensive solutions for diverse industries