Sun Oct 12, 2025 2:58pm PST

Show HN: Pyversity – Fast Result Diversification for Retrieval and RAG

Hey HN! I’ve just open-sourced Pyversity, a lightweight library for diversifying retrieval results. Most retrieval systems optimize only for relevance, which often produces top-k results that are nearly identical to one another. Pyversity re-ranks results to balance relevance against diversity, surfacing items that are still relevant but less redundant.

Main features:

- Unified API: a single function (diversify) that supports several well-known strategies: MMR, MSD, DPP, and COVER (with more to come)

- Lightweight: the only dependency is NumPy, keeping the package small and easy to install

- Fast: efficient implementations for all supported strategies; diversify results in milliseconds

Re-ranking with cross-encoders is very popular right now, but it is also very expensive. In my experience, you can usually improve retrieval results with simpler and faster methods, such as the ones implemented in this package. Diversification helps retrieval, recommendation, and RAG systems present richer, more informative results by ensuring each new item adds new information.
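To make "each new item adds new information" concrete, here is a minimal sketch of greedy MMR (Maximal Marginal Relevance), one of the strategies named above. This is not Pyversity's code or API: the names mmr_select and cosine and the diversity parameter are made up for illustration, and it uses plain Python lists instead of NumPy so it runs without any install.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(embeddings, scores, k, diversity=0.5):
    """Hypothetical helper: greedy MMR, returning item indices in selection order.

    diversity=0.0 ranks purely by relevance score; diversity=1.0 only
    penalizes similarity to the items that were already selected.
    """
    if not 0.0 <= diversity <= 1.0:
        raise ValueError("diversity must be between 0 and 1")
    candidates = list(range(len(scores)))
    selected = []
    while candidates and len(selected) < k:
        best_idx, best_val = None, -math.inf
        for i in candidates:
            # Redundancy = highest similarity to anything already picked.
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            val = (1.0 - diversity) * scores[i] - diversity * redundancy
            if val > best_val:
                best_idx, best_val = i, val
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy data: items 0 and 1 are near-duplicates, item 2 is different but less relevant.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
scores = [0.9, 0.85, 0.6]

print(mmr_select(embeddings, scores, k=2, diversity=0.0))  # [0, 1]
print(mmr_select(embeddings, scores, k=2, diversity=0.5))  # [0, 2]
```

With diversity=0.0 the two near-duplicates win on relevance alone. At 0.5, the second slot goes to the less similar item even though its score is lower. This loop is quadratic in the number of candidates per pick and is only meant to show the idea; a vectorized version would be much faster.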

Code and docs: github.com/pringled/pyversity

Let me know if you have any feedback, or suggestions for other diversification strategies to support!
