Semantic Search and RAG on a FOSS stack

Wikimedia Hackathon Tallinn 2024

github.com/rti/barebone-rag

Robert Timm robert.timm@wikimedia.de

Semantic Search

Given a query, find texts with a similar meaning.

Retrieval Augmented Generation (RAG)

Create texts based on information loaded from external sources.

Free and Open Source Software (FOSS) Stack

All software components are released under OSI-approved licenses.

Demo Time

GPU stacks

  • ⛔ NVIDIA CUDA has a proprietary license
  • ✅ AMD ROCm stack is MIT licensed
    • amdgpu driver is in the mainline Linux kernel

Components for Semantic Search

  • Encode semantics ▶ Embeddings
  • Find semantically similar objects ▶ Vector Database
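
To illustrate the two components without any infrastructure: an embedding is a list of floats, and finding semantically similar objects means ranking stored vectors by their similarity to the query vector. The following is a minimal brute-force sketch that stands in for the vector database, assuming cosine similarity as the metric; the three-dimensional toy vectors are invented for illustration and do not come from a real model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k texts whose embeddings are most similar to the query embedding."""
    ranked = sorted(
        corpus, key=lambda text: cosine_similarity(query, corpus[text]), reverse=True
    )
    return ranked[:k]


# Hypothetical toy embeddings; a real embedding model returns hundreds of floats.
corpus = {
    "The cat sat on the mat.": [0.9, 0.1, 0.0],
    "A kitten is sleeping.": [0.8, 0.2, 0.1],
    "Postgres stores rows in tables.": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "Where is the cat?"

print(search(query, corpus))  # ['The cat sat on the mat.', 'A kitten is sleeping.']
```

A vector database does the same ranking, but stores the vectors and can index them so that the search scales beyond a few in-memory rows.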

Embedding Models

Model                  Dims  OSI  License     Pre-training data  Fine-tuning data
all-MiniLM-L6-v2        384  ✅   Apache-2.0  ✅                 ✅
nomic-embed-text-v1     768  ✅   Apache-2.0  ✅                 ✅
bge-large-en-v1.5      1024  ✅   MIT         ⛔                 ⛔
mxbai-embed-large-v1   1024  ✅   Apache-2.0  ⛔                 ⛔

Embedding Inference

🦙 Ollama

Get up and running with large language models.

  • Inference engine based on llama.cpp
  • Supports AMD GPU via ROCm
  • CPU support (AVX, AVX2, AVX512, Apple Silicon)
  • Quantization
  • Model Library
  • One model, one inference at a time

Embedding Inference

Engine                         OSI  License     ROCm support  Production
Ollama                         ✅   MIT         ✅            ⛔
llama.cpp                      ✅   MIT         ✅            ⛔
HF Text Embeddings Inference   ✅   Apache-2.0  ⛔            ✅
Infinity                       ✅   MIT         ✅            ✅

Embedding Implementation

Start ollama

$ ollama serve &
$ ollama pull nomic-embed-text

Generate embedding

import ollama # pip install ollama

res = ollama.embeddings(
    model="nomic-embed-text",
    prompt="This string")

res["embedding"] # [0.33, 0.62, 0.19, ...]
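
Before a longer document can be searched, it is usually split into chunks and each chunk is embedded separately. A minimal sketch of that step, assuming a naive fixed-size word-window chunker (the window of 50 words and all function names are hypothetical choices) and taking the embedding function as a parameter so it can be backed by `ollama.embeddings` or any other engine. Because Ollama runs one inference at a time, the chunks are simply embedded in a sequential loop.

```python
from typing import Callable

Embedding = list[float]


def chunk_words(text: str, size: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most `size` words (naive, hypothetical chunker)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed_chunks(
    text: str, embed: Callable[[str], Embedding], size: int = 50
) -> list[tuple[str, Embedding]]:
    """Return (chunk, embedding) pairs, embedding one chunk at a time."""
    return [(chunk, embed(chunk)) for chunk in chunk_words(text, size)]


def fake_embed(chunk: str) -> Embedding:
    """Stand-in embedder for a dry run: a single dimension holding the chunk length."""
    return [float(len(chunk))]


# For real vectors, pass an Ollama-backed function instead, e.g.
#   lambda s: ollama.embeddings(model="nomic-embed-text", prompt=s)["embedding"]
for chunk, vector in embed_chunks("one two three four five", fake_embed, size=2):
    print(chunk, vector)
```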

Vector Database

🐘 Postgres can do it 🎉

pgvecto.rs

Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres.

A PostgreSQL extension written in Rust.

Vector Database Licensing

Component   OSI  License
PostgreSQL  ✅   PostgreSQL License
pgvecto.rs  ✅   Apache-2.0

Vector Database Implementation

Create a PostgreSQL table using pgvecto.rs

CREATE EXTENSION vectors;

CREATE TABLE chunks (
  text TEXT NOT NULL,
  embedding VECTOR(768) NOT NULL  -- must match the model's dimensions (768 for nomic-embed-text-v1)
);

Find the most similar chunks

SELECT text FROM chunks
  ORDER BY embedding <-> '[0.33, 0.62, 0.19, ...]'  -- <-> : Euclidean (L2) distance, nearest first
  LIMIT 5;
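
From Python, the embedding list has to be turned into the bracketed text form used in the query above. A minimal sketch, assuming the vector is passed as a query parameter and cast with `::vector`, with a driver such as psycopg that uses `%s` placeholders in `cursor.execute(sql, params)`; the helper and constant names are hypothetical.

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Format a list of floats as a vector literal such as '[0.33,0.62,0.19]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Parameterized statements: the driver fills in %s, which avoids SQL injection.
INSERT_CHUNK = "INSERT INTO chunks (text, embedding) VALUES (%s, %s::vector)"
NEAREST_CHUNKS = "SELECT text FROM chunks ORDER BY embedding <-> %s::vector LIMIT %s"


def insert_params(text: str, embedding: list[float]) -> tuple[str, str]:
    """Parameters for INSERT_CHUNK."""
    return (text, to_vector_literal(embedding))


def nearest_params(query_embedding: list[float], k: int = 5) -> tuple[str, int]:
    """Parameters for NEAREST_CHUNKS: the query vector and the number of rows."""
    return (to_vector_literal(query_embedding), k)


print(to_vector_literal([0.33, 0.62, 0.19]))  # [0.33,0.62,0.19]
# With psycopg, for example:
#   cur.execute(INSERT_CHUNK, insert_params("This string", res["embedding"]))
#   cur.execute(NEAREST_CHUNKS, nearest_params(query_embedding))
```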

Components for Retrieval Augmented Generation (RAG)

  • Find matching sources ▶ Semantic Search
  • Generate Response ▶ Large Language Model (LLM) Inference
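
The two components combine into a retrieve-then-generate loop: semantic search selects source texts, which are pasted into the prompt for the LLM. A minimal sketch with both steps passed in as functions (for instance backed by the SQL query and `ollama.chat`); the prompt wording and all function names are hypothetical.

```python
from typing import Callable


def build_prompt(question: str, sources: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from the given sources."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, start=1))
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}"
    )


def rag_answer(
    question: str,
    search: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    k: int = 5,
) -> str:
    """Retrieve k matching sources, then let the LLM generate a response from them."""
    sources = search(question, k)
    return generate(build_prompt(question, sources))


def fake_search(question: str, k: int) -> list[str]:
    """Stand-in for semantic search: returns the first k of two fixed texts."""
    return ["Ollama is based on llama.cpp.", "pgvecto.rs is written in Rust."][:k]


# Dry run: the "LLM" simply echoes the prompt it receives.
print(rag_answer("What is Ollama based on?", fake_search, generate=lambda p: p, k=1))
```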

LLM Inference

🦙 Ollama

LLM Inference

Engine                         OSI  License     ROCm support  Production
Ollama                         ✅   MIT         ✅            ⛔
llama.cpp                      ✅   MIT         ✅            ⛔
vLLM                           ✅   Apache-2.0  ✅            ✅
HF Text Generation Inference   ✅   Apache-2.0  ✅            ✅

LLM Inference Implementation

Generate text based on a prompt

import ollama # pip install ollama

text = "Create texts based on information loaded from external sources."  # placeholder input

res = ollama.chat(
    model="zephyr:7b-beta",
    messages=[{"role": "user", "content": f"Summarize this text: {text}"}],
    stream=False,
)
res["message"]["content"] # "The given text..."
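
With `stream=False` the call returns only once the whole response has been generated. Passing `stream=True` instead makes `ollama.chat` return an iterator of partial responses, each carrying a piece of text under `["message"]["content"]`. A minimal sketch of consuming such a stream, printing pieces as they arrive while also collecting the full text; the helper name is hypothetical, and here it is fed hand-made parts rather than a running Ollama server.

```python
from typing import Any, Iterable


def collect_stream(parts: Iterable[Any], echo: bool = True) -> str:
    """Join streamed chat parts into one string, optionally printing them as they arrive."""
    pieces = []
    for part in parts:
        piece = part["message"]["content"]
        if echo:
            print(piece, end="", flush=True)
        pieces.append(piece)
    if echo:
        print()  # finish the echoed line
    return "".join(pieces)


# Hand-made parts with the same shape as streamed chat responses:
parts = [{"message": {"content": "The given "}}, {"message": {"content": "text..."}}]
full_text = collect_stream(parts)
# With a running server, for example:
#   collect_stream(ollama.chat(model="zephyr:7b-beta", messages=messages, stream=True))
```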

Large Language Model Building Blocks

  • Weights (binary)
  • Pre-training data (source)
  • Fine-tuning data (source)
  • Training code (build scripts)

OSI: Open Source AI Definition

  • The Open Source Initiative (OSI) intends to define open source AI models
  • Defines which parts need to be released under OSD-compliant licenses
  • Still a draft; release planned for October 2024
  • Latest draft (April 2024)
    • Marks training data sets as optional
    • But requires information on data characteristics, labeling procedures, etc.

LLMs with Openly Licensed Weights

Model                       OSI    Weights license  PT data  FT data  Code
Mistral 7b v0.2             ✅     Apache-2.0       ⛔       ⛔       ⛔
HF Zephyr 7b beta           ✅     MIT              ⛔       ✅       ✅
Microsoft Phi-3 Mini 3.8b   ✅     MIT              ⛔       ⛔       ⛔
Apple OpenELM 3b            ⛔❓   ASCL             ✅       ✅       ✅
Meta Llama 3 8b             ⛔     Custom           ⛔       ⛔       ⛔
Google Gemma 1.1 7b         ⛔     Custom           ⛔       ⛔       ⛔

PT: pre-training, FT: fine-tuning, ASCL: Apple Sample Code License

Open Source LLM Projects

Conclusion

  • ✅ Almost all software components are available under OSI-approved licenses
  • ✅ ROCm works and people are using it
  • ❓ Definition of open source models is still unclear
  • 🤔 Identifying truly open source models is complicated
  • ⏳ Interesting developments ongoing
