Semantic Search and RAG on a FOSS stack
LLM Inference Implementation
Generate text from a prompt using a model served locally by Ollama:
import ollama

text = "..."  # placeholder: the document to summarize

# Ask the local model to summarize the text; stream=False returns
# the complete response in one object instead of token-by-token chunks.
res = ollama.chat(
    model="zephyr:7b-beta",
    messages=[{"role": "user", "content": f"Summarize this text: {text}"}],
    stream=False,
)
print(res["message"]["content"])