AI – Using Python for a RAG (Part I)

25/06/2025

In this post, we will see what a RAG is and how to run it locally using Python, Ollama, and Google’s Gemma3 model, with three different projects:

  • RAG over PDFs: Indexing and querying multiple PDF documents.
  • RAG over a tabular dataset: Turning CSV data into a searchable knowledge base.
  • RAG over a SQLite database: Embedding and retrieving from structured tables.

But first of all, what is a RAG?
“Retrieval Augmented Generation (RAG) is a hybrid approach that integrates an information retrieval system with a generative language model to produce contextually relevant and factually accurate responses. Unlike traditional LLMs that rely solely on their pre-trained knowledge, RAG retrieves relevant documents or data from an external knowledge base, which the LLM then uses to generate informed responses.”
In a nutshell, a RAG lets us plug our own data, such as documents, databases, or any other knowledge base, into a Large Language Model. It pulls the relevant information from that data and uses the LLM to generate accurate, context-aware answers. It’s like giving the model a custom library to work with!

WHEN TO USE RAG?

  • Domain-specific applications: When we need to build applications specialized for legal, medical, technical, or other fields where domain knowledge is crucial.
  • Data recency requirements: When information changes frequently and we need answers that reflect the most current data without constantly retraining the model.
  • Proprietary information access: When our application must reference internal documentation, knowledge bases, or data not available in the model’s training corpus.
  • Resource constraints: When fine-tuning an entire LLM is prohibitively expensive in terms of computational resources or dataset requirements.
  • Privacy considerations: When we need to keep sensitive information local rather than sending it to third-party APIs.


COMPONENTS OF A RAG SYSTEM
To build a RAG system, we need several key components. Below, each is described in detail.

[‘data source’]

  • Description: The external knowledge base containing the information to be retrieved. This can be PDFs, datasets (e.g., CSV, JSON), databases (e.g., SQLite), or other structured/unstructured data.
  • Role in RAG: Provides the raw data that the retrieval system searches.
  • Example: A set of PDF research papers, a CSV dataset, or a SQLite table with text data.


[‘document loader’]

  • Description: A tool or library that extracts text from the data source and preprocesses it into manageable chunks. For PDFs, this might involve extracting text; for datasets or databases, it involves reading rows or records.
  • Role in RAG: Converts raw data into a format suitable for embedding generation.
  • Tools: PyPDF2 or langchain.document_loaders for PDFs, pandas for datasets, sqlite3 for SQLite databases.
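
For instance, here is a minimal sketch of loading a PDF with LangChain’s PyPDFLoader (the file path is a placeholder; depending on your LangChain version the loader may live in langchain_community.document_loaders or langchain.document_loaders, and it needs the pypdf package installed):

from langchain_community.document_loaders import PyPDFLoader
# Load a PDF and get one Document per page (the path below is a placeholder)
loader = PyPDFLoader("docs/paper.pdf")
documents = loader.load()
print(len(documents))                    # number of pages loaded
print(documents[0].page_content[:200])   # first characters of page 1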


[‘text chunkers’]

  • Description: Components that split documents into smaller, semantically meaningful segments of text.
  • Role in RAG: Chunkers create digestible pieces that can be individually embedded and retrieved, improving precision by allowing retrieval of specific relevant sections rather than entire documents.
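
As a sketch, LangChain’s RecursiveCharacterTextSplitter splits the loaded documents into overlapping chunks (chunk_size and chunk_overlap below are illustrative values, not recommendations; in older versions the class lives in langchain.text_splitter):

from langchain_text_splitters import RecursiveCharacterTextSplitter
# Split documents into ~500-character chunks with a 50-character overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)   # 'documents' from the loader sketch above
print(f"{len(documents)} pages -> {len(chunks)} chunks")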


[‘embedding model’]

  • Description: A model that converts text chunks into numerical vectors (embeddings) that capture semantic meaning. These vectors enable similarity search.
  • Role in RAG: Transforms text into a format that can be stored and searched in a vector database.
  • Example: nomic-embed-text (available via Ollama) or Hugging Face’s sentence-transformers.
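
With the Ollama Python client, generating an embedding for a chunk of text might look like this minimal sketch (the exact call and response fields can vary between client versions):

import ollama
# Turn a piece of text into a numerical vector with nomic-embed-text
response = ollama.embeddings(model="nomic-embed-text",
                             prompt="RAG combines retrieval with generation.")
vector = response["embedding"]
print(len(vector))   # dimensionality of the embedding
print(vector[:5])    # first few components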


[‘vector database’]

  • Description: A database optimized for storing and searching embeddings using similarity metrics (e.g., cosine similarity). It indexes embeddings for fast retrieval.
  • Role in RAG: Stores embeddings of the data source and retrieves the most relevant chunks based on the query’s embedding.
  • Tools: ChromaDB (lightweight, local), FAISS, or SQLite with vector extensions (e.g., sqlite-vss).
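
A minimal ChromaDB sketch, storing a couple of embedded chunks in an in-memory collection and querying them by similarity (the IDs and texts are placeholders):

import chromadb
import ollama
client = chromadb.Client()                      # in-memory instance
collection = client.create_collection("docs")
texts = ["Gemma3 is a lightweight LLM.", "RAG retrieves context before generating."]
for i, text in enumerate(texts):
    emb = ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]
    collection.add(ids=[str(i)], embeddings=[emb], documents=[text])
# Retrieve the chunk most similar to a query
query_emb = ollama.embeddings(model="nomic-embed-text", prompt="What does RAG do?")["embedding"]
results = collection.query(query_embeddings=[query_emb], n_results=1)
print(results["documents"])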


[‘retriever’]

  • Description: A component that queries the vector database to fetch the most relevant documents or chunks based on the input query’s embedding.
  • Role in RAG: Bridges the query and the vector store, ensuring relevant context is retrieved.
  • Tools: LangChain’s retriever interface, which integrates with vector stores like ChromaDB.
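
With LangChain, any vector store can be wrapped as a retriever. A sketch using Chroma plus Ollama embeddings (package names such as langchain_chroma and langchain_ollama depend on your installed versions, and older releases use get_relevant_documents instead of invoke):

from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(chunks, embeddings)   # 'chunks' from the splitter sketch above
# Fetch the 3 most relevant chunks for a query
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
relevant_docs = retriever.invoke("What is retrieval augmented generation?")
for doc in relevant_docs:
    print(doc.page_content[:100])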


[‘Large Language Model’]

  • Description: The generative model that produces the final response based on the query and retrieved context. In this post, we use Gemma3 via Ollama.
  • Role in RAG: Combines the retrieved context with the query to generate coherent, context-aware answers.
  • Example: Gemma3 (4B parameters), a lightweight, high-performing model suitable for local deployment.
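
Calling Gemma3 directly through the Ollama Python client, passing the retrieved chunks as context in the prompt, could look like this sketch (response fields may differ slightly between client versions):

import ollama
context = "\n\n".join(doc.page_content for doc in relevant_docs)  # from the retriever sketch above
question = "What is retrieval augmented generation?"
response = ollama.chat(
    model="gemma3",
    messages=[{"role": "user",
               "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
)
print(response["message"]["content"])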

[‘framework for integration’]

  • Description: A library or framework that orchestrates the RAG pipeline, connecting loaders, embedding models, vector stores, retrievers, and the LLM.
  • Role in RAG: Simplifies the implementation of the RAG workflow.
  • Tools: LangChain (widely used for RAG) or LlamaIndex.
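
Putting it together, here is a compact LangChain sketch that wires the retriever and Gemma3 into one question-answering function. It is a simplified outline, not the full projects we will build in the next posts, and it assumes ChatOllama from langchain_ollama and the retriever from the previous sketch:

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOllama(model="gemma3")
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)
def answer(question: str) -> str:
    # Retrieve relevant chunks, then let Gemma3 generate the answer
    docs = retriever.invoke(question)                     # 'retriever' from the previous sketch
    context = "\n\n".join(d.page_content for d in docs)
    messages = prompt.format_messages(context=context, question=question)
    return llm.invoke(messages).content
print(answer("What is retrieval augmented generation?"))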


Before diving into the projects (which we will cover in the next posts), let’s install Ollama and download the two models we will use:

1 – Install Ollama
2 – Pull Gemma3 (the default tag is the 4B-parameter variant):

ollama pull gemma3

3 – Pull nomic-embed-text (the embedding model we will use to convert text into numerical vectors that capture semantic meaning):

ollama pull nomic-embed-text
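
Once both pulls finish, a quick sanity check from Python confirms the models respond (this sketch assumes the ollama Python package is installed, e.g. with pip install ollama):

import ollama
# Embedding model check
emb = ollama.embeddings(model="nomic-embed-text", prompt="hello")
print("embedding size:", len(emb["embedding"]))
# Generative model check
reply = ollama.chat(model="gemma3", messages=[{"role": "user", "content": "Say hi in one word."}])
print("gemma3 says:", reply["message"]["content"])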



