This exercise uses Flowise hosted on Hugging Face to build a RAG (Retrieval-Augmented Generation) pipeline with LangChain.
A RAG (Retrieval-Augmented Generation) pipeline in LangChain connects your data to a language model through a structured flow. Documents are loaded, split, converted into embeddings, and stored in a vector database, which is then accessed via a VectorDB QA Chain to retrieve relevant context for a user’s query. This retrieved information is passed to a chat model (e.g., OpenAI GPT) along with the question to generate a grounded answer. By combining retrieval with generation, RAG improves accuracy and allows the model to use up-to-date or private knowledge.
Load your data and chunk the text
- LangChain has document loader nodes that read PDFs, text files, webpages, and more into a usable format. In this setup, I uploaded the LTA Sustainability Report 2024/2025 using a standard PDF loader.
- Caveat: A standard PDF loader may struggle with tables and ignore images.
- Use the Recursive Text Splitter node to break large documents into smaller chunks. This makes searching faster and more accurate.
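Under the hood, a recursive splitter tries the coarsest separator first (paragraphs), then falls back to finer ones (lines, sentences, words) for pieces that are still too long. A minimal pure-Python sketch of the idea — not the actual LangChain implementation, and the separator list and chunk size are illustrative:

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", ". ", " ")):
    """Toy recursive splitter: try the coarsest separator first,
    then recurse into pieces that are still too long."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separators left: hard-cut the text at chunk_size.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, chunk_size, rest))
    return [c for c in chunks if c.strip()]
```

The real node also supports chunk overlap, which helps sentences that straddle a chunk boundary stay retrievable.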
Convert to embeddings
- Each chunk is turned into a numerical representation using embedding models. There are many embedding models available (e.g., OpenAI embeddings). In this RAG, I am using GoogleGenerativeAI Embedding. Connect with a Google Gemini API key, choose the embedding model, and select RETRIEVAL DOCUMENT for task type.
- Note: by default, newer Google models like gemini-embedding-001 or gemini-embedding-2 output 3072-dimensional vectors, so the Pinecone index must be created with a matching dimension.
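The dimension mismatch is worth catching up front, before you upsert anything. A sketch with a stubbed embedding call — the real call goes through the GoogleGenerativeAI Embedding node (or SDK) with the RETRIEVAL DOCUMENT task type; the 3072 below assumes gemini-embedding-001's default output size:

```python
EXPECTED_INDEX_DIM = 3072  # must match the dimension your Pinecone index was created with

def embed_document(chunk: str) -> list[float]:
    """Stub for the real embedding call (e.g. gemini-embedding-001
    with task_type='RETRIEVAL_DOCUMENT'); returns a fixed-size vector."""
    return [0.0] * 3072  # the model's default output dimension

vector = embed_document("a chunk of the sustainability report")
if len(vector) != EXPECTED_INDEX_DIM:
    raise ValueError(
        f"Embedding dim {len(vector)} != index dim {EXPECTED_INDEX_DIM}; "
        "recreate the Pinecone index with a matching dimension."
    )
```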
Store in a vector database
Store those embeddings in a vector database like Pinecone. This is where your “knowledge base” lives. For free tier accounts, Pinecone allows a maximum of five indexes.
In this step, prepare your Pinecone API key and the index name created in your Pinecone account.
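Conceptually, the vector database just maps chunk ids to vectors and answers nearest-neighbour queries. A toy in-memory stand-in for what Pinecone provides (upsert plus cosine-similarity query) — this is not the Pinecone client API, just the idea behind it:

```python
import math

class ToyVectorIndex:
    """In-memory stand-in for a Pinecone index:
    upsert vectors by id, query by cosine similarity."""

    def __init__(self, dimension: int):
        self.dimension = dimension
        self.vectors: dict[str, list[float]] = {}

    def upsert(self, id_: str, vector: list[float]) -> None:
        assert len(vector) == self.dimension, "dimension mismatch"
        self.vectors[id_] = vector

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def query(self, vector, top_k=3):
        # Rank every stored vector by similarity; return the top_k ids.
        scored = [(self._cosine(vector, v), id_) for id_, v in self.vectors.items()]
        return [id_ for _, id_ in sorted(scored, reverse=True)[:top_k]]
```

Pinecone does the same job at scale, with approximate search so queries stay fast over millions of vectors.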
Retrieval and Generation
The VectorDB QA Chain will access the vector store to retrieve relevant context for a user’s query.
This retrieved information is passed to a chat model along with the question to generate a grounded answer. In this RAG setup, I used the ChatOpenRouter API endpoint via its dedicated node, which gives access to various free models without incurring API costs.
The complete RAG Chain

Interface
Once the RAG is complete, you can test the chatbot inside the Flowise editor. After that, the final step is to move the chatbot out of the Flowise editor and into a real-world interface.
There are several options: you can embed the chatbot as a popup widget on any website, or share the chatbot link with others.
Share Chatbot creates a standalone, hosted webpage just for that specific bot.
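For the popup-widget option, Flowise's Embed tab gives you a small snippet to paste into your page. It looks roughly like the following (the chatflowid and apiHost values are placeholders you copy from your own Flowise instance):

```
<script type="module">
  import Chatbot from "https://cdn.jsdelivr.net/npm/flowise-embed/dist/web.js"
  Chatbot.init({
    chatflowid: "<your-chatflow-id>",
    apiHost: "<your-flowise-host>",
  })
</script>
```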
I will use the share chatbot link to make a quick video demo of this RAG chatbot.

Pro-Tips for a $0 RAG Build
My goal is a totally free development environment; here is the “Golden Stack” for 2026, provided by Gemini 3:

This post is not sponsored and there are no affiliate links.
Back to Projects portfolio