This content originally appeared on DEV Community and was authored by Andreea Miclaus
Everything starts with a PoC, right? A client approaches you with basic requirements and a vision to create something groundbreaking. That’s when the excitement begins—turning an idea into a proof of concept (PoC) feels like the first step toward innovation.
Over the past twelve months, I’ve gone through five different attempts to launch a fully functional Retrieval-Augmented Generation (RAG) system in production. Every single one ended up on the scrap heap for different reasons. Some projects died early in the prototyping phase, while others crashed and burned when scaling issues reared their ugly heads.
The journey taught me one critical lesson: choosing the right focus areas during the PoC phase can make or break the project.
A RAG pipeline consists of multiple moving parts—from preprocessing documents to integrating with vector databases and large language models. Each layer comes with its own engineering challenges, and not all of them are worth solving during a PoC.
The key to a successful PoC is identifying which parts of the RAG pipeline truly matter and warrant deeper engineering effort.
Focusing too broadly or tackling production-scale issues prematurely is a recipe for wasted time, blown budgets, and, ultimately, failed projects.
In the following sections, I’ll share the lessons I learned across five different attempts, highlighting what worked, what didn’t, and how more careful selection during the PoC phase could have saved me a lot of headaches.
Project #1: Let’s "LangChain" everything
Generative AI was everywhere.
It seemed like everyone was talking about the next generation of chatbots, proclaiming that classical machine learning was outdated.
With all that noise in my head, I decided to take what appeared to be the easiest route: using open-source LLM orchestrators like LangChain.
I went into their documentation, binge-watched YouTube tutorials, and for a moment, I felt invincible—like everything was finally falling into place, as if a divine hand was guiding me.
Armed with an open-source framework, I figured hooking up a vector database to a large language model was no big deal. After all, I had worked with AI APIs and text embeddings before.
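On paper, it really is just a few lines of glue. Here is a minimal sketch of the kind of wiring I mean, using the LangChain 0.0.x-era API (the exact imports and class names have been reshuffled across releases since then, which foreshadows the first problem below):

```python
# Minimal RAG PoC wiring, LangChain 0.0.x-era API. Import paths and class
# names have moved between releases since, so treat this as illustrative.
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import FAISS

# Toy corpus standing in for real client documents.
docs = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available around the clock via email.",
]

# Embed the documents and index them in an in-memory FAISS store.
vectorstore = FAISS.from_texts(docs, OpenAIEmbeddings())

# Wire the retriever and the LLM into a retrieval-augmented QA chain.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),
)

print(qa.run("How long do I have to request a refund?"))
```

A dozen lines, a working demo, and the hard part feels done.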
But I couldn’t have been more wrong.
What went wrong?
- Dependency hell: LangChain and its associated libraries were frequently updated, and with every update came compatibility issues. The vector database APIs and LLM integrations would often break, requiring constant troubleshooting and rework.
- Loss of control: Using an external framework meant I had little control over its internal workings. Changes in the framework’s imports or logic disrupted my implementation, forcing me to rewrite parts of my code every time the framework evolved.
- Scalability issues: While LangChain worked well for a single-user PoC, scaling it to multiple concurrent users introduced latency and resource allocation issues that the framework was not equipped to handle.
- Security gaps: Sensitive information, such as user data, leaked through generated responses because there was no built-in mechanism to manage private or confidential data securely. These leaks led to compliance concerns and blocked progress.
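On that last point: because nothing in the stack managed confidential data for us, any fix has to sit in front of the prompt. Here is a minimal sketch of the kind of pre-filter I mean; the regex rules are illustrative and only catch the easy cases (emails, simple phone formats). Real compliance work needs far more than this.

```python
import re

# Hypothetical pre-filter: scrub obvious PII from retrieved chunks before
# they are interpolated into the prompt. These two patterns are illustrative
# and catch only the easy cases.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]

def redact(text: str) -> str:
    """Replace anything that looks like PII with a placeholder token."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

chunk = "Contact Jane at jane.doe@example.com or +1 (555) 123-4567."
print(redact(chunk))  # Contact Jane at [EMAIL] or [PHONE].
```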
Takeaway:
LangChain and similar frameworks are fantastic for building quick proofs of concept, offering a way to validate ideas and experiment with LLMs.
However, transitioning to production requires an entirely different approach.
For production, you need complete control over your pipeline, robust scalability strategies, and a security-first mindset. The flexibility and speed that make frameworks like LangChain appealing for PoCs can become liabilities when faced with real-world demands.
Project #2: The "It can’t be that hard" prototype → no frameworks, 100% control over data
In my mind, data is the most crucial part of any AI system. So, in one of our projects, I decided to build the data ingestion and indexing components entirely from scratch. My thinking was simple: if we could ensure 100% control over the data pipeline, we’d avoid the issues that come with off-the-shelf frameworks and guarantee long-term flexibility.
To make this approach even more robust, we decided to build custom data connectors for various sources like Google Drive, Microsoft Outlook, PDFs, and wikis.
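Concretely, each connector implemented the same small interface. The sketch below is hypothetical (the names and fields are mine, not from any framework), but it captures the shape:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class Document:
    # The normalized unit every connector emits, whatever the source format.
    source: str
    text: str
    metadata: dict = field(default_factory=dict)

class Connector(ABC):
    @abstractmethod
    def fetch(self) -> list[Document]:
        """Pull raw content from the source and normalize it to Documents."""

class GoogleDriveConnector(Connector):
    def fetch(self) -> list[Document]:
        # The real version involves OAuth, folder listing, and MIME-type
        # export to text; each of those steps had quirks of its own.
        return []
```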
On top of that, we added the Ray framework for distributed processing and dropped down to the low-level Qdrant SDK for vector indexing. This would give us unparalleled control—or so I thought.
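"Low-level control" in practice meant driving the client directly instead of letting a framework make the decisions. A rough sketch of what that looks like with the qdrant-client SDK (the collection name, vector size, and payloads are illustrative):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

# Illustrative names and sizes; the real pipeline computed embeddings upstream.
client = QdrantClient(url="http://localhost:6333")

# Creating a collection means choosing the vector size and distance metric
# yourself, decisions a higher-level framework would normally hide.
client.recreate_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Upserting raw points: ids, vectors, and payload metadata are all manual.
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1] * 384, payload={"source": "wiki"}),
    ],
)

# Querying is equally explicit: you pass the raw query vector.
hits = client.search(collection_name="docs", query_vector=[0.1] * 384, limit=3)
```

Every explicit choice in there (distance metric, vector size, payload schema) is a decision a higher-level tool would have made for us, and each one was a place for a PoC to silently go wrong.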
What went wrong?
- Document ingestion nightmares: Parsing files turned into a complete fiasco. Hidden metadata in PDFs caused chunking logic to break. Microsoft Outlook attachments came in unpredictable formats, and wikis were riddled with inconsistent structure. Each source introduced unique quirks that required constant debugging (a taste of this is sketched after this list).
- Hallucinations: Despite the focus on data quality, the LLM still generated references to nonexistent documents. Adjusting prompt parameters helped marginally, but hallucinations were far from eliminated.
- Complexity overload: Developing custom connectors and indexing logic during the PoC phase created a flood of bugs. Prematurely adding production-level features—like distributed processing with Ray—complicated the system far beyond what was necessary for a proof of concept.
- Qdrant SDK challenges: While Qdrant is powerful, using its low-level SDK demands a deeper understanding of how vector databases work. This introduced a steep learning curve, and bugs in query performance and indexing logic delayed progress.
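For a flavor of the ingestion side, this is the shape of the defensive extraction code the PDFs alone pushed us toward. pypdf is shown for illustration, and we needed a variant of this per source:

```python
from pypdf import PdfReader

def extract_pages(path: str) -> list[str]:
    """Extract text page by page, guarding against the malformed input
    that real-world PDFs reliably contain."""
    pages = []
    for page in PdfReader(path).pages:
        try:
            text = page.extract_text() or ""
        except Exception:
            # Broken pages were common enough that skipping beat crashing
            # the whole ingestion run.
            text = ""
        # Normalize whitespace so downstream chunking sees consistent input.
        text = " ".join(text.split())
        if text:
            pages.append(text)
    return pages

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    # Naive fixed-size chunking with overlap: crude, but it does not trust
    # document structure, which is what kept breaking.
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step)]
```

Multiply that guard-everything style across four source types, plus Ray, plus a low-level SDK, and the flood of bugs explains itself.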
Takeaway:
Preprocessing and data consistency are critical to AI success, but trying to build everything from scratch for a PoC is overkill.
Building custom data connectors is hard enough without the added complexity of integrating distributed frameworks like Ray or low-level vector database tools like Qdrant SDK. For a PoC, simplicity should be the priority—production-level features can (and should) wait for later.
I learned that while data is king, focusing solely on data quality during a PoC can derail the entire project if it comes at the expense of speed and simplicity.
👇👇👇
This is an excerpt from the full article over on Substack. If you found it helpful, please consider subscribing; it helps us know we're on the right track!