Introduction

Retrieval-Augmented Generation (RAG) is a technique in which a large language model generates responses grounded in relevant data retrieved from a vector database.

In this technique, the data (knowledge) is first embedded and stored in a vector database. At query time, the query is embedded in the same way and matched against the stored vectors to retrieve the most relevant data. The retrieved data is then passed to the large language model, which uses it to generate the response.
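The flow above can be illustrated with a minimal, self-contained sketch. It is not FloTorch's implementation: the bag-of-words `embed` function stands in for a real embedding model, `ToyVectorStore` stands in for a real vector database, and all names and sample texts are hypothetical. The final step, sending the assembled prompt to a large language model, is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database (hypothetical)."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        # Embed the knowledge once, at ingestion time.
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Embed the query and rank stored texts by similarity to it.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved data with the query for the language model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


store = ToyVectorStore()
store.add("Invoices are due within 30 days of issue.")
store.add("Support is available on weekdays from 9 to 5.")
store.add("Refunds are processed within 5 business days.")

query = "When are invoices due?"
context = store.search(query, k=1)
prompt = build_prompt(query, context)
print(prompt)
```

In a real deployment the embedding model, the vector database, and the model call would each be a separate service; the sketch only shows how the three steps (embed, retrieve, generate) fit together.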

FloTorch allows you to create RAG Endpoints without any additional configuration and without any code changes to your existing codebase.

With FloTorch RAG Endpoints, you can:

Generate responses using your own data
Apply Guardrails to the requests and responses
Use multiple LLM providers with no code changes
Manage and publish prompt templates and partials on the fly
Manage and publish system prompts on the fly without any downtime