Introduction

Retrieval Augmented Generation (RAG) is a technique that pairs a large language model with a vector database, so that responses are generated from relevant data retrieved at query time.

In this technique, the data (knowledge) is embedded and stored in a vector database. At query time, the query is embedded as well and matched against the stored vectors to retrieve the most relevant data. The retrieved data is then passed to the large language model, which uses it to generate the response.
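The retrieval flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how FloTorch implements it: the bag-of-words `embed` function stands in for a real embedding model, a plain list stands in for the vector database, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical. The final call to a large language model is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, k=1):
    """Return the k documents whose vectors are most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query, context_docs):
    """Combine the retrieved data and the query into one prompt for the LLM."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Step 1: embed the knowledge and store it (a list stands in for the vector database).
documents = [
    "Refunds are processed within five business days.",
    "Support is available by email on weekdays.",
]
index = [(doc, embed(doc)) for doc in documents]

# Step 2: embed the query and retrieve the most relevant data.
query = "How long until refunds are processed?"
top_docs = retrieve(query, index, k=1)

# Step 3: build the prompt that would be sent to a large language model.
prompt = build_prompt(query, top_docs)
print(prompt)
```

In a production setup the embedding model, the vector store, and the model call would each be separate services; the three numbered steps are the part that carries over.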

FloTorch lets you create RAG Endpoints without any additional configuration and without any code changes to your existing codebase.

With FloTorch RAG Endpoints, you can:

  • Generate responses using your own data
  • Apply Guardrails to the requests and responses
  • Use multiple LLM providers with no code changes
  • Manage and publish prompt templates and partials on the fly
  • Manage and publish System Prompts on the fly without any downtime