Introduction
Retrieval Augmented Generation (RAG) is a technique that combines a large language model (LLM) with a vector database so that the model's responses are grounded in your own data.
In this technique, your data (the knowledge base) is converted into vector embeddings and stored in a vector database. At query time, the query is embedded in the same way and compared against the stored vectors to retrieve the most relevant data. The large language model then uses this retrieved data as context to generate the response.
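The retrieval flow described above can be sketched in a few lines of Python. This is a minimal, self-contained illustration of the general RAG pattern, not FloTorch's implementation or API. It makes several assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `InMemoryVectorStore` is a hypothetical stand-in for a vector database, and the final call to an LLM is left out. Only the assembled prompt is shown.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): term-frequency counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: keeps (embedding, text) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        # Indexing step: embed the document and store it alongside its text.
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Retrieval step: embed the query the same way and rank documents by similarity.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Generation step (prompt only): place the retrieved context in front of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{joined}\nQuestion: {query}"


# Usage: index a small knowledge base, retrieve for a query, and build the LLM prompt.
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is open on weekdays from 9am until 5pm.",
    "Shipping is free for orders over 50 dollars.",
]
store = InMemoryVectorStore()
for doc in docs:
    store.add(doc)

query = "How many days do I have to return a purchase?"
context = store.search(query, k=1)
prompt = build_prompt(query, context)
print(prompt)
```

In a real deployment the embeddings would come from an embedding model and the store would be an actual vector database, but the three steps (index, retrieve, generate with context) stay the same.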
FloTorch lets you create RAG Endpoints without any additional configuration and without any code changes to your existing codebase.
With FloTorch RAG Endpoints, you can:
- Generate responses using your own data
- Apply Guardrails to the requests and responses
- Use multiple LLM providers with no code changes
- Manage and publish prompt templates and partials on the fly
- Manage and publish System Prompts on the fly without any downtime