Simplify RAG with Snowflake Cortex. Deploy in 5 mins via rlama wizard

Creating a Snowflake Cortex RAG System in Minutes with RLAMA

One of Rlama-pro's most powerful features is its ability to create Retrieval-Augmented Generation (RAG) systems powered by Snowflake Cortex Search. In this guide, we'll walk through using RLAMA's interactive wizard to set up a complete Cortex RAG system in just a few minutes, without writing a single line of code. For a visual demonstration, check out the demo here.

The RLAMA Snowflake Wizard

RLAMA's Snowflake wizard automates the entire process of creating a RAG system with Snowflake Cortex. It guides you through:

  1. Setting up a Snowflake stage for document storage
  2. Uploading and processing documents
  3. Creating the necessary database structures
  4. Configuring Cortex Search Service
  5. Setting up the RAG with your choice of inference model

Let's walk through the process step by step.

Step-by-Step Setup Process

1. Create a Snowflake Profile

First, you need to set up a Snowflake profile with your account credentials:

rlama snowflake setup my-snowflake --account=myaccount --user=myuser --warehouse=compute_wh

This command creates a profile that securely stores your Snowflake connection details.

Usage:

rlama snowflake setup <profile-name> --account=<account> --user=<user> --warehouse=<warehouse>

Parameters:

  • profile-name: Name to identify your Snowflake profile

Options:

  • --account: Your Snowflake account identifier
  • --user: Username for Snowflake authentication
  • --warehouse: Default warehouse to use

Once executed, RLAMA will prompt you for your password and store your credentials securely for future commands.

2. Launch the Wizard

Next, launch the interactive wizard:

rlama snowflake wizard my-snowflake

This command starts the wizard, guiding you through the entire process:

🧙 Welcome to the RLAMA Snowflake Cortex Wizard! 🧙

Snowflake profile: my-snowflake

Step 1: Select database and schema
Enter database name: RLAMA_AI
Enter schema name [PUBLIC]: 

3. Create Text Chunker Function

The wizard will help you create a text chunker function in Snowflake:

Step 2: Create text chunker function
Create chunker function? (y/n) [y]: 
Enter chunk size [1000]: 
Enter chunk overlap [200]: 

This creates a Python UDF in Snowflake, based on LangChain's RecursiveCharacterTextSplitter, that splits your documents into overlapping chunks using the chunk size and overlap you entered.
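To make the chunk size and chunk overlap settings concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification and not the UDF the wizard generates: LangChain's RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and sentences, which this sketch does not do. The function name chunk_text is hypothetical, and the sketch assumes sizes are measured in characters.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so content near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Example with the wizard's default values (1000 / 200) on a 2500-character input.
doc = "x" * 2500
print([len(c) for c in chunk_text(doc)])  # [1000, 1000, 900]
```

With the defaults, each window advances 800 characters, so the last 200 characters of one chunk are repeated at the start of the next. A larger overlap keeps more context across boundaries at the cost of more chunks to index.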

4. Set Up Document Storage

Next, the wizard helps you create a stage for document storage:

Step 3: Create storage stage
Enter stage name [rlama_docs]: rlama_demo

Once created, you'll have options for uploading documents:

Step 4: Upload documents to stage
Choose how you want to upload documents:
1. I'll upload them manually (using SnowSQL or Web UI)
2. Upload documents from my local folder
Enter your choice (1/2) [1]: 

If you choose manual upload, the wizard provides detailed instructions:

Please upload your documents to stage @rlama_demo using one of these methods:
1. Using SnowSQL CLI:
   PUT file:///path/to/local/files/*.pdf @RLAMA_AI.PUBLIC.rlama_demo;
2. Using Snowflake Web UI:
   Navigate to RLAMA_AI.PUBLIC database and schema, then use the 'Stages' tab to upload files to @rlama_demo

5. Create the Documents Table

The wizard generates a SQL script to create and populate a table for your documents:

Step 5: Create documents table
Enter table name [DOCUMENTS]: TB_AI_RAG

The generated SQL will:

  • Create a table for document storage
  • Extract text from your files using Snowflake's PARSE_DOCUMENT function
  • Apply chunking to prepare your data for embedding

You can choose to execute this SQL automatically or copy it to run manually.

6. Create the Cortex Search Service

After your documents are processed, the wizard helps you set up a Cortex Search Service:

Step 6: Create Cortex Search Service
Enter service name: rlama_docs_ai
Enter warehouse name [COMPUTE_WH]: 

Executing: rlama snowflake create-service my-snowflake rlama_docs_ai --database=RLAMA_AI --schema=PUBLIC --warehouse=COMPUTE_WH --table=TB_AI_RAG --columns=CHUNK,TITLE

The wizard runs the create-service command for you, passing in the database, schema, warehouse, table, and columns collected in the previous steps.

7. Create the Cortex RAG System

Finally, the wizard helps you create the actual RAG system:

Step 7: Create Cortex RAG
Enter RAG name: rag_demo
Use Ollama instead of Cortex Complete? (y/n) [y]: 
Enter Ollama model name [llama3]: gemma3:12b

Executing: rlama snowflake create-cortex-rag my-snowflake rag_demo gemma3:12b --service=rlama_docs_ai --skip-service-validation

You can choose between:

  • Using Ollama for local inference (more cost-effective)
  • Using Cortex Complete for cloud-based inference (higher quality, higher cost)

The wizard completes the process:

Creating Cortex RAG 'rag_demo'...
Cortex service 'rlama_docs_ai' found and validated.
Cortex RAG created successfully!
You can now use it with: rlama cortex run rag_demo

🎉 Cortex RAG setup complete! 🎉

Using Your Cortex RAG

Once setup is complete, you can immediately start using your RAG system:

rlama cortex run rag_demo

This launches an interactive query interface:

❄️  SNOWFLAKE CORTEX RAG ASSISTANT  ❄️

• RAG: rag_demo
• Service: rlama_docs_ai
• Engine: Ollama (gemma3:12b)
• Search Limit: 10 results

Enter your questions (type 'exit' to quit):
> how can I create a RAG with RLAMA?

For each question, the system will:

  1. Search your documents using Cortex Search Service (up to the search limit, 10 results here)
  2. Pass the retrieved chunks as context to your chosen model
  3. Return an answer grounded in your documents
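The steps above follow the usual retrieve-then-generate pattern. The sketch below illustrates that flow in Python with stand-in functions; it is not RLAMA's implementation. The names build_prompt, answer_question, fake_search, and fake_generate are hypothetical: the search callable stands in for the Cortex Search Service and the generate callable for the chosen model (Ollama or Cortex Complete).

```python
from typing import Callable


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into a single prompt."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_question(
    question: str,
    search: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    limit: int = 10,
) -> str:
    """Retrieve up to `limit` chunks, then ask the model to answer from them."""
    chunks = search(question, limit)          # step 1: document search
    prompt = build_prompt(question, chunks)   # step 2: add retrieved context
    return generate(prompt)                   # step 3: model produces the answer


# Stand-ins so the sketch runs without Snowflake or a local model.
def fake_search(query: str, limit: int) -> list[str]:
    corpus = ["RLAMA has an interactive Snowflake wizard.", "The wizard creates a Cortex RAG."]
    return corpus[:limit]


def fake_generate(prompt: str) -> str:
    return f"(model output for a prompt of {len(prompt)} characters)"


print(answer_question("how can I create a RAG with RLAMA?", fake_search, fake_generate))
```

Keeping search and generation behind separate callables mirrors the hybrid setup described in this guide: the retrieval side can stay in Snowflake while the generation side is swapped between a local and a hosted model.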

Cost Optimization with the Hybrid Approach

One of the key advantages of RLAMA's Snowflake integration is the ability to use a hybrid approach:

  1. Use Snowflake Cortex Search for vector search and embedding
  2. Use local Ollama models for inference

This approach can significantly reduce costs compared to using Cortex Complete for inference, while still leveraging Snowflake's powerful search capabilities.

For a typical 1GB dataset with 10,000 monthly queries:

  • Snowflake Cortex Search costs: 6.3 credits/month ($19-25)
  • Inference options:
    • Claude 3.5 via Cortex Complete: 25.5 additional credits/month ($76-102)
    • Mixtral 8x7B via Cortex Complete: 2.2 additional credits/month ($6.6-8.8)
    • Local Ollama inference: $0 additional Snowflake costs

By choosing local inference with Ollama instead of Claude 3.5 via Cortex Complete, you can reduce your Snowflake costs in this example by about 80% (from 31.8 to 6.3 credits per month), making enterprise-grade RAG accessible even with limited budgets.
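The savings figure can be checked with a few lines of arithmetic. The sketch below uses only the credit numbers quoted above; the variable and function names are hypothetical, and it counts Snowflake credits only, so the cost of running Ollama on your own hardware is outside its scope.

```python
# Monthly credit figures quoted in this article (1GB dataset, 10,000 queries).
SEARCH_CREDITS = 6.3
INFERENCE_CREDITS = {
    "Claude 3.5 via Cortex Complete": 25.5,
    "Mixtral 8x7B via Cortex Complete": 2.2,
    "Local Ollama": 0.0,
}


def monthly_credits(option: str) -> float:
    """Total Snowflake credits per month: search plus the chosen inference option."""
    return SEARCH_CREDITS + INFERENCE_CREDITS[option]


def savings_vs(baseline: str, option: str) -> float:
    """Fraction of the baseline's monthly credits saved by switching to `option`."""
    base = monthly_credits(baseline)
    return (base - monthly_credits(option)) / base


pct = savings_vs("Claude 3.5 via Cortex Complete", "Local Ollama") * 100
print(f"{pct:.1f}%")  # 80.2%
```

The same function shows that the gap is much smaller against a cheaper hosted model: switching from Mixtral 8x7B to local inference saves 2.2 of 8.5 credits, or roughly a quarter.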

Conclusion

RLAMA's Snowflake wizard makes it incredibly easy to create powerful RAG systems powered by Snowflake Cortex. By automating the complex setup process and offering cost-effective hybrid architectures, RLAMA democratizes access to enterprise-grade AI capabilities.

Try it today and see how quickly you can go from raw documents to an intelligent question-answering system!
