RAG Configuration API Guide
Overview
RAG (Retrieval-Augmented Generation) enhances teacher feedback evaluation quality by providing the AI with relevant reference examples during evaluation. This guide explains how to configure and use RAG via the Assessment API.
What is RAG?
RAG finds similar high-quality feedback examples from your reference library and provides them as context to the AI evaluator. This results in:
- An 18-28% improvement in evaluation accuracy and consistency
- More specific feedback based on your quality standards
- Context-aware scoring that learns from your examples
- Actionable recommendations aligned with your rubric
Prerequisites
Before using RAG via API, you need:
User Requirements
- Valid account on the platform
- Organization membership with one of these roles:
  - Admin - Full organization access
  - Examiner - Can create evaluations and manage assessments
  - Super Admin - Platform-wide access
- Active membership status
- API authentication token - See Authentication Guide for details
Note: If you don't have the required membership, contact your organization administrator.
RAG Setup Requirements
- Create Reference Feedbacks - Add reference feedbacks in the web interface at /assessment/teacher-evaluation/references/
- Link to Rubric - Reference feedbacks must be associated with your evaluation rubric
- Quality Examples - Include 10-20 examples covering different quality levels
- Vector Embeddings - System automatically generates embeddings when you create references
RAG API Parameters
When creating a batch evaluation, include these RAG parameters:
{
"name": "My Batch",
"rubric_id": 1,
"organization_id": 1,
"feedback_ids": [101, 102, 103],
"use_reference_matching": true,
"reference_matching_mode": "dynamic",
"reference_detail_level": "detailed",
"max_references": 5,
"evaluation_instructions": "Additional AI context..."
}
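If you prefer a scripted client over raw curl, the same request can be sent from Python. This is a minimal sketch, not an official client: the endpoint path and bearer token mirror the curl examples later in this guide, and the requests library is an assumed dependency.

```python
# Minimal sketch: create a batch evaluation with RAG enabled.
# The domain, token, and IDs are placeholders; adjust for your environment.
import requests

API_URL = "https://your-domain.com/api/batches/create/"  # placeholder domain
HEADERS = {
    "Authorization": "Bearer your-token",  # see the Authentication Guide
    "Content-Type": "application/json",
}

payload = {
    "name": "My Batch",
    "rubric_id": 1,
    "organization_id": 1,
    "feedback_ids": [101, 102, 103],
    "use_reference_matching": True,
    "reference_matching_mode": "dynamic",
    "reference_detail_level": "detailed",
    "max_references": 5,
    "evaluation_instructions": "Additional AI context...",
}

response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=30)
response.raise_for_status()
print(response.json()["batch"]["id"])  # batch ID from the creation response
```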
Parameter Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_reference_matching | boolean | false | Enable/disable RAG |
| reference_matching_mode | string | "dynamic" | Matching strategy |
| reference_detail_level | string | "summary" | Analysis depth |
| max_references | integer | 5 | Number of examples (1-10) |
| evaluation_instructions | string | "" | Additional AI context |
Parameter Details
use_reference_matching
Type: boolean
Default: false
Description: Master switch to enable/disable RAG
{
"use_reference_matching": true
}
When to Enable:
- ✓ You have 10+ reference feedbacks for the rubric
- ✓ You want more consistent evaluations
- ✓ You need context-aware scoring
- ✓ You have established quality standards
When to Disable:
- ✗ No reference feedbacks available
- ✗ First-time testing/exploration
- ✗ Generic evaluation without specific standards
reference_matching_mode
Type: string
Values: "static" | "dynamic"
Default: "dynamic"
Description: How reference examples are selected
Dynamic Mode (Recommended)
{
"reference_matching_mode": "dynamic"
}
- Uses vector similarity search to find most relevant examples
- Automatically adapts to each feedback's context
- Best for varied feedback types
- Recommended for most use cases
How it works:
1. Converts the feedback to a vector embedding
2. Searches the reference library for similar examples
3. Returns the top N most similar references
4. The AI uses these as evaluation context
A conceptual sketch of this retrieval step follows the example below.
Example:
Feedback: "Student needs to improve grammar and clarity"
↓
Vector Search finds similar references:
- "Grammar needs attention" (similarity: 0.92)
- "Writing clarity could improve" (similarity: 0.88)
- "Structure and grammar issues" (similarity: 0.85)
Static Mode
{
"reference_matching_mode": "static"
}
- Uses predefined set of reference examples
- Same examples for all feedbacks in batch
- Faster but less context-aware
- Useful for standardized evaluations
When to use Static:
- All feedbacks have similar context
- You've curated specific examples for this batch type
- Speed is more important than precision
reference_detail_level
Type: string
Values: "summary" | "detailed"
Default: "summary"
Description: How much reference detail the AI receives
Summary Level
{
"reference_detail_level": "summary"
}
- Provides key highlights from references
- Faster processing (fewer tokens)
- Good for straightforward evaluations
- Reduces API costs
Summary includes:
- Overall score
- Key strengths
- Main areas for improvement
- Brief recommendations
Detailed Level (Recommended)
{
"reference_detail_level": "detailed"
}
- Provides complete reference feedback
- Better evaluation quality
- More accurate comparisons
- Slightly higher API cost
Detailed includes:
- Full criterion-by-criterion feedback
- Complete strengths and weaknesses
- Detailed recommendations
- Consistency notes
- Reference comparison insights
Recommendation: Use "detailed" unless processing costs are a concern. The quality improvement outweighs the marginal cost increase.
max_references
Type: integer
Range: 1-10
Default: 5
Description: Maximum number of reference examples to use
{
"max_references": 5
}
Choosing the Right Number:
| Value | Use Case | Token Cost | Quality |
|---|---|---|---|
| 1-2 | Quick evaluation, limited references | Low | Basic |
| 3-5 | Recommended - Balanced approach | Medium | High |
| 6-8 | Complex evaluations, large reference library | High | Very High |
| 9-10 | Maximum context (rarely needed) | Very High | Maximum |
Guidelines:
- Start with 5 - Good balance of quality and cost
- Increase to 7-8 if you have 20+ diverse references
- Use 1-3 for simple, standardized evaluations
- Never exceed 10 - diminishing returns
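If you set max_references programmatically, the rules of thumb above can be captured in a small helper. This is a sketch of this section's guidance, not an API feature; tune the thresholds to your own reference library.

```python
# Sketch of the rule of thumb above; not part of the API.
def suggest_max_references(library_size: int, standardized: bool = False) -> int:
    if standardized or library_size < 5:
        return max(1, min(3, library_size))  # 1-3 for simple, standardized runs
    if library_size >= 20:
        return 8  # larger, diverse libraries can support more context
    return 5      # recommended default: balanced quality and cost
```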
evaluation_instructions
Type: string
Optional
Description: Additional context or instructions for the AI evaluator
{
"evaluation_instructions": "Pay special attention to specific, measurable recommendations for student improvement. Focus on actionability over general praise."
}
Use Cases:
- Emphasize specific rubric criteria
- Provide batch-specific context
- Highlight organizational standards
- Guide tone or style preferences
Examples:
// Focus on specificity
{
"evaluation_instructions": "Prioritize feedback specificity. Score higher for concrete examples and actionable suggestions."
}
// Emphasis on growth mindset
{
"evaluation_instructions": "Evaluate how well feedback promotes student growth mindset and provides clear paths for improvement."
}
// Subject-specific context
{
"evaluation_instructions": "For mathematics feedback, assess clarity of problem-solving explanations and step-by-step guidance."
}
Complete Configuration Examples
Example 1: Standard Configuration (Recommended)
curl -X POST https://your-domain.com/api/batches/create/ \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/json" \
-d '{
"name": "Weekly Teacher Evaluations",
"rubric_id": 5,
"organization_id": 1,
"feedback_ids": [101, 102, 103, 104, 105],
"use_reference_matching": true,
"reference_matching_mode": "dynamic",
"reference_detail_level": "detailed",
"max_references": 5
}'
Best for: Most use cases, balanced quality and cost
Example 2: High-Quality Configuration
curl -X POST https://your-domain.com/api/batches/create/ \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/json" \
-d '{
"name": "Annual Performance Review",
"rubric_id": 5,
"organization_id": 1,
"feedback_ids": [201, 202, 203],
"use_reference_matching": true,
"reference_matching_mode": "dynamic",
"reference_detail_level": "detailed",
"max_references": 8,
"evaluation_instructions": "This is a comprehensive annual review. Provide thorough analysis with specific examples and detailed improvement recommendations."
}'
Best for: Critical evaluations, annual reviews, comprehensive assessments
Example 3: Fast Processing Configuration
curl -X POST https://your-domain.com/api/batches/create/ \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/json" \
-d '{
"name": "Quick Check Evaluations",
"rubric_id": 5,
"organization_id": 1,
"feedback_ids": [301, 302, 303, 304, 305, 306, 307, 308, 309, 310],
"use_reference_matching": true,
"reference_matching_mode": "static",
"reference_detail_level": "summary",
"max_references": 3
}'
Best for: Large batches, quick turnaround, standardized evaluations
Example 4: No RAG (Baseline)
curl -X POST https://your-domain.com/api/batches/create/ \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/json" \
-d '{
"name": "Baseline Evaluations",
"rubric_id": 5,
"organization_id": 1,
"feedback_ids": [401, 402, 403],
"use_reference_matching": false
}'
Best for: Testing, comparison baselines, when no references exist
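If you switch between these configurations programmatically, the four examples above can be kept as reusable presets and merged into the request payload. A sketch, assuming the same payload fields shown in the curl examples:

```python
# The four example configurations above, expressed as reusable presets.
RAG_PRESETS = {
    "standard": {
        "use_reference_matching": True,
        "reference_matching_mode": "dynamic",
        "reference_detail_level": "detailed",
        "max_references": 5,
    },
    "high_quality": {
        "use_reference_matching": True,
        "reference_matching_mode": "dynamic",
        "reference_detail_level": "detailed",
        "max_references": 8,
    },
    "fast": {
        "use_reference_matching": True,
        "reference_matching_mode": "static",
        "reference_detail_level": "summary",
        "max_references": 3,
    },
    "baseline": {"use_reference_matching": False},
}

def build_payload(name, rubric_id, organization_id, feedback_ids, preset="standard"):
    payload = {
        "name": name,
        "rubric_id": rubric_id,
        "organization_id": organization_id,
        "feedback_ids": feedback_ids,
    }
    payload.update(RAG_PRESETS[preset])
    return payload
```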
Understanding RAG Results
When RAG is enabled, evaluation results include reference comparison data:
{
"batch": {...},
"results": [
{
"feedback": {...},
"evaluation": {
"overall_score": 42.5,
"percentage": 85.0,
"reference_comparisons": {
"similar_references_found": 5,
"top_reference": {
"id": 45,
"title": "Excellent Math Feedback Example",
"quality_level": "excellent",
"similarity_score": 0.89,
"score": 48.0
},
"comparison_insights": "This feedback shows similar clarity to reference #45 but could benefit from more specific examples as demonstrated in reference #23..."
},
...
}
}
]
}
Key Fields:
- similar_references_found - Number of references used
- top_reference - Most similar reference example
- similarity_score - How similar (0.0-1.0, higher is more similar)
- comparison_insights - AI-generated comparison analysis
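A short sketch for reading these fields from a batch results payload. The key names follow the example above; the lookups are defensive because reference_comparisons is only present when RAG is enabled.

```python
# Sketch: print reference-comparison highlights from a batch results payload,
# using the field names shown in the example above.
def summarize_rag_results(results: list[dict]) -> None:
    for item in results:
        evaluation = item.get("evaluation", {})
        comparisons = evaluation.get("reference_comparisons") or {}
        top = comparisons.get("top_reference") or {}
        print(
            f"score={evaluation.get('overall_score')} "
            f"references_found={comparisons.get('similar_references_found', 0)} "
            f"top_reference={top.get('title')} "
            f"similarity={top.get('similarity_score')}"
        )
```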
Cost Analysis
RAG adds minimal cost to evaluations:
| Configuration | Additional Tokens | Cost per Evaluation | Annual Cost (1000 evals) |
|---|---|---|---|
| No RAG | 0 | $0 | $0 |
| RAG (3 refs, summary) | ~800 | $0.0024 | $2.40 |
| RAG (5 refs, detailed) | ~1,800 | $0.0054 | $5.40 |
| RAG (8 refs, detailed) | ~2,800 | $0.0084 | $8.40 |
Note: These are incremental costs on top of base evaluation costs. RAG adds approximately $0.002-$0.008 per evaluation depending on configuration.
ROI: The quality improvement (+18-28% accuracy) typically far outweighs the marginal cost increase.
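The incremental figures above imply a flat rate of roughly $3 per million tokens ($0.0024 / ~800 tokens). If you want a quick estimate for other configurations, here is a sketch under that assumption; it is an approximation derived from the table, not a published price.

```python
# Rough estimate of incremental RAG cost, assuming the ~$3 per million tokens
# implied by the table above (an approximation, not a published rate).
TOKEN_RATE_USD = 3.0 / 1_000_000

def estimated_rag_cost(extra_tokens_per_eval: int, evaluations: int) -> float:
    return extra_tokens_per_eval * TOKEN_RATE_USD * evaluations

# Example: 5 detailed references (~1,800 extra tokens) over 1,000 evaluations
print(round(estimated_rag_cost(1_800, 1_000), 2))  # ~5.40
```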
Best Practices
1. Reference Library Management
- ✓ Maintain 10-20 quality references per rubric
- ✓ Include examples across all performance levels (excellent, good, average, poor)
- ✓ Update references quarterly based on new exemplars
- ✓ Tag references with subject area and grade level for better matching
2. Configuration Selection
- ✓ Start with default settings (dynamic, detailed, 5 references)
- ✓ A/B test different configurations to optimize for your use case
- ✓ Use detailed level for important evaluations
- ✓ Increase max_references if you have 20+ diverse examples
3. Performance Optimization
- ✓ Batch similar feedbacks together for consistent reference matching
- ✓ Use static mode for large batches of similar feedbacks
- ✓ Monitor token usage and adjust max_references if needed
- ✓ Cache frequently used reference sets
4. Quality Assurance
- ✓ Review RAG results periodically to ensure quality
- ✓ Check reference_comparisons to see which references are being matched
- ✓ Add new references when gaps are identified
- ✓ Compare RAG vs non-RAG evaluations to measure improvement
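For the last point, one simple approach is to submit the same feedback IDs as two batches, one with use_reference_matching enabled and one without, and compare the resulting scores. A sketch reusing the request shape from the configuration examples above; the endpoint, token, and the create_batch helper are assumptions for illustration.

```python
# Sketch: create paired batches (RAG vs baseline) for the same feedbacks
# so their scores can be compared. Domain and token are placeholders.
import requests

API_URL = "https://your-domain.com/api/batches/create/"
HEADERS = {"Authorization": "Bearer your-token", "Content-Type": "application/json"}

def create_batch(name, feedback_ids, use_rag):
    payload = {
        "name": name,
        "rubric_id": 5,
        "organization_id": 1,
        "feedback_ids": feedback_ids,
        "use_reference_matching": use_rag,
    }
    if use_rag:
        payload.update({
            "reference_matching_mode": "dynamic",
            "reference_detail_level": "detailed",
            "max_references": 5,
        })
    response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json()["batch"]["id"]

feedback_ids = [101, 102, 103]
rag_batch_id = create_batch("A/B test - RAG", feedback_ids, use_rag=True)
baseline_batch_id = create_batch("A/B test - baseline", feedback_ids, use_rag=False)
```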
Troubleshooting
No References Found
Symptom: similar_references_found: 0 in results
Causes:
- No reference feedbacks created for this rubric
- References not linked to correct rubric
- References not activated (is_active=false)
Solution:
1. Go to /assessment/teacher-evaluation/references/
2. Create 5-10 reference feedbacks
3. Link them to your rubric
4. Ensure is_active=true
Poor Reference Matching
Symptom: similarity_score consistently below 0.5
Causes:
- Reference examples don't match feedback context
- Limited reference library diversity
- Subject/grade level mismatch
Solution:
- Add more diverse reference examples
- Tag references with subject area and grade level
- Create subject-specific reference sets
RAG Not Improving Results
Symptom: Similar scores with or without RAG
Causes:
- Reference examples not high quality
- Too few references (< 5)
- Wrong configuration for use case
Solution:
- Review and improve reference quality
- Increase to 7-8 references
- Try detailed level if using summary
- Add more specific evaluation instructions
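These symptoms can also be spotted automatically when you post-process batch results. A sketch of such a check, using the field names from the "Evaluation Result with RAG" schema below; the 0.5 threshold mirrors the guidance above.

```python
# Sketch: flag results showing the RAG symptoms described above.
# Field names follow the "Evaluation Result with RAG" schema below.
def diagnose_rag(results: list[dict], min_similarity: float = 0.5) -> list[str]:
    warnings = []
    for result in results:
        comparisons = result.get("reference_comparisons") or {}
        feedback_id = result.get("feedback_id")
        if comparisons.get("similar_references_found", 0) == 0:
            warnings.append(f"feedback {feedback_id}: no references matched")
            continue
        best = max(
            (ref.get("similarity_score", 0.0)
             for ref in comparisons.get("references_used", [])),
            default=0.0,
        )
        if best < min_similarity:
            warnings.append(
                f"feedback {feedback_id}: weak reference match ({best:.2f})"
            )
    return warnings
```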
API Response Schema
Batch Creation Response with RAG:
{
"message": "Batch evaluation created successfully",
"batch": {
"id": 123,
"name": "My Batch",
"status": "pending",
"total_feedbacks": 10,
"created_at": "2025-01-15T10:30:00Z",
"rag_configuration": {
"use_reference_matching": true,
"reference_matching_mode": "dynamic",
"reference_detail_level": "detailed",
"max_references": 5,
"reference_snapshot_date": "2025-01-15T10:30:00Z",
"reference_snapshot_ids": [45, 46, 47, 48, 49, 50, 51, 52],
"reference_snapshot_version": "v1.0"
}
}
}
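Because each batch stores a snapshot of the references it will use (see the FAQ at the end of this guide), it can be useful to log that snapshot at creation time. A sketch that reads the fields shown in the response above:

```python
# Sketch: log the reference snapshot recorded in the batch creation response.
def log_rag_snapshot(creation_response: dict) -> None:
    rag = creation_response["batch"].get("rag_configuration") or {}
    if not rag.get("use_reference_matching"):
        print("RAG disabled for this batch")
        return
    ids = rag.get("reference_snapshot_ids", [])
    print(
        f"mode={rag.get('reference_matching_mode')} "
        f"references={len(ids)} "
        f"snapshot_date={rag.get('reference_snapshot_date')}"
    )
```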
Evaluation Result with RAG:
{
"result_id": 456,
"feedback_id": 101,
"overall_score": 42.5,
"percentage": 85.0,
"reference_comparisons": {
"enabled": true,
"mode": "dynamic",
"similar_references_found": 5,
"references_used": [
{
"id": 45,
"title": "Excellent Math Feedback",
"quality_level": "excellent",
"similarity_score": 0.89,
"score": 48.0
},
{
"id": 47,
"title": "Good Constructive Feedback",
"quality_level": "good",
"similarity_score": 0.82,
"score": 43.5
}
],
"comparison_insights": "This feedback demonstrates strong clarity similar to reference #45...",
"average_reference_score": 44.2
}
}
Related Documentation
- Workflow Guide: api_core_workflow.md - Complete API workflow including RAG
- RAG Implementation: RAG_IMPLEMENTATION_GUIDE.md - Detailed RAG architecture and best practices
- Email API: email_api.md - Automatic email delivery after RAG evaluation
- API Reference: API_DOCUMENTATION.md - Complete API endpoint documentation
FAQs
Q: How many reference feedbacks do I need?
A: Minimum 5, recommended 10-20 for best results. Include examples across all quality levels.
Q: Does RAG work with any rubric?
A: Yes, but references must be linked to the specific rubric you're evaluating against.
Q: Can I use RAG without reference feedbacks?
A: No. If use_reference_matching=true but no references exist, the system falls back to standard evaluation.
Q: How do I know if RAG is working?
A: Check reference_comparisons in evaluation results. similar_references_found > 0 means it's working.
Q: What's the performance impact?
A: RAG adds roughly 10-30 seconds per batch (a one-time cost for reference retrieval); per-evaluation processing time is otherwise unchanged.
Q: Can I update references after creating a batch?
A: Yes, but the batch uses a snapshot of references from creation time. New batches will use updated references.
Last Updated: January 2025
