Feedback-Based RAG: Self-Improving Retrieval with User Signals

Use explicit and implicit user feedback to rerank, retrain, and continuously improve RAG quality. Expect steady NDCG gains month over month.

TL;DR

Collect explicit (ratings) and implicit (clicks, dwell) signals, convert them to rewards, rerank in real time, and batch‑retrain weekly for durable gains.

What is Feedback-Based RAG?

A RAG system that learns from usage. User feedback updates retrieval scores/rerankers online and via periodic retraining.

Feedback Signals

Explicit: thumbs, stars, helpful
Implicit: clicks, dwell >30s, copy/share
Negative: immediate back/ refine query
Combined score: 0.6×explicit + 0.4×implicit

Update Mechanisms

Online: EMA update of retrieval scores (fast, noisy)
Offline: daily aggregation + weekly reranker retrain
Cadence: A/B test and roll back if metrics regress

Data Requirements

Log: user_id, query, doc_ids, scores, rating, timestamp
Thresholds: ≥100 feedback samples before retraining
Temporal decay: weight recent feedback higher (exp. decay)

Retrieval Configuration

Base: BM25 + Dense (e.g., Contriever)
Feedback layer: fine‑tuned reranker on feedback data
Cold start: content similarity until ≥50 ratings
Ensemble: 0.5×base + 0.3×feedback + 0.2×recency

Environment Variables

FEEDBACK_DB_URL=postgresql://… | MIN_FEEDBACK_COUNT=50 | DECAY_RATE=0.95 | UPDATE_SCHEDULE="0 2 * * *"

Guardrails

Outlier detection and velocity limits
Honeypots to catch bots; rate limiting
Temporal decay so recent feedback counts more

Cost Model (100K monthly queries)

Storage: ~100MB/month ≈ $0.01
Daily batch: ≈ $3/month
Weekly reranker retrain: ≈ $20/month
Total: ≈ $23/month

Benchmarks (Illustrative)

NDCG@10: +8–16% over 4–8 weeks
Coverage: % results with ratings ↑
Conversion metrics (positive action) ↑

NDCG Weekly Deltas (Example)

Week 0: 0.41
Week 1: 0.45 (+0.04)
Week 2: 0.48 (+0.03)
Week 4: 0.51 (+0.03)

Security & Privacy

Strip PII; store aggregates where possible
GDPR: support delete requests
Audit logs: retain original ratings separate from aggregates

Ablation (Illustrative)

Explicit only: +8.2% (low noise)
Implicit only: +5.1% (med noise)
Combined: +12.4% (med noise)
+ Temporal decay: +14.7% (low noise)

Minimal n8n Workflow

Webhook → DB insert (feedback) → Daily aggregate → Weekly retrain reranker → Update vector metadata → A/B test.

Real‑World Fits

E‑commerce search: rank by purchases/engagement
Docs/support: surface articles that resolve issues
Academic: rank by saves/citations