Cited answer pinned to the corpus vs a confident summary that can't reproduce a single specific default value.
What are the recommended defaults for chunk size, top-K, and the synthesis prompt when building a RAG pipeline with `RagPipeline`, and which lessons from the IntelliDoc 2026-04-26 evaluation justify those defaults?
SwarmAI workflow · RAG_PIPELINE
Raw GPT-4o — no tools, no memory
Same question — "what defaults should I use for RagPipeline?" — answered with vs without retrieval.
Left: SwarmAI — RagPipeline ingests the project's RAG_LESSONS.md corpus and synthesises a cited answer with the eval-winning defaults baked in.
Right: raw LLM — same model, same prompt, no documents, no retrieval.
3 citations to RAG_LESSONS.md, every value justified
Admits it can't see the documents, falls back to general best-practices
Smaller model, same RAG-grounded answer in 4.7s
RAG_LESSONS.md
Invents "chunk size ≈ 500 tokens" — wrong, the eval picked 800
RagPipeline is the new high-level RAG facade in swarmai-core (1.0.10). It bakes in the configuration that won the
IntelliDoc 2026-04-26 evaluation — 7 iterations × 225 document-grounded questions × 3 platforms (IntelliDoc,
LangGraph-Python, LangChain4j-Java). The defaults that produced the best chunk-hit + faithfulness + latency tradeoff:
| field | default | reason |
|---|---|---|
| chunk size | 800 | peer-aligned (500 too small, 1200 lost recall) |
| chunk overlap | 100 | enough to span formula/sentence boundaries |
| top-K | 5 | K=10 added prompt tokens without proportional recall |
| hybrid retrieval | on | BM25 + vector → RRF; disabling tanked chunk-hit 14 % → 8 % |
| MMR rerank | off | over-spreads results away from the right doc |
| temperature | 0.2 | 0.1 over-refused, 0.3 paraphrased formulas |
| num predict | 350 | 30 % latency saving without hurting completeness |
| synthesis prompt | 5 plain-English bullets | dropped refusals 36 % → 12 % vs 6-rule prompt |
Wiring it up is a single chained builder call, then ingest and query:
RagPipeline rag = RagPipeline.builder()
        .vectorStore(vectorStore)
        .chatClient(chatClient)
        .config(RagConfig.defaults())
        .build();
rag.ingestText("RAG_LESSONS.md", Files.readString(Paths.get("RAG_LESSONS.md")));
RagAnswer a = rag.query("What are the recommended defaults?");
// a.answer()    → cited reply
// a.citations() → [Citation(RAG_LESSONS.md, 0, ...), ...]
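If your corpus calls for different values, the builder also accepts an explicit config instead of RagConfig.defaults(). A minimal sketch, assuming RagConfig exposes a builder whose setter names mirror the YAML keys further down and that Citation exposes source/chunk-index accessors; treat the exact method names as assumptions and check the swarmai-core API:

// Sketch only: RagConfig.builder(), its setters, and the Citation accessors below
// are assumed names mirroring the YAML config keys, not confirmed swarmai-core API.
RagConfig wider = RagConfig.builder()
        .chunkSize(800)              // keep the eval-winning chunking
        .chunkOverlap(100)
        .topK(8)                     // e.g. widen retrieval for a larger corpus
        .build();

RagPipeline tuned = RagPipeline.builder()
        .vectorStore(vectorStore)
        .chatClient(chatClient)
        .config(wider)
        .build();

RagAnswer answer = tuned.query("Which chunk size won the IntelliDoc eval?");
for (Citation c : answer.citations()) {
    // Prints in the documented citationFormat: [source: <filename> #<chunk-index>]
    System.out.printf("[source: %s #%d]%n", c.source(), c.chunkIndex());
}

To re-record the side-by-side comparison above: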
cd swarm-ai-examples
./demo-recorder/record-rag-demo.sh gpt-4o gpt-5.4-mini
# requires OPENAI_API_KEY in .env, Ollama running with nomic-embed-text
Outputs land in demos/rag-knowledge-base/runs/<model>/<framework-version>/ and sync to the website
under intelliswarm.ai/website/src/assets/demos/.
# RagPipeline isn't a SwarmGraph workflow — it's the high-level facade in
# swarmai-core. The "workflow" here is just the builder call + ingest + query.
rag:
  pipeline: RagPipeline
  config:
    chunkSize: 800
    chunkOverlap: 100
    topK: 5
    maxPassageChars: 2400
    hybridRetrieval: true           # vector + BM25 → RRF
    contextualPrefix: true          # "[filename] " prepended before embedding
    mmrRerank: false                # diversity rerank disabled (over-spreads)
    temperature: 0.2
    numPredict: 350
    prompt: RagPrompts.SYSTEM       # five-bullet plain-English prompt (langchain-style)
    retriever: HybridRetriever      # vector + BM25 fused via RRF (K=60)
    splitter: RecursiveCharSplitter
    citationFormat: "[source: <filename> #<chunk-index>]"
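For intuition on the hybridRetrieval and retriever rows: reciprocal rank fusion scores each chunk as the sum of 1/(K + rank) over the BM25 and vector rankings, then sorts by the fused score. The standalone sketch below illustrates that fusion step with K=60 from the config; it is not the HybridRetriever implementation itself:

import java.util.*;

/** Illustrative RRF fusion: score(d) = sum over rankings of 1 / (K + rank(d)). */
final class RrfSketch {
    static final int K = 60;  // fusion constant listed in the config above

    static List<String> fuse(List<String> bm25Ranking, List<String> vectorRanking) {
        Map<String, Double> score = new HashMap<>();
        for (List<String> ranking : List.of(bm25Ranking, vectorRanking)) {
            for (int rank = 0; rank < ranking.size(); rank++) {
                // rank + 1 so the top hit contributes 1 / (K + 1)
                score.merge(ranking.get(rank), 1.0 / (K + rank + 1), Double::sum);
            }
        }
        return score.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // A chunk ranked well by both retrievers beats one that only BM25 likes.
        System.out.println(fuse(
                List.of("chunk-3", "chunk-7", "chunk-1"),
                List.of("chunk-1", "chunk-3", "chunk-9")));
    }
}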
Reproducible — model version, temperature, seed, framework git SHA, and hashes of prompt + workflow are embedded in every trace. Re-run to diff against this recording.
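As a sketch of what checking one of those hashes could look like, assuming the trace stores a hex-encoded SHA-256 of the raw prompt text (the exact hashing scheme is an assumption, not documented here):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

final class PromptHashCheck {
    /** Recompute a prompt hash locally and compare it with the value in a trace. */
    static boolean matches(String promptText, String hashFromTrace) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(promptText.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest).equalsIgnoreCase(hashFromTrace);
    }
}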