Cited answer pinned to the corpus vs a confident summary that can't reproduce a single specific default value.
What are the recommended defaults for chunk size, top-K, and the synthesis prompt when building a RAG pipeline with `RagPipeline`, and which lessons from the IntelliDoc 2026-04-26 evaluation justify those defaults?
SwarmAI workflow · RAG_PIPELINE
Raw GPT-4o — no tools, no memory
Same question — "what defaults should I use for RagPipeline?" — answered with vs without retrieval.
Left: SwarmAI — RagPipeline ingests the project's RAG_LESSONS.md corpus and synthesises a cited answer with the eval-winning defaults baked in.
Right: raw LLM — same model, same prompt, no documents, no retrieval.
3 citations to RAG_LESSONS.md, every value justified
Admits it can't see the documents, falls back to general best-practices
Smaller model, same RAG-grounded answer in 4.7s
RAG_LESSONS.md
Invents "chunk size ≈ 500 tokens" — wrong, the eval picked 800
RagPipeline is the new high-level RAG facade in swarmai-core (1.0.10). It bakes in the configuration that won the
IntelliDoc 2026-04-26 evaluation — 7 iterations × 225 document-grounded questions × 3 platforms (IntelliDoc,
LangGraph-Python, LangChain4j-Java). The defaults that produced the best chunk-hit + faithfulness + latency tradeoff:
| field | default | reason |
|---|---|---|
| chunk size | 800 | peer-aligned (500 too small, 1200 lost recall) |
| chunk overlap | 100 | enough to span formula/sentence boundaries |
| top-K | 5 | K=10 added prompt tokens without proportional recall |
| hybrid retrieval | on | BM25 + vector → RRF; disabling tanked chunk-hit 14 % → 8 % |
| MMR rerank | off | over-spreads results away from the right doc |
| temperature | 0.2 | 0.1 over-refused, 0.3 paraphrased formulas |
| num predict | 350 | 30 % latency saving without hurting completeness |
| synthesis prompt | 5 plain-English bullets | dropped refusals 36 % → 12 % vs 6-rule prompt |
Wiring it up is a single chained builder call, then ingest and query:
RagPipeline rag = RagPipeline.builder()
        .vectorStore(vectorStore)
        .chatClient(chatClient)
        .config(RagConfig.defaults())
        .build();
rag.ingestText("RAG_LESSONS.md", Files.readString(Paths.get("RAG_LESSONS.md")));
RagAnswer a = rag.query("What are the recommended defaults?");
// a.answer()    → cited reply
// a.citations() → [Citation(RAG_LESSONS.md, 0, ...), ...]
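If your corpus calls for different values, the builder also accepts an explicit config instead of RagConfig.defaults(). A minimal sketch, assuming RagConfig exposes a builder whose setter names mirror the YAML keys further down and that Citation exposes source/chunk-index accessors; treat the exact method names as assumptions and check the swarmai-core API:

// Sketch only: RagConfig.builder(), its setters, and the Citation accessors below
// are assumed names mirroring the YAML config keys, not confirmed swarmai-core API.
RagConfig wider = RagConfig.builder()
        .chunkSize(800)              // keep the eval-winning chunking
        .chunkOverlap(100)
        .topK(8)                     // e.g. widen retrieval for a larger corpus
        .build();

RagPipeline tuned = RagPipeline.builder()
        .vectorStore(vectorStore)
        .chatClient(chatClient)
        .config(wider)
        .build();

RagAnswer answer = tuned.query("Which chunk size won the IntelliDoc eval?");
for (Citation c : answer.citations()) {
    // Prints in the documented citationFormat: [source: <filename> #<chunk-index>]
    System.out.printf("[source: %s #%d]%n", c.source(), c.chunkIndex());
}

To re-record the side-by-side comparison above: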
cd swarm-ai-examples
./demo-recorder/record-rag-demo.sh gpt-4o gpt-5.4-mini
# requires OPENAI_API_KEY in .env, Ollama running with nomic-embed-text
Outputs land in demos/rag-knowledge-base/runs/<model>/<framework-version>/ and sync to the website
under intelliswarm.ai/website/src/assets/demos/.
# RagPipeline isn't a SwarmGraph workflow — it's the high-level facade in
# swarmai-core. The "workflow" here is just the builder call + ingest + query.
rag:
  pipeline: RagPipeline
  config:
    chunkSize: 800
    chunkOverlap: 100
    topK: 5
    maxPassageChars: 2400
    hybridRetrieval: true           # vector + BM25 → RRF
    contextualPrefix: true          # "[filename] " prepended before embedding
    mmrRerank: false                # diversity rerank disabled (over-spreads)
    temperature: 0.2
    numPredict: 350
    prompt: RagPrompts.SYSTEM       # five-bullet plain-English prompt (langchain-style)
    retriever: HybridRetriever      # vector + BM25 fused via RRF (K=60)
    splitter: RecursiveCharSplitter
    citationFormat: "[source: <filename> #<chunk-index>]"
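For intuition on the hybridRetrieval and retriever rows: reciprocal rank fusion scores each chunk as the sum of 1/(K + rank) over the BM25 and vector rankings, then sorts by the fused score. The standalone sketch below illustrates that fusion step with K=60 from the config; it is not the HybridRetriever implementation itself:

import java.util.*;

/** Illustrative RRF fusion: score(d) = sum over rankings of 1 / (K + rank(d)). */
final class RrfSketch {
    static final int K = 60;  // fusion constant listed in the config above

    static List<String> fuse(List<String> bm25Ranking, List<String> vectorRanking) {
        Map<String, Double> score = new HashMap<>();
        for (List<String> ranking : List.of(bm25Ranking, vectorRanking)) {
            for (int rank = 0; rank < ranking.size(); rank++) {
                // rank + 1 so the top hit contributes 1 / (K + 1)
                score.merge(ranking.get(rank), 1.0 / (K + rank + 1), Double::sum);
            }
        }
        return score.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // A chunk ranked well by both retrievers beats one that only BM25 likes.
        System.out.println(fuse(
                List.of("chunk-3", "chunk-7", "chunk-1"),
                List.of("chunk-1", "chunk-3", "chunk-9")));
    }
}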
Reproducible — model version, temperature, seed, framework git SHA, and hashes of prompt + workflow are embedded in every trace. Re-run to diff against this recording.
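As a sketch of what checking one of those hashes could look like, assuming the trace stores a hex-encoded SHA-256 of the raw prompt text (the exact hashing scheme is an assumption, not documented here):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

final class PromptHashCheck {
    /** Recompute a prompt hash locally and compare it with the value in a trace. */
    static boolean matches(String promptText, String hashFromTrace) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(promptText.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest).equalsIgnoreCase(hashFromTrace);
    }
}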