Docs-only guardrails when RAG retrieval finds nothing


When Astra Docs Chat cannot find relevant chunks in Astra DB, the model may still answer from general knowledge: confident, plausible, and sometimes wrong for Astra-specific APIs. The parent post flagged tightening this with a stricter “docs only” prompt. This post covers practical guardrails at the prompt, retrieval, flow, and UX layers.

Context: Building Astra Docs Chat

Related: Langflow chat flow · Streaming chat UI

Status: planned for v2; not deployed in current production chat.

Try v1: Astra Docs Chat


Typical failure modes:

  1. Empty retrieval: question outside the doc corpus (e.g. “What is Kubernetes?”)
  2. Weak retrieval: top-k chunks are tangential; model fills gaps from training data
  3. Stale corpus: docs changed, vectors lag, and the model improvises (see the re-ingest post)

v1 accepts (1) and (2) as a trade-off for simplicity. Guardrails reduce harm without requiring perfect retrieval.

The chat flow’s Prompt component uses a permissive template: “Given the context above, answer the question as best as possible.” When {context} is empty or irrelevant, “as best as possible” invites general knowledge.


In the Langflow Prompt component, replace or augment instructions:

You answer questions about DataStax Astra DB Serverless using ONLY the context below.

If the context is empty or does not contain enough information to answer, respond with exactly:
"I couldn't find this in the Astra DB Serverless documentation I have indexed."

Do not guess API names, limits, default values, or URLs.
Do not answer from general knowledge about databases or other products.

Short and repetitive beats clever: a blunt rule stated plainly is harder for the model to drift away from than a nuanced one, especially over a long streamed answer.

Test in Langflow Playground with {context} manually cleared before you trust production behaviour.
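The same empty-context check can be sketched outside the Playground. This is a minimal sketch, assuming the template is rendered with plain string formatting: `build_prompt`, `is_refusal`, the `Context:` / `Question:` layout, and the `(no context retrieved)` placeholder are illustrative assumptions, not Langflow APIs. Only the instruction wording is taken from the template above.

```python
REFUSAL = "I couldn't find this in the Astra DB Serverless documentation I have indexed."

# Instruction wording copied from the post; the Context/Question layout is assumed.
STRICT_TEMPLATE = """You answer questions about DataStax Astra DB Serverless using ONLY the context below.

If the context is empty or does not contain enough information to answer, respond with exactly:
"{refusal}"

Do not guess API names, limits, default values, or URLs.
Do not answer from general knowledge about databases or other products.

Context:
{context}

Question:
{question}
"""


def build_prompt(context: str, question: str) -> str:
    """Render the strict template; whitespace-only context counts as empty."""
    return STRICT_TEMPLATE.format(
        refusal=REFUSAL,
        # Hypothetical placeholder so an empty context is visible to the model.
        context=context.strip() or "(no context retrieved)",
        question=question.strip(),
    )


def is_refusal(answer: str) -> bool:
    """True when the answer is the fixed refusal string, ignoring outer whitespace and quotes."""
    return answer.strip().strip('"') == REFUSAL
```

Calling `build_prompt("", "What is Kubernetes?")` and sending the result to the model should, if the prompt works, produce an answer for which `is_refusal` returns `True`. Matching the exact string is deliberately strict: a paraphrased refusal is a sign the model is already drifting from the instructions.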


In v1, the chat flow's AstraDB component runs with its defaults: similarity search, 4 results, and a score threshold of 0. With a threshold of 0, weak matches still reach the prompt.

If your Langflow / AstraDB component exposes scores or distance:

  • Log scores for 20 known-good questions and 20 off-topic questions
  • Define a minimum similarity threshold
  • Branch in the flow: below threshold → skip LLM, return fixed refusal string (Template or ChatOutput)

This avoids paying for an LLM call that will hallucinate anyway.

Exact thresholds are corpus-specific. Questions about exact CLI flags may need lower thresholds than conceptual “what is a collection?” questions.
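One way to turn the logged scores into a threshold is to pick the value that misroutes the fewest logged questions. The sketch below assumes higher scores mean closer matches (similarity, not distance); `pick_threshold` is a hypothetical helper, and the scores in the usage lines are invented for illustration, not measurements from Astra Docs Chat.

```python
def pick_threshold(on_topic: list[float], off_topic: list[float]) -> float:
    """Return the observed score that misroutes the fewest logged questions.

    A question goes to the LLM when its top score is >= threshold.
    Errors = on-topic questions refused + off-topic questions answered.
    """
    candidates = sorted(set(on_topic) | set(off_topic))
    if not candidates:
        raise ValueError("need at least one logged score")

    best_threshold = candidates[0]
    best_errors = len(on_topic) + len(off_topic) + 1
    for t in candidates:
        refused_good = sum(score < t for score in on_topic)
        answered_bad = sum(score >= t for score in off_topic)
        errors = refused_good + answered_bad
        # Strict "<" keeps the lowest threshold among ties, which favours answering.
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Invented scores for illustration only; log real ones from your own flow.
on_topic_scores = [0.82, 0.78, 0.91, 0.74]
off_topic_scores = [0.41, 0.55, 0.62, 0.76]
print(pick_threshold(on_topic_scores, off_topic_scores))  # 0.74 for these made-up numbers
```

If a wrong answer costs more than an unnecessary refusal, change the tie-break to prefer the higher threshold, or weight `answered_bad` more heavily than `refused_good`.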

Hybrid search (see the Astra vector store post) can help with symbol-heavy queries before you tune thresholds.


Prefer flow-level refusal over proxy hacks:

AstraDB search
  → Condition: context length > N OR top score > T
       true  → Prompt → DeepSeek → ChatOutput
       false → Template ("I couldn't find this...") → ChatOutput

Langflow’s conditional routing varies by version; the idea is simple: do not call the LLM when there is nothing to ground on.
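The branch logic itself is small enough to sketch in plain Python, independent of how a given Langflow version wires it. This sketch reads the OR in the condition as "use the top score when the component exposes one, otherwise fall back to context length", which is an interpretation, not something the flow defines. `Chunk`, `should_call_llm`, `route`, and the default values for N and T are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

REFUSAL = "I couldn't find this in the Astra DB Serverless documentation I have indexed."


@dataclass
class Chunk:
    text: str
    score: Optional[float] = None  # None when the component does not expose scores


def should_call_llm(chunks: list[Chunk], min_chars: int, min_score: float) -> bool:
    """Decide whether retrieval produced enough grounding to justify an LLM call."""
    scores = [c.score for c in chunks if c.score is not None]
    if scores:
        # Scores available: judge by the best match.
        return max(scores) > min_score
    # No scores: fall back to the total amount of retrieved text.
    return sum(len(c.text.strip()) for c in chunks) > min_chars


def route(
    chunks: list[Chunk],
    answer_with_llm: Callable[[str], str],
    min_chars: int = 200,    # placeholder for N
    min_score: float = 0.7,  # placeholder for T
) -> str:
    """Return the LLM answer when grounded, otherwise the fixed refusal string."""
    if not should_call_llm(chunks, min_chars, min_score):
        return REFUSAL  # the LLM is never called on this path
    context = "\n\n".join(c.text for c in chunks)
    return answer_with_llm(context)
```

Passing the LLM call in as `answer_with_llm` keeps the routing testable without a model: a stub that records its calls is enough to confirm the refusal path never reaches it.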

Optional belt-and-braces: if Langflow exposes retrieval metadata in the end event, the Pages Function could replace the body when chunk count is zero. That couples you to response shape; prefer flow-level refusal when possible.


Refusal should look intentional, not broken:

I couldn’t find this in the indexed Astra DB Serverless docs. Try rephrasing, or search the official documentation.

This is better than a generic error bubble or a confident but wrong API example. The copy renders in the Hugo chat bubbles described in the post on building a streaming chat UI.

In astra-chat.js, refusals can use the normal assistant bubble styling (not astra-chat-bubble--error) so users do not think the service failed.


Input                                             | Expected
“How do I create a collection?”                   | Normal grounded answer
“Explain quantum chromodynamics”                  | Refusal
“Astra DB vs Cassandra on Mars”                   | Refusal or narrow “not in docs”
Misspelled but valid topic (“PCU grup”)           | Answer or ask to rephrase: document behaviour
Question about a feature added after last ingest  | Refusal or stale partial answer: note re-ingest gap

Log failures during beta, and adjust the prompt before tuning thresholds. Prompt changes only require redeploying the Langflow flow, not the Hugo site.
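The clear-cut rows of the table can be turned into a small regression check to rerun after each prompt change. This is a sketch under assumptions: `ask` stands in for whatever function sends a question to the flow and returns the final text, `classify` detects a refusal by a substring taken from the refusal copy, and the rows with "or" outcomes are left out because their expected behaviour is still a judgment call.

```python
from typing import Callable

# Substring shared by both refusal wordings used in this post.
REFUSAL_MARKER = "i couldn't find this"

CASES = [
    ("How do I create a collection?", "answer"),
    ("Explain quantum chromodynamics", "refusal"),
    ("Astra DB vs Cassandra on Mars", "refusal"),
]


def classify(answer: str) -> str:
    """Label an answer as 'refusal' or 'answer'."""
    # Normalise the curly apostrophe so UI copy and prompt copy both match.
    normalised = answer.replace("\u2019", "'").lower()
    return "refusal" if REFUSAL_MARKER in normalised else "answer"


def run_cases(
    ask: Callable[[str], str],
    cases: list[tuple[str, str]] = CASES,
) -> list[tuple[str, str, str]]:
    """Return (question, expected, got) for every case whose outcome differs."""
    failures = []
    for question, expected in cases:
        got = classify(ask(question))
        if got != expected:
            failures.append((question, expected, got))
    return failures
```

An empty list from `run_cases` means every listed question behaved as expected. A non-empty list is the log to review before touching thresholds.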


Guardrails without source links still help. If you add citations later, implement refusal before the model can invent URLs.

Order of operations I would use:

  1. Prompt refusal (cheap)
  2. Retrieval threshold / empty-context branch (reliable)
  3. Structured citations (trust)
  4. Re-ingest automation (freshness)

When shipped:

  • Langflow flow update only for layers 1-3 (no Hugo redeploy required if refusal text comes through the normal stream)
  • Optional copy tweak in welcome message: “Answers use indexed docs only”
  • Update parent post link from “sensible next step” to “how I added docs-only guardrails”

Proxy and UI stay the same unless you add a distinct refusal SSE event type (usually unnecessary).


Series index: Building Astra Docs Chat

Open Astra Docs Chat and try an off-topic question in v1: notice when the answer sounds general. That is what this post is meant to fix.
