# Best Practices — Top 10 (Semantic Verifier rubric)

**Status:** draft for review · **Last updated:** 2026-04-30

## Purpose

This is the curated semantic-review rubric used by the `cortex-analyze` skill. Each rule is high-leverage, judgment-based (not a deterministic check), and applicable across most agents.

**Why a curated 10:** Loading the full `AGENTS.md` pattern library into the prompt for every review is wasteful. These are the 10 practices most worth semantic enforcement; the deeper pattern set stays in `knowledge/AGENTS.md` for human-triggered review.

## Output contract

For each rule, the verifier returns one of:

- `PASS` — rule is followed; no action needed.
- `FAIL` — rule is violated; cite the specific evidence (file + quote).
- `N/A` — rule doesn't apply to this agent (e.g., template-specific rule on a different template, or conditional sub-criterion that doesn't apply).

Each result also carries a short rationale (≤2 sentences) and the rule's **severity**, so the daily report can prioritize.
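
One way to picture the contract is as a small record type. The sketch below is illustrative only: the class and field names (`Verdict`, `Evidence`, `RuleResult`) are assumptions for this example, not the verifier's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    NA = "N/A"


@dataclass(frozen=True)
class Evidence:
    file: str   # e.g. "system.md"
    quote: str  # verbatim passage backing the finding


@dataclass(frozen=True)
class RuleResult:
    rule_id: str      # e.g. "BP-4"
    verdict: Verdict
    severity: str     # "CRITICAL", "HIGH", or "MEDIUM"
    rationale: str    # short, at most 2 sentences
    evidence: tuple[Evidence, ...] = ()

    def __post_init__(self) -> None:
        # The contract asks FAIL results to cite specific evidence (file + quote).
        if self.verdict is Verdict.FAIL and not self.evidence:
            raise ValueError(f"{self.rule_id}: FAIL requires at least one evidence item")
```

Rejecting an evidence-free `FAIL` at construction time is one possible design choice; it keeps uncited findings out of the daily report.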

## Severity bands

| Severity | Rules | Meaning |
|---|---|---|
| **CRITICAL** | BP-4, BP-5, BP-7 | Causes broken behavior in production (contradictions, unguarded actions). Must fix before next deploy. |
| **HIGH** | BP-1, BP-3, BP-8 | Structural drift that degrades quality over time (template misfit, examples diverging from flow, stale phrasing). Fix in next iteration. |
| **MEDIUM** | BP-2, BP-6, BP-9, BP-10 | Quality / consistency findings that don't break behavior but limit ceiling. Fix when iterating on the agent for other reasons. |
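
As a rough illustration of how the daily report could use these bands, the sketch below orders failed rules by the default severity in the table, then by rule number. The names `SEVERITY_BY_RULE` and `prioritize` are hypothetical, and BP-7's per-finding downgrade (described under that rule) is not modeled here.

```python
# Default severity per rule, copied from the table above.
SEVERITY_BY_RULE = {
    "BP-1": "HIGH", "BP-2": "MEDIUM", "BP-3": "HIGH", "BP-4": "CRITICAL",
    "BP-5": "CRITICAL", "BP-6": "MEDIUM", "BP-7": "CRITICAL", "BP-8": "HIGH",
    "BP-9": "MEDIUM", "BP-10": "MEDIUM",
}
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2}


def prioritize(failed_rule_ids):
    """Sort failed rule ids: most severe first, then by rule number."""
    def sort_key(rule_id):
        rule_number = int(rule_id.split("-")[1])
        return (SEVERITY_ORDER[SEVERITY_BY_RULE[rule_id]], rule_number)
    return sorted(failed_rule_ids, key=sort_key)


print(prioritize(["BP-10", "BP-1", "BP-5", "BP-2"]))
# ['BP-5', 'BP-1', 'BP-2', 'BP-10']
```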

---

## The 10 rules

### BP-1 — Template compliance

**Severity:** HIGH

**Rule:** Every section required by the agent's declared template (per `meta.md`) is present and matches the template's specification. For `principles` template agents, that means `system.md` contains: Identity, Voice, Flow, Calibration, Principles, Tools, NO_RESPONSE.

**Pass criterion:** All required sections from the matching `templates/*.md` exist in `system.md` (or the SDK file the template specifies); content fits the template's intent.
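
The presence half of this criterion can be sketched mechanically; whether the content fits the template's intent remains a judgment call. The example assumes sections appear as Markdown ATX headings whose text matches the section name exactly (case-insensitive). That heading convention and the function name `missing_sections` are assumptions, not something the templates are known to guarantee.

```python
import re

# Section list for the `principles` template, as stated in the rule above.
REQUIRED_PRINCIPLES_SECTIONS = [
    "Identity", "Voice", "Flow", "Calibration", "Principles", "Tools", "NO_RESPONSE",
]


def missing_sections(system_md, required=REQUIRED_PRINCIPLES_SECTIONS):
    """Return required section names with no matching heading in system_md."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", system_md, flags=re.MULTILINE)
    }
    return [name for name in required if name.lower() not in headings]
```

A non-empty return value would supply the "missing section name" part of the fail evidence.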

**Fail evidence to cite:** missing section name, or a section that materially diverges from the template's description.

**Source:** `cortex-analyze` R2.

### BP-2 — Example coverage and cap (7 base scenarios + conditional, ≤20 total)

**Severity:** MEDIUM

**Rule:** `examples.md` covers the 7 canonical scenarios — happy path, resource/link delivery, objection handling, graceful disqualification, emotional/sensitive lead, re-engagement, off-topic question — plus two conditional scenarios when applicable to the agent. Total example count must stay at or under 20.

**Conditional scenarios:**
- **Keyword trigger** — required if `keywords.json` is non-empty. At least one example must show the agent receiving a keyword and delivering the keyword's resource (then resuming flow).
- **Price objection** — required if `program.json` declares a price OR the agent's vertical involves selling something. At least one example must show the lead pushing back on price and the agent's correct handling (per the agent's own pricing rule — give it, withhold it, or redirect).

**Cap rationale:** every example consumes tokens, narrows generalization, and adds maintenance load. Three near-duplicate examples teach what one good example teaches at 3× the cost. Iteration goal: end with fewer examples, not more. When coverage is satisfied AND count is over 20, prefer merging or retiring stale examples over adding new ones.

**Pass criterion:** ≥6 of 7 base scenarios represented, every applicable conditional scenario has at least one example, AND total examples ≤20.
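
Once each example has been tagged with the scenarios it covers (the tagging itself is the judgment-based part), the arithmetic of this criterion could look like the sketch below. The tag strings, `check_bp2`, and its parameters are hypothetical names chosen for illustration.

```python
BASE_SCENARIOS = frozenset({
    "happy_path", "resource_delivery", "objection_handling",
    "graceful_disqualification", "emotional_lead", "re_engagement", "off_topic",
})
MIN_BASE_COVERED = 6  # pass criterion: at least 6 of the 7 base scenarios
MAX_EXAMPLES = 20     # cap on total examples


def check_bp2(example_tags, keywords_non_empty, price_objection_applies):
    """example_tags holds one set of scenario tags per example.

    Returns a list of problems; an empty list means the criterion is met.
    """
    covered = set().union(*example_tags)
    problems = []
    missing_base = sorted(BASE_SCENARIOS - covered)
    if len(BASE_SCENARIOS) - len(missing_base) < MIN_BASE_COVERED:
        problems.append("missing base scenarios: " + ", ".join(missing_base))
    if keywords_non_empty and "keyword_trigger" not in covered:
        problems.append("agent has keywords but no keyword-trigger example")
    if price_objection_applies and "price_objection" not in covered:
        problems.append("agent has a price but no price-objection example")
    if len(example_tags) > MAX_EXAMPLES:
        problems.append(f"{len(example_tags)} examples exceeds the cap of {MAX_EXAMPLES}")
    return problems
```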

**Fail evidence:** list the missing base scenario(s); if a conditional is missing, cite "agent has keywords but no keyword-trigger example" or "agent has a price but no price-objection example"; if over the cap, cite the example count and suggest merge/retire candidates.

**Source:** `templates/principles.md` coverage checklist; `cortex-analyze` R3a; memory `feedback_max_20_examples`.

### BP-3 — Examples align with the flow

**Severity:** HIGH

**Rule:** Every agent turn in `examples.md` maps to a step in `system.md`'s flow, in order, and no example ends mid-flow without a terminal state (link sent, fallback resource, NO_RESPONSE, or objection handled with CTA).

**Pass criterion:** Each example walks the flow without skipping required steps and reaches a terminal state.

**Fail evidence:** quote the example turn that violates flow order or the example that ends incomplete.

**Source:** `cortex-analyze` R3d.

### BP-4 — No internal contradictions in `system.md`

**Severity:** CRITICAL

**Rule:** `system.md` does not contradict itself. No "always X / never X" pairs, no conflicting parameter values (e.g., max sentence count stated differently in two places), no flow steps that contradict format rules.

**What does NOT count as a contradiction:**
- Stated exceptions: a rule of the form "X, except when Y" (or "X, with Y as the sole exception") is a single rule with an explicit carve-out — not a contradiction. Example: "max 35 words per response, except in the Presentación step" is internally consistent.
- Rules that apply to different preconditions: "do A when X" and "do B when not-X" coexist. Only flag when two passages produce conflicting outputs *under the same precondition*.
- A rule plus a worked example that demonstrates the rule and any neighboring rule operating together correctly. The example is a tiebreaker.

**Pass criterion:** Single source of truth for every behavioral parameter under any given precondition.

**Fail evidence:** quote the two passages that disagree AND name the shared precondition under which they both fire and produce different outputs.
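
The "same precondition" test can be made concrete with a small sketch. It assumes the reviewer has already extracted statements as `(parameter, precondition, value, quote)` tuples; that tuple shape and the name `find_contradictions` are assumptions for illustration. Only statements that share both parameter and precondition are compared, so a stated exception never collides with the default rule.

```python
from collections import defaultdict
from itertools import combinations


def find_contradictions(statements):
    """Return (parameter, precondition, quote_a, quote_b) for each real conflict."""
    by_key = defaultdict(list)
    for parameter, precondition, value, quote in statements:
        by_key[(parameter, precondition)].append((value, quote))

    conflicts = []
    for (parameter, precondition), entries in by_key.items():
        for (value_a, quote_a), (value_b, quote_b) in combinations(entries, 2):
            if value_a != value_b:
                conflicts.append((parameter, precondition, quote_a, quote_b))
    return conflicts


statements = [
    ("max_words", "default", 35, "max 35 words per response"),
    ("max_words", "Presentación step", 60, "up to 60 words in the Presentación step"),
    ("max_words", "default", 25, "keep every reply under 25 words"),
]
print(find_contradictions(statements))
```

Here only the two `default` statements are reported; the Presentación carve-out applies under a different precondition and is left alone. The word counts other than 35 are made up for the example.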

**Source:** `cortex-analyze` R4b; AGENTS.md PAT-001.

### BP-5 — No cross-file contradictions

**Severity:** CRITICAL

**Rule:** All SDK files agree on the agent's behavior, parameters, and data. The set under inspection: `system.md`, `examples.md`, `objections.md`, `keywords.json`, `program.json`, `resources.json`, `case_studies.json`, `personal_story.md`.

**What counts as a contradiction:**
- Examples or objection scripts demonstrate behavior that `system.md` forbids
- A keyword's `literal_response` uses CTA language that contradicts the flow's CTA in `system.md`
- `program.json` declares a price that disagrees with prices used in `examples.md` / `objections.md`
- `resources.json` URLs disagree with URLs used inline in `system.md` / `examples.md` (different resource for the same key)
- `case_studies.json` cites results or details inconsistent with claims in `system.md` / `personal_story.md`
- Two files give different answers to the same factual question (price, location, eligibility window, program length)

**What does NOT count as a contradiction:**
- Rules in different files that apply to different turns, different conditions, or different steps in the flow — they coexist. Example: a keyword's T1 instruction says "ask permission" and `resources.json` says "deliver with context"; if the example shows permission in T1 and contextual delivery in T3, both rules are honored.
- Before flagging, identify the precondition each rule applies to. If they govern different parts of the flow (different turn, different lead type, different keyword), they are compatible.
- A worked example in `examples.md` showing both rules operating together is strong evidence of compatibility — use it as a tiebreaker before flagging.

**Pass criterion:** No file demonstrates or declares behavior/data that contradicts another file *in the same precondition*.

**Fail evidence:** name both files, quote the two passages that disagree, AND name the shared precondition (same turn, same condition, same step) under which they both fire.

**Source:** AGENTS.md PAT-001; `cortex-analyze` R4b. **Note:** style/voice repetition between `system.md` and `examples.md` is *expected* and not a violation — see `feedback_instruction_vs_style_duplication`. Only contradictions count.

### BP-6 — Voice consistency across examples

**Severity:** MEDIUM

**Rule:** Every example in `examples.md` uses the same persona, vocabulary, formality level, and dialect. No jarring tonal shifts (e.g., one example cold and clinical, another warm and informal).

**Pass criterion:** A reader could believe all examples are the same agent talking to different leads.

**Fail evidence:** quote the example whose voice diverges + describe how.

**Source:** AGENTS.md §2 Voz; PAT-015 dialect audit.

### BP-7 — Unguarded actions (PAT-004)

**Severity:** CRITICAL (only if guard is missing entirely; see below)

**Rule:** Every action instruction in `system.md`'s flow that performs something governed by a hard rule includes an inline guard clause naming the constraint.

**Severity calibration:**
- **CRITICAL** — the action has NO stated condition at all. The agent could fire it any time.
- **MEDIUM (downgrade)** — the action has a guard, but it is generic ("solo si respondió afirmativamente", i.e. "only if they answered affirmatively") rather than enumerating signals ("solo si dijo sí / dale / quiero / cuándo", i.e. "only if they said yes / go ahead / I want it / when"). Worth tightening, but not catastrophic when a worked example demonstrates the guard firing correctly.
- **PASS (downgrade)** — guard exists AND an example demonstrates it working as intended. Generic phrasing alone is not a violation.
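
Read as a decision table, the calibration above might be approximated as follows. This is one interpretation, not the verifier's actual logic: the function name and boolean inputs are hypothetical, and the combination the bullets do not address (an enumerated guard with no demonstrating example) is deliberately returned as `None` so a reviewer decides.

```python
from typing import Optional


def bp7_outcome(has_guard: bool, enumerates_signals: bool,
                example_demonstrates: bool) -> Optional[str]:
    """Approximate the BP-7 calibration; None means 'left to reviewer judgment'."""
    if not has_guard:
        return "CRITICAL"  # no stated condition at all
    if example_demonstrates:
        return "PASS"      # guard exists and an example shows it working
    if not enumerates_signals:
        return "MEDIUM"    # generic guard, nothing demonstrating it
    return None            # assumption: case not covered by the bullets above
```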

**Pass criterion:** Risky actions (sending links, qualifying, booking) are paired with a guard, and at least one example demonstrates the guard's intended behavior.

**Fail evidence:** quote the unguarded action and name the constraint it could violate. If a guard exists, do not flag CRITICAL — choose MEDIUM only when ambiguity is real.

**Source:** AGENTS.md PAT-004.

### BP-8 — No stale version language in examples

**Severity:** HIGH

**Rule:** Examples don't use phrases or CTAs from a prior flow definition. After a flow rewrite, examples must be updated to match current language.

**Pass criterion:** Closing/CTA language in every example matches the current flow definition in `system.md`.

**Fail evidence:** quote the stale phrase + the current phrase it should be.

**Source:** `cortex-analyze` R3d (version consistency).

### BP-9 — No duplicated instructions

**Severity:** MEDIUM

**Rule:** A behavioral instruction should appear in exactly one place. The same rule restated in two locations is waste at best and divergence-over-time risk at worst — one copy gets updated, the other doesn't, and a real contradiction (BP-4 / BP-5 territory) appears later.

**What counts as duplication:**
- The same instruction stated in two Principles, or in a Principle and a Flow step, or in a Flow step and a NO_RESPONSE bullet.
- The same instruction in `system.md` *and* a keyword's `notes` field with similar prose. Pick one canonical home; the other gets a one-line cross-reference.
- The same hard rule restated in two example preambles, or in an example preamble and a section header.

**What does NOT count as duplication:**
- **Style/voice repetition between `system.md` and `examples.md`** — examples *show* the voice the Voz section *describes*. That's the whole point. Style duplication is correct, not a violation.
- **A Principle reinforced by a worked example** — the example demonstrates the principle in action; the principle states the rule. Two angles on the same idea, not two copies.
- **Cross-references** — a one-line "see Step 6" pointer in keyword `notes` is a pointer, not a duplicate.

**Pass criterion:** Each instruction has a single canonical location; any other mention is a cross-reference, not a copy.

**Fail evidence:** quote both copies AND propose which keeps it. Default precedence (most → least canonical): Principle → Flow step → Calibration row → keyword `notes` → NO_RESPONSE bullet → example preamble.
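
The default precedence can be applied mechanically once both copies have been located. In this sketch the location labels mirror the list above; the function name `canonical_home` is hypothetical.

```python
# Most canonical first, as listed in the default precedence above.
PRECEDENCE = [
    "Principle",
    "Flow step",
    "Calibration row",
    "keyword notes",
    "NO_RESPONSE bullet",
    "example preamble",
]


def canonical_home(locations):
    """Pick the location that should keep the instruction.

    Raises ValueError if a location is not in PRECEDENCE.
    """
    return min(locations, key=PRECEDENCE.index)


print(canonical_home(["NO_RESPONSE bullet", "Flow step"]))
# Flow step
```

The remaining locations would then be reduced to one-line cross-references.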

**Source:** memory `feedback_instruction_vs_style_duplication`; AGENTS.md §1.2 (single source of truth).

### BP-10 — Rule-vs-principle balance

**Severity:** MEDIUM

**Rule:** Behavioral guidance (tone, flow, conversational style) should be expressed as principles + examples. Hard rules should remain only for factual constraints, safety boundaries, and compliance constraints — things not demonstrable through conversation.

**Pass criterion:** Rules that govern *behavior* (≤25 words per message, "warm tone", "never lecture") are expressed via examples or principles. Rules that govern *facts* ("price is $X", "only book Tue/Thu") remain as rules.

**Fail evidence:** quote a rule that prescribes behavior and could be replaced with an example showing it.

**Source:** AGENTS.md §1.2 (Examples > Rules > Hard Rules), PAT-001; memory `feedback_principles_over_rules`.

---

## Explicitly NOT in scope

These are intentionally excluded — they were considered and rejected because they create noise without proportionate signal:

- **"Every keyword has an example"** — demonstrating every keyword takes too much context, and the trigger system handles keywords statelessly.
- **"Every resource is referenced"** — many resources are accessed via tool calls, never appearing as literal URLs in `system.md`/`examples.md`.
- **Voice/tone "judgment calls" without evidence** — verifier must cite specific quotes, not opinions.
- **Anything already deterministic** (em-dash, sizes, principles count, STEP 0, etc.) — those run in Stage 1 for free.

## Maintenance

- Add a rule only if it surfaces issues that recur across the fleet AND can't be made deterministic.
- Remove a rule if it's repeatedly N/A across the fleet OR consistently produces false positives during operator review.
- Aim to keep this list at 10 ± 2. If it grows past 12, demote the least-cited rule.
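
The size guideline above could be checked with a few lines, assuming per-rule citation counts are available from past reports. How those counts are collected is not specified here, so the input shape and the name `rule_to_demote` are assumptions.

```python
def rule_to_demote(citation_counts, max_rules=12):
    """citation_counts maps rule id to how often the rule was cited.

    Returns the least-cited rule id when the list has grown past max_rules,
    otherwise None.
    """
    if len(citation_counts) <= max_rules:
        return None
    return min(citation_counts, key=citation_counts.get)
```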
