cschanhniem

Long-Form Manuscript Ingestion

- Pages deploy is triggered from `main`.


Install

npx skillscat add cschanhniem/cschanhniem-github-io

Install via the SkillsCat registry.

SKILL.md

This skill documents the house method for turning a long PDF manuscript into site-ready teaching content without losing structure, citations, appendix material, or release discipline.

When To Use It

Use this workflow when a source PDF is long, citation-heavy, or contains tables and diagrams that cannot survive plain OCR.

Canonical Paths

  • Source PDF: repo root or task-specific import path.
  • Extraction script: nhapluu-app/scripts/ingest_scrnguna.py
  • English corpus: nhapluu-app/src/content/teachings/<slug>/en/
  • Vietnamese corpus: nhapluu-app/src/content/teachings/<slug>/vi/
  • Appendix assets: nhapluu-app/public/teachings/<slug>/
  • Site module: nhapluu-app/src/data/teachings/<slug>/

Workflow

  1. Inspect the PDF structure first.
  2. Define section boundaries manually, using the table of contents as a guide only when it is reliable.
  3. Extract body text and footnotes separately.
  4. Repair OCR only where the errors are patterned and repeatable.
  5. Replace broken appendix OCR with image-backed markdown when layout matters.
  6. Draft or translate Vietnamese chapters carefully, chapter by chapter.
  7. Keep English chapters as the canonical fallback for unfinished Vietnamese sections.
  8. Build a small TypeScript bridge that imports chapter markdown into the site’s teaching model.
  9. Verify build output and inspect the public page.
  10. Publish the frontend by pushing to main once the route is stable.
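
Steps 7 and 8 can be sketched as a small bridge that treats English as the canonical chapter list and lets Vietnamese override it chapter by chapter. This is a minimal sketch, not the repository's actual bridge: the `Chapter` shape, the `MarkdownFiles` maps, and the function names are all hypothetical, and the maps stand in for whatever raw-markdown import mechanism the site uses.

```typescript
// Hypothetical shapes: the real teaching model in the repo may differ.
type Lang = "en" | "vi";

interface Chapter {
  id: string; // filename stem, e.g. "01-introduction"
  lang: Lang; // language actually served
  isFallback: boolean; // true when English stands in for a missing Vietnamese chapter
  markdown: string;
}

// Map of file path -> raw markdown text (assumed input format).
type MarkdownFiles = Record<string, string>;

// ".../en/01-introduction.md" -> "01-introduction"
function chapterId(path: string): string {
  const file = path.split("/").pop() ?? path;
  return file.replace(/\.md$/, "");
}

function indexById(files: MarkdownFiles): Map<string, string> {
  const byId = new Map<string, string>();
  for (const [path, markdown] of Object.entries(files)) {
    byId.set(chapterId(path), markdown);
  }
  return byId;
}

// English defines the chapter list; a non-empty Vietnamese file overrides it.
function buildChapters(en: MarkdownFiles, vi: MarkdownFiles, want: Lang): Chapter[] {
  const enById = indexById(en);
  const viById = indexById(vi);
  return [...enById.keys()].sort().map((id): Chapter => {
    const viText = viById.get(id)?.trim();
    if (want === "vi" && viText) {
      return { id, lang: "vi", isFallback: false, markdown: viText };
    }
    const enText = (enById.get(id) ?? "").trim();
    return { id, lang: "en", isFallback: want === "vi", markdown: enText };
  });
}
```

Carrying an explicit `isFallback` flag is one way to let the reader UI mark unfinished Vietnamese sections instead of silently serving English.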

Short-Form Translation Release

Use this lighter branch when the source is a retreat handout, a short essay, or a single translated talk that does not need OCR repair or appendix preservation.

  1. Segment the piece into 3 to 6 markdown chapters if the source has natural shifts.
  2. Store the text under src/content/teachings/<slug>/vi/.
  3. Build a thin manifest bridge in src/data/teachings/<slug>/index.ts.
  4. Register metadata in src/data/teachings/metadata.ts.
  5. Add the slug to the lazy import map in src/pages/TeachingDetail.tsx.
  6. Write a release log in the repo-root tasks/ folder with source note and route target.
  7. Run npm run build and npm run lint before calling the route ready.

Route SEO Release Checks

Use this pass whenever a content release adds or changes a public route. The site now depends on a build-time SEO layer in addition to the client-side metadata hook.

  1. Confirm the route has a unique title, summary, and canonical path.
  2. Confirm the route family is represented in scripts/build-seo-assets.mjs.
  3. Run npm run build so the generator writes static HTML, sitemap, and robots artifacts.
  4. Inspect at least one generated file under dist/<route>/index.html.
  5. Verify the route appears in the appropriate child sitemap referenced by dist/sitemap.xml if it is public.
  6. Verify account or tool surfaces are excluded from the sitemap and carry noindex,nofollow.
  7. Keep the default social image crawler-safe. Use the shared og-default.png unless the route genuinely warrants a bespoke asset.
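
Checks 1, 4, 5, and 6 lend themselves to a small script. The sketch below is an assumption-laden illustration, not part of scripts/build-seo-assets.mjs: `checkRouteHtml` and `sitemapHasUrl` are hypothetical helpers, and the regular expressions assume the generated tags put `rel`/`name` before `href`/`content`.

```typescript
interface SeoReport {
  title: string | null;
  canonical: string | null;
  robots: string | null;
  problems: string[];
}

// Return the first capture group of the pattern, trimmed, or null.
function firstMatch(html: string, pattern: RegExp): string | null {
  const m = pattern.exec(html);
  return m ? m[1].trim() : null;
}

// Inspect one generated dist/<route>/index.html document passed in as a string.
function checkRouteHtml(html: string, expectedCanonical: string, isPublic: boolean): SeoReport {
  const title = firstMatch(html, /<title[^>]*>([^<]*)<\/title>/i);
  const canonical = firstMatch(html, /<link[^>]*rel=["']canonical["'][^>]*href=["']([^"']+)["']/i);
  const robots = firstMatch(html, /<meta[^>]*name=["']robots["'][^>]*content=["']([^"']+)["']/i);

  const problems: string[] = [];
  if (!title) problems.push("missing or empty <title>");
  if (canonical !== expectedCanonical) problems.push(`canonical is ${canonical ?? "absent"}`);

  const directives = (robots ?? "").toLowerCase().split(",").map((d) => d.trim());
  const blocked = directives.includes("noindex") && directives.includes("nofollow");
  if (isPublic && directives.includes("noindex")) problems.push("public route carries noindex");
  if (!isPublic && !blocked) problems.push("non-public route lacks noindex,nofollow");

  return { title, canonical, robots, problems };
}

// True when a (child) sitemap document lists the exact URL in a <loc> element.
function sitemapHasUrl(sitemapXml: string, url: string): boolean {
  return sitemapXml.includes(`<loc>${url}</loc>`);
}
```

A release script could read each generated file, call `checkRouteHtml`, and fail the build when `problems` is non-empty.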

URL-Driven Library Branch Release

Use this pass when a library filter stops being local UI state and becomes a canonical branch route, as with Nikaya collection pages.

  1. Move the filter source of truth from component state to pathname parsing.
  2. Give each branch a stable collection URL and each detail page a canonical nested URL.
  3. Preserve old detail URLs with a redirect route and static fallback HTML when direct deep links may already exist.
  4. Pass state.from from listing to detail so back navigation lands on the right branch.
  5. Update runtime metadata and build-time SEO enumeration together. Treat either half drifting out of sync as a release defect.
  6. Verify branch pages are indexable only when they are canonical. Legacy redirect surfaces should be noindex.

Branch State Machine

stateDiagram-v2
    [*] --> UnscopedLibrary
    UnscopedLibrary --> BranchLibrary: pathname selects collection
    BranchLibrary --> CanonicalDetail: click sutta card
    CanonicalDetail --> BranchLibrary: state.from or inferred branch
    LegacyDetail --> Redirected
    Redirected --> CanonicalDetail
    CanonicalDetail --> Redirected: slug mismatch

Branch Sequence

sequenceDiagram
    participant User
    participant Library
    participant Router
    participant Detail
    participant BuildSEO

    User->>Library: choose collection branch
    Library->>Router: navigate to /library/<branch>
    User->>Library: choose sutta
    Library->>Router: navigate to /library/<branch>/<id> with state.from
    Router->>Detail: render canonical detail
    User->>Router: open old detail link
    Router->>Detail: redirect to canonical nested path
    BuildSEO->>Router: generate branch + legacy fallback HTML

Branch Data Flow

flowchart LR
    A[Sidebar filter intent] --> B[Branch pathname]
    B --> C[Collection parser]
    C --> D[Filtered listing]
    D --> E[Detail pathname with branch]
    E --> F[Canonical metadata]
    F --> G[Static route generation]
    H[Legacy detail path] --> I[Redirect resolver]
    I --> E

SEO State Machine

stateDiagram-v2
    [*] --> MetadataReady
    MetadataReady --> RouteEnumerated
    RouteEnumerated --> StaticHtmlBuilt
    StaticHtmlBuilt --> SitemapCovered
    SitemapCovered --> Verified
    Verified --> ReleaseLogged
    ReleaseLogged --> ReadyToPublish
    StaticHtmlBuilt --> RouteEnumerated: missing namespace
    SitemapCovered --> MetadataReady: canonical or robots defect

SEO Sequence

sequenceDiagram
    participant Content
    participant Metadata
    participant BuildScript
    participant Dist
    participant Verifier

    Content->>Metadata: route title and summary
    Metadata->>BuildScript: route becomes enumerable
    BuildScript->>Dist: write static route HTML
    BuildScript->>Dist: write sitemap and robots
    Verifier->>Dist: inspect generated route file
    Verifier->>Dist: inspect sitemap inclusion or exclusion

SEO Data Flow

flowchart LR
    A[Teaching or route metadata] --> B[scripts/build-seo-assets.mjs]
    B --> C[dist/route/index.html]
    B --> D[dist/sitemap.xml]
    B --> E[dist/robots.txt]
    C --> F[Search crawler]
    D --> F
    E --> F

Nikaya Sutta Data QA

Use this branch when Nikaya detail pages show placeholder prose, raw Bilara templates, or version options that claim to exist without readable content.

  1. Inspect one affected local file under public/data/suttacentral-json/<collection>/.
  2. Distinguish file presence from readable content. available.json is not enough for UI truth.
  3. If the English payload comes from Bilara, inspect html_text, translation_text, root_text, and keys_order.
  4. Compose Bilara html_text templates with translation_text before rendering. Do not render raw {} placeholders.
  5. Publish raw file readability into content-availability.json, product-facing route readability into effective-content-availability.json, and child-to-canonical fallback routing into canonical-aliases.json.
  6. Keep the Nikaya selector limited to the curated 3-option set: Tiếng Việt - Thích Minh Châu, Tiếng Anh - Bhikkhu Sujato, and Tiếng Việt - Nhập Lưu 2026.
  7. Within that curated set, keep disabled any option that lacks local readable content.
  8. Audit collection triad readiness with npm run audit:nikaya -- <dn|mn|sn|an|kn>.
  9. If the collection is SN or another peyyala-heavy branch, inspect nikaya_index.json for grouped range IDs such as sn12.72-81 and fetch those exact IDs too.
  10. Treat a Bilara HTTP 200 response whose body is {"msg":"Not Found"} as missing English, not success.
  11. For KN, remember that collection inference must map kp, dhp, ud, iti, and snp into the kn folder, and that this check must happen before any generic sn prefix logic.
  12. Verify one collection route and one detail route in a browser after the patch.
  13. For manual 2026 authoring, store the copy in src/data/nikaya-improved/vi/*.ts as curated markdown, keep the doctrinal structure intact, and use modern Vietnamese that still sounds disciplined when read aloud.
  14. A manual 2026 file is only complete when it contains the full translation body of the route itself. Short commentary shells or note-style outlines are drafts, not final deliverables.
  15. Do not use Mũi kinh, Điều bài kinh muốn chỉ ra, Bài học thực hành as the main architecture of the final translation. At most, keep one short framing note before or after a full translated body.
  16. For peyyāla and grouped shorthand routes, expand the body into a route-complete translation. Do not hide missing expansion behind a compact paraphrase.
  17. For early AN child routes inside grouped TTC blocks, do not reuse the parent block title for every child. Name exact routes like an1.1 and an1.10 from their own segment content, or the manual layer will look structurally wrong even if the grouped fallback still renders.
  18. AN 1.11-20 adds one more constraint for those early grouped blocks: the child routes come in mirrored pairs. Write an1.11-15 as the causes that let hindrances arise and an1.16-20 as the remedies that prevent or abandon them, so the reader can feel the symmetry built into the source.
  19. AN 1.21-30 should be handled as a mind-training cluster. Keep the prose aware of the full sequence (unworkable and workable, harmful and beneficial, suffering and happiness), and do not lose the English nuance of unrealized and realized potential in an1.25-26.
  20. AN 1.71-81 must be treated as a shaped editorial unit. an1.71-75 form a conditions-of-training cluster, good friends, what one pursues, and wise or unwise attention. an1.76-81 then split into two interlaced triads about loss and growth, where relatives, wealth, and fame are all subordinated to the single higher metric of wisdom. Do not flatten these into generic motivational sayings.
  21. AN 1.82-97 is another high-structure block. Its eight pairs all repeat known themes, but the editorial center of gravity is the stronger verdict very harmful versus very beneficial. Write the manual layer so that readers can feel this escalation, and restore full route-level clarity where the Minh Châu source uses peyyala shorthand like như số 1/2, chỉ thế vào ....
  22. AN 1.98-139 is a three-stage editorial unit. First comes the interior-versus-exterior framing, then the same factors are re-read as causes for the true teaching to fade or endure, and finally the block turns into a doctrinal-integrity warning about mislabeling dhamma, vinaya, the Tathāgata's speech, practice, and prescriptions. The last ten routes must sound like safeguards of transmission, not like generic self-help advice.
  23. AN 1.140-149 is the restoration mirror of the previous integrity block. The prose must feel affirmative but still exacting: calling not-dhamma not-dhamma, dhamma dhamma, and so on is not bland correctness, it is a communal act of merit that protects transmission. Keep the five integrity domains sharply separated so the reader can hear what exactly is being preserved.
  24. AN 1.31-40 should read like a ladder of discipline around the same mind: untamed and tamed, unguarded and guarded, unprotected and protected, unrestrained and restrained. Let an1.39-40 gather the whole ladder back together so the block closes with structural force.
  25. AN 1.41-50 needs a different editorial rhythm. Keep the sequence as an arc from direction to destiny, then from clarity to pliancy, then into the luminous-mind pair. an1.49-50 especially should stay sparse and exact; clarify the practical point, but do not load those two short suttas with extra doctrine that the source itself does not say.
  26. AN 1.51-60 is another mixed block that still needs one spine. Read it as a sequence from seeing the luminous mind rightly, to the dignity of even a finger-snap of mettā, to intention as the lead factor, then heedfulness and laziness. Keep an1.53-55 distinct at the level of verbs, arise, develop, direct, even though the formula is nearly identical.
  27. Before trusting Nikaya totals, run npm run audit:nikaya-integrity. It checks index parity, manifest parity, ordering defects, misplaced files, and alias IDs where suttaplex.uid !== file id.
  28. Run npm run audit:nikaya-originals when you need the stronger truth set for English plus Minh Châu: file presence, readable body text, alias UID drift, alias-target validity, alias-range validity, grouped-range completeness, title parity, Pali-title parity, canonical previous/next continuity, and whether a readable Vietnamese source is genuinely Minh Châu.
  29. In KN, do not trust the *_vi_minh_chau.json suffix by itself. A large metadata-only subset actually points to phantuananh or to no Vietnamese source metadata at all.
  30. When regenerating nikaya_index.json, sort IDs with token-aware numeric ordering and keep one row per local file id. Do not let grouped suttaplex.uid swallow a real single route such as an1.100 or sn12.100.
  31. Preserve canonical punctuation in route IDs. sn12.72 must stay sn12.72; do not normalize it into sn1272 just to share a lookup helper with compact manual-translation keys.
  32. KN grouped Dhammapada canonicals such as dhp1-20 are now part of the local index. If English appears missing on a child dhp* route, check canonical-aliases.json and verify the grouped canonical file was fetched.
  33. Treat route topology as its own truth set. SN and AN currently include both grouped range routes and all child alias routes, which is semantic duplication but not missing content. KN now also has grouped canonicals, so its dhp* children should inherit English via canonical fallback instead of looking missing.
  34. Keep alias range violations and range completeness violations at 0. A grouped canonical is not healthy if it exists in the index but silently drops child IDs or absorbs unexpected ones.
  35. Use effective-content-availability.json for UI-facing totals and content-availability.json for raw file-level audits. Do not mix them up.
  36. Run npm run audit:nikaya-coverage when the user asks the practical question, “bài nào thật sự đọc được bằng English và Minh Châu?” (which suttas are actually readable in English and in Minh Châu?). This matrix works at the canonical-block level and separates fallback-only aliases from genuinely missing English, genuinely missing Vietnamese, and missing canonical routes.
  37. Run npm run audit:nikaya-master when you need the full executive summary in one place. It merges the raw file audit and the canonical coverage matrix so you can answer “đúng thứ tự chưa, thiếu gì, thừa gì, sai gì” (is the order right, what is missing, what is extra, what is wrong) without manually reconciling multiple reports.
  38. On alias detail routes, do not trust remote suttaplex metadata to be complete. Merge it with local fallback metadata derived from nikaya_index.json, and prefer the child row over any grouped canonical row for acronym, titles, and blurb.
  39. In the public Nikaya library, hide grouped canonical fallback rows such as sn12.72-81 or dhp1-20. Keep them in the raw index for fallback resolution and audits, but do not expose them beside child routes in the main reader-facing list.
  40. Apply that same grouping rule to SEO. Grouped canonical fallback rows should remain directly openable, but they should be omitted from indexable sitemaps and carry noindex,nofollow in both static HTML and runtime page metadata.
  41. When regenerating nikaya_index.json, do not accept blank titles as real metadata. If suttaplex.translated_title or translation.title is empty, keep falling through to English and then to the Bilara sutta-title segment.
  42. In that Bilara fallback, derive the visible title from translation_text and only use root_text as a Pali-title fallback when it is not just ~.
  43. When a triad audit reads src/data/nikaya-improved/vi/*.ts, normalize filenames like sn-56-11.ts back to sn56.11. Do not flatten dotted IDs away, or manual 2026 coverage will be undercounted.
  44. Token-sort Bilara segment keys in both local and remote fallbacks. Any helper that uses parseFloat('1.10') on segment suffixes will scramble discourse order when the route falls back to the live API.
  45. Run npm run audit:nikaya-remote when you need proof from the official source. It now reports both canonical gaps and visible-route gaps, so cases like an1.330-332 do not disappear behind a readable canonical block. If the result is a network error, rerun with real network access before concluding that the upstream source is silent.
  46. If remote audit says the source is readable upstream but the local layer is still missing, run node scripts/fetch-all-nikayas.mjs repair <collection> <en|vi>. That mode refetches only unreadable curated originals and skips child aliases already covered by canonical fallback.
  47. After the current repair pass, KN should be treated as English-complete on the public reader surface. Do not keep describing KN as English-deficient unless a later audit shows regression.
  48. Run npm run audit:nikaya-fidelity when the user asks whether a visible route is exact, sliced from a grouped source, still showing the whole grouped block, or genuinely missing.
  49. For grouped Bilara English, try two scoping passes before giving up: direct child-prefixed keys such as an1.2:*, then range-position sections such as sn12.72-81:1.1. The second pass rescues many SN, AN, and KN child routes.
  50. For grouped Minh Châu HTML, try the same narrowing discipline in a different shape: exact child id, then nested subrange id, then TTC anchors such as TTC 3-5 or TTC 14-17. These TTC slices are scoped grouped renders, not exact single-sutta renders, and they are only safe when the discovered TTC ranges cover the full grouped source contiguously from 1..N.
  51. In any Bilara or legacy payload audit, never count metadata keys like uid, lang, title, author, previous, or next as segment content. Only keys that contain a colon (:) are real segments, with a rare explicit text field as the only direct-text exception.
  52. If a route can only render an original layer as a whole grouped block, surface that fact in the reader. Do not silently present grouped fallback prose as if it were a clean single-sutta extraction.
  53. If a gap survives source inspection, register it in src/lib/nikaya-source-gaps.ts and surface a verified source-gap notice in NikayaDetail. Use this for real absences such as sn36.30, AN 11.*, or English an1.330-332.
  54. Do not fabricate HT. Thích Minh Châu content for peyyala routes that do not exist in the verified edition. Once proven absent, the correct product behavior is an explicit source-gap explanation, not synthetic filler.
  55. For SN manual 2026 translation work, start with a doctrinal spine across major saṃyuttas instead of only translating consecutive IDs. A first pass that covers dependent origination, right view, not-self, the burning discourse, satipaṭṭhāna conditions, and the truths sets a cleaner editorial bar for later expansion.
  56. The current SN spine now covers dependent origination, the burden, the foam similes, the all, solitude through non-clinging, the eightfold path, the nutriments of hindrances and awakening factors, the refuge of four satipaṭṭhānas after Sāriputta’s passing, the five faculties, stream-entry factors, and the truths defined through the aggregates. Extend SN by preserving this leverage-first logic.
  57. For KN manual 2026 translation work, begin with Khuddakapāṭha (kp1-kp9) as a single editorial cluster. Those texts are short, foundational, and often liturgical, so preserve the chant body itself before adding any brief framing notes.
  58. The next KN editorial foothold after Khuddakapāṭha is Sutta Nipāta at snp1.8, snp2.4, and snp3.7. Keep Mettā and Maṅgala chantable and route-specific, while Sela should retain its arc of recognition, praise, ordination, and realization.
  59. DN manual 2026 coverage is now complete. When revising DN, assume the task is to improve fidelity, cadence, or explanatory framing rather than to fill missing routes.
  60. MN manual 2026 coverage is now complete. For MN, assume the next tasks are editorial upgrades, doctrinal tightening, or prose refinement, not missing-file backfill.
  61. The manual 2026 Vietnamese loader is now file-driven. src/data/nikaya-improved/vi/index.ts discovers *.ts modules with import.meta.glob, and src/data/nikaya-improved/availability.ts derives coverage from that set. Do not rebuild a hand-maintained import map.
  62. Use node scripts/generate-manual-2026.mjs <dn|mn|sn|an|kn> to scaffold missing manual modules. It preserves existing curated files and writes canonical hyphenated filenames such as mn-6.ts, an-1-10.ts, or sn-56-11.ts.
  63. docs/manual-2026-agent-prompts.md is the reusable prompt pack for delegating manual 2026 work. Reach for it when the user asks for prompts, when a new agent needs authoring instructions, or when you want a sharper editorial QA loop.
  64. When using that prompt pack, keep source roles disciplined: English is the semantic lock, HT. Thích Thanh Từ is the stylistic and pedagogical comparator only when the source is actually present, and HT. Thích Minh Châu is the local terminology and route-structure control. Never blur those jobs together.
  65. That prompt pack now treats summary-style route files as incomplete. If you encounter one, revise it into a full body translation before calling the route publication-grade.
  66. AN 1.170-187 should be authored as one coherent Tathāgata cluster. an1.170-174 define the Buddha's singular appearance, an1.175-186 unfold the liberating capacities that appear with him, and an1.187 closes with Sāriputta as the rightful continuer of the Wheel. The grouped shell for an1.175-186 must be restored into distinct child titles and not left as twelve copies of Như Lai.
  67. worklog-translate-2026.md is the live queue for manual 2026 authoring. Keep it current after every batch so future agents can resume without reconstructing progress from audits and scattered task logs.
  68. AN 1.188-197 is a disciples-of-distinction cluster. The routes are short, but each one names a precise excellence that should stay intact. an1.197 is especially important because it models how to expand a brief saying faithfully, which is also the editorial discipline manual 2026 itself depends on.
  69. AN 1.198-208 continues the foremost-disciples lane but changes texture. an1.198-200 name subtle interior attainments and must stay semantically exact. an1.201-206 should read as a sequence of communal virtues, not six isolated compliments. an1.207-208 must protect two delicate meanings: Sīvali is not a mascot for material luck, and Vakkalī is not a mascot for irrational faith.
  70. AN 1.209-218 shifts again. The opening pair is about trainability and faith. The middle stretch reveals the concrete beauty of a functioning sangha, from meal-order detail to lodging assignments and beloved presence. The closing triad must distinguish three nearby but different excellences: immediate penetrative insight, luminous preaching, and the unobstructed analytic knowledges.
  71. AN 1.219-234 must be handled as a structured portrait, not a long flat list. an1.219-223 are really five faces of Ānanda as the carrier of the teaching. Where English and Minh Châu diverge, use the Pali root and the cluster logic to decide the manual phrasing. an1.224-230 broaden into community health and instruction. an1.231-234 then close with four hard-edged excellences that should never be swapped or blurred: admonishing monks, mastery of fire, awakening eloquence, and coarse-robe austerity.
  72. AN 1.235-247 is the first major bhikkhunī cluster. Write it so the bhikkhunī lineage stands in its own authority. Do not reduce the group to generic praise of “female disciples.” The structural spine is elderhood, wisdom, psychic power, Vinaya, Dhamma speech, meditation, energy, clairvoyance, quick insight, recollection of past lives, great realization, rough-robe austerity, and deep faith.
  73. AN 1.248-257 is the first major male lay follower cluster. Keep the range wide: first refuge, generosity, Dhamma speech, social cohesion through the four saṅgahavatthus, refined giving, Sangha support, experiential confidence, person-centered confidence, and intimate trust. The last three routes are easy to flatten or mistranslate. Use the Pali roots aveccappasanna, puggalappasanna, and vissāsaka to keep them distinct.
  74. AN 1.258-267 is the matching female lay follower cluster. Keep the sequence visibly varied: first refuge, generosity, great learning, loving-kindness, meditation depth, excellent giving, care for the sick, unwavering confidence, intimate trust, and confidence grounded in hearing. The last three routes are again easy to flatten. Keep aveccappasanna as confidence made steady by verification, vissāsikā as intimate trustworthy closeness, and anussavappasanna as confidence born from hearing and transmission, not rumor.
  75. AN 1.268-277 is the first impossibility cluster after the lay-follower material. Keep the formula alive: one accomplished in right view cannot do or believe these things, while an ordinary person still can. Do not flatten diṭṭhisampanno into mere correctness of opinion. The block moves from wrong perception of conditioned things, to impossibility of the gravest acts, to the singularity of one Buddha in one world-system.
  76. AN 1.278-286 continues the same impossibility cluster. Preserve the progression from one wheel-turning monarch in one world-system, to impossible cosmological role claims, to the law that bad bodily, verbal, and mental conduct cannot ripen into agreeable results. For grouped source rows an1.281-283 and an1.285-286, write exact route-level manual modules rather than leaving the child routes semantically collapsed.
  77. AN 1.287-295 completes the first impossibility cycle with the positive mirror of the same karmic law. Keep the symmetry visible: good conduct cannot ripen into disagreeable results, bad conduct cannot lead upward after death, and good conduct cannot lead downward after death. Split grouped source rows into exact route-level manual modules so each body, speech, and mind route remains independently readable.
  78. AN 1.296-305 opens the first recollection cluster of the One Thing chapter. Preserve the repeated arc in every route: developed and cultivated, this one thing leads to disillusionment, dispassion, cessation, peace, direct knowledge, awakening, and nibbāna. Split grouped source rows so each recollection remains route-exact: Buddha, Dhamma, Saṅgha, ethics, generosity, deities, breathing, death, body, and peace.
  79. AN 1.306-315 is the seed and view cluster. Keep the three-step movement intact: wrong and right view as engines of decline or growth, irrational and rational application of mind as their near causes, and finally the bitter seed and sweet seed similes that show how view flavors every action, intention, wish, and outcome. The final pair must feel climactic.
  80. AN 1.316-332 is a three-part block. Keep an1.316-317 as the public force of wrong and right view, an1.320-327 as the mirrored cluster on badly and well explained Dhamma, and an1.328-332 as the disgust similes for even a finger-snap of becoming. Even though English visible routes an1.330-332 are upstream gaps, the grouped Sujato line and Minh Châu TTC 14-17 are sufficient to restore the manual routes exactly if you do not add doctrine beyond the repeated template.
  81. AN 1.575-615 is a kāyagatāsati mega-block where the local child files are structurally misleading. Use the grouped Bilara author endpoint .../bilarasuttas/an1.575-615/sujato?lang=en, not the child JSON shells, then align it with Minh Châu TTC anchors.
  82. Within that block, an1.576-582 must be split into seven precise fruits of body-based mindfulness in this order: urgency, benefit, sanctuary from the yoke, mindfulness and awareness, knowledge and vision, present-life happiness, and knowledge with liberation. Do not leave them merged.
  83. The next sub-clusters in the same block also have shape and should not be flattened. an1.586-590 are the five abandonment results, an1.591-595 are the analytic-penetration results, an1.596-599 are the four fruits, and an1.600-615 are a wisdom ladder of sixteen distinct qualities. Keep titles and prose tight, doctrinally exact, and route-specific.
  84. AN 1.616-627 is the closing amata mirror for the whole Book of the Ones. Preserve each verb difference: enjoy, have enjoyed, lose, miss out, neglect, forget, cultivate, develop, make much of, have insight into, completely understand, realize. The whole ending depends on that fine-grained variation.
  85. AN 2.1-10 opens the Book of the Twos with a wider register than the late one-line pairs of Book One. Let an2.1 and an2.5 breathe as full discourses. Then keep an2.6-9 as one ethical cluster where bondage, conscience, prudence, and protection of the social world build on each other.
  86. AN 2.11-20 is the first doctrinal staircase of Book Two. Keep an2.11-13 visibly cumulative, from reflective discernment, to the learner's training-power, to the same power expressed through awakening factors and the four jhānas. Then widen the prose for an2.14-20: two teaching modes, the two sides of a monastic dispute, karmic consequence, abandoning the unwholesome and cultivating the wholesome, and the precise conditions that make the true Dhamma decay or endure. an2.15, an2.17, and an2.20 should sound like full discourse bodies, not aphorism shells.
  87. AN 2.21-31 is the first ethics-and-interpretation cluster of Book Two. Do not leave all child routes under the parent label Bālavagga. Recover exact route identity. an2.21 is about seeing one's fault and rightly accepting confession. an2.22-25 are a single hermeneutics block: motive-based distortion, false attribution, and then the explicit contrast between discourses requiring interpretation and discourses whose meaning is already explicit. an2.26-29 move into karmic destination through concealment, view, and virtue. an2.30-31 then reopen the horizon, solitude for present happiness and compassion for later generations, and the paired training of serenity and insight. Keep that whole inner architecture visible in titles and prose.
  88. AN 2.32-41 is the first uneven-weight block of Book Two, and the agent has to resist false uniformity. an2.32-35 are compact but still route-distinct, gratitude, repaying parents, action and inaction, and worthy fields of giving. an2.36-38 are full discourse bodies and need scene, speaker, and doctrinal turn preserved. an2.36 must keep the distinction between internal fetters, external fetters, and the Buddha's later correction about where the deities cultivated their minds. an2.37-38 are Mahākaccāna dialogues and should not be flattened into general moral paraphrase. an2.39-41 tighten again into short teachings on institutional strength, right conduct, and preserving both wording and meaning.
  89. AN 2.42-51 is a community-governance ladder. Do not leave all child routes under the shell title Parisavagga. an2.42-46 are paired diagnostics of assembly quality: shallow and deep, divided and united, inferior and superior, ignoble and noble, dregs and cream (cặn bã and tinh hoa in the Vietnamese). an2.47-48 are the two routes where the prose must lengthen a little, because they turn to how a community listens, questions, and relates to gain. an2.49-51 compress back into procedural and doctrinal integrity: lawful acts, lawful community, and right conduct in disputes. Keep the titles and prose visibly paired all the way through.
  90. AN 2.52-63 rises by tiers inside Puggalavagga. an2.52-56 identify exceptional persons and awakened types. an2.57-59 use the image of thunder to test composure, so keep the animal comparison crisp and forceful. an2.60-61 pivot into truthfulness and hunger that never feels satisfied. an2.62-63 are the communal summit of the block, admonition, coexistence, quarrel, bitterness, and the conditions for conflict either to harden or to calm. Do not translate those two as soft moral abstracts.
  91. AN 2.64-76 is a happiness ladder, not thirteen interchangeable aphorisms. The agent must preserve the repeated frame while letting each pair keep its doctrinal edge: lay and renunciate, sensual and renunciant, attached and unattached, defiled and undefiled, material and non-material, noble and ignoble, bodily and mental, with rapture and without rapture, pleasure and equanimity, with immersion and without immersion, and finally form-bound versus formless. Repetition here is structure, not filler.
  92. AN 2.77-86 must read like a precise dismantling manual for unwholesome states. The doctrinal work is lexical accuracy: sign, source, cause, fabrications, condition, form, feeling, perception, consciousness, conditioned object. Keep each route minimal because the source is minimal, but make the distinction between the ten supports visible and audible.
  93. AN 2.87-97 is a lexical-precision block. Most routes are only one sentence long, so the title does a lot of doctrinal work. Do not use vague headings. Make the paired terms exact and balanced: liberation of heart and liberation by wisdom, conscience and prudence, hard-to-correct and bad friendship, easy-to-correct and good friendship, skill in elements and skill in wise attention, skill in offenses and skill in emergence from offenses.
  94. AN 2.98-117 is the first overt fool-and-wise mirror of Book Two. Do not leave the routes under the parent shell Bālavagga. Recover the exact contrast in each title and keep the internal turn visible. an2.98-107 are paired judgments: what burden to carry, what is allowable, what counts as offense, what is truly Dhamma, what is truly training. an2.108-117 then restate the same field through the growth or non-growth of the taints. The second half should sound more consequential, not merely repetitive.
  95. In AN 2.109-117, keep the diction spare but not weak. The body can stay short because the source is short, yet the title and cadence must signal consequence. These are not just right views in the abstract. They are the conditions under which taints either keep swelling or stop gaining ground.
  96. AN 2.118-129 changes shape mid-block. The first five routes are about hopes, gratitude, contentment, hoarding, and waste, so the prose may stay human and concrete. The next four are doctrinally sharper and must keep subhanimitta, paṭighanimitta, parato ghoso, ayoniso manasikāra, and yoniso manasikāra straight in meaning even if rendered in lucid Vietnamese. The final three are technical Vinaya categories of offense. Do not blur light, serious, coarse, not coarse, with residue, and without residue.
  97. AN 2.130-140 is the first aspiration block and the first one in Book Two where several routes clearly require fuller bodies again. an2.130-133 are aspiration formulas anchored in exemplars. Keep the phrase cán cân (scale) or chuẩn mực (standard) alive so the routes remain normative, not sentimental. an2.134-137 are paired fool-versus-true-person discourses and must preserve both halves. an2.138-140 then strip back to minimal dyads. The skill here is modulating length without losing coherence.
  98. AN 2.141-150 is a short-form giving cluster where the challenge is lexical precision, not expansion. Each route is brief, but each one turns on a different social verb: giving, offering, relinquishing, surrendering, possessing, enjoying together, sharing, including, supporting, sympathizing. Keep each route as a distinct child file and let chánh pháp (the true Dhamma) remain the clearly superior counterpart in every closing line. Do not let the repeated pattern tempt you into one generalized block summary.
  99. AN 2.151-162 is the receiving-and-flourishing mirror of the prior giving cluster. The doctrinal move stays the same, material form versus Dhamma form, but the social verbs change fast: welcoming, hosting, seeking, searching more broadly, inquiring, honoring, treating guests, succeeding, growing, treasuring, accumulating, expanding. Keep the titles and prose exact enough that a reader can tell one route from the next without losing the shared refrain.
  100. AN 2.163-179 changes register from social exchange to discipline and contemplative craft. Treat it as one shaped practice-cluster, not seventeen detached glosses. an2.168-169 must feel like a corrective pair. an2.170-172 should sound stronger and more inward. an2.173-176 are not abstract philosophy but the moral and cognitive spine of the path. an2.177-179 are the closing pressure points of the block: never complacent in wholesome states, never lax in effort, then the stark contrast between heedless mind and mindful clear comprehension.
  101. AN 2.180-229 is the first large Book Two peyyāla machine. The only safe way through it is by matrix. Keep the five moral pairs stable: anger and grudge, disdain and spite, jealousy and stinginess, deceit and deviousness, shamelessness and recklessness. Then keep each five-route tranche tied to its governing predicate: bare statement, bright opposite, suffering now, happiness now, decline, non-decline, hell, heaven, bad rebirth, good rebirth. If you lose the matrix, the translations turn into noise.
  102. AN 1.333-347 is a rarity ladder. Keep the few and many rhythm alive as the routes climb from rare favorable birth and favorable lands, to rare discernment and noble wisdom-eye, to rare encounter with Tathagata and Dhamma, to rare retention, reflection, practice, samvega, right effort, samadhi based on letting go, and finally the rare taste of meaning, Dhamma, and liberation in an1.347. an1.347 must feel climactic.
  103. AN 1.348-377 is a rebirth matrix. Do not flatten grouped triples into abstract summaries. Each route must preserve one rare rebirth target and one exact common fall destination. Organize the prose by source realm (human, gods, hell, animals, ghosts) and keep the doctrinal force plain: favorable rebirth is rare, downward drift is common, and neither pain nor pleasure guarantees wisdom.
  104. AN 1.378-393 is an inspiring-qualities cluster. Keep the block tiered, not flat. an1.378-381 are renunciant disciplines, an1.382-388 are teaching, Vinaya, learning, bearing, and communal influence, and an1.389-393 are social or bodily traits that can help others open in trust. Preserve the repeated force of "worth having", but do not inflate good family, physical beauty, or health into proofs of liberation. They are confidence-supporting conditions, not the goal.
  105. AN 1.394-401 is the first finger-snap cultivation cluster. Keep the eight routes as a single shaped unit: first through fourth jhāna, then loving-kindness, compassion, rejoicing, and equanimity. The doctrinal center is the repeated refrain, not the bare title of the state. Even a finger-snap of true cultivation means the monk is not empty in meditation, follows the Teacher, responds to advice, and does not consume the country's alms in vain. Let that refrain stay alive in every manual file.
  106. AN 1.402-423 is the next finger-snap training arc. Keep the shape explicit: four establishments of mindfulness, four right efforts, four bases of spiritual power, five faculties, and five powers. The repeated refrain still matters more than the list itself. Each route says that even a finger-snap of real cultivation in this frame is enough to make the monk not empty in meditation, responsive to instruction, and worthy of alms. Preserve both the exact framework and the dignity of the brief true effort.
  107. AN 1.424-438 continues the same finger-snap curriculum through the seven awakening factors and the noble eightfold path. Keep an1.424-430 as a genuine developmental chain (mindfulness, investigation, energy, joy, tranquility, immersion, equanimity), and keep an1.431-438 in the exact order of the path. These routes are short but not thin. Each one carries a distinct doctrinal function plus the same strong refrain that even a brief true moment of cultivation is not spiritually empty.
  108. AN 1.439-454 shifts from the finger-snap curriculum into the contemplative deep-structure of the chapter. Treat an1.439-446 as the eight mastery bases and an1.447-454 as the eight liberations. Do not flatten them into decorative meditation language. The first block is about mastery over perception in increasingly subtle visual fields. The second block is about release through progressively subtler configurations, from form, to purified perception, to the immaterial attainments, to cessation. Keep the prose lucid, exact, and restrained.
  109. AN 1.455-464 is the ten-kasina block and should stay visibly shaped. First come earth, water, fire, and air. Then the blue, yellow, red, and white kasinas. Finally the arc opens further into space and consciousness. Preserve biến xứ as a technical term, and let the prose show the deepening movement from coarse support to subtle field without turning the block into vague mystical atmosphere.
  110. AN 1.465-474 is the ten-perception block and should not be flattened. The first four perceptions strip glamour away from the world of craving. The next three form an insight ladder: impermanence, suffering, not-self. The last three turn that insight into release: giving up, fading away, cessation. Keep each route lean, but preserve the whole arc.
  111. AN 1.475-484 is the next perception block and changes register halfway. The first five routes are short reset-perceptions. The next five are corpse contemplations and must keep their severity. Do not soften the imagery just to make it more comfortable; their job is precisely to cut attachment to bodily glamour.
  112. AN 1.485-494 is the recollection block and needs a different register from the harsher perception clusters before it. The prose should be clear, warm, and steady without getting sentimental. These routes are meant to give the mind reliable supports, from the six classic recollections through breathing, death, body, and peace.
  113. AN 1.495-504 is the first-jhāna faculties-and-powers block. Treat an1.495-499 and an1.500-504 as two matched pentads. The key editorial job is to preserve the doctrinal ascent from căn to lực, from growing capacity to stabilized strength, without turning the ten routes into ten near-identical notes.
  114. AN 1.505-514 is the same matched pentad structure shifted into the second jhāna. The content should not sound mechanically recycled from AN 1.495-504. Keep the quieter, more unified flavor of nhị thiền audible, while still marking the rise from faculty to power.
  115. AN 1.515-524 repeats the matched pentads in the third jhāna. The editorial job is not to inflate doctrine, but to make the tonal descent audible: less bright than nhị thiền, more settled, more cool, more balanced. Preserve both axes, from faculty to power and from nhị thiền to tam thiền.
  116. AN 1.525-534 shifts the same pentads into the fourth jhāna. The prose should be the cleanest of the four jhāna blocks: more even, more purified, less affective, and more unmistakably grounded in equanimity and clarity. Keep both transitions explicit, from faculty to power and from tam thiền to thiền thứ tư.
  117. AN 1.535-544 leaves the jhāna ladder and enters the mettā pentads. The prose should open out: warmer, wider, less technical, but still exact. Keep both transitions explicit, from faculty to power and from fourth-jhāna purity into loving-kindness as a lived, non-hostile mode of mind.
  118. AN 1.545-554 repeats the same structural ladder under compassion. Do not write it as generic kindness with a darker mood. The key color is nearness to suffering without collapse. Keep karuṇā distinct from grief, keep the faculty-to-power turn visible, and let the final route close the whole compassion block rather than sounding like one more isolated aphorism.
  119. AN 1.555-564 repeats the ladder under sympathetic joy. This block should sound brighter than compassion but still disciplined. Keep hỷ distinct from excitement, pleasure, pride, or victory. The prose should feel open and clean, free from envy, and the final route should show that wisdom can rejoice without losing balance.
  120. AN 1.565-574 repeats the ladder under equanimity. This tranche should sound level and spacious, but never deadened. Keep xả distinct from indifference. Also preserve the special closing force of an1.574, where the grouped source restores the explicit refrain about not being barren of jhāna and not eating alms in vain. This block is the first one that should fully embody the repo's stricter full-body translation standard.
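The matrix that item 101 asks for in AN 2.180-229 can be written down as data before any translation starts. The sketch below is a minimal illustration, assuming that every five-route tranche lists the pairs in the order given in item 101. `PAIRS`, `PREDICATES`, and `buildPeyyalaMatrix` are hypothetical names, not existing repo code.

```typescript
// Hypothetical labels copied from item 101; the route arithmetic assumes
// ten tranches of five routes each, starting at an2.180.
const PAIRS = [
  "anger and grudge",
  "disdain and spite",
  "jealousy and stinginess",
  "deceit and deviousness",
  "shamelessness and recklessness",
] as const;

const PREDICATES = [
  "bare statement",
  "bright opposite",
  "suffering now",
  "happiness now",
  "decline",
  "non-decline",
  "hell",
  "heaven",
  "bad rebirth",
  "good rebirth",
] as const;

interface MatrixCell {
  routeId: string;
  pair: (typeof PAIRS)[number];
  predicate: (typeof PREDICATES)[number];
}

const FIRST_ROUTE = 180;

// One cell per route: the tranche picks the predicate, the offset picks the pair.
function buildPeyyalaMatrix(): MatrixCell[] {
  const cells: MatrixCell[] = [];
  PREDICATES.forEach((predicate, tranche) => {
    PAIRS.forEach((pair, offset) => {
      const n = FIRST_ROUTE + tranche * PAIRS.length + offset;
      cells.push({ routeId: `an2.${n}`, pair, predicate });
    });
  });
  return cells;
}

const matrix = buildPeyyalaMatrix();
```

Looking a route up in `matrix` before drafting it gives the translator both coordinates at once, which is the point of working by matrix rather than by memory.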

Nikaya State Machine

stateDiagram-v2
    [*] --> RawJsonPresent
    RawJsonPresent --> GroupedIdsResolved
    GroupedIdsResolved --> ContentAudited
    ContentAudited --> OriginalLayersAudited
    OriginalLayersAudited --> MetadataParityAudited
    MetadataParityAudited --> ManifestSynced
    ManifestSynced --> RenderFidelityAudited
    RenderFidelityAudited --> DetailSelectable
    DetailSelectable --> BrowserVerified
    BrowserVerified --> ReaderWarned: grouped block only
    RawJsonPresent --> RawJsonPresent: refetch if metadata only
    GroupedIdsResolved --> RawJsonPresent: grouped route absent locally
    GroupedIdsResolved --> ContentAudited: range-position slice available
    ContentAudited --> RawJsonPresent: Bilara composition defect
    OriginalLayersAudited --> RawJsonPresent: Minh Chau provenance mismatch
    MetadataParityAudited --> GroupedIdsResolved: alias target or nav defect
    ReaderWarned --> DetailSelectable: exact source becomes available
    BrowserVerified --> DetailSelectable: UI regression
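
The state diagram above can be held as a transition table so that an audit script can reject skipped steps. This is a minimal sketch that uses the state names verbatim. `TRANSITIONS`, `canTransition`, and `firstIllegalStep` are hypothetical helpers, not part of the repo.

```typescript
type NikayaState =
  | "RawJsonPresent"
  | "GroupedIdsResolved"
  | "ContentAudited"
  | "OriginalLayersAudited"
  | "MetadataParityAudited"
  | "ManifestSynced"
  | "RenderFidelityAudited"
  | "DetailSelectable"
  | "BrowserVerified"
  | "ReaderWarned";

// Every edge of the diagram, forward steps first and repair loops after.
const TRANSITIONS: Record<NikayaState, readonly NikayaState[]> = {
  RawJsonPresent: ["GroupedIdsResolved", "RawJsonPresent"],
  GroupedIdsResolved: ["ContentAudited", "RawJsonPresent"],
  ContentAudited: ["OriginalLayersAudited", "RawJsonPresent"],
  OriginalLayersAudited: ["MetadataParityAudited", "RawJsonPresent"],
  MetadataParityAudited: ["ManifestSynced", "GroupedIdsResolved"],
  ManifestSynced: ["RenderFidelityAudited"],
  RenderFidelityAudited: ["DetailSelectable"],
  DetailSelectable: ["BrowserVerified"],
  BrowserVerified: ["ReaderWarned", "DetailSelectable"],
  ReaderWarned: ["DetailSelectable"],
};

function canTransition(from: NikayaState, to: NikayaState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the index of the first step that the diagram does not allow, or -1.
function firstIllegalStep(path: readonly NikayaState[]): number {
  for (let i = 1; i < path.length; i++) {
    const prev = path[i - 1];
    const next = path[i];
    if (prev === undefined || next === undefined) continue;
    if (!canTransition(prev, next)) return i;
  }
  return -1;
}
```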

Manual 2026 Loader

stateDiagram-v2
    [*] --> IndexRowKnown
    IndexRowKnown --> FilenameCanonical
    FilenameCanonical --> ModuleWritten
    ModuleWritten --> GlobDiscovered
    GlobDiscovered --> AvailabilityDerived
    AvailabilityDerived --> TriadAudited
    TriadAudited --> BuildVerified
    BuildVerified --> [*]
    FilenameCanonical --> IndexRowKnown: file naming fix
    GlobDiscovered --> ModuleWritten: export shape invalid
    TriadAudited --> ModuleWritten: coverage still missing

sequenceDiagram
    participant Index as nikaya_index.json
    participant Script as generate-manual-2026.mjs
    participant Files as src/data/nikaya-improved/vi/*.ts
    participant Loader as vi/index.ts
    participant Availability as availability.ts
    participant Audit as audit-nikaya-triad.mjs

    Index->>Script: collection metadata rows
    Script->>Files: scaffold missing modules
    Files->>Loader: eager glob discovery
    Loader->>Availability: normalized sutta IDs
    Availability->>Audit: manual coverage set
    Audit->>Audit: confirm triad completeness

flowchart LR
    A[nikaya_index.json] --> B[generate-manual-2026.mjs]
    B --> C[src/data/nikaya-improved/vi/*.ts]
    C --> D[vi/index.ts]
    D --> E[availability.ts]
    E --> F[audit:nikaya]
    D --> G[Nikaya reader]
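
The step from glob discovery to derived availability can be sketched as a pure function over the map that an eager glob produces. This is an illustration only: the filename convention (`an1.394.ts`), the default-export check, and the names `normalizeSuttaId` and `deriveAvailability` are assumptions, not the actual contents of `vi/index.ts` or `availability.ts`.

```typescript
// Assumption: an eager glob yields a map of file path -> loaded module,
// and canonical filenames look like "an1.394.ts" (hypothetical convention).
type GlobbedModules = Record<string, unknown>;

function normalizeSuttaId(filePath: string): string | null {
  const parts = filePath.split("/");
  const base = parts[parts.length - 1];
  if (base === undefined) return null;
  const match = /^([a-z]{2,4})(\d+(?:\.\d+)?(?:-\d+)?)\.ts$/i.exec(base);
  if (match === null) return null;
  const book = match[1];
  const number = match[2];
  if (book === undefined || number === undefined) return null;
  return `${book.toLowerCase()}${number}`;
}

// Mirrors the "export shape invalid" loop: a module without a default
// export does not count as manual coverage.
function hasDefaultExport(mod: unknown): boolean {
  return typeof mod === "object" && mod !== null && "default" in mod;
}

function deriveAvailability(modules: GlobbedModules): string[] {
  const ids: string[] = [];
  Object.keys(modules).forEach((path) => {
    const id = normalizeSuttaId(path);
    if (id !== null && hasDefaultExport(modules[path])) ids.push(id);
  });
  return ids.sort();
}
```

Keeping the derivation free of bundler calls means the audit script can run the same logic against a plain directory listing.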

Nikaya Source-Gap Flow

flowchart LR
    A[Route id plus language] --> B[canonical-aliases.json]
    B --> C[src/lib/nikaya-source-gaps.ts]
    C --> D[NikayaDetail coverage notice]
    C --> E[Audit summary]
    E --> F[No fabricated original content]
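
The flow above ends in a notice rather than invented text, and that decision can be sketched as a small lookup. The data shapes below are assumptions for illustration: `aliases` stands in for `canonical-aliases.json`, `available` for a per-route language list, and `resolveCoverage` is a hypothetical name.

```typescript
type SourceLanguage = "en" | "vi";

type Coverage =
  | { kind: "exact" }
  | { kind: "grouped"; canonicalId: string }
  | { kind: "missing" };

function hasSource(
  available: Record<string, readonly SourceLanguage[]>,
  id: string,
  language: SourceLanguage,
): boolean {
  const languages = available[id];
  return languages !== undefined && languages.indexOf(language) !== -1;
}

// Exact source wins, a grouped canonical is second best, and a gap is
// reported as a gap so nothing is fabricated to fill it.
function resolveCoverage(
  routeId: string,
  language: SourceLanguage,
  aliases: Record<string, string>,
  available: Record<string, readonly SourceLanguage[]>,
): Coverage {
  if (hasSource(available, routeId, language)) return { kind: "exact" };
  const canonicalId = aliases[routeId];
  if (canonicalId !== undefined && hasSource(available, canonicalId, language)) {
    return { kind: "grouped", canonicalId };
  }
  return { kind: "missing" };
}
```

The detail view and the audit summary can both switch on `kind`, so the coverage notice and the audit never disagree about what exists.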

Nikaya Sequence

sequenceDiagram
    participant Agent
    participant File
    participant RawManifest
    participant EffectiveManifest
    participant Alias
    participant Parser
    participant Index
    participant Originals as audit-nikaya-originals
    participant Fidelity as audit-nikaya-render-fidelity
    participant Detail
    participant Browser

    Agent->>File: inspect one sutta JSON
    Agent->>RawManifest: compare available.json vs content-availability.json
    Agent->>EffectiveManifest: compare effective route coverage against UI expectations
    Agent->>Index: inspect grouped range IDs when the collection is peyyala-heavy
    Agent->>Parser: confirm KN book prefixes resolve to the kn directory
    Agent->>Parser: patch Bilara composition if needed
    Agent->>Originals: audit readable EN, readable Minh Chau VI, alias ids, title parity, and canonical nav continuity
    Agent->>Fidelity: audit exact vs grouped render quality on visible routes
    Agent->>Alias: verify child routes resolve to the correct grouped canonical
    Agent->>Index: prefer child-row metadata over grouped canonical metadata on alias detail routes
    Parser->>Parser: scope grouped Bilara by child prefix or range position
    Parser->>Parser: scope grouped Minh Chau HTML by subrange ids or TTC chunks
    Parser->>Detail: return rendered HTML only when content exists
    Detail->>Alias: hide grouped fallback canonicals from the public library list
    Detail->>Detail: keep grouped fallback canonicals off the indexable SEO surface
    Detail->>Browser: expose only the curated 3-option selector
    Detail->>Browser: warn when a version is only a grouped block
    Browser->>Agent: confirm readable EN output
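
Two parser steps in the sequence above, scoping a grouped Bilara file by child prefix and returning HTML only when content exists, can be sketched as follows. The sketch assumes segment keys of the form `an1.394:1.1` inside a grouped file. `scopeByChildPrefix` and `renderIfPresent` are hypothetical names, and the markup is deliberately minimal.

```typescript
type BilaraSegments = Record<string, string>;

// Keep only segments whose key starts with "<childId>:". The colon stops
// "an1.39" from matching "an1.394" and skips grouped headers such as
// "an1.394-401:0.1".
function scopeByChildPrefix(segments: BilaraSegments, childId: string): BilaraSegments {
  const prefix = `${childId}:`;
  const scoped: BilaraSegments = {};
  Object.keys(segments).forEach((key) => {
    const text = segments[key];
    if (key.indexOf(prefix) === 0 && text !== undefined) scoped[key] = text;
  });
  return scoped;
}

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Returns null instead of an empty shell when the slice has no readable text.
function renderIfPresent(segments: BilaraSegments): string | null {
  const paragraphs: string[] = [];
  Object.keys(segments).forEach((key) => {
    const text = segments[key];
    if (text !== undefined && text.trim() !== "") {
      paragraphs.push(`<p>${escapeHtml(text.trim())}</p>`);
    }
  });
  return paragraphs.length > 0 ? paragraphs.join("\n") : null;
}
```

A `null` result is what lets the detail view fall through to a grouped-block warning instead of rendering a blank page.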

Nikaya Data Flow

flowchart LR
    A[Raw sutta JSON] --> B[Content audit]
    X[Grouped range IDs] --> B
    A --> H[Original layers audit]
    A --> Y[Render fidelity audit]
    A --> N[Remote gap audit]
    A --> L[Index generation]
    B --> C[content-availability.json]
    B --> I[effective-content-availability.json]
    B --> J[canonical-aliases.json]
    H --> C
    N --> O[Repair mode]
    O --> A
    L --> M[nikaya_index.json]
    M --> X
    M --> F
    M --> G
    A --> D[suttacentralLocal parser]
    D --> E[Rendered HTML]
    Y --> G[NikayaDetail notices]
    X --> F[Library grouped-route pruning]
    J --> K[SEO noindex gating]
    I --> F[NikayaLibrary badges]
    I --> G[NikayaDetail dropdown]
    J --> G
    E --> G
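
The edge from grouped range IDs into the alias manifest can be illustrated by expanding each range into its child routes. This is a sketch under the assumption that grouped IDs look like `an1.394-401`. `parseRangeId` and `buildAliases` are hypothetical, and the real `canonical-aliases.json` may carry more fields.

```typescript
interface GroupedRange {
  canonicalId: string;
  book: string;
  start: number;
  end: number;
}

// "an1.394-401" -> { book: "an1", start: 394, end: 401 }; anything else -> null.
function parseRangeId(id: string): GroupedRange | null {
  const match = /^([a-z]+\d+)\.(\d+)-(\d+)$/.exec(id);
  if (match === null) return null;
  const book = match[1];
  const startText = match[2];
  const endText = match[3];
  if (book === undefined || startText === undefined || endText === undefined) return null;
  const start = parseInt(startText, 10);
  const end = parseInt(endText, 10);
  return end >= start ? { canonicalId: id, book, start, end } : null;
}

// Maps every child route inside a grouped range to its grouped canonical.
function buildAliases(ids: readonly string[]): Record<string, string> {
  const aliases: Record<string, string> = {};
  ids.forEach((id) => {
    const range = parseRangeId(id);
    if (range === null) return; // non-grouped ids need no alias
    for (let n = range.start; n <= range.end; n++) {
      aliases[`${range.book}.${n}`] = range.canonicalId;
    }
  });
  return aliases;
}
```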

Short-Form State Machine

stateDiagram-v2
    [*] --> SourceReviewed
    SourceReviewed --> Chapterized
    Chapterized --> ModuleWired
    ModuleWired --> MetadataRegistered
    MetadataRegistered --> RouteVerified
    RouteVerified --> ReleaseLogged
    ReleaseLogged --> ReadyToPublish
    RouteVerified --> ModuleWired: import or chapter mismatch
    RouteVerified --> MetadataRegistered: summary or theme defect

Short-Form Sequence

sequenceDiagram
    participant Source
    participant Markdown
    participant Module
    participant Metadata
    participant Route
    participant TaskLog

    Source->>Markdown: translated, segmented chapters
    Markdown->>Module: ordered raw imports
    Module->>Metadata: title, summary, author, themes
    Metadata->>Route: slug becomes loadable
    Route->>TaskLog: route target and release notes

Short-Form Data Flow

flowchart LR
    A[Short source text] --> B[Markdown chapters]
    B --> C[Teaching module]
    C --> D[Metadata registry]
    C --> E[TeachingDetail lazy import]
    D --> F[DhammaLibrary listing]
    E --> G[Teaching route]
    G --> H[Task log and release verification]

Long-Form State Machine

stateDiagram-v2
    [*] --> InspectPDF
    InspectPDF --> SegmentSections
    SegmentSections --> ExtractText
    ExtractText --> RepairOCR
    RepairOCR --> PreserveAppendices
    PreserveAppendices --> DraftVietnamese
    DraftVietnamese --> RegisterTeaching
    RegisterTeaching --> VerifySite
    VerifySite --> PublishFrontend
    PublishFrontend --> Published
    DraftVietnamese --> PreserveAppendices: source artifact blocks phrasing
    VerifySite --> RepairOCR: extraction defect
    VerifySite --> RegisterTeaching: integration defect

Long-Form Sequence

sequenceDiagram
    participant Agent
    participant PDF
    participant Script
    participant Corpus
    participant Site
    participant GitHubPages

    Agent->>PDF: inspect TOC, headings, appendix pages
    Agent->>Script: encode section boundaries
    Script->>PDF: extract text, lines, footnotes
    Script->>Corpus: write English markdown
    Agent->>Script: inject appendix image strategy
    Script->>Corpus: write cleaned English corpus
    Agent->>Corpus: hand-translate chapter where ready
    Corpus->>Site: vi chapter or en fallback
    Agent->>Site: register teaching module and metadata
    Site-->>Agent: build result and route validation
    Agent->>GitHubPages: push main after checks pass
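
The "vi chapter or en fallback" step in the sequence above reduces to one small decision per chapter. The sketch below assumes a chapter record that always carries English and optionally carries Vietnamese. `ChapterSource` and `resolveChapter` are hypothetical names, not the repo's teaching model.

```typescript
type ChapterLanguage = "vi" | "en";

interface ChapterSource {
  slug: string;
  en: string; // canonical extracted text, always present
  vi?: string; // hand-translated text, present only when ready
}

interface ResolvedChapter {
  slug: string;
  language: ChapterLanguage;
  body: string;
}

// Serve Vietnamese only when it exists and is non-empty; otherwise fall back
// to English and report the language actually served.
function resolveChapter(chapter: ChapterSource, preferred: ChapterLanguage): ResolvedChapter {
  const vi = chapter.vi;
  if (preferred === "vi" && vi !== undefined && vi.trim() !== "") {
    return { slug: chapter.slug, language: "vi", body: vi };
  }
  return { slug: chapter.slug, language: "en", body: chapter.en };
}
```

Returning the served language alongside the body lets the reader UI label a fallback chapter honestly.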

Long-Form Data Flow

flowchart LR
    A[scrnguna.pdf] --> B[Section map]
    B --> C[Line extraction]
    C --> D[Body cleanup]
    C --> E[Footnote parsing]
    D --> F[English markdown chapters]
    E --> F
    A --> G[Rendered appendix pages]
    G --> H[Public image assets]
    F --> I[Manual Vietnamese chapters]
    F --> J[English fallback corpus]
    I --> K[Teaching TS module]
    J --> K
    H --> K
    K --> L[Metadata + route wiring]
    L --> M[Local build]
    M --> N[GitHub Pages deploy]

Practical Rules

  • Never trust automatic footnote placement on biography or front-matter pages. Only annotate numbers that actually exist as page footnotes.
  • Run a dedicated front-matter QA pass. Cover pages, title pages, and library stamps often OCR into duplicated headings, isolated capitals, and other debris that must be rewritten into clean editorial prose before publication.
  • Do not flatten tables or diagrams into broken prose. Preserve them as images with short textual summaries.
  • Keep chapter files stable across reruns by using explicit numeric prefixes in filenames.
  • Protect Pāli doctrinal vocabulary when translating. A bad translation of a key term is worse than leaving the term in transliteration.
  • Treat the English markdown as the canonical extracted source.
  • If the Vietnamese chapter is not yet elegant, doctrinally precise, and readable aloud, do not force publication. Let the module fall back to English.
  • For this repo, a content-only release normally means frontend publish only.
  • If a teaching grows large enough to create an oversized route chunk, prefer chapter-level loadContent loaders over eager raw markdown imports so the reader can hydrate progressively.
  • Site verification now runs on Vite 8. Keep manualChunks function-based in vite.config.ts, and if chart routes fail under production bundling, confirm react-is is installed for recharts.
  • Do not reintroduce a forced vendor-markdown chunk for the KaTeX reader stack. On this repo, Rolldown can emit a broken katex_min_exports symbol when katex and rehype-katex are grouped too aggressively.
  • If math pages need lazy styling, keep useKatexCSS on the stylesheet-URL path. Avoid dynamic CSS module imports for katex.min.css unless you verify the emitted chunk graph in production.
  • During route QA, inspect the page chrome as well as the manuscript body. Mis-scoped i18n keys such as t('common.exportPdf') can surface raw keys even when the content itself is clean.
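
The two bundling rules above, a function-based `manualChunks` and no forced chunk for the KaTeX stack, can be sketched as one pure function. The chunk names `vendor-react` and `vendor-charts` are hypothetical, and wiring the function into `vite.config.ts` is left out because this document does not show the exact option path.

```typescript
// A function-based chunk rule: receives a module id (a file path) and returns
// a chunk name, or undefined to leave the decision to the bundler.
function manualChunks(id: string): string | undefined {
  const normalized = id.split("\\").join("/"); // tolerate Windows separators
  if (normalized.indexOf("/node_modules/") === -1) return undefined;

  // Deliberately no named chunk for katex or rehype-katex: grouping them
  // by hand is what the rule above warns against.
  if (/\/node_modules\/(katex|rehype-katex)\//.test(normalized)) return undefined;

  if (/\/node_modules\/recharts\//.test(normalized)) return "vendor-charts";
  if (/\/node_modules\/(react|react-dom)\//.test(normalized)) return "vendor-react";
  return undefined;
}
```

Because the rule is a plain function, it can be unit-tested without running a production build, although the emitted chunk graph still needs a real build to confirm.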

Review Checklist

  • Section ordering matches the source PDF.
  • Page-scoped footnote labels are unique.
  • No obvious split-word artifacts remain around footnote markers.
  • Appendix pages render upright and at readable width.
  • Metadata title, summary, difficulty, and themes match the manuscript.
  • The teaching route resolves with chapter ordering intact.
  • If the teaching is surfaced from Pháp Bảo, confirm the back link returns to /phap-bao/giao-phap and not the generic library root.
  • If the route is public, confirm dist/<route>/index.html contains the expected canonical and JSON-LD after build.
  • Site build passes after wiring.
  • Pages deploy is triggered from main.
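
The chapter-ordering check, together with the numeric-prefix rule under Practical Rules, can be automated in a few lines. This is a minimal sketch assuming filenames such as `01-intro.md`. `chapterNumber`, `sortChapters`, and `findOrderingDefects` are hypothetical helpers.

```typescript
// Reads the leading number from names like "01-intro.md" or "2_life.md".
function chapterNumber(filename: string): number | null {
  const match = /^(\d+)[-_]/.exec(filename);
  if (match === null || match[1] === undefined) return null;
  return parseInt(match[1], 10);
}

// Numeric order first, unprefixed files last, plain string order as tie-break.
function sortChapters(filenames: readonly string[]): string[] {
  return filenames.slice().sort((a, b) => {
    const na = chapterNumber(a);
    const nb = chapterNumber(b);
    if (na !== null && nb !== null && na !== nb) return na - nb;
    if (na === null && nb !== null) return 1;
    if (na !== null && nb === null) return -1;
    return a < b ? -1 : a > b ? 1 : 0;
  });
}

// Reports files that would make ordering unstable across reruns.
function findOrderingDefects(filenames: readonly string[]): string[] {
  const defects: string[] = [];
  const seen: Record<number, string> = {};
  filenames.forEach((name) => {
    const n = chapterNumber(name);
    if (n === null) {
      defects.push(`missing numeric prefix: ${name}`);
      return;
    }
    const earlier = seen[n];
    if (earlier !== undefined) {
      defects.push(`duplicate prefix ${n}: ${earlier} and ${name}`);
    } else {
      seen[n] = name;
    }
  });
  return defects;
}
```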