A portable standard for bounded-polysemic content.

The Sūtrakṛt per-verse-object schema is the structure for any content that needs to preserve bounded interpretive disagreement at representation time. The Bhagavad-Gītā substrate at this site is its first published instantiation; the schema generalizes to halakhic responsa, common-law precedent, Quranic tafsīr, critical-edition philology, medical specialty reasoning, paradigm-divergent code, and LLM context windows.

v1.0 · April 2026 · substrate: MIT · data: CC-BY 4.0 · API: CORS-open

public repository: github.com/ekras-doloop/sutrakrit-gita · 700 verses · 7 schools · CC-BY 4.0 · clone, mirror, port

quick start

Fetch the per-verse object for BG 2.47 in one line:

```bash
curl https://gita.ekrasworks.com/api/v1/verse/2/47
```

Returns the full structured object — mūla, word_by_word, six-school doctrinal_projections, intertextual_panel with feature-decomposed scores, theme_list_memberships, so_what_questions, everyday_applications, audit_trail.

In Python:

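```python
import requests

# Fetch the per-verse object for BG 2.47 and decode the JSON body.
v = requests.get("https://gita.ekrasworks.com/api/v1/verse/2/47").json()

# The root verse text in IAST transliteration.
print(v["mūla"]["iast"])

# One English rendering per school, truncated to 80 characters.
for school, proj in v["doctrinal_projections"].items():
    print(f"{school}: {proj['english_rendering'][:80]}...")

# Token-level analysis: surface form, lemma, grammatical description.
for tok in v["word_by_word"]:
    print(f"{tok['surface_form']:15} ← {tok['lemma']:15} | {tok['grammar']}")
```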

In JS:

```javascript
// Top-level await: run this in an ES module or inside an async function.
const v = await fetch("https://gita.ekrasworks.com/api/v1/verse/2/47").then(r => r.json());
const advaita = v.doctrinal_projections.advaita;
console.log(advaita.english_rendering, "—", advaita.divergence_note);
```

api endpoints

Public, CORS-open, and edge-cached (5 min) with 24h stale-while-revalidate (SWR). No auth. Rate limits are liberal; if you need bulk fetches, please clone the data repo instead (it is CC-BY 4.0).

GET /api/v1/verse/{chapter}/{verse}: per-verse object for one verse

GET /api/v1/lemma/{lemma}: every verse where a lemma occurs (IAST-encoded; URL-encode diacritics)

GET /api/v1: API root with full endpoint manifest
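Because the lemma endpoint takes IAST with diacritics, the path segment has to be percent-encoded. A minimal sketch of building such a URL follows. The helper name `lemma_url`, the NFC normalization step, and the example lemma `ātman` are assumptions for illustration, not part of the published API description.

```python
import unicodedata
from urllib.parse import quote

API_ROOT = "https://gita.ekrasworks.com/api/v1"


def lemma_url(lemma: str) -> str:
    """Build the lemma-lookup URL for an IAST lemma (hypothetical helper)."""
    # Assumption: the API expects NFC-normalized IAST. Normalizing first makes
    # precomposed and combining-mark spellings of a lemma produce the same URL.
    normalized = unicodedata.normalize("NFC", lemma)
    # quote() percent-encodes the UTF-8 bytes; safe="" also escapes "/".
    return f"{API_ROOT}/lemma/{quote(normalized, safe='')}"


print(lemma_url("karman"))  # plain ASCII lemma passes through unchanged
print(lemma_url("ātman"))   # ā (U+0101) becomes %C4%81
```

The resulting string can be passed to `requests.get` or `fetch` exactly as in the quick-start examples.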

For bulk access (all 700 verses at once) clone github.com/ekras-doloop/sutrakrit-gita — the rendered/ directory is the canonical store. Files are named bg_X_Y.json per verse.
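For work against a local clone, a small loader might look like the sketch below. It assumes that X and Y in `bg_X_Y.json` are the chapter and verse numbers and that every file in `rendered/` follows that pattern. The function names are hypothetical.

```python
import json
from pathlib import Path


def verse_path(repo_root, chapter: int, verse: int) -> Path:
    # Assumption: bg_X_Y.json means X = chapter, Y = verse.
    return Path(repo_root) / "rendered" / f"bg_{chapter}_{verse}.json"


def load_verse(repo_root, chapter: int, verse: int) -> dict:
    """Read one per-verse object from a local clone."""
    with verse_path(repo_root, chapter, verse).open(encoding="utf-8") as fh:
        return json.load(fh)


def iter_verses(repo_root):
    """Yield every per-verse object, ordered numerically by chapter and verse."""
    def numeric_key(path: Path):
        # "bg_2_47" -> (2, 47); assumes both parts are integers.
        _, chapter, verse = path.stem.split("_")
        return int(chapter), int(verse)

    files = (Path(repo_root) / "rendered").glob("bg_*_*.json")
    for path in sorted(files, key=numeric_key):
        with path.open(encoding="utf-8") as fh:
            yield json.load(fh)
```

With the repo cloned into `./sutrakrit-gita`, `load_verse("sutrakrit-gita", 2, 47)` would return the same object shape that the API serves for BG 2.47.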

the per-verse object

Every verse conforms to schema/per-verse-object.schema.json. The shape:

```json
{
  "verse_id": "2.47",
  "mūla": {
    "devanāgarī": "कर्मण्येवाधिकारस्ते मा फलेषु कदाचन | …",
    "iast":       "karmaṇyevādhikāraste mā phaleṣu kadācana | …",
    "speaker": "Krishna",
    "addressed_to": "Arjuna",
    "chapter_position": "BG 2.47"
  },
  "word_by_word": [
    {
      "surface_form": "karmaṇi",
      "surface_devanagari": "कर्मणि",
      "lemma": "karman",
      "grammar": "locative neuter singular noun",
      "senses_attested_in_panel": [
        { "school": "advaita", "sense": "कर्म-विषये" },
        { "school": "viśiṣṭādvaita", "sense": "स्वधर्म-कर्मणि" }
      ],
      "theme_lists": ["कर्म"]
    },
    /* … one entry per token … */
  ],
  "doctrinal_projections": {
    "advaita":        { "english_rendering": "…", "divergence_note": "…", "witness_passages": ["shankara_2.47", "anandgiri_2.47"], "score": 1.0 },
    "viśiṣṭādvaita":  { "english_rendering": "…", "divergence_note": "…", "witness_passages": ["ramanuja_2.47"], "score": 1.0 },
    "dvaita":         { "english_rendering": "…", "divergence_note": "…", "witness_passages": ["madhva_2.47"], "score": 1.0 },
    "śuddhādvaita":   { /* … */ },
    "bhakti":         { /* … */ },
    "advaita-bhakti": { /* … */ }
  },
  "intertextual_panel": [
    {
      "verse": "3.19",
      "type": "near-cluster echo",
      "score": 0.927,
      "feature_breakdown": {
        "cosine":        0.882,
        "theme_graph":   3.0,
        "vocative":      0.0,
        "substring":     0.0,
        "lemma_overlap": 14.7,
        "stem_prefix":   6.0
      }
    },
    /* … top-K nearest verses … */
  ],
  "prosodic_information": { "meter": "anuṣṭubh", "pragmatic_context": { "vocative": null } },
  "theme_list_memberships": [
    { "list": "कर्म", "role": "primary", "other_verses_in_list": ["2.48", "2.50", "3.4", "…"] }
  ],
  "so_what_questions": ["…"],
  "everyday_applications": { "advaita": "…", "viśiṣṭādvaita": "…", /* … per school … */ },
  "audit_trail": {
    "substrate_version": "sutrakrit-v2.6",
    "fitted_weights": { "a": 1.0, "b": 0.01, "e_v": 0.005, "z": 0.2, "h": 0.0, "th": 0.01 },
    "corpus_provenance": {
      "mūla": "Belvalkar critical edition (BORI 1947), via Ambuda multi-witness",
      "panel_witnesses": ["bg-mula", "bg-shankara", "bg-ramanuja", "bg-madhva", /* … */]
    },
    "extraction_date": "2026-04-22",
    "score_methodology_documented_at": "Paper 1, §II.B (substrate architecture and feature definitions)",
    "word_by_word_parser": "ByT5-Sanskrit-multitask (Nehrdich/Hellwig/Keutzer EMNLP 2024)"
  }
}
```
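One way to use this shape is to pull out the tokens on which the schools read different senses. The sketch below is illustrative only: the helper names are hypothetical, and the sample dictionary is trimmed from the object shown above rather than fetched from the API.

```python
from collections import defaultdict


def senses_by_lemma(verse_obj: dict) -> dict:
    """Map each lemma to {school: sense} wherever the panel attests a sense."""
    table = defaultdict(dict)
    for tok in verse_obj.get("word_by_word", []):
        for attested in tok.get("senses_attested_in_panel", []):
            table[tok["lemma"]][attested["school"]] = attested["sense"]
    return dict(table)


def contested_lemmas(verse_obj: dict) -> list:
    """Lemmas whose attested senses differ between at least two schools."""
    return [
        lemma
        for lemma, senses in senses_by_lemma(verse_obj).items()
        if len(set(senses.values())) > 1
    ]


# Trimmed sample following the documented shape for BG 2.47.
sample = {
    "verse_id": "2.47",
    "word_by_word": [
        {
            "surface_form": "karmaṇi",
            "lemma": "karman",
            "senses_attested_in_panel": [
                {"school": "advaita", "sense": "कर्म-विषये"},
                {"school": "viśiṣṭādvaita", "sense": "स्वधर्म-कर्मणि"},
            ],
        }
    ],
}

print(contested_lemmas(sample))  # ['karman']
```

Using `.get(..., [])` keeps the helpers tolerant of tokens that carry no `senses_attested_in_panel` entry, which this section does not state is always present.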

the substrate model

The composite score for ranking intertextual links is a frozen-weight linear combination of one neural feature and five Sanskrit-aware symbolic features:

```text
score(s, v) =
    a   · cos(E(s), E(v))                  // mE5-base sentence-embedding cosine
  + b   · |Sutras(src(s)) ∩ Sutras(v)|     // theme-graph co-membership
  + e_v · vocative_signal(src(s), v)       // vocative pattern correspondence
  + z   · substring_score(s, v)            // verbatim-fragment citation
  + h   · lemma_idf_overlap(s, v)          // lemmatized lexical overlap (ByT5)
  + th  · stem_prefix_overlap(s, v)        // Devanāgarī stem-prefix family

frozen weights: a=1.0, b=0.01, e_v=0.005, z=0.2, h=0.0, th=0.01

fit:    Ramsukhdas's marked cross-references in Sādhak Sañjīvanī (1980)
test:   Śaṅkara's bhāṣya  → R@4 = 71.6%  (frozen-weight cross-school transfer)
        Vedantadeshika    → R@4 = 40.2%  at n=629
        Tilak (negctrl)   → R@4 = 38.4%  (Cat-3 author-mode collapse, expected lower)
```
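The linear combination can be sketched directly from the frozen weights and a `feature_breakdown` object. The pairing of breakdown keys to weight names below is an assumption, inferred from the order of the terms in the formula; it is not confirmed by this page, and the function name is hypothetical.

```python
# Frozen weights as listed in the formula block and in audit_trail.fitted_weights.
FROZEN_WEIGHTS = {"a": 1.0, "b": 0.01, "e_v": 0.005, "z": 0.2, "h": 0.0, "th": 0.01}

# Assumption: each feature_breakdown key pairs with the weight of the
# formula term in the same position.
WEIGHT_FOR_FEATURE = {
    "cosine": "a",
    "theme_graph": "b",
    "vocative": "e_v",
    "substring": "z",
    "lemma_overlap": "h",
    "stem_prefix": "th",
}


def composite_score(feature_breakdown: dict, weights: dict = FROZEN_WEIGHTS) -> float:
    """Weighted sum of the six features (raw linear combination only)."""
    return sum(
        weights[WEIGHT_FOR_FEATURE[name]] * value
        for name, value in feature_breakdown.items()
    )


# feature_breakdown for the 2.47 -> 3.19 link in the sample object.
breakdown = {
    "cosine": 0.882,
    "theme_graph": 3.0,
    "vocative": 0.0,
    "substring": 0.0,
    "lemma_overlap": 14.7,
    "stem_prefix": 6.0,
}

print(round(composite_score(breakdown), 3))  # 0.972
```

For this breakdown the raw sum is 0.972, while the sample object lists a score of 0.927, so the served value may be computed in a way this section does not describe. Treat the sketch as an illustration of the formula, not a reproduction of the API's numbers.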

The full methodology and cross-validation details are in the companion papers: Paper 1 (Sūtrakṛt: A Computational Substrate for Cross-Reference Retrieval Across the Bhagavad-Gītā Commentary Tradition) and Paper 2 (Bounded Polysemy as Textual Architecture).

convergent validation

The substrate's weights are fit on one anchor and held frozen across the full panel of competing schools. The methodology that justifies trusting the result is older than the substrate: convergent validation on a minimally theory-laden anchor.

The intuition is the eight-detectives image. Imagine eight rival detectives who actively dislike each other, each with a different theory of the case, all walking onto the same muddy crime scene. You cannot trust any single detective's reading of where the killer went — each one wants their theory to win. But if all eight, independently, point to the same muddy footprint on the ground, that footprint is real. Their opposing biases cancel. What survives is the framework-independent fact.

Sūtrakṛt's eight-commentator panel plays the role of the rival detectives. The cross-reference network is the muddy footprint. When Śaṅkara (advaita), Rāmānuja (viśiṣṭādvaita), Madhva (dvaita), Vallabha (śuddhādvaita), Śrīdhara (bhakti), Madhusūdana (advaita-bhakti), Vedāntadeśika, and Anandagiri (commentators whose schools disagree with each other about almost everything) independently surface the same set of cross-references for a given verse, the network is anchored not in any one school's authority but in the structural mathematics of the text itself. That is what frozen-weight cross-school R@4 = 71.6% on Śaṅkara, fit on Ramsukhdas, is measuring.

The same methodology ports: in the code instantiation, eight development tools (compiler, type-checker, static analyzer, fuzzer, formal verifier, security scanner, profiler, AI assistant) are the detectives, and the AST + tool-graph is the footprint. In the context-engineering instantiation, the four context modalities (system, user, tool, retrieval) are the detectives, and the compiled token-stream + provenance graph is the footprint. The architecture transfers because the validation discipline transfers.

port to other domains

The schema generalizes to any content with four properties: discrete primary text units, attested witness traditions with named witnesses, identifiable schools or lineages, and a formal substrate that constrains how polysemy is bounded. Mapping table:

Domain → primary unit · school or lineage · witness

Indic ritual texts → verse · bhāṣya school · bhāṣya passage

Halakhic responsa → teshuvah · pesak tradition · named posek + responsum citation

Tafsīr → Quranic verse · tafsīr school · tafsīr passage + isnad

Common law → case holding · jurisprudential lineage · citation chain

Critical-edition philology → text-witness line · manuscript tradition · apparatus siglum

Medical specialty reasoning → symptom-cluster diagnostic question · specialty · guideline + version

LLM-generated code → function render · paradigm projection · type-system witness

LLM context-window content → context item · per-modality projection · provenance + position + role

To port: instantiate the per-unit object schema; define the per-domain symbolic features that capture the formal substrate (analogues of meter, vocative, sūtra-citation, lemma-overlap); fit weights on a Layer-3 anchor (a minimally theory-laden internal-register text in your domain); freeze the weights; and validate via cross-witness recall@k. The Sūtrakṛt-Gītā repo is the reference implementation.
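The final validation step can be sketched as a recall@k computation over a held-out witness's cross-references. This page does not define R@k precisely, so the sketch assumes the common definition: per query unit, the fraction of that witness's marked links that appear among the top k ranked candidates, averaged over query units. The function name and the toy unit ids are hypothetical.

```python
def recall_at_k(ranked: dict, gold: dict, k: int) -> float:
    """Macro-averaged recall@k.

    ranked: query unit id -> candidate unit ids, best first (frozen-weight ranking)
    gold:   query unit id -> set of unit ids the held-out witness cross-references
    """
    per_query = []
    for query, gold_links in gold.items():
        if not gold_links:
            continue  # a query with no marked links cannot contribute to recall
        top_k = set(ranked.get(query, [])[:k])
        per_query.append(len(top_k & gold_links) / len(gold_links))
    return sum(per_query) / len(per_query) if per_query else 0.0


# Toy data; these ids and links are made up for illustration.
ranked = {
    "2.47": ["3.19", "2.48", "18.6", "4.20", "5.10"],
    "3.4": ["3.8", "2.47", "5.2", "6.1", "18.2"],
}
gold = {
    "2.47": {"3.19", "4.20"},  # both fall inside the top 4 -> 1.0
    "3.4": {"3.8", "18.2"},    # only "3.8" falls inside the top 4 -> 0.5
}

print(recall_at_k(ranked, gold, k=4))  # 0.75
```

Because the weights are frozen before this step, the ranking in `ranked` never sees the held-out witness's links, which is what makes the number a transfer measurement rather than a fit.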

license · cite

Substrate library (the scoring code, schema, feature extractors): MIT. Per-verse rendered objects (the BG data the API serves): CC-BY 4.0. Sanskrit primary sources are cited in audit_trail.corpus_provenance in each per-verse object.

If you build on the schema or use the substrate, please cite:

```bibtex
@misc{rastogi2026sutrakritgita,
  author = {Rastogi, Gaurav},
  title  = {S\={u}trak\d{r}t-G\={\i}t\={a}: A Substrate-Rendered Edition of the Bhagavad-G\={\i}t\={a}},
  year   = {2026},
  url    = {https://gita.ekrasworks.com},
  note   = {Substrate code under MIT; per-verse rendered objects under CC-BY 4.0.
            Word-by-word via ByT5-Sanskrit-multitask (Nehrdich/Hellwig/Keutzer, EMNLP 2024).
            Cross-reference substrate fit on Ramsukhdas's S\={a}dhak Sa\~{n}j\={\i}van\={\i} (1980),
            cross-validated R@4 = 71.6\% on \'{S}a\.{n}kara's bh\={a}\d{s}ya.}
}
```