What is Sūtrakṛt?

A substrate for texts that mean more than one thing at once.

सूत्रकृत्

sūtrakṛt · the one who weaves the threads back together

Some texts are made to mean several things at once.

A song lyric is not a sentence. A scripture is not a manual. A legal opinion is not a FAQ entry. The texts that human beings have kept around for centuries — sūtras, sacred poems, common-law judgments, songs by Cohen and Dylan, Hamlet — work because they hold several legitimate readings at the same time, in tension, bounded by the formal craft of the text itself.

The Bhagavad-Gītā is the canonical example. For more than a thousand years, six major schools of Hindu commentary have read the same 700 verses and produced six different coherent philosophies: Śaṅkara's non-dual reading, Rāmānuja's qualified-non-dual reading, Madhva's dualist reading, Vallabha's pure-non-dual reading, Śrīdhara's philological-devotional reading, Madhusūdana's synthesis of non-dual realization with devotion. None of them is wrong. None of them is the consensus. The bounded disagreement is what kept the text alive.

Modern AI tools, by default, flatten that.

The standard way AI reads a long-lived text today is: turn every passage into a numerical vector, find the closest vectors to your question, summarize what they say. That move works fine on a user manual. It silently destroys the texture of a text that was made to mean several things. The resulting summary sounds fluent, sounds confident, sounds like a good answer — and the bounded disagreement that gave the text its longevity is gone. The answer reads correctly. It just is not the kind of thing the text actually is.

Tomato soup survives blending. Bolognese does not.

iii

Sūtrakṛt refuses the flattening.

Sūtrakṛt is the underlying engine that powers this site. For every verse of the Gītā, it produces a structured object that holds all the readings the tradition has attested, alongside the Sanskrit text itself, the cross-references the verses make to each other, and a complete audit trail back to the source bhāṣya passages each reading was anchored to.

The reader picks the lens they want. Or picks none and reads all six side-by-side. The substrate stays modal — six schools, six readings, none privileged. The surface stays modeless — you do not have to choose a reading to start.

a worked example · sandhi

The classical demonstration is the Upaniṣadic phrase तत्त्वमसि tattvamasi. Read with the most common parse, it splits into tat tvam asi — “thatthou art” — the great non-dual identity statement: you are the universe.

But the same continuous string is grammatically licit as a-tat tvam asi — “not-thatthou art” — where the initial a- negates the predicate and the verse becomes the dualist's existence proof: you are emphatically not the absolute. Two opposite philosophies, one identical letter sequence, selected entirely by where the reader places the invisible space.

This is not a defect of Sanskrit. It is the engineering. A substrate that holds the reading open at the place where the tradition holds it open is faithful to the text. A substrate that quietly picks one parse for you is not.

What you'll find on a verse page.

If you came here to read: The Sanskrit mūla in Devanāgarī + IAST as the central column, the six schools' readings as the right-margin apparatus rail (color-coded by school), and two so-what questions a working modern reader might bring to the verse. The first screen, in 30 seconds.
If you came here to study: Each school's full English rendering, the bhāṣya divergence note, the everyday-application of that school's reading, the witness pointers back to the specific bhāṣya passages each rendering was anchored to.
If you came here to read scholarly source: Word-by-word with lemma + grammar + English meaning + each school's actual gloss-snippet in Sanskrit; intertextual panel ranked by the Sūtrakṛt substrate; anuvṛtti theme-chains; substrate version, fitted weights, corpus provenance. Everything is one click; nothing is forced.
If you came here to cite: Every verse page has a one-click copy with citation at the top. The full panel copies as Markdown with the citation block built in (substrate version, parser provenance, license, accessed-date, canonical URL). Per-school copy this reading on each card.

Briefly: how the substrate works.

For each verse, the substrate combines a multilingual sentence-embedding model (mE5-base) with five Sanskrit-aware symbolic features — theme-graph co-membership, vocative pattern correspondence, verbatim citation overlap, lemmatized lexical overlap, and Devanāgarī stem-prefix family. The composite scoring function was fit by grid search on Ramsukhdas's marked cross-references in his 1980 Sādhak Sañjīvanī and then frozen. Cross-validated on Śaṅkara's bhāṣya, the frozen weights produce 71.6% recall@4 — matching the fit-corpus performance within 0.1pp, which is the cross-school generalization claim the substrate rests on.

The word-by-word layer uses ByT5-Sanskrit-multitask (Nehrdich, Hellwig & Keutzer, EMNLP 2024) for lemma identification and grammatical analysis. The English meaning under each lemma comes from a curated 2,135-lemma Sanskrit-English gloss dictionary with prefix/suffix etymology where the lemma is decomposable. Per-school sense-snippets are extracted from each commentator's actual bhāṣya text in the school's own Sanskrit register (translation would be an additional collapse the substrate declines).

Code is open at github.com/ekras-doloop/sutrakrit-gita under MIT (substrate library) and CC-BY 4.0 (per-verse rendered objects). Reproducible byte-for-byte on a laptop with 8 GB RAM.

Where this is going.

The Bhagavad-Gītā is the first text Sūtrakṛt has been built around because it is the most institutionally-coupled, longest-lived, most-instrumented testbed of bounded polysemy available — the equivalent of ImageNet for the substrate-rendered-edition idea. The architecture is designed to extend.

The next planned editions are the Yoga-Sūtras (with Vyāsa's bhāṣya + Vācaspati Miśra's Tattva-Vaiśāradī + Vijñānabhikṣu's Yoga-Vārttika commentary chain) and the principal Upaniṣads with their bhāṣya panels. The schema also generalizes — with per-domain feature engineering — to halakhic responsa, Quranic tafsīr, common-law precedent, critical-edition philology, and other corpora where bounded interpretive disagreement has been institutionally preserved.

figureBounded Polysemy — the architecture of meaningwhy one reading is too few, infinite readings are too many, and a bounded substrate is the discipline in between

i.The problem — pattern collapse

Modern systems answer what does this mean? by averaging across a corpus of disagreement. The output reads fluent. The texture — what each tradition actually said — is gone.

“Tomato soup survives blending. Bolognese does not.”

ii.The solution — six-layer per-verse object

Every Bhagavad-Gītā verse is rendered against the same six-layer schema. Each layer carries witnesses back to a named source — no interpretation appears without the bhāṣya passage that licenses it.

The schema is the contribution; the BG edition is the existence proof.

iii.Case study — BG 18.63 & the ports

The same schema ports — with per-domain feature engineering — to halakhic responsa, Quranic tafsīr, common-law precedent, critical-edition philology, and to two AI applications already drafted as papers:

Sūtrakṛt for Code — per-function projections (security / performance / functional / OOP) with tools-as-witnesses (compilers, fuzzers, static analyzers).
Sūtrakṛt for Context-Engineering — modal-tagged context items (authority / retrieval / query) so prompt injection becomes a structurally-blockable boundary violation rather than a prose-following accident.