STEPBible-Data

Repository: tyndale/STEPBible-Data
Maintainer: Tyndale House, Cambridge (academic institution)
License: CC BY 4.0 — allows commercial and non-commercial use, modification, and redistribution with attribution. Fully compatible with open-source distribution.
Suitability Score: ⭐⭐⭐⭐⭐ (5/5)

Coverage

Format: Tab-separated values (TSV) files. Each dataset is a single large file or a set of files with a consistent column structure, and header rows describe the columns. The files are straightforward to parse with Python.
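As a rough illustration of the format described above, the following sketch reads tab-separated rows with Python's standard csv module. The column names (Ref, Word, Strongs, Morph), the sample rows, and the read_tsv_rows helper are all assumptions made for the example, not the actual STEPBible layout; check the header row of the real file before relying on any of them. The helper skips any lines that precede the header row, in case a file carries explanatory text at the top.

```python
import csv
from typing import Iterator

# Hypothetical sample. Column names and values are illustrative only and
# do not reproduce the real STEPBible column layout or tag codes.
SAMPLE = (
    "Explanatory text that may precede the data\n"
    "\n"
    "Ref\tWord\tStrongs\tMorph\n"
    "Gen.1.1#01\tword-1\tH7225\ttag-a\n"
    "Gen.1.1#02\tword-2\tH1254\ttag-b\n"
)


def read_tsv_rows(text: str, header_prefix: str = "Ref\t") -> Iterator[dict[str, str]]:
    """Yield one dict per data row, keyed by the header row's column names."""
    lines = text.splitlines()
    # Skip anything before the header row (assumed to start with header_prefix).
    for start, line in enumerate(lines):
        if line.startswith(header_prefix):
            break
    else:
        raise ValueError("header row not found")
    # QUOTE_NONE keeps quotation marks in the text as literal characters.
    reader = csv.DictReader(lines[start:], delimiter="\t", quoting=csv.QUOTE_NONE)
    yield from reader


rows = list(read_tsv_rows(SAMPLE))
```

Each row comes back as a plain dict, so later pipeline stages can pick out fields by column name rather than by position.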

Datasets

Dataset ID	Name	Content
TAHOT	Tyndale Amalgamated Hebrew OT	Hebrew OT with Strong's numbers + morphological tags per word
TAGNT	Tyndale Amalgamated Greek NT	Greek NT with Strong's numbers + morphological tags per word
TTESV	Tyndale Translation–ESV	ESV translation aligned word-by-word to Hebrew/Greek
TBESH	Tyndale Brief Hebrew Lexicon	~8,700 entries, BDB-derived, brief glosses
TBESG	Tyndale Brief Greek Lexicon	~5,500 entries, LSJ-derived, brief glosses
TFLSJ	Tyndale Full LSJ Greek Lexicon	Full Liddell-Scott-Jones Greek lexicon entries
TIPNR	Tyndale Individuated Proper Names with Ref	~3,000 unique individuals + ~1,000 places; birth/death, family trees, geocoding, verse references
TVTMS	Tyndale Versification Traditions Mapping	Maps verse IDs across English, Hebrew, Greek, Latin, Syriac, and other traditions
TEHMC	Tyndale Edition-specific Hebrew Manuscripts	Hebrew text variant comparison across editions
TEGMC	Tyndale Edition-specific Greek Manuscripts	Greek text variant comparison across editions
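
To make the idea of a versification mapping such as TVTMS concrete, here is a minimal sketch of a two-way verse lookup. The tuple layout, the tradition labels, and the build_lookup and convert helpers are assumptions for illustration and do not reflect the TVTMS file format. The Psalm 51 example relies on the well-known offset where the Hebrew numbering counts the superscription as verses 1-2.

```python
from collections import defaultdict

# Illustrative pairs of (tradition, reference) that denote the same verse.
# This layout is an assumption, not the TVTMS file structure.
EQUIVALENTS = [
    (("English", "Psa.51.1"), ("Hebrew", "Psa.51.3")),
    (("English", "Psa.51.2"), ("Hebrew", "Psa.51.4")),
]


def build_lookup(pairs):
    """Index each pair in both directions: (tradition, ref) -> {tradition: ref}."""
    lookup = defaultdict(dict)
    for (trad_a, ref_a), (trad_b, ref_b) in pairs:
        lookup[(trad_a, ref_a)][trad_b] = ref_b
        lookup[(trad_b, ref_b)][trad_a] = ref_a
    return lookup


def convert(lookup, tradition, ref, target):
    """Return ref as numbered in the target tradition.

    Falls back to the unchanged reference when no mapping is recorded,
    on the assumption that unlisted verses share the same numbering.
    """
    if tradition == target:
        return ref
    return lookup.get((tradition, ref), {}).get(target, ref)


lookup = build_lookup(EQUIVALENTS)
hebrew_ref = convert(lookup, "English", "Psa.51.1", "Hebrew")
```

The same pairs could equally be stored as graph edges between passages; the dictionary form is used here only to keep the example self-contained.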

Coming Soon

Dataset ID	Name	Content
TAGOT	Tyndale Amalgamated Greek OT	Septuagint (LXX) with Strong's numbers + morphological tags
TFBDB	Tyndale Full BDB Hebrew Lexicon	Full Brown-Driver-Briggs Hebrew lexicon
TOTMM / TNTMM	Tyndale OT/NT Morphological Manuscripts	Morphological analysis per manuscript tradition
TBCWG	Tyndale Brief Contextual Word Glosses	Context-sensitive translation glosses

Quality

High. Tyndale House is a respected academic institution. Data undergoes scholarly review. TAHOT and TAGNT are amalgamated from multiple academic sources with cross-verification.

Gaps Filled

✅ Morphological tags (Hebrew via TAHOT, Greek via TAGNT) — Critical gap
✅ Source tokens (embedded in TAHOT/TAGNT word-level data) — Critical gap
✅ Person names database (TIPNR — ~3,000 individuals + ~1,000 places)
✅ Place names / geocoding (TIPNR includes coordinates)
✅ Versification mapping (TVTMS — multi-tradition)
✅ Extended lexicons (TBESH, TBESG, TFLSJ)
🔜 Septuagint tagged (TAGOT — coming)
🔜 Full BDB Hebrew lexicon (TFBDB — coming)

Integration Notes

TSV parsing is straightforward in Python: add a StepBibleParser class to the ingest pipeline
TAHOT/TAGNT data maps directly onto existing :InterlinearWord nodes, enriching them with morphology, pos, and parsing properties
TIPNR would create new :Person and :Place node types in FalkorDB with relationship edges to :Passage nodes
TVTMS would create :VersificationMapping edges between :Passage nodes across translations
Strong's numbers in STEPBible data align with GospeLib's existing Strong's-keyed lexicon entries
Requires new ingest pipeline stages, or extensions of existing ones, for morphology enrichment, people/places, and versification
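
One possible shape for the StepBibleParser mentioned above is sketched below. It turns a parsed TSV row into a typed record and a parameterised Cypher statement that sets the morphology property on an existing :InterlinearWord node. Everything beyond the names taken from these notes is hypothetical: the column names, the "verse#index" reference format, the WordTag record, and the ref/position properties used to match nodes are assumptions about the data and the graph schema, not confirmed details.

```python
from dataclasses import dataclass

# Hypothetical column names; check the header row of the actual file.
COL_REF, COL_STRONGS, COL_MORPH = "Ref", "Strongs", "Morph"

# Parameterised Cypher for enriching an existing node. The ref/position
# match keys are assumptions about the existing graph schema.
ENRICH_WORD = (
    "MATCH (w:InterlinearWord {ref: $ref, position: $position}) "
    "SET w.morphology = $morphology"
)


@dataclass(frozen=True)
class WordTag:
    ref: str          # verse reference, e.g. "Gen.1.1"
    position: int     # word index within the verse
    strongs: str      # Strong's number as given in the row
    morphology: str   # morphological tag as given in the row


class StepBibleParser:
    """Turns TSV rows (as dicts) into WordTag records and Cypher parameters."""

    def parse_row(self, row: dict[str, str]) -> WordTag:
        # Assumes references look like "Gen.1.1#02" (verse, '#', word index).
        verse, _, index = row[COL_REF].partition("#")
        if not index.isdigit():
            raise ValueError(f"unexpected reference format: {row[COL_REF]!r}")
        return WordTag(verse, int(index), row[COL_STRONGS].strip(), row[COL_MORPH].strip())

    def to_query(self, tag: WordTag) -> tuple[str, dict[str, object]]:
        # Values travel as parameters, so they are never spliced into the query text.
        params = {"ref": tag.ref, "position": tag.position, "morphology": tag.morphology}
        return ENRICH_WORD, params


parser = StepBibleParser()
tag = parser.parse_row({"Ref": "Gen.1.1#02", "Strongs": "H1254", "Morph": "tag-b"})
query, params = parser.to_query(tag)
```

The resulting query and params pair would then be handed to whatever graph client the pipeline already uses. The client call is left out here so the sketch does not presume a particular connection setup.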
