Skip to main content

STEPBible-Data

  • Repository: tyndale/STEPBible-Data
  • Maintainer: Tyndale House, Cambridge (academic institution)
  • License: CC BY 4.0 — allows commercial and non-commercial use, modification, and redistribution with attribution. Fully compatible with open-source distribution.
  • Suitability Score: ⭐⭐⭐⭐⭐ (5/5)

Coverage

Format: Tab-separated values (TSV) files. Each dataset is a single large file or a set of files with consistent column structure. Headers describe columns. Straightforward to parse with Python.

Datasets

Dataset IDNameContent
TAHOTTyndale Amalgamated Hebrew OTHebrew OT with Strong's numbers + morphological tags per word
TAGNTTyndale Amalgamated Greek NTGreek NT with Strong's numbers + morphological tags per word
TTESVTyndale Translation–ESVESV translation aligned word-by-word to Hebrew/Greek
TBESHTyndale Brief Hebrew Lexicon~8,700 entries, BDB-derived, brief glosses
TBESGTyndale Brief Greek Lexicon~5,500 entries, LSJ-derived, brief glosses
TFLSJTyndale Full LSJ Greek LexiconFull Liddell-Scott-Jones Greek lexicon entries
TIPNRTyndale Individuated Proper Names with Ref~3,000 unique individuals + ~1,000 places; birth/death, family trees, geocoding, verse references
TVTMSTyndale Versification Traditions MappingMaps verse IDs across English, Hebrew, Greek, Latin, Syriac, and other traditions
TEHMCTyndale Edition-specific Hebrew ManuscriptsHebrew text variant comparison across editions
TEGMCTyndale Edition-specific Greek ManuscriptsGreek text variant comparison across editions

Coming Soon

Dataset IDNameContent
TAGOTTyndale Amalgamated Greek OTLXX with Strong's + morphological tags (Septuagint tagged!)
TFBDBTyndale Full BDB Hebrew LexiconFull Brown-Driver-Briggs Hebrew lexicon
TOTMM / TNTMMTyndale OT/NT Morphological ManuscriptsMorphological analysis per manuscript tradition
TBCWGTyndale Brief Contextual Word GlossesContext-sensitive translation glosses

Quality

High. Tyndale House is a respected academic institution. Data undergoes scholarly review. TAHOT and TAGNT are amalgamated from multiple academic sources with cross-verification.

Gaps Filled

  • ✅ Morphological tags (Hebrew via TAHOT, Greek via TAGNT) — Critical gap
  • ✅ Source tokens (embedded in TAHOT/TAGNT word-level data) — Critical gap
  • ✅ Person names database (TIPNR — ~3,000 individuals + ~1,000 places)
  • ✅ Place names / geocoding (TIPNR includes coordinates)
  • ✅ Versification mapping (TVTMS — multi-tradition)
  • ✅ Extended lexicons (TBESH, TBESG, TFLSJ)
  • 🔜 Septuagint tagged (TAGOT — coming)
  • 🔜 Full BDB Hebrew lexicon (TFBDB — coming)

Integration Notes

  • TSV parsing is trivial in Python — add a StepBibleParser class to the ingest pipeline
  • TAHOT/TAGNT data maps directly to enriching existing :InterlinearWord nodes with morphology, pos, and parsing properties
  • TIPNR would create new :Person and :Place node types in FalkorDB with relationship edges to :Passage nodes
  • TVTMS would create :VersificationMapping edges between :Passage nodes across translations
  • Strong's numbers in STEPBible data align with GospeLib's existing Strong's-keyed lexicon entries
  • Requires new ingest pipeline stages or extension of existing stages (stages for morphology enrichment, people/places, versification)