
Recommended Adoption Plan

A three-phase plan for integrating external data sources, ordered by criticality and dependency.
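Because the phases are ordered partly by dependency, the stage order can be derived mechanically rather than maintained by hand. The sketch below uses Python's standard-library `graphlib` (3.9+). The stage names are hypothetical, and only a subset of the plan is modeled. Each entry lists the stages that must finish first, taken from the Prerequisites lines.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names; each stage maps to the stages it depends on.
# Only a subset of the plan is modeled here.
STAGE_DEPENDENCIES: dict[str, set[str]] = {
    "morphology_enrichment": set(),                        # Phase 1
    "versification": {"morphology_enrichment"},            # Phase 1 sub-stage
    "lxx_ingestion": set(),                                # Phase 1
    "cross_references": {"morphology_enrichment"},         # Phase 2
    "vulgate": {"morphology_enrichment"},                  # Phase 2
    "people_places": {"morphology_enrichment"},            # Phase 2
    "usfm_infrastructure": {"cross_references", "vulgate", "people_places"},  # Phase 3 needs Phase 2
    "ebible_translations": {"usfm_infrastructure"},        # Phase 3
}


def ingest_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every stage runs after all of its prerequisites.

    graphlib raises CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())
```

A cycle introduced by a later schema change would surface as a `CycleError` at planning time instead of as a half-run pipeline.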

Phase 1 — Critical (Immediate)

Fill the most critical gaps with the highest-quality, most permissively licensed sources.

| Source | Datasets | Gaps Filled | License | Complexity |
|---|---|---|---|---|
| STEPBible-Data | TAHOT, TAGNT | Morphological tags + source tokens (Hebrew & Greek) | CC BY 4.0 | Medium: TSV parsing + enrichment of existing interlinear nodes |
| lxx-swete | Full Swete LXX | Septuagint text | Public domain (likely) | Low: simple TSV, new Translation in FalkorDB |
| STEPBible-Data | TVTMS | Versification mapping | CC BY 4.0 | Medium: new edge type in graph |
| Clear-Bible MACULA (optional) | macula-greek, macula-hebrew | Enhanced morphology + syntax trees + semantic roles | CC BY 4.0 (partial) | Medium: XML/TSV parsing + new :SyntaxNode graph structure |
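The morphology enrichment stage starts by turning tab-separated rows into word-level records. The sketch below assumes that word rows begin with a reference such as `Gen.1.1#01=L`, followed by columns for the source token, transliteration, gloss, Strong's number(s) and morphology code. That column order is an assumption and should be checked against the headers of the STEPBible release actually ingested. `SourceToken` and `parse_tagged_rows` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed layout: word rows start with a reference such as "Gen.1.1#01=L",
# followed by tab-separated columns for the source token, transliteration,
# English gloss, Strong's number(s) and morphology code. Verify this order
# against the column headers of the file you ingest.
REF_PATTERN = re.compile(r"^(?P<book>\w+)\.(?P<chapter>\d+)\.(?P<verse>\d+)#(?P<word>\d+)")


@dataclass
class SourceToken:
    book: str
    chapter: int
    verse: int
    word: int             # word position within the verse
    surface: str          # Hebrew or Greek token as written
    transliteration: str
    gloss: str
    strongs: str
    morphology: str


def parse_tagged_rows(text: str) -> list[SourceToken]:
    """Extract word-level rows, skipping headers, notes and blank lines."""
    tokens: list[SourceToken] = []
    for line in text.splitlines():
        columns = line.split("\t")
        match = REF_PATTERN.match(columns[0])
        if match is None or len(columns) < 6:
            continue  # not a word-level data row
        tokens.append(
            SourceToken(
                book=match["book"],
                chapter=int(match["chapter"]),
                verse=int(match["verse"]),
                word=int(match["word"]),
                surface=columns[1],
                transliteration=columns[2],
                gloss=columns[3],
                strongs=columns[4],
                morphology=columns[5],
            )
        )
    return tokens
```

Each record carries (book, chapter, verse, word) so it can be matched to an existing interlinear node, assuming those nodes are keyed the same way.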

Note on MACULA: MACULA offers richer data than STEPBible (morphology plus syntax trees, semantic roles, and coreference), but its semantic domain data requires legal review because the MARBLE/UBS terms are "used with permission". The core morphology and syntax data is CC BY 4.0 and can be adopted independently. If legal review clears the semantic domain data, MACULA could become the primary morphological source, with STEPBible as the fallback.
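The primary/fallback arrangement described in the note could look like the sketch below. The `macula_primary` flag stands in for the outcome of the legal review. The shared token key is hypothetical: MACULA and STEPBible identify words differently, so an alignment step is assumed to have already produced common keys.

```python
from typing import Mapping, Optional

# Hypothetical token key: (book, chapter, verse, word position). An alignment
# step between MACULA and STEPBible word IDs is assumed to have produced it.
TokenKey = tuple[str, int, int, int]


def select_morphology(
    key: TokenKey,
    macula: Mapping[TokenKey, str],
    stepbible: Mapping[TokenKey, str],
    macula_primary: bool,
) -> Optional[tuple[str, str]]:
    """Return (source name, morphology tag) for one token, or None if no source has it.

    MACULA is consulted first only when macula_primary is True; STEPBible is
    always the fallback.
    """
    if macula_primary and key in macula:
        return ("macula", macula[key])
    if key in stepbible:
        return ("stepbible", stepbible[key])
    return None
```

Recording the source name next to each tag keeps provenance visible in the graph if the primary source changes later.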

Prerequisites: None — all data is publicly available on GitHub.

Estimated new ingest pipeline stages: 2 (morphology enrichment, LXX ingestion). Versification can be a sub-stage of morphology enrichment.
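As a sub-stage, versification mapping reduces to emitting one edge per mapped verse pair. TVTMS itself is richer than this. The sketch assumes its rows have already been reduced to one source verse and one target verse per mapping. The `:Verse` label, the `scheme`/`ref` properties and the `MAPS_TO` edge type are hypothetical names. The query is plain openCypher with a `$rows` parameter, intended to be passed to the FalkorDB client's query call.

```python
from dataclasses import asdict, dataclass

# Hypothetical graph names: :Verse nodes keyed by (scheme, ref), joined by MAPS_TO.
MAPPING_QUERY = """
UNWIND $rows AS row
MATCH (s:Verse {scheme: row.source_scheme, ref: row.source_ref})
MATCH (t:Verse {scheme: row.target_scheme, ref: row.target_ref})
MERGE (s)-[:MAPS_TO]->(t)
"""


@dataclass(frozen=True)
class VerseMapping:
    source_scheme: str   # e.g. "English"
    source_ref: str      # e.g. "Mal.4.1"
    target_scheme: str   # e.g. "Hebrew"
    target_ref: str      # e.g. "Mal.3.19"


def build_mapping_batch(mappings: list[VerseMapping]) -> tuple[str, dict]:
    """Return (query, params) for one batch, dropping duplicate mappings."""
    unique = list(dict.fromkeys(mappings))  # frozen dataclass is hashable; order kept
    return MAPPING_QUERY, {"rows": [asdict(m) for m in unique]}
```

Using `MERGE` keeps the sub-stage idempotent, so it can be rerun after each morphology enrichment pass without duplicating edges.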

Impact: Resolves the two Critical gaps (morphology, source tokens) plus one High gap (Septuagint), turning the interlinear feature from a placeholder into a fully functional scholarly tool.

Phase 2 — High Priority

Important enhancements requiring moderate integration work.

| Source | Datasets | Gaps Filled | License | Complexity |
|---|---|---|---|---|
| scrollmapper | cross_references table | Cross-references (~340K) | MIT | Low: SQLite/JSON, new edge type |
| scrollmapper | Vulgate translation tables | Vulgate (5 variants) | MIT | Low: fits existing translation pipeline |
| STEPBible-Data | TIPNR | Person names + place geocoding | CC BY 4.0 | Medium: new node types :Person, :Place |
| MorphGNT | SBLGNT morphology | Cross-validation of Greek NT morphology | CC BY-SA 3.0 | Low: enrichment data, check share-alike |
| OpenScriptures | morphhb | Cross-validation of Hebrew OT morphology | CC BY 4.0 | Low: enrichment data |
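Cross-validation means comparing two sources' tags token by token and reviewing the disagreements. The tagsets differ between sources (for example, STEPBible versus MorphGNT parse codes), so the sketch takes one normalizing function per source. Those mapping functions are assumed, not provided. `cross_validate` and `ValidationReport` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Hashable, Mapping


def _identity(tag: str) -> str:
    return tag


@dataclass
class ValidationReport:
    compared: int                                       # tokens present in both sources
    disagreements: list = field(default_factory=list)   # keys whose normalized tags differ
    only_primary: int = 0
    only_secondary: int = 0

    @property
    def agreement(self) -> float:
        """Share of compared tokens whose normalized tags match."""
        if self.compared == 0:
            return 1.0
        return 1 - len(self.disagreements) / self.compared


def cross_validate(
    primary: Mapping[Hashable, str],
    secondary: Mapping[Hashable, str],
    normalize_primary: Callable[[str], str] = _identity,
    normalize_secondary: Callable[[str], str] = _identity,
) -> ValidationReport:
    """Compare morphology tags for tokens keyed identically in both sources."""
    shared = primary.keys() & secondary.keys()
    disagreements = sorted(
        key for key in shared
        if normalize_primary(primary[key]) != normalize_secondary(secondary[key])
    )
    return ValidationReport(
        compared=len(shared),
        disagreements=disagreements,
        only_primary=len(primary.keys() - secondary.keys()),
        only_secondary=len(secondary.keys() - primary.keys()),
    )
```

The disagreement list is intended for manual review; neither source is treated as ground truth.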

Prerequisites: Phase 1 morphology enrichment stage completed (provides the integration pattern). FalkorDB schema extended for :Person and :Place nodes.
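Once the schema has :Person and :Place labels, the TIPNR stage reduces to batched upserts. The sketch assumes TIPNR entries have already been parsed into simple records with a stable identifier. The record shape, property names and identifiers are hypothetical, and TIPNR's own format is considerably richer.

```python
from dataclasses import asdict, dataclass
from typing import Optional

PERSON_QUERY = """
UNWIND $rows AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name
"""

PLACE_QUERY = """
UNWIND $rows AS row
MERGE (pl:Place {id: row.id})
SET pl.name = row.name, pl.latitude = row.latitude, pl.longitude = row.longitude
"""


@dataclass
class NameRecord:
    kind: str                          # "person" or "place" (assumed classification)
    id: str                            # hypothetical stable key derived from TIPNR
    name: str
    latitude: Optional[float] = None   # places only
    longitude: Optional[float] = None  # places only


def build_name_batches(records: list[NameRecord]) -> list[tuple[str, dict]]:
    """Split records by node type and return (query, params) pairs for non-empty batches."""
    people: list[dict] = []
    places: list[dict] = []
    for record in records:
        if record.kind == "person":
            people.append({"id": record.id, "name": record.name})
        elif record.kind == "place":
            places.append(asdict(record))
        else:
            raise ValueError(f"unknown record kind: {record.kind!r}")
    batches: list[tuple[str, dict]] = []
    if people:
        batches.append((PERSON_QUERY, {"rows": people}))
    if places:
        batches.append((PLACE_QUERY, {"rows": places}))
    return batches
```

Merging on `id` rather than `name` matters because many biblical names refer to several distinct people or places.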

Estimated new ingest pipeline stages: 2 (cross-references, people/places).
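The cross-reference stage reads the scrollmapper SQLite table and emits edges in batches, since roughly 340K edges are too many for one call. The column names below are hypothetical. The real table should be inspected first (for example with the sqlite3 shell's `.schema cross_references`) and the query adjusted. The `CROSS_REFERENCES` edge type and the batch size are also illustrative choices.

```python
import sqlite3

# Hypothetical column names; inspect the real table and adjust before use.
SELECT_SQL = """
SELECT from_verse, to_verse_start, to_verse_end
FROM cross_references
ORDER BY from_verse, to_verse_start
"""

# Illustrative graph names: :Verse nodes keyed by ref, a CROSS_REFERENCES edge
# that stores the end of the target range as a property.
CROSS_REF_QUERY = """
UNWIND $rows AS row
MATCH (a:Verse {ref: row.from_ref})
MATCH (b:Verse {ref: row.to_start})
MERGE (a)-[r:CROSS_REFERENCES]->(b)
SET r.to_end = row.to_end
"""


def load_cross_references(conn: sqlite3.Connection) -> list[dict]:
    """Read rows; a single-verse target gets to_end equal to to_start."""
    rows = []
    for from_ref, to_start, to_end in conn.execute(SELECT_SQL):
        rows.append({"from_ref": from_ref, "to_start": to_start, "to_end": to_end or to_start})
    return rows


def edge_batches(rows: list[dict], size: int = 5000):
    """Yield (query, params) pairs of at most `size` edges each (size is arbitrary)."""
    for start in range(0, len(rows), size):
        yield CROSS_REF_QUERY, {"rows": rows[start:start + size]}
```

Storing the range end on the edge, rather than fanning out one edge per target verse, keeps the edge count close to the source row count.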

Impact: Resolves cross-references (High), Vulgate (Medium), and people/places (Medium). Adds significant navigational and scholarly depth.

Phase 3 — Enhancement

Nice-to-have additions for future consideration.

| Source | Datasets | Gaps Filled | License | Complexity |
|---|---|---|---|---|
| ebible.org | Curated translations (20–30) | Additional translations | Per-translation | Medium: USFM parsing infrastructure |
| berean.bible | BSB + Translation Tables | Modern translation + interlinear alignment | Free | Medium: xlsx/tsv parsing |
| viz.bible | Events database | Timeline features | Request-based | Medium: new :Event node type |
| biblicalhumanities | Dodson Lexicon | Greek lexicon supplement | CC0 | Low: merge with existing entries |
| scrollmapper | Additional translations | Multilingual translations | MIT | Low: fits existing pipeline |
| unfoldingWord | usfm-js, wordMAP | USFM tooling for future ingestion | CC BY-SA 4.0 | Low: tooling dependency |
| ETCBC/dss | Text-Fabric DSS corpus | Dead Sea Scrolls text | MIT | High: Text-Fabric ETL + new node types |
| SEDRA IV + Sefaria | REST API + bulk export | Aramaic lexicon (composite) | Apache 2.0 / CC BY-NC | Medium: API consumption + license review |
| CrossWire SWORD | Commentary modules | Public-domain commentaries (~10) | Public domain | Medium: SWORD→OSIS→JSON pipeline |
| Clear-Bible MACULA | Syntax trees + semantic roles | Discourse analysis | CC BY 4.0 (partial) | Medium: XML parsing + new graph structure |
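The License columns across the three phases mix permissive, share-alike, non-commercial and non-standard terms. A small triage helper can flag which sources need human review before ingestion. The flag names and rules below are an assumed internal policy. The helper only sorts the free-text cells used in these tables and makes no legal determination.

```python
import re


def license_flags(license_text: str) -> set[str]:
    """Classify a free-text license cell into coarse flags for review triage."""
    flags: set[str] = set()
    for part in license_text.split("/"):          # composite cells such as "Apache 2.0 / CC-BY-NC"
        normalized = part.strip().upper().replace("-", " ")
        tokens = set(re.findall(r"[A-Z0-9.]+", normalized))
        recognized = False
        if "CC0" in tokens or normalized.startswith("PUBLIC DOMAIN"):
            flags.add("public-domain")
            recognized = True
        if "BY" in tokens or tokens & {"MIT", "APACHE"}:
            flags.add("attribution")              # credit or notice must be preserved
            recognized = True
        if "SA" in tokens:
            flags.add("share-alike")
        if "NC" in tokens:
            flags.add("non-commercial")
        # Unrecognized terms ("Free", "Request-based", ...) or hedged entries need a person.
        if not recognized or tokens & {"LIKELY", "PARTIAL"}:
            flags.add("needs-review")
    if flags & {"share-alike", "non-commercial"}:
        flags.add("needs-review")
    return flags
```

For example, the SEDRA IV + Sefaria cell yields attribution, non-commercial and needs-review, which matches the license review already listed in its Complexity column.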

Prerequisites: Phase 2 complete. USFM parsing infrastructure built (needed for ebible.org and unfoldingWord sources).
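The core of USFM parsing infrastructure is walking `\id`, `\c` and `\v` markers and collecting verse text. The Python sketch below is a minimal stand-in, not the API of unfoldingWord's usfm-js. It drops footnotes and cross-references (`\f … \f*`, `\x … \x*`) and word attributes such as `|strong="…"`, and its list of non-verse markers is deliberately incomplete.

```python
import re
from dataclasses import dataclass
from typing import Optional

NOTE_RE = re.compile(r"\\(f|fe|x) .*?\\\1\*", re.DOTALL)  # footnotes, endnotes, cross-references
ATTR_RE = re.compile(r"\|[^\\]*")                          # word attributes such as |strong="H776"
MARKER_RE = re.compile(r"\\(\+?[a-z]+[0-9]*\*?)")          # \v, \c, \p, \w, \w*, \+wj ...
# Markers whose following text is not verse content (incomplete list).
NON_VERSE_MARKERS = {"id", "ide", "h", "toc1", "toc2", "toc3", "mt", "mt1", "mt2", "s", "s1", "s2", "r", "rem"}


@dataclass
class Verse:
    book: str
    chapter: int
    verse: str  # kept as text so verse bridges such as "1-2" survive
    text: str


def parse_usfm(source: str) -> list[Verse]:
    """Return the verses in a USFM book, with notes and markup removed."""
    cleaned = ATTR_RE.sub("", NOTE_RE.sub("", source))
    parts = MARKER_RE.split(cleaned)  # [text, marker, text, marker, text, ...]
    book, chapter = "", 0
    verses: list[Verse] = []
    current: Optional[Verse] = None
    for marker, text in zip(parts[1::2], parts[2::2]):
        if marker == "id":
            fields = text.split()
            book = fields[0] if fields else ""
            current = None
        elif marker == "c":
            chapter = int(text.split()[0])  # malformed "\c" without a number raises here
            current = None
        elif marker == "v":
            head = text.split(maxsplit=1)
            current = Verse(book, chapter, head[0], head[1] if len(head) > 1 else "")
            verses.append(current)
        elif marker in NON_VERSE_MARKERS:
            current = None  # headings and titles are not verse text
        elif current is not None:
            current.text += text  # paragraph and character markers: keep the text
    for verse in verses:
        verse.text = " ".join(verse.text.split())  # collapse whitespace left by markers
    return verses
```

A production parser would also need to handle tables, poetry markers and nested notes, which is the infrastructure work this prerequisite refers to.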

Impact: Broadens translation coverage and adds supplementary scholarly tools. Good for internationalization and advanced features.