Format Compatibility Assessment
How each data source maps to GospeLib's JSON schema, FalkorDB graph model, and ingest pipeline.
Mapping to scripture-text v2.0.0 JSON Schema
| Source | Native Format | Conversion Difficulty | Notes |
|---|---|---|---|
| STEPBible-Data | TSV | Low | Column mapping to JSON properties is straightforward |
| scrollmapper | SQLite/JSON/CSV | Low | JSON variant is closest; SQLite requires SQL extraction |
| lxx-swete | TSV | Low | Verse-level text → scripture-text passages |
| MorphGNT | Space-separated text | Low | Fixed-column format, easily parsed |
| OpenScriptures morphhb | OSIS XML | Medium | XML namespace handling, `<w>` element extraction |
| viz.bible | CSV/JSON/Neo4j | Low–Medium | JSON is direct; Neo4j dump needs Cypher adaptation |
| biblicalhumanities | HTML/XML/CSV (varies) | Medium | Multiple formats across repos |
| berean.bible | USFM/XLSX/TSV | Medium | USFM needs parser; XLSX needs openpyxl |
| ebible.org | USFM | Medium | Standard format but needs USFM parsing infrastructure |
| unfoldingWord | USFM | Medium | Same USFM consideration as ebible.org |
| marvel.bible | Custom modules | High | Unknown format, likely needs reverse engineering |
| Clear-Bible MACULA | XML (lowfat/nodes/TEI) + TSV | Medium | XML namespaces; TSV variant simplifies parsing |
| ETCBC/dss | Text-Fabric | High | Requires text-fabric Python package; custom ETL |
| SEDRA IV | JSON (REST API) | Low | Standard JSON response; API consumption |
| Sefaria | JSON / MongoDB dump | Medium | Bulk export is MongoDB BSON; API is clean JSON |
| CrossWire SWORD | SWORD binary → OSIS XML | Medium | Requires mod2osis CLI; then standard XML parsing |
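
As an illustration of a "Low" difficulty conversion, the sketch below parses one line of MorphGNT's space-separated text into a JSON-ready dictionary. It assumes the seven-column layout used by the MorphGNT SBLGNT files (reference, part of speech, parsing code, text, word, normalized form, lemma). The output key names are hypothetical placeholders, not the actual scripture-text v2.0.0 property names, and `parse_morphgnt_line` is an illustrative helper rather than part of GospeLib.

```python
import json

# Assumed MorphGNT column order. The key names are illustrative only and
# are NOT taken from the scripture-text v2.0.0 schema.
COLUMNS = ("ref", "pos", "parse", "text", "word", "normalized", "lemma")


def parse_morphgnt_line(line: str) -> dict:
    """Split one whitespace-separated MorphGNT line into a JSON-ready dict."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(
            f"expected {len(COLUMNS)} columns, got {len(fields)}: {line!r}"
        )
    record = dict(zip(COLUMNS, fields))

    # The reference is assumed to be a six-digit code: two digits each for
    # book, chapter, and verse, numbered as in the source file.
    ref = record.pop("ref")
    if len(ref) != 6 or not ref.isdigit():
        raise ValueError(f"unexpected reference code: {ref!r}")
    record["book"] = int(ref[0:2])
    record["chapter"] = int(ref[2:4])
    record["verse"] = int(ref[4:6])
    return record


if __name__ == "__main__":
    sample = "010101 N- ----NSF- Βίβλος Βίβλος βίβλος βίβλος"
    # ensure_ascii=False keeps the Greek text readable in the JSON output.
    print(json.dumps(parse_morphgnt_line(sample), ensure_ascii=False, indent=2))
```

A real pre-parser would still need to map these fields onto the schema's own property names and group words into passages.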
Mapping to FalkorDB Graph Model
| Data Type | Source(s) | Graph Mapping |
|---|---|---|
| Morphological enrichment | STEPBible, MorphGNT, morphhb | Properties on existing :InterlinearWord nodes |
| New translations | scrollmapper, lxx-swete, ebible, berean | New :Translation → :Book → :Passage subgraphs |
| Cross-references | scrollmapper | :CROSS_REFERENCES edges between :Passage nodes (with votes property) |
| People | STEPBible TIPNR, viz.bible | New :Person nodes with :MENTIONED_IN edges to :Passage |
| Places | STEPBible TIPNR, viz.bible | New :Place nodes with :MENTIONED_IN edges to :Passage, geocoding properties |
| Versification | STEPBible TVTMS | :MAPS_TO edges between :Passage nodes across traditions |
| Events | viz.bible | New :Event nodes with :PARTICIPANT, :LOCATED_AT, :REFERENCED_IN edges |
| Lexicon additions | STEPBible TFLSJ, Dodson, berean | Properties/nodes enriching existing :LexiconEntry nodes |
| Syntax trees | MACULA | :SyntaxNode tree with :CHILD_OF edges, linked to :InterlinearWord |
| DSS transcriptions | ETCBC/dss | :Manuscript → :Fragment → :DSSWord nodes with linguistic properties |
| Aramaic lexicon | SEDRA IV, Sefaria | :AramaicLexiconEntry nodes or combined into :LexiconEntry with language: "aramaic" |
| Commentary | SWORD | :Commentary → :CommentaryEntry nodes with :COMMENTS_ON edges to :Passage |
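
To make the cross-reference row concrete, here is a minimal sketch of how `:CROSS_REFERENCES` edges with a `votes` property might be prepared for a parameterized Cypher write. The `ref` property on `:Passage`, the reference string format, and the helper `cross_reference_rows` are assumptions for illustration; the source data's actual column names and GospeLib's passage keys may differ.

```python
from typing import Iterable

# Hypothetical Cypher: assumes :Passage nodes carry a unique `ref` property.
# MERGE on the edge means re-running the write does not duplicate it.
CROSS_REF_QUERY = """
UNWIND $rows AS row
MATCH (src:Passage {ref: row.from_ref})
MATCH (dst:Passage {ref: row.to_ref})
MERGE (src)-[r:CROSS_REFERENCES]->(dst)
SET r.votes = row.votes
"""


def cross_reference_rows(
    records: Iterable[tuple[str, str, int]],
) -> list[dict]:
    """Turn (from_ref, to_ref, votes) tuples into UNWIND parameter rows.

    Duplicate (from_ref, to_ref) pairs are collapsed; the last one wins.
    """
    rows: dict[tuple[str, str], dict] = {}
    for from_ref, to_ref, votes in records:
        rows[(from_ref, to_ref)] = {
            "from_ref": from_ref,
            "to_ref": to_ref,
            "votes": int(votes),
        }
    return list(rows.values())


if __name__ == "__main__":
    sample = [
        ("Gen.1.1", "John.1.1", 50),
        ("Gen.1.1", "Heb.11.3", 42),
        ("Gen.1.1", "John.1.1", 51),  # duplicate pair, replaces the first
    ]
    params = {"rows": cross_reference_rows(sample)}
    print(CROSS_REF_QUERY.strip())
    print(params)
```

The query text and the `rows` parameter would then be handed to whatever graph client the pipeline uses. Matching both passages before the `MERGE` means a reference to a passage that has not been ingested is silently skipped rather than creating a stray node.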
Ingest Pipeline Integration
GospeLib's ingest pipeline is a linear, 7-stage Python/Click application with Pydantic validation. New data sources would extend it with the following stages:
| New Stage | Sources | Description |
|---|---|---|
| Morphology enrichment | STEPBible TAHOT/TAGNT, MorphGNT, morphhb | MERGE morphological properties onto existing :InterlinearWord nodes |
| Cross-reference ingestion | scrollmapper | CREATE :CROSS_REFERENCES edges from TSK data |
| People/Places ingestion | STEPBible TIPNR, viz.bible | CREATE :Person and :Place nodes with relationship edges |
| Translation ingestion (extended) | scrollmapper, lxx-swete, ebible, berean | Existing TranslationPipeline with format-specific pre-parsers |
| Versification mapping | STEPBible TVTMS | CREATE :MAPS_TO edges between verse systems |
| Syntax tree ingestion | MACULA | CREATE :SyntaxNode tree structures linked to word-level nodes |
| DSS ingestion | ETCBC/dss | CREATE :Manuscript fragments with word-level linguistic data |
| Aramaic lexicon ingestion | SEDRA IV, Sefaria | CREATE or MERGE :LexiconEntry nodes for Aramaic vocabulary |
| Commentary ingestion | SWORD | CREATE :Commentary and :CommentaryEntry nodes from OSIS XML |
Each new stage would follow the existing pipeline pattern: validate incoming records with Pydantic models, write them in batches using UNWIND with MERGE in Cypher, and rely on MERGE to keep re-runs idempotent.
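
The pattern above can be sketched for a morphology enrichment stage as follows. This is a dependency-free illustration, not GospeLib's code: a standard-library dataclass stands in for the Pydantic model, the field names (`word_id`, `lemma`, `morph_code`) and the Cypher text are hypothetical, and `execute` is an injected callable standing in for the real graph client.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class MorphologyRecord:
    """Stand-in for a Pydantic model; all field names are hypothetical."""

    word_id: str
    lemma: str
    morph_code: str

    def __post_init__(self) -> None:
        # Minimal validation: every field must be a non-empty string.
        for name in ("word_id", "lemma", "morph_code"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value:
                raise ValueError(f"{name} must be a non-empty string")


# Hypothetical Cypher. MERGE on the key makes the write safe to repeat.
ENRICH_QUERY = """
UNWIND $rows AS row
MERGE (w:InterlinearWord {word_id: row.word_id})
SET w.lemma = row.lemma, w.morph_code = row.morph_code
"""


def run_stage(
    raw_rows: Iterable[dict],
    execute: Callable[[str, dict], None],
    batch_size: int = 1000,
) -> int:
    """Validate every row, then write in batches. Returns rows written."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Validate everything up front so a bad row aborts before any write.
    validated = [asdict(MorphologyRecord(**row)) for row in raw_rows]
    for start in range(0, len(validated), batch_size):
        chunk = validated[start:start + batch_size]
        execute(ENRICH_QUERY, {"rows": chunk})
    return len(validated)


if __name__ == "__main__":
    sent: list[dict] = []
    rows = [
        {"word_id": f"w{i}", "lemma": "λόγος", "morph_code": "N-NSM"}
        for i in range(5)
    ]
    count = run_stage(rows, lambda query, params: sent.append(params), 2)
    print(count, [len(p["rows"]) for p in sent])
```

Validating the full input before the first write is one possible design choice: it trades memory for the guarantee that a malformed record never leaves a stage half-applied. Because the write uses MERGE, re-running the stage after fixing the input would update the same nodes rather than duplicate them.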