Clear-Bible MACULA
- Repository: Clear-Bible/macula-greek and Clear-Bible/macula-hebrew
- Maintainer: Clear Bible, Inc. (published by Biblica, Inc.)
- License: CC BY 4.0 (top-level), but MARBLE/UBS semantic domain data is "used with permission" — needs legal review for redistribution. SIL glosses also have a custom license.
- Suitability Score: ⭐⭐⭐⭐⭐ (5/5) — but note licensing caveats for semantic domain data
Coverage
- Format: XML (3 variants: nodes, lowfat, TEI) + TSV flat export. Per-book files.
- macula-greek: Full NT based on Nestle1904 + SBLGNT. Syntax trees with roles, Strong's numbers, Louw-Nida/SDBH semantic domains, semantic frames, participant referents, English + Mandarin glosses, word senses.
- macula-hebrew: Full OT based on Westminster Leningrad Codex. Same depth of analysis. Updated Feb 2026.
- Fields per word: morphology, lemma, Strong's, part of speech, syntax role, semantic domain, word sense, gloss, unique xml:id, USFM ref.
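
A minimal sketch of how the per-word fields listed above could be read from the TSV export. The column names (`xml:id`, `ref`, `text`, `lemma`, `strong`, `class`, `role`, `gloss`), the `Word` class, and the sample rows are assumptions made for illustration; check them against the header of the actual TSV files before relying on them.

```python
import csv
import io
from dataclasses import dataclass

# Made-up sample rows. The column names are assumptions for illustration,
# not the verified header of the MACULA TSV export.
SAMPLE_TSV = (
    "xml:id\tref\ttext\tlemma\tstrong\tclass\trole\tgloss\n"
    "w1\tJHN 1:1!1\tἘν\tἐν\t1722\tprep\t\tIn\n"
    "w2\tJHN 1:1!2\tἀρχῇ\tἀρχή\t746\tnoun\t\tbeginning\n"
)


@dataclass
class Word:
    """One word-level record (hypothetical field selection)."""
    word_id: str
    ref: str
    text: str
    lemma: str
    strong: str
    pos: str
    role: str
    gloss: str


def read_words(handle):
    """Parse a tab-separated stream into Word records."""
    # QUOTE_NONE keeps any quotation marks in the text column literal.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        Word(
            word_id=row["xml:id"],
            ref=row["ref"],
            text=row["text"],
            lemma=row["lemma"],
            strong=row["strong"],
            pos=row["class"],
            role=row["role"],
            gloss=row["gloss"],
        )
        for row in reader
    ]


words = read_words(io.StringIO(SAMPLE_TSV))
```

For real per-book files, the same function would take an open file handle instead of the in-memory sample.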
Quality
Very high. One of the richest open biblical linguistic datasets available; it exceeds STEPBible in depth (syntax trees, semantic roles, coreference). Actively maintained.
Gaps Filled
- ✅ Morphological tags (Hebrew + Greek) — comprehensive per-word analysis with syntax trees
- ✅ Source tokens — individual word forms with lemmas and unique IDs
- ✅ Syntax / discourse analysis (Gap #14) — full syntax trees with semantic roles
- 🔶 Pericope divisions (SBLGNT subdivisions included)
Integration Notes
- XML/TSV parsing straightforward in Python. Tree structure maps naturally to FalkorDB graph.
- Could supersede or complement STEPBible for morphology
- The mixed licensing requires legal review — the morphology + syntax data is CC BY 4.0, but semantic domain labels may have restrictions
- Unique word IDs enable precise alignment with other datasets
- Per-book file structure aligns with existing ingest pipeline patterns
- Would create new `:SyntaxNode` tree structures in FalkorDB linked to `:InterlinearWord` nodes
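
The tree-to-graph mapping described in these notes can be sketched as follows. This is a rough illustration under stated assumptions: the element names `wg` (word group) and `w` (word), the `role` and `lemma` attributes, the sample XML, and the `HAS_CHILD` relationship name are all hypothetical and should be checked against the actual lowfat files and the existing graph schema. The sketch only builds node and edge lists; loading them into FalkorDB would be a separate step.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the standard XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Made-up fragment in an assumed lowfat-like shape (not real MACULA data).
SAMPLE_XML = """
<sentence>
  <wg role="cl" xml:id="wg1">
    <wg role="adv" xml:id="wg2">
      <w xml:id="w1" lemma="ἐν">Ἐν</w>
      <w xml:id="w2" lemma="ἀρχή">ἀρχῇ</w>
    </wg>
    <w xml:id="w3" role="v" lemma="εἰμί">ἦν</w>
  </wg>
</sentence>
""".strip()


def tree_to_graph(root):
    """Flatten a syntax tree into (label, id, props) nodes and (parent, rel, child) edges."""
    nodes, edges = [], []

    def visit(elem, parent_id):
        elem_id = elem.get(XML_ID)
        if elem.tag == "w":
            props = {"text": elem.text, "lemma": elem.get("lemma"), "role": elem.get("role")}
            nodes.append(("InterlinearWord", elem_id, props))
        elif elem.tag == "wg":
            nodes.append(("SyntaxNode", elem_id, {"role": elem.get("role")}))
        else:
            # Wrapper elements produce no node; their children attach to the current parent.
            for child in elem:
                visit(child, parent_id)
            return
        if parent_id is not None:
            edges.append((parent_id, "HAS_CHILD", elem_id))  # hypothetical relationship name
        for child in elem:
            visit(child, elem_id)

    visit(root, None)
    return nodes, edges


nodes, edges = tree_to_graph(ET.fromstring(SAMPLE_XML))
```

Keeping the parser output as plain tuples means the same pass could feed either batched Cypher statements or a bulk loader, whichever the existing ingest pipeline uses.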