Grammar Checking Overview¶
Grammar checking in Divvun Runtime combines morphological analysis, disambiguation, spell checking, and grammar rule application to detect and correct errors in text.
What is Grammar Checking?¶
Grammar checking is a multi-stage pipeline that:
- Tokenizes text into analyzable units
- Analyzes word forms morphologically
- Disambiguates readings using context
- Checks spelling and generates corrections
- Applies grammar rules to detect errors
- Generates suggestions for corrections
Output Format¶
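Grammar checkers return a JSON array with one object per detected error. Each object holds the flagged text (form), its start and end offsets (beg, end), the error tag (err), a message title and description (msg), and the suggested replacements (rep):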
[
  {
    "form": "wrod",
    "beg": 0,
    "end": 4,
    "err": "typo",
    "msg": ["Spelling error", "Word not in dictionary"],
    "rep": ["word", "world", "wor"]
  },
  {
    "form": "has went",
    "beg": 10,
    "end": 18,
    "err": "msyn-verb-form",
    "msg": ["Wrong verb form", "Use 'gone' after 'has'"],
    "rep": ["has gone"]
  }
]
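To make the field semantics concrete, here is a minimal sketch of how a client might consume this output. The `GrammarError` interface and `applyFirstSuggestions` function are hypothetical names that are not part of Divvun Runtime, and the sketch assumes that `beg` and `end` index directly into a JavaScript string (UTF-16 code units) with `end` exclusive, as the sample offsets suggest.

```typescript
// Hypothetical client-side type mirroring the JSON fields shown above.
interface GrammarError {
  form: string;   // the flagged text
  beg: number;    // start offset (assumed: UTF-16 code units)
  end: number;    // end offset, exclusive
  err: string;    // error tag
  msg: string[];  // [title, description]
  rep: string[];  // suggested replacements, possibly empty
}

// Replace each flagged span with its first suggestion.
// Errors are applied from the end of the text backwards so that
// earlier offsets stay valid after each replacement.
function applyFirstSuggestions(text: string, errors: GrammarError[]): string {
  const ordered = [...errors].sort((a, b) => b.beg - a.beg);
  let result = text;
  for (const e of ordered) {
    if (e.rep.length === 0) continue; // nothing to suggest, leave as-is
    result = result.slice(0, e.beg) + e.rep[0] + result.slice(e.end);
  }
  return result;
}

// Illustrative input whose offsets match the sample output above.
const sampleErrors: GrammarError[] = [
  { form: "wrod", beg: 0, end: 4, err: "typo",
    msg: ["Spelling error", "Word not in dictionary"],
    rep: ["word", "world", "wor"] },
  { form: "has went", beg: 10, end: 18, err: "msyn-verb-form",
    msg: ["Wrong verb form", "Use 'gone' after 'has'"],
    rep: ["has gone"] },
];

console.log(applyFirstSuggestions("wrod, she has went home", sampleErrors));
// prints "word, she has gone home"
```

Applying replacements back to front avoids recomputing offsets; a real client would more likely present the suggestions to the user than apply the first one blindly.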
Pipeline Architecture¶
A typical grammar checker pipeline is a TypeScript function that chains the stages, passing the output of each into the next:
export default function grammarChecker(entry: StringEntry): Command {
  // Stage 1: Tokenization
  let x = hfst.tokenize(entry, {
    model_path: "tokeniser-gramcheck-gt-desc.pmhfst"
  });

  // Stage 2: Whitespace analysis
  x = divvun.blanktag(x, {
    model_path: "analyser-gt-whitespace.hfst"
  });

  // Stage 3: Disambiguation
  x = cg3.vislcg3(x, { model_path: "valency.bin" });

  // Stage 4: MWE handling
  x = cg3.mwesplit(x);
  x = cg3.vislcg3(x, { model_path: "mwe-dis.bin" });

  // Stage 5: Spell checking
  x = divvun.cgspell(x, {
    err_model_path: "errmodel.default.hfst",
    acc_model_path: "acceptor.default.hfst"
  });

  // Stage 6: Post-spell disambiguation
  x = cg3.vislcg3(x, { model_path: "valency-postspell.bin" });

  // Stage 7: Grammar checking
  x = cg3.vislcg3(x, { model_path: "grammarchecker.bin" });

  // Stage 8: Generate suggestions
  return divvun.suggest(x, {
    model_path: "generator-gramcheck-gt-norm.hfstol"
  });
}
Key Components¶
1. Error Tagging¶
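CG3 rules mark errors with special tags. Each tag identifies the type of error, which is reported in the err field of the output (for example, typo).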
2. errors.json Mapping¶
The errors.json file maps error tags to Fluent message IDs.
3. Fluent Message Files¶
Fluent (.ftl) files hold the localized error messages, with a title and a .desc description for each message ID:
spelling-error = Spelling error
    .desc = The word {$1} is not in the dictionary.
agr-noun-adj = Agreement error
    .desc = The adjective {$1} should agree with the noun {$2}.
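The three components above form a lookup chain: error tag, then message ID, then localized text. The following is a minimal sketch of that chain and not the runtime's implementation. The in-memory tables, the `typo` to `spelling-error` pairing, the `renderMessage` function, and the treatment of `{$1}` as the first positional argument are all assumptions made for illustration. In particular, the plain object below does not reflect the real errors.json schema.

```typescript
type Message = { title: string; desc: string };

// Hypothetical stand-in for errors.json: error tag -> Fluent message ID.
const tagToMessageId: Record<string, string> = {
  "typo": "spelling-error",
};

// Hypothetical stand-in for parsed errors-{locale}.ftl files.
const messages: Record<string, Record<string, Message>> = {
  en: {
    "spelling-error": {
      title: "Spelling error",
      desc: "The word {$1} is not in the dictionary.",
    },
  },
};

// Resolve a tag to [title, description], trying locales in order
// and filling {$1}, {$2}, ... from the positional args.
function renderMessage(
  tag: string,
  locales: string[],
  args: string[],
): [string, string] | undefined {
  const id = tagToMessageId[tag];
  if (id === undefined) return undefined;
  const fill = (s: string): string =>
    s.replace(/\{\$(\d+)\}/g, (whole: string, n: string) => args[Number(n) - 1] ?? whole);
  for (const locale of locales) {
    const m = messages[locale]?.[id];
    if (m !== undefined) return [fill(m.title), fill(m.desc)];
  }
  return undefined; // no listed locale has this message
}

console.log(renderMessage("typo", ["fo", "en"], ["wrod"]));
```

Because no `fo` table is defined in this sketch, the lookup falls through to `en`, which mirrors the idea of listing locales in order of preference.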
Required Assets¶
A grammar checker project needs:
assets/
├── tokeniser-gramcheck-gt-desc.pmhfst
├── analyser-gt-whitespace.hfst
├── valency.bin
├── mwe-dis.bin
├── errmodel.default.hfst
├── acceptor.default.hfst
├── valency-postspell.bin
├── grammarchecker.bin
├── generator-gramcheck-gt-norm.hfstol
├── errors.json
├── errors-en.ftl
└── errors-{locale}.ftl
Workflow Example¶
Input text: "I has went to school"
Stage 1 - Tokenize: Break the text into words
Stages 2-6 - Analyze, Disambiguate & Spell-check: Add linguistic analysis
Stage 7 - Check Grammar: Detect the "has went" error
Stage 8 - Generate Output: Create the error report
{
  "form": "has went",
  "err": "msyn-verb-form",
  "msg": ["Wrong verb form", "Use 'gone' after 'has'"],
  "rep": ["has gone"]
}
Runtime Configuration¶
Configure the pipeline at runtime with the -c flag:
# Set locales for error messages
divvun-runtime run -c 'suggest={"locales":["fo","en"]}' bundle.drb "text"
# Ignore specific error types
divvun-runtime run -c 'suggest={"ignore":["typo"]}' bundle.drb "text"
# Use UTF-16 encoding for positions
divvun-runtime run -c 'suggest={"encoding":"utf-16"}' bundle.drb "text"
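Each -c value in the commands above has the shape name=JSON. When invoking the CLI from a script, building that string with JSON.stringify avoids hand-written quoting mistakes. The sketch below assumes exactly that shape. `SuggestOptions` and `configArg` are hypothetical helper names, and whether several options may be combined in one object is an assumption that is not confirmed by the examples above.

```typescript
// Option keys taken from the examples above; all are optional.
interface SuggestOptions {
  locales?: string[];
  ignore?: string[];
  encoding?: string;
}

// Build the value passed after -c, e.g. suggest={"locales":["fo","en"]}
function configArg(command: string, options: SuggestOptions): string {
  return `${command}=${JSON.stringify(options)}`;
}

// Assemble an argument vector for divvun-runtime. Passing this as an
// array to a process-spawning API sidesteps shell quoting entirely.
const cliArgs: string[] = [
  "run",
  "-c",
  configArg("suggest", { locales: ["fo", "en"] }),
  "bundle.drb",
  "text",
];

console.log(cliArgs[2]);
// prints suggest={"locales":["fo","en"]}
```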
Next Steps¶
- Understand the Error System
- Configure command options
- See Complete Example