graphemes++
A grapheme-aware toolkit for evaluating Tamil and Sinhala text. A single visually perceived character often spans several Unicode code points, so these tools measure at the grapheme level, complementing character-based metrics such as chrF.
Graphemizer
Segment text into visually perceived grapheme clusters and see how the cluster count differs from the raw Unicode code point count: the gap that character-level metrics miss.
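To make the gap concrete, here is a minimal sketch in Python. It is not the toolkit's own segmenter: the `segment` function is a hypothetical helper that approximates grapheme clusters by attaching every combining mark (Unicode general category M*) to the preceding base character. That is enough for simple Tamil syllables, but it does not cover the full Unicode segmentation rules, for example Sinhala conjuncts joined with a zero-width joiner.

```python
import unicodedata


def segment(text):
    """Approximate grapheme clusters (simplified, assumption-based rule):
    a combining mark (category Mn, Mc or Me) joins the previous cluster."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch   # vowel sign or virama: extend the cluster
        else:
            clusters.append(ch)  # base character: start a new cluster
    return clusters


# The Tamil word "Tamil", written with escapes so the code points are explicit:
# TA, MA + VOWEL SIGN I, LLLA + VIRAMA
word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

print("code points:", len(word))           # 5 code points
print("graphemes:  ", len(segment(word)))  # 3 clusters
print(segment(word))
```

The two counts differ by two here because the vowel sign and the virama are separate code points that a reader perceives as part of their consonant.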
Open graphemizer →
Evaluation Metrics
Compare a reference and a prediction using grapheme-level chrF / chrF++, CER and Levenshtein distance, each shown next to its standard code-point baseline. Paste text or upload two files.
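As an illustration of how a grapheme-level score can differ from its code-point baseline, the sketch below computes Levenshtein distance and CER both ways. It is an assumption-laden example, not the toolkit's implementation: `segment`, `levenshtein` and `cer` are hypothetical helpers, the segmentation is the simplified "attach combining marks to the previous base" rule, and CER is taken as edit distance divided by reference length.

```python
import unicodedata


def segment(text):
    """Simplified grapheme clustering (assumption): marks join the previous base."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of clusters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(reference, prediction):
    """Error rate: edit distance divided by the reference length."""
    return levenshtein(reference, prediction) / len(reference)


reference = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # TA, MA+I, LLLA+VIRAMA
prediction = "\u0ba4\u0bae\u0bbf\u0bb4"       # same word, final virama missing

code_point_cer = cer(reference, prediction)                  # 1 edit / 5 code points
grapheme_cer = cer(segment(reference), segment(prediction))  # 1 edit / 3 graphemes

print(f"code-point CER: {code_point_cer:.3f}")
print(f"grapheme CER:   {grapheme_cer:.3f}")
```

One missing virama is a single edit in both views, but it damages one of three perceived characters rather than one of five code points, so the grapheme-level CER is higher.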
Open metrics →
Tamil / Sinhala Decomposition
Segment text into visually perceived grapheme clusters, decompose complex scripts into their phonetic units, then recompose to verify the round trip.
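A rough sketch of what such a round trip could look like for Tamil is given below. The decomposition scheme is an assumption made for illustration, not the toolkit's documented behaviour: each consonant cluster is split into a pure consonant (consonant plus virama) and an independent vowel, with a bare consonant carrying the inherent vowel A. All function names are hypothetical, and Sinhala is not handled.

```python
import unicodedata

VIRAMA = "\u0bcd"      # Tamil virama (pulli)
INHERENT_A = "\u0b85"  # independent vowel A, implied by a bare consonant
# Dependent vowel sign -> independent vowel (Tamil block only).
SIGN_TO_VOWEL = {
    "\u0bbe": "\u0b86", "\u0bbf": "\u0b87", "\u0bc0": "\u0b88",
    "\u0bc1": "\u0b89", "\u0bc2": "\u0b8a", "\u0bc6": "\u0b8e",
    "\u0bc7": "\u0b8f", "\u0bc8": "\u0b90", "\u0bca": "\u0b92",
    "\u0bcb": "\u0b93", "\u0bcc": "\u0b94",
}
VOWEL_TO_SIGN = {vowel: sign for sign, vowel in SIGN_TO_VOWEL.items()}


def is_consonant(ch):
    return "\u0b95" <= ch <= "\u0bb9"  # Tamil consonant range


def segment(text):
    """Simplified grapheme clustering (assumption): marks join the previous base."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def decompose(text):
    """Return one tuple of phonetic units per grapheme cluster."""
    units = []
    for g in segment(text):
        if len(g) == 1 and is_consonant(g):
            units.append((g + VIRAMA, INHERENT_A))           # consonant + inherent A
        elif len(g) == 2 and is_consonant(g[0]) and g[1] in SIGN_TO_VOWEL:
            units.append((g[0] + VIRAMA, SIGN_TO_VOWEL[g[1]]))
        else:
            units.append((g,))  # pure consonants, vowels and anything else stay whole
    return units


def recompose(units):
    """Inverse of decompose: rebuild each cluster from its units."""
    out = []
    for unit in units:
        if len(unit) == 2:
            consonant, vowel = unit
            # Inherent A has no sign, so the bare consonant is emitted.
            out.append(consonant[0] + VOWEL_TO_SIGN.get(vowel, ""))
        else:
            out.append(unit[0])
    return "".join(out)


word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # TA, MA+I, LLLA+VIRAMA
units = decompose(word)
print(units)
print("round trip ok:", recompose(units) == word)
```

Keeping one tuple per cluster preserves the cluster boundaries, which is what makes recomposition unambiguous in this sketch.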
Open decomposition →