graphemes++

A grapheme-aware toolkit for evaluating Tamil and Sinhala text. A single visually perceived character is often encoded as several Unicode code points, so these tools measure at the grapheme level, complementing character-based metrics like chrF.

Graphemizer

Segment text into visually perceived grapheme clusters and see how the count differs from the raw Unicode code point count: the gap that character-level metrics miss.

Open graphemizer →
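
To make the gap between the two counts concrete, here is a minimal Python sketch. It is not the graphemizer's own algorithm: `simple_clusters` is a hypothetical helper that only attaches combining marks and zero-width joiners to the preceding base character. That is a rough approximation of grapheme segmentation; it does not implement the full Unicode rules, and it does not merge Sinhala conjuncts written with a zero-width joiner into one cluster.

```python
import unicodedata

ZWJ, ZWNJ = "\u200d", "\u200c"


def simple_clusters(text):
    """Group each base character with the marks that follow it.

    Hypothetical helper: a simplified approximation, not full
    Unicode grapheme cluster segmentation.
    """
    clusters = []
    for ch in text:
        # Vowel signs and the pulli/virama have a general category
        # starting with "M"; joiners are kept with the previous cluster.
        attaches = unicodedata.category(ch).startswith("M") or ch in (ZWJ, ZWNJ)
        if clusters and attaches:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# The Tamil word for "Tamil", written with escapes so the code points are explicit.
word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"
clusters = simple_clusters(word)
print(len(word), len(clusters))  # prints "5 3"
```

The word is stored as five code points but read as three characters, which is the difference the graphemizer reports.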

Evaluation Metrics

Compare a reference and a prediction using grapheme-level chrF / chrF++, CER, and Levenshtein distance, each shown next to its standard code-point baseline. Paste text or upload two files.

Open metrics →
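
The sketch below shows why the grapheme-level and code-point scores can differ for the same pair of strings. It is an illustration under stated assumptions, not the tool's implementation: `simple_clusters`, `levenshtein`, and `cer` are hypothetical names, the segmentation is the same simplified mark-attachment rule rather than full Unicode segmentation, and CER is taken here as edit distance divided by reference length.

```python
import unicodedata


def simple_clusters(text):
    """Simplified segmentation: attach marks and joiners to the previous base."""
    clusters = []
    for ch in text:
        if clusters and (unicodedata.category(ch).startswith("M") or ch in "\u200c\u200d"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def levenshtein(a, b):
    """Edit distance between two sequences (unit cost for each operation)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, prediction, segment=list):
    """Error rate over whatever units `segment` produces (code points by default)."""
    ref_units = segment(reference)
    if not ref_units:
        raise ValueError("reference must not be empty")
    return levenshtein(ref_units, segment(prediction)) / len(ref_units)


reference = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # correct spelling, ends with a pulli
prediction = "\u0ba4\u0bae\u0bbf\u0bb4"       # the final pulli is missing

code_point_cer = cer(reference, prediction)                 # 1 edit over 5 code points
grapheme_cer = cer(reference, prediction, simple_clusters)  # 1 edit over 3 graphemes
print(code_point_cer, grapheme_cer)
```

A single dropped mark costs one edit at either level, but the denominator shrinks from five code points to three graphemes, so the same error weighs more heavily in the grapheme-level score.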

Tamil / Sinhala Decomposition

Segment text into visually perceived grapheme clusters and decompose complex scripts into their phonetic units, then recompose to verify the round trip.

Open decomposition →
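
One way to picture decomposition and the round-trip check is the Tamil-only sketch below. It rests on an assumption: "phonetic units" is read here as a pure consonant (consonant plus pulli) followed by an independent vowel, which may not match what the tool produces. `decompose` and `recompose` are hypothetical names, Sinhala is not covered, and a pure consonant that is directly followed by an independent vowel in the input would be merged on recomposition.

```python
import unicodedata

PULLI = "\u0bcd"       # Tamil virama: marks a consonant with no vowel
INHERENT_A = "\u0b85"  # independent vowel A, implied by a bare consonant

# Dependent vowel sign -> independent vowel letter.
SIGN_TO_VOWEL = {
    "\u0bbe": "\u0b86", "\u0bbf": "\u0b87", "\u0bc0": "\u0b88",
    "\u0bc1": "\u0b89", "\u0bc2": "\u0b8a", "\u0bc6": "\u0b8e",
    "\u0bc7": "\u0b8f", "\u0bc8": "\u0b90", "\u0bca": "\u0b92",
    "\u0bcb": "\u0b93", "\u0bcc": "\u0b94",
}
VOWEL_TO_SIGN = {vowel: sign for sign, vowel in SIGN_TO_VOWEL.items()}


def is_consonant(ch):
    """True for the Tamil consonant letters (U+0B95 to U+0BB9)."""
    return "\u0b95" <= ch <= "\u0bb9"


def decompose(text):
    """Split Tamil text into pure consonants and independent vowels."""
    text = unicodedata.normalize("NFC", text)  # compose two-part vowel signs
    units, i = [], 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if not is_consonant(ch):
            units.append(ch)                       # pass everything else through
            i += 1
        elif nxt == PULLI:
            units.append(ch + PULLI)               # already a pure consonant
            i += 2
        elif nxt in SIGN_TO_VOWEL:
            units += [ch + PULLI, SIGN_TO_VOWEL[nxt]]
            i += 2
        else:
            units += [ch + PULLI, INHERENT_A]      # bare consonant carries A
            i += 1
    return units


def recompose(units):
    """Inverse of decompose for output that decompose itself produced."""
    out, i = [], 0
    while i < len(units):
        unit = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else ""
        pure = len(unit) == 2 and is_consonant(unit[0]) and unit[1] == PULLI
        if pure and nxt == INHERENT_A:
            out.append(unit[0])
            i += 2
        elif pure and nxt in VOWEL_TO_SIGN:
            out.append(unit[0] + VOWEL_TO_SIGN[nxt])
            i += 2
        else:
            out.append(unit)
            i += 1
    return "".join(out)


word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"
units = decompose(word)
round_trip_ok = recompose(units) == word
print(len(units), round_trip_ok)  # prints "5 True"
```

Comparing the recomposed string with the NFC-normalized input is the round-trip check: a mismatch points to a sequence that the decomposition rules do not handle.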