
The app is portfolio-ready today, and it’s also a foundation for something bigger: a benchmarking tool that can compare multiple speech-to-text engines side-by-side.
WERscore Labs
Data Visualization · Full Stack Development
Aug 2022 - Feb 2023
So I focused on two things: making the WER computation understandable, and making the alignment process interactive enough to watch unfold.
This turns an abstract metric into something you can demo to a hiring manager, explain to a teammate, or eventually use to compare ASR engines.
Substitutions are the most informative error type, but showing both words inline can make the line unreadable.
Instead, the UI shows the reference token as the chip, and reveals the hypothesis token on hover using a tooltip. It’s a clean interaction pattern: obvious when you want details, invisible when you don’t.

WER evaluation should be robust to inconsistent spacing and casing, so tokenization is intentionally simple and predictable: lowercase + whitespace split.
```ts
const tokenize = (s: string) => s.toLocaleLowerCase().trim().split(/\s+/).filter(Boolean)
```
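As a quick sanity check, here is the same helper sketched standalone, showing that spacing and casing differences no longer matter after normalization:

```typescript
// Same normalization as the app: lowercase, trim, split on runs of whitespace.
const tokenize = (s: string): string[] =>
  s.toLocaleLowerCase().trim().split(/\s+/).filter(Boolean)

// Messy spacing and casing collapse to the same token stream.
console.log(tokenize("  The   QUICK fox ")) // ["the", "quick", "fox"]
```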
```ts
const compute = useCallback((hypothesis: string, reference: string) => {
  const hTokens = tokenize(hypothesis)
  const rTokens = tokenize(reference)
  setAlignment([])
  setIsRunning(false)
  setReferenceWordCount(rTokens.length)
  const maxSteps = Math.max(hTokens.length, rTokens.length) + 10
  engineRef.current = {
    hTokens,
    rTokens,
    i: 0,
    j: 0,
    step: 0,
    maxSteps,
    done: false,
  }
}, [])
```
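Spelled out as a type, the engine cursor looks roughly like this. The field names come from the snippet above; the type name `AlignmentEngine` is my own label, not the app's:

```typescript
// Inferred shape of the object stored in engineRef.current.
type AlignmentEngine = {
  hTokens: string[] // hypothesis tokens
  rTokens: string[] // reference tokens
  i: number // cursor into hTokens
  j: number // cursor into rTokens
  step: number // steps emitted so far
  maxSteps: number // safety bound on total steps
  done: boolean
}

// Example instance, as compute() would initialize it for one-token inputs
// (maxSteps = max(1, 1) + 10 = 11).
const engine: AlignmentEngine = {
  hTokens: ['hello'],
  rTokens: ['hello'],
  i: 0,
  j: 0,
  step: 0,
  maxSteps: 11,
  done: false,
}
```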
The goal here: reset UI state, store token arrays, and set up a small “engine cursor” (i, j) that we can animate forward.

Instead of calculating everything at the end, the UI derives totals from the emitted alignment tokens — meaning the counters and WER stay correct during playback and manual stepping.
```ts
const totals = useMemo(() => {
  return alignment.reduce(
    (acc, { type }) => {
      acc[type] += 1
      return acc
    },
    { ...emptyTotals },
  )
}, [alignment])

const wer = useMemo(() => {
  if (referenceWordCount === 0) return undefined
  const errors = totals.DELETED + totals.INSERTED + totals.SUBSTITUTED
  return errors / referenceWordCount
}, [totals, referenceWordCount])
```
This is what makes the project feel like a live instrument panel rather than a static result page.
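To make the arithmetic concrete, here is the same derivation as a plain function with a made-up worked example (the totals shape is simplified to just the three error types; the numbers are illustrative, not from the app):

```typescript
type ErrorTotals = { DELETED: number; INSERTED: number; SUBSTITUTED: number }

// WER = (deletions + insertions + substitutions) / reference word count.
// Undefined when the reference is empty, to avoid dividing by zero.
const computeWer = (totals: ErrorTotals, referenceWordCount: number): number | undefined => {
  if (referenceWordCount === 0) return undefined
  const errors = totals.DELETED + totals.INSERTED + totals.SUBSTITUTED
  return errors / referenceWordCount
}

// Example: 1 deletion, 1 insertion, 2 substitutions over a 10-word reference.
console.log(computeWer({ DELETED: 1, INSERTED: 1, SUBSTITUTED: 2 }, 10)) // 0.4
```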
The heart of the demo is the single-step function. Each call emits exactly one alignment token — which then appears as one animated chip.
A key detail: when there’s a mismatch, the engine checks one token ahead to decide whether the best explanation is a deletion, an insertion, or a substitution:
```ts
if (h !== r) {
  // Deletion: the next reference token matches the current hypothesis token,
  // so the hypothesis most likely dropped a word.
  if (rTokens[e.j + 1] === h) {
    setAlignment((prev) => [...prev, { word: r!, type: 'DELETED', substitution: undefined }])
    e.j += 1
    e.step += 1
    return
  }
  // Insertion: the next hypothesis token matches the current reference token,
  // so the hypothesis most likely added an extra word.
  if (hTokens[e.i + 1] === r) {
    setAlignment((prev) => [...prev, { word: h!, type: 'INSERTED', substitution: undefined }])
    e.i += 1
    e.step += 1
    return
  }
  // Otherwise: treat the mismatch as a substitution and advance both cursors.
  setAlignment((prev) => [...prev, { word: r!, type: 'SUBSTITUTED', substitution: h }])
  e.i += 1
  e.j += 1
  e.step += 1
  return
}
```
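To see the heuristic in isolation, here is a pure sketch of the same loop with the React state stripped out, so it can be unit-tested directly. The function name `alignGreedy` and the `'MATCHED'` label are my own (the app's name for matched tokens isn't shown above); the branch logic mirrors the step function:

```typescript
type AlignmentType = 'MATCHED' | 'DELETED' | 'INSERTED' | 'SUBSTITUTED'
type AlignmentToken = { word: string; type: AlignmentType; substitution?: string }

// Greedy alignment with one token of lookahead: a mismatch is a deletion if
// the *next* reference token matches the current hypothesis token, an
// insertion in the symmetric case, and a substitution otherwise.
const alignGreedy = (hTokens: string[], rTokens: string[]): AlignmentToken[] => {
  const out: AlignmentToken[] = []
  let i = 0
  let j = 0
  while (i < hTokens.length || j < rTokens.length) {
    const h = hTokens[i]
    const r = rTokens[j]
    if (h === undefined) {
      // Reference tokens left over: the hypothesis dropped them.
      out.push({ word: r, type: 'DELETED' })
      j += 1
    } else if (r === undefined) {
      // Hypothesis tokens left over: extra words were inserted.
      out.push({ word: h, type: 'INSERTED' })
      i += 1
    } else if (h === r) {
      out.push({ word: r, type: 'MATCHED' })
      i += 1
      j += 1
    } else if (rTokens[j + 1] === h) {
      out.push({ word: r, type: 'DELETED' })
      j += 1
    } else if (hTokens[i + 1] === r) {
      out.push({ word: h, type: 'INSERTED' })
      i += 1
    } else {
      out.push({ word: r, type: 'SUBSTITUTED', substitution: h })
      i += 1
      j += 1
    }
  }
  return out
}

// Example: an extra word in the hypothesis is flagged as an insertion.
console.log(alignGreedy(['the', 'big', 'cat'], ['the', 'cat']).map((t) => t.type))
// → ["MATCHED", "INSERTED", "MATCHED"]
```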
This structure is intentionally simple: the project is meant to be understandable at a glance and explainable step by step, not a production-grade alignment library.
The alignment tokens render as chips, animated into view as they stream in. Substitutions get tooltips to reveal the hypothesis token on hover.
```tsx
<motion.span
  className={`rounded px-1 py-0.5 ${colorHighlightClass[t.type]} ${t.type === 'INSERTED' ? 'line-through' : ''}`}
  initial={{ opacity: 0, y: -10 }}
  animate={{ opacity: 1, y: 0 }}
  transition={transition}
>
  {t.word}
</motion.span>
```
This is the “aha” moment of the project: the user doesn’t just read a score — they watch the operations that create it.

This app already works as a clean portfolio demo, but it’s also a strong base for a more practical tool.
Planned upgrades center on the benchmarking direction: running the same text through multiple speech-to-text engines and comparing their alignments and WER side-by-side.

To run the demo locally:

```bash
npm install
npm run dev
```
This started as a screening prompt, but I treated it like a product: make WER understandable, interactive, and explainable. The end result is a visual alignment engine with live metrics, animated word-level feedback, and a clear path to evolving into a full benchmarking suite for speech-to-text engines.