We built Helix around the approval loop, with voice as a first-class capture mode alongside typed input. The idea was straightforward: most owner-operators are mobile and busy, voice is the fastest input on a phone wherever it works, and approving a draft is the fastest output. Together, the two should collapse a 200-email morning into a 60-second swipe.

Most of that worked. Some of it didn’t. These are the design notes.

What worked

Voice as the fast path, not the only path

The single best decision was treating typed input as a peer of voice, not a fallback. Every voice flow has a typed equivalent that runs the same logic with the same UX. When the Web Speech API is unavailable (older iOS Safari, some embedded browsers), nothing breaks; the user just types.
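As a rough illustration of the peer relationship, here is a minimal sketch of capture-mode detection. The function and type names (pickCaptureMode, handleCapture, SpeechGlobals) are hypothetical and not taken from the Helix codebase; only the SpeechRecognition and webkitSpeechRecognition globals come from the Web Speech API. The normalisation inside handleCapture is a stand-in for whatever shared logic both paths feed.

```typescript
// Hypothetical sketch: names below are illustrative, not Helix's actual code.
type CaptureMode = "voice" | "typed";

// The two globals a browser may expose for speech recognition.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

// Offer voice only when a recognition constructor exists; typed always works.
function pickCaptureMode(env: SpeechGlobals): CaptureMode {
  const ctor = env.SpeechRecognition ?? env.webkitSpeechRecognition;
  return typeof ctor === "function" ? "voice" : "typed";
}

// Both modes hand their text to the same function, so downstream logic
// cannot drift between the voice path and the typed path.
// (Whitespace normalisation here is an assumed placeholder for that logic.)
function handleCapture(text: string): string {
  return text.trim().replace(/\s+/g, " ");
}

// In a browser this might be called as:
//   pickCaptureMode(globalThis as unknown as SpeechGlobals)
```

Because the detection result only changes which input control is shown, a missing API never reaches the shared handler as an error.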

A ranked queue, not autocomplete on the input

We tried autocompleting voice input early on. It felt magical for two minutes and broken thereafter — voice transcripts are messy, and a half-formed correction is worse than no correction. We moved the ranking layer one step downstream: instead of predicting what you’re saying, we rank the queue so the most likely next approval is on top. The v1 ranking uses heuristics (recency, sender, thread state); per-identity learning is on the roadmap. The ranking runs client-side and never requires the user to disambiguate.
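A heuristic ranking of this kind could look roughly like the sketch below. The field names, the weights, and the decay formula are all assumptions made for illustration; the notes only say that v1 uses recency, sender, and thread state.

```typescript
// Hypothetical sketch: field names and weights are assumptions, not Helix's.
interface QueueItem {
  id: string;
  receivedAt: number;      // epoch milliseconds
  senderIsKnown: boolean;  // assumed sender signal
  awaitingReply: boolean;  // assumed thread-state signal
}

const HOUR_MS = 60 * 60 * 1000;

// Higher score means "more likely to be the next approval".
function score(item: QueueItem, now: number): number {
  const ageHours = Math.max(0, now - item.receivedAt) / HOUR_MS;
  const recency = 1 / (1 + ageHours); // 1.0 when brand new, decays with age
  const sender = item.senderIsKnown ? 0.5 : 0;
  const thread = item.awaitingReply ? 1 : 0;
  return recency + sender + thread;
}

// Returns a new array, best candidate first; the input is left untouched.
function rankQueue(items: QueueItem[], now: number): QueueItem[] {
  return [...items].sort((a, b) => score(b, now) - score(a, now));
}
```

The function is pure and needs no network call, which is compatible with running the ranking client-side. It only orders the queue; it never asks the user anything.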

Approve in one tap, edit in two

Approve is one tap. Edit is the same tap held a beat longer. Skip is a swipe. Three gestures, no menus, and the gesture set works whether your hands are dry or wet, gloved or bare.
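One way to tell the three gestures apart is to classify each touch by how long it lasted and how far it travelled horizontally. The sketch below assumes that approach; the thresholds (450 ms for a hold, 60 px for a swipe) and all names are hypothetical values chosen for illustration.

```typescript
// Hypothetical sketch: thresholds and names are assumptions, not Helix's.
type Gesture = "approve" | "edit" | "skip";

interface TouchSample {
  durationMs: number; // time between touch start and touch end
  dxPx: number;       // horizontal travel, signed
}

const HOLD_MS = 450;  // assumed: "a beat longer" than a tap
const SWIPE_PX = 60;  // assumed: minimum travel to count as a swipe

function classifyGesture(t: TouchSample): Gesture {
  // Movement wins over duration, so a slow swipe is still a skip.
  if (Math.abs(t.dxPx) >= SWIPE_PX) return "skip";
  return t.durationMs >= HOLD_MS ? "edit" : "approve";
}
```

Checking travel before duration keeps a slow, deliberate swipe from being misread as an edit.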

What failed

Always-on dictation

We shipped an early version with a wake-word listener. It drained too much battery, triggered on false positives, and freaked out a few users who did not realise the mic was hot. We removed it before launch. Voice now starts with an explicit tap-and-hold; the mic stops the moment you release.
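The tap-and-hold behaviour can be sketched as a small controller that owns the "is the mic hot" state. The PushToTalk class and the Recognizer interface are hypothetical; the interface mirrors the start() and stop() methods that a Web Speech API SpeechRecognition object provides, so a real recognizer could be passed in directly.

```typescript
// Hypothetical sketch: class and interface names are illustrative only.
// Recognizer matches the start()/stop() shape of SpeechRecognition.
interface Recognizer {
  start(): void;
  stop(): void;
}

class PushToTalk {
  private listening = false;
  private readonly recognizer: Recognizer;

  constructor(recognizer: Recognizer) {
    this.recognizer = recognizer;
  }

  // Call on pointerdown. The guard avoids calling start() twice,
  // which a real SpeechRecognition rejects with an InvalidStateError.
  press(): void {
    if (this.listening) return;
    this.listening = true;
    this.recognizer.start();
  }

  // Call on pointerup and pointercancel so the mic never stays hot.
  release(): void {
    if (!this.listening) return;
    this.listening = false;
    this.recognizer.stop();
  }

  get isListening(): boolean {
    return this.listening;
  }
}
```

Wiring release() to pointercancel as well as pointerup matters in this sketch: if the finger slides off the button or the OS interrupts the touch, the microphone still closes.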

Cross-locale TTS for narration

The narrated walkthrough at /what-is-helix sounds great in English. We tried generating equivalent voice tracks for Spanish, French, and Japanese, and the results landed in the uncanny valley. Until we have time to do it right, the walkthrough ships English-only with a captions track for everyone else.

What we kept

Voice as a fast path, where it pays off: triage, approval, and the agent setup flow. The typed flow runs the same logic with no missing features — it is a peer, not a fallback. The narration plays once, gracefully degrades on autoplay-blocked browsers, and never gates the demo behind audio. The ranked queue surfaces the next decision but never autoplays an action.

The result: the design language is "press the mic, hear the agent, tap to approve" — and every step still works if you turn the audio off.

Voice-first inboxes — design notes