Rust + Tauri
The Stack
| Layer | Technology | Why |
|---|---|---|
| Core Logic | Rust | Performance, safety, no GC |
| Desktop Shell | Tauri v2 | Lightweight, secure |
| UI | React | Developer familiarity |
| Inference | ONNX Runtime | Universal model format |
Why Tauri?
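Tauri renders the UI in the operating system’s native webview (WebView2 on Windows, WKWebView on macOS, WebKitGTK on Linux) instead of bundling Chromium, which reduces binary size and RAM usage. For a voice assistant that may run continuously in the background, lower idle resource usage matters.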
Note: The app downloads ~500MB of models on first run, so the ML models offset the binary size savings. The main benefit is lower resource usage at runtime.
The Tauri Architecture
┌─────────────────────────────────────────────────┐
│                    Tauri App                    │
│  ┌───────────────────┐  ┌────────────────────┐  │
│  │   Rust Backend    │  │    WebView (UI)    │  │
│  │                   │  │                    │  │
│  │  ┌─────────────┐  │  │  React + TypeScript│  │
│  │  │ Audio Bus   │  │  │                    │  │
│  │  │ STT Engine  │◀─┼──┼─ invoke()          │  │
│  │  │ VAD         │──┼──┼─▶ events           │  │
│  │  └─────────────┘  │  │                    │  │
│  └───────────────────┘  └────────────────────┘  │
└─────────────────────────────────────────────────┘
- Rust Backend: All heavy lifting (audio, inference, VAD)
- WebView: Native OS webview (not bundled Chromium)
- Communication: Tauri’s IPC (commands + events)
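The command and event flow in the diagram can be modelled with nothing but the Rust standard library. The sketch below is an illustration of the message pattern, not Tauri's API: in the real app, `invoke()` and Tauri's event system take the place of the two channels. The names `Command`, `Event`, `run_backend`, and `drive` are hypothetical, and the transcript is a hard-coded placeholder for the audio, VAD, and STT path.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Requests the UI sends to the backend (the role `invoke()` plays in the diagram).
#[derive(Debug, Clone, PartialEq)]
pub enum Command {
    StartRecording,
    StopRecording,
    Shutdown,
}

/// Notifications the backend pushes to the UI (the "events" arrow).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    RecordingStarted,
    Transcript(String),
    RecordingStopped,
}

/// Backend loop: owns all pipeline state and talks to the UI only through channels.
pub fn run_backend(commands: Receiver<Command>, events: Sender<Event>) {
    let mut recording = false;
    for command in commands {
        match command {
            Command::StartRecording if !recording => {
                recording = true;
                let _ = events.send(Event::RecordingStarted);
                // Placeholder for the real audio -> VAD -> STT path.
                let _ = events.send(Event::Transcript("hello world".to_string()));
            }
            Command::StopRecording if recording => {
                recording = false;
                let _ = events.send(Event::RecordingStopped);
            }
            Command::Shutdown => break,
            // Redundant start/stop requests are ignored.
            _ => {}
        }
    }
}

/// Plays the UI role: sends the given commands, then collects every event.
pub fn drive(commands: Vec<Command>) -> Vec<Event> {
    let (command_tx, command_rx) = channel();
    let (event_tx, event_rx) = channel();
    let backend = thread::spawn(move || run_backend(command_rx, event_tx));
    for command in commands {
        // Stop sending if the backend has already shut down.
        if command_tx.send(command).is_err() {
            break;
        }
    }
    let _ = command_tx.send(Command::Shutdown);
    backend.join().expect("backend thread panicked");
    // The backend dropped its sender on exit, so this iterator terminates.
    event_rx.iter().collect()
}
```

Calling `drive(vec![Command::StartRecording, Command::StopRecording])` returns the events in the order the backend emitted them. The UI side never sees audio samples, only the `Event` values.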
The UI is a Passenger
The React frontend is intentionally “dumb”:
- It displays text from the backend
- It sends commands (start/stop recording)
- It never touches audio data directly
This separation means:
- UI bugs can’t crash the audio pipeline
- The UI can be replaced without touching core logic
- Heavy computation never blocks rendering
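One way to picture the "replaceable UI" property is a narrow trait between the core logic and whatever displays the text. This is a minimal sketch under assumptions: `TranscriptSink`, `WindowUi`, `StatusBarUi`, and `run_pipeline` are hypothetical names, and the "transcription" only reports chunk sizes. The actual app crosses this boundary through Tauri's IPC rather than a Rust trait.

```rust
/// The only surface the core pipeline needs from any frontend: text in, nothing out.
pub trait TranscriptSink {
    fn show(&mut self, text: &str);
}

/// Stand-in for the webview frontend: keeps every line it was asked to display.
#[derive(Default)]
pub struct WindowUi {
    pub lines: Vec<String>,
}

impl TranscriptSink for WindowUi {
    fn show(&mut self, text: &str) {
        self.lines.push(text.to_string());
    }
}

/// A second frontend with different presentation; the pipeline code is unchanged.
#[derive(Default)]
pub struct StatusBarUi {
    pub latest: Option<String>,
}

impl TranscriptSink for StatusBarUi {
    fn show(&mut self, text: &str) {
        self.latest = Some(text.to_uppercase());
    }
}

/// Core logic: consumes audio chunks and hands only text to the sink.
/// The "transcription" is a placeholder that reports each chunk's length.
pub fn run_pipeline(chunks: &[Vec<f32>], ui: &mut dyn TranscriptSink) {
    for (index, chunk) in chunks.iter().enumerate() {
        let text = format!("chunk {}: {} samples", index, chunk.len());
        ui.show(&text);
    }
}
```

Because `run_pipeline` accepts any `TranscriptSink`, swapping `WindowUi` for `StatusBarUi` requires no change to the pipeline, and the audio samples never leave the backend side of the boundary.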
Security Model
Tauri uses a capability-based permission system:
// plugins/recorder/permissions/default.json
{
  "permissions": ["recorder:start", "recorder:stop"],
  "deny": ["fs:write", "shell:execute"]
}
Each plugin declares exactly what it needs. Everything else is denied by default.
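The default-deny rule can be illustrated with a small allow/deny check. This sketch does not reproduce Tauri's internal types or resolution logic; `Capability`, `permits`, and `recorder_capability` are hypothetical names, and the permission strings are copied from the recorder example above.

```rust
use std::collections::HashSet;

/// Illustrative permission set for one plugin; not Tauri's internal types.
pub struct Capability {
    allow: HashSet<String>,
    deny: HashSet<String>,
}

impl Capability {
    pub fn new(allow: &[&str], deny: &[&str]) -> Self {
        Capability {
            allow: allow.iter().map(|p| p.to_string()).collect(),
            deny: deny.iter().map(|p| p.to_string()).collect(),
        }
    }

    /// Default-deny: a permission passes only if it is explicitly allowed
    /// and not explicitly denied. Unknown permissions are rejected.
    pub fn permits(&self, permission: &str) -> bool {
        !self.deny.contains(permission) && self.allow.contains(permission)
    }
}

/// Mirrors the recorder example above.
pub fn recorder_capability() -> Capability {
    Capability::new(
        &["recorder:start", "recorder:stop"],
        &["fs:write", "shell:execute"],
    )
}
```

With this model, `recorder_capability().permits("recorder:start")` is true, while both a denied permission such as `fs:write` and an unlisted one such as `fs:read` are rejected.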