TouchDesigner — Voice Transcription Network

System diagram  /  RMS-triggered audio capture + OpenAI Whisper + image snapshot

Legend (node types):
CHOP: channel operator
DAT: data operator
TOP: texture operator
COMP: component
External API: service outside TouchDesigner
input
  audiodevicein1  [Audio Device In CHOP]  Headset mic input, 44100 Hz mono
    ↓ audio
  audiofileout1  [Audio File Out CHOP]  Writes .wav; record toggled by script
    ↓ audio
  analyze1  [Analysis CHOP]  RMS Power, per-frame value
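The Analysis CHOP in this lane reduces each frame of audio to a single RMS value. Assuming the project cooks at 60 FPS (the diagram does not say), a 44100 Hz mono stream delivers 44100 / 60 = 735 samples per frame. The plain-Python sketch below shows the standard root-mean-square calculation that this value is assumed to follow; `frame_rms` is a hypothetical helper, not a TouchDesigner function.

```python
import math
from typing import Sequence

SAMPLE_RATE = 44100                      # audiodevicein1: 44100 Hz mono
FPS = 60                                 # assumption: project cook rate
SAMPLES_PER_FRAME = SAMPLE_RATE // FPS   # 735 samples per frame at 60 FPS


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples; 0.0 for an empty frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    # A 600 Hz tone has a 73.5-sample period, so exactly 10 cycles fit in 735 samples.
    tone = [0.5 * math.sin(2 * math.pi * 600 * n / SAMPLE_RATE)
            for n in range(SAMPLES_PER_FRAME)]
    print(round(frame_rms(tone), 4))              # 0.3536, i.e. 0.5 / sqrt(2)
    print(frame_rms([0.0] * SAMPLES_PER_FRAME))   # 0.0 for silence
```

A sine of amplitude A has an RMS of A/√2, so this 0.5-amplitude test tone sits near 0.354, well above the 0.08 start threshold.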
auto trigger
  analyze1  [Analysis CHOP]  RMS value stream
    ↓ value change
  chopexec2  [CHOP Execute DAT]  onValueChange(); threshold 0.08, silence timeout 2 s, cooldown 3 s
    ↓ controls
  audiofileout1  [Audio File Out CHOP]  par.record toggled; par.file set per take
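The auto-trigger logic in chopexec2 combines three rules: start when RMS reaches 0.08, stop after 2 s below it, and refuse to restart within 3 s of the previous stop. A minimal plain-Python sketch of that state machine follows; the `AutoTrigger` class and its method names are hypothetical, and measuring the cooldown from the end of the previous take is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

RMS_THRESHOLD = 0.08      # minimum RMS that counts as speech
SILENCE_TIMEOUT = 2.0     # seconds below threshold before auto-stop
RESTART_COOLDOWN = 3.0    # assumption: measured from the previous stop


@dataclass
class AutoTrigger:
    """Hypothetical threshold / silence-timeout / cooldown logic for chopexec2."""
    threshold: float = RMS_THRESHOLD
    silence_timeout: float = SILENCE_TIMEOUT
    cooldown: float = RESTART_COOLDOWN
    recording: bool = False
    last_loud: float = 0.0
    last_stop: Optional[float] = None

    def update(self, rms: float, now: float) -> Optional[str]:
        """Feed one RMS value and a time in seconds; return 'start', 'stop' or None."""
        loud = rms >= self.threshold
        if not self.recording:
            cooled = self.last_stop is None or now - self.last_stop >= self.cooldown
            if loud and cooled:
                self.recording = True
                self.last_loud = now
                return "start"
            return None
        if loud:
            self.last_loud = now          # speech continues: push the silence deadline back
            return None
        if now - self.last_loud >= self.silence_timeout:
            self.recording = False
            self.last_stop = now
            return "stop"
        return None


if __name__ == "__main__":
    trigger = AutoTrigger()
    # (time s, RMS): speech at 0.5-1.0 s, silence, then a loud noise at 4.0 s inside the cooldown
    for t, level in [(0.0, 0.01), (0.5, 0.2), (1.0, 0.15), (1.5, 0.02),
                     (3.4, 0.01), (4.0, 0.3), (6.7, 0.3)]:
        event = trigger.update(level, t)
        if event:
            print(t, event)   # prints "0.5 start", "3.4 stop", "6.7 start"
```

In TouchDesigner, the CHOP Execute callback `onValueChange(channel, sampleIndex, val, prev)` could forward `val` and `absTime.seconds` to `update()` and set `op('audiofileout1').par.record` on "start" or "stop". Because that callback fires only when the value changes, a perfectly constant input would never reach the silence check; live mic noise usually keeps the value moving, but a frame-based check is the safer option.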
manual
  button1  [Button COMP]  Press to start / press to stop
    ↓ off to on
  chopexec1  [CHOP Execute DAT]  onOffToOn() calls toggle() to start or stop
    ↓ label
  button1  [Button COMP]  Label: "[ REC ]" / "Sending..." / "Press to record"
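chopexec1 turns each off-to-on press of button1 into a call to toggle(), and the button label tracks the recorder's state. The sketch below models that as a three-state machine; the class, the state names, and the choice to ignore presses while a take is uploading are assumptions rather than details from the diagram.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    SENDING = "sending"


# Button label text for each state, as listed for button1 in the diagram.
LABELS = {
    State.IDLE: "Press to record",
    State.RECORDING: "[ REC ]",
    State.SENDING: "Sending...",
}


class ManualRecorder:
    """Hypothetical model of toggle(): one press starts a take, the next press stops it."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def toggle(self) -> str:
        """Advance the state on a button press and return the new label text."""
        if self.state is State.IDLE:
            self.state = State.RECORDING   # would set audiofileout1.par.record on
        elif self.state is State.RECORDING:
            self.state = State.SENDING     # would stop recording and start _transcribe()
        # Assumption: presses while SENDING are ignored so an upload is not interrupted.
        return LABELS[self.state]

    def transcription_done(self) -> str:
        """Return to idle once the transcript has arrived."""
        self.state = State.IDLE
        return LABELS[self.state]
```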
speech-to-text
  chopexec2  [CHOP Execute DAT]  _transcribe() reads the .wav file
    ↓ HTTP POST
  OpenAI Whisper  [External API]  whisper-1 model, /v1/audio/transcriptions, multipart/form-data
    ↓ JSON text
  transcript_output  [Text DAT]  Latest transcription; resets each take
    ↓ dat.text
  Text TOP  [TOP]  Renders the text as an overlay on the output
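_transcribe() sends the .wav to OpenAI's /v1/audio/transcriptions endpoint as multipart/form-data and reads the transcript from the JSON reply. The stdlib-only sketch below builds that request without sending it; the helper names are hypothetical, while the `file`, `model` and `language` form fields and the `text` key of the default JSON response follow OpenAI's documented API.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(wav_bytes: bytes, filename: str, api_key: str,
                                model: str = "whisper-1",
                                language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for the transcription endpoint."""
    boundary = uuid.uuid4().hex
    fields = {"model": model}
    if language:                      # empty string: omit the field and let the API auto-detect
        fields["language"] = language
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + wav_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))   # closing boundary
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_transcript(response_body: bytes) -> str:
    """Extract the transcript from the default JSON response, e.g. {"text": "..."}."""
    return json.loads(response_body)["text"].strip()
```

Sending it would be `with urllib.request.urlopen(req, timeout=60) as resp: text = parse_transcript(resp.read())`. A blocking call like this on TouchDesigner's main thread stalls cooking until the reply arrives, which is one reason to run the upload in a worker thread while the button shows "Sending...".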
visual capture
  Visual Output TOP  [TOP chain]  StreamDiffusion + text overlay + composite
    ↓ texture
  moviefileout1  [Movie File Out TOP]  top.save() called on silence trigger
    ↓ writes
  take_YYYYMMDD_HHMMSS.jpg  [File on disk]  Saved to Desktop/recordings/
high freq
  audiodevicein1  [Audio Device In CHOP]  Same mic input
    ↓ audio
  audiofilter1  [Audio Filter CHOP]  Band-pass, 4000–8000 Hz
    ↓ filtered
  analyze1_clap  [Analysis CHOP]  RMS Power, high band only
    ↓ value change
  chopexec_clap  [CHOP Execute DAT]  onValueChange(); threshold 0.15, cooldown 1 s
    ↓ index++
  current_question  [Text DAT]  Reads from the questions Table DAT; index loops back after 6 questions
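The clap lane advances current_question whenever the 4–8 kHz band crosses 0.15, ignores repeats within 1 s, and loops at 6 (read here as wrapping back to the first of six questions). A plain-Python sketch of that callback logic follows; `ClapAdvancer` is hypothetical, and counting only rising crossings of the threshold is an assumption meant to stop one sustained clap from counting twice.

```python
from typing import Optional

CLAP_THRESHOLD = 0.15   # high-band (4-8 kHz) RMS that counts as a clap
CLAP_COOLDOWN = 1.0     # seconds during which repeat claps are ignored
NUM_QUESTIONS = 6       # assumption: rows in the questions Table DAT


class ClapAdvancer:
    """Hypothetical model of chopexec_clap: each accepted clap advances the question index."""

    def __init__(self) -> None:
        self.index = 0
        self.last_clap: Optional[float] = None

    def on_value_change(self, val: float, prev: float, now: float) -> bool:
        """Return True if this value change counts as a new clap."""
        # Assumption: only a rising crossing counts (prev below, val at or above threshold).
        if not (val >= CLAP_THRESHOLD > prev):
            return False
        if self.last_clap is not None and now - self.last_clap < CLAP_COOLDOWN:
            return False                                  # still inside the cooldown window
        self.last_clap = now
        self.index = (self.index + 1) % NUM_QUESTIONS     # index++ with wrap-around
        return True
```

Inside TouchDesigner, the new index could then select a row, for example `op('current_question').text = op('questions')[advancer.index, 0].val`, assuming the question text sits in column 0 of the questions Table DAT.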
07 — Output Files Per Take
.wav  Raw audio recording
      Uncompressed 16-bit PCM. Written by audiofileout1.
      Example: take_20260306_114319.wav

.txt  Transcription result
      Plain UTF-8 text. Written by _save_transcript().
      Example: take_20260306_114319.txt

.jpg  Visual snapshot
      Captured at the silence trigger. Written by moviefileout1.save().
      Example: take_20260306_114319.jpg
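All three files for a take share one timestamped stem, so they sort together and can be matched later. A minimal sketch of that naming, assuming the files go to ~/Desktop/recordings as the diagram indicates; `take_paths` and `save_transcript` are hypothetical stand-ins (the latter for `_save_transcript()`).

```python
from datetime import datetime
from pathlib import Path

# Assumption: recordings live in ~/Desktop/recordings, as shown in the diagram.
RECORDINGS_DIR = Path.home() / "Desktop" / "recordings"


def take_paths(when: datetime, folder: Path = RECORDINGS_DIR) -> dict:
    """Return the .wav/.txt/.jpg paths that share one take_YYYYMMDD_HHMMSS stem."""
    stem = "take_" + when.strftime("%Y%m%d_%H%M%S")
    return {ext: folder / f"{stem}.{ext}" for ext in ("wav", "txt", "jpg")}


def save_transcript(text: str, txt_path: Path) -> None:
    """Write the transcript as plain UTF-8 text, creating the folder if needed."""
    txt_path.parent.mkdir(parents=True, exist_ok=True)
    txt_path.write_text(text, encoding="utf-8")
```

The snapshot could then be written with `op('moviefileout1').save(str(paths['jpg']))`, since `save()` is available on any TOP in TouchDesigner's Python API.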
08 — Key Configuration Parameters
Parameter          Default     Description
RMS_THRESHOLD      0.08        Minimum RMS to start recording. Set to ~2x your room noise floor.
SILENCE_TIMEOUT    2.0 s       Seconds of silence before auto-stop and transcription.
RESTART_COOLDOWN   3.0 s       Minimum gap between takes; prevents noise from re-triggering.
CLAP_THRESHOLD     0.15        High-band RMS that counts as a clap. Tune after filtering to 4–8 kHz.
CLAP_COOLDOWN      1.0 s       Repeat clap signals within this window are ignored.
MODEL              whisper-1   OpenAI model. Alternative: gpt-4o-transcribe.
LANGUAGE           en          Language hint for Whisper. Empty string for auto-detect.
MAX_FILE_MB        24 MB       Files above this size are rejected before sending. The OpenAI limit is 25 MB.
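The table's defaults can be gathered into one frozen config object, which also makes the two rules of thumb in the descriptions checkable. The sketch assumes 16-bit mono PCM at 44100 Hz (as stated for the .wav output) and decimal megabytes, the conservative reading of the 25 MB limit; `Config`, `calibrated_threshold`, `within_upload_limit` and `max_take_seconds` are hypothetical names.

```python
import os
from dataclasses import dataclass

MB = 1_000_000   # assumption: decimal megabytes (24 MB stays under 25 MB either way)


@dataclass(frozen=True)
class Config:
    """Defaults from the parameter table; field names mirror the table's constants."""
    RMS_THRESHOLD: float = 0.08
    SILENCE_TIMEOUT: float = 2.0      # seconds
    RESTART_COOLDOWN: float = 3.0     # seconds
    CLAP_THRESHOLD: float = 0.15
    CLAP_COOLDOWN: float = 1.0        # seconds
    MODEL: str = "whisper-1"
    LANGUAGE: str = "en"              # "" means let Whisper auto-detect
    MAX_FILE_MB: float = 24.0


def calibrated_threshold(noise_floor_rms: float, factor: float = 2.0) -> float:
    """Rule of thumb from the table: start threshold of about 2x the room noise RMS."""
    return noise_floor_rms * factor


def within_upload_limit(path: str, cfg: Config = Config()) -> bool:
    """True if the take is small enough to send; larger files are rejected before upload."""
    return os.path.getsize(path) <= cfg.MAX_FILE_MB * MB


def max_take_seconds(cfg: Config = Config(), sample_rate: int = 44100,
                     channels: int = 1, bytes_per_sample: int = 2) -> float:
    """Longest uncompressed PCM take that fits under MAX_FILE_MB (WAV header ignored)."""
    return cfg.MAX_FILE_MB * MB / (sample_rate * channels * bytes_per_sample)
```

At 44100 Hz × 2 bytes × 1 channel = 88,200 bytes per second, the 24 MB cap allows about 272 seconds (roughly 4.5 minutes) of audio per take, ignoring the small WAV header.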