Complete speech-to-speech pipeline for teaching Portuguese, deployable on multiple platforms.
| Platform | Entry point | Protocol | GPU |
|---|---|---|---|
| Modal | modal/ | HTTP + SSE streaming | L4, A10G, A100, H100 |
| TensorDock / VAST.ai | app.py | WebSocket | RTX 3090, RTX 4090 |
Audio -> Whisper v3 Turbo (STT) -> Ministral 3B (LLM) -> Qwen3-TTS (TTS) -> Audio
All three models run in a single GPU container (the "Moshi pattern") to minimize latency; a sketch of this layout follows the component table below.
| Component | Model | Role |
|---|---|---|
| STT | openai/whisper-large-v3-turbo | Speech-to-text |
| LLM | mistralai/Ministral-3-3B-Instruct-2512-BF16 | Response generation |
| TTS | Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice | Text-to-speech (streaming) |
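To make the single-container layout concrete, here is a minimal sketch of what the Modal app could look like, assuming all three models are loaded once in `@modal.enter()`. Class names, the GPU choice, and generation parameters are illustrative, and TTS loading is omitted because the Qwen3-TTS API is model-specific; the actual implementation is modal/modal_parle_whisper.py.

```python
# Hypothetical sketch of the single-container "Moshi pattern" on Modal.
# Names are illustrative; the real app lives in modal/modal_parle_whisper.py.
import modal

image = modal.Image.debian_slim().pip_install("torch", "transformers", "accelerate")
app = modal.App("parle-whisper-sketch")


@app.cls(gpu="L4", image=image)
class SpeechPipeline:
    @modal.enter()
    def load_models(self):
        # Load every model once per container so requests never wait on model init.
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

        self.stt = pipeline(
            "automatic-speech-recognition",
            model="openai/whisper-large-v3-turbo",
            torch_dtype=torch.float16,
            device="cuda",
        )
        self.tokenizer = AutoTokenizer.from_pretrained(
            "mistralai/Ministral-3-3B-Instruct-2512-BF16"
        )
        self.llm = AutoModelForCausalLM.from_pretrained(
            "mistralai/Ministral-3-3B-Instruct-2512-BF16",
            torch_dtype=torch.bfloat16,
            device_map="cuda",
        )
        # TTS loading omitted: the Qwen3-TTS-12Hz-0.6B-CustomVoice API is model-specific.

    @modal.method()
    def respond(self, audio_path: str) -> str:
        # STT -> LLM; TTS synthesis would stream WAV chunks back to the caller.
        transcript = self.stt(audio_path)["text"]
        messages = [{"role": "user", "content": transcript}]
        inputs = self.tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(self.llm.device)
        output = self.llm.generate(inputs, max_new_tokens=256)
        return self.tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```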
pip install modal
export MODAL_TOKEN_ID="ak-..."
export MODAL_TOKEN_SECRET="as-..."
python3 -m modal deploy modal/modal_parle_whisper.py
See modal/README.md for full documentation.
Audio -> Whisper (STT) -> CEFR Classifier -> Gemma 3 4B (LLM) -> Kokoro (TTS) -> Audio
| Component | Model | Role |
|---|---|---|
| STT | openai/whisper-small | Speech-to-text |
| LLM | RedHatAI/gemma-3-4b-it-quantized.w4a16 | Response generation |
| TTS | hexgrad/Kokoro-82M | Text-to-speech |
| CEFR | marcosremar2/cefr-classifier-pt-mdeberta-v3-enem | Level classification |
POST /api/stream-audio (multipart/form-data with an audio file) returns a Server-Sent Events stream with the following events; a minimal Python client is sketched after the event list.
event: status data: {"stage": "stt"}
event: transcript data: {"transcript": "...", "stt_ms": N}
event: status data: {"stage": "llm"}
event: response data: {"response": "...", "llm_ms": N}
event: status data: {"stage": "tts"}
event: audio data: {"chunk": "<base64 WAV>", "index": N}
event: complete data: {"transcript":"...", "response":"...", "timing":{...}}
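A minimal Python client for this stream might look like the following. It assumes the multipart field is named audio and that the server emits standard SSE event:/data: lines; check modal/README.md for the exact field name and deployment URL.

```python
import base64
import json
import requests

URL = "https://YOUR-ENDPOINT/api/stream-audio"  # placeholder deployment URL

def stream_audio(wav_path: str) -> None:
    # Assumption: the multipart field is called "audio"; adjust to the real field name.
    with open(wav_path, "rb") as f:
        audio_bytes = f.read()
    resp = requests.post(
        URL, files={"audio": ("input.wav", audio_bytes, "audio/wav")}, stream=True
    )
    resp.raise_for_status()

    event = None
    chunk_index = 0
    for raw in resp.iter_lines(decode_unicode=True):
        if not raw:                        # blank line terminates one SSE event
            event = None
            continue
        if raw.startswith("event:"):
            event = raw.split(":", 1)[1].strip()
        elif raw.startswith("data:"):
            payload = json.loads(raw.split(":", 1)[1].strip())
            if event == "audio":
                # Each audio event carries one base64-encoded WAV chunk.
                with open(f"chunk_{chunk_index:03d}.wav", "wb") as out:
                    out.write(base64.b64decode(payload["chunk"]))
                chunk_index += 1
            elif event == "transcript":
                print("You said:", payload["transcript"])
            elif event == "response":
                print("Tutor:", payload["response"])

stream_audio("sample.wav")
```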
const ws = new WebSocket('ws://HOST:PORT/ws/stream');
ws.onopen = () => ws.send(audioBlob);      // Send recorded audio once the socket is open
ws.onmessage = (event) => {
  if (event.data instanceof Blob) playAudioChunk(event.data); // binary frame: audio chunk
  else console.log(JSON.parse(event.data));                   // text frame: JSON status/transcript
};
No API keys are hardcoded. All credentials are passed via environment variables:
| Variable | Platform | Description |
|---|---|---|
| MODAL_TOKEN_ID | Modal | Modal API token ID |
| MODAL_TOKEN_SECRET | Modal | Modal API token secret |
| TENSORDOCK_API_TOKEN | TensorDock | TensorDock API token |
| TENSORDOCK_INSTANCE_ID | TensorDock | Instance ID for auto-stop |
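For illustration only, a sketch of how the TensorDock variables might be read; the real lookups live in app.py, and the assumption here is that auto-stop is simply skipped when either variable is unset.

```python
import os

# Modal credentials are consumed by the modal CLI/SDK at deploy time.
MODAL_TOKEN_ID = os.environ.get("MODAL_TOKEN_ID")
MODAL_TOKEN_SECRET = os.environ.get("MODAL_TOKEN_SECRET")

# TensorDock credentials, used by app.py to auto-stop the instance.
TENSORDOCK_API_TOKEN = os.environ.get("TENSORDOCK_API_TOKEN")
TENSORDOCK_INSTANCE_ID = os.environ.get("TENSORDOCK_INSTANCE_ID")

if not (TENSORDOCK_API_TOKEN and TENSORDOCK_INSTANCE_ID):
    print("Auto-stop disabled: TENSORDOCK_API_TOKEN / TENSORDOCK_INSTANCE_ID not set.")
```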
MIT