Healthcare Voice API
Medical-specialized speech recognition and synthesis. Proven accuracy in clinical environments with real-time streaming.
Generic STT models struggle with medical terminology. Persly Voice is trained on healthcare-specific vocabulary for superior accuracy.
100K+ medical terms, drug names, and disease names
Accurately distinguishes homophones in medical context
Cross-language medical term recognition
WebSocket-based live transcription
Automatic doctor/patient conversation separation
Sentence-level timestamps
Per-result confidence scoring
Optimized tone for medical information delivery
Accurate drug and disease name pronunciation
Informational/warning/normal tone selection
Various speaker styles available
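The transcription features listed above (speaker separation and per-result confidence scoring) lend themselves to a simple review step on the client side. The following is a minimal sketch, assuming each result carries a speaker label, the recognized text, and a confidence score between 0.0 and 1.0. The `Segment` class, the `needs_review` helper, and the 0.85 threshold are hypothetical and are not part of the Persly SDK.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Hypothetical transcript segment; not the SDK's actual result type."""
    speaker: str       # e.g. "doctor" or "patient"
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def needs_review(segments: Iterable[Segment], threshold: float = 0.85) -> List[Segment]:
    """Return the segments whose confidence falls below the threshold."""
    return [s for s in segments if s.confidence < threshold]


segments = [
    Segment("doctor", "Start metformin 500 mg twice daily.", 0.97),
    Segment("patient", "Is that with food?", 0.78),
]
for seg in needs_review(segments):
    print(f"[{seg.speaker}] {seg.text} (confidence {seg.confidence:.2f})")
```

In a clinical workflow, segments flagged this way could be routed to a human for confirmation before they reach the record; the right threshold depends on the deployment.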
Speech-to-Text (STT) specifications:

| Specification | Value |
|---|---|
| Supported Languages | 70+ languages |
| Sample Rate | 16kHz / 48kHz |
| Encoding | PCM, WebM, MP3 |
| Latency | < 200ms (streaming) |
| Max Audio Length | 4 hours (batch) / Unlimited (streaming) |

Text-to-Speech (TTS) specifications:

| Specification | Value |
|---|---|
| Supported Languages | 70+ languages |
| Sample Rate | 22.05kHz / 48kHz |
| Output Format | PCM, MP3, OGG |
| First Byte Latency | < 150ms |
| Max Text Length | 10,000 characters / request |
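Because synthesis requests are capped at 10,000 characters, longer documents such as discharge instructions need to be split before they are sent. Below is a minimal sketch of one way to do that, preferring sentence boundaries. The `split_for_tts` helper is hypothetical, and the sentence-splitting rule (whitespace after `.`, `!`, or `?`) is a simplifying assumption that will not suit every language or abbreviation.

```python
import re
from typing import List

MAX_CHARS = 10_000  # per-request limit from the table above


def split_for_tts(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters,
    preferring sentence boundaries and hard-splitting oversized sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own request and the audio concatenated in order.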
import asyncio
from persly import Voice

client = Voice(api_key="YOUR_API_KEY")

async def transcribe_stream():
    # microphone_stream() is a placeholder for your own audio source;
    # it is not defined in this snippet.
    async for result in client.transcribe_stream(
        audio_stream=microphone_stream(),
        language="en",
        enable_medical_mode=True,
        speaker_diarization=True,
    ):
        print(f"[{result.speaker}] {result.text}")
        print(f"  Confidence: {result.confidence}")
        print(f"  Medical terms: {result.medical_terms}")

asyncio.run(transcribe_stream())

Word Error Rate (WER) comparison on medical speech datasets. Lower is better.
Metrics compared: word error rate on general medical conversations; word error rate on medical terminology; drug name accuracy (%), the correct recognition of medication names; and real-time transcription delay.
| Metric | Persly Voice | Competitors Avg |
|---|---|---|
| First Result Latency | 180ms | 350ms |
| Streaming Delay | < 200ms | 400-800ms |
| TTS First Byte | 120ms | 300ms |
* WER = Word Error Rate (lower is better)
* Benchmarked on 1,000 medical consultation recordings across 4 languages
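For readers who want to reproduce a WER figure on their own recordings, the standard definition is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below implements that definition; the lowercasing and whitespace tokenization are assumptions, since the normalization used in the benchmark above is not described.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words; curr is the row being filled.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted drug name out of four reference words.
print(word_error_rate("patient takes metoprolol daily",
                      "patient takes metformin daily"))
```

A single misrecognized drug name in a four-word utterance already yields a WER of 0.25, which is why terminology-specific error rates are reported separately from general ones.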
Live transcription of doctor-patient conversations with automatic speaker separation and EMR integration
Hospital appointment bots, medication reminder calls, health consultation voice assistants
Voice-based prescription writing, medical report dictation, test result recording
Medication guidance voice synthesis, pre-surgery instructions, multilingual patient information
Streaming uses WebSocket for real-time results, ideal for live consultations. Batch processes entire audio files at once, better for transcribing recordings.
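In practice, the difference shows up in how audio is handed over: streaming sends small frames as they are captured, while batch submits the whole recording. The sketch below slices a raw PCM buffer into fixed-duration frames of the kind a streaming client might send. The 16-bit mono format and the 100 ms frame size are assumptions, and whether the Persly SDK accepts frames in this form is not confirmed by this page.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # one of the supported input sample rates
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM
CHANNELS = 1             # assumption: mono
FRAME_MS = 100           # assumption: 100 ms frames


def pcm_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield fixed-duration frames from a raw PCM buffer.
    The final frame may be shorter than the others."""
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * frame_ms // 1000
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]


def duration_seconds(pcm: bytes) -> float:
    """Duration of a raw PCM buffer under the assumptions above."""
    return len(pcm) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


one_second = bytes(32_000)  # one second of silence at 16 kHz, 16-bit mono
frames = list(pcm_frames(one_second))
print(len(frames), len(frames[0]), duration_seconds(one_second))
```

The same `duration_seconds` helper can be used to check a recording against the 4-hour batch limit before uploading it.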
Monthly updates include new drug names and medical terms. Enterprise plans support custom vocabulary additions.
All voice data is encrypted in transit and deleted immediately after processing. We comply with HIPAA, GDPR, and local privacy regulations.
Voice API works as an input/output layer for the Embed → Finder → Rerank → LLM pipeline. Use a single API key for all services.
Let's discuss how our APIs can power your healthcare product