📚 Knowledge Library
Complete reference for ScribeLive capabilities, languages, and API features
Generates a concise summary of the transcript. Available in Batch API only (SaaS).
| Option | Values | Default |
|---|---|---|
| content_type | auto, conversational, informative | auto |
| summary_length | brief, detailed | brief |
| summary_type | paragraphs, bullets | paragraphs |
JSON output only. Most languages.
API path: response.summary.content
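A minimal sketch of how the options above could be validated before they are sent, and how the summary might be read back from `response.summary.content`. The helper names `build_summarization_config` and `extract_summary` and the sample response are hypothetical; only the option names, allowed values, and the API path come from this reference.

```python
# Allowed values, copied from the options table above.
ALLOWED = {
    "content_type": {"auto", "conversational", "informative"},
    "summary_length": {"brief", "detailed"},
    "summary_type": {"paragraphs", "bullets"},
}


def build_summarization_config(**options):
    """Return a summarization config dict; an empty dict requests the defaults."""
    for key, value in options.items():
        if key not in ALLOWED:
            raise ValueError(f"unknown option: {key}")
        if value not in ALLOWED[key]:
            raise ValueError(f"invalid value for {key}: {value!r}")
    return dict(options)


def extract_summary(response):
    """Read response.summary.content, returning None when no summary is present."""
    return (response.get("summary") or {}).get("content")


config = build_summarization_config(summary_length="detailed", summary_type="bullets")
sample_response = {"summary": {"content": "- Point one\n- Point two"}}  # made-up payload
print(config)
print(extract_summary(sample_response))
```

Validating locally keeps a typo in an option value from reaching the engine as a rejected job.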
Labels transcript segments as positive, negative, or neutral with confidence scores. Includes per-speaker and per-channel breakdown.
| Data | Location |
|---|---|
| Segments array | sentiment_analysis.segments[] |
| Each segment has | text, sentiment, start_time, end_time, speaker, channel, confidence |
| Overall summary | sentiment_analysis.summary.overall (positive_count, negative_count, neutral_count) |
| Per speaker | sentiment_analysis.summary.speakers[] |
| Per channel | sentiment_analysis.summary.channels[] |
JSON output only. Most languages. Batch + On-prem.
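As an illustration of the segment layout above, the following sketch tallies sentiment labels per speaker from `sentiment_analysis.segments[]`. The function name `sentiment_by_speaker`, the `min_confidence` parameter, and the sample data are assumptions made for this example; the field names are the ones listed in the table.

```python
from collections import Counter, defaultdict


def sentiment_by_speaker(result, min_confidence=0.0):
    """Count sentiment labels per speaker, skipping low-confidence segments."""
    counts = defaultdict(Counter)
    segments = (result.get("sentiment_analysis") or {}).get("segments") or []
    for seg in segments:
        if seg.get("confidence", 0.0) < min_confidence:
            continue
        counts[seg.get("speaker") or "unknown"][seg["sentiment"]] += 1
    return {speaker: dict(c) for speaker, c in counts.items()}


# Made-up sample shaped like the table above.
sample = {
    "sentiment_analysis": {
        "segments": [
            {"text": "Thank you, that helps.", "sentiment": "positive",
             "speaker": "S1", "confidence": 0.93},
            {"text": "I disagree.", "sentiment": "negative",
             "speaker": "S2", "confidence": 0.81},
            {"text": "Noted.", "sentiment": "neutral",
             "speaker": "S1", "confidence": 0.42},
        ]
    }
}
print(sentiment_by_speaker(sample, min_confidence=0.5))
```

The engine already returns a per-speaker breakdown in `sentiment_analysis.summary.speakers[]`; recomputing it locally is only useful when a confidence threshold should be applied first.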
Splits transcript into titled, timestamped chapters with summaries. Non-overlapping and exhaustive (covers entire audio).
| Field | Description |
|---|---|
| title | Auto-generated chapter title |
| summary | Paragraph summary of chapter content |
| start_time | Seconds (float) |
| end_time | Seconds (float) |
JSON output only. Best with 10+ min audio. Best with diarization ON.
Not supported: Irish, Maltese, Urdu, Bengali, Swahili
API path: response.chapters[]
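Because chapters are described as non-overlapping and exhaustive, a consumer can sanity-check a response before rendering it. The sketch below is one way to do that; `check_chapters`, the `tolerance` parameter, and the assumption that the first chapter starts at 0 seconds are illustrative and not part of the API.

```python
def check_chapters(chapters, audio_duration, tolerance=0.01):
    """Return a list of problems: gaps, overlaps, or audio left uncovered."""
    problems = []
    ordered = sorted(chapters, key=lambda c: c["start_time"])
    cursor = 0.0  # end of the audio covered so far, in seconds
    for chapter in ordered:
        if chapter["start_time"] > cursor + tolerance:
            problems.append(f"gap before {chapter['title']!r}")
        elif chapter["start_time"] < cursor - tolerance:
            problems.append(f"overlap at {chapter['title']!r}")
        cursor = max(cursor, chapter["end_time"])
    if cursor < audio_duration - tolerance:
        problems.append("chapters end before the audio does")
    return problems


# Made-up chapters shaped like response.chapters[].
good = [
    {"title": "Opening", "summary": "...", "start_time": 0.0, "end_time": 60.0},
    {"title": "Evidence", "summary": "...", "start_time": 60.0, "end_time": 120.0},
]
print(check_chapters(good, audio_duration=120.0))
```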
Detects topics in the audio and labels segments. Default topics: Business & Finance, Education, Entertainment, Events & Attractions, Food & Drink, News & Politics, Science, Sports, Technology & Computing, Travel.
Custom topics: up to 10 can be provided via topic_detection_config.topics[].
| Data | Location |
|---|---|
| Segments | topics.segments[] — each has text, start_time, end_time, topics[] |
| Summary | topics.summary.overall — object with topic counts |
English audio ONLY. JSON output only. Max 10 custom topics.
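The limits above (English audio only, at most 10 custom topics) can be enforced before a job is submitted. This is a hedged sketch: `build_topic_detection_config` is a hypothetical helper, and the language code `"en"` for English is an assumption.

```python
MAX_CUSTOM_TOPICS = 10  # limit stated in this reference


def build_topic_detection_config(language, custom_topics=None):
    """Return a topic_detection_config dict, or None when topics should be skipped."""
    if language != "en":  # assumed code for English audio
        return None  # topic detection only applies to English audio
    topics = list(custom_topics or [])
    if len(topics) > MAX_CUSTOM_TOPICS:
        raise ValueError(f"at most {MAX_CUSTOM_TOPICS} custom topics are allowed")
    # An empty dict asks for the default topic list.
    return {"topics": topics} if topics else {}


print(build_topic_detection_config("en", ["Contracts", "Evidence"]))
print(build_topic_detection_config("ms"))
```

Returning `None` for non-English audio lets the caller leave the key out of the job config and tell the user why no topics will appear.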
Detects non-speech audio events: music, applause, laughter. Returns timestamped events with confidence scores.
| Field | Description |
|---|---|
| type | Event type (music, applause, laughter) |
| start_time | Seconds (float) |
| end_time | Seconds (float) |
| confidence | 0.0 to 1.0 |
| channel | If channel diarization enabled |
All languages
API paths: audio_events[] and audio_event_summary.overall
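To show how the event fields might be consumed, this sketch totals the detected duration per event type from `audio_events[]`, ignoring low-confidence events. `seconds_per_event_type`, the 0.5 threshold, and the sample payload are assumptions for illustration.

```python
from collections import defaultdict


def seconds_per_event_type(result, min_confidence=0.5):
    """Sum event durations (in seconds) per type for sufficiently confident events."""
    totals = defaultdict(float)
    for event in result.get("audio_events") or []:
        if event["confidence"] >= min_confidence:
            totals[event["type"]] += event["end_time"] - event["start_time"]
    return dict(totals)


# Made-up events shaped like the field table above.
sample = {
    "audio_events": [
        {"type": "music", "start_time": 0.0, "end_time": 12.5, "confidence": 0.9},
        {"type": "applause", "start_time": 30.0, "end_time": 34.0, "confidence": 0.4},
        {"type": "laughter", "start_time": 40.0, "end_time": 41.5, "confidence": 0.8},
    ]
}
print(seconds_per_event_type(sample))
```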
Audio Clarity (audio_enhancement): Pre-processes audio to reduce noise before transcription. Applied during file conversion.
Clean Speech (remove_disfluencies): Removes filler words such as "um", "uh", and "like" from the transcript output.
All languages
Critical for the Malaysian market: handles speakers who switch between languages mid-sentence.
| Pack | Code | Use Case |
|---|---|---|
| English + Malay | en_ms | Most important — Malaysian courts |
| English + Mandarin | cmn_en | Bilingual proceedings |
| English + Tamil | en_ta | Malaysian/Singapore courts |
| Arabic + English | ar_en | Middle Eastern clients |
| Multi (4 languages) | cmn_en_ms_ta | Multi-language hearings |
| Spanish + English | es with domain bilingual-en | Special config required |
Intelligence features (Summary, Tone, Sections) may not work with code-switching packs. Test each one individually.
- Only to/from English — cannot translate Malay→Mandarin directly
- Maximum 5 target languages per job
- JSON output format only — TXT and SRT do not include translations
- If source = target, returns transcription unchanged
- Channel diarization info is not included in translation output
All can be translated to AND from English.
NOT supported as translation targets: Arabic, Tamil, Thai, Bengali, Urdu, Cantonese, Persian, Hebrew, Swahili, Zulu, and all South Asian languages except Hindi
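The translation rules above can be checked before submitting a job. The sketch below is an assumption-heavy illustration: `validate_translation` is a hypothetical helper, and the language codes in `UNSUPPORTED_TARGETS` are standard ISO codes that the engine may or may not use. The set covers only the languages named explicitly above, not the broader "South Asian languages" group.

```python
MAX_TARGETS = 5  # limit stated in this reference
# Assumed codes for the explicitly named unsupported targets.
UNSUPPORTED_TARGETS = {"ar", "ta", "th", "bn", "ur", "yue", "fa", "he", "sw", "zu"}


def validate_translation(source, targets):
    """Return a list of human-readable problems; an empty list means the request looks fine."""
    errors = []
    if len(targets) > MAX_TARGETS:
        errors.append(f"at most {MAX_TARGETS} target languages per job")
    if source == "zh" or "zh" in targets:
        errors.append("use 'cmn' for Mandarin, not 'zh'")
    for target in targets:
        if target in UNSUPPORTED_TARGETS:
            errors.append(f"{target} is not supported as a translation target")
        elif source != "en" and target not in ("en", source):
            errors.append(
                f"{source} to {target} is not supported; translation is only to or from English"
            )
    return errors


print(validate_translation("en", ["ms", "fr"]))
print(validate_translation("ms", ["cmn"]))
```

A target equal to the source is allowed through, since the engine returns the transcription unchanged in that case.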
| Mode | Value | Description |
|---|---|---|
| Speaker | speaker | Identifies speakers by voice. Most common. |
| Channel | channel | Each audio channel = separate speaker. Perfect for court setups with per-mic channels. |
| Channel + Speaker | channel_and_speaker | Combines both — separate channels with speaker ID within each channel. |
| None | none | No speaker labels. |
| Setting | Config Key | Notes |
|---|---|---|
| Max speakers | speaker_diarization_config.max_speakers | Limits detected speakers. Only applies to non-enrolled generic speakers. |
| Sensitivity | speaker_diarization_config.speaker_sensitivity | 0.0-1.0. Lower = fewer speaker switches. Higher = more sensitive detection. |
| Prefer current speaker | speaker_diarization_config.prefer_current_speaker | Reduces false speaker switching for similar-sounding speakers. |
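A sketch of how the diarization mode and tuning settings above might be assembled. `build_diarization_settings` is hypothetical; treating `prefer_current_speaker` as a boolean and placing `speaker_diarization_config` next to the `diarization` key are assumptions, since this reference does not state either.

```python
DIARIZATION_MODES = {"speaker", "channel", "channel_and_speaker", "none"}


def build_diarization_settings(mode="speaker", max_speakers=None,
                               sensitivity=None, prefer_current_speaker=None):
    """Build the diarization mode plus optional speaker_diarization_config."""
    if mode not in DIARIZATION_MODES:
        raise ValueError(f"unknown diarization mode: {mode!r}")
    settings = {"diarization": mode}
    tuning = {}
    if max_speakers is not None:
        if max_speakers < 1:
            raise ValueError("max_speakers must be positive")
        tuning["max_speakers"] = max_speakers
    if sensitivity is not None:
        if not 0.0 <= sensitivity <= 1.0:
            raise ValueError("speaker_sensitivity must be between 0.0 and 1.0")
        tuning["speaker_sensitivity"] = sensitivity
    if prefer_current_speaker is not None:
        tuning["prefer_current_speaker"] = bool(prefer_current_speaker)  # assumed boolean
    if tuning:
        settings["speaker_diarization_config"] = tuning  # assumed placement
    return settings


print(build_diarization_settings(max_speakers=4, sensitivity=0.3))
```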
ScribeLive supports enrolling speaker audio samples to auto-identify known speakers. Workflow:
- Run diarization on a solo-speaker audio → call GetSpeakers to get voice identifiers
- Store identifiers in a Voice Bank
- On next transcription, provide identifiers in speaker_diarization_config.speakers[]
- Enrolled speakers are tagged by name; unknown speakers get generic labels (S1, S2...)
Not yet built in ScribeLive — parked for Phase 2
| Domain | Value | Notes |
|---|---|---|
| Legal | legal | English only. Sending it with non-English audio returns a 400 error. Do NOT use with code-switching packs. |
| Medical | medical | Enhanced mode required. Limited language support. |
| Finance | finance | Finance terminology optimization. |
| Bilingual (Spanish) | bilingual-en on language es | Special config: set language to es (Spanish) and domain to bilingual-en. |
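The domain restrictions in the table above lend themselves to a pre-submit check. In this sketch, `validate_domain` is a hypothetical helper, `"en"` is assumed to be the English language code, and "Enhanced mode" is read as the `enhanced` operating point value.

```python
def validate_domain(domain, language, operating_point="enhanced"):
    """Return a list of problems with a domain choice; empty means it looks valid."""
    errors = []
    if domain is None:
        return errors
    if domain == "legal" and language != "en":
        # Also rejects code-switching packs such as en_ms, as the table requires.
        errors.append("legal domain is English only")
    if domain == "medical" and operating_point != "enhanced":
        errors.append("medical domain requires enhanced mode")
    if domain == "bilingual-en" and language != "es":
        errors.append("bilingual-en is only used with language es")
    return errors


print(validate_domain("legal", "en"))
print(validate_domain("legal", "en_ms"))
```

Catching the legal plus non-English combination locally avoids the 400 error from the engine.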
Audio: mp3, wav, webm, ogg, aac, flac, m4a
Video: mp4, mov, avi, mkv (audio extracted automatically)
Max file size: 5GB. converter.py handles all conversion to MP3 before sending to the transcription engine.
WebM is the browser recording format (ScribeRecorder). Converted automatically.
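A sketch of an upload check based on the format lists and size limit above. `classify_upload` is hypothetical and is not the logic in converter.py; interpreting 5GB as 5 × 1024³ bytes is also an assumption.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {"mp3", "wav", "webm", "ogg", "aac", "flac", "m4a"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv"}
MAX_BYTES = 5 * 1024 ** 3  # assumes "5GB" means 5 GiB


def classify_upload(filename, size_bytes):
    """Return 'audio' or 'video' for a supported file, or raise ValueError."""
    if size_bytes > MAX_BYTES:
        raise ValueError("file exceeds the 5GB limit")
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    if extension in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"unsupported file type: {extension or 'no extension'}")


print(classify_upload("hearing.WebM", 1024))
print(classify_upload("deposition.mkv", 2 * 1024 ** 3))
```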
| ScribeLive Label | API Value | Notes |
|---|---|---|
| High Accuracy (recommended) | enhanced | Best quality. Slightly slower. Default. |
| Fast Mode | standard | Faster processing. Lower accuracy. |
All config keys below are siblings of transcription_config in the job config JSON, except Speaker Labels, which is set inside transcription_config.
| Feature | Config Key | Value |
|---|---|---|
| Smart Summary | summarization_config | {content_type, summary_length, summary_type} or {} for defaults |
| Tone Analysis | sentiment_analysis_config | {} |
| Smart Sections | auto_chapters_config | {} |
| Key Topics | topic_detection_config | {} or {"topics": ["custom1", "custom2"]} |
| Sound Detection | audio_events_config | {"types": ["music", "applause", "laughter"]} |
| Translation | translation_config | {"target_languages": ["ms", "fr"]} |
| Clean Speech | transcript_filtering_config | {"remove_disfluencies": true} |
| Speaker Labels | transcription_config.diarization | "speaker", "channel", "channel_and_speaker", or "none" |
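Putting the table together, a job config might be assembled as in the sketch below. `build_job_config` is a hypothetical helper, and the `language` key inside `transcription_config` is an assumption; the feature key names and example values are taken from the table above.

```python
import json

# Feature keys that sit next to transcription_config, from the table above.
FEATURE_KEYS = {
    "summarization_config",
    "sentiment_analysis_config",
    "auto_chapters_config",
    "topic_detection_config",
    "audio_events_config",
    "translation_config",
    "transcript_filtering_config",
}


def build_job_config(language, diarization="speaker", features=None):
    """Assemble a job config dict with feature keys as siblings of transcription_config."""
    features = features or {}
    unknown = set(features) - FEATURE_KEYS
    if unknown:
        raise ValueError(f"unknown feature keys: {sorted(unknown)}")
    config = {
        "transcription_config": {
            "language": language,  # assumed key name
            "diarization": diarization,
        }
    }
    config.update(features)
    return config


job_config = build_job_config(
    "en",
    features={
        "summarization_config": {},
        "translation_config": {"target_languages": ["ms", "fr"]},
        "transcript_filtering_config": {"remove_disfluencies": True},
    },
)
print(json.dumps(job_config, indent=2))
```

Rejecting unknown keys up front makes a misspelled feature name fail loudly instead of being ignored.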
| Issue | Detail |
|---|---|
| domain: "legal" + non-English | Returns a 400 error. Legal domain is English-only. Never hardcode this domain in the recorder submit code. |
| Key Topics + non-English | Silently skipped — no error, just no topics returned. |
| Translation + TXT/SRT format | Only JSON includes translations. TXT and SRT formats silently omit them. |
| Mandarin translation code | Use cmn not zh. The engine rejects zh. |
| Translation max targets | Maximum 5 per job. More causes invalid_config error. |
| Smart Sections minimum | Best with 10+ minutes of audio. Short audio returns poor or no chapters. |
| API returns 201 | Job creation returns 201, not 200. Accept both as success. |
| format_duration(None) | Always pass job.get("duration_seconds") or 0. The DB field can be NULL for recorded jobs, so the value may be None. |
| __pycache__ | Clear it after replacing any Python file; otherwise old cached code may run instead. |
| Code-switching + intelligence | Summary, Tone, and Sections may not work with bilingual packs (en_ms, cmn_en, etc.). Test each one individually. |
| Arabic not a translation target | Arabic (ar) is supported as a source language but NOT as a translation target from English. |
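Two of the issues above are easy to guard against in code. The sketch below is illustrative only: this `format_duration` is a stand-in with an assumed HH:MM:SS output and is not the project's actual implementation, and `job_created` is a hypothetical helper.

```python
def format_duration(seconds):
    """Format seconds as HH:MM:SS, treating None as zero (stand-in implementation)."""
    total = int(seconds or 0)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def job_created(status_code):
    """Job creation returns 201, but accept 200 as well."""
    return status_code in (200, 201)


job = {"duration_seconds": None}  # as stored for a recorded job with a NULL duration
print(format_duration(job.get("duration_seconds") or 0))
print(format_duration(3725))
print(job_created(201))
```

Applying `or 0` at the call site, as the table recommends, keeps the guard in place even if `format_duration` itself does not handle None.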
Never use internal engine terms on client-facing surfaces.
| Internal Term | ScribeLive Client Label |
|---|---|
| Summarization | Smart Summary |
| Sentiment analysis | Tone Analysis |
| Auto chapters | Smart Sections |
| Topic detection | Key Topics |
| Audio events | Sound Detection |
| Audio enhancement | Audio Clarity |
| Remove disfluencies | Clean Speech |
| Speaker diarization | Speaker Labels |
| Enhanced operating point | High Accuracy |
| Standard operating point | Fast Mode |
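One way to enforce the labeling rule is to scan client-facing strings for internal terms before release. In this sketch, `CLIENT_LABELS` mirrors the table above, while `find_internal_terms` and its simple case-insensitive substring match are assumptions that may need refinement for real copy.

```python
# Internal term (lowercased) -> client label, copied from the table above.
CLIENT_LABELS = {
    "summarization": "Smart Summary",
    "sentiment analysis": "Tone Analysis",
    "auto chapters": "Smart Sections",
    "topic detection": "Key Topics",
    "audio events": "Sound Detection",
    "audio enhancement": "Audio Clarity",
    "remove disfluencies": "Clean Speech",
    "speaker diarization": "Speaker Labels",
    "enhanced operating point": "High Accuracy",
    "standard operating point": "Fast Mode",
}


def find_internal_terms(text):
    """Return {internal term: suggested client label} for terms found in the text."""
    lowered = text.lower()
    return {term: label for term, label in CLIENT_LABELS.items() if term in lowered}


print(find_internal_terms("Sentiment analysis is ready for your recording."))
print(find_internal_terms("Smart Summary is ready for your recording."))
```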