
📚 Knowledge Library

Complete reference for ScribeLive capabilities, languages, and API features

🧠 Intelligence Features
Smart Summary (Summarization)

Generates a concise summary of the transcript. Available in Batch API only (SaaS).

Option | Values | Default
content_type | auto, conversational, informative | auto
summary_length | brief, detailed | brief
summary_type | paragraphs, bullets | paragraphs

JSON output only · Most languages

API path: response.summary.content
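A minimal Python sketch of requesting and reading a Smart Summary. The allowed values and the response.summary.content path come from this section; the helper names build_summarization_config and get_summary are hypothetical, and the transcript is assumed to be the parsed JSON response held as a plain dict.

```python
from typing import Optional

# Allowed values, taken from the options table above.
SUMMARY_OPTIONS = {
    "content_type": {"auto", "conversational", "informative"},
    "summary_length": {"brief", "detailed"},
    "summary_type": {"paragraphs", "bullets"},
}


def build_summarization_config(**options: str) -> dict:
    """Validate options and return a summarization_config dict.

    Keys that are left out fall back to the engine defaults, so a call with
    no arguments returns {} (the "use defaults" form).
    """
    for key, value in options.items():
        if key not in SUMMARY_OPTIONS:
            raise ValueError(f"unknown summary option: {key}")
        if value not in SUMMARY_OPTIONS[key]:
            raise ValueError(f"invalid {key}: {value!r}")
    return dict(options)


def get_summary(response: dict) -> Optional[str]:
    """Read response.summary.content; None if no summary was produced."""
    return response.get("summary", {}).get("content")
```

Validating locally catches typos such as summary_length="long" before a job is submitted, rather than after it fails.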

Tone Analysis (Sentiment Analysis)

Labels transcript segments as positive, negative, or neutral, each with a confidence score. Includes per-speaker and per-channel breakdowns.

Data | Location
Segments array | sentiment_analysis.segments[]
Each segment has | text, sentiment, start_time, end_time, speaker, channel, confidence
Overall summary | sentiment_analysis.summary.overall (positive_count, negative_count, neutral_count)
Per speaker | sentiment_analysis.summary.speakers[]
Per channel | sentiment_analysis.summary.channels[]

JSON output only · Most languages · Batch + On-prem
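The overall and per-speaker counts already arrive in sentiment_analysis.summary, but a custom breakdown (for example, counting only segments above a confidence threshold) is a simple tally over sentiment_analysis.segments[]. This is a sketch that assumes each segment carries the fields listed above; the function name and the "unknown" fallback speaker label are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def sentiment_by_speaker(response: dict, min_confidence: float = 0.0) -> dict[str, Counter]:
    """Count positive/negative/neutral segments per speaker.

    Segments below min_confidence are ignored; segments with no confidence
    field are kept. Segments without a speaker field are grouped under the
    hypothetical label "unknown".
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    segments = response.get("sentiment_analysis", {}).get("segments", [])
    for seg in segments:
        if seg.get("confidence", 1.0) < min_confidence:
            continue
        counts[seg.get("speaker", "unknown")][seg["sentiment"]] += 1
    return dict(counts)
```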

Smart Sections (Auto Chapters)

Splits the transcript into titled, timestamped chapters, each with a summary. Chapters do not overlap and together cover the entire audio.

Field | Description
title | Auto-generated chapter title
summary | Paragraph summary of the chapter content
start_time | Chapter start, in seconds (float)
end_time | Chapter end, in seconds (float)

JSON output only · Best with 10+ minutes of audio · Best with diarization on

Not supported: Irish, Maltese, Urdu, Bengali, Swahili

API path: response.chapters[]
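To illustrate the chapter fields, this sketch turns response.chapters[] into a plain-text outline with HH:MM:SS ranges. The function names and the output format are hypothetical; start_time and end_time are assumed to be float seconds, as documented above.

```python
from __future__ import annotations


def hms(seconds: float) -> str:
    """Format float seconds as HH:MM:SS, truncating fractions of a second."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def chapter_outline(response: dict) -> list[str]:
    """One line per chapter, formatted as 'start-end title'."""
    return [
        f"{hms(ch['start_time'])}-{hms(ch['end_time'])} {ch['title']}"
        for ch in response.get("chapters", [])
    ]
```

Because chapters are non-overlapping and exhaustive, each line's end time should equal the next line's start time.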

Key Topics (Topic Detection)

Detects topics in the audio and labels segments. Default topics: Business & Finance, Education, Entertainment, Events & Attractions, Food & Drink, News & Politics, Science, Sports, Technology & Computing, Travel.

Custom topics: up to 10 can be provided via topic_detection_config.topics[].

Data | Location
Segments | topics.segments[]; each has text, start_time, end_time, topics[]
Summary | topics.summary.overall (object with topic counts)

English audio ONLY · JSON output only · Max 10 custom topics
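Because Key Topics only works on English audio and accepts at most 10 custom topics, a request builder can enforce both rules before submission. A sketch under those assumptions: build_topic_detection_config is a hypothetical helper, and treating every language value other than "en" (including packs such as en_ms) as non-English is a conservative choice, not documented engine behavior.

```python
from __future__ import annotations

MAX_CUSTOM_TOPICS = 10


def build_topic_detection_config(language: str, topics: list[str] | None = None) -> dict | None:
    """Return topic_detection_config, or None to leave Key Topics off the job.

    Non-English jobs are silently skipped by the engine anyway, so the
    feature is omitted rather than requested.
    """
    if language != "en":
        return None
    if not topics:
        return {}  # use the default topic list
    if len(topics) > MAX_CUSTOM_TOPICS:
        raise ValueError(f"at most {MAX_CUSTOM_TOPICS} custom topics, got {len(topics)}")
    return {"topics": list(topics)}
```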

Sound Detection (Audio Events)

Detects non-speech audio events: music, applause, laughter. Returns timestamped events with confidence scores.

Field | Description
type | Event type (music, applause, laughter)
start_time | Seconds (float)
end_time | Seconds (float)
confidence | 0.0 to 1.0
channel | Present if channel diarization is enabled

All languages

API paths: audio_events[] and audio_event_summary.overall
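A short sketch of filtering audio_events[] by type and confidence, for example to mark applause or music in a reviewer UI. The 0.5 default threshold and the helper name are hypothetical; the event fields match the table above.

```python
from __future__ import annotations


def filter_audio_events(
    response: dict,
    types: set[str] | None = None,
    min_confidence: float = 0.5,
) -> list[dict]:
    """Return events from audio_events[] that match the requested types and confidence."""
    return [
        event
        for event in response.get("audio_events", [])
        if event["confidence"] >= min_confidence
        and (types is None or event["type"] in types)
    ]
```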

Audio Clarity & Clean Speech

Audio Clarity (audio_enhancement): Pre-processes audio to reduce noise before transcription. Applied during file conversion.

Clean Speech (remove_disfluencies): Removes filler words like "um", "uh", "like" from the transcript output.

All languages

🌍 Supported Languages (63+)
Multilingual Code-Switching Packs

Critical for the Malaysian market: these packs handle speakers who switch between languages mid-sentence.

Pack | Code | Use case
English + Malay | en_ms | Most important: Malaysian courts
English + Mandarin | cmn_en | Bilingual proceedings
English + Tamil | en_ta | Malaysian/Singapore courts
Arabic + English | ar_en | Middle Eastern clients
Multi (4 languages) | cmn_en_ms_ta | Multi-language hearings
Spanish + English | es with domain bilingual-en | Special config required

Intelligence features may not work with code-switching packs
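A sketch of turning a pack choice into the language settings of transcription_config, plus a check the UI could use to warn before intelligence features are enabled on a code-switching job. It assumes each pack code is passed directly as the language value, which the table implies but does not state; the function names and the spanish_bilingual flag are hypothetical.

```python
from __future__ import annotations

# Pack codes from the table above.
CODE_SWITCHING_PACKS = {"en_ms", "cmn_en", "en_ta", "ar_en", "cmn_en_ms_ta"}


def language_config(language: str, spanish_bilingual: bool = False) -> dict:
    """Return the language (and, for Spanish + English, domain) settings."""
    if spanish_bilingual:
        if language != "es":
            raise ValueError("the bilingual-en domain is documented only for Spanish (es)")
        return {"language": "es", "domain": "bilingual-en"}
    return {"language": language}


def is_code_switching(config: dict) -> bool:
    """True if the settings select a code-switching pack."""
    return (
        config.get("language") in CODE_SWITCHING_PACKS
        or config.get("domain") == "bilingual-en"
    )
```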

All Supported Languages
Popular (Southeast Asia)
English (en) · Malay (ms) · Mandarin (cmn) · Tamil (ta) · Hindi (hi) · Indonesian (id) · Thai (th) · Vietnamese (vi) · Japanese (ja) · Korean (ko) · Arabic (ar) · Cantonese (yue)
European
Bulgarian (bg) · Catalan (ca) · Croatian (hr) · Czech (cs) · Danish (da) · Dutch (nl) · Estonian (et) · Finnish (fi) · French (fr) · German (de) · Greek (el) · Hungarian (hu) · Irish (ga) · Italian (it) · Latvian (lv) · Lithuanian (lt) · Macedonian (mk) · Maltese (mt) · Norwegian (no) · Polish (pl) · Portuguese (pt) · Romanian (ro) · Russian (ru) · Serbian (sr) · Slovak (sk) · Slovenian (sl) · Spanish (es) · Swedish (sv) · Ukrainian (uk) · Welsh (cy)
South & Central Asian
Bengali (bn) · Gujarati (gu) · Kannada (kn) · Malayalam (ml) · Marathi (mr) · Nepali (ne) · Punjabi (pa) · Sinhala (si) · Telugu (te) · Urdu (ur)
Other
Persian/Farsi (fa) · Hebrew (he) · Turkish (tr) · Swahili (sw) · Zulu (zu) · Mongolian (mn) · Burmese (my) · Khmer (km) · Lao (lo) · Filipino/Tagalog (tl)
🌐 Translation
Translation Rules
  • Only to/from English — cannot translate Malay→Mandarin directly
  • Maximum 5 target languages per job
  • JSON output format only — TXT and SRT do not include translations
  • If source = target, returns transcription unchanged
  • Channel diarization info is not included in translation output
Supported Translation Targets (34 languages)

All can be translated to AND from English.

Bulgarian (bg) · Catalan (ca) · Mandarin (cmn) · Czech (cs) · Danish (da) · German (de) · Greek (el) · Spanish (es) · Estonian (et) · Finnish (fi) · French (fr) · Galician (gl) · Hindi (hi) · Croatian (hr) · Hungarian (hu) · Indonesian (id) · Italian (it) · Japanese (ja) · Korean (ko) · Lithuanian (lt) · Latvian (lv) · Malay (ms) · Dutch (nl) · Norwegian (no) · Polish (pl) · Portuguese (pt) · Romanian (ro) · Russian (ru) · Slovak (sk) · Slovenian (sl) · Swedish (sv) · Turkish (tr) · Ukrainian (uk) · Vietnamese (vi)

NOT supported as translation targets: Arabic, Tamil, Thai, Bengali, Urdu, Cantonese, Persian, Hebrew, Swahili, Zulu, and all South Asian languages except Hindi
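The translation rules above can be enforced before a job is submitted. This sketch checks the 5-target cap, the zh/cmn pitfall, and the to/from-English rule against the 34-language list. build_translation_config is hypothetical, and letting a target equal the source (returned unchanged) follows the rule list rather than tested engine behavior.

```python
from __future__ import annotations

# The 34 languages that can be translated to and from English.
TRANSLATION_LANGUAGES = {
    "bg", "ca", "cmn", "cs", "da", "de", "el", "es", "et", "fi", "fr", "gl",
    "hi", "hr", "hu", "id", "it", "ja", "ko", "lt", "lv", "ms", "nl", "no",
    "pl", "pt", "ro", "ru", "sk", "sl", "sv", "tr", "uk", "vi",
}
MAX_TARGETS = 5


def build_translation_config(source: str, targets: list[str]) -> dict:
    """Validate a translation request and return translation_config."""
    if not targets:
        raise ValueError("at least one target language is required")
    if len(targets) > MAX_TARGETS:
        raise ValueError(f"at most {MAX_TARGETS} target languages per job")
    for target in targets:
        if target == "zh":
            raise ValueError("use 'cmn' for Mandarin; the engine rejects 'zh'")
        if target == source:
            continue  # returned as the untranslated transcription
        if source == "en":
            ok = target in TRANSLATION_LANGUAGES
        else:
            # Non-English sources may only be translated into English.
            ok = target == "en" and source in TRANSLATION_LANGUAGES
        if not ok:
            raise ValueError(f"unsupported translation {source} -> {target}")
    return {"target_languages": list(targets)}
```

The job must still request JSON output; TXT and SRT omit translations.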

👥 Speaker Labels (Diarization)
Diarization Modes
Mode | Value | Description
Speaker | speaker | Identifies speakers by voice. Most common.
Channel | channel | Each audio channel = separate speaker. Perfect for court setups with per-mic channels.
Channel + Speaker | channel_and_speaker | Combines both: separate channels, with speaker ID within each channel.
None | none | No speaker labels.
Speaker Configuration
Setting | Config key | Notes
Max speakers | speaker_diarization_config.max_speakers | Limits detected speakers. Only applies to non-enrolled generic speakers.
Sensitivity | speaker_diarization_config.speaker_sensitivity | 0.0 to 1.0. Lower = fewer speaker switches. Higher = more sensitive detection.
Prefer current speaker | speaker_diarization_config.prefer_current_speaker | Reduces false speaker switching for similar-sounding speakers.
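A minimal sketch that assembles speaker_diarization_config from the three settings above and validates the 0.0 to 1.0 sensitivity range. The helper name is hypothetical, as are two type assumptions not stated in the table: prefer_current_speaker is treated as a boolean and max_speakers as a positive integer.

```python
from __future__ import annotations


def build_speaker_diarization_config(
    max_speakers: int | None = None,
    speaker_sensitivity: float | None = None,
    prefer_current_speaker: bool | None = None,
) -> dict:
    """Include only the settings that were provided; omitted ones use engine defaults."""
    config: dict = {}
    if max_speakers is not None:
        if max_speakers < 1:
            raise ValueError("max_speakers must be a positive integer")
        config["max_speakers"] = max_speakers
    if speaker_sensitivity is not None:
        if not 0.0 <= speaker_sensitivity <= 1.0:
            raise ValueError("speaker_sensitivity must be between 0.0 and 1.0")
        config["speaker_sensitivity"] = speaker_sensitivity
    if prefer_current_speaker is not None:
        config["prefer_current_speaker"] = prefer_current_speaker
    return config
```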
Speaker Identification (Voice Banks) — FUTURE

ScribeLive supports enrolling speaker audio samples to auto-identify known speakers. Workflow:

  • Run diarization on a solo-speaker audio sample → call GetSpeakers to get voice identifiers
  • Store identifiers in a Voice Bank
  • On next transcription, provide identifiers in speaker_diarization_config.speakers[]
  • Enrolled speakers are tagged by name; unknown speakers get generic labels (S1, S2...)

Not yet built in ScribeLive — parked for Phase 2

🏛 Domain Specializations
Domain | Value | Notes
Legal | legal | English only! Sending it with non-English audio returns a 400 error. Do NOT use with code-switching packs.
Medical | medical | Enhanced mode required. Limited language support.
Finance | finance | Finance terminology optimization.
Bilingual (Spanish) | bilingual-en (with language es) | Special: set the language to Spanish with the bilingual-en domain.
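Because a hardcoded legal domain on non-English audio causes a 400 error, one option is to decide the domain at submit time from the job's language. This sketch drops legal for anything other than plain English (which also excludes code-switching packs such as en_ms) and rejects the other combinations the table rules out. The function name and the choice to drop legal rather than fail are assumptions.

```python
from __future__ import annotations


def choose_domain(language: str, requested: str | None, operating_point: str = "enhanced") -> str | None:
    """Return the domain to send, or None to send no domain at all."""
    if requested is None:
        return None
    if requested == "legal" and language != "en":
        return None  # legal is English-only; sending it would return a 400
    if requested == "medical" and operating_point != "enhanced":
        raise ValueError("the medical domain requires the enhanced operating point")
    if requested == "bilingual-en" and language != "es":
        raise ValueError("bilingual-en is used with language es")
    return requested
```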
🔊 Audio & File Handling
Supported File Formats

Audio: mp3, wav, webm, ogg, aac, flac, m4a

Video: mp4, mov, avi, mkv (audio extracted automatically)

Max file size: 5 GB. converter.py converts every file to MP3 before it is sent to the transcription engine.

WebM is the browser recording format used by ScribeRecorder; it is converted automatically.
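A pre-upload check based on the format list and size limit above might look like the following. It only inspects the extension and byte count; converter.py still does the actual conversion. The names are hypothetical, and 5 GB is assumed here to mean 5 * 1024**3 bytes.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".webm", ".ogg", ".aac", ".flac", ".m4a"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}
MAX_FILE_BYTES = 5 * 1024**3  # assumption: 5 GB read as binary gigabytes


def classify_upload(filename: str, size_bytes: int) -> str:
    """Return "audio" or "video", or raise ValueError if the file would be rejected."""
    ext = Path(filename).suffix.lower()
    if ext in AUDIO_EXTENSIONS:
        kind = "audio"
    elif ext in VIDEO_EXTENSIONS:
        kind = "video"  # the audio track is extracted during conversion
    else:
        raise ValueError(f"unsupported file format: {filename!r}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 5 GB limit")
    return kind
```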

Transcription Modes (Operating Points)
ScribeLive label | API value | Notes
High Accuracy (recommended) | enhanced | Best quality. Slightly slower. Default.
Fast Mode | standard | Faster processing. Lower accuracy.
⚙️ API Configuration Keys
Job Config Structure

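All feature config keys below are siblings of transcription_config in the job config JSON. The exception is Speaker Labels, which is set inside transcription_config as transcription_config.diarization.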

Feature | Config key | Value
Smart Summary | summarization_config | {content_type, summary_length, summary_type}, or {} for defaults
Tone Analysis | sentiment_analysis_config | {}
Smart Sections | auto_chapters_config | {}
Key Topics | topic_detection_config | {}, or {topics: ["custom1", "custom2"]}
Sound Detection | audio_events_config | {types: ["music", "applause", "laughter"]}
Translation | translation_config | {target_languages: ["ms", "fr"]}
Clean Speech | transcript_filtering_config | {remove_disfluencies: true}
Speaker Labels | transcription_config.diarization | "speaker", "channel", "channel_and_speaker", or "none"
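Putting the table together, a job config is transcription_config plus one sibling key per enabled feature. This sketch uses its own short feature names (summary, tone, sections, topics, sound) and places operating_point inside transcription_config, which this document does not state. Clean Speech and custom feature options are omitted for brevity; build_job_config is hypothetical.

```python
from __future__ import annotations

import copy
from collections.abc import Iterable

OPERATING_POINTS = {"High Accuracy": "enhanced", "Fast Mode": "standard"}
DIARIZATION_MODES = {"speaker", "channel", "channel_and_speaker", "none"}
# Hypothetical feature names mapped to their sibling config keys and default values.
FEATURE_CONFIGS = {
    "summary": ("summarization_config", {}),
    "tone": ("sentiment_analysis_config", {}),
    "sections": ("auto_chapters_config", {}),
    "topics": ("topic_detection_config", {}),
    "sound": ("audio_events_config", {"types": ["music", "applause", "laughter"]}),
}


def build_job_config(
    language: str = "en",
    mode: str = "High Accuracy",
    diarization: str = "speaker",
    features: Iterable[str] = (),
    translation_targets: list[str] | None = None,
) -> dict:
    if mode not in OPERATING_POINTS:
        raise ValueError(f"unknown mode: {mode!r}")
    if diarization not in DIARIZATION_MODES:
        raise ValueError(f"unknown diarization mode: {diarization!r}")
    config = {
        "transcription_config": {
            "language": language,
            "operating_point": OPERATING_POINTS[mode],  # placement is an assumption
            "diarization": diarization,
        }
    }
    for name in features:
        if name not in FEATURE_CONFIGS:
            raise ValueError(f"unknown feature: {name!r}")
        key, value = FEATURE_CONFIGS[name]
        # Sibling of transcription_config; copied so callers cannot mutate the defaults.
        config[key] = copy.deepcopy(value)
    if translation_targets:
        config["translation_config"] = {"target_languages": list(translation_targets)}
    return config
```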
⚠️ Limits & Gotchas
Issue | Detail
domain: "legal" + non-English | Returns a 400 error. The legal domain is English-only. Never hardcode it in recorder submit.
Key Topics + non-English | Silently skipped: no error, just no topics returned.
Translation + TXT/SRT format | Only JSON includes translations. TXT and SRT formats silently omit them.
Mandarin translation code | Use cmn, not zh. The engine rejects zh.
Translation max targets | Maximum 5 per job. More causes an invalid_config error.
Smart Sections minimum | Best with 10+ minutes of audio. Short audio returns poor or no chapters.
API returns 201 | Job creation returns 201, not 200. Check for both.
format_duration(None) | Always pass job.get("duration_seconds") or 0. The DB field can be NULL for recorded jobs.
__pycache__ | Clear it after replacing any Python file; otherwise old cached code runs instead.
Code-switching + intelligence | Summary, Tone, and Sections may not work with bilingual packs (en_ms, cmn_en, etc.). Test each individually.
Arabic not a translation target | Arabic (ar) is supported as a source language but NOT as a translation target from English.
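Two of these gotchas are easy to encode. The sketch below shows the NULL-safe duration call and a status check that accepts both 201 and 200. format_duration here is a hypothetical stand-in with an H:MM:SS output format, since the real helper is not shown in this document; job is assumed to be a dict-like DB row.

```python
def format_duration(seconds: float) -> str:
    """Hypothetical stand-in for the app's helper: format seconds as H:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def duration_label(job: dict) -> str:
    # duration_seconds can be NULL (None) for recorded jobs, so fall back to 0.
    return format_duration(job.get("duration_seconds") or 0)


def job_created(status_code: int) -> bool:
    # Job creation returns 201; accept 200 as well.
    return status_code in (200, 201)
```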
🏷 ScribeLive Label Mapping

Never use internal engine terms on client-facing surfaces.

Internal term | ScribeLive client label
Summarization | Smart Summary
Sentiment analysis | Tone Analysis
Auto chapters | Smart Sections
Topic detection | Key Topics
Audio events | Sound Detection
Audio enhancement | Audio Clarity
Remove disfluencies | Clean Speech
Speaker diarization | Speaker Labels
Enhanced operating point | High Accuracy
Standard operating point | Fast Mode
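One way to keep engine terms off client-facing surfaces is a single lookup that fails loudly when a term has no approved label, so an unmapped term never slips into the UI. A sketch with a hypothetical client_label function; case-insensitive matching is an assumption.

```python
CLIENT_LABELS = {
    "summarization": "Smart Summary",
    "sentiment analysis": "Tone Analysis",
    "auto chapters": "Smart Sections",
    "topic detection": "Key Topics",
    "audio events": "Sound Detection",
    "audio enhancement": "Audio Clarity",
    "remove disfluencies": "Clean Speech",
    "speaker diarization": "Speaker Labels",
    "enhanced operating point": "High Accuracy",
    "standard operating point": "Fast Mode",
}


def client_label(internal_term: str) -> str:
    """Map an internal engine term to its ScribeLive label, or raise KeyError."""
    key = internal_term.strip().lower()
    if key not in CLIENT_LABELS:
        raise KeyError(f"no client-facing label for {internal_term!r}")
    return CLIENT_LABELS[key]
```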