scribeAi — Knowledge Library

🧠 Intelligence Features

Smart Summary (Summarization)

Generates a concise summary of the transcript. Available in Batch API only (SaaS).

Option	Values	Default
content_type	auto, conversational, informative	auto
summary_length	brief, detailed	brief
summary_type	paragraphs, bullets	paragraphs

JSON output only · Most languages

API path: response.summary.content
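
A minimal sketch of validating summary options and reading the result, assuming the job config and the JSON response are handled as plain Python dicts and that "response" in the API path means the parsed response body. The helper names build_summary_config and extract_summary are hypothetical; only the option names, values, and the summary.content path come from this page.

```python
# Allowed values, copied from the option table above.
SUMMARY_OPTIONS = {
    "content_type": {"auto", "conversational", "informative"},
    "summary_length": {"brief", "detailed"},
    "summary_type": {"paragraphs", "bullets"},
}


def build_summary_config(**options):
    """Return a summarization_config dict, rejecting unknown keys or values."""
    for key, value in options.items():
        if key not in SUMMARY_OPTIONS:
            raise ValueError(f"unknown summary option: {key}")
        if value not in SUMMARY_OPTIONS[key]:
            raise ValueError(f"invalid value for {key}: {value}")
    return dict(options)  # an empty dict means "use the defaults"


def extract_summary(response):
    """Read summary.content from the parsed response; None if absent."""
    return (response.get("summary") or {}).get("content")
```

Calling build_summary_config() with no arguments yields {}, which requests the defaults listed in the table.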

Tone Analysis (Sentiment Analysis)

Labels transcript segments as positive, negative, or neutral with confidence scores. Includes per-speaker and per-channel breakdown.

Data	Location
Segments array	sentiment_analysis.segments[]
Each segment has	text, sentiment, start_time, end_time, speaker, channel, confidence
Overall summary	sentiment_analysis.summary.overall (positive_count, negative_count, neutral_count)
Per speaker	sentiment_analysis.summary.speakers[]
Per channel	sentiment_analysis.summary.channels[]

JSON output only · Most languages · Batch + On-prem
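
As an illustration of working with the segments array, the sketch below tallies sentiment labels per speaker. It assumes the JSON response has been parsed into a dict; tally_by_speaker and its min_confidence parameter are hypothetical names, and the field names are the ones listed in the table above.

```python
from collections import Counter, defaultdict


def tally_by_speaker(response, min_confidence=0.0):
    """Count sentiment labels per speaker from sentiment_analysis.segments[]."""
    segments = (response.get("sentiment_analysis") or {}).get("segments") or []
    counts = defaultdict(Counter)
    for segment in segments:
        # Skip segments the engine was not confident about.
        if segment.get("confidence", 0.0) >= min_confidence:
            counts[segment.get("speaker")][segment.get("sentiment")] += 1
    return {speaker: dict(counter) for speaker, counter in counts.items()}
```

The same totals should be available from sentiment_analysis.summary.speakers[]; a local tally is mainly useful when a confidence threshold is wanted.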

Smart Sections (Auto Chapters)

Splits transcript into titled, timestamped chapters with summaries. Non-overlapping and exhaustive (covers entire audio).

Field	Type
title	Auto-generated chapter title
summary	Paragraph summary of chapter content
start_time	Seconds (float)
end_time	Seconds (float)

JSON output only · Best with 10+ min audio · Best with diarization ON

Not supported: Irish, Maltese, Urdu, Bengali, Swahili

API path: response.chapters[]
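
A short sketch of turning the chapters array into a readable outline, assuming the response is a parsed dict with chapters at the top level. format_timestamp and chapter_outline are hypothetical helper names.

```python
def format_timestamp(seconds):
    """Render float seconds as H:MM:SS (fractions of a second are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def chapter_outline(response):
    """Return one 'start - end  title' line per chapter, in response order."""
    lines = []
    for chapter in response.get("chapters") or []:
        start = format_timestamp(chapter["start_time"])
        end = format_timestamp(chapter["end_time"])
        lines.append(f"{start} - {end}  {chapter['title']}")
    return lines
```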

Key Topics (Topic Detection)

Detects topics in the audio and labels segments. Default topics: Business & Finance, Education, Entertainment, Events & Attractions, Food & Drink, News & Politics, Science, Sports, Technology & Computing, Travel.

Custom topics: up to 10 can be provided via topic_detection_config.topics[].

Data	Location
Segments	topics.segments[] — each has text, start_time, end_time, topics[]
Summary	topics.summary.overall — object with topic counts

English audio ONLY · JSON output only · Max 10 custom topics
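
Because non-English audio returns no topics and custom topics are capped at 10, a small guard before submission can help. This is a sketch only: build_topic_config is a hypothetical helper, and treating "English audio" as the language code en is an assumption.

```python
MAX_CUSTOM_TOPICS = 10


def build_topic_config(language, custom_topics=None):
    """Return a topic_detection_config dict, or None if Key Topics should be skipped."""
    if language != "en":  # assumption: only plain "en" counts as English audio
        return None
    topics = list(custom_topics or [])
    if len(topics) > MAX_CUSTOM_TOPICS:
        raise ValueError(f"at most {MAX_CUSTOM_TOPICS} custom topics are allowed")
    # An empty dict requests the default topic list.
    return {"topics": topics} if topics else {}
```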

Sound Detection (Audio Events)

Detects non-speech audio events: music, applause, laughter. Returns timestamped events with confidence scores.

Field	Description
type	Event type (music, applause, laughter)
start_time	Seconds (float)
end_time	Seconds (float)
confidence	0.0 to 1.0
channel	If channel diarization enabled

All languages

API paths: audio_events[] and audio_event_summary.overall
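
One way to use the events array is to total the detected duration per event type. The sketch assumes a parsed response dict with audio_events at the top level; seconds_by_type and the 0.5 default threshold are illustrative choices, not engine defaults.

```python
def seconds_by_type(response, min_confidence=0.5):
    """Sum event durations per type, ignoring low-confidence events."""
    totals = {}
    for event in response.get("audio_events") or []:
        if event.get("confidence", 0.0) < min_confidence:
            continue
        duration = event["end_time"] - event["start_time"]
        totals[event["type"]] = totals.get(event["type"], 0.0) + duration
    return totals
```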

Audio Clarity & Clean Speech

Audio Clarity (audio_enhancement): Pre-processes audio to reduce noise before transcription. Applied during file conversion.

Clean Speech (remove_disfluencies): Removes filler words like "um", "uh", "like" from the transcript output.

All languages

🌍 Supported Languages (63+)

Multilingual Code-Switching Packs

Critical for Malaysian market — handles speakers who switch between languages mid-sentence.

Pack	Code	Use Case
English + Malay	en_ms	Most important — Malaysian courts
English + Mandarin	cmn_en	Bilingual proceedings
English + Tamil	en_ta	Malaysian/Singapore courts
Arabic + English	ar_en	Middle Eastern clients
Multi (4 languages)	cmn_en_ms_ta	Multi-language hearings
Spanish + English	es with domain bilingual-en	Special config required

Intelligence features may not work with code-switching packs

All Supported Languages

Popular (Southeast Asia)

English (en) · Malay (ms) · Mandarin (cmn) · Tamil (ta) · Hindi (hi) · Indonesian (id) · Thai (th) · Vietnamese (vi) · Japanese (ja) · Korean (ko) · Arabic (ar) · Cantonese (yue)

European

Bulgarian (bg) · Catalan (ca) · Croatian (hr) · Czech (cs) · Danish (da) · Dutch (nl) · Estonian (et) · Finnish (fi) · French (fr) · German (de) · Greek (el) · Hungarian (hu) · Irish (ga) · Italian (it) · Latvian (lv) · Lithuanian (lt) · Macedonian (mk) · Maltese (mt) · Norwegian (no) · Polish (pl) · Portuguese (pt) · Romanian (ro) · Russian (ru) · Serbian (sr) · Slovak (sk) · Slovenian (sl) · Spanish (es) · Swedish (sv) · Ukrainian (uk) · Welsh (cy)

South & Central Asian

Bengali (bn) · Gujarati (gu) · Kannada (kn) · Malayalam (ml) · Marathi (mr) · Nepali (ne) · Punjabi (pa) · Sinhala (si) · Telugu (te) · Urdu (ur)

Other

Persian/Farsi (fa) · Hebrew (he) · Turkish (tr) · Swahili (sw) · Zulu (zu) · Mongolian (mn) · Burmese (my) · Khmer (km) · Lao (lo) · Filipino/Tagalog (tl)

🌐 Translation

Translation Rules

Only to/from English — cannot translate Malay→Mandarin directly
Maximum 5 target languages per job
JSON output format only — TXT and SRT do not include translations
If source = target, returns transcription unchanged
Channel diarization info is not included in translation output

Supported Translation Targets (34 languages)

All can be translated to AND from English.

Bulgarian (bg) · Catalan (ca) · Mandarin (cmn) · Czech (cs) · Danish (da) · German (de) · Greek (el) · Spanish (es) · Estonian (et) · Finnish (fi) · French (fr) · Galician (gl) · Hindi (hi) · Croatian (hr) · Hungarian (hu) · Indonesian (id) · Italian (it) · Japanese (ja) · Korean (ko) · Lithuanian (lt) · Latvian (lv) · Malay (ms) · Dutch (nl) · Norwegian (no) · Polish (pl) · Portuguese (pt) · Romanian (ro) · Russian (ru) · Slovak (sk) · Slovenian (sl) · Swedish (sv) · Turkish (tr) · Ukrainian (uk) · Vietnamese (vi)

NOT supported as translation targets: Arabic, Tamil, Thai, Bengali, Urdu, Cantonese, Persian, Hebrew, Swahili, Zulu, and all South Asian languages except Hindi
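
The translation rules can be checked before a job is submitted. The following is a sketch under stated assumptions: build_translation_config is a hypothetical helper, the target set is copied from the list above, and the English source code is assumed to be en.

```python
# The 34 target codes listed above.
TRANSLATION_TARGETS = {
    "bg", "ca", "cmn", "cs", "da", "de", "el", "es", "et", "fi", "fr", "gl",
    "hi", "hr", "hu", "id", "it", "ja", "ko", "lt", "lv", "ms", "nl", "no",
    "pl", "pt", "ro", "ru", "sk", "sl", "sv", "tr", "uk", "vi",
}
MAX_TARGETS = 5


def build_translation_config(source_language, target_languages):
    """Return a translation_config dict or raise ValueError on a rule violation."""
    targets = list(target_languages)
    if len(targets) > MAX_TARGETS:
        raise ValueError(f"at most {MAX_TARGETS} target languages per job")
    for target in targets:
        if target == source_language:
            continue  # allowed: the transcription is returned unchanged
        if source_language == "en":
            if target not in TRANSLATION_TARGETS:
                raise ValueError(f"{target} is not a supported translation target")
        elif target != "en":
            raise ValueError("translation is only to or from English")
    return {"target_languages": targets}
```

Note that zh fails this check, in line with the engine expecting cmn for Mandarin.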

👥 Speaker Labels (Diarization)

Diarization Modes

Mode	Value	Description
Speaker	speaker	Identifies speakers by voice. Most common.
Channel	channel	Each audio channel = separate speaker. Perfect for court setups with per-mic channels.
Channel + Speaker	channel_and_speaker	Combines both — separate channels with speaker ID within each channel.
None	none	No speaker labels.

Speaker Configuration

Setting	Config Key	Notes
Max speakers	speaker_diarization_config.max_speakers	Limits detected speakers. Only applies to non-enrolled generic speakers.
Sensitivity	speaker_diarization_config.speaker_sensitivity	0.0-1.0. Lower = fewer speaker switches. Higher = more sensitive detection.
Prefer current speaker	speaker_diarization_config.prefer_current_speaker	Reduces false speaker switching for similar-sounding speakers.
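
A minimal sketch of assembling the diarization mode and speaker settings with basic range checks. build_speaker_settings is a hypothetical helper; it returns the two pieces separately because this page does not state where speaker_diarization_config is nested in the job config.

```python
DIARIZATION_MODES = {"speaker", "channel", "channel_and_speaker", "none"}


def build_speaker_settings(mode="speaker", max_speakers=None,
                           speaker_sensitivity=None, prefer_current_speaker=None):
    """Return (diarization_mode, speaker_diarization_config) after validation."""
    if mode not in DIARIZATION_MODES:
        raise ValueError(f"unknown diarization mode: {mode}")
    config = {}
    if max_speakers is not None:
        if max_speakers < 1:
            raise ValueError("max_speakers must be at least 1")
        config["max_speakers"] = max_speakers
    if speaker_sensitivity is not None:
        # Lower values give fewer speaker switches, higher values more.
        if not 0.0 <= speaker_sensitivity <= 1.0:
            raise ValueError("speaker_sensitivity must be between 0.0 and 1.0")
        config["speaker_sensitivity"] = speaker_sensitivity
    if prefer_current_speaker is not None:
        config["prefer_current_speaker"] = bool(prefer_current_speaker)
    return mode, config
```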

Speaker Identification (Voice Banks) — FUTURE

The transcription engine supports enrolling speaker audio samples to auto-identify known speakers; ScribeLive does not expose this yet. Workflow:

Run diarization on a solo-speaker audio → call GetSpeakers to get voice identifiers
Store identifiers in a Voice Bank
On next transcription, provide identifiers in speaker_diarization_config.speakers[]
Enrolled speakers are tagged by name; unknown speakers get generic labels (S1, S2...)

Not yet built in ScribeLive — parked for Phase 2

🏛 Domain Specializations

Domain	Value	Notes
Legal	legal	English only! Sending with non-English audio returns 400 error. Do NOT use with code-switching packs.
Medical	medical	Enhanced mode required. Limited language support.
Finance	finance	Finance terminology optimization.
Bilingual (Spanish)	bilingual-en on language es	Special: set language to Spanish with bilingual-en domain.
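
The domain constraints in the table can be enforced client-side so that a bad combination fails before the engine returns a 400. This is a sketch: check_domain and its mode parameter are hypothetical, and the code-switching pack codes are copied from the language section of this page.

```python
CODE_SWITCHING_PACKS = {"en_ms", "cmn_en", "en_ta", "ar_en", "cmn_en_ms_ta"}


def check_domain(domain, language, mode="enhanced"):
    """Raise ValueError if the domain cannot be combined with this language/mode."""
    if domain == "legal":
        if language in CODE_SWITCHING_PACKS:
            raise ValueError("legal domain must not be used with code-switching packs")
        if language != "en":
            raise ValueError("legal domain is English only")
    elif domain == "medical":
        if mode != "enhanced":
            raise ValueError("medical domain requires enhanced mode")
    elif domain == "bilingual-en":
        if language != "es":
            raise ValueError("bilingual-en requires language es")
    elif domain != "finance":
        raise ValueError(f"unknown domain: {domain}")
    return True
```

The medical row also mentions limited language support, which is not checked here because the page does not list the supported languages.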

🔊 Audio & File Handling

Supported File Formats

Audio: mp3, wav, webm, ogg, aac, flac, m4a

Video: mp4, mov, avi, mkv (audio extracted automatically)

Max file size: 5GB. converter.py handles all conversion to MP3 before sending to the transcription engine.

WebM is the browser recording format (ScribeRecorder). Converted automatically.
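
A pre-upload check against the format lists and size limit might look like the sketch below. classify_upload is a hypothetical helper and is not part of converter.py; reading "5GB" as 5 GiB is an assumption.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {"mp3", "wav", "webm", "ogg", "aac", "flac", "m4a"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv"}
MAX_FILE_BYTES = 5 * 1024**3  # assumption: "5GB" interpreted as 5 GiB


def classify_upload(filename, size_bytes):
    """Return 'audio' or 'video' for a supported file, else raise ValueError."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 5GB limit")
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    if extension in VIDEO_EXTENSIONS:
        return "video"  # audio track is extracted during conversion
    raise ValueError(f"unsupported file format: {extension or '(none)'}")
```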

Transcription Modes (Operating Points)

ScribeLive Label	API Value	Notes
High Accuracy (recommended)	enhanced	Best quality. Slightly slower. Default.
Fast Mode	standard	Faster processing. Lower accuracy.

⚙️ API Configuration Keys

Job Config Structure

All feature config keys are siblings of transcription_config in the job config JSON. The exception is Speaker Labels, which is set inside transcription_config.

Feature	Config Key	Value
Smart Summary	summarization_config	{content_type, summary_length, summary_type} or {} for defaults
Tone Analysis	sentiment_analysis_config	{}
Smart Sections	auto_chapters_config	{}
Key Topics	topic_detection_config	{} or {topics:["custom1","custom2"]}
Sound Detection	audio_events_config	{types:["music","applause","laughter"]}
Translation	translation_config	{target_languages:["ms","fr"]}
Clean Speech	transcript_filtering_config	{remove_disfluencies:true}
Speaker Labels	transcription_config.diarization	"speaker", "channel", "channel_and_speaker", or "none"
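
To make the layout concrete, the sketch below assembles a job config with feature keys placed next to transcription_config and diarization placed inside it. build_job_config is a hypothetical helper, and other transcription_config keys a real job needs (such as the language) are omitted.

```python
import json

# Feature config keys from the table above (all siblings of transcription_config).
FEATURE_KEYS = {
    "summarization_config",
    "sentiment_analysis_config",
    "auto_chapters_config",
    "topic_detection_config",
    "audio_events_config",
    "translation_config",
    "transcript_filtering_config",
}


def build_job_config(diarization="speaker", **features):
    """Return a job config dict; reject feature keys not listed on this page."""
    unknown = set(features) - FEATURE_KEYS
    if unknown:
        raise ValueError(f"unknown feature config keys: {sorted(unknown)}")
    job = {"transcription_config": {"diarization": diarization}}
    job.update(features)  # feature configs sit beside transcription_config
    return job


config = build_job_config(
    summarization_config={},
    translation_config={"target_languages": ["ms", "fr"]},
    transcript_filtering_config={"remove_disfluencies": True},
)
print(json.dumps(config, indent=2))
```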

⚠️ Limits & Gotchas

Issue	Detail
domain: "legal" + non-English	Returns 400 error. Legal domain is English-only. Never hardcode in recorder submit.
Key Topics + non-English	Silently skipped — no error, just no topics returned.
Translation + TXT/SRT format	Only JSON includes translations. TXT and SRT formats silently omit them.
Mandarin translation code	Use cmn not zh. The engine rejects zh.
Translation max targets	Maximum 5 per job. More causes invalid_config error.
Smart Sections minimum	Best with 10+ minutes of audio. Short audio returns poor or no chapters.
API returns 201	Job creation returns 201 (Created), not 200. Treat both as success.
format_duration(None)	Always use job.get("duration_seconds") or 0. DB field can be NULL for recorded jobs.
__pycache__	Clear it after replacing any Python file; otherwise the old cached code runs instead.
Code-switching + intelligence	Summary, Tone, Sections may not work with bilingual packs (en_ms, cmn_en etc). Test individually.
Arabic not a translation target	Arabic (ar) is supported as a source language but NOT as a translation target from English.
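
Two of the gotchas above (the 201 status and the NULL duration) lend themselves to small defensive helpers. This is a sketch only: job_created and safe_duration_label are hypothetical names, and safe_duration_label is a stand-in that does not reproduce the real format_duration.

```python
def job_created(status_code):
    """Treat both 201 (what the API returns) and 200 as a successful job creation."""
    return status_code in (200, 201)


def safe_duration_label(job):
    """Format job duration as M:SS or H:MM:SS, tolerating a NULL/missing value."""
    total = int(job.get("duration_seconds") or 0)  # None from the DB becomes 0
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"
```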

🏷 ScribeLive Label Mapping

Never use internal engine terms on client-facing surfaces.

Internal Term	ScribeLive Client Label
Summarization	Smart Summary
Sentiment analysis	Tone Analysis
Auto chapters	Smart Sections
Topic detection	Key Topics
Audio events	Sound Detection
Audio enhancement	Audio Clarity
Remove disfluencies	Clean Speech
Speaker diarization	Speaker Labels
Enhanced operating point	High Accuracy
Standard operating point	Fast Mode
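
One way to keep internal terms off client-facing surfaces is to route every label through a single lookup that fails loudly on unmapped terms. The mapping below is copied from the table above; client_label is a hypothetical helper.

```python
CLIENT_LABELS = {
    "summarization": "Smart Summary",
    "sentiment analysis": "Tone Analysis",
    "auto chapters": "Smart Sections",
    "topic detection": "Key Topics",
    "audio events": "Sound Detection",
    "audio enhancement": "Audio Clarity",
    "remove disfluencies": "Clean Speech",
    "speaker diarization": "Speaker Labels",
    "enhanced operating point": "High Accuracy",
    "standard operating point": "Fast Mode",
}


def client_label(internal_term):
    """Return the client-facing label; raise KeyError rather than leak an internal term."""
    key = internal_term.strip().lower()
    if key not in CLIENT_LABELS:
        raise KeyError(f"no client label defined for {internal_term!r}")
    return CLIENT_LABELS[key]
```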
