Skip to content

TTS Loader

Loader for TTS (Text-to-Speech) datasets.

There are two parsing strategies for TTS datasets, controlled by the root_strategy field in the schema.

Strategies

Index-based (default)

A CSV / TSV / pipe-delimited index file maps audio paths to transcriptions.

Controlled by:

Field Required Description
format File format ("csv", "tsv", "pipe").
index_file Path to the index file, relative to the dataset root.
base_audio_path Directory prefix prepended to file_path dtype columns.
columns Column mappings from source columns to logical names.
separator Explicit column separator (e.g. "\|").
has_header Whether the index file has a header row. When false, source_column must be a positional integer.
encoding File encoding (e.g. "utf-8-sig" for files with a BOM).

Paired-file / glob-based (root_strategy: "paired_glob")

Each audio file has a matching .txt file with the same stem. The loader recursively finds all text files, reads the transcription, and pairs them with the corresponding audio file. The parent directory name is captured as a split column in the resulting DataFrame.

Controlled by:

Field Required Description
file_pattern Glob pattern used to find text files (e.g. "**/*.txt").
audio_extension Extension of the matching audio files (e.g. ".webm").
content_mapping Optional mapping of file content to DataFrame columns.

Examples

Index-based schema

# Example: pipe-delimited, headerless metadata
dataset_id: "aso-ckb-tts"
task: "TTS"
format: "pipe"
separator: "|"
has_header: false
index_file: "metadata.csv"
base_audio_path: "wavs/"
columns:
  audio_path:
    source_column: 0        # positional index (no header)
    dtype: "file_path"
  transcription:
    source_column: 1
    dtype: "string"

Paired-glob schema

dataset_id: "pl-PL-darkman"
task: "TTS"
root_strategy: "paired_glob"
file_pattern: "**/*.txt"
audio_extension: ".webm"