The original author's code was used as the primary resource for this reproducibility attempt. For custom voice, AdaSpeech uses a three-stage pipeline: pre-training, fine-tuning, and inference. In the first stage, the model is pre-trained on a large multi-speaker dataset, LibriTTS (Zen et al., 2019). In the second stage, the model is adapted to a new voice with diverse acoustic conditions using the VCTK (Veaux et al., 2016) and LJSpeech (Ito, 2017) datasets. In the final stage, both the unadapted and the adapted parts of the model are used to serve the inference request.
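The three stages can be summarized in a minimal Python sketch. It is an illustration under stated assumptions, not the original author's implementation: the `Stage` class, the `build_inference_params` helper, and all parameter names are hypothetical, and only the stage names and dataset names come from the text. The sketch shows how, at inference time, the unadapted (shared) parameters might be combined with the parameters adapted to the new voice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of the pipeline and the datasets it uses (hypothetical helper)."""
    name: str
    datasets: tuple


# Stage and dataset names follow the description in the text.
PIPELINE = (
    Stage("pretraining", ("LibriTTS",)),
    Stage("fine-tuning", ("VCTK", "LJSpeech")),
    Stage("inference", ()),
)


def build_inference_params(shared, adapted):
    """Combine unadapted (shared) parameters with adapted ones.

    Adapted values override shared values that have the same key;
    the input dictionaries are left unmodified.
    """
    merged = dict(shared)
    merged.update(adapted)
    return merged


# Hypothetical parameter names and values, for illustration only.
shared_params = {
    "encoder.weight": 0.1,
    "decoder.weight": 0.2,
    "speaker.embedding": 0.0,
}
adapted_params = {"speaker.embedding": 0.7}

params = build_inference_params(shared_params, adapted_params)
```

Keeping the adapted parameters in a separate mapping reflects the idea that only a small part of the model changes per voice, while the rest is reused unchanged across voices.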