Page history

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E)

7 April 2023

Alpha5
no edit summary
23:47
+4

2 March 2023

Nicoboomer
→Vall-E
17:07
+871
Nicoboomer
Created page with "{{see also|Papers}} ==Introduction== In the last decade, there have been significant advances in speech synthesis via neural networks and end-to-end modeling. Current text-to-speech (TTS) systems require high-quality data from recording studios. They also suffer from poor generalization for unseen speakers in zero-shot situations. A new TTS framework, VALL-E, has been developed to address this issue. It uses audio codec codes for an intermediate representation as well a..."
16:56
+2,938
