Page history
7 April 2023
2 March 2023
→Vall-E
+871
Created page with "{{see also|Papers}} ==Introduction== In the last decade, there have been significant advances in speech synthesis via neural networks and end to end modeling. Current text-to-speech (TTS), systems require high-quality data from recording studios. They also suffer from poor generalization for unseen speaker in zero-shot situations. A new TTS framework, VALL-E, has been developed to address this issue. It uses audio codec codes for an intermediate representation as well a..."
+2,938