Audio Models

From AI Wiki
See also: Audio and Models
| Model | HF Name | Creator | Task | Library | Dataset | Language | Paper (arXiv) | Related to | License |
|---|---|---|---|---|---|---|---|---|---|
| Fine-tuned XLSR-53 large for speech recognition in English model | jonatasgrosman/wav2vec2-large-xlsr-53-english | Jonatasgrosman | Automatic Speech Recognition | PyTorch, JAX, Transformers | Common voice, Mozilla-foundation/common voice 6 0 | English | | | Apache-2.0 |
| Fine-tuned XLSR-53 large for speech recognition in Japanese model | jonatasgrosman/wav2vec2-large-xlsr-53-japanese | Jonatasgrosman | Automatic Speech Recognition | PyTorch, JAX, Transformers | Common voice | Japanese | | | Apache-2.0 |
| HuBERT-Base for Emotion Recognition model | superb/hubert-base-superb-er | Superb | Audio Classification | PyTorch, Transformers | Superb | English | 2105.01051 | | Apache-2.0 |
| Segmentation model | pyannote/segmentation | Pyannote | Voice Activity Detection | PyTorch, Pyannote.audio | Ami, Dihard, Voxconverse | | 2104.04045 | | MIT |
| Speaker Verification with ECAPA-TDNN embeddings on Voxceleb model | speechbrain/spkrec-ecapa-voxceleb | Speechbrain | | PyTorch, Speechbrain | Voxceleb | English | 2106.04624 | | Apache-2.0 |
| Speaker embedding model | pyannote/embedding | Pyannote | | PyTorch, TensorBoard, Pyannote.audio | Voxceleb | | | | MIT |
| Speaker-diarization model | pyannote/speaker-diarization | Pyannote | Automatic Speech Recognition, Voice Activity Detection | Pyannote.audio | Voxceleb, Ami, Dihard, Voxconverse, Aishell, Repere | | 2012.01477, 2110.07058, 2005.08072 | | MIT |
| Wav2Vec2-Base model | facebook/wav2vec2-base | Meta | | PyTorch, Transformers | Librispeech asr | English | 2006.11477 | | Apache-2.0 |
| Wav2Vec2-Base-960h model | facebook/wav2vec2-base-960h | Meta | Automatic Speech Recognition | PyTorch, TensorFlow, Safetensors, Transformers | Librispeech asr | English | 2006.11477 | | Apache-2.0 |
| Wav2Vec2-Large-960h model | facebook/wav2vec2-large-960h | Meta | Automatic Speech Recognition | PyTorch, Transformers | Librispeech asr | English | 2006.11477 | | Apache-2.0 |
| Wav2Vec2-Large-960h-Lv60 Self-Training model | facebook/wav2vec2-large-960h-lv60-self | Meta | Automatic Speech Recognition | PyTorch, TensorFlow, JAX, Transformers | Librispeech asr | English | 2010.11430, 2006.11477 | | Apache-2.0 |
| Wav2Vec2-Large-XLSR-53 finetuned on multi-lingual Common Voice model | facebook/wav2vec2-xlsr-53-espeak-cv-ft | Meta | Automatic Speech Recognition | PyTorch, Transformers | Common voice | Multi-lingual | 2109.11680 | | Apache-2.0 |
| Wav2Vec2-large-a model | yongjian/wav2vec2-large-a | Yongjian | Automatic Speech Recognition | PyTorch, Transformers | LIUM/tedlium | English | | | |
| Wav2Vec2-large-robust-ft-libritts-voxpopuli model | jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli | Jbetker | Automatic Speech Recognition | PyTorch, Transformers | | | | | |
| Whisper large v2 model | openai/whisper-large-v2 | OpenAI | Automatic Speech Recognition | PyTorch, TensorFlow, Transformers | | 99 languages | 2212.04356 | | Apache-2.0 |
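
The HF Name values listed on this page are Hugging Face Hub repository identifiers of the form creator/model. The following is a minimal sketch of how such an identifier might be used, not a recommended or official recipe. It assumes the `transformers` package (with a PyTorch backend) is installed and that the chosen model is one of the Automatic Speech Recognition entries whose library list includes Transformers. The helper names `hub_url` and `transcribe`, and the audio file path, are hypothetical and introduced only for illustration.

```python
def hub_url(hf_name: str) -> str:
    """Build the Hugging Face Hub page URL for an ID such as 'openai/whisper-large-v2'."""
    creator, sep, model = hf_name.partition("/")
    if not (creator and sep and model):
        raise ValueError(f"expected 'creator/model', got {hf_name!r}")
    return f"https://huggingface.co/{creator}/{model}"


def transcribe(hf_name: str, audio_path: str) -> str:
    """Transcribe one audio file with an ASR model from the list (hypothetical helper).

    Assumption: `transformers` is installed; the import is done lazily so that
    hub_url() stays usable without it. The first call downloads the model weights,
    and decoding a file path typically requires ffmpeg to be available.
    """
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=hf_name)
    return asr(audio_path)["text"]


if __name__ == "__main__":
    # Prints the Hub page for one of the listed models.
    print(hub_url("facebook/wav2vec2-base-960h"))
    # Example call (requires network access and a local audio file):
    # print(transcribe("facebook/wav2vec2-base-960h", "sample.wav"))
```

Entries whose library is Pyannote.audio or Speechbrain are loaded through those libraries rather than through the Transformers pipeline, so this sketch does not apply to them.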