Model | HF Name | Creator | Task | Library | Dataset | Language | Paper | Related to | License |
---|---|---|---|---|---|---|---|---|---|
Fine-tuned XLSR-53 large for speech recognition in English model | jonatasgrosman/wav2vec2-large-xlsr-53-english | Jonatasgrosman | Automatic Speech Recognition | PyTorch JAX Transformers | Common voice mozilla-foundation/common_voice_6_0 | English | | Audio Wav2vec2 Speech Hf-asr-leaderboard mozilla-foundation/common_voice_6_0 Robust-speech-event Xlsr-fine-tuning-week | Apache-2.0 |
Fine-tuned XLSR-53 large for speech recognition in Japanese model | jonatasgrosman/wav2vec2-large-xlsr-53-japanese | Jonatasgrosman | Automatic Speech Recognition | PyTorch JAX Transformers | Common voice | Japanese | | Audio Wav2vec2 Speech Xlsr-fine-tuning-week | Apache-2.0 |
HuBERT-Base for Emotion Recognition model | superb/hubert-base-superb-er | Superb | Audio Classification | PyTorch Transformers | Superb | English | 2105.01051 | Audio Speech Hubert | Apache-2.0 |
Segmentation model | pyannote/segmentation | Pyannote | Voice Activity Detection | PyTorch Pyannote.audio | Ami Dihard Voxconverse | | 2104.04045 | Audio Speech Speaker Pyannote Pyannote-audio-model Voice Speaker-segmentation Overlapped-speech-detection Resegmentation | MIT |
Speaker Verification with ECAPA-TDNN embeddings on Voxceleb model | speechbrain/spkrec-ecapa-voxceleb | Speechbrain | | PyTorch Speechbrain | Voxceleb | English | 2106.04624 | Embeddings Speaker Verification Identification ECAPA TDNN | Apache-2.0 |
Speaker embedding model | pyannote/embedding | Pyannote | | PyTorch TensorBoard Pyannote.audio | Voxceleb | | | Audio Speech Speaker Pyannote Pyannote-audio-model Voice Speaker-recognition Speaker-verification Speaker-identification Speaker-embedding | MIT |
Speaker-diarization model | pyannote/speaker-diarization | Pyannote | Automatic Speech Recognition Voice Activity Detection | Pyannote.audio | Voxceleb Ami Dihard Voxconverse Aishell Repere | | 2012.01477 2110.07058 2005.08072 | Audio Speech Speaker Pyannote Voice Overlapped-speech-detection Pyannote-audio-pipeline Speaker-diarization Speaker-change-detection | MIT |
Wav2Vec2-Base model | facebook/wav2vec2-base | Meta | | PyTorch Transformers | Librispeech asr | English | 2006.11477 | Wav2vec2 Speech Pretraining | Apache-2.0 |
Wav2Vec2-Base-960h model | facebook/wav2vec2-base-960h | Meta | Automatic Speech Recognition | PyTorch TensorFlow Safetensors Transformers | Librispeech asr | English | 2006.11477 | Audio Wav2vec2 Hf-asr-leaderboard | Apache-2.0 |
Wav2Vec2-Large-960h model | facebook/wav2vec2-large-960h | Meta | Automatic Speech Recognition | PyTorch Transformers | Librispeech asr | English | 2006.11477 | Wav2vec2 Speech | Apache-2.0 |
Wav2Vec2-Large-960h-Lv60 Self-Training model | facebook/wav2vec2-large-960h-lv60-self | Meta | Automatic Speech Recognition | PyTorch TensorFlow JAX Transformers | Librispeech asr | English | 2010.11430 2006.11477 | Audio Wav2vec2 Speech Hf-asr-leaderboard | Apache-2.0 |
Wav2Vec2-Large-XLSR-53 finetuned on multi-lingual Common Voice model | facebook/wav2vec2-xlsr-53-espeak-cv-ft | Meta | Automatic Speech Recognition | PyTorch Transformers | Common voice | Multi-lingual | 2109.11680 | Audio Wav2vec2 Speech Phoneme-recognition | Apache-2.0 |
Wav2Vec2-large-a model | yongjian/wav2vec2-large-a | Yongjian | Automatic Speech Recognition | PyTorch Transformers | LIUM/tedlium | English | | Audio Wav2vec2 Speech | |
Wav2Vec2-large-robust-ft-libritts-voxpopuli model | jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli | Jbetker | Automatic Speech Recognition | PyTorch Transformers | | | | Wav2vec2 | |
Whisper large v2 model | openai/whisper-large-v2 | OpenAI | Automatic Speech Recognition | PyTorch TensorFlow Transformers | | 99 languages | 2212.04356 | Audio Hf-asr-leaderboard Whisper | Apache-2.0 |