Hugging Face
==Introduction==
[[Hugging Face]] is a [[company]] and [[model hub]] that works in the field of [[artificial intelligence]] ([[AI]]), self-described as the “home of [[machine learning]].” <ref name="”1”">Romano, R (2022). An introduction to Hugging Face transformers for NLP. Qwak. https://www.qwak.com/post/an-introduction-to-hugging-face-transformers-for-nlp</ref> It’s a community and data science platform that provides tools for users to build, train, and deploy [[machine learning]] ([[ML]]) [[models]] based on [[open-source]] code, as well as a place where a community of researchers, data scientists, and ML engineers can share ideas and contribute to open-source projects. <ref name="”2”">Mahmood, O (2022). What’s Hugging Face? Towards Data Science. https://towardsdatascience.com/whats-hugging-face-122f4e7eb11a</ref> Its open-source hub offers a library of state-of-the-art models for [[Natural Language Processing]] ([[NLP]]), [[computer vision]], and other areas relevant to AI. As of August 2022, it hosted more than 61,000 [[pre-trained models]]. Technology giants like [[Microsoft]], [[Google]], [[Facebook]], [[Apple]], [[AWS]], and others have used Hugging Face’s models, datasets, and libraries. <ref name="”3”">Nabeel, M. What is Hugging Face? Educative. https://www.educative.io/answers/what-is-huggingface</ref> <ref name="”4”">Syal, A (2020). Hugging Face: A Step Towards Democratizing NLP. Towards Data Science. https://towardsdatascience.com/hugging-face-a-step-towards-democratizing-nlp-2c79f258c951</ref>


The company began by offering a chat platform in 2017. It then focused on NLP, creating a library that made resources like [[transformers]], [[datasets]], and [[tokenizers]] easily accessible. Releasing a wide variety of tools made it popular among big tech companies. <ref name="”5”">Sarma, N (2023). Hugging Face pre-trained models: Find the best one for your task. Neptune.ai. https://neptune.ai/blog/hugging-face-pre-trained-models-find-the-best</ref> NLP technologies can help bridge the communication gap between humans and machines, since computers do not process information in the same way. <ref name="”5”"></ref> With these systems, “it is possible for computers to read text, hear speech, interpret it, measure sentiment, and even determine which parts of the text or speech are important”. <ref name="”4”"></ref>
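The library's high-level entry point is typically the `pipeline` API. A minimal sketch, assuming the `transformers` package is installed (the first call downloads a default pre-trained model from the Hub):

```python
# Hedged sketch: run an NLP task with the transformers pipeline API.
# The first run downloads default model weights from the Hugging Face Hub.
from transformers import pipeline

# "sentiment-analysis" selects a default fine-tuned classification model.
classifier = pipeline("sentiment-analysis")
result = classifier("Hugging Face makes NLP models easy to use.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```

The same one-line pattern works for other tasks by changing the task string, which is part of what made the library accessible to non-experts.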
As work practices have become more flexible, there has been an increase in the adoption of tools for remote collaboration between data science teams, experts, and amateurs. Sharing knowledge and resources is gaining relevance in AI in order to advance the field since probably no single company will be able to “solve” it on its own. Hugging Face embraces this community work by providing a community “Hub,” a place where users can share and examine models and datasets, therefore contributing to its goal of democratizing AI for all. <ref name="”3”"></ref> It is like the [[GitHub]] for [[AI models]].


In 2019, the company raised $15 million to build a comprehensive NLP library. In 2021, it raised another $40 million in a Series B funding round in which existing investors like Lux Capital, A.Capital, and Betaworks participated. <ref name="”1”"></ref> <ref name="”4”"></ref> <ref name="”6”">Dillet, R (2021). Hugging Face raises $40 million for its natural language processing library. TechCrunch. https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library/</ref> Besides increasing its funding, Hugging Face has also acquired [[Gradio]], “a platform that enables anyone to demo their [[ML models]] through a web-based interface.” <ref name="”1”"></ref>


==Benefits, characteristics, and impact==
*High-level computer vision and audio tasks. <ref name="”1”"></ref>


The company is known for its contributions to the field of NLP. Its NLP tasks include [[text classification]] and [[text generation|generation]], [[translation]], [[summarization]], [[fill-mask]], [[question-answering]], [[zero-shot]] [[classification]], and [[sentence similarity]]. Its [[audio]] tasks include [[speech recognition]], [[text-to-speech]], [[automatic speech recognition]], and [[audio classification]]. <ref name="”3”"></ref>
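One of the tasks listed above, zero-shot classification, can be sketched as follows (assuming the `transformers` package; the candidate labels and input text are made-up examples, and the default model is downloaded on first use):

```python
# Hedged sketch: zero-shot classification, one of the NLP tasks listed above.
# The model scores user-supplied candidate labels without task-specific training.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")  # downloads a default NLI model
out = classifier(
    "The team released a new speech recognition model today.",
    candidate_labels=["technology", "sports", "cooking"],
)
print(out["labels"][0])  # the highest-scoring label comes first
```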


From its beginnings as a chatbot, it has become the GitHub of ML. The platform offers 100,000 pre-trained [[models]] and 10,000 [[datasets]] for NLP, speech, computer vision, [[time-series]], biology, [[reinforcement learning]], chemistry, and others. <ref name="”7”">Krishna, S (2022). Hugging Face takes step toward democratizing AI and ML. VentureBeat. https://venturebeat.com/ai/hugging-face-steps-toward-democratizing-ai-and-ml-with-latest-offering%EF%BF%BC/</ref> Around 5,000 companies use Hugging Face, <ref name="”6”"></ref> and it has over 1,200 contributors and 25,800 users. <ref name="”1”"></ref>
 


==Models==
[[File:3. Example of full model card-source-towardsdatascience.png|thumb|Figure 1. Model card elements. Source: Towards Data Science.]]
Creating a new [[model]] on the platform actually creates a Git repo for the files related to the ML model a user wants to share, with the same characteristics as any repository, such as versioning, branches, and discoverability. The user can also specify the type of open-source license attributed to the contributed model and assets, as well as its visibility. <ref name="”2”"></ref>


The model page in the user interface has several elements (figure 1):
*Name, likes, and associated tags.
*Main body of the [[model card]], where an overview of the model, code snippets showing how to use it, and other relevant information can be given.
*Options to [[train]], [[fine-tune]], or [[deploy]] the model, either by pointing at an AWS SageMaker instance or by using Hugging Face’s own infrastructure.
*Metadata with information about datasets that were used to train the model and the Spaces that use it. <ref name="”2”"></ref>
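The model card lives in the repository as a Markdown `README.md`, with the tags, license, and dataset metadata carried in a YAML front-matter block. A minimal sketch, where every concrete value (model name, tags, dataset) is a hypothetical example:

```python
# Hedged sketch: compose a minimal model card (README.md) with the YAML
# metadata block that carries license, tags, and training-dataset info.
# All concrete values here are hypothetical examples.
card = """\
---
license: apache-2.0
tags:
- text-classification
datasets:
- imdb
---

# My demo model

Overview of the model, code snippets showing how to use it, and other
relevant information go in the Markdown body.
"""

# The YAML front matter sits between the two '---' delimiters.
front_matter = card.split("---")[1]
print("license:" in front_matter)  # True
```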


==Datasets==
[[File:4. Dataset elements-source-towardsdatascience.png|thumb|Figure 2. Dataset card. Source: Towards Data Science.]]
[[Datasets]] are used to help with model training or fine-tuning, and they are available in multiple languages. The company’s datasets library provides users with an easy way to load datasets and perform the most common processing operations (e.g., sampling, shuffling, and filtering). <ref name="”3”"></ref>


When creating a new dataset, the user will also have to name it and choose its license type. <ref name="”2”"></ref> The dataset elements on the platform (figure 2) include:
*Metadata about the origin, size, and models trained on the dataset. <ref name="”2”"></ref>


==Spaces==
[[File:5. Spaces-towardsdatascience.png|thumb|Figure 3. Hugging Face's Spaces. Source: Towards Data Science.]]
The platform also has Spaces (figure 3), a place to showcase the work done in a self-contained ML demo app. The community actively contributes to it and a user can look for inspiration by checking out different submissions. <ref name="”2”"></ref>


==Community==
The vast community contribution of [[models]], [[datasets]], and [[spaces]] can be accessed through the platform. Most of the models in this repository are built in [[PyTorch]]. Sometimes, alternatives for the main tasks are also available in [[TensorFlow]] and other [[ML libraries]]. <ref name="”2”"></ref>


A quality-of-life feature that saves time when exploring community models is Tasks, which provides a curated view of models depending on the task a user wants to accomplish. Each task is explained in a visual and intuitive way, with diagrams, videos, and links to a demo that uses the Inference API, complemented by descriptions of use cases and task variants. <ref name="”2”"></ref>
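The hosted Inference API mentioned above is an HTTP endpoint: a model id is appended to the base URL and the input is sent as a JSON POST with a bearer token. A sketch using only the standard library; the model id and token are hypothetical placeholders:

```python
# Hedged sketch: build a request for the hosted Inference API.
# The model id and token below are hypothetical placeholders.
import json
import urllib.request

def build_request(model_id, text, token):
    """Build the POST request the Inference API expects."""
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Authorization": f"Bearer {token}"},
    )

req = build_request("distilbert-base-uncased", "Hello!", "hf_xxx")
print(req.full_url)
# Actually sending it requires a valid token:
#   urllib.request.urlopen(req).read()
```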


==Inference Endpoints==