Hugging Face
==Introduction==
[[Hugging Face]] is a [[company]] and [[model hub]] working in the field of [[artificial intelligence]] ([[AI]]), self-described as the “home of [[machine learning]].” <ref name="”1”">Romano, R (2022). An introduction to Hugging Face transformers for NLP. Qwak. https://www.qwak.com/post/an-introduction-to-hugging-face-transformers-for-nlp</ref> It’s a community and data science platform that provides both tools that empower users to build, train, and deploy [[machine learning]] ([[ML]]) [[models]] based on [[open-source]] code, and a place where a community of researchers, data scientists, and ML engineers can share ideas and contribute to open-source projects. <ref name="”2”">Mahmood, O (2022). What’s Hugging Face? Towards Data Science. https://towardsdatascience.com/whats-hugging-face-122f4e7eb11a</ref> Its open-source hub offers a library of state-of-the-art models for [[Natural Language Processing]] ([[NLP]]), [[computer vision]], and other domains relevant to AI. As of August 2022, it hosted more than 61 thousand [[pre-trained models]]. Technological giants like [[Microsoft]], [[Google]], [[Facebook]], [[Apple]], [[AWS]], and others have used Hugging Face’s models, datasets, and libraries. <ref name="”3”">Nabeel, M. What is Hugging Face? Educative. https://www.educative.io/answers/what-is-huggingface</ref> <ref name="”4”">Syal, A (2020). Hugging Face: A Step Towards Democratizing NLP. Towards Data Science. https://towardsdatascience.com/hugging-face-a-step-towards-democratizing-nlp-2c79f258c951</ref>


The company began by offering a chat platform in 2017. It then focused on NLP, creating a library that made resources like [[transformers]], [[datasets]], and [[tokenizers]] easily accessible. Releasing a wide variety of tools made the company popular among big tech companies. <ref name="”5”">Sarma, N (2023). Hugging Face pre-trained models: Find the best one for your task. Neptune.ai. https://neptune.ai/blog/hugging-face-pre-trained-models-find-the-best</ref> NLP technologies can help bridge the communication gap between humans and machines, since computers do not process information the way humans do. <ref name="”5”"></ref> With these systems, “it is possible for computers to read text, hear speech, interpret it, measure sentiment, and even determine which parts of the text or speech are important.” <ref name="”4”"></ref>
From its beginnings as a chatbot, Hugging Face has been growing into the GitHub of ML. The platform offers 100,000 pre-trained [[models]] and 10,000 [[datasets]] for NLP, speech, computer vision, [[time-series]], biology, [[reinforcement learning]], chemistry, and others. <ref name="”7”">Krishna, S (2022). Hugging Face takes step toward democratizing AI and ML. VentureBeat. https://venturebeat.com/ai/hugging-face-steps-toward-democratizing-ai-and-ml-with-latest-offering%EF%BF%BC/</ref> Around 5,000 companies use Hugging Face, <ref name="”6”"></ref> and it has over 1,200 contributors and 25,800 users. <ref name="”1”"></ref>


==Models==
 
[[File:3. Example of full model card-source-towardsdatascience.png|thumb|Figure 1. Model card elements. Source: Towards Data Science.]]
Creating a new [[model]] on the platform actually creates a Git repository for the files related to the ML model a user wants to share, with the same Git characteristics: versioning, branches, and discoverability, to name a few. The open-source license attributed to the contributed model and its assets can also be specified, as can its visibility. <ref name="”2”"></ref>
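In practice, license and tag metadata live in a YAML front-matter block at the top of the repository’s README.md (the model card). The sketch below builds such a block with only the Python standard library; the license identifier and tags are illustrative, not tied to any real model.

```python
# Hypothetical model-card front matter: the Hub reads license, tags,
# and other metadata from a YAML block at the top of README.md.
# The values below are illustrative.

def model_card_header(license_id: str, tags: list[str]) -> str:
    """Build the YAML front-matter block for a model card README."""
    tag_lines = "\n".join(f"- {t}" for t in tags)
    return f"---\nlicense: {license_id}\ntags:\n{tag_lines}\n---\n"

header = model_card_header("apache-2.0", ["text-classification", "pytorch"])
print(header)
```

The resulting block is what sits above the human-readable body of the model card shown in figure 1.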


A model’s page in the user interface has several elements (figure 1):


*Name, likes, and associated tags.
*Main body of the [[model card]], which can contain an overview of the model, code snippets showing how to use it, and other relevant information.
*[[Train]], [[fine-tune]], or [[deploy]] the model, either by pointing at an AWS SageMaker instance or by using Hugging Face’s own infrastructure.
*Metadata with information about datasets that were used to train the model and the Spaces that use it. <ref name="”2”"></ref>


==Datasets==
[[File:4. Dataset elements-source-towardsdatascience.png|thumb|Figure 2. Dataset card. Source: Towards Data Science.]]
[[Datasets]] are used for model training and fine-tuning, and they are available in multiple languages. The company’s datasets library provides users with an easy way to load datasets and apply the most common processing operations to them (e.g. sampling, shuffling, and filtering). <ref name="”3”"></ref>


When creating a new dataset, the user will also have to name it and choose its license type. <ref name="”2”"></ref> The dataset elements on the platform (figure 2) include:
*Metadata about the origin, size, and models trained on the dataset. <ref name="”2”"></ref>


==Spaces==
[[File:5. Spaces-towardsdatascience.png|thumb|Figure 3. Hugging Face's Spaces. Source: Towards Data Science.]]
The platform also has Spaces (figure 3), a place to showcase work in a self-contained ML demo app. The community actively contributes to it, and a user can look for inspiration by checking out different submissions. <ref name="”2”"></ref>
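A Space typically wraps a model behind a small web UI, often built with Gradio or Streamlit. The sketch below uses a toy sentiment function as a stand-in for real model inference; the Gradio wiring is shown in comments, since launching a web server is what a Space does when deployed, not something to run inline.

```python
# A minimal sketch of the kind of app a Space hosts. The toy
# sentiment function stands in for real model inference; the word
# lists are illustrative.

POSITIVE = {"good", "great", "loved", "excellent"}
NEGATIVE = {"bad", "boring", "terrible", "awful"}

def toy_sentiment(text: str) -> str:
    """Placeholder 'model': compare counts of positive vs. negative words."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# In an actual Space, the function is typically wired to a web UI, e.g.:
#   import gradio as gr
#   gr.Interface(fn=toy_sentiment, inputs="text", outputs="text").launch()
print(toy_sentiment("a great movie, loved it"))
```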


==Community==
The vast community contribution of [[models]], [[datasets]], and [[spaces]] can be accessed through the platform. Most of the models in this repository are built in [[PyTorch]]. Sometimes, alternatives for the main tasks are also available in [[TensorFlow]] and other [[ML libraries]]. <ref name="”2”"></ref>

A quality-of-life feature that saves time when exploring community models is Tasks. It provides a curated view of models, depending on the task a user wants to accomplish. For each task, there’s a visual and intuitive explanation, with diagrams, videos, and links to a demo that uses the Inference API. To complement this, there are also descriptions of use cases and task variants. <ref name="”2”"></ref>
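Behind those demos, the hosted Inference API is an HTTP endpoint per model. The sketch below shows how such a request is typically formed using only the standard library; the model id is an example and the token is a placeholder, and the request is deliberately constructed but not sent.

```python
# Sketch of how a call to the hosted Inference API is formed. The
# token is a placeholder and the model id is an example; the request
# is built here but deliberately not sent.
import json
import urllib.request

MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"  # example model
API_URL = f"https://api-inference.huggingface.co/models/{MODEL_ID}"
TOKEN = "hf_xxx"  # placeholder: a real User Access Token is required

payload = json.dumps({"inputs": "I love this!"}).encode("utf-8")
request = urllib.request.Request(
    API_URL,
    data=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    method="POST",
)
# urllib.request.urlopen(request) would send it and return JSON,
# e.g. label/score pairs for a text-classification model.
print(request.full_url)
```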


==Inference Endpoints==