Hugging Face: Difference between revisions

Having started as a chatbot, Hugging Face is becoming the GitHub of ML. The platform offers 100,000 pre-trained [[models]] and 10,000 [[datasets]] for NLP, speech, computer vision, [[time-series]], biology, [[reinforcement learning]], chemistry, and other domains. <ref name="”7”">Krishna, S (2022). Hugging Face takes step toward democratizing AI and ML. VentureBeat. https://venturebeat.com/ai/hugging-face-steps-toward-democratizing-ai-and-ml-with-latest-offering%EF%BF%BC/</ref> Around 5,000 companies use Hugging Face, <ref name="”6”"></ref> and it has over 1,200 contributors and 25,800 users. <ref name="”1”"></ref>


==Models==
 
[[File:3. Example of full model card-source-towardsdatascience.png|thumb|Figure 1. Model card elements. Source: Towards Data Science.]]
Creating a new [[model]] on the platform creates a Git repository for the files related to the ML model a user wants to share, with features such as versioning, branches, and discoverability. The user can also specify the open-source license that applies to the contributed model and its assets, and set the repository's visibility. <ref name="”2”"></ref>


The model in the user interface has several elements (figure 1):


   
*Name, likes, and associated tags.
*Main body of the [[model card]], which can give an overview of the model, code snippets showing how to use it, and other relevant information.
*Options to [[train]], [[fine-tune]], or [[deploy]] the model, either by pointing at an AWS SageMaker instance or by using Hugging Face’s own infrastructure.
*Metadata about the datasets that were used to train the model and the Spaces that use it. <ref name="”2”"></ref>


==Datasets==
[[File:4. Dataset elements-source-towardsdatascience.png|thumb|Figure 2. Dataset card. Source: Towards Data Science.]]
[[Datasets]] are used for model training or fine-tuning, and they are available in multiple languages. The company’s datasets library gives users an easy way to load datasets and apply the most commonly used processing operations (e.g. sampling, shuffling, and filtering). <ref name="”3”"></ref>


When creating a new dataset, the user will also have to name it and choose its license type. <ref name="”2”"></ref> The dataset elements on the platform (figure 2) include:
Line 49: Line 46:
*Metadata about the origin, size, and models trained on the dataset. <ref name="”2”"></ref>


==Spaces==
[[File:5. Spaces-towardsdatascience.png|thumb|Figure 3. Hugging Face's Spaces. Source: Towards Data Science.]]
The platform also has Spaces (figure 3), a place to showcase work as a self-contained ML demo app. The community actively contributes to it, and a user can look for inspiration by checking out different submissions. <ref name="”2”"></ref>


==Community==