Hugging Face

From AI Wiki
Latest revision as of 12:47, 21 February 2023

Introduction

Hugging Face is a company and model hub working in the field of artificial intelligence (AI), self-described as the “home of machine learning.” [1] It is a community and data science platform that provides both tools for building, training, and deploying machine learning (ML) models based on open-source code, and a place where researchers, data scientists, and ML engineers can share ideas and contribute to open-source projects. [2] Its open-source hub offers a library of state-of-the-art models for Natural Language Processing (NLP), computer vision, and other areas relevant to AI. As of August 2022, it hosted more than 61,000 pre-trained models. Technology giants such as Microsoft, Google, Facebook, Apple, and AWS have used Hugging Face’s models, datasets, and libraries. [3] [4]

The company began by offering a chat platform in 2017. It then focused on NLP, creating a library that made resources such as transformers, datasets, and tokenizers easily accessible. Releasing a wide variety of tools made the company popular among big tech firms. [5] NLP technologies help bridge the communication gap between humans and machines, since computers do not process information the way humans do. [5] With these systems, “it is possible for computers to read text, hear speech, interpret it, measure sentiment, and even determine which parts of the text or speech are important”. [4]

As work practices have become more flexible, tools for remote collaboration between data science teams, experts, and amateurs have seen growing adoption. Sharing knowledge and resources is gaining relevance in AI, since no single company is likely to “solve” the field on its own. Hugging Face embraces this community work by providing a community “Hub,” a place where users can share and examine models and datasets, contributing to its goal of democratizing AI for all. [3] It is like the GitHub for AI models.

In 2019, the company raised $15 million to build a comprehensive NLP library. In 2021, it raised another $40 million in a Series B funding round in which existing investors like Lux Capital, A.Capital, and Betaworks participated. [1] [4] [6] Besides increasing its funding, Hugging Face has also acquired Gradio, “a platform that enables anyone to demo their ML models through a web-based interface.” [1]

Benefits, characteristics, and impact

There are several advantages to using the Hugging Face Transformers library:

  • Ease of use.
  • State-of-the-art models.
  • Lower computing costs.
  • Easily customizable/adaptable models to different use cases.
  • High-level natural language understanding and generation.
  • High-level computer vision and audio tasks. [1]

The company is known for its contributions to the field of NLP. Its NLP tasks include text classification, text generation, translation, summarization, fill-mask, question answering, zero-shot classification, and sentence similarity. Its audio tasks include text-to-speech, automatic speech recognition, and audio classification. [3]
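Several of the NLP tasks listed above can be run through the `pipeline` API of the company's Transformers library. The sketch below uses real pipeline task identifiers, but the example inputs are illustrative, and calling `run_all` requires `pip install transformers` plus network access to download a default model per task.

```python
# Hedged sketch: mapping a few Hub task identifiers to sample inputs,
# then running each through transformers.pipeline. Only run_all() touches
# the network; everything else is pure Python.
def example_inputs():
    """Map pipeline task identifiers to illustrative sample inputs."""
    return {
        "sentiment-analysis": "Hugging Face makes NLP accessible.",
        "summarization": ("Hugging Face is a platform that hosts models, "
                          "datasets, and demo apps for the ML community. "
                          "It is often described as the GitHub of machine "
                          "learning."),
        "fill-mask": "Paris is the [MASK] of France.",
    }

def run_all():
    """Run each task with its default model (requires transformers + network)."""
    from transformers import pipeline
    return {task: pipeline(task)(text)
            for task, text in example_inputs().items()}
```

With Transformers installed, `run_all()` returns one prediction per task, e.g. a sentiment label with a score for `"sentiment-analysis"`.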

From its beginnings as a chatbot, it is becoming the GitHub of ML. The platform offers 100,000 pre-trained models and 10,000 datasets for NLP, speech, computer vision, time series, biology, reinforcement learning, chemistry, and other domains. [7] Around 5,000 companies use Hugging Face, [6] and it has over 1,200 contributors and 25,800 users. [1]

Models

Figure 1. Model card elements. Source: Towards Data Science.

Creating a new model on the platform actually creates a Git repository for the files of the ML model a user wants to share, with the usual characteristics such as versioning, branches, and discoverability. The open-source license attributed to the contributed model and assets can also be specified, as can the repository’s visibility. [2]

The model in the user interface has several elements (figure 1):

  • Name, likes, and associated tags.
  • Main body of the model card where an overview of the model can be given, code snippets for how to use it, and other relevant information.
  • Train, fine-tune, or deploy the model. This is done by pointing at an AWS SageMaker instance or using Hugging Face’s own infrastructure. [2]
  • Metadata with information about datasets that were used to train the model and the Spaces that use it. [2]
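The repo-creation flow described above can be sketched in code. `create_repo` and `upload_file` are real `huggingface_hub` functions, but the repo id, license, and tags below are placeholders, and `publish` needs the library installed plus an access token.

```python
# Hedged sketch: a model repo on the Hub is a Git repo; the model card's
# YAML front matter carries metadata such as the license and tags, and
# visibility is set at creation time. model_card() is pure string-building.
def model_card(license_id="apache-2.0", tags=()):
    """Build minimal model-card text: YAML front matter plus an overview."""
    lines = ["---", f"license: {license_id}"]
    if tags:
        lines.append("tags:")
        lines.extend(f"- {t}" for t in tags)
    lines += ["---", "", "# My model", "", "Overview, usage snippets, etc."]
    return "\n".join(lines)

def publish(repo_id="my-user/my-model", private=True):
    """Create the repo and push a README (needs huggingface_hub + a token)."""
    from huggingface_hub import create_repo, upload_file
    create_repo(repo_id, private=private)  # visibility chosen here
    upload_file(path_or_fileobj=model_card().encode(),
                path_in_repo="README.md", repo_id=repo_id)
```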

Datasets

Figure 2. Dataset card. Source: Towards Data Science.

Datasets are used for model training or fine-tuning, and they are available in multiple languages. The company’s datasets library gives users an easy way to load datasets and apply the most common processing operations (e.g., sampling, shuffling, filtering). [3]
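The load-and-process pattern just described might look like the sketch below. `shuffle`, `select`, and `filter` are real `datasets.Dataset` methods; the dataset name (`"imdb"`) and the label-based filter are illustrative, and `load_sample` needs `pip install datasets` plus network access.

```python
# Hedged sketch: shuffle a dataset, take a small sample, and filter it.
# positive_sample() only uses duck-typed methods, so it works on anything
# exposing the Dataset-style shuffle/select/filter interface.
def positive_sample(ds, n=100, seed=42):
    """Shuffle, keep the first n rows, then keep only rows with label == 1."""
    return ds.shuffle(seed=seed).select(range(n)).filter(
        lambda example: example["label"] == 1)

def load_sample():
    """Load a public dataset and process it (requires datasets + network)."""
    from datasets import load_dataset
    return positive_sample(load_dataset("imdb", split="train"))
```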

When creating a new dataset, the user will also have to name it and choose its license type. [2] The dataset elements on the platform (figure 2) include:

  • Title, likes, and tags.
  • Table of contents.
  • Main body of the dataset card, which can be configured to show an embedded dataset preview.
  • Quick links to the GitHub repository.
  • Code snippet for using the dataset through the platform’s Python datasets library.
  • Metadata about the origin, size, and models trained on the dataset. [2]

Spaces

Figure 3. Hugging Face's Spaces. Source: Towards Data Science.

The platform also has Spaces (figure 3), a place to showcase the work done in a self-contained ML demo app. The community actively contributes to it and a user can look for inspiration by checking out different submissions. [2]
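A self-contained demo of the kind a Space hosts can be sketched with Gradio, the demo-app framework Hugging Face acquired. The demo function below is a toy placeholder; a real Space would call a model instead, and `launch` requires `pip install gradio`.

```python
# Hedged sketch of a minimal Space-style demo app.
def demo_fn(text: str) -> str:
    """Stand-in 'model': uppercase the input (a real Space runs inference)."""
    return text.upper() + "!"

def launch():
    """Build and serve the web UI (requires gradio installed)."""
    import gradio as gr
    gr.Interface(fn=demo_fn, inputs="text", outputs="text").launch()
```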

Community

The vast community contribution of models, datasets, and Spaces can be accessed through the platform. Most models in this repository are built in PyTorch; alternatives for the main tasks are sometimes also available in TensorFlow and other ML libraries. [2]

A quality-of-life feature that saves time when exploring community models is Tasks, which provides a curated view of models based on the task a user wants to accomplish. Each task is explained visually and intuitively, with diagrams, videos, and links to a demo that uses the Inference API, complemented by descriptions of use cases and task variants. [2]

Inference Endpoints

Inference Endpoints is an AI-as-a-service offering from Hugging Face aimed at simplifying the implementation of ML projects by helping users “deploy Transformers, Diffusers or any model on dedicated, fully managed infrastructure.” [7] [8] According to VentureBeat, “The AI-as-a-service offering is designed to be a solution to take on large workloads of enterprises — including in regulated industries that are heavy users of transformer models, like financial services (e.g., air gapped environments), healthcare services (e.g., HIPAA compliance) and consumer tech (e.g., GDPR compliance). The company claims that Inference Endpoints will enable more than 100,000 Hugging Face Hub users to go from experimentation to production in just a couple of minutes”. [7]

In this service, the user selects the model to deploy, chooses the cloud provider and region, and specifies the security settings. Any ML model, from transformers to diffusers, can be deployed. [7] [8]
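Once deployed, an endpoint is queried over HTTPS. In the sketch below the URL and token are placeholders, and the `{"inputs": ...}` payload shape follows the common Hugging Face inference convention; only stdlib modules are used.

```python
# Hedged sketch: build and send an authenticated POST to a deployed
# Inference Endpoint. build_request() is pure; query() needs a live endpoint.
import json
import urllib.request

def build_request(url, token, text):
    """Construct the authenticated POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps({"inputs": text}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def query(url, token, text):
    """Send the request to a live endpoint and decode the JSON reply."""
    with urllib.request.urlopen(build_request(url, token, text)) as resp:
        return json.load(resp)
```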

Pricing

Hugging Face uses a freemium model for its pricing; a detailed tier structure is available on its website.

References

  1. Romano, R (2022). An introduction to Hugging Face transformers for NLP. Qwak. https://www.qwak.com/post/an-introduction-to-hugging-face-transformers-for-nlp
  2. Mahmood, O (2022). What’s Hugging Face? Towards Data Science. https://towardsdatascience.com/whats-hugging-face-122f4e7eb11a
  3. Nabeel, M. What is Hugging Face? Educative. https://www.educative.io/answers/what-is-huggingface
  4. Syal, A (2020). Hugging Face: A Step Towards Democratizing NLP. Towards Data Science. https://towardsdatascience.com/hugging-face-a-step-towards-democratizing-nlp-2c79f258c951
  5. Sarma, N (2023). Hugging Face pre-trained models: Find the best one for your task. Neptune.ai. https://neptune.ai/blog/hugging-face-pre-trained-models-find-the-best
  6. Dillet, R (2021). Hugging Face raises $40 million for its natural language processing library. TechCrunch. https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library/
  7. Krishna, S (2022). Hugging Face takes step toward democratizing AI and ML. VentureBeat. https://venturebeat.com/ai/hugging-face-steps-toward-democratizing-ai-and-ml-with-latest-offering%EF%BF%BC/
  8. Hugging Face. Inference Endpoints. Hugging Face. https://huggingface.co/inference-endpoints