GPT-3

GPT-3 (Generative Pre-Trained Transformer 3) is a language model developed by OpenAI. It is the third iteration of the autoregressive model that uses deep learning to generate text based on natural language inputs (prompts). Not only does it create human-like text, but it can also produce code and other data. The system has been used to generate poetry, write role-playing adventures, and create basic apps. [1] [2] GPT-3 was announced in May 2020, with access initially limited to a few beta testers, and was afterwards released as a commercial API in order to fund OpenAI's research. [2] [3] [4] [5] On November 18, 2021, OpenAI stated that it had removed the waitlist to access the API. [5]

In the area of artificial intelligence (AI), GPT-3 is what is called a neural network, a mathematical system inspired by the workings of neurons in the brain. This network learns its skills by finding patterns in enormous amounts of digital data. It is the same type of technology that allows for facial recognition or the identification of voice commands on smartphones. [3]

According to Floridi & Chiriatti (2020), "the language model is trained on an unlabeled dataset that is made up of texts, such as Wikipedia and many other sites, primarily in English, but also in other languages. These statistical models need to be trained with large amounts of data to produce relevant results." [1] When the GPT-3 system receives a prompt, it produces a text completion in natural language and developers can tweak the program by showing it a few examples that demonstrate their intended goal. [6]

In 2018, the first version of GPT used 110 million learning parameters, a value similar to that of BERT base, another language model. The second version, in 2019, saw the parameters increase to 1.5 billion. The third version of the language model uses 175 billion parameters and was trained on Microsoft Azure's AI supercomputer at an estimated cost of $12 million. [1] [2] GPT-3 has 96 layers and was trained on a set of 499 billion tokens of web content. At the time of its release this made it the largest language model developed, allowing it to produce more coherent and convincing texts in different styles of writing than its predecessors. [7] [8]

GPT-3 is a powerful model with various applications such as automatic chat, search matching, code generation, article creation, and summarization, but it still has problems with safety and controllability, including false content and biased information. For now, its value lies mainly in intelligent auxiliary tasks such as summarizing reports or creating content that can then be manually reviewed and edited. [9]

Few-shot learning

Some AI researchers consider GPT-3 a step toward machines that can comprehend the complexities of human language. The system has been used in several experiments to generate tweets and poetry, summarize emails, answer trivia questions, translate languages, and write computer programs, with little input from the user. [3] Indeed, OpenAI has suggested that the model is capable of zero-shot or few-shot learning, meaning that it can perform new tasks without extra training cycles. Instead, being shown a few examples in the prompt is sufficient for the model to learn. [2] [4]
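
As an illustration of few-shot prompting through the API, the sketch below shows a sentiment-classification prompt containing two labelled examples followed by a new input. The Python client call, model name, and example reviews are assumptions for demonstration only and are not taken from the sources above.

  import openai

  openai.api_key = "YOUR_API_KEY"  # placeholder; a real key is required

  # The "training" happens entirely in the prompt: two labelled examples
  # followed by the new review the model should classify.
  few_shot_prompt = (
      "Classify the sentiment of each review as Positive or Negative.\n\n"
      "Review: The battery lasts all day and the screen is gorgeous.\n"
      "Sentiment: Positive\n\n"
      "Review: It broke after a week and support never answered.\n"
      "Sentiment: Negative\n\n"
      "Review: Setup took five minutes and everything just worked.\n"
      "Sentiment:"
  )

  response = openai.Completion.create(
      model="text-davinci-003",  # assumed GPT-3 model name
      prompt=few_shot_prompt,
      max_tokens=1,
      temperature=0,  # deterministic output suits classification
  )
  print(response.choices[0].text.strip())  # expected output: "Positive"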

Few-shot learning is one of the main open problems in AI research. The ability to learn complex tasks from a few examples is a human characteristic that still eludes machines; learning with no examples at all is called zero-shot learning. While GPT-3 is not the final solution to few-shot learning, it is an important development, showing that scaling up a model can improve few-shot performance, as it did over the previous version, GPT-2. [2]

GPT-3 apps

Nine months after launch, more than 300 applications were using GPT-3, with tens of thousands of developers building on the platform. On average, 4.5 billion words per day were being generated. [6]

OpenAI's language model increased interest in large language model (LLM) research, with companies like Google (PaLM), Meta (OPT-175B), Nvidia (Megatron-Turing NLG) and DeepMind (Chinchilla) investing in their own LLMs. Companies and startups have also started to create products around GPT-3 or integrate it into existing ones. Companies like Hugging Face, Cohere and Humanloop are providing LLM APIs as well, competing with OpenAI. GPT-3 also fueled the development of various open-source projects with the goal of making LLMs available to a wider audience, such as BigScience's BLOOM and EleutherAI's GPT-J. [4]

OpenAI GPT-3 Playground

On November 18, 2021, OpenAI opened up its API service for GPT-3 so that developers can build applications based on the language model without the need to sign up for a waitlist. A simple web interface for GPT-3, the GPT-3 Playground, has also been made available. [10] [11]

Users can try out the Playground for free, although signing up with an email address and a phone number is required. [12]

Development

In 2017, research labs at Google and OpenAI started working on neural networks that learned from massive amounts of text; these included Wikipedia articles and unpublished books. GPT-3 is the result of several years of work by OpenAI, which is backed by funding from Microsoft. [3] Its previous version, GPT-2, was announced in February 2019 and had 1.5 billion parameters trained on 40 GB of text (around 10 billion tokens). [7]

When released in June 2020, GPT-3 was the largest neural network created to date, and it captured the attention of media, researchers, and AI businesses. OpenAI decided to launch it as a private API, filtering who could use the system through a waitlist. [5]

The development of GPT-3 would not have been possible without innovations such as using "neural networks to generate word vector representations. Instead of using the words themselves in a machine learning algorithm, the idea is to first represent the words as mathematical vectors." [2] Another relevant innovation was the use of recurrent neural networks (RNNs) to "read" sentences. The introduction of the Transformer architecture is another milestone, giving researchers in natural language processing (NLP) access to much deeper neural networks (DNNs); this new architecture permitted the development of much larger language models. [2]
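
As a toy illustration of the word-vector idea described above, the sketch below represents three words as made-up three-dimensional vectors and compares them with cosine similarity; real models learn vectors with hundreds of dimensions from data rather than using hand-written values like these.

  import numpy as np

  # Made-up three-dimensional vectors; real embeddings are learned and much larger.
  word_vectors = {
      "king":  np.array([0.8, 0.6, 0.1]),
      "queen": np.array([0.7, 0.7, 0.1]),
      "apple": np.array([0.1, 0.2, 0.9]),
  }

  def cosine_similarity(a, b):
      # 1.0 means the vectors point in the same direction (similar meaning).
      return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

  print(cosine_similarity(word_vectors["king"], word_vectors["queen"]))  # high
  print(cosine_similarity(word_vectors["king"], word_vectors["apple"]))  # low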

GPT-3 continues to improve. According to OpenAI, new features have been added such as:

  • Answers endpoint: "Searches provided information (documents, knowledge bases etc.) for relevant context to be added to the prompt before completing with GPT-3." (see the sketch after this list)
  • Classification endpoint: leverages labeled training data without fine-tuning.
  • Enhanced search endpoint: provides the framework for the "Answers and Classifications endpoints that scales to a large number of documents while also being cheap and fast."
  • Safety: improvements to reduce the instances of bias and misuse.
  • Prompt library: examples of starter prompt designs for several use cases. [6]
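
To illustrate the idea behind the Answers endpoint described above, the sketch below approximates it with the plain completion API: it searches the provided documents for relevant context and adds that context to the prompt before asking GPT-3 for a completion. The model name, documents, and naive keyword-overlap search are assumptions for demonstration; this is not the actual Answers endpoint.

  import openai

  openai.api_key = "YOUR_API_KEY"  # placeholder; a real key is required

  # Documents supplied by the application; in practice these could come from a knowledge base.
  documents = [
      "The company was founded in 2015 and is headquartered in Lisbon.",
      "The flagship product is a note-taking app with offline sync.",
      "Support is available by email on weekdays between 9am and 6pm.",
  ]

  question = "When can I reach support?"

  # Naive relevance search: count words shared between the question and each document.
  def overlap(doc):
      return len(set(question.lower().split()) & set(doc.lower().split()))

  context = max(documents, key=overlap)

  prompt = (
      "Answer the question using only the context.\n\n"
      f"Context: {context}\n\n"
      f"Question: {question}\nAnswer:"
  )

  response = openai.Completion.create(
      model="text-davinci-003",  # assumed model name
      prompt=prompt,
      max_tokens=30,
      temperature=0,
  )
  print(response.choices[0].text.strip())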

Emergent abilities

GPT-3 demonstrated an interesting quality: its general approach. In machine learning, a model is usually trained for a specific task and remains limited to that task. However, OpenAI's language model can perform different tasks without fine-tuning (no additional training). [2] This came about from scaling up the model and data size, which had the effect of producing emergent abilities, executed through prompts "without expensive supervised data or repeated re-training". [13]

From its original purpose of generating text based on an initial input, GPT-3 has been able to translate between languages, perform reading comprehension tasks reasonably well, and answer questions analogous to SAT exams with some level of accuracy. [2] [3] Developers also showed that it was possible for the language model to create JavaScript code from a prompt, something that GPT-3's developers did not expect. This seems to be a consequence of the model's training process, in which the text used included samples of code. [2] [3] Dario Amodei, OpenAI's vice-president for research, said that GPT-3 "has this emergent quality. It has some ability to recognize the pattern that you gave it and complete the story, give another example.” [3]
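
A minimal sketch of the kind of code generation described above: the prompt asks GPT-3 for a JavaScript function and the completion is printed. The model name and prompt are illustrative assumptions, not the exact setup used by the developers cited.

  import openai

  openai.api_key = "YOUR_API_KEY"  # placeholder; a real key is required

  prompt = (
      "Write a JavaScript function named isPalindrome that returns true "
      "if a string reads the same forwards and backwards.\n\n"
  )

  response = openai.Completion.create(
      model="text-davinci-003",  # assumed GPT-3 model name
      prompt=prompt,
      max_tokens=120,
      temperature=0,
  )
  print(response.choices[0].text)  # prints the generated JavaScript source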

Pricing

The GPT-3 service is charged per token, which is a part of a word. There is an $18 free credit available during the first three months of use. In September 2022, OpenAI reduced the prices for the GPT-3 API service. The pricing plans can be checked on OpenAI's website. [5]
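
As a rough illustration of per-token billing, the sketch below counts the tokens in a short text with the tiktoken tokenizer and multiplies by a placeholder price. The price used is an assumption for demonstration, not OpenAI's actual rate, which should be checked on the pricing page.

  import tiktoken

  encoding = tiktoken.get_encoding("gpt2")  # byte-pair encoding in the GPT-2/GPT-3 style

  text = "GPT-3 is charged per token, and a token is roughly a part of a word."
  tokens = encoding.encode(text)

  price_per_1k_tokens = 0.02  # placeholder price in USD, not OpenAI's actual rate
  cost = len(tokens) / 1000 * price_per_1k_tokens

  print(f"{len(tokens)} tokens, estimated cost ${cost:.6f}")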

Limitations

Dale (2021) and Brown et al. (2020) have noted several limitations of OpenAI's language model system in text synthesis and NLP tasks. [7] [14] Dale (2021) points out that "its outputs may lack semantic coherence, resulting in text that is gibberish and increasingly nonsensical as the output grows longer". The system might also take in biases found in its training data, and its outputs "may correspond to assertions that are not consonant with the truth." [7] Brown et al. (2020) also note the loss of coherence over sufficiently long passages, with contradictions and non-sequitur sentences. [14]

Si et al. (2022) highlight different studies that investigate GPT-3's capabilities on specific tasks like mathematical reasoning, multi-hop reasoning, and code generation. The authors note that while language models have improved, they are still not reliable, GPT-3 included. [13]

Zhang & Li (2021) wrote that "GPT-3 is still at the stage of perceptual intelligence, and a significant amount of development is required before it reaches general intelligence and cognitive intelligence." They note that the model has not met expectations in "natural language reasoning, filling in the blanks, long text generation, and reading comprehension tasks." [9]

Use

Universal language models can be useful for a broad range of applications, such as automatically summarizing news articles or powering chatbots for online conversation. GPT-3 promises to take those capabilities further by speeding up the development of smartphone apps or improving chatbots so that the conversation seems closer to that of a human being. [3] Code generation is one use case that has already proven effective: given a small program function or description, GPT-3 can produce code in different languages. Copilot is a specific version of GPT-3 for code generation, and the recent version of GPT-3's Codex program, which is trained for code generation, can even identify bugs, fix mistakes in its code, and explain the function of that code. [15]

Other use cases present new opportunities for businesses and professionals that rely on content creation. With automated content generation, content like articles, blog posts, and social media posts can be created by the AI language model, saving time for businesses and professionals who need to create content frequently. There is also the possibility of improving content quality, since the AI model can learn from a large amount of data and identify patterns that otherwise could not be perceived; the result could be more accurate and more informative content. [15]

GPT-3 used together with another AI model that can generate images or video increases content variety, diversifying the output of businesses and making it interesting to a wider range of people. The system can also create personalized content tailored to the preferences of individual users. [15]

This generative model has been used more commonly for marketing applications. Jasper is a "marketing-focused version of GPT-3 that can create blogs, social media posts, web copy, sales emails, ads, and other types of customer-facing content." [15]

There are over 300 apps that use GPT-3 in different areas such as productivity, education, creativity, and games. Some examples are given on OpenAI's blog, such as Viable, Fable Studio, and Algolia. [6] Other applications include OthersideAI's Quick Response, an email generation program that writes in the user's style given an outline of the key points for the text; Kriya.ai, which offers personalised introduction requests; Dover.ai, which produces longer versions of a short job description; and Copy.ai, which writes ad copy based on a product description. [7]

Impact on creative work and future

There have been discussions around AI tools, whether for text, image, or video generation, about their impact both on creative work and on the professionals of the areas in which the AI software operates. For now, the user still has a lot of input, since there is still a lot of prompting, editing, and iteration, but future technological developments could lead to a decrease in those activities. [15]

According to Floridi & Chiriatti (2020), text generation models like GPT-3 signal "the arrival of a new age in which we can now mass produce good and cheap semantic artefacts. Translations, summaries, minutes, comments, webpages, catalogues, newspaper articles, guides, manuals, forms to fill, reports, recipes… soon an AI service may write, or at least draft, the necessary texts, which today still require human effort." The authors envision a future in which writers will have less work and "people whose jobs still consist in writing will be supported, increasingly, by tools such as GPT-3". These professionals will need to develop prompting skills so that the software delivers the best results, and to collect and combine the results obtained. [1]

Finally, Floridi & Chiriatti (2020) also envision a world where "given the business models of many online companies, clickbait of all kinds will be boosted by tools like GPT-3, which can produce excellent prose cheaply, quickly, purposefully, and in ways that can be automatically targeted to the reader. GPT-3 will be another weapon in the competition for users’ attention." Furthermore, the wide availability of tools like GPT-3 will support the development of “no-code platforms”, which will enable marketers to create applications to automate repetitive tasks, starting from data commands in natural language (written or spoken). [1]

Others see the development of technologies like GPT-3 not as a threat to their careers but as tools to make their jobs easier, as in the case of app development. GPT-3 is a tool that researchers and entrepreneurs are exploring to build new technologies, products, and companies. [3]

Concerns and safety guidelines

Like OpenAI's image-generation counterpart, DALL-E, GPT-3 raises concerns, and safety guidelines have been put in place to prevent the creation of content that includes hate, harassment, violence, self-harm, adult content, political content, spam, deception, and malware. [5]

Language models could also increase the production of texts containing misinformation or fraudulent academic essay writing. According to Brown et al. (2020), "the misuse potential of language models increases as the quality of text synthesis improves. The ability of GPT-3 to generate several paragraphs of synthetic content that people find difficult to distinguish from human-written text [...] represents a concerning milestone in this regard." [14] Dehouche (2021) raised concerns regarding content generated by GPT-3, noting "the possible use of this technology to facilitate scientific misconduct, and call for an urgent revision of publishing standards. I believe that the advent of this powerful NLP technology calls for an urgent update of our concepts of plagiarism." [16]

References

  1. Floridi, L and Chiriatti, M (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines 30:681-694
  2. Lauret, J (2020). GPT-3: The first artificial general intelligence. Towards Data Science. https://towardsdatascience.com/gpt-3-the-first-artificial-general-intelligence-b8d9b38557a1
  3. Metz, C (2020). Meet GPT-3. It has learned to code (and blog and argue). The New York Times. https://www.nytimes.com/2020/11/24/science/artificial-intelligence-ai-gpt3.html
  4. Dickson, B (2022). OpenAI is reducing the price of the GPT-3 API—here’s why it matters. VentureBeat. https://venturebeat.com/ai/openai-is-reducing-the-price-of-the-gpt-3-api-heres-why-it-matters/
  5. Romero, A (2021). OpenAI opens GPT-3 for everyone. Towards Data Science. https://towardsdatascience.com/openai-opens-gpt-3-for-everyone-fb7fed309f6
  6. OpenAI (2021). GPT-3 powers the next generation of apps. OpenAI. https://openai.com/blog/gpt-3-apps/
  7. Dale, R (2021). GPT-3: What’s it good for? Natural Language Engineering 27:113-118
  8. Heaven, WD (2020). OpenAI’s new language generator GPT-3 is shockingly good—and completely mindless. MIT Technology Review. https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/
  9. Zhang, M and Li, J (2021). A commentary of GPT-3 in MIT Technology Review 2021. Fundamental Research 1:831-833
  10. GPT-3 Demo. OpenAI GPT-3 Playground. GPT-3 Demo. https://gpt3demo.com/apps/openai-gpt-3-playground
  11. Wu, G (2022). How to use GPT-3 in OpenAI Playground. MSN. https://www.msn.com/en-us/news/technology/how-to-use-gpt-3-in-openai-playground/ar-AA13RZw4
  12. Willison, S (2022). How to use the GPT-3 language model. Simon Willison’s Weblog. https://simonwillison.net/2022/Jun/5/play-with-gpt3/
  13. Si, C, Gan, Z, Yang, Z, Wang, S, Wang, J, Boyd-Graber, J and Wang, L (2022). Prompting GPT-3 to be reliable. arXiv:2210.09150v1
  14. Brown, TB, Mann, B, Ryder, N, Subbiah, M, Kaplan, J, Dhariwal, P, Neelakantan, A, Shyam, P, Sastry, G, Askell, A, Agarwal, S, Herbert-Voss, A, Krueger, G, Henighan, T, Child, R, Ramesh, A, Ziegler, DM, Wu, J, Winter, C, Hesse, C, Chen, M, Sigler, E, Litwin, M, Gray, S, Chess, B, Clark, J, Berner, C, McCandlish, S, Radford, A, Sutskever, I and Amodei, D (2020). Language models are few-shot learners. arXiv:2005.14165v4
  15. Davenport, TH and Mittal, N (2022). How generative AI is changing creative work. Harvard Business Review. https://hbr.org/2022/11/how-generative-ai-is-changing-creative-work
  16. Dehouche, N (2021). Plagiarism in the age of massive Generative Pre-Trained Transformers (GPT-3). Ethics in Science and Environmental Politics 21:17-23