Prompt: Difference between revisions

From AI Wiki
No edit summary
 
(29 intermediate revisions by 4 users not shown)
Line 1: Line 1:
==Introduction==
[[File:0. chat-email-copy Blusteak.png|thumb|Figure 1. Example of a prompt on ChatGPT. Source: Blusteak.]]
[[File:0. chat-email-copy Blusteak.png|thumb|Figure 1. Example of a prompt on ChatGPT. Source: Blusteak.]]
A [[prompt]] or an [[artificial intelligence]] ([[AI]]) prompt is a [[natural language]] set of instructions, a text, that functions as [[input]] for an [[AI generator]]. <ref name="”1”">Ana. B (2022). Design your AI Art generator prompt using ChatGPT. Towards AI. https://pub.towardsai.net/design-your-ai-art-generator-prompt-using-chatgpt-7a3dfddf6f76</ref> Simply, it is a phrase or individual keywords used in tools like [[ChatGPT]] (figure 1), a [[text-to-text]] generator, or in [[text-to-image]] generators like [[DALL-E]]. After the input, the [[AI model]] tries to interpret it and generates a response. <ref name="”2”">Schmid, S (2022).ChatGPT: How to write the perfect prompts. Neuroflash. https://neuroflash.com/chatgpt-how-to-write-the-perfect-prompts/</ref>
A [[prompt]] or an [[artificial intelligence]] ([[AI]]) prompt is a [[natural language]] set of instructions, a text, that functions as [[input]] for an [[AI generator]]. <ref name="”1”">Ana. B (2022). Design your AI Art generator prompt using ChatGPT. Towards AI. https://pub.towardsai.net/design-your-ai-art-generator-prompt-using-chatgpt-7a3dfddf6f76</ref> Simply, it is a phrase or individual keywords used in tools like [[ChatGPT]] (figure 1), a [[text-to-text]] generator, or in [[text-to-image]] generators like [[DALL-E]]. After the input, the [[AI model]] tries to interpret it and generates a response. <ref name="”2”">Schmid, S (2022).ChatGPT: How to write the perfect prompts. Neuroflash. https://neuroflash.com/chatgpt-how-to-write-the-perfect-prompts/</ref>


Line 13: Line 13:
===Text-to-text prompts===
===Text-to-text prompts===


[[ChatGPT]] is a model trained using [[Reinforcement Learning]] that interacts with the user conversationally, responding to the text input.
[[ChatGPT]] is a model trained using [[Reinforcement Learning]] that interacts with the user conversationally, responding to the text input. <ref name="”6”">OpenAI (2022). ChatGPT: Optimizing Language Models for dialogue. OpenAI. https://openai.com/blog/chatgpt/ </ref>


For a [[text-to-text model]], there are some general guidelines for a good prompt:
For a [[text-to-text model]], there are some general guidelines for a good prompt:


*Precision and clarity by avoiding long sentences with many subpoints. Easy-to-understand shorter sentences are preferable.
* Precision and clarity by avoiding long sentences with many subpoints. Easy-to-understand shorter sentences are preferable.
*Specify and contextualize the questions.
*Specify and contextualize the questions.
*Be selective regarding word choice, avoiding jargon or slang.
*Be selective regarding word choice, avoiding jargon or slang.
Line 31: Line 31:
In general, a good prompt for image generation (figure 2) should have in its structure:
In general, a good prompt for image generation (figure 2) should have in its structure:


*Subject: suggests to the AI model what scene to generate. Represented by nouns.
* Subject: suggests to the AI model what scene to generate. Represented by nouns.
*Description: additional information related to the subject. Represented by adjectives, background description, or others.
*Description: additional information related to the subject. Represented by adjectives, background description, or others.
*Style: the theme of the image, which can include artist names or custom styles like fantasy, contemporary, etc.
* Style: the theme of the image, which can include artist names or custom styles like fantasy, contemporary, etc.
*Graphics: computer graphics engine type that enforces the efectiveness of the image.
* Graphics: computer graphics engine type that enforces the efectiveness of the image.
*Quality: quality of the image (e.g. 4K). <ref name="”1”"></ref>
* Quality: quality of the image (e.g. 4K). <ref name="”1”"></ref>


While the subject of an intended image, the modifiers— words that describe the style, graphics, and quality—can elevate the quality of the image created. As an example, figure 3 illustrates the most frequently used phrases by [[Midjourney]] users. It can be seen that the modifiers are the most used in prompts. <ref name="”5”"></ref>
While the subject of an intended image, the modifiers— words that describe the style, graphics, and quality—can elevate the quality of the image created. As an example, figure 3 illustrates the most frequently used phrases by [[Midjourney]] users. It can be seen that the modifiers are the most used in prompts. <ref name="”5”"></ref>


==Prompt engineering==
==Prompt engineering==
[[Prompt design]] or [[prompt engineering]] is the practice of discovering the prompt that gets the best result from the AI system. <ref name="”4”"></ref> The development of prompts requires human intuition with results that can look arbitrary. <ref name="”9”">Pavlichenko, N, Zhdanov, F and Ustalov, D (2022). Best prompts for text-to-image models and how to find them. arXiv:2209.11711v2</ref> Manual prompt engineering is laborious, it may be infeasible in some situations, and the prompt results may vary between various model versions. <ref name="”3”"></ref> However, there have been developments in automated prompt generation which rephrases the input, making it more model-friendly. <ref name="”5”"></ref>
[[Prompt engineering]] or [[Prompt design]] is the practice of discovering the prompt that gets the best result from the [[AI system]]. <ref name="”4”"></ref> The development of prompts requires human intuition with results that can look arbitrary. <ref name="”9”">Pavlichenko, N, Zhdanov, F and Ustalov, D (2022). Best prompts for text-to-image models and how to find them. arXiv:2209.11711v2</ref> Manual prompt engineering is laborious, it may be infeasible in some situations, and the prompt results may vary between various model versions. <ref name="”3”"></ref> However, there have been developments in automated [[prompt generation]] which rephrases the input, making it more model-friendly. <ref name="”5”"></ref>


A list of prompts for beginners is [https://mpost.io/top-50-text-to-image-prompts-for-ai-art-generators-midjourney-and-dall-e/ available] as well as a compilation of the best [https://mpost.io/best-10-ai-prompt-guides-and-tutorials-for-text-to-image-models-midjourney-stable-diffusion-dall-e/ prompt guides and tutorials].
===Text-to-Text===
'''[[Prompt engineering for text generation]]'''


===Language models===
===Text-to-Image===
'''[[Prompt engineering for image generation]]'''


In language models like [[GPT]], the output quality is influenced by a combination of prompt design, sample data, and temperature (a parameter that controls the “creativity” of the responses). Furthermore, to properly design a prompt the user has to have a good understanding of the problem, good grammar skills, and produce many iterations. <ref name="”10”">Shynkarenka, V (2020). Hacking Hacker News frontpage with GPT-3. Vasili Shynkarenka. https://vasilishynkarenka.com/gpt-3/ </ref>
==Prompt generators ==
 
Therefore, to create a good prompt it’s necessary to be attentive to the following elements:
 
#The problem: the user needs to know clearly what he wants the generative model to do and its context. <ref name="”10”"></ref> <ref name="”11”">Robinson, R (2023). How to write an effective GPT-3 prompt. Zapier. https://zapier.com/blog/gpt-3-prompt/ </ref> For example, the AI can change the writing style of the output ("write a professional but friendly email" or "write a formal executive summary.") <ref name="”11”"></ref>. Since the AI understands natural language, the user can think of the generative model as a human assistant. Therefore, thinking “how would I describe the problem to my assistant who hasn’t done this task before?” may provide some help in defining clearly the problem and context. <ref name="”10”"></ref>
#Grammar check: simple and clear terms. Avoid subtle meaning and complex sentences with predicates. Write short sentences with specifics at the end of the prompt. Different conversation styles can be achieved with the use of adjectives. <ref name="”10”"></ref>
#Sample data: the AI may need information to perform the task that is being asked of it. This can be a text for paraphrasing or a copy of a resume or LinkedIn profile, for example. <ref name="”11”"></ref> The data provided must be coherent with the prompt. <ref name="”10”"></ref>
#Temperature: a parameter that influences how “creative” the response will be. For creative work, the temperature should be high (e.g. .9) while for strict factual responses, a temperature of zero is better. <ref name="”10”"></ref>
#Test and iterate: test different combinations of the elements of the prompt. <ref name="”10”"></ref>
 
Besides this, a prompt can also have other elements such as the desired length of the response, the output format ([[GPT-3]] can output various code languages, charts, and CSVs), and specific phrases that users have discovered that work well to achieve specific outcomes (e.g. “Let's think step by step,” “thinking backwards,” or “in the style of [famous person]”). <ref name="”11”"></ref>
 
===Text-to-image generators===
 
[[File:11a. Without Unbundling.png|thumb|Figure 4a. Without unbundling . Prompt: Kobe Bryant shooting free throws, in the style of The Old Guitarist by Pablo Picasso, digital art. Source: DecentralizedCreator.]]
[[File:11b. With Unbundling.png|thumb|Figure 4b. With unbundling. Prompt: Kobe Bryant shooting free throws, The painting has a simple composition, with just three primary colors: red, blue and yellow. However, it is also packed with hidden meanings and visual complexities, digital art. Source: DecentralizedCreator.]]
 
Some basic elements influence the quality of a text-to-image prompt. While these elements will work on different generator models, their impact on the final image quality may be different.
*Nouns: denotes the subject in a prompt. The generator will produce an image without a noun although not meaningful. <ref name="”12”">Raj, G (2022). How to write good prompts for AI art generators: Prompt engineering made easy. Decentralized Creator. https://decentralizedcreator.com/write-good-prompts-for-ai-art-generators/ </ref>
 
*Adjectives: can be used to try to convey an emotion or be used more technically (e.g. beautiful, magnificent, colorful, massive). <ref name="”12”"></ref>
*Artist names: the art style of the chosen artist will be included in the image generation. There is also an unbundling technique (figures 4a and 4b) that proposes a “long description of a particular style of the artist’s various characteristics and components instead of just giving the artist names.” <ref name="”12”"></ref>
*Style: instead of using the style of artists, the prompt can include keywords related to certain styles like “surrealism,” “fantasy,” “contemporary,” “pixel art”, etc. <ref name="”12”"></ref>
*Computer graphics: keywords like “octane render,” “Unreal Engine,” or “Ray Tracing” can enhance the effectiveness and meaning of the artwork. <ref name="”12”"></ref>
*Quality: quality of the generated image (e.g. high, 4K, 8K). <ref name="”12”"></ref>
*Art platform names: these keywords are another way to include styles. For example, “trending on Behance, “Weta Digital”, or “trending on artstation.” <ref name="”12”"></ref>
*Art medium: there is a multitude of art mediums that can be chosen to modify the AI-generated image like “pencil art,” “chalk art,” “ink art,” “watercolor,” “wood,” and others. <ref name="”12”"></ref>
 
In-depth lists with modifier prompts can be found [https://decentralizedcreator.com/write-good-prompts-for-ai-art-generators/ here] and [https://aesthetics.fandom.com/wiki/List_of_Aesthetics here].
 
====Midjourney====
 
[[File:4. Styles in Midjourney.png|thumb|Figure 5. Midjourney elements. Source: MLearning.ai.]]
[[File:5. Midjourney Styles words.png|thumb|Figure 6. Different keywords for styles result in different outputs. Source: MLearning.ai.]]
[[File:6. Rendering and lighting properties as style.png|thumb|Figure 7. Different lighting options. Source: MLearning.ai.]]
[[File:7. Midjourney Chaos.png|thumb|Figure 8. Chaos option. Source. MLearning.ai.]]
 
In Midjourney, a very descriptive text will result in a more vibrant and unique output. <ref name="”16”">Nielsen, L (2022). An advanced guide to writing prompts for Midjourney ( text-to-image). Mlearning. https://medium.com/mlearning-ai/an-advanced-guide-to-writing-prompts-for-midjourney-text-to-image-aa12a1e33b6</ref> Prompt engineering for this AI image generator follows the same basic elements as all others (figure 5) but some keywords and options will be provided here that are known to work well with this system.
 
*Style: standard, pixar movie style, anime style, cyber punk style, steam punk style, waterhouse style, bloodborne style, grunge style (figure 6). An artist’s name can also be used. <ref name="”16”"></ref>
*Rendering/lighting properties: volumetric lighting, octane render, softbox lighting, fairy lights, long exposure, cinematic lighting, glowing lights,and blue lighting (figure 7). <ref name="”16”"></ref>
*Style setting: adding the command –s <number> after the prompt will increase or decrease the stylize option (e.g. /imagine firefighters --s 6000). <ref name="”16”"></ref>
*Chaos: a setting to increase abstraction (figure 8) using the command /imagine prompt --chaos <a number from 0 to 100> (e.g. /imagine Eiffel tower --chaos 60). <ref name="”16”"></ref>
*Resolution: the resolution can be inserted in the prompt or using the standard commands --hd and --quality or --q <number>. <ref name="”16”"></ref>
*Aspect ratio: the default aspect ratio is 1:1. This can be modified with the comman --ar <number: number> (e.g. /imagine jasmine in the wild flower --ar 4:3). A custom size image can also be specified using the command --w <number> --h <number> after the prompt. <ref name="”16”"></ref>
*Images as prompts: Midjourney allows the user to use images to get outputs similar to the one used. This can be done by inserting a URL of the image in the prompt (e.g. /imagine http://www.imgur.com/Im3424.jpg box full of chocolates). Multiple images can be used. <ref name="”16”"></ref>
*Weight: increases or decreases the influence of a specific prompt keyword or image on the output. For text prompts, the command ::<number> should be used after the keywords according to their intended impact on the final image (e.g. /imagine wild animals tiger::2 zebra::4 lions::1.5). <ref name="”16”"></ref>
*Filter: to discard unwanted elements from appearing in the output use the --no <keyword>  command (e.g./imagine KFC fried chicken --no sauce). <ref name="”16”"></ref>
 
====DALL-E====
 
For DALL-E, a tip is to write adjectives + nouns instead of verbs or complex scenes. To this, the user can add keywords like “gorgeous,” “amazing,” and “beautiful,” plus “digital painting,” “oil painting”, etc., and “unreal engine,” or “unity engine.” <ref name="”17”">Strikingloo (2022). Text to image art: Experiments and prompt guide for DALL-E Mini and other AI art models. Strikingloo. https://strikingloo.github.io/art-prompts </ref>
 
Other templates can be used that work well with this model:
 
*A photograph of X, 4k, detailed.
*Pixar style 3D render of X.
*Subdivision control mesh of X.
*Low-poly render of X; high resolution, 4k.
*A digital illustration of X, 4k, detailed, trending in artstation, fantasy vivid colors. <ref name="”17”"></ref>
 
Other user experiments can be accessed [https://strikingloo.github.io/DALL-E-2-prompt-guide here]. <ref name="”17”"></ref>
 
====Stable Diffusion====
 
Overall, prompt engineering in Stable Diffusion doesn’t differ from other AI image-generating models. However, it should be noted that it also allows prompt weighting and negative prompting. <ref name="”18”">DreamStudio. Prompt guide. DreamStudio. https://beta.dreamstudio.ai/prompt-guide</ref>
 
*Prompt weighting: varies between 1 and -1. Decimals can be used to reduce a prompt’s influence. <ref name="”18”">DreamStudio. Prompt guide. DreamStudio. https://beta.dreamstudio.ai/prompt-guide</ref>
*Negative prompting: in DreamStudo negative prompts can be added by using | <negative prompt>: -1.0 (e.g. | disfigured, ugly:-1.0, too many fingers:-1.0). <ref name="”18”"></ref>
 
====Jasper Art====
 
[[Jasper Art]] is similar to DALL-E 2 but results are different since Jasper gives priority to [[Natural Language Processing]] ([[NLP]]), being able to handle complex sentences with semantic articulation. <ref name="”15”">The Jasper Whisperer (2022). Improve your AI text-to-image prompts with enhanced NLP. Bootcamp. https://bootcamp.uxdesign.cc/improve-your-ai-text-to-image-prompts-with-enhanced-nlp-fc804964747f</ref>
 
There has been some experimentation with narrative prompts, an alternative to the combinations of keywords in a prompt, using instead more expressive descriptions. <ref name="”15”"></ref> For example, instead of using “tiny lion cub, 8k, kawaii, adorable eyes, pixar style, winter snowflakes, wind, dramatic lighting, pose, full body, adventure, fantasy, renderman, concept art, octane render, artgerm,” convert it to a sentence as if painting with words like, “Lion cub, small but mighty, with eyes that seem to pierce your soul. In a winter wonderland, he stands tall against the snow, wind ruffling his fur. He seems almost like a creature of legend, ready for an adventure. The lighting is dramatic and striking, and the render is breathtakingly beautiful.” <ref name="”15”"></ref>
 
==Prompt generators==


[[File:2. Text prompt generator model.png|thumb|Figure 9. Example of a text prompt generator. Source: Towards Data Science.]]
[[File:2. Text prompt generator model.png|thumb|Figure 9. Example of a text prompt generator. Source: Towards Data Science.]]
Line 128: Line 54:
[[File:10. ChatGPT prompt after.png|thumb|Figure 10b. Creating prompts with ChatGPT (after). Prompt: Christmas village, magical, enchanting, wreaths, snow-covered streets, colorful buildings, sparkling, charming, detailed, glittery, shiny, twinkling lights, festive, ornate, traditional, whimsical, Christmastide, highly detailed, hyperrealistic, illustration, Unreal Engine 5,8K. Source: Towards AI.]]
[[File:10. ChatGPT prompt after.png|thumb|Figure 10b. Creating prompts with ChatGPT (after). Prompt: Christmas village, magical, enchanting, wreaths, snow-covered streets, colorful buildings, sparkling, charming, detailed, glittery, shiny, twinkling lights, festive, ornate, traditional, whimsical, Christmastide, highly detailed, hyperrealistic, illustration, Unreal Engine 5,8K. Source: Towards AI.]]


Due to the difficulty of good manual prompt development, several prompt generator models have surfaced (figure 9) that help the user in refining the text input to obtain the best result possible, automatically performing prompt engineering. <ref name="”3”"></ref> <ref name="”5”"></ref>
Due to the difficulty of good manual prompt development, several [[prompt generators|prompt generator models]] have surfaced (figure 9) that help the user in refining the text input to obtain the best result possible, automatically performing prompt engineering. <ref name="”3”"></ref> <ref name="”5”"></ref>


*[https://huggingface.co/spaces/doevent/prompt-generator Midjourney Prompt Generator]: unofficial Midjourney prompt builder. <ref name="”14”">Strikingloo (2022). Stable Diffusion: Prompt guide and examples. Strikingloo. https://strikingloo.github.io/stable-diffusion-vs-dalle-2 </ref>
*[https://huggingface.co/spaces/doevent/prompt-generator Midjourney Prompt Generator]: unofficial Midjourney prompt builder. <ref name="”14”">Strikingloo (2022). Stable Diffusion: Prompt guide and examples. Strikingloo. https://strikingloo.github.io/stable-diffusion-vs-dalle-2 </ref>
*[https://phraser.tech/ Phraser]: assists in creating stronger [[neural network]] prompts for [[Midjourney]] and [[DALL-E]]. <ref name="”13”">Yalalov, D (2023). 6 free AI prompt builders and tools that artists actually use in 2023 (Updated). Metaverse Post. https://mpost.io/6-free-prompt-builders-and-helpers-that-artists-actually-use-in-2022/ </ref>
*[https://phraser.tech/ Phraser]: assists in creating stronger [[neural network]] prompts for [[Midjourney]] and [[DALL-E]]. <ref name="”13”">Yalalov, D (2023). 6 free AI prompt builders and tools that artists actually use in 2023 (Updated). Metaverse Post. https://mpost.io/6-free-prompt-builders-and-helpers-that-artists-actually-use-in-2022/ </ref>
*[https://app.noonshot.com/midjourney MidJourney Prompt Helper]: text-to-image prompt builder developed for [[Midjourney]] and [[DALL-E]]. <ref name="”13”"></ref>
*[https://app.noonshot.com/midjourney MidJourney Prompt Helper]: text-to-image prompt builder developed for [[Midjourney]] and [[DALL-E]]. <ref name="”13”"></ref>
Drawing Prompt Generator: a prompt helper to aid with artists’ block. <ref name="”13”"></ref>
Drawing Prompt Generator: a prompt helper to aid with artists’ block. <ref name="”13”"></ref>  
*[https://promptomania.com/prompt-builder/ Promptomania Builder]: easy-to-use prompt builder for AI art generators. Works with most CLIP and VQCAN-based models, DALL-E, Midjourney, and others. <ref name="”13”"></ref>
*[https://promptomania.com/prompt-builder/ Promptomania Builder]: easy-to-use prompt builder for AI art generators. Works with most CLIP and VQCAN-based models, DALL-E, Midjourney, and others. <ref name="”13”"></ref>
*[https://blog.user.today/midjourney/ MidJourney Random Commands Generator]: unofficial Midjourney prompt generator for complex outputs. <ref name="”13”"></ref>
*[https://blog.user.today/midjourney/ MidJourney Random Commands Generator]: unofficial Midjourney prompt generator for complex outputs. <ref name="”13”"></ref>
*[https://lexica.art/ Lexica.art]: a search engine for prompts and artworks. <ref name="”14”"></ref>
*[https://lexica.art/ Lexica.art]: a search engine for prompts and artworks. <ref name="”14”"></ref>


[[ChatGPT]] can also be used to design prompts for AI image generators besides the options above. This can be achieved by asking for adjectives that describe a specific scene (figures 10a and 10b) or directly asking it to write a prompt (e.g. “Write a text prompt for an AI art generation software that would fit the art style of Kilian Eng”). <ref name="”1”"></ref> <ref name="”20”">EdXD (2022). Using GPT-3 to generate text prompts for “AI” generated art. ByteXD. https://bytexd.com/using-gpt-3-to-generate-text-prompts-for-ai-generated-art/ </ref>
[[ChatGPT]] can also be used to design prompts for AI image generators besides the options above. This can be achieved by asking for adjectives that describe a specific scene (figures 10a and 10b) or directly asking it to write a prompt (e.g. “Write a text prompt for an AI art generation software that would fit the art style of Kilian Eng”). <ref name="”1”"></ref> <ref name="”15”">EdXD (2022). Using GPT-3 to generate text prompts for “AI” generated art. ByteXD. https://bytexd.com/using-gpt-3-to-generate-text-prompts-for-ai-generated-art/ </ref>
 
==Security Risks==
*[[Prompt injection]]
 
==Prompting vs. Fine-tuning==
Prompting and [[Fine-tuning]] represent two different ways to leverage [[large language models]] (LLMs) like [[GPT-4]].
 
Fine-tuning involves adapting an LLM's [[parameters]] based on a specific [[dataset]], making it a potent tool for complex tasks where accurate, trusted output is vital. However, fine-tuning often requires a labeled dataset and is potentially expensive during the [[training]] phase.
 
Conversely, prompting is the technique of providing specific instructions to an LLM to guide its responses. It doesn't necessitate model retraining for each new prompt or data change, and thus, offers a quicker iterative process. Importantly, it doesn't require a labeled dataset, making it a viable option when training data is scant or absent. Prompting can be an excellent starting point for solving tasks, especially simpler ones, as it can be resource-friendly and computationally efficient.
 
[[File:prompting_vs_finetuning1.png|400px]]
 
Despite its advantages, prompting may underperform compared to fine-tuning for complex tasks. There's a clear trade-off in terms of [[inference]] costs. Fine-tuned models, by integrating task-specific knowledge into the model's parameters, can generate accurate responses with minimal explicit instructions or prompts, making them cheaper in the long run. In contrast, prompted models, which rely heavily on explicit instructions, can be resource-intensive and more expensive, particularly for large-scale applications. Therefore, the choice between fine-tuning and prompting will depend on the specific use case, data availability, task complexity, and computational resources.
 
==Related Pages==
*[[Prompt engineering]]
*[[Prompt injection]]


==References==
==References==
<references />
<references />
[[Category:Terms]]
[[Category:AI Terms]]

Latest revision as of 08:38, 2 August 2023

Introduction

Figure 1. Example of a prompt on ChatGPT. Source: Blusteak.

A prompt or an artificial intelligence (AI) prompt is a natural language set of instructions, a text, that functions as input for an AI generator. [1] Simply, it is a phrase or individual keywords used in tools like ChatGPT (figure 1), a text-to-text generator, or in text-to-image generators like DALL-E. After the input, the AI model tries to interpret it and generates a response. [2]

It's relevant that prompts are written in a way that the generative model will understand since there is a direct relation between prompt quality and its output. [1] [2] For example, to obtain high-quality art it is necessary to provide adequate prompts with curated keywords. [1]

Prompt design has become a relevant field of study and experimentation since it plays an essential role in the generation quality. Prompt design or engineering is the adjustment of the textual input for the model to better understand the intentions of the user and produce higher-quality results. [3] Indeed, according to Hao et al. (2022), "empirical observations also confirm that common user input is often insufficient to produce aesthetically pleasing images with current models." [3] These improvements can be achieved in all forms of AI generative systems, creating better stories, summaries, images, or videos. [1] [4]

Julia Turc, the author of the article “Crafting Prompts for Text-to-Image Models”, argues that prompting “is the newest and most extreme form of transfer learning: a mechanism that allows previously-trained model weights to be reused in a novel context.” She further expounds that “each request for an image can be seen as a new task to be accomplished by a model that was pre-trained on a vast amount of data. In a way, prompting has democratized transfer learning, but has not yet made it effortless. Writing effective prompts can require as much work as picking up a new hobby.“ [5]

Prompting overview

Text-to-text prompts

ChatGPT is a model trained using Reinforcement Learning that interacts with the user conversationally, responding to the text input. [6]

For a text-to-text model, there are some general guidelines for a good prompt:

  • Precision and clarity by avoiding long sentences with many subpoints. Easy-to-understand shorter sentences are preferable.
  • Specify and contextualize the questions.
  • Be selective regarding word choice, avoiding jargon or slang.
  • Avoid asking questions with a binary answer or general questions (e.g. “What is love?”). [2]

Text-to-image prompts

Figure 2. General prompt structure. Source: Towards AI.
Figure 3. Frequent keywords used in Midjourney. Source: Towards Data Science.

Stable Diffusion, DALL-E, Midjourney, and other text-to-image systems rely on written descriptions to generate images using algorithms to convert the text into an image. [7] [8] These system can even produce images according to a specific style like (i.e. photograph, watercolor, illustration, etc.) or artist. [8]

In general, a good prompt for image generation (figure 2) should have in its structure:

  • Subject: suggests to the AI model what scene to generate. Represented by nouns.
  • Description: additional information related to the subject. Represented by adjectives, background description, or others.
  • Style: the theme of the image, which can include artist names or custom styles like fantasy, contemporary, etc.
  • Graphics: computer graphics engine type that enforces the efectiveness of the image.
  • Quality: quality of the image (e.g. 4K). [1]

While the subject of an intended image, the modifiers— words that describe the style, graphics, and quality—can elevate the quality of the image created. As an example, figure 3 illustrates the most frequently used phrases by Midjourney users. It can be seen that the modifiers are the most used in prompts. [5]

Prompt engineering

Prompt engineering or Prompt design is the practice of discovering the prompt that gets the best result from the AI system. [4] The development of prompts requires human intuition with results that can look arbitrary. [9] Manual prompt engineering is laborious, it may be infeasible in some situations, and the prompt results may vary between various model versions. [3] However, there have been developments in automated prompt generation which rephrases the input, making it more model-friendly. [5]

Text-to-Text

Prompt engineering for text generation

Text-to-Image

Prompt engineering for image generation

Prompt generators

Figure 9. Example of a text prompt generator. Source: Towards Data Science.
Figure 10a. Creating prompts with ChatGPT (before). Prompt: beautiful village on Christmas, covered by snow, modern, unreal engine, 8K. Source: Towards AI.
Figure 10b. Creating prompts with ChatGPT (after). Prompt: Christmas village, magical, enchanting, wreaths, snow-covered streets, colorful buildings, sparkling, charming, detailed, glittery, shiny, twinkling lights, festive, ornate, traditional, whimsical, Christmastide, highly detailed, hyperrealistic, illustration, Unreal Engine 5,8K. Source: Towards AI.

Due to the difficulty of good manual prompt development, several prompt generator models have surfaced (figure 9) that help the user in refining the text input to obtain the best result possible, automatically performing prompt engineering. [3] [5]

Drawing Prompt Generator: a prompt helper to aid with artists’ block. [11]

ChatGPT can also be used to design prompts for AI image generators besides the options above. This can be achieved by asking for adjectives that describe a specific scene (figures 10a and 10b) or directly asking it to write a prompt (e.g. “Write a text prompt for an AI art generation software that would fit the art style of Kilian Eng”). [1] [12]

Security Risks

Prompting vs. Fine-tuning

Prompting and Fine-tuning represent two different ways to leverage large language models (LLMs) like GPT-4.

Fine-tuning involves adapting an LLM's parameters based on a specific dataset, making it a potent tool for complex tasks where accurate, trusted output is vital. However, fine-tuning often requires a labeled dataset and is potentially expensive during the training phase.

Conversely, prompting is the technique of providing specific instructions to an LLM to guide its responses. It doesn't necessitate model retraining for each new prompt or data change, and thus, offers a quicker iterative process. Importantly, it doesn't require a labeled dataset, making it a viable option when training data is scant or absent. Prompting can be an excellent starting point for solving tasks, especially simpler ones, as it can be resource-friendly and computationally efficient.

Prompting vs finetuning1.png

Despite its advantages, prompting may underperform compared to fine-tuning for complex tasks. There's a clear trade-off in terms of inference costs. Fine-tuned models, by integrating task-specific knowledge into the model's parameters, can generate accurate responses with minimal explicit instructions or prompts, making them cheaper in the long run. In contrast, prompted models, which rely heavily on explicit instructions, can be resource-intensive and more expensive, particularly for large-scale applications. Therefore, the choice between fine-tuning and prompting will depend on the specific use case, data availability, task complexity, and computational resources.

Related Pages

References

  1. 1.0 1.1 1.2 1.3 1.4 1.5 Ana. B (2022). Design your AI Art generator prompt using ChatGPT. Towards AI. https://pub.towardsai.net/design-your-ai-art-generator-prompt-using-chatgpt-7a3dfddf6f76
  2. 2.0 2.1 2.2 Schmid, S (2022).ChatGPT: How to write the perfect prompts. Neuroflash. https://neuroflash.com/chatgpt-how-to-write-the-perfect-prompts/
  3. 3.0 3.1 3.2 3.3 Hao, Y, Chi, Z, Dong, L and Wei, F (2022). Optimizing prompts for text-to-image generation. arXiv:2212.09611v1
  4. 4.0 4.1 Bouchard, L (2022). Prompting explained: How to talk to ChatGPT. Louis Bouchard. https://www.louisbouchard.ai/prompting-explained/
  5. 5.0 5.1 5.2 5.3 Turc, J (2022). Crafting prompts for text-to-image models. Towards Data Science. https://towardsdatascience.com/the-future-of-crafting-prompts-for-text-to-image-models-fc7d9614cb65
  6. OpenAI (2022). ChatGPT: Optimizing Language Models for dialogue. OpenAI. https://openai.com/blog/chatgpt/
  7. Arunk89 (2023). How to write great prompts for AI text-to-image generators. Turbo Future. https://turbofuture.com/internet/How-to-write-great-prompts-for-AI-text-to-image-generator
  8. 8.0 8.1 ZMO.AI (2022). How do AI text-to-image generators work? ZMO.AI. https://www.zmo.ai/how-do-ai-text-to-image-generators-work/
  9. Pavlichenko, N, Zhdanov, F and Ustalov, D (2022). Best prompts for text-to-image models and how to find them. arXiv:2209.11711v2
  10. 10.0 10.1 Strikingloo (2022). Stable Diffusion: Prompt guide and examples. Strikingloo. https://strikingloo.github.io/stable-diffusion-vs-dalle-2
  11. 11.0 11.1 11.2 11.3 11.4 Yalalov, D (2023). 6 free AI prompt builders and tools that artists actually use in 2023 (Updated). Metaverse Post. https://mpost.io/6-free-prompt-builders-and-helpers-that-artists-actually-use-in-2022/
  12. EdXD (2022). Using GPT-3 to generate text prompts for “AI” generated art. ByteXD. https://bytexd.com/using-gpt-3-to-generate-text-prompts-for-ai-generated-art/