{{see also|Prompts|Prompt engineering for image generation|Prompt engineering for text generation}}

Prompt engineering, or [[Prompt design|prompt design]], is the practice of discovering the prompt that gets the best result from an [[AI system]]. <ref name="”4”"></ref> Developing prompts requires human intuition, and the results can look arbitrary. <ref name="”9”">Pavlichenko, N, Zhdanov, F and Ustalov, D (2022). Best Prompts for Text-to-Image Models and How to Find Them. arXiv:2209.11711v2</ref>
__TOC__
==Introduction==
[[Prompt engineering]], also known as [[in-context learning]], is an emerging research area within Human-Computer Interaction (HCI) that involves the formal search for [[prompts]] that produce desired outcomes from [[AI models]]. It covers techniques that guide the behavior of [[Large language models|large language models]] ([[LLM]]s) towards specific goals without modifying the [[model]]'s [[weights]]. As an experimental discipline, the impact of these prompting strategies can differ significantly between [[models]], requiring extensive trial and error along with heuristic approaches. The process involves selecting and composing sentences to achieve a certain result, such as a specific visual style in [[text-to-image models]] or a different tone in the response of a [[text-to-text models|text-to-text one]]. Unlike the hard sciences of the STEM fields, it is an evolving practice based on trial and error. <ref name="”1”">Bouchard, L (2022). Prompting Explained: How to Talk to ChatGPT. Louis Bouchard. https://www.louisbouchard.ai/prompting-explained/</ref> <ref name="”2”">Oppenlaender, J (2022). A Taxonomy of Prompt Modifiers for Text-To-Image Generation. arXiv:2204.13988v2</ref> <ref name="”3”">Liu, V and Chilton, LB (2021). Design Guidelines for Prompt Engineering Text-to-Image Generative Models. arXiv:2109.06977v2</ref> Prompt engineers serve as translators between "human language" and "AI language," transforming an idea into words that the AI model can comprehend. <ref name="”1”"></ref>

A list of prompts for beginners is [https://mpost.io/top-50-text-to-image-prompts-for-ai-art-generators-midjourney-and-dall-e/ available], as well as a compilation of the best [https://mpost.io/best-10-ai-prompt-guides-and-tutorials-for-text-to-image-models-midjourney-stable-diffusion-dall-e/ prompt guides and tutorials].

The process of prompt engineering resembles a conversation with the [[generative system]], with practitioners adapting and refining prompts to improve the outcomes. <ref name="”2”"></ref> It has emerged as a new form of interaction with models that have learned complex abstractions from consuming large amounts of data from the internet. These models have metalearning capabilities and can adapt their abstractions on the fly to fit new tasks, making it necessary to prompt them with specific knowledge and abstractions to perform well on new tasks. The term "prompt engineering" was coined by [[Gwern]] (writer and technologist), who evaluated GPT-3's capabilities on creative fiction and suggested that a new mode of interaction would be figuring out how to prompt the model to elicit specific knowledge and abstractions. <ref name="”3”"></ref>

In order to get the best results from these large and powerful generative models, prompt engineering is a critical skill for users to possess. Adding certain keywords and phrases, known as "[[prompt modifiers]]", to the textual input prompt can, for example, improve the aesthetic qualities and subjective attractiveness of the [[generated images]]. The process is iterative and experimental in nature, with practitioners formulating prompts as probes into the generative models' latent space. Various resources and guides are available to help novices write effective input prompts for [[text-to-image generation]] systems; however, prompt engineering remains an emerging practice that requires extensive experimentation and trial and error. <ref name="”1”"></ref><ref name="”2”"></ref><ref name="”3”"></ref>

===Language models===
In [[language models]] like [[GPT]], the output quality is influenced by a combination of [[prompt design]], [[sample data]], and [[temperature]] (a [[parameter]] that controls the "creativity" of the responses). To design a prompt properly, the user needs a good understanding of the problem, good grammar skills, and many iterations. <ref name="”10”">Shynkarenka, V (2020). Hacking Hacker News frontpage with GPT-3. Vasili Shynkarenka. https://vasilishynkarenka.com/gpt-3/</ref>

Manual prompt engineering is laborious, it may be infeasible in some situations, and the prompt results may vary between model versions. <ref name="”4”">Hao, Y, Chi, Z, Dong, L and Wei, F (2022). Optimizing Prompts for Text-to-Image Generation. arXiv:2212.09611v1</ref> However, there have been developments in automated [[prompt generation]], which rephrases the input to make it more model-friendly. <ref name="”5”">Ana, B (2022). Design your AI Art Generator Prompt Using ChatGPT. Towards AI. https://pub.towardsai.net/design-your-ai-art-generator-prompt-using-chatgpt-7a3dfddf6f76</ref>

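The effect of the temperature parameter mentioned above can be illustrated with a toy sampler: the model's token scores are divided by the temperature before a softmax, so low values make the top token dominate while high values flatten the distribution. This is a self-contained sketch of the general mechanism, not any vendor's API.

```python
import math
import random

def sample_with_temperature(token_scores, temperature, rng=random):
    """Sample a token from a {token: score} dict after temperature scaling."""
    t = max(temperature, 1e-6)  # guard against division by zero at T = 0
    scaled = [s / t for s in token_scores.values()]
    m = max(scaled)
    # Softmax, shifted by the max score for numerical stability.
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(list(token_scores), weights=probs, k=1)[0]

scores = {"the": 5.0, "a": 3.0, "banana": 0.5}
# Near-zero temperature is effectively greedy decoding.
greedy = sample_with_temperature(scores, 0.001)
```

With a high temperature (e.g. 10), the same call picks lower-scoring tokens far more often, which is what "creative" settings trade correctness for.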
Therefore, this field is important for the generation of high-quality [[AI-generated outputs]]. [[Text-to-image models]], in particular, face limitations in their text encoders, which makes prompt design even more crucial for producing aesthetically pleasing images with current models. <ref name="”4”"></ref> These models work based on caption-matching techniques and are pre-trained on millions of [[text-image datasets|text-image pairs]]. While a result will be generated for any prompt, the quality of the artwork is directly proportional to the quality of the prompt. <ref name="”6”">Raj, G (2022). How to Write Good Prompts for AI Art Generators: Prompt Engineering Made Easy. Decentralized Creator. https://decentralizedcreator.com/write-good-prompts-for-ai-art-generators/</ref>

To create a good prompt, it is necessary to pay attention to the following elements:

#The problem: the user needs to know clearly what they want the generative model to do and in what context. <ref name="”10”"></ref> <ref name="”11”">Robinson, R (2023). How to write an effective GPT-3 prompt. Zapier. https://zapier.com/blog/gpt-3-prompt/</ref> For example, the AI can change the writing style of the output ("write a professional but friendly email" or "write a formal executive summary"). <ref name="”11”"></ref> Since the AI understands natural language, the user can treat the generative model as a human assistant; asking "how would I describe the problem to an assistant who has never done this task before?" helps define the problem and its context clearly. <ref name="”10”"></ref>
#Grammar check: use simple and clear terms. Avoid subtle meanings and complex sentences with many predicates. Write short sentences, placing specifics at the end of the prompt. Different conversation styles can be achieved with adjectives. <ref name="”10”"></ref>
#Sample data: the AI may need information to perform the requested task, such as a text to paraphrase or a copy of a resume or LinkedIn profile. <ref name="”11”"></ref> The data provided must be coherent with the prompt. <ref name="”10”"></ref>
#Temperature: a parameter that influences how "creative" the response will be. For creative work the temperature should be high (e.g. 0.9), while for strictly factual responses a temperature of zero is better. <ref name="”10”"></ref>
#Test and iterate: try different combinations of the elements of the prompt. <ref name="”10”"></ref>

Besides this, a prompt can also include other elements, such as the desired length of the response, the output format ([[GPT-3]] can output various code languages, charts, and CSVs), and specific phrases that users have discovered work well for achieving particular outcomes (e.g. "Let's think step by step," "thinking backwards," or "in the style of [famous person]"). <ref name="”11”"></ref>
==Text Generation==
'''[[Prompt engineering for text generation]]'''

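The prompt elements listed above (problem statement, context, sample data, output format, and helper phrases) can be assembled into a single prompt string mechanically. The following is a minimal sketch; every function and parameter name is illustrative and not part of any model's API.

```python
def build_prompt(task, context=None, sample_data=None,
                 output_format=None, phrases=()):
    """Assemble a text prompt from the elements discussed above."""
    parts = [task.strip()]
    if context:
        parts.append("Context: " + context.strip())
    if sample_data:
        parts.append("Use the following data:\n" + sample_data.strip())
    if output_format:
        parts.append("Format the answer as: " + output_format)
    parts.extend(phrases)  # e.g. "Let's think step by step."
    return "\n\n".join(parts)

prompt = build_prompt(
    "Write a professional but friendly email declining a meeting.",
    context="The meeting is an optional weekly sync.",
    output_format="a short email with a subject line",
    phrases=("Let's think step by step.",),
)
```

Keeping the elements separate like this makes the "test and iterate" step easier, since each element can be varied independently.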
==Prompt Template==
{{see also|Prompt templates}}
A [[prompt template]] allows a [[prompt]] to use variables, so that the prompt stays largely the same while being used with different input values.

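A minimal sketch of the idea using Python's standard-library `string.Template`; dedicated prompt-templating libraries provide the same substitution mechanism with extra validation on top.

```python
from string import Template

# The wording stays fixed; only the variable values change per request.
art_prompt = Template("A $medium of a $subject, $quality, detailed.")

p1 = art_prompt.substitute(medium="watercolor", subject="lighthouse", quality="4k")
p2 = art_prompt.substitute(medium="pixel art", subject="red fox", quality="8k")
```

The same template produces "A watercolor of a lighthouse, 4k, detailed." and "A pixel art of a red fox, 8k, detailed." from different inputs.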
[[File:11a. Without Unbundling.png|thumb|Figure 4a. Without unbundling. Prompt: Kobe Bryant shooting free throws, in the style of The Old Guitarist by Pablo Picasso, digital art. Source: DecentralizedCreator.]]
==Products==
*[[LangChain]] - a library for combining language models with other components to build applications.

==Image Generation==
'''[[Prompt engineering for image generation]]'''

===Text-to-image generators===
[[File:11b. With Unbundling.png|thumb|Figure 4b. With unbundling. Prompt: Kobe Bryant shooting free throws. The painting has a simple composition, with just three primary colors: red, blue, and yellow. However, it is also packed with hidden meanings and visual complexities, digital art. Source: DecentralizedCreator.]]

Some basic elements influence the quality of a [[text-to-image]] prompt. While these elements work on different generator models, their impact on the final image quality may differ.
*Nouns: denote the subject of the prompt. The generator will produce an image without a noun, although not a meaningful one. <ref name="”12”">Raj, G (2022). How to write good prompts for AI art generators: Prompt engineering made easy. Decentralized Creator. https://decentralizedcreator.com/write-good-prompts-for-ai-art-generators/</ref>
*Adjectives: can be used to convey an emotion or in a more technical sense (e.g. beautiful, magnificent, colorful, massive). <ref name="”12”"></ref>
*Artist names: the art style of the chosen artist will be reflected in the [[image generation]]. There is also an unbundling technique (figures 4a and 4b) that uses a "long description of a particular style of the artist's various characteristics and components instead of just giving the artist names." <ref name="”12”"></ref>
*Style: instead of artists' styles, the prompt can include keywords for particular styles, such as "surrealism," "fantasy," "contemporary," or "pixel art." <ref name="”12”"></ref>
*Computer graphics: keywords like "octane render," "Unreal Engine," or "ray tracing" can enhance the effectiveness and meaning of the artwork. <ref name="”12”"></ref>
*Quality: the quality of the generated image (e.g. high, 4K, 8K). <ref name="”12”"></ref>
*Art platform names: another way to include styles, for example "trending on Behance," "Weta Digital," or "trending on ArtStation." <ref name="”12”"></ref>
*Art medium: a multitude of art mediums can modify the AI-generated image, such as "pencil art," "chalk art," "ink art," "watercolor," and "wood." <ref name="”12”"></ref>

In-depth lists of prompt modifiers can be found [https://decentralizedcreator.com/write-good-prompts-for-ai-art-generators/ here] and [https://aesthetics.fandom.com/wiki/List_of_Aesthetics here].

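The modifier categories above can be joined into a prompt mechanically. The sketch below uses a common ordering convention (subject first, then medium, style, quality, and platform tags); neither the ordering nor the helper is a rule of any particular generator.

```python
def compose_image_prompt(subject, adjectives=(), medium=None,
                         style=None, quality=None, platforms=()):
    """Join modifier categories into a comma-separated text-to-image prompt."""
    parts = [" ".join((*adjectives, subject))]
    for modifier in (medium, style, quality):
        if modifier:
            parts.append(modifier)
    parts.extend(platforms)
    return ", ".join(parts)

prompt = compose_image_prompt(
    "lighthouse on a cliff",
    adjectives=("majestic",),
    medium="watercolor",
    style="surrealism",
    quality="8K",
    platforms=("trending on artstation",),
)
```
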
====Midjourney====
[[File:4. Styles in Midjourney.png|thumb|Figure 5. Midjourney elements. Source: MLearning.ai.]]
[[File:5. Midjourney Styles words.png|thumb|Figure 6. Different keywords for styles result in different outputs. Source: MLearning.ai.]]
[[File:6. Rendering and lighting properties as style.png|thumb|Figure 7. Different lighting options. Source: MLearning.ai.]]
[[File:7. Midjourney Chaos.png|thumb|Figure 8. Chaos option. Source: MLearning.ai.]]
In [[Midjourney]], a more descriptive prompt results in more vibrant and unique output. <ref name="”16”">Nielsen, L (2022). An advanced guide to writing prompts for Midjourney (text-to-image). MLearning.ai. https://medium.com/mlearning-ai/an-advanced-guide-to-writing-prompts-for-midjourney-text-to-image-aa12a1e33b6</ref> Prompt engineering for this [[AI image generator]] follows the same basic elements as the others (figure 5), but the keywords and options below are known to work well with this system.

*Style: standard, Pixar movie style, anime style, cyberpunk style, steampunk style, Waterhouse style, Bloodborne style, grunge style (figure 6). An artist's name can also be used. <ref name="”16”"></ref>
*Rendering/lighting properties: volumetric lighting, octane render, softbox lighting, fairy lights, long exposure, cinematic lighting, glowing lights, and blue lighting (figure 7). <ref name="”16”"></ref>
*Stylize setting: adding --s <number> after the prompt increases or decreases the stylize option (e.g. /imagine firefighters --s 6000). <ref name="”16”"></ref>
*Chaos: a setting that increases abstraction (figure 8), used as /imagine <prompt> --chaos <number from 0 to 100> (e.g. /imagine Eiffel tower --chaos 60). <ref name="”16”"></ref>
*Resolution: can be requested in the prompt itself or with the standard commands --hd and --quality or --q <number>. <ref name="”16”"></ref>
*Aspect ratio: the default is 1:1; it can be changed with --ar <number:number> (e.g. /imagine jasmine in the wild flower --ar 4:3). A custom image size can also be specified with --w <number> --h <number> after the prompt. <ref name="”16”"></ref>
*Images as prompts: Midjourney can take images as part of a prompt to produce outputs similar to them. This is done by inserting the URL of an image into the prompt (e.g. /imagine http://www.imgur.com/Im3424.jpg box full of chocolates). Multiple images can be used. <ref name="”16”"></ref>
*Weight: increases or decreases the influence of a specific prompt keyword or image on the output. For text prompts, ::<number> is appended to a keyword according to its intended impact on the final image (e.g. /imagine wild animals tiger::2 zebra::4 lions::1.5). <ref name="”16”"></ref>
*Filter: to keep unwanted elements out of the output, use --no <keyword> (e.g. /imagine KFC fried chicken --no sauce). <ref name="”16”"></ref>
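The ::<number> weight syntax can be made concrete with a small parser. This is an illustration of the syntax only, not Midjourney's own implementation, and it assumes each weight is followed by a space before the next segment.

```python
def parse_weights(prompt):
    """Split a prompt using the ::<number> syntax into (text, weight) pairs.

    A segment without an explicit weight defaults to 1.0.
    """
    pieces = prompt.split("::")
    if len(pieces) == 1:
        return [(prompt.strip(), 1.0)]
    pairs = []
    text = pieces[0]
    for piece in pieces[1:]:
        # Each piece starts with the weight of the preceding segment;
        # whatever follows the first space is the next segment's text.
        head, _, rest = piece.partition(" ")
        pairs.append((text.strip(), float(head)))
        text = rest
    if text.strip():
        pairs.append((text.strip(), 1.0))
    return pairs

pairs = parse_weights("wild animals tiger::2 zebra::4 lions::1.5")
```
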
====DALL-E====
For [[DALL-E]], a tip is to write adjectives + nouns instead of verbs or complex scenes. The user can add keywords like "gorgeous," "amazing," and "beautiful," plus "digital painting," "oil painting," etc., and "unreal engine" or "unity engine." <ref name="”17”">Strikingloo (2022). Text to image art: Experiments and prompt guide for DALL-E Mini and other AI art models. Strikingloo. https://strikingloo.github.io/art-prompts</ref>

Other templates that work well with this model:

*A photograph of X, 4k, detailed.
*Pixar style 3D render of X.
*Subdivision control mesh of X.
*Low-poly render of X; high resolution, 4k.
*A digital illustration of X, 4k, detailed, trending in artstation, fantasy vivid colors. <ref name="”17”"></ref>

Other user experiments can be accessed [https://strikingloo.github.io/DALL-E-2-prompt-guide here]. <ref name="”17”"></ref>

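Each of the templates above leaves a subject slot X, so filling them is plain string substitution. The template list comes from the cited guide; the helper itself is a hypothetical illustration.

```python
# Templates from the guide, with X written as a {x} placeholder.
TEMPLATES = [
    "A photograph of {x}, 4k, detailed",
    "Pixar style 3D render of {x}",
    "Subdivision control mesh of {x}",
    "Low-poly render of {x}; high resolution, 4k",
    "A digital illustration of {x}, 4k, detailed, trending in artstation, fantasy vivid colors",
]

def fill_templates(subject):
    """Return every template with the X slot replaced by the subject."""
    return [t.format(x=subject) for t in TEMPLATES]

prompts = fill_templates("a red fox")
```
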
====Stable Diffusion====
Overall, prompt engineering in [[Stable Diffusion]] does not differ much from that of other AI image-generating models, but it also supports prompt weighting and negative prompting. <ref name="”18”">DreamStudio. Prompt guide. DreamStudio. https://beta.dreamstudio.ai/prompt-guide</ref>

*Prompt weighting: varies between 1 and -1; decimals can be used to reduce a prompt's influence. <ref name="”18”"></ref>
*Negative prompting: in DreamStudio, negative prompts can be added using | <negative prompt>: -1.0 (e.g. | disfigured, ugly:-1.0, too many fingers:-1.0). <ref name="”18”"></ref>
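The negative-prompt syntax above can be produced with a trivial formatter. This sketches the string format only; it is not part of any Stability SDK.

```python
def weighted_prompt(positive, negatives=(), negative_weight=-1.0):
    """Format a DreamStudio-style prompt with weighted negative terms."""
    parts = [positive]
    for term in negatives:
        # Each negative term gets the "| term:weight" suffix form.
        parts.append(f"| {term}:{negative_weight}")
    return " ".join(parts)

p = weighted_prompt("portrait of an astronaut",
                    negatives=("disfigured, ugly", "too many fingers"))
```
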
====Jasper Art====
[[Jasper Art]] is similar to DALL-E 2, but its results differ because Jasper gives priority to [[Natural Language Processing]] ([[NLP]]) and can handle complex sentences with semantic articulation. <ref name="”19”">The Jasper Whisperer (2022). Improve your AI text-to-image prompts with enhanced NLP. Bootcamp. https://bootcamp.uxdesign.cc/improve-your-ai-text-to-image-prompts-with-enhanced-nlp-fc804964747f</ref>

There has been some experimentation with narrative prompts, an alternative to keyword combinations that instead uses more expressive descriptions. <ref name="”19”"></ref> For example, instead of "tiny lion cub, 8k, kawaii, adorable eyes, pixar style, winter snowflakes, wind, dramatic lighting, pose, full body, adventure, fantasy, renderman, concept art, octane render, artgerm," the prompt can be converted into sentences that paint with words: "Lion cub, small but mighty, with eyes that seem to pierce your soul. In a winter wonderland, he stands tall against the snow, wind ruffling his fur. He seems almost like a creature of legend, ready for an adventure. The lighting is dramatic and striking, and the render is breathtakingly beautiful." <ref name="”19”"></ref>

==References==
<references />

[[Category:Terms]]
[[Category:AI Terms]]