Prompt engineering: Difference between revisions

no edit summary
Line 10: Line 10:


===Basic prompt structure===
===Basic prompt structure===
[[File:Prompt writing elements.png|thumb|Figure 1. Prompt writing elements. Source: Oppenlaender (2022)]]


A prompt usually includes a subject term, while any other parts of the prompt are optional. However, modifiers are often added to improve the resulting images and provide more control over the creation process. These modifiers are applied through experimentation or based on best practices learned from experience or online resources. <ref name="”2”"></ref> Modifiers can either alter the style of the generated image, for example, or boost its quality. There can be overlapping effects between style modifiers and quality boosters. Once a style modifier has been added, solidifiers (using repetition) can be applied to any of the other types of modifiers. The textual prompt can be divided into two main components: the physical and factual content of the image, and the stylistic considerations in the way the physical content is displayed. <ref name="”2”"></ref><ref name="”7”">Witteveen, S and Andrews, M (2022). Investigating Prompt Engineering in Diffusion Models. arXiv:2211.15462v1 https://arxiv.org/pdf/2211.15462.pdf</ref>
A prompt usually includes a subject term, while any other parts of the prompt are optional (figure 1). However, modifiers are often added to improve the resulting images and provide more control over the creation process. These modifiers are applied through experimentation or based on best practices learned from experience or online resources. <ref name="”2”"></ref> Modifiers can either alter the style of the generated image, for example, or boost its quality. There can be overlapping effects between style modifiers and quality boosters. Once a style modifier has been added, solidifiers (using repetition) can be applied to any of the other types of modifiers. The textual prompt can be divided into two main components: the physical and factual content of the image, and the stylistic considerations in the way the physical content is displayed. <ref name="”2”"></ref><ref name="”7”">Witteveen, S and Andrews, M (2022). Investigating Prompt Engineering in Diffusion Models. arXiv:2211.15462v1 https://arxiv.org/pdf/2211.15462.pdf</ref>


To enhance the quality of the output images, it is common to include specific keywords before and after the image description following the formula prompt = [keyword<sub>1</sub>, ..., keyword<sub>m−1</sub>] [description] [keyword<sub>m</sub>, ..., keyword<sub>n</sub>]. For example, a user wanting to generate an image of a cat using a text-to-image model may use a specific prompt template that includes a description of a painting of a calico cat and keywords such as highly detailed, cinematic lighting, dramatic atmosphere, and others. This approach helps to provide additional information to the model and improve the generated image's quality. <ref name="”8”">Pavlichenko, N, Zhdanov and Ustalov, D (2022) Best Prompts for Text-to-Image Models and How to Find Them. arXiv:2209.11711v2</ref>
To enhance the quality of the output images, it is common to include specific keywords before and after the image description following the formula prompt = [keyword<sub>1</sub>, ..., keyword<sub>m−1</sub>] [description] [keyword<sub>m</sub>, ..., keyword<sub>n</sub>]. For example, a user wanting to generate an image of a cat using a text-to-image model may use a specific prompt template that includes a description of a painting of a calico cat and keywords such as highly detailed, cinematic lighting, dramatic atmosphere, and others. This approach helps to provide additional information to the model and improve the generated image's quality. <ref name="”8”">Pavlichenko, N, Zhdanov and Ustalov, D (2022) Best Prompts for Text-to-Image Models and How to Find Them. arXiv:2209.11711v2</ref>
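The keyword formula above can be illustrated with a minimal Python sketch. The function name <code>build_prompt</code>, its parameters, and the comma separator are assumptions made for illustration; they are not taken from the cited paper.

```python
def build_prompt(description, leading_keywords=(), trailing_keywords=()):
    """Assemble [leading keywords] [description] [trailing keywords] into one prompt."""
    parts = [*leading_keywords, description, *trailing_keywords]
    # Drop empty pieces and join with commas (the separator is an assumption).
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    "a painting of a calico cat",
    trailing_keywords=["highly detailed", "cinematic lighting", "dramatic atmosphere"],
)
print(prompt)
```

Keeping the description separate from the keyword lists makes it easy to try different keyword sets against the same subject.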
Line 18: Line 19:


*'''Prompt engineering in Human-Computer Interaction (HCI):''' a research area that is gaining interest due to the increasing use of deep generative models by people without technical expertise. Social aspects of prompt engineering are important since text-to-image systems were trained on images and text scraped from the web. Prompt engineers need to predict how others described and reacted to the images posted on the web, so describing an image in detail is often not enough. There are also dedicated communities that have recently emerged, adding another social aspect to prompt engineering.
*'''Prompt engineering in Human-Computer Interaction (HCI):''' a research area that is gaining interest due to the increasing use of deep generative models by people without technical expertise. Social aspects of prompt engineering are important since text-to-image systems were trained on images and text scraped from the web. Prompt engineers need to predict how others described and reacted to the images posted on the web, so describing an image in detail is often not enough. There are also dedicated communities that have recently emerged, adding another social aspect to prompt engineering.
*'''Human-AI co-creation:''' Prompt writing is the central part of prompt engineering, but it is only a starting point in some practitioners' creative workflows. Novel creative practices are emerging, where practitioners develop complex workflows for creating their artworks.
* '''Human-AI co-creation:''' Prompt writing is the central part of prompt engineering, but it is only a starting point in some practitioners' creative workflows. Novel creative practices are emerging, where practitioners develop complex workflows for creating their artworks.
*'''Bias:''' an interesting area for future work is bias encoded in text-to-image generation systems.
*'''Bias:''' an interesting area for future work is bias encoded in text-to-image generation systems.
*'''Computational aesthetics and Human-AI alignment:''' Making computers evaluate and understand aesthetics is an old goal that has recently received renewed attention. Computational aesthetics and Human-AI alignment are areas of research that are being explored through neural image assessment and computational aesthetics. <ref name="”1”"></ref>
*'''Computational aesthetics and Human-AI alignment:''' Making computers evaluate and understand aesthetics is an old goal that has recently received renewed attention. Computational aesthetics and Human-AI alignment are areas of research that are being explored through neural image assessment and computational aesthetics. <ref name="”1”"></ref>
Line 44: Line 45:


*'''Capacity and Role:''' "Act as an expert on software development on the topic of machine learning frameworks, and an expert blog writer."
*'''Capacity and Role:''' "Act as an expert on software development on the topic of machine learning frameworks, and an expert blog writer."
*'''Insight:''' "The audience for this blog is technical professionals who are interested in learning about the latest advancements in machine learning."
* '''Insight:''' "The audience for this blog is technical professionals who are interested in learning about the latest advancements in machine learning."
*'''Statement:''' "Provide a comprehensive overview of the most popular machine learning frameworks, including their strengths and weaknesses. Include real-life examples and case studies to illustrate how these frameworks have been successfully used in various industries."
*'''Statement:''' "Provide a comprehensive overview of the most popular machine learning frameworks, including their strengths and weaknesses. Include real-life examples and case studies to illustrate how these frameworks have been successfully used in various industries."
*'''Personality:''' "When responding, use a mix of the writing styles of Andrej Karpathy, Francois Chollet, Jeremy Howard, and Yann LeCun."
*'''Personality:''' "When responding, use a mix of the writing styles of Andrej Karpathy, Francois Chollet, Jeremy Howard, and Yann LeCun."
*'''Experiment:''' "Give me multiple different examples." <ref name="”11”">Matt Nigh. ChatGPT3 Prompt Engineering. GitHub. https://github.com/mattnigh/ChatGPT3-Free-Prompt-List</ref>
* '''Experiment:''' "Give me multiple different examples." <ref name="”11”">Matt Nigh. ChatGPT3 Prompt Engineering. GitHub. https://github.com/mattnigh/ChatGPT3-Free-Prompt-List</ref>


The process of prompt refinement is a method to improve the quality of written content by transforming it into a compelling, imaginative, and relatable piece, fixing "soulless writing". The aim is to make the content engaging and impactful by focusing on storytelling, using persuasive language, emphasizing emotion and sensory details, making the content concise and highlighting key points. To create a sense of urgency and make the content relatable, the language can be personalized to the reader and potential objections can be addressed. <ref name="”11”"></ref>
The process of prompt refinement is a method to improve the quality of written content by transforming it into a compelling, imaginative, and relatable piece, fixing "soulless writing". The aim is to make the content engaging and impactful by focusing on storytelling, using persuasive language, emphasizing emotion and sensory details, making the content concise and highlighting key points. To create a sense of urgency and make the content relatable, the language can be personalized to the reader and potential objections can be addressed. <ref name="”11”"></ref>
Line 56: Line 57:


===Prompt Engineering for Code Generation Models===
===Prompt Engineering for Code Generation Models===
[[File:coding_model_diagram1.png|400px|right]]
[[File:Coding_model_diagram1.png|alt=Figure 2. Prompt to completion.|thumb|400x400px|Figure 2. From prompt to completion.]]
Generate [[code]] using [[models]] like [[OpenAI Codex]].
Generate [[code]] using [[models]] like [[OpenAI Codex]].


Line 70: Line 71:
However, sometimes the generated code may not be optimal, in which case you can provide more specific instructions, such as importing libraries before using them. By combining a high-level task description with detailed user instructions, you can create a more effective prompt for a coding model to generate code.
However, sometimes the generated code may not be optimal, in which case you can provide more specific instructions, such as importing libraries before using them. By combining a high-level task description with detailed user instructions, you can create a more effective prompt for a coding model to generate code.


====Examples====
==== Examples====
Give the coding model examples. Imagine you prefer a style of writing Python code that differs from what the model produces. For instance, when adding two numbers, you may prefer to name the arguments differently. The key to working with models like Codex is to clearly communicate what you want them to do. One effective way to do this is to provide examples for Codex to learn from, so that it matches its output to your preferred style. If you give the model a longer prompt that includes such an example, it will then name the arguments in the same manner as in the example.
Give the coding model examples. Imagine you prefer a style of writing Python code that differs from what the model produces. For instance, when adding two numbers, you may prefer to name the arguments differently. The key to working with models like Codex is to clearly communicate what you want them to do. One effective way to do this is to provide examples for Codex to learn from, so that it matches its output to your preferred style. If you give the model a longer prompt that includes such an example, it will then name the arguments in the same manner as in the example.
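As a rough sketch of this idea, the snippet below places a hand-written style example in front of a new task comment. The example function, the argument names, and the helper <code>build_code_prompt</code> are hypothetical; sending the prompt to a model is left out because it depends on the API in use.

```python
# A hand-written example in the preferred style (argument names are illustrative).
STYLE_EXAMPLE = (
    "# Add two numbers\n"
    "def add(first_number, second_number):\n"
    "    return first_number + second_number\n"
)


def build_code_prompt(task_comment, examples):
    """Place style examples before the new task so the model can imitate them."""
    blocks = [example.rstrip() for example in examples]
    blocks.append(f"# {task_comment}")
    return "\n\n".join(blocks) + "\n"


prompt = build_code_prompt("Multiply two numbers", [STYLE_EXAMPLE])
print(prompt)
```

The prompt ends with the new task comment, so a completion model would be expected to continue with a function written in the same style as the example.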


Line 93: Line 94:
*Edit and revise. Don't be afraid of revising and editing the generated text.
*Edit and revise. Don't be afraid of revising and editing the generated text.
*You can ask the chatbot for assistance. The chatbot will explain why it selected a specific detail or phrase in a reply. The chatbot can also help you create a better prompt. You can point out individual phrases and ask the chatbot for alternatives or suggestions.
*You can ask the chatbot for assistance. The chatbot will explain why it selected a specific detail or phrase in a reply. The chatbot can also help you create a better prompt. You can point out individual phrases and ask the chatbot for alternatives or suggestions.
====Template====
==== Template====
<blockquote>
<blockquote>
Describe ''YOUR SCENE''. Use sensory language and detail to describe the ''OBJECTS IN THE SCENE vividly''. Describe ''SPECIFIC DETAILS'' and any other sensory details that come to mind. Vary the sentence structure and use figurative language as appropriate. Avoid telling the reader how to feel or think about the scene.
Describe ''YOUR SCENE''. Use sensory language and detail to describe the ''OBJECTS IN THE SCENE vividly''. Describe ''SPECIFIC DETAILS'' and any other sensory details that come to mind. Vary the sentence structure and use figurative language as appropriate. Avoid telling the reader how to feel or think about the scene.
</blockquote>
</blockquote>


==Text-to-Image==
==Text-to-Image ==
 
[[File:11a. Without Unbundling.png|thumb|Figure 3a. Without unbundling. Prompt: Kobe Bryant shooting free throws, in the style of The Old Guitarist by Pablo Picasso, digital art. Source: DecentralizedCreator.]]
[[File:11b. With Unbundling.png|thumb|Figure 3b. With unbundling. Prompt: Kobe Bryant shooting free throws, The painting has a simple composition, with just three primary colors: red, blue and yellow. However, it is also packed with hidden meanings and visual complexities, digital art. Source: DecentralizedCreator.]]
[[File:4. Styles in Midjourney.png|thumb|Figure 4. Midjourney elements. Source: Mlearning.ai.]]
[[File:5. Midjourney Styles words.png|thumb|Figure 5. Different keywords for styles result in different outputs. Source: Mlearning.ai.]]
[[File:6. Rendering and lighting properties as style.png|thumb|Figure 6. Different lighting options. Source: Mlearning.ai.]]
[[File:7. Midjourney Chaos.png|thumb|Figure 7. Chaos option. Source: Mlearning.ai.]]
Text prompts can be used to generate images using a text-to-image model, where words are used to describe an image and the model creates it accordingly. Emojis or single lines of text can also be used as prompts to get optimal results. However, the subject term is important to control the generation of digital images. <ref name="”1”"></ref><ref name="”12”">Zerkova, A (2022). How to Create Effective Prompts for AI Image Generation. Re-thought. https://re-thought.com/how-to-create-effective-prompts-for-ai-image-generation/</ref> In the online community for AI-generated art, templates for writing input prompts have emerged, such as the "Traveler's Guide to the Latent Space," which recommends specific prompt templates such as [Medium][Subject][Artist(s)][Details][Image repository support]. <ref name="”2”"></ref>
Text prompts can be used to generate images using a text-to-image model, where words are used to describe an image and the model creates it accordingly. Emojis or single lines of text can also be used as prompts to get optimal results. However, the subject term is important to control the generation of digital images. <ref name="”1”"></ref><ref name="”12”">Zerkova, A (2022). How to Create Effective Prompts for AI Image Generation. Re-thought. https://re-thought.com/how-to-create-effective-prompts-for-ai-image-generation/</ref> In the online community for AI-generated art, templates for writing input prompts have emerged, such as the "Traveler's Guide to the Latent Space," which recommends specific prompt templates such as [Medium][Subject][Artist(s)][Details][Image repository support]. <ref name="”2”"></ref>


Line 108: Line 114:
The prompt should contain a noun, adjective, and verb to create an interesting subject. A prompt with more than three words should be written to give the AI a clear context. Multiple adjectives should be used to infuse multiple feelings into the artwork. It is also recommended to include the name of an artist, so that the output mimics that artist's style. Additionally, words banned by the AI generator should be avoided to prevent the user from being banned. <ref name="”12”"></ref> The use of abstract words leads to more diverse results, while concrete words lead to all pictures showing the same concrete thing. For tokenization (the separation of a text into smaller units, called tokens), commas, pipes, or double colons can be used as hard separators, but the direct impact of tokenization is not always clear. <ref name="”13”"></ref>
The prompt should contain a noun, adjective, and verb to create an interesting subject. A prompt with more than three words should be written to give the AI a clear context. Multiple adjectives should be used to infuse multiple feelings into the artwork. It is also recommended to include the name of an artist, so that the output mimics that artist's style. Additionally, words banned by the AI generator should be avoided to prevent the user from being banned. <ref name="”12”"></ref> The use of abstract words leads to more diverse results, while concrete words lead to all pictures showing the same concrete thing. For tokenization (the separation of a text into smaller units, called tokens), commas, pipes, or double colons can be used as hard separators, but the direct impact of tokenization is not always clear. <ref name="”13”"></ref>


*'''Nouns:''' denote the subject in a prompt. The generator will produce an image without a noun, although not a meaningful one. <ref name="”6”"></ref>
* '''Nouns:''' denote the subject in a prompt. The generator will produce an image without a noun, although not a meaningful one. <ref name="”6”"></ref>
*'''Adjectives:''' can be used to try to convey an emotion or be used more technically (e.g. beautiful, magnificent, colorful, massive). <ref name="”6”"></ref>
*'''Adjectives:''' can be used to try to convey an emotion or be used more technically (e.g. beautiful, magnificent, colorful, massive). <ref name="”6”"></ref>  
*'''Artist names:''' the art style of the chosen artist will be included in the image generation. There is also an unbundling technique that proposes a “long description of a particular style of the artist’s various characteristics and components instead of just giving the artist names.” <ref name="”6”"></ref>
*'''Artist names:''' the art style of the chosen artist will be included in the image generation. There is also an unbundling technique (figures 3a and 3b) that proposes a “long description of a particular style of the artist’s various characteristics and components instead of just giving the artist names.” <ref name="”6”"></ref>
*'''Style:''' instead of using the style of artists, the prompt can include keywords related to certain styles like “surrealism,” “fantasy,” “contemporary,” “pixel art”, etc. <ref name="”6”"></ref>
*'''Style:''' instead of using the style of artists, the prompt can include keywords related to certain styles like “surrealism,” “fantasy,” “contemporary,” “pixel art”, etc. <ref name="”6”"></ref>
*'''Computer graphics:''' keywords like “octane render,” “Unreal Engine,” or “Ray Tracing” can enhance the effectiveness and meaning of the artwork. <ref name="”6”"></ref>
*'''Computer graphics:''' keywords like “octane render,” “Unreal Engine,” or “Ray Tracing” can enhance the effectiveness and meaning of the artwork. <ref name="”6”"></ref>
*'''Quality:''' quality of the generated image (e.g. high, 4K, 8K). <ref name="”6”"></ref>
*'''Quality:''' quality of the generated image (e.g. high, 4K, 8K). <ref name="”6”"></ref>
*'''Art platform names:''' these keywords are another way to include styles. For example, “trending on Behance,” “Weta Digital,” or “trending on artstation.” <ref name="”6”"></ref>
*'''Art platform names:''' these keywords are another way to include styles. For example, “trending on Behance,” “Weta Digital,” or “trending on artstation.” <ref name="”6”"></ref>
*'''Art medium:''' there is a multitude of art mediums that can be chosen to modify the AI-generated image like “pencil art,” “chalk art,” “ink art,” “watercolor,” “wood,” and others. <ref name="”6”"></ref>
*'''Art medium:''' there is a multitude of art mediums that can be chosen to modify the AI-generated image like “pencil art,” “chalk art,” “ink art,” “watercolor,” “wood,” and others. <ref name="”6”"></ref>
*'''Weight:''' To give a specific subject a higher weight in a prompt, there are several techniques available. Tokens near the beginning of a prompt carry more weight than those at the end. Repeating the subject by using different phrasing or multiple languages, or even using emojis, can also increase its weighting. In some generative models like [[Midjourney]], you can use parameters such as ::weight to assign a weight to specific parts of a prompt. <ref name="”13”"></ref>
*'''Weight:''' To give a specific subject a higher weight in a prompt, there are several techniques available. Tokens near the beginning of a prompt carry more weight than those at the end. Repeating the subject by using different phrasing or multiple languages, or even using emojis, can also increase its weighting. In some generative models like [[Midjourney]], you can use parameters such as ::weight to assign a weight to specific parts of a prompt. <ref name="”13”"></ref>  


In-depth lists with modifier prompts can be found [https://decentralizedcreator.com/write-good-prompts-for-ai-art-generators/ here] and [https://aesthetics.fandom.com/wiki/List%20of%20Aesthetics here].
In-depth lists with modifier prompts can be found [https://decentralizedcreator.com/write-good-prompts-for-ai-art-generators/ here] and [https://aesthetics.fandom.com/wiki/List%20of%20Aesthetics here].
Line 122: Line 128:
===Midjourney===
===Midjourney===


In [[Midjourney]], a very descriptive text will result in a more vibrant and unique output. <ref name="”14”">Nielsen, L (2022). An advanced guide to writing prompts for Midjourney ( text-to-image). Mlearning. https://medium.com/mlearning-ai/an-advanced-guide-to-writing-prompts-for-midjourney-text-to-image-aa12a1e33b6</ref> Prompt engineering for this [[AI image generator]] follows the same basic elements as all others but some keywords and options will be provided here that are known to work well with this system.
In [[Midjourney]], a very descriptive text will result in a more vibrant and unique output. <ref name="”14”">Nielsen, L (2022). An advanced guide to writing prompts for Midjourney ( text-to-image). Mlearning. https://medium.com/mlearning-ai/an-advanced-guide-to-writing-prompts-for-midjourney-text-to-image-aa12a1e33b6</ref> Prompt engineering for this [[AI image generator]] follows the same basic elements as all others (figure 4) but some keywords and options will be provided here that are known to work well with this system.


*'''Style:''' standard, pixar movie style, anime style, cyber punk style, steam punk style, waterhouse style, bloodborne style, grunge style. An artist’s name can also be used.
*'''Style:''' standard, pixar movie style, anime style, cyber punk style, steam punk style, waterhouse style, bloodborne style, grunge style (figure 5). An artist’s name can also be used.
*'''Rendering/lighting properties:''' volumetric lighting, octane render, softbox lighting, fairy lights, long exposure, cinematic lighting, glowing lights, and blue lighting.
*'''Rendering/lighting properties:''' volumetric lighting, octane render, softbox lighting, fairy lights, long exposure, cinematic lighting, glowing lights, and blue lighting (figure 6).
*'''Style setting:''' adding the command --s <number> after the prompt will increase or decrease the stylize option (e.g. /imagine firefighters --s 6000).
*'''Style setting:''' adding the command --s <number> after the prompt will increase or decrease the stylize option (e.g. /imagine firefighters --s 6000).
*'''Chaos:''' a setting to increase abstraction using the command /imagine prompt --chaos <a number from 0 to 100> (e.g. /imagine Eiffel tower --chaos 60).
*'''Chaos:''' a setting to increase abstraction (figure 7) using the command /imagine prompt --chaos <a number from 0 to 100> (e.g. /imagine Eiffel tower --chaos 60).
*'''Resolution:''' the resolution can be inserted in the prompt or set using the standard commands --hd and --quality or --q <number>.
*'''Resolution:''' the resolution can be inserted in the prompt or set using the standard commands --hd and --quality or --q <number>.
*'''Aspect ratio:''' the default aspect ratio is 1:1. This can be modified with the command --ar <number>:<number> (e.g. /imagine jasmine in the wild flower --ar 4:3). A custom size image can also be specified using the command --w <number> --h <number> after the prompt.
*'''Aspect ratio:''' the default aspect ratio is 1:1. This can be modified with the command --ar <number>:<number> (e.g. /imagine jasmine in the wild flower --ar 4:3). A custom size image can also be specified using the command --w <number> --h <number> after the prompt.
*'''Images as prompts:''' Midjourney allows the user to use images to get outputs similar to the one used. This can be done by inserting a URL of the image in the prompt (e.g. /imagine http://www.imgur.com/Im3424.jpg box full of chocolates). Multiple images can be used.
*'''Images as prompts:''' Midjourney allows the user to use images to get outputs similar to the one used. This can be done by inserting a URL of the image in the prompt (e.g. /imagine http://www.imgur.com/Im3424.jpg box full of chocolates). Multiple images can be used.
*'''Weight:''' increases or decreases the influence of a specific prompt keyword or image on the output. For text prompts, the command ::<number> should be used after the keywords according to their intended impact on the final image (e.g. /imagine wild animals tiger::2 zebra::4 lions::1.5).
* '''Weight:''' increases or decreases the influence of a specific prompt keyword or image on the output. For text prompts, the command ::<number> should be used after the keywords according to their intended impact on the final image (e.g. /imagine wild animals tiger::2 zebra::4 lions::1.5).
*'''Filter:''' to discard unwanted elements from appearing in the output use the --no <keyword> command (e.g. /imagine KFC fried chicken --no sauce). <ref name="”14”"></ref>
*'''Filter:''' to discard unwanted elements from appearing in the output use the --no <keyword> command (e.g. /imagine KFC fried chicken --no sauce). <ref name="”14”"></ref>
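The options listed above can be combined into a single command string. The following Python sketch only formats text using the flags as they are described in this list; the helper names <code>imagine_command</code> and <code>weighted</code> are hypothetical, and the flags accepted by a given Midjourney version may differ.

```python
def weighted(terms):
    """Render {"tiger": 2, "zebra": 4} as "tiger::2 zebra::4"."""
    return " ".join(f"{term}::{weight}" for term, weight in terms.items())


def imagine_command(prompt, *, stylize=None, chaos=None, aspect_ratio=None, exclude=None):
    """Compose an /imagine command from the options described in the list above."""
    if chaos is not None and not 0 <= chaos <= 100:
        raise ValueError("chaos must be between 0 and 100")
    parts = ["/imagine", prompt]
    if stylize is not None:
        parts.append(f"--s {stylize}")
    if chaos is not None:
        parts.append(f"--chaos {chaos}")
    if aspect_ratio is not None:
        width, height = aspect_ratio
        parts.append(f"--ar {width}:{height}")
    if exclude is not None:
        parts.append(f"--no {exclude}")
    return " ".join(parts)


print(imagine_command("Eiffel tower", chaos=60))
print(imagine_command("wild animals " + weighted({"tiger": 2, "zebra": 4, "lions": 1.5})))
print(imagine_command("KFC fried chicken", aspect_ratio=(4, 3), exclude="sauce"))
```

Validating the chaos range before building the string catches out-of-range values early, instead of relying on the image generator to reject them.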


Line 147: Line 153:
Other user experiments can be accessed [https://strikingloo.github.io/DALL-E-2-prompt-guide here]. <ref name="”15”"></ref>
Other user experiments can be accessed [https://strikingloo.github.io/DALL-E-2-prompt-guide here]. <ref name="”15”"></ref>


===Stable Diffusion===
=== Stable Diffusion===


Overall, prompt engineering in [[Stable Diffusion]] doesn’t differ from other AI image-generating models. However, it should be noted that it also allows prompt weighting and negative prompting. <ref name="”16”">DreamStudio. Prompt guide. DreamStudio. https://beta.dreamstudio.ai/prompt-guide</ref>
Overall, prompt engineering in [[Stable Diffusion]] doesn’t differ from other AI image-generating models. However, it should be noted that it also allows prompt weighting and negative prompting. <ref name="”16”">DreamStudio. Prompt guide. DreamStudio. https://beta.dreamstudio.ai/prompt-guide</ref>


*'''Prompt weighting:''' varies between 1 and -1. Decimals can be used to reduce a prompt’s influence.
*'''Prompt weighting:''' varies between 1 and -1. Decimals can be used to reduce a prompt’s influence.
*'''Negative prompting:''' in DreamStudio negative prompts can be added by using | <negative prompt>: -1.0 (e.g. | disfigured, ugly:-1.0, too many fingers:-1.0). <ref name="”16”"></ref>
*'''Negative prompting:''' in DreamStudio negative prompts can be added by using | <negative prompt>: -1.0 (e.g. | disfigured, ugly:-1.0, too many fingers:-1.0). <ref name="”16”"></ref>
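A short Python sketch of the negative prompting syntax described above follows. It only builds the prompt text; the helper name <code>with_negative_prompts</code> and the sample subject are hypothetical, and the exact syntax accepted by DreamStudio should be checked against its prompt guide.

```python
def with_negative_prompts(prompt, negatives, weight=-1.0):
    """Append negative prompts as "| term:weight, term:weight" after the main prompt."""
    if not -1.0 <= weight <= 1.0:
        raise ValueError("weight must be between -1 and 1")
    if not negatives:
        return prompt
    negative_part = ", ".join(f"{term}:{weight}" for term in negatives)
    return f"{prompt} | {negative_part}"


full_prompt = with_negative_prompts(
    "portrait of a violinist",
    ["disfigured, ugly", "too many fingers"],
)
print(full_prompt)
```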


===Jasper Art===
===Jasper Art===
Line 161: Line 167:


==Research on Prompt engineering==
==Research on Prompt engineering==
[[File:Promptist training overview.png|thumb|Figure 8. PROMPTIST training overview. Source: Hao et al. (2022)]]
[[File:Comparison between the results of the original user prompt-Hao et al.png|thumb|Figure 9. Comparison between the results of the original user prompt and the optimized prompt. Source: Hao et al. (2022)]]


===Automatic prompt engineering===
===Automatic prompt engineering===
Line 166: Line 174:
Hao et al. (2022) mention that the implementation of manual prompt engineering towards specific text-to-image models can be laborious and sometimes infeasible. The process of manually engineering prompts is often not transferable between various model versions. Therefore, a systematic way to automatically align user intentions and various model-preferred prompts is necessary. To address this, a prompt adaptation framework for automatic prompt engineering via reinforcement learning was proposed. The method uses supervised fine-tuning on a small collection of manually engineered prompts to initialize the prompt policy network for reinforcement learning. The model is trained by exploring optimized prompts of user inputs, where the training objective is to maximize the reward, which is defined as a combination of relevance scores and aesthetic scores of generated images. The goal of the framework is to automatically perform prompt engineering that generates model-preferred prompts to obtain better output images while preserving the original intentions of the user. <ref name="”4”"></ref>
Hao et al. (2022) mention that the implementation of manual prompt engineering towards specific text-to-image models can be laborious and sometimes infeasible. The process of manually engineering prompts is often not transferable between various model versions. Therefore, a systematic way to automatically align user intentions and various model-preferred prompts is necessary. To address this, a prompt adaptation framework for automatic prompt engineering via reinforcement learning was proposed. The method uses supervised fine-tuning on a small collection of manually engineered prompts to initialize the prompt policy network for reinforcement learning. The model is trained by exploring optimized prompts of user inputs, where the training objective is to maximize the reward, which is defined as a combination of relevance scores and aesthetic scores of generated images. The goal of the framework is to automatically perform prompt engineering that generates model-preferred prompts to obtain better output images while preserving the original intentions of the user. <ref name="”4”"></ref>
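To make the reward idea concrete, here is a toy Python sketch in which the reward is a weighted sum of a relevance score and an aesthetic score. The weighted sum, the default weights, and the candidate scores are assumptions for illustration only; the text above says only that the reward is "a combination" of the two scores, and the paper's actual definition may differ.

```python
def reward(relevance_score, aesthetic_score, relevance_weight=1.0, aesthetic_weight=1.0):
    """Combine the two scores; a weighted sum is assumed here for illustration."""
    return relevance_weight * relevance_score + aesthetic_weight * aesthetic_score


def best_candidate(candidates):
    """Return the candidate prompt with the highest reward.

    candidates: list of (prompt, relevance_score, aesthetic_score) tuples.
    """
    return max(candidates, key=lambda c: reward(c[1], c[2]))[0]


# Hypothetical scores, not measured results.
candidates = [
    ("a cat", 0.9, 0.2),
    ("a cat, highly detailed, cinematic lighting", 0.8, 0.7),
    ("cinematic lighting, dramatic atmosphere", 0.1, 0.8),
]
print(best_candidate(candidates))
```

In this toy setup the third candidate scores well on aesthetics but loses on relevance, which mirrors the stated goal of improving images while preserving the user's original intention.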


The resulting prompt optimization model, named PROMPTIST, is built upon a pretrained language model, such as GPT, and can flexibly align user intentions with the language a model favors. Optimized prompts can generate more aesthetically pleasing images. Experimental results show that the proposed method outperforms human prompt engineering and supervised fine-tuning in terms of automatic metrics and human evaluation. Although experiments are conducted on text-to-image models, the framework can be easily applied to other tasks. <ref name="”4”"></ref>
The resulting prompt optimization model, named PROMPTIST (figure 8), is built upon a pretrained language model, such as GPT, and can flexibly align user intentions with the language a model favors. Optimized prompts can generate more aesthetically pleasing images (figure 9). Experimental results show that the proposed method outperforms human prompt engineering and supervised fine-tuning in terms of automatic metrics and human evaluation. Although experiments are conducted on text-to-image models, the framework can be easily applied to other tasks. <ref name="”4”"></ref>


Jiang et al. (2020) proposed two automatic methods to improve the quality and scope of prompts used for querying language models about the existence of a relation. The methods are inspired by previous relation extraction techniques and use either mining-based or paraphrasing-based approaches to generate diverse prompts that are semantically similar to a seed prompt. The authors also investigated lightweight ensemble methods that can combine the answers from different prompts to improve retrieval accuracy for different subject-object pairs. <ref name="”18”">Jiang, Z, Xu, FF, Araki, J and Neubig, G (2020). How Can We Know What Language Models Know? https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00324/96460/How-Can-We-Know-What-Language-Models-Know</ref>
Jiang et al. (2020) proposed two automatic methods to improve the quality and scope of prompts used for querying language models about the existence of a relation. The methods are inspired by previous relation extraction techniques and use either mining-based or paraphrasing-based approaches to generate diverse prompts that are semantically similar to a seed prompt. The authors also investigated lightweight ensemble methods that can combine the answers from different prompts to improve retrieval accuracy for different subject-object pairs. <ref name="”18”">Jiang, Z, Xu, FF, Araki, J and Neubig, G (2020). How Can We Know What Language Models Know? https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00324/96460/How-Can-We-Know-What-Language-Models-Know</ref>


Their paper examined the importance of prompts for retrieving factual knowledge from language models and proposed the use of automated techniques to generate diverse and semantically similar prompts. By combining the different prompts, the research shows that factual knowledge retrieval accuracy can be improved by up to 8% compared to manually designed prompts. The proposed methods outperform the traditional manual prompt design approach and the use of the ensemble approach allows for greater flexibility and improved accuracy for different subject-object pairs. <ref name="”18”"></ref>
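
The idea of combining answers from several prompts can be sketched as a simple average over per-prompt probabilities. This is only one possible ensembling scheme and is not claimed to be the authors' exact method; `FAKE_LM_OUTPUTS`, `query_lm`, and all probability values are hypothetical stand-ins for a real language model.

```python
from collections import defaultdict

# Hypothetical stand-in for a language model: each filled prompt maps to a
# made-up probability distribution over candidate objects.
FAKE_LM_OUTPUTS = {
    "Barack Obama was born in [MASK].":
        {"Hawaii": 0.5, "Chicago": 0.4, "Kenya": 0.1},
    "The birthplace of Barack Obama is [MASK].":
        {"Hawaii": 0.7, "Chicago": 0.2, "Kenya": 0.1},
    "Barack Obama is a native of [MASK].":
        {"Hawaii": 0.3, "Chicago": 0.6, "Kenya": 0.1},
}


def query_lm(prompt):
    """Look up the canned distribution for a prompt (replace with a real model)."""
    return FAKE_LM_OUTPUTS[prompt]


def ensemble_answer(subject, templates):
    """Average the probabilities from every prompt and return the top object."""
    totals = defaultdict(float)
    for template in templates:
        prompt = template.format(subject=subject)
        for obj, prob in query_lm(prompt).items():
            totals[obj] += prob / len(templates)
    best = max(totals, key=totals.get)
    return best, dict(totals)


templates = [
    "{subject} was born in [MASK].",
    "The birthplace of {subject} is [MASK].",
    "{subject} is a native of [MASK].",
]
answer, averaged = ensemble_answer("Barack Obama", templates)
print(answer, averaged)
```

In this toy data the third prompt alone would return the wrong object, while the averaged distribution still ranks the correct one first.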
[[File:List of styles used in one of the experiments in Liu and Chilton (2021). Source- Liu and Chilton (2021)..png|thumb|Figure 10. List of styles used in one of the experiments in Liu and Chilton (2021). Source: Liu and Chilton (2021).]]


===Prompt variables===
Prompt engineering for text-to-image generative models is an emerging area of research. Previous studies have used text-to-image models to generate visual blends of concepts. BERT, a large language model, was used to help users generate prompts, and the generations were evaluated by crowdsourced workers on Mechanical Turk. Similar crowdsourced approaches have been used in the past to evaluate machine-generated images for quality and coherence. <ref name="”3”"></ref>


The guidelines provided suggest:

*Focusing on keywords during prompt engineering rather than rephrasings, as rephrasing does not have a significant impact on the quality of the generation.
*Choosing lower lengths of optimization to speed up the iteration process, as the number of iterations and length of optimization do not significantly impact user satisfaction with the generation.
*Experimenting with a variety of artistic styles to manipulate the aesthetic of the generations, while avoiding style keywords with multiple meanings.
*Choosing subjects and styles (figure 10) that complement each other at an elementary level, either by selecting subjects with forms or subparts that are easily interpreted by certain styles or by selecting highly relevant subjects for a given style.
*Considering the interaction between levels of abstraction for the subject and style, as they can lead to incompatible representations. <ref name="”3”"></ref>
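
Treating subject and style as separate prompt variables makes it easy to enumerate combinations and compare them systematically. The sketch below assumes a template of the form "SUBJECT in the style of STYLE"; the template wording, the function name, and the example subjects and styles are illustrative assumptions.

```python
from itertools import product

# Assumed template wording; adjust it to match the model being tested.
PROMPT_TEMPLATE = "{subject} in the style of {style}"


def build_prompts(subjects, styles):
    """Return one prompt for every subject/style pair."""
    return [
        PROMPT_TEMPLATE.format(subject=subject, style=style)
        for subject, style in product(subjects, styles)
    ]


subjects = ["a lighthouse", "love"]   # one concrete and one abstract subject
styles = ["cubism", "ukiyo-e"]        # hypothetical style keywords
prompts = build_prompts(subjects, styles)
for prompt in prompts:
    print(prompt)
```

Generating the full grid makes it easier to spot subject and style pairs that do not complement each other, as the guidelines caution.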
[[File:Ranking of top-15 most important keywords.png|thumb|Figure 11. Ranking of top-15 most important keywords. Source: Pavlichenko et al. (2022)]]


===Prompt keyword combinations===
Pavlichenko et al. (2022) aimed to improve the aesthetic appeal of computer-generated images by developing a human-in-the-loop approach that involves human feedback to determine the most effective combination of prompt keywords. In combination with this, they used a genetic algorithm that learned the optimal prompt formulation and keyword combination for generating aesthetically pleasing images. <ref name="”8”"></ref>


The study showed that adding prompt keywords can significantly enhance the quality of computer-generated images. However, the most commonly used keywords do not necessarily lead to the best-looking images. To determine the importance of different keywords, the authors trained a random forest regressor on sets of keywords and their metrics. They found that the most important keywords for generating aesthetically pleasing images were different from the most widely used ones (figure 11). The approach presented in this paper can be applied to evaluate an arbitrary prompt template in various settings. <ref name="”8”"></ref>
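
A genetic search over keyword combinations can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the keyword list, the `ASSUMED_APPEAL` values, and the length penalty are hypothetical, and the `fitness` function stands in for the human feedback used in the study.

```python
import random

KEYWORDS = ["highly detailed", "trending on artstation", "4k",
            "cinematic", "oil painting", "sharp focus"]

# Hypothetical per-keyword appeal; in the study this signal came from people.
ASSUMED_APPEAL = {"highly detailed": 0.6, "trending on artstation": 0.1,
                  "4k": 0.2, "cinematic": 0.8, "oil painting": -0.3,
                  "sharp focus": 0.5}


def fitness(mask):
    """Score a keyword subset (1 = keyword used); longer prompts are penalized."""
    chosen = [k for k, bit in zip(KEYWORDS, mask) if bit]
    return sum(ASSUMED_APPEAL[k] for k in chosen) - 0.15 * len(chosen)


def crossover(a, b, rng):
    """Single-point crossover between two parent masks."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(mask, rng, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return tuple(1 - bit if rng.random() < rate else bit for bit in mask)


def evolve(generations=30, population_size=20, seed=0):
    """Keep the better half each generation and refill it with mutated children."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in KEYWORDS)
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best = evolve()
prompt = ", ".join(["a castle on a hill"]
                   + [k for k, bit in zip(KEYWORDS, best) if bit])
print(prompt)
```

Because the better half of each generation is carried over unchanged, the best score found never decreases from one generation to the next.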
[[File:Effect of different image modifiers.png|thumb|Figure 12. Effect of different image modifiers. Source: Witteveen and Andrews (2022).]]
[[File:Repeating words.png|thumb|Figure 13. Repeating words. Source: Witteveen and Andrews (2022).]]
[[File:Light modifiers.png|thumb|Figure 14. Light modifiers. Source: Witteveen and Andrews (2022).]]
[[File:Effect of styled by artist.png|thumb|Figure 15. Effect of styled by artist. Source: Witteveen and Andrews (2022).]]


===Prompt Modifiers===


Witteveen and Andrews (2022) presented an evaluation of the Stable Diffusion model with chosen metrics on over 15,000 image generations, using more than 2,000 prompt variations. The results revealed that different linguistic categories, such as adjectives, nouns, and proper nouns, have varying impacts on the generated images (figure 12). Simple adjectives have a relatively small effect, whereas nouns can dramatically alter the images as they introduce new content beyond mere modifiers. The paper demonstrated that words and phrases can be categorized based on their impact on image generation, and this categorization can be applied to various types of models. While the effects of each word or phrase may vary depending on the model used, the evaluation process described can establish baselines for future model evaluations. <ref name="”7”"></ref>


Creating a prompt to generate an image can be challenging. The authors propose starting with a clear noun-based statement that contains the main subject of the image. They then suggest recording which seeds are effective, looking for artists and key styles to emulate and adding them to the prompt, and experimenting with descriptors such as lighting-effect phrases and repeated words. <ref name="”7”"></ref>


'''Repeating Words.''' A technique to enhance prompts involves repeating words. The researchers examined repeating modifiers from the descriptor class to compare the effects of having the modifier once versus repeating it two, three, and five times. Repetition has been found to remove details from the background, and eventually, with five occurrences of the word, it affects the actual subject of the image (figure 13). However, multiple occurrences of a word may not necessarily have the desired semantic effect that the word is expected to contribute. <ref name="”7”"></ref>
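
The repetition comparison can be reproduced by generating prompt variants with the same modifier repeated one, two, three, and five times. In this sketch the base prompt, the modifier, and the comma-separated joining are assumptions made for illustration.

```python
def repeat_modifier(base_prompt: str, modifier: str, times: int) -> str:
    """Append `modifier` to the prompt `times` times, separated by commas."""
    if times < 1:
        raise ValueError("times must be at least 1")
    return ", ".join([base_prompt] + [modifier] * times)


# Same repetition counts as in the comparison described above.
variants = [repeat_modifier("a portrait of a woman", "beautiful", n)
            for n in (1, 2, 3, 5)]
for variant in variants:
    print(variant)
```

Generating each variant with the same seed keeps the comparison focused on the effect of repetition alone.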


'''Adding "Lighting" Words.''' Words and phrases that describe lighting effects have unique properties. They can act as descriptors, which do not significantly change generated images, or as nouns, which make larger changes in the actual content of the image. Phrases such as "ambient lighting" can change the content significantly, whereas a phrase like "beautiful volumetric lighting" has relatively little impact on the generated image (figure 14). Lighting phrases can alter the look of the subject, the mood of the image, and the background of the image. <ref name="”7”"></ref>


'''Styled by Artist.''' Adding the prompt "in the style of" with an artist's name to the original prompt can lead to changes in image generation on multiple levels, such as the art medium, the color palette, and the racial qualities of the subject (figure 15). <ref name="”7”"></ref>
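
The workflow proposed in this section (a noun-based subject, an optional artist style, an optional lighting phrase, and a record of the seed used) can be sketched as a small data structure. The class name, field names, joining rules, and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptRecord:
    """One prompt experiment: the parts of the prompt and the seed used."""
    subject: str                     # clear noun-based statement
    artist: Optional[str] = None     # appended as "in the style of <artist>"
    lighting: Optional[str] = None   # lighting phrase, e.g. "ambient lighting"
    seed: Optional[int] = None       # recorded so a good result can be reproduced

    def text(self) -> str:
        """Assemble the prompt string from the parts that are set."""
        prompt = self.subject
        if self.artist:
            prompt += f" in the style of {self.artist}"
        if self.lighting:
            prompt += f", {self.lighting}"
        return prompt


record = PromptRecord("a lighthouse on a cliff", artist="Claude Monet",
                      lighting="ambient lighting", seed=1234)
experiment_log = [(record.seed, record.text())]
print(experiment_log)
```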


Finally, Oppenlaender (2022) noted that text-to-image art practitioners use six types of prompt modifiers to create images of specific subjects in different styles and qualities: subject terms, image prompts, style modifiers, quality boosters, repetitions, and magic terms. <ref name="”2”"></ref>


==Overview of Tones==
===Suggested Tones===
*'''[[Authoritative]]''' - confident, knowledgeable
*'''[[Casual]]''' - relaxed, friendly, playful
*'''[[Conversational]]''' - conversational, engaging
*'''[[Empathetic]]''' - understanding, caring
*'''[[Enthusiastic]]''' - enthusiastic, optimistic
*'''[[Expert]]''' - authoritative, respected
*'''[[Friendly]]''' - warm, approachable
*'''[[Funny]]''' - humorous, entertaining
*'''[[Humorous]]''' - entertaining, playful
*'''[[Informal]]''' and '''[[Humorous]]''' - social media posts, blog content, internal communication
*'''[[Informative]]''' and '''[[Authoritative]]''' - thought leadership articles, industry reports
*'''[[Persuasive]]''' and '''[[Urgent]]''' - limited-time offers, promotional campaigns
*'''[[Professional]]''' and '''[[Authoritative]]''' - executive communication, industry presentations, board meetings
*'''[[Professional]]''' and '''[[Friendly]]''' - sales emails, customer service, marketing copy
*'''[[Trustworthy]]''' and '''[[Professional]]''' - business proposals, executive summaries, investor pitches


==Parameters==
===Common Parameters===
*Temperature
*Perplexity
*Burstiness
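
The list above names these parameters without defining them. As an illustration of the first one, temperature in common usage rescales a model's output scores (logits) before they are turned into sampling probabilities: lower values sharpen the distribution and higher values flatten it. The sketch below assumes that general definition; the logit values are made up and the function name is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```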


===User-created Parameters===
====Introduction====
User-created parameters convey the user's intent more concisely. They are not part of the [[model]] API; they are patterns the [[LLM]] has picked up through its [[training]], and they serve as a compact way to express what is usually stated in [[natural language]].
*[[Professionalism]] -
*[[Randomness]] -
*[[Sentimentality]] -
*[[Sesquipedalianism]] -
*[[Sarcasm]] -