Prompt engineering: Difference between revisions
No edit summary |
|||
Line 13: | Line 13: | ||
[[File:Prompt writing elements.png|thumb|Figure 1. Prompt writing elements. Source: Oppenlaender (2022)]] | [[File:Prompt writing elements.png|thumb|Figure 1. Prompt writing elements. Source: Oppenlaender (2022)]] | ||
A prompt usually includes a subject term, while any other parts of the prompt are optional (figure 1). However, modifiers are often added to improve the resulting images and provide more control over the creation process. These modifiers are applied through experimentation or based on best practices learned from experience or online resources. <ref name="”2”"></ref> Modifiers can either alter the style of the generated image, for example, or boost its quality. There can be overlapping effects between style modifiers and quality boosters. Once a style modifier has been added, solidifiers (using repetition) can be applied to any of the other types of modifiers. The textual prompt can be divided into two main components: the physical and factual content of the image, and the stylistic considerations in the way the physical content is displayed. <ref name="”2”"></ref><ref name="”7”">Witteveen, S and Andrews, M (2022). Investigating Prompt Engineering in Diffusion Models. arXiv:2211.15462v1 https://arxiv.org/pdf/2211.15462.pdf</ref> | A [[prompt]] usually includes a subject term, while any other parts of the prompt are optional (figure 1). However, [[modifiers]] are often added to improve the resulting images and provide more control over the creation process. These modifiers are applied through experimentation or based on best practices learned from experience or online resources. <ref name="”2”"></ref> Modifiers can either alter the style of the generated image, for example, or boost its quality. There can be overlapping effects between style modifiers and quality boosters. Once a style modifier has been added, solidifiers (using repetition) can be applied to any of the other types of modifiers. The textual prompt can be divided into two main components: the physical and factual content of the image, and the stylistic considerations in the way the physical content is displayed. <ref name="”2”"></ref><ref name="”7”">Witteveen, S and Andrews, M (2022). Investigating Prompt Engineering in Diffusion Models. arXiv:2211.15462v1 https://arxiv.org/pdf/2211.15462.pdf</ref> | ||
To enhance the quality of the output images, it is common to include specific keywords before and after the image description following the formula prompt = [keyword1, . . . , keywordm−1] [description] [keywordm, . . . , keywordn]. For example, a user wanting to generate an image of a cat using a text-to-image model may use a specific prompt template that includes a description of a painting of a calico cat and keywords such as highly detailed, cinematic lighting, dramatic atmosphere, and others. This approach helps to provide additional information to the model and improve the generated image's quality. <ref name="”8”">Pavlichenko, N, Zhdanov and Ustalov, D (2022) Best Prompts for Text-to-Image Models and How to Find Them. arXiv:2209.11711v2</ref> | To enhance the quality of the output images, it is common to include specific keywords before and after the image description following the formula prompt = [keyword1, . . . , keywordm−1] [description] [keywordm, . . . , keywordn]. For example, a user wanting to generate an image of a cat using a text-to-image model may use a specific prompt template that includes a description of a painting of a calico cat and keywords such as highly detailed, cinematic lighting, dramatic atmosphere, and others. This approach helps to provide additional information to the model and improve the generated image's quality. <ref name="”8”">Pavlichenko, N, Zhdanov and Ustalov, D (2022) Best Prompts for Text-to-Image Models and How to Find Them. arXiv:2209.11711v2</ref> |
Revision as of 12:00, 2 April 2023
Introduction
Prompt engineering is an emerging research area within Human-Computer Interaction (HCI) that involves the formal search for prompts to produce desired outcomes from AI models. This process involves selecting and composing sentences to achieve a certain result, such as a specific visual style in text-to-image models or a different tone in the response of a text-to-text one. Unlike the hard sciences of STEM fields, this is an evolving technique based on trial and error to produce effective AI outcomes. [1] [2] [3] Prompt engineers serve as translators between "human language" and "AI language," transforming an idea into words that the AI model can comprehend. [1]
The process of prompt engineering is similar to a conversation with the generative system, with practitioners adapting and refining prompts to improve outcomes. [2] It has emerged as a new form of interaction with models that have learned complex abstractions from consuming large amounts of data from the internet. These models have metalearning capabilities and can adapt their abstractions on the fly to fit new tasks, making it necessary to prompt them with specific knowledge and abstractions to perform well on new tasks. The term "prompt engineering" was coined by Gwern (writer and technologist), who evaluated GPT3's capabilities on creative fiction and suggested that a new course of interaction would be to figure out how to prompt the model to elicit specific knowledge and abstractions. [3]
In order to get the best results from these large and powerful generative models, prompt engineering is a critical skill that users must possess. Adding certain keywords and phrases to the textual input prompts known as "prompt modifiers" can improve the aesthetic qualities and subjective attractiveness of the generated images, for example. The process of prompt engineering is iterative and experimental in nature, where practitioners formulate prompts as probes into the generative models' latent space. There are various resources and guides available to novices to help them write effective input prompts for text-to-image generation systems, however, prompt engineering is still an emerging practice that requires extensive experimentation and trial and error. [1][2][3]
Manual prompt engineering is laborious, it may be infeasible in some situations, and the prompt results may vary between various model versions. [4] However, there have been developments in automated prompt generation which rephrases the input, making it more model-friendly. [5]
Therefore, this field is important for the generation of high-quality AI-generated outputs. Text-to-image models, in particular, face limitations in their text encoders, making prompt design even more crucial to produce aesthetically pleasing images with current models. [4] These models work based on caption matching techniques and are pre-trained using millions of text-image datasets. While a result will be generated for any prompt, the quality of the artwork is directly proportional to the quality of the prompt. [6]
Basic prompt structure
A prompt usually includes a subject term, while any other parts of the prompt are optional (figure 1). However, modifiers are often added to improve the resulting images and provide more control over the creation process. These modifiers are applied through experimentation or based on best practices learned from experience or online resources. [2] Modifiers can either alter the style of the generated image, for example, or boost its quality. There can be overlapping effects between style modifiers and quality boosters. Once a style modifier has been added, solidifiers (using repetition) can be applied to any of the other types of modifiers. The textual prompt can be divided into two main components: the physical and factual content of the image, and the stylistic considerations in the way the physical content is displayed. [2][7]
To enhance the quality of the output images, it is common to include specific keywords before and after the image description following the formula prompt = [keyword1, . . . , keywordm−1] [description] [keywordm, . . . , keywordn]. For example, a user wanting to generate an image of a cat using a text-to-image model may use a specific prompt template that includes a description of a painting of a calico cat and keywords such as highly detailed, cinematic lighting, dramatic atmosphere, and others. This approach helps to provide additional information to the model and improve the generated image's quality. [8]
According to Oppenlaender (2022), there are several opportunities for future research on this field of study:
- Prompt engineering in Human-Computer Interaction (HCI): a research area that is gaining interest due to the increasing use of deep generative models by people without technical expertise. Social aspects of prompt engineering are important since text-to-image systems were trained on images and text scraped from the web. Prompt engineers need to predict how others described and reacted to the images posted on the web, making describing an image in detail often not enough. There are also dedicated communities that have recently emerged, adding another social aspect to prompt engineering.
- Human-AI co-creation: Prompt writing is the central part of prompt engineering, but it is only a starting point in some practitioners' creative workflows. Novel creative practices are emerging, where practitioners develop complex workflows for creating their artworks.
- Bias: an interesting area for future work is bias encoded in text-to-image generation systems.
- Computational aesthetics and Human-AI alignment: Making computers evaluate and understand aesthetics is an old goal that has recently received renewed attention. Computational aesthetics and Human-AI alignment are areas of research that are being explored through neural image assessment and computational aesthetics. [1]
Text-to-text models
Prompt engineering is not limited to text-to-image generation and has found a fitting application in AI-generated art. Various templates and "recipes" have been created to optimize the process of providing the most effective textual inputs to the model. OpenAI has published such "recipes" for their language model that can be adapted to different downstream tasks, including grammar correction, text summarization, answering questions, generating product names, and functioning as a chatbot. [2]
In language models like GPT, the output quality is influenced by a combination of prompt design, sample data, and temperature (a parameter that controls the “creativity” of the responses). Furthermore, to properly design a prompt the user has to have a good understanding of the problem, good grammar skill, and produce many iterations. [9]
Therefore, to create a good prompt it’s necessary to be attentive to the following elements:
- The problem: the user needs to know clearly what he wants the generative model to do and its context. [9][10] For example, the AI can change the writing style of the output ("write a professional but friendly email" or "write a formal executive summary."). [10] Since the AI understands natural language, the user can think of the generative model as a human assistant. Therefore, thinking “how would I describe the problem to my assistant who haven’t done this task before?” may provide some help in defining clearly the problem and context. [9]
- Grammar check: simple and clear terms. Avoid subtle meaning and complex sentences with predicates. Write short sentences with specifics at the end of the prompt. Different conversation styles can be achieved with the use of adjectives. [9]
- Sample data: the AI may need information to perform the task that is being asked of it. This can be a text for paraphrasing or a copy of a resume or LinkedIn profile, for example. [10] It’s important that the data provided is coherent with the prompt. [9]
- Temperature: a parameter that influences how “creative” the response will be. For creative work, temperature should be high (e.g. .9) while for strict factual responses a temperature of zero is better. [9]
- Test and iterate: test different combinations of the elements of the prompt. [9]
Besides this, a prompt can also have other elements such as the desired length of the response, the output format (GPT-3 can output various code languages, charts, and CSVs), and specific phrases that users have discovered that work well to achieve specific outcomes (e.g. “Let's think step by step,” “thinking backwards,” or “in the style of [famous person]”). [10]
Prompt Engineering or ChatGPT should be avoided in certain scenarios. Firstly, when 100% reliability is required. Secondly, when the accuracy of the model's output cannot be evaluated. Finally, when generating content that is not in the model's training data, these techniques may not be the best approach to use. [11]
Building Prompts
In a text-to-text model, the user can insert diferent parameters in the prompt to modulate its response. The following parameter and prompt examples are taken from Matt Night's GitHub:
- Capacity and Role: "Act as an expert on software development on the topic of machine learning frameworks, and an expert blog writer."
- Insight: "The audience for this blog is technical professionals who are interested in learning about the latest advancements in machine learning."
- Statement: "Provide a comprehensive overview of the most popular machine learning frameworks, including their strengths and weaknesses. Include real-life examples and case studies to illustrate how these frameworks have been successfully used in various industries."
- Personality: "When responding, use a mix of the writing styles of Andrej Karpathy, Francois Chollet, Jeremy Howard, and Yann LeCun."
- Experiment: "Give me multiple different examples." [11]
The process of prompt refinement is a method to improve the quality of written content by transforming it into a compelling, imaginative, and relatable piece, fixing "soulless writing". The aim is to make the content engaging and impactful by focusing on storytelling, using persuasive language, emphasizing emotion and sensory details, making the content concise and highlighting key points. To create a sense of urgency and make the content relatable, the language can be personalized to the reader and potential objections can be addressed. [11]
To increase the readability, there are several strategies that can be employed in the prompt. First, it's important to request answers with a clear and concise language. Additionally, visual aids such as diagrams can be requested (e.g "Using mermaid.js you can include diagrams to illustrate complex concepts (low reliability)") Asking the AI to use of headings and subheadings is also recommended to divide the document into sections with clear organization. Important information should be emphasized using bold or italic text, and real-life examples such as case studies or real-world examples can be included to make concepts more relatable. Consistent formatting, including a consistent font, font size, and layout, should be used throughout the document. Analogies or comparisons can be employed to explain complex ideas, and asking for writing in active voice can make sentences more engaging and easier to follow. [11]
The model can also be asked to act as a technical advisor, mentor, quality assurance, code reviewer, debugging assistant, compliance checker, code optimization specialist, accessibility expert, search engine optimization specialist, and performance analyst. Examples of prompts for the use cases are available here.
Prompt Engineering for Code Generation Models
Genearte code using models like the OpenAI Codex.
- Describe the task - tell the coding model what you want it to do at a high level.
- Describe the context - describe background information like API hints and database schema to help the model understand the task.
- Show examples - show the model examples of what you want.
Task
Give the coding model a high-level task description. To improve the quality of the generated code, it's recommended to start the prompt with a broad description of the task at hand. For example, if you want to generate Python code to plot data from a standard dataset, you can provide a prompt like this:
# Load iris data from scikit-learn datasets and plot the training data.
However, sometimes the generated code may not be optimal, in which case you can provide more specific instructions such as importing libraries before using them. By combining a high-level task description with detailed user instructions, you can create a more effective prompt for coding model to generate code.
Examples
Gives the coding model examples. Imagine you prefer a unique style of writing Python code that differs from what model produces. Take, for instance, when adding two numbers, you prefer to label the arguments differently. The key to working with models like Codex is to clearly communicate what you want it to do. One effective way to do this is to provide examples for Codex to learn from and strive to match its output to your preferred style. If you give the model a longer prompt that includes the example mentioned, it will then name the arguments in the same manner as in the example.
See also zero shot, one shot and few shot learning
Context
If you want to use a library that the coding model is not familiar with, you can guide it by describing the API library beforehand.
For instance, the Minecraft Codex sample uses the Simulated Player API in TypeScript to control a character in the game. Since this is a newer API that the model does not know about yet, When given the prompt, the model attempts to make an educated guess based on the terms "bot" and "Simulated Player". However, the resulting code is not correct.
To correct this, you can show the model model the API definition, including function signatures and examples, so that it can generate code that follows the API correctly. As demonstrated in the example, by providing high-level context in the form of the API definition and examples, the model can understand what you want it to do and generate more accurate code.
How to Create Descriptive, Poetic Text
Tips
- Choose a topic and narrow down the scope.
- Select a point-of-view like third, second or first person.
- Directly or indirectly convey a mood. A subject or scene could evoke a particular feeling or you could give the chatbot a mood directly.
- Describe sensory details. Add details about the scene such as sounds, sights, smells, or textures. By pointing out an important detail, you can guide the output.
- Don't tell, Show. Ask the chatbot not to tell the user how to think or feel.
- Use figurative language. The chatbot should be encouraged to use metaphors, similes and descriptive phrases. Request a description that is evocative, lyrical, beautiful or poetic.
- Iterate and iterate. Your first prompt might not yield the desired result. Rework the prompt until you find an appealing answer. After you have created a prompt that is appealing, the chatbot can create many descriptions and you can pick the one you like.
- Edit and revise. Don't be afraid of revising and editing the generated text.
- You can ask the chatbot for assistance. The chatbot will explain why it selected a specific detail or phrase in a reply. The chatbot can also help you create a better prompt. You can point out individual phrases and ask the chatbot for alternatives or suggestions.
Template
Describe YOUR SCENE. Use sensory language and detail to describe the OBJECTS IN THE SCENE vividly. Describe SPECIFIC DETAILS and any other sensory details that come to mind. Vary the sentence structure and use figurative language as appropriate. Avoid telling the reader how to feel or think about the scene.
Text-to-Image
Text prompts can be used to generate images using a text-to-image model, where words are used to describe an image and the model creates it accordingly. Emojis or single lines of text can also be used as prompts to get optimal results. However, the subject term is important to control the generation of digital images. [1][12] In the online community for AI-generated art, templates for writing input prompts have emerged, such as the "Traveler's Guide to the Latent Space," which recommends specific prompt templates such as [Medium][Subject][Artist(s)][Details][Image repository support]. [2]
Prompt anatomy
When designing a prompt for text-to-image generative models, it is important to keep in mind that the tips provided may not apply to all models. The prompt length should be kept short, with prompts for DALL·E2 having a recommended maximum of 400 characters and prompts for Midjourney staying under 60 words. English is the best language to use statistically, but Stable Diffusion can handle other languages and even emojis. However, using non-English prompts may result in failures. [13]
The prompt should contain a noun, adjective, and verb to create an interesting subject. A prompt with more than three words should be written to give the AI a clear context. Multiple adjectives should be used to infuse multiple feelings into the artwork. It is also recommended to include the name of the artist, which will mimic the style of that artist. Additionally, banned words by the AI generator should be avoided to prevent being banned. [12] The use of abstract words leads to more diverse results, while concrete words lead to all pictures showing the same concrete thing. For tokenization (the separation of a text into smaller units—tokens), commas, pipes, or double colons can be used as hard separators, but the direct impact of tokenization is not always clear. [13]
- Nouns: denotes the subject in a prompt. The generator will produce an image without a noun although not meaningfull. [6]
- Adjectives: can be used to try to convey an emotion or be used more technically (e.g. beautiful, magnificent, colorful, massive). [6]
- Artist names: the artstyle of the chosen artist will be included in the image generation. There is also an unbundling technique (figure 3a and 3b) that proposes a “long description of a particular style of the artist’s various characteristics and components instead of just giving the artist names.” [6]
- Style: instead of using the style of artists, the prompt can include keywords related to certain styles like “surrealism,” “fantasy,” “contemporary,” “pixel art”, etc. [6]
- Computer graphics: keywords like “octane render,” “Unreal Engine,” or “Ray Tracing” can enhance the effectiveness and meaning of the artwork. [6]
- Quality: quality of the generated image (e.g. high, 4K, 8K). [6]
- Art platform names: these keywords are another way to include styles. For example, “trending on Behance, “Weta Digital”, or “trending on artstation.” [6]
- Art medium: there is a multitude of art mediums that can be chosen to modify the AI-generated image like “pencil art,” “chalk art,” “ink art,” “watercolor,” “wood,” and others. [6]
- Weight: To give a specific subject a higher weight in a prompt, there are several techniques available. Tokens near the beginning of a prompt carry more weight than those at the end. Repeating the subject by using different phrasing or multiple languages, or even using emojis, can also increase its weighting. In some generative models like Midjourney, you can use parameters such as ::weight to assign a weight to specific parts of a prompt. [13]
In-depth lists with modifier prompts can be found here and here.
Midjourney
In Midjourney, a very descriptive text will result in a more vibrant and unique output. [14] Prompt engineering for this AI image generator follows the same basic elements as all others (figure 4) but some keywords and options will be provided here that are known to work well with this system.
- Style: standard, pixar movie style, anime style, cyber punk style, steam punk style, waterhouse style, bloodborne style, grunge style (figure 5). An artist’s name can also be used.
- Rendering/lighting properties: volumetric lighting, octane render, softbox lighting, fairy lights, long exposure, cinematic lighting, glowing lights, and blue lighting (figure 6).
- Style setting: adding the command –s <number> after the prompt will increase or decrease the stylize option (e.g. /imagine firefighters --s 6000).
- Chaos: a setting to increase abstraction (figure 7) using the command /imagine prompt --chaos <a number from 0 to 100> (e.g. /imagine Eiffel tower --chaos 60).
- Resolution: the resolution can be inserted in the prompt or using the standard commands --hd and --quality or --q <number>.
- Aspect ratio: the default aspect ratio is 1:1. This can be modified with the comman --ar <number: number> (e.g. /imagine jasmine in the wild flower --ar 4:3). A custom size image can also be specified using the command --w <number> --h <number> after the prompt.
- Images as prompts: Midjourney allows the user to use images to get outputs similar to the one used. This can be done by inserting a URL of the image in the prompt (e.g. /imagine http://www.imgur.com/Im3424.jpg box full of chocolates). Multiple images can be used.
- Weight: increases or decreases the influence of a specific prompt keyword or image on the output. For text prompts, the command ::<number> should be used after the keywords according to their intended impact on the final image (e.g. /imagine wild animals tiger::2 zebra::4 lions::1.5).
- Filter: to discard unwanted elements from appearing in the output use the --no <keyword> command (e.g./imagine KFC fried chicken --no sauce). [14]
DALL-E
For DALL-E, a tip is to write adjectives plus nouns instead of verbs or complex scenes. To this, the user can add keywords like “gorgeous,” “amazing,” and “beautiful,” plus “digital painting,” “oil painting”, etc., and “unreal engine,” or “unity engine.” [15] Other templates can be used that work well with this model:
- A photograph of X, 4k, detailed.
- Pixar style 3D render of X.
- Subdivision control mesh of X.
- Low-poly render of X; high resolution, 4k.
- A digital illustration of X, 4k, detailed, trending in artstation, fantasy vivid colors.
Other user experiments can be accessed here. [15]
Stable Diffusion
Overall, prompt engineering in Stable Diffusion doesn’t differ from other AI image-generating models. However, it should be noted that it also allows prompt weighting and negative prompting. [16]
- Prompt weighting: varies between 1 and -1. Decimals can be used to reduce a prompt’s influence.
- Negative prompting: in DreamStudo negative prompts can be added by using | <negative prompt>: -1.0 (e.g. | disfigured, ugly:-1.0, too many fingers:-1.0). [16]
Jasper Art
Jasper Art is similar to DALL-E 2 but results are different since Jasper gives priority to Natural Language Processing (NLP), being able to handle complex sentences with semantic articulation. [17]
There has been some experimentation with narrative prompts, an alternative to the combinations of keywords in a prompt, using instead more expressive descriptions. [17] For example, instead of using “tiny lion cub, 8k, kawaii, adorable eyes, pixar style, winter snowflakes, wind, dramatic lighting, pose, full body, adventure, fantasy, renderman, concept art, octane render, artgerm,” convert it to a sentence as if painting with words like, “Lion cub, small but mighty, with eyes that seem to pierce your soul. In a winter wonderland, he stands tall against the snow, wind ruffling his fur. He seems almost like a creature of legend, ready for an adventure. The lighting is dramatic and striking, and the render is breathtakingly beautiful.” [17]
Research on Prompt engineering
Automatic prompt engineering
Hao et al. (2022) mention that the implementation of manual prompt engineering towards specific text-to-image models can be laborious and sometimes infeasible. The process of manually engineering prompts is often not transferrable between various model versions. Therefore, a systematic way to automatically align user intentions and various model-preferred prompts is necessary. To address this, a prompt adaptation framework for automatic prompt engineering via reinforcement learning was proposed. The method uses supervised fine-tuning on a small collection of manually engineered prompts to initialize the prompt policy network for reinforcement learning. The model is trained by exploring optimized prompts of user inputs, where the training objective is to maximize the reward, which is defined as a combination of relevance scores and aesthetic scores of generated images. The goal of the framework is to automatically perform prompt engineering that generates model-preferred prompts to obtain better output images while preserving the original intentions of the user. [4]
The resulting prompt optimization model, named PROMPTIST (figure 8), is built upon a pretrained language model, such as GPT, and is flexible to align human intentions and model-favored languages. Optimized prompts can generate more aesthetically pleasing images (figure 9). Experimental results show that the proposed method outperforms human prompt engineering and supervised fine-tuning in terms of automatic metrics and human evaluation. Although experiments are conducted on text-to-image models, the framework can be easily applied to other tasks. [4]
Jian et al. (2020) proposed two automatic methods to improve the quality and scope of prompts used for querying language models about the existence of a relation. The methods are inspired by previous relation extraction techniques and use either mining-based or paraphrasing-based approaches to generate diverse prompts that are semantically similar to a seed prompt. The authors also investigated lightweight ensemble methods that can combine the answers from different prompts to improve retrieval accuracy for different subject-object pairs. [18]
Their paper examined the importance of prompts for retrieving factual knowledge from language models and proposed the use of automated techniques to generate diverse and semantically similar prompts. By combining the different prompts, the research shows that factual knowledge retrieval accuracy can be improved by up to 8% compared to manually designed prompts. The proposed methods outperform the traditional manual prompt design approach and the use of the ensemble approach allows for greater flexibility and improved accuracy for different subject-object pairs. [18]
Prompt variables
Liu and Chilton (2021) explored the challenges associated with generating coherent outputs using text-to-image generative models. The free-form nature of text interaction can lead to poor result quality, necessitating brute-force trial and error. The research systematically investigated various variables involved in prompt engineering for text-to-image generative models. It examined different parameters, such as prompt keywords, random seeds, and the length of iterations. It also explores the use of subject and style as dimensions in structuring prompts. Furthermore, they analyzed how the abstract nature of a subject or style can impact generation quality. The results of the study are presented as design guidelines to help users prompt text-to-image models for better outcomes. [3]
Prompt engineering for text-to-image generative models is an emerging area of research. Previous studies have used text-to-image models to generate visual blends of concepts. BERT, a large language model, was utilized to help users generate prompts, and generations were evaluated using crowd-source workers on Mechanical Turk. Similar crowd-sourced approaches have been used in the past to evaluate machine-generated images for quality and coherence. [3]
The guidelines provided suggest:
- Focusing on keywords during prompt engineering rather than rephrasings, as rephrasing does not have a significant impact on the quality of the generation.
- Trying multiple generations to collect a representative idea of what prompts may return.
- To speed up the iteration process, the user should choose lower lengths of optimization, as the number of iterations and length of optimization do not significantly impact user satisfaction with the generation.
- Users can experiment with a variety of artistic styles to manipulate the aesthetic of their generations, but should avoid style keywords with multiple meanings.
- Choosing subjects and styles (figure 10) that complement each other at an elementary level, either by selecting subjects with forms or subparts that are easily interpreted by certain styles or by selecting highly relevant subjects for a given style.
- Considering the interaction between levels of abstraction for the subject and style, as they can lead to incompatible representations. [3]
Prompt keyword combinations
Pavlichenko et al. (2022) aimed to improve the aesthetic appeal of computer-generated images by developing a human-in-the-loop approach that involves human feedback to determine the most effective combination of prompt keywords. In combination with this, they used a genetic algorithm that learned the optimal prompt formulation and keyword combination for generating aesthetically pleasing images. [8]
The study showed that adding prompt keywords can significantly enhance the quality of computer-generated images. However, the most commonly used keywords do not necessarily lead to the best-looking images. To determine the importance of different keywords, the authors trained a random forest regressor on sets of keywords and their metrics. They found that the most important keywords for generating aesthetically pleasing images were different from the most widely used ones (figure 11). The approach presented in this paper can be applied to evaluate an arbitrary prompt template in various settings. [8]
Prompt Modifiers
Witteveen and Andrews (2022) presented an evaluation of the Stable Diffusion model with chosen metrics on over 15,000 image generations, using more than 2,000 prompt variations. The results revealed that different linguistic categories, such as adjectives, nouns, and proper nouns, have varying impacts on the generated images (figure 12). Simple adjectives have a relatively small effect, whereas nouns can dramatically alter the images as they introduce new content beyond mere modifiers. The paper demonstrated that words and phrases can be categorized based on their impact on image generation, and this categorization can be applied to various types of models. While the effects of each word or phrase may vary depending on the model used, the evaluation process described can establish baselines for future model evaluations. [7]
Creating a prompt to generate an image can be challenging. The authors propose starting with a clear noun-based statement that contains the main subject of the image. Then, to record what seeds are effective, look for artists and key styles to emulate and add that to the prompt, and experiment with descriptors such as adding lighting effects phrases and repeating words. [7]
Repeating Words. A technique to enhance prompts involves repeating words. The researchers examined repeating modifiers from the descriptor class to compare the effects of having the modifier once versus repeating it two, three, and five times. Repetition has been found to remove details from the background, and eventually, with five occurrences of the word, it affects the actual subject of the image (figure 13). However, multiple occurrences of a word may not necessarily have the desired semantic effect that the word is expected to contribute. [7]
Adding "Lighting" Words. Words and phrases that describe lighting effects have unique properties. They can act as descriptors, which do not significantly change generated images, or as nouns, which make larger changes in the actual content of the image. Phrases such as "ambient lighting" can change the content significantly, whereas a phrase like "beautiful volumetric lighting" has relatively little impact on the generated image (figure 14). Lighting phrases can alter the look of the subject, the mood of the image, and the background of the image. [7]
Styled by Artist. Adding the prompt "in the style of" with an artist's name to the original prompt can lead to changes in image generation on multiple levels, such as the art medium, the color palette, and the racial qualities of the subject (figure 15). [7]
Finally, Oppenlaender (2022) noted that text-to-image art practitioners uses six different types of prompt modifiers to create images of specific subjects in different styles and qualities. These six types of prompt modifiers are subject terms, image prompts, style modifiers, quality boosters, repetitions, and magic terms. [2]
Subject terms are used to indicate the desired subject for image generation, while image prompts are used to provide visual targets for the synthesis of images. Style modifiers can be added to a prompt to generate images in a certain style, while quality boosters can be used to increase the level of detail and aesthetic qualities of the image. Repetitions in the prompt can potentially strengthen the associations formed by the generative system, while magic terms (e.g. “control the soul”) introduce randomness and unpredictability to the generated images. [2]
Practitioners can assign weights to these six types of prompt modifiers, which can be negative to exclude subjects. It is important to note that the taxonomy of these prompt modifiers reflects the community's understanding and categorization of modifiers. Although text-to-image systems are trained to produce images of subjects in a particular style and quality, prompt modifiers enable the generation of unique and creative images that reflect the desired style and subject matter. [2]
Overview of Tones
Suggested Tones
- Authoritative - confident, knowledgeable,
- Casual - relaxed, friendly, playful
- Conversational - conversational, engaging,
- Empathetic - understanding, caring
- Enthusiastic - enthusiastic, optimistic
- Expert - authoritative, respected
- Friendly - warm, approachable
- Funny - humorous, entertaining
- Humorous - entertaining, playful,
- Informal - relaxed, conversational
- Informative - informative, helpful
- Persuasive - persuasive, convincing
- Positive - positive, upbeat
- Professional - knowledgeable, competent, polished, professional, refined
- Serious - serious, sincere
- Straightforward - direct, concise
- Trustworthy - reliable, dependable
- Urgent - sense of urgency, importance
Tone Combinations and Use Cases
- Conversational and Casual - social media posts, blog content, internal communication
- Empathetic and Serious - crisis communication, customer service, sensitive topics
- Enthusiastic and Positive - sales pitches, customer service, motivational content
- Expert and Authoritative - thought leadership articles, industry reports, export opinions
- Funny and Casual - social media posts, branding, content marketing
- Informal and Humorous - social media posts, blog content, internal communication
- Informative and Authoritative - thought leadership articles, industry reports
- Persuasive and Urgent - limited-time offers, promotional campaigns
- Professional and Authoritative - executive communication, industry presentation, boarding meeting
- Professional and Friendly - sales emails, customer service, marketing copy
- Straightforward and Professional - business emails, formal communication, legal documents
- Trustworthy and Professional - business proposals, executive summaries, investor pitches
Parameters
Common Parameters
- Temperature
- Perplexity
- Burstiness
User-created Parameters
Introduction
These are user-created parameters. They serve to convey the intent of the users in a more concise way. These are not part of the model API but patterns the LLM has picked up through its training. These parameters are just a compact way to deliver what is usually expressed in natural language.
Example in ChatGPT
Prompt: Write a paragraph about how adorable a puppy is.
Temperature: 1.0
Sarcasm: 0.9
Vividness: 0.4
We add "Prompt: " to the start of our prompt to make sure ChatGPT knows where our prompt is. We add the GPT parameter temperature, which goes from 0 to 1 to indicate the following parameters also range from 0 to 1. Then we list our parameters along with their values which go from 0 to 1 (0 is the smallest, and 1 is the largest). Note that having too many or contradictory parameters may lower the quality of the response.
List of Parameters
- Professionalism -
- Randomness -
- Sentimentality -
- Sesquipedalianism -
- Sarcasm -
- Laconic -
- Asyndetic -
- Vividness -
- Ecclesiastical -
References
- ↑ 1.0 1.1 1.2 1.3 1.4 Bouchard, L (2022). Prompting eExplained: How to Talk to ChatGPT. Louis Bouchard. https://www.louisbouchard.ai/prompting-explained/
- ↑ 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 Oppenlaender, J (2022). A Taxonomy of Prompt Modifiers for Text-To-Image Generation. arXiv:2204.13988v2
- ↑ 3.0 3.1 3.2 3.3 3.4 3.5 Liu, V and Chilton, LB (2021). Design Guidelines for Prompt Engineering Text-to-Image Generative Models. arXiv:2109.06977v2
- ↑ 4.0 4.1 4.2 4.3 Hao, Y, Chi, Z, Dong, L and Wei, F (2022). Optimizing Prompts for Text-to-Image Generation. arXiv:2212.09611v1
- ↑ Ana, B (2022). Design your AI Art Generator Prompt Using ChatGPT. Towards AI. https://pub.towardsai.net/design-your-ai-art-generator-prompt-using-chatgpt-7a3dfddf6f76
- ↑ 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 Raj, G (2022). How to Write Good Prompts for AI Art Generators: Prompt Engineering Made Easy. Decentralized Creator. https://decentralizedcreator.com/write-good-prompts-for-ai-art-generators/
- ↑ 7.0 7.1 7.2 7.3 7.4 7.5 Witteveen, S and Andrews, M (2022). Investigating Prompt Engineering in Diffusion Models. arXiv:2211.15462v1 https://arxiv.org/pdf/2211.15462.pdf
- ↑ 8.0 8.1 8.2 Pavlichenko, N, Zhdanov and Ustalov, D (2022) Best Prompts for Text-to-Image Models and How to Find Them. arXiv:2209.11711v2
- ↑ 9.0 9.1 9.2 9.3 9.4 9.5 9.6 Shynkarenka, V (2020). Hacking Hacker News frontpage with GPT-3. Vasili Shunkarenka. https://vasilishynkarenka.com/gpt-3/
- ↑ 10.0 10.1 10.2 10.3 Robinson, R (2023). How to Write an Effective GPT-3 or GPT-4 Prompt- Zapier. https://zapier.com/blog/gpt-prompt/
- ↑ 11.0 11.1 11.2 11.3 Matt Nigh. ChatGPT3 Prompt Engineering. GitHub. https://github.com/mattnigh/ChatGPT3-Free-Prompt-List
- ↑ 12.0 12.1 Zerkova, A (2022). How to Create Effective Prompts for AI Image Generation. Re-thought. https://re-thought.com/how-to-create-effective-prompts-for-ai-image-generation/
- ↑ 13.0 13.1 13.2 Monigatti, L (2022). A Beginner’s Guide to Prompt Design for Text-to-Image Generative Models. Towards Data Science. https://towardsdatascience.com/a-beginners-guide-to-prompt-design-for-text-to-image-generative-models-8242e1361580
- ↑ 14.0 14.1 Nielsen, L (2022). An advanced guide to writing prompts for Midjourney ( text-to-image). Mlearning. https://medium.com/mlearning-ai/an-advanced-guide-to-writing-prompts-for-midjourney-text-to-image-aa12a1e33b6
- ↑ 15.0 15.1 Strikingloo (2022). Text to image art: Experiments and prompt guide for DALL-E Mini and other AI art models. Strikingloo. https://strikingloo.github.io/art-prompts
- ↑ 16.0 16.1 DreamStudio. Prompt guide. DreamStudio. https://beta.dreamstudio.ai/prompt-guide
- ↑ 17.0 17.1 The Jasper Whisperer (2022). Improve your AI text-to-image prompts with enhanced NLP. Bootcamp. https://bootcamp.uxdesign.cc/improve-your-ai-text-to-image-prompts-with-enhanced-nlp-fc804964747f
- ↑ 18.0 18.1 Jiang, Z, Xu, FF, Araki, J and Neubig, G (2020). How Can We Know What Language Models Know? https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00324/96460/How-Can-We-Know-What-Language-Models-Know