===Automatic prompt engineering===


Hao et al. (2022) note that manually engineering prompts for a specific text-to-image model is laborious and sometimes infeasible, and that manually engineered prompts often do not transfer between model versions. A systematic way to automatically align user intentions with model-preferred prompts is therefore needed. To address this, they propose a prompt adaptation framework that performs automatic prompt engineering via reinforcement learning. The method first uses supervised fine-tuning on a small collection of manually engineered prompts to initialize the prompt policy network. The policy is then trained by exploring optimized variants of user inputs, with the training objective of maximizing a reward defined as a combination of relevance scores and aesthetic scores of the generated images. The goal of the framework is to automatically produce model-preferred prompts that yield better output images while preserving the user's original intentions. <ref name="4">Hao, Y., Chi, Z., Dong, L. and Wei, F. (2022). Optimizing Prompts for Text-to-Image Generation. arXiv:2212.09611.</ref>
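
The reward just described can be sketched in code. The following is a minimal illustration, not the authors' implementation: it scores an optimized prompt by generating an image from it, measuring a CLIP-based relevance term against the ''original'' user prompt (so the rewrite is penalized if it drifts from user intent), and adding an aesthetic term. The model checkpoints, the <code>aesthetic_score</code> stub, and the weighting are hypothetical placeholders.

<syntaxhighlight lang="python">
# Sketch of a relevance-plus-aesthetics reward for prompt optimization.
# Checkpoint names, weighting, and the aesthetic stub are illustrative.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def aesthetic_score(image) -> float:
    """Placeholder for the learned aesthetic predictor used in the paper;
    returns a neutral score so the sketch runs end to end."""
    return 0.0

def reward(user_prompt: str, optimized_prompt: str, weight: float = 0.5) -> float:
    # Generate an image from the candidate optimized prompt.
    image = pipe(optimized_prompt).images[0]
    # Relevance: CLIP similarity between the generated image and the
    # ORIGINAL user prompt, preserving the user's intent.
    inputs = processor(text=[user_prompt], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = clip(**inputs)
    relevance = out.logits_per_image.item() / 100.0  # undo CLIP logit scaling
    return relevance + weight * aesthetic_score(image)
</syntaxhighlight>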


The resulting prompt optimization model, named PROMPTIST (figure 8), is built on a pretrained language model such as GPT and can flexibly align human intentions with model-preferred language. Optimized prompts generate more aesthetically pleasing images (figure 9). Experimental results show that the method outperforms both human prompt engineering and supervised fine-tuning on automatic metrics and in human evaluation. Although the experiments are conducted on text-to-image models, the framework can readily be applied to other tasks. <ref name="4" />
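
Because PROMPTIST is built on a standard causal language model, applying it reduces to ordinary text generation. The sketch below assumes the checkpoint is published as <code>microsoft/Promptist</code> on Hugging Face and that inputs use a <code>Rephrase:</code> separator, as in the public demo; treat both, along with the decoding parameters, as assumptions.

<syntaxhighlight lang="python">
# Sketch: rewriting a user prompt with a released Promptist checkpoint.
# "microsoft/Promptist" and the "Rephrase:" input format are assumptions
# based on the public demo, not details from the paper itself.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # Promptist is GPT-2 based
model = AutoModelForCausalLM.from_pretrained("microsoft/Promptist")

user_prompt = "a cat sitting on a windowsill"
inputs = tokenizer(user_prompt + " Rephrase:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=75,   # room for appended style modifiers
    num_beams=8,         # illustrative decoding settings
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)
# Keep only the newly generated tokens after the input.
optimized = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)
print(optimized)  # the original intent plus model-preferred modifiers
</syntaxhighlight>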