Prompt engineering, closely related to in-context learning, is an emerging research area within human-computer interaction (HCI) concerned with the systematic search for prompts that produce desired outcomes from AI models. It comprises techniques that steer the behavior of large language models (LLMs) toward specific goals without modifying the model's weights. Because the field is experimental, the impact of a given prompting strategy can differ significantly across models, requiring extensive trial and error alongside heuristic approaches. The process involves selecting and composing text to achieve a particular result, such as a specific visual style in a text-to-image model or a different tone in the response of a text-to-text one. Unlike methods in the hard sciences, it is an evolving practice grounded in experimentation rather than formal theory. [1][2][3] Prompt engineers serve as translators between "human language" and "AI language," transforming an idea into words that an AI model can act on. [1]
The process of prompt engineering resembles a conversation with the generative system, with practitioners iteratively adapting and refining prompts to improve outcomes. [2] It has emerged as a new form of interaction with models that have learned complex abstractions from large amounts of internet data. These models exhibit meta-learning capabilities and can adapt their abstractions on the fly to fit new tasks, so prompting them with the right knowledge and abstractions is necessary for them to perform well on unfamiliar tasks. The term "prompt engineering" was coined by Gwern Branwen, a writer and technologist who evaluated GPT-3's capabilities on creative fiction and suggested that a new mode of interaction would be figuring out how to prompt the model to elicit specific knowledge and abstractions. [3]
To get the best results from these large and powerful generative models, prompt engineering is a critical skill for users. For example, adding certain keywords and phrases, known as "prompt modifiers," to a textual input prompt can improve the aesthetic quality and subjective attractiveness of generated images. The process is iterative and experimental in nature: practitioners formulate prompts as probes into a generative model's latent space. Various resources and guides are available to help novices write effective input prompts for text-to-image generation systems; however, prompt engineering remains an emerging practice that requires extensive experimentation and trial and error. [1][2][3]
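The use of prompt modifiers can be sketched as simple string composition: a base subject is combined with style and quality keywords before being sent to a text-to-image model. The modifier list and helper name below are illustrative, not part of any particular system.

```python
# Illustrative "prompt modifiers": style/quality keywords commonly
# appended to the subject of a text-to-image prompt.
STYLE_MODIFIERS = ["oil painting", "dramatic lighting", "highly detailed"]

def with_modifiers(subject: str, modifiers: list[str]) -> str:
    """Join a subject with comma-separated modifiers, a common convention."""
    return ", ".join([subject] + modifiers)

print(with_modifiers("a lighthouse at dusk", STYLE_MODIFIERS))
# → a lighthouse at dusk, oil painting, dramatic lighting, highly detailed
```

In practice, practitioners vary which modifiers they append and compare the resulting images, treating each variant as a probe into the model's latent space.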
Manual prompt engineering is laborious, may be infeasible in some situations, and its results may not transfer between model versions. [4] There have therefore been developments in automated prompt generation, which rephrases the input to make it more model-friendly. [5]
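One way automated prompt generation can work is as a search over candidate rewrites, scored by some quality metric. The sketch below is a toy random search, with a placeholder scoring heuristic standing in for a real metric (such as an aesthetic model's score); all names are hypothetical.

```python
import random

def score(prompt: str) -> float:
    # Placeholder for a real quality metric; this toy heuristic simply
    # rewards prompts with more comma-separated modifier phrases.
    return len(prompt.split(","))

def auto_rewrite(base: str, candidate_suffixes: list[str], trials: int = 10) -> str:
    """Random search: sample modifier sets, keep the best-scoring prompt."""
    best = base
    for _ in range(trials):
        suffix = random.sample(candidate_suffixes, k=min(2, len(candidate_suffixes)))
        candidate = ", ".join([base] + suffix)
        if score(candidate) > score(best):
            best = candidate
    return best
```

Real systems replace both the candidate generator and the scoring function with learned components, but the structure, i.e. propose, score, keep the best, is the same.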
This field is therefore important for producing high-quality AI-generated outputs. Text-to-image models in particular face limitations in their text encoders, making prompt design even more crucial for producing aesthetically pleasing images with current models. [4] These models work by matching captions to images and are pre-trained on millions of text-image pairs. While a result will be generated for any prompt, the quality of the output depends heavily on the quality of the prompt. [6]
A prompt template allows a prompt to contain variables: the prompt stays largely the same while being used with different input values.
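A prompt template can be sketched with plain string formatting: the fixed instructions are written once, and only the variable slots change per call. The template text and function name here are illustrative.

```python
# A minimal prompt template: fixed instructions with variable slots.
TEMPLATE = (
    "You are a helpful assistant.\n"
    "Summarize the following {document_type} in {num_sentences} sentences:\n\n"
    "{text}"
)

def render_prompt(document_type: str, num_sentences: int, text: str) -> str:
    """Fill the template's variables to produce a concrete prompt."""
    return TEMPLATE.format(
        document_type=document_type,
        num_sentences=num_sentences,
        text=text,
    )

prompt = render_prompt("news article", 2, "The city council met on Tuesday...")
print(prompt)
```

Libraries such as LangChain provide richer template objects, but the underlying idea is the same substitution of values into a fixed prompt skeleton.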
Products
LangChain - a library for combining language models with other components to build applications.
References
Oppenlaender, J. (2023). "A Taxonomy of Prompt Modifiers for Text-To-Image Generation." *Behaviour & Information Technology*, 43(7), 1-14.
Wang, Z.J., et al. (2022). "PromptChainer: Chaining Large Language Model Prompts through Visual Programming." *CHI EA '22: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems*.
Branwen, G. (2020). "GPT-3 Creative Fiction." gwern.net.
Hao, Y., et al. (2022). "Optimizing Prompts for Text-to-Image Generation." *arXiv preprint*.
Strobelt, H., et al. (2022). "Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models." *IEEE TVCG*.
Pavlichenko, N., et al. (2023). "Best Prompts for Text-to-Image Models and How to Find Them." *Proceedings of the 46th International ACM SIGIR Conference*.
Brown, T., et al. (2020). "Language Models are Few-Shot Learners." *Advances in Neural Information Processing Systems (NeurIPS)*.
Schulhoff, S., et al. (2024). "The Prompt Report: A Systematic Survey of Prompting Techniques." *arXiv preprint* arXiv:2406.06608.
Ouyang, L., et al. (2022). "Training language models to follow instructions with human feedback." *Advances in Neural Information Processing Systems (NeurIPS)*.
Liu, N.F., et al. (2024). "Lost in the Middle: How Language Models Use Long Contexts." *Transactions of the Association for Computational Linguistics*, 12, 157-173.
Kojima, T., et al. (2022). "Large Language Models are Zero-Shot Reasoners." *Advances in Neural Information Processing Systems (NeurIPS)*.
Wei, J., et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." *Advances in Neural Information Processing Systems (NeurIPS)*. arXiv:2201.11903.
Meincke, L., Mollick, E., Mollick, L., & Shapiro, D. (2025). "The Decreasing Value of Chain of Thought in Prompting." Wharton Generative AI Labs.
Yao, S., et al. (2023). "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." *Advances in Neural Information Processing Systems (NeurIPS)*. arXiv:2305.10601.
Wang, X., et al. (2022). "Self-Consistency Improves Chain of Thought Reasoning in Language Models." *arXiv preprint* arXiv:2203.11171.
Yao, S., et al. (2022). "ReAct: Synergizing Reasoning and Acting in Language Models." *arXiv preprint* arXiv:2210.03629.
Lewis, P., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." *Advances in Neural Information Processing Systems (NeurIPS)*.
Zheng, C., et al. (2023). "When 'A Helpful Assistant' Is Not Really Helpful: Personas in System Prompts Do Not Improve Performances of Large Language Models." *arXiv preprint* arXiv:2311.10054.
Suzgun, M. & Kalai, A. (2024). "Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding." *arXiv preprint* arXiv:2401.12954.
Khattab, O., et al. (2024). "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines." *The Twelfth International Conference on Learning Representations (ICLR)*.
OpenAI (2024). "Introducing Structured Outputs in the API." openai.com.
OWASP (2025). "LLM01:2025 Prompt Injection." OWASP Gen AI Security Project.
Hendrycks, D., et al. (2021). "Measuring Massive Multitask Language Understanding." *ICLR 2021*.
PEEM (2025). "Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses." *arXiv preprint* arXiv:2603.10477.