Natural Language Processing
Updated Mar 25, 2026

Terms
See also: Natural Language Processing terms
attention, bag of words, BERT (Bidirectional Encoder Representations from Transformers), bigram, bidirectional, bidirectional language model, BLEU (Bilingual Evaluation Understudy), causal language model, crash blossom, decoder, denoising, embedding layer, embedding space, embedding vector, encoder, GPT (Generative Pre-trained Transformer), LaMDA (Language Model for Dialogue Applications), language model, large language model, masked language model, meta-learning, modality, model parallelism, multi-head self-attention, multimodal model, natural language understanding, N-gram, NLU, pipelining, self-attention (also called self-attention layer), sentiment analysis, sequence-to-sequence task, sparse feature, sparse representation, staged training, token, Transformer, trigram, unidirectional, unidirectional language model, word embedding

Models
See also: Natural Language Processing Models

Related Articles
Tokenization (Large_language_models, Machine_learning, Natural_Language_Processing)
Retrieval-augmented generation (Large_language_models, Machine_learning, Natural_Language_Processing)
Temperature (artificial intelligence) (Large_language_models, Machine_learning, Natural_Language_Processing)
Top-p sampling (Large_language_models, Machine_learning, Natural_Language_Processing)
Semantic search (Machine_learning, Natural_Language_Processing)
Context engineering (Artificial_Intelligence, Machine_learning, Natural_Language_Processing)