
An explanation of masked language models

In this video, TechTarget editor Jen English talks about masked language models.

Masked language models take generative AI to the next level.

Masked language models, or MLMs, have emerged as a breakthrough in natural language processing, revolutionizing how machines understand and generate human language.

Masked language modeling is a training technique used to pretrain transformer-based language models. By predicting missing words within a given context, these models learn to grasp the intricate nuances of language.

Hugging Face is well known for providing access to a wide range of pretrained models, including masked language models such as BERT.

At the core of masked language models lies the concept of masking tokens. During training, certain words -- AKA tokens -- are masked intentionally and the model is tasked with predicting the correct word based on its surrounding context. This enables the model to learn word relationships, semantics and grammatical structures. For example, in the sentence, "The cat [blank] the tree," the model might predict the word "climbed" as the masked token.
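
The masking step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular library implements it: the function name `mask_tokens`, the whitespace tokenization and the `seed` parameter are hypothetical, and the "[MASK]" symbol and the 15% default masking rate are borrowed from BERT rather than from this video. BERT's actual procedure is also slightly more involved than replacing every selected token with the mask symbol.

```python
import random

MASK_TOKEN = "[MASK]"  # BERT's mask symbol; the sentence above writes it as "[blank]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Randomly hide tokens and record what the model should predict.

    Returns (masked_tokens, labels). labels[i] holds the original token
    where a mask was applied and None everywhere else, so a training loss
    would only be computed at the masked positions.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)  # hide the token from the model
            labels.append(tok)         # keep it as the prediction target
        else:
            masked.append(tok)
            labels.append(None)        # nothing to predict here
    return masked, labels


# Naive whitespace tokenization, for illustration only.
tokens = "The cat climbed the tree".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
print(masked)
print(labels)
```

During training, the model sees only `masked` and is scored on how well it recovers the non-empty entries of `labels`.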

Traditional, or causal, language models, such as GPT-2, GPT-3 and GPT-Neo, are unidirectional: They predict the next token in a sequence and attend only to the tokens that come before it. MLMs, by contrast, are bidirectional and can attend to the tokens on both the left and right sides of a masked token when making predictions.
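
The difference between the two approaches comes down to which positions a model is allowed to look at. The sketch below represents that as a simple grid of True and False values. The helper names (`causal_mask`, `bidirectional_mask`, `visible_context`) are made up for this example, and real transformer implementations apply such masks inside the attention computation rather than by filtering word lists.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may look at position j.

    A causal (left-to-right) model only sees positions up to and
    including its own, so the grid is lower-triangular.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """A masked language model may look at every position."""
    return [[True] * n for _ in range(n)]


def visible_context(tokens, position, mask):
    """List the other tokens that `position` is allowed to attend to."""
    return [tok for j, tok in enumerate(tokens)
            if mask[position][j] and j != position]


tokens = ["The", "cat", "[MASK]", "the", "tree"]
n = len(tokens)

left_only = visible_context(tokens, 2, causal_mask(n))
both_sides = visible_context(tokens, 2, bidirectional_mask(n))

print(left_only)   # ['The', 'cat']
print(both_sides)  # ['The', 'cat', 'the', 'tree']
```

With only "The cat" to go on, many words could follow. Seeing "the tree" as well narrows the choice toward a word such as "climbed."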

MLMs such as BERT excel in language-related tasks, including text classification, named entity recognition and sentiment analysis, due to their extensive training on large sets of diverse data.

Here are the key advantages of masked language models:

  • They can handle ambiguous language. MLMs can contextualize words based on their surrounding context, disambiguating homonyms or words with multiple meanings. This improves their ability to comprehend natural language and generate more coherent responses.
  • They are bidirectional. Unlike conventional language models, which consider only one side of the position being predicted, MLMs attend to the surrounding words on both sides of a masked token. Their ability to access both preceding and succeeding words gives them a deeper comprehension of semantics and sentence structure.
  • They can be fine-tuned for specific tasks. This enables efficient and effective adaptation to new domains or languages. This transfer-learning approach reduces the need for large amounts of labeled data and conserves computing power.
  • They offer a wide range of applications. With the features above, MLMs open doors to many applications, from virtual assistants and sentiment analysis to machine translation, and beyond. For example, they can help virtual assistants understand user intent by predicting the missing or masked words in user queries.

Masked language models are pushing the boundaries of AI with their wide range of use cases. Are you using masked language models to elevate your AI applications? Share your thoughts in the comments below and be sure to hit that like button and subscribe.

Kinza Yasar is a technical writer for WhatIs with a degree in computer networking.
