BioBERT

Updated Jun 28, 2026

BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) is a domain-specific language model that adapts BERT to biomedicine by continuing its pre-training on large biomedical corpora, namely PubMed abstracts (about 4.5 billion words) and PubMed Central (PMC) full-text articles (about 13.5 billion words), on top of BERT's original general-domain corpus.^[1] Introduced in January 2019 and published in Bioinformatics in 2020, it was the first widely adopted biomedical-domain transformer language model and rapidly became the de facto baseline for biomedical natural language processing (BioNLP). The paper reports that BioBERT improved on the previous state of the art by 0.62% F1 on biomedical named entity recognition, 2.80% F1 on relation extraction, and 12.24% MRR on biomedical question answering, and concludes that "BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora."^[1]

BioBERT was developed by the Data Mining and Information Systems (DMIS) Lab at Korea University, led by Professor Jaewoo Kang, and released as open source via the dmis-lab/biobert repository.^[17] The model is distributed in several configurations (BioBERT-Base v1.0, v1.1, and v1.2, and BioBERT-Large v1.1); the most popular checkpoint is dmis-lab/biobert-v1.1 on Hugging Face.^[19]

BioBERT demonstrated that taking a strong general-purpose encoder and continuing pre-training on biomedical text could substantially improve performance on biomedical named entity recognition, relation extraction, and question answering, often without changing the underlying architecture or vocabulary.^[1] The result was a clear empirical case for domain-specific pre-training in technical fields where vocabulary, syntax, and semantics differ sharply from general English. The paper has been cited more than 8,000 times and spawned an entire family of biomedical and clinical BERT derivatives, including BlueBERT, SciBERT, ClinicalBERT, PubMedBERT, SapBERT, BioMegatron, GatorTron, and the generative BioGPT and BioMedLM.

What is BioBERT?

BioBERT is a BERT-based encoder whose weights have been specialized for biomedical text. Its authors describe it as "a domain-specific language representation model pre-trained on large-scale biomedical corpora," and note that "directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora."^[1] In other words, BioBERT exists because the everyday English that BERT learned from Wikipedia and books does not match the dense, specialized vocabulary of biomedical literature, so the model is re-exposed to millions of biomedical documents to close that gap.

Architecturally, BioBERT is identical to BERT (see Architecture); the differences lie entirely in the training data and the resulting weights. That design makes BioBERT a drop-in replacement for BERT in any standard fine-tuning pipeline, which is a large part of why it was adopted so quickly.

Background and motivation

By late 2018, BERT had set new state-of-the-art results on the GLUE benchmark and on a wide range of general-domain natural language understanding tasks. The original BERT, released by Google AI in October 2018, was pre-trained on the BooksCorpus (about 0.8 billion words) and English Wikipedia (about 2.5 billion words) using two self-supervised objectives: masked language modeling (MLM) and next sentence prediction (NSP).^[3] These corpora are dominated by everyday vocabulary and narrative prose.

Biomedical text is different. Words such as transcriptional, idiopathic, adenocarcinoma, BRCA1, acetylcholinesterase, and NF-κB appear constantly in biomedical literature but rarely in Wikipedia or novels. Entity names are dense and ambiguous (gene names overlap with everyday words, drug names are highly variable), sentences are long and syntactically complex, and the underlying semantic relationships often involve specialized scientific concepts. Out-of-the-box BERT performed poorly on biomedical tasks compared with task-specific BiLSTM-CRF systems trained on labeled biomedical corpora, despite BERT's general advantage on most other NLP tasks.^[1]

The DMIS Lab team hypothesized that the BERT framework could be adapted by simply continuing pre-training on biomedical text rather than re-training from scratch. This approach is far cheaper than from-scratch pre-training and inherits BERT's general linguistic knowledge as a starting point. The result was BioBERT.

Who created BioBERT and when was it released?

BioBERT was developed at the DMIS (Data Mining and Information Systems) Lab in the Department of Computer Science and Engineering at Korea University in Seoul, South Korea, under the supervision of Professor Jaewoo Kang. The paper authors are Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang.^[1]

Key dates and venues:

Event	Date
First arXiv preprint (v1)	25 January 2019 (arXiv:1901.08746)
Final arXiv revision (v4)	10 September 2019
Published in Bioinformatics	10 September 2019 (online), 2020 (print, 36(4):1234-1240)
Open source release on GitHub	2019 (`dmis-lab/biobert`)
Pre-trained weights mirrored	`naver/biobert-pretrained`

The code base is built on top of Google's reference TensorFlow implementation of BERT, with PyTorch ports added later via Hugging Face Transformers.^[17] Pre-training compute was contributed by Naver, the Korean search and AI company, which also hosts the pre-trained weights mirror.^[18]

What data was BioBERT pre-trained on?

BioBERT extends BERT by performing additional pre-training (also called continual or domain-incremental pre-training) on biomedical text. The corpora used are:^[1]

Corpus	Size (words)	Domain
English Wikipedia	2.5 billion	General (inherited from BERT)
BooksCorpus	0.8 billion	General (inherited from BERT)
PubMed abstracts	4.5 billion	Biomedical literature abstracts
PMC full-text articles	13.5 billion	Biomedical literature full text

PubMed is the United States National Library of Medicine's bibliographic database of biomedical citations, and PubMed Central (PMC) is the corresponding open-access repository of full-text journal articles. The combined biomedical corpus is roughly 18 billion words, more than five times the size of BERT's original 3.3-billion-word corpus, and is exclusively scientific.^[1]

How was BioBERT trained?

A distinctive design choice of BioBERT is that pre-training is initialized from the released BERT weights, not from scratch. The model then continues training on the biomedical corpora using the same MLM and NSP objectives that BERT used. Critically, BioBERT keeps the original BERT WordPiece vocabulary (cased, 28,996 tokens) rather than building a new biomedical vocabulary.^[1] This was a pragmatic decision: starting from BERT weights requires using BERT's tokenizer, and changing the vocabulary mid-training would invalidate the embedding table. The downside, addressed later by PubMedBERT, is that many biomedical terms get split into long sequences of subword pieces because they are absent from BERT's general-domain vocabulary.^[7]
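The fragmentation effect described above can be illustrated with a minimal sketch of greedy longest-match-first WordPiece splitting. The two vocabularies here are hypothetical toy sets invented for this example, not the real BERT or PubMedBERT vocabularies, and the function omits the pre-tokenization and length limits a production tokenizer applies.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabularies (assumptions for illustration only).
general_vocab = {"ace", "##ty", "##l", "##cho", "##line", "##ster", "##ase", "the", "cell"}
biomedical_vocab = general_vocab | {"acetylcholine", "##sterase"}

word = "acetylcholinesterase"
general_pieces = wordpiece_tokenize(word, general_vocab)
biomedical_pieces = wordpiece_tokenize(word, biomedical_vocab)
print(general_pieces)     # seven short pieces
print(biomedical_pieces)  # two pieces
```

With the toy general vocabulary the term breaks into seven pieces, while the toy in-domain vocabulary covers it in two. The real piece counts depend on the actual vocabulary files.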

Training used a maximum sequence length of 512 tokens and a batch size of 192 sequences. The hardware was eight NVIDIA V100 GPUs, and the longest run (BioBERT v1.1) took roughly 23 days.^[1] Several versioned checkpoints were released, differing in which combination of corpora was used and how many pre-training steps were taken.
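As a rough back-of-the-envelope check on these settings, the number of token positions processed per step follows directly from the batch size and sequence length. The sketch below also multiplies by the 1M steps listed for v1.1 under Released versions; the result is an upper bound on tokens seen, since it assumes every sequence fills all 512 positions.

```python
BATCH_SIZE = 192        # sequences per step, from the text
MAX_SEQ_LEN = 512       # token positions per sequence, from the text
STEPS_V1_1 = 1_000_000  # pre-training steps for BioBERT v1.1

# Token positions processed in one optimizer step.
tokens_per_step = BATCH_SIZE * MAX_SEQ_LEN

# Upper bound over the whole v1.1 run (padding is counted as if it were text).
total_token_positions = tokens_per_step * STEPS_V1_1

print(tokens_per_step)        # 98304
print(total_token_positions)  # 98304000000
```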

Released versions

Version	Initialized from	Additional corpus	Steps	Vocabulary
BioBERT-Base v1.0 (+PubMed)	BERT-Base Cased	PubMed abstracts	200K	BERT-Base Cased
BioBERT-Base v1.0 (+PMC)	BERT-Base Cased	PMC full text	270K	BERT-Base Cased
BioBERT-Base v1.0 (+PubMed +PMC)	BERT-Base Cased	PubMed + PMC	470K	BERT-Base Cased
BioBERT-Base v1.1 (+PubMed)	BERT-Base Cased	PubMed abstracts	1M	BERT-Base Cased
BioBERT-Base v1.2 (+PubMed)	BERT-Base Cased	PubMed abstracts	1M	BERT-Base Cased (with LM head)
BioBERT-Large v1.1 (+PubMed)	BERT-Large Cased	PubMed abstracts	1M	Custom 30K vocab

Version v1.1 (Base, PubMed only, 1M steps) became the most widely cited and used checkpoint, and is the recommended default in most subsequent papers. Version v1.2 is functionally similar but ships with the language modeling head intact, which is convenient for users who want to do further pre-training or use the model for fill-mask inference.

Architecture

BioBERT inherits BERT's transformer encoder architecture without modification. The base configuration has 12 transformer encoder layers, hidden size 768, 12 self-attention heads, and roughly 110 million parameters. The large configuration has 24 layers, hidden size 1024, 16 attention heads, and around 340 million parameters. Each input is WordPiece-tokenized text with a [CLS] classification token prepended and a [SEP] separator appended (for sentence pairs, a further [SEP] sits between the two segments). The positional embeddings cap input length at 512 tokens, which is a real constraint for biomedical NLP: long abstracts can approach the limit once rare terms are split into subwords, and full-text articles far exceed it.
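The parameter counts quoted above can be reproduced approximately from the layer dimensions. The following is a sketch under stated assumptions: the feed-forward (intermediate) sizes of 3,072 and 4,096 are the standard BERT values and are not given in this article, and the large model is counted with the 28,996-token cased vocabulary for illustration, whereas the released BioBERT-Large uses a custom 30K vocabulary.

```python
def bert_encoder_params(vocab_size, hidden, layers, intermediate,
                        max_positions=512, type_vocab=2):
    """Count weights and biases in a BERT encoder (embeddings, layers, pooler)."""
    layer_norm = 2 * hidden  # scale and bias vectors
    # Word, position, and segment embedding tables plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections plus one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (expand, then contract) plus one LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden) + layer_norm)
    # Dense layer applied to the [CLS] position.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


base_params = bert_encoder_params(vocab_size=28_996, hidden=768,
                                  layers=12, intermediate=3_072)
large_params = bert_encoder_params(vocab_size=28_996, hidden=1_024,
                                   layers=24, intermediate=4_096)
print(f"base:  {base_params:,}")
print(f"large: {large_params:,}")
```

Under these assumptions the base count comes out near 108 million and the large count near 334 million, consistent with the rounded figures of 110 million and 340 million.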

Nothing in BioBERT's encoder distinguishes it from BERT in terms of structure. The differences are entirely in the data and the resulting weights, which is why BioBERT can be loaded into any standard BERT-compatible code with a different checkpoint path. This made adoption trivial in practice.

How well does BioBERT perform on biomedical NLP tasks?

The original BioBERT paper evaluated three categories of biomedical tasks: named entity recognition (NER), relation extraction (RE), and question answering (QA). The fine-tuning recipe is standard BERT-style: a linear classification head on top of the final layer, trained with cross-entropy loss on a small task-specific dataset. Across these three task families the paper reports gains over the prior state of the art of 0.62% F1 (NER), 2.80% F1 (RE), and 12.24% MRR (QA).^[1]

Named entity recognition results

The paper evaluated BioBERT on nine biomedical NER datasets covering four entity types (disease, drug/chemical, gene/protein, species). The table below lists entity-level F1 scores for BioBERT-Base v1.1 (+PubMed) on eight of them; the ninth, 2010 i2b2/VA, is not shown:^[1]

Dataset	Entity type	BioBERT v1.1 F1
NCBI Disease	Disease	89.71
BC5CDR-Disease	Disease	87.15
BC5CDR-Chemical	Chemical	93.47
BC4CHEMD	Chemical	92.36
BC2GM	Gene/Protein	84.72
JNLPBA	Gene/Protein	77.49
LINNAEUS	Species	88.24
Species-800	Species	74.06

The paper reports a micro-averaged F1 improvement of 0.62 over the previous state of the art across its NER datasets.^[1] The absolute gain is modest and not uniform: BioBERT scored higher than BERT on every dataset but surpassed the previous state of the art on six of the nine. It achieved this with a single uniform model and minimal task-specific engineering.
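Entity-level F1, the metric used in the NER results above, counts a prediction as correct only when both the span boundaries and the entity type match exactly. A minimal sketch, assuming BIO-tagged sequences and an invented toy sentence (the official evaluation scripts may treat malformed tag sequences differently):

```python
def extract_entities(tags):
    """Return a set of (start, end, type) spans from a BIO tag sequence."""
    entities = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                entities.add((start, i, label))  # close the open entity
            if tag.startswith("B-"):
                start, label = i, tag[2:]        # open a new entity
            else:
                start, label = None, None        # "O" or a stray I- tag
    return entities


def entity_f1(gold_tags, pred_tags):
    """Exact-match entity-level F1 between gold and predicted tag sequences."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: one entity matched exactly, one predicted at the wrong position.
gold = ["B-Disease", "I-Disease", "O", "B-Disease", "O"]
pred = ["B-Disease", "I-Disease", "O", "O", "B-Disease"]
score = entity_f1(gold, pred)
print(score)  # 0.5
```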

Relation extraction results

Relation extraction was evaluated on three datasets: ChemProt for chemical-protein interactions, and GAD and EU-ADR for gene-disease relations. BioBERT v1.1 (+PubMed) reports:^[1]

Dataset	Relation type	BioBERT v1.1 F1
ChemProt	Chemical-protein	76.46
GAD	Gene-disease	79.83
EU-ADR	Gene-disease	79.74

The paper reports an average F1 improvement of 2.80 over the previous state of the art across the three RE datasets.^[1]

Question answering results

QA was evaluated on the BioASQ factoid task across three challenge years (4b, 5b, 6b). BioBERT v1.1 (+PubMed) reports:^[1]

Dataset	Strict Accuracy	Lenient Accuracy	MRR
BioASQ 4b	27.95	44.10	34.72
BioASQ 5b	46.00	60.00	51.64
BioASQ 6b	42.86	57.77	48.43

The paper reports an average MRR improvement of 12.24 over the previous state of the art across BioASQ.^[1] QA showed the largest absolute gain because earlier biomedical QA systems relied on hand-engineered features and were a poor fit for free-text answer extraction.
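The three factoid metrics in the table above can be sketched as follows, assuming each system returns a ranked list of up to five candidate answers per question and that correctness is judged by case-insensitive exact match. The questions and answers are invented toy data, and the official BioASQ evaluation may normalize answers differently.

```python
def factoid_metrics(ranked_predictions, gold_answers, top_k=5):
    """Strict accuracy, lenient accuracy, and MRR over a list of questions."""
    strict = lenient = reciprocal_rank_sum = 0.0
    for candidates, gold in zip(ranked_predictions, gold_answers):
        gold_set = {g.lower() for g in gold}
        # 1-based rank of the first correct candidate within the top_k list.
        rank = next((i for i, c in enumerate(candidates[:top_k], start=1)
                     if c.lower() in gold_set), None)
        if rank is not None:
            lenient += 1                       # correct answer somewhere in top_k
            reciprocal_rank_sum += 1.0 / rank  # contributes 1/rank to MRR
            if rank == 1:
                strict += 1                    # correct answer ranked first
    n = len(gold_answers)
    return strict / n, lenient / n, reciprocal_rank_sum / n


# Toy data: ranks of the first correct answer are 1, 2, none, and 4.
predictions = [
    ["BRCA1", "TP53"],
    ["EGFR", "KRAS", "ALK"],
    ["insulin", "glucagon"],
    ["aspirin", "ibuprofen", "naproxen", "warfarin", "heparin"],
]
gold = [["BRCA1"], ["KRAS"], ["leptin"], ["warfarin"]]

strict_acc, lenient_acc, mrr = factoid_metrics(predictions, gold)
print(strict_acc, lenient_acc, mrr)  # 0.25 0.75 0.4375
```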

How does BioBERT compare to BERT and SciBERT?

BioBERT, BERT, and SciBERT are all built on the same transformer encoder, but they differ in what they were trained on and in whose vocabulary they use. BERT is the general-domain baseline. BioBERT and SciBERT are the two best-known early attempts to specialize that baseline for science, and they took opposite approaches.

BioBERT was initialized from BERT's weights and then continued pre-training on PubMed and PMC, reusing BERT's original WordPiece vocabulary unchanged.^[1] SciBERT (Beltagy, Lo, and Cohan, 2019) instead trained from scratch on a broader scientific corpus and built its own in-domain vocabulary. The SciBERT authors state that their model is "trained from scratch" and that they "construct SciVocab, a new WordPiece vocabulary on our scientific corpus using the SentencePiece library."^[4] SciBERT's corpus is 1.14 million papers from Semantic Scholar, 82% biomedical and 18% computer science, totaling about 3.17 billion tokens.^[4] The new vocabulary matters: the SciBERT paper reports that "the resulting token overlap between BaseVocab and SciVocab is 42%, illustrating a substantial difference in frequently used words between scientific and general domain texts."^[4]

Model	Year	Initialization	Pre-training corpus	Vocabulary
BERT	2018	From scratch	Wikipedia (2.5B) + BooksCorpus (0.8B)	BERT WordPiece (general)
BioBERT	2019	Continued from BERT	PubMed (4.5B) + PMC (13.5B)	BERT WordPiece (reused)
SciBERT	2019	From scratch	1.14M Semantic Scholar papers, ~3.17B tokens (82% biomed, 18% CS)	SciVocab (custom, 42% overlap with BERT)

The practical upshot: BioBERT is cheaper to produce and stays close to BERT's general knowledge, but it inherits a general-domain vocabulary that over-fragments biomedical terms. SciBERT spends more to train a custom vocabulary and from-scratch weights, and reported stronger results than BioBERT on some shared benchmarks such as BC5CDR and ChemProt. The later PubMedBERT model pushed the from-scratch, in-domain-vocabulary idea further and showed it generally beats continual pre-training when billions of words of in-domain text are available, which is now the conventional wisdom for high-resource technical domains.^[7]

Variants and successors

BioBERT triggered a wave of follow-up work that explored alternative pre-training corpora, vocabularies, architectures, and scales. The table below summarizes the most influential biomedical and clinical language models that came after BioBERT.

Model	Year	Authors / org	Pre-training data	Vocabulary	Notes
BERT	2018	Devlin et al., Google AI	Wikipedia + BooksCorpus	BERT-Base Cased/Uncased	General domain baseline
BioBERT	2019/2020	Lee et al., Korea University DMIS	BERT corpus + PubMed + PMC	BERT-Base Cased	Continued pre-training from BERT
BlueBERT	2019	Peng et al., NIH	PubMed + MIMIC-III	BERT-Base	Mixed biomedical + clinical
SciBERT	2019	Beltagy et al., AI2 (Allen Institute)	1.14M Semantic Scholar papers (82% biomed, 18% CS)	Custom scivocab	From-scratch with new vocab
ClinicalBERT (Bio_ClinicalBERT)	2019	Alsentzer et al., MIT/Harvard	MIMIC-III clinical notes (~880M words)	BERT-Base Cased	Initialized from BioBERT, further pre-trained on clinical notes
ClinicalBERT (Huang)	2019	Huang et al., NYU	MIMIC-III discharge summaries	BERT-Base	Hospital readmission prediction
BioMed-RoBERTa	2020	Gururangan et al., AI2	2.68M biomed/CS papers	RoBERTa	Domain-adaptive pre-training (DAPT) on top of RoBERTa
BioMegatron	2020	Shin et al., NVIDIA	PubMed	Custom	345M-1.2B parameter biomedical model based on Megatron-LM
PubMedBERT	2021	Gu et al., Microsoft Research	PubMed abstracts (and full text variant)	Custom biomedical vocab	From-scratch, outperforms BioBERT on most tasks
SapBERT	2021	Liu et al., Cambridge LTL	UMLS synonyms	PubMedBERT vocab	Synonym-aware, used for entity linking
GatorTron	2022	Yang et al., University of Florida	90B words (82B clinical + PubMed + Wikipedia)	Custom	345M to 8.9B parameter clinical model
BioGPT	2022	Luo et al., Microsoft Research	PubMed (15M abstracts)	Custom	Generative GPT-2-style biomedical model
BioMedLM (PubMedGPT)	2022/2023	Stanford CRFM + MosaicML	PubMed via The Pile	Custom	2.7B parameter GPT model; 50.3% on MedQA
PMC-LLaMA	2023	Wu et al.	PubMed Central + medical books	LLaMA tokenizer	Continued pre-training of LLaMA on biomedical text
Med-PaLM / Med-PaLM 2	2023	Singhal et al., Google	PaLM with medical instruction tuning	PaLM	Generative; passed USMLE-style questions
Clinical ModernBERT	2025	Recent	Biomedical + clinical text	ModernBERT	Long-context biomedical encoder

A few comparisons are worth flagging. PubMedBERT showed that for a domain like biomedicine, where billions of words of in-domain text are freely available, training a vocabulary and weights from scratch on biomedical text alone outperforms continual pre-training from a general-domain checkpoint.^[7] This was a direct critique of BioBERT's design and is now the conventional wisdom for high-resource technical domains. SciBERT made a similar argument with a smaller scientific corpus and a custom vocabulary, and reported better results than BioBERT on BC5CDR and ChemProt despite a smaller training corpus.^[4] ClinicalBERT by Alsentzer et al. starts from BioBERT's weights and continues pre-training on MIMIC-III clinical notes, producing a model that outperforms BioBERT on clinical (as opposed to literature) tasks.^[5] Later models such as GatorTron and BioMedLM scaled up to billions of parameters, and the very latest generation of biomedical foundation models is generative (BioGPT, Med-PaLM, PMC-LLaMA), aligning with the broader shift in NLP from encoder-only models to decoder-only LLMs.

What is BioBERT used for?

BioBERT and its derivatives have been applied across the entire biomedical text-mining stack. Common applications include:

Biomedical literature mining and information extraction over PubMed.
Biomedical named entity recognition for diseases, chemicals, drugs, genes, proteins, and species.
Drug-drug interaction (DDI) extraction from clinical and pharmacology literature.
Adverse drug event (ADE) detection from medical text and social media for pharmacovigilance.
Protein-protein interaction (PPI) extraction from biomedical abstracts.
Gene-disease association mining for biomarker discovery.
Biomedical question answering, including BioASQ challenges.
Clinical decision support, with appropriate caution regarding patient safety.
Medical entity linking (mapping mentions to UMLS, MeSH, or SNOMED-CT concept identifiers), often combined with SapBERT.
Drug repurposing pipelines that mine literature for unexpected drug-target relationships.
Semantic search over PubMed using BioBERT embeddings as a feature extractor for retrieval.
Document classification for systematic review screening, where BioBERT can triage candidate abstracts.

Practical considerations

Fine-tuning BioBERT follows the standard BERT recipe: load the pre-trained checkpoint, attach a task-specific head, train for a few epochs with a small learning rate (commonly 2e-5 to 5e-5) on labeled biomedical data. The original repository provides reference fine-tuning scripts for NER, RE, and QA.^[17] The Hugging Face Transformers ecosystem hosts the canonical checkpoints (dmis-lab/biobert-v1.1, dmis-lab/biobert-base-cased-v1.2, and the large variant), and they slot into existing pipelines with no modification.^[19]

For parameter-efficient fine-tuning, BioBERT works with adapters and LoRA, which is helpful when GPU memory is tight or when many task-specific heads need to be maintained. BioBERT is also frequently used as a frozen feature extractor: the contextual embeddings from the final or penultimate layer are fed into downstream classifiers, retrieval systems, or clustering algorithms. This is especially common in production biomedical NLP systems where robustness and predictability matter more than the marginal gain from full fine-tuning.

The original tokenizer is BERT's WordPiece, so domain-specific terms get fragmented into multiple subwords. Users should be aware that token counts on biomedical text are higher than on general text for the same character length, which interacts with the 512-token sequence limit. For long documents, common workarounds include sliding-window inference, hierarchical models, or switching to long-context biomedical encoders such as Clinical-Longformer or Clinical ModernBERT.
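The sliding-window workaround mentioned above can be sketched in a few lines. This is an illustrative helper, not part of the BioBERT repository: the function name, the 128-token overlap, and the reservation of two positions for [CLS] and [SEP] are assumptions chosen for the example.

```python
def sliding_windows(token_ids, max_len=512, overlap=128, num_special=2):
    """Split a long token sequence into overlapping windows that fit max_len."""
    body = max_len - num_special  # leave room for [CLS] and [SEP]
    if not 0 <= overlap < body:
        raise ValueError("overlap must be smaller than the window body")
    step = body - overlap         # how far each window advances
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + body])
        if start + body >= len(token_ids):
            break                 # this window reached the end of the document
        start += step
    return windows


tokens = list(range(1200))  # stand-in for the WordPiece ids of a long document
windows = sliding_windows(tokens)
print([len(w) for w in windows])  # [510, 510, 436]
```

Each window is encoded separately, and predictions in the overlapping regions are then merged, for example by keeping the prediction from the window in which the token has more surrounding context.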

Limitations

BioBERT has well-known limitations that have shaped subsequent research:

Vocabulary mismatch: because the tokenizer is BERT's general vocabulary, biomedical terms are over-fragmented. PubMedBERT showed that a from-scratch biomedical vocabulary materially helps performance on many tasks.^[7]
Domain-incremental rather than from-scratch: the choice to initialize from BERT carries some general-domain bias forward and was later shown to be suboptimal for high-resource domains.^[7]
Sequence length cap: 512 tokens is short for many biomedical applications, especially full-text articles and clinical narratives.
Encoder-only: BioBERT cannot generate text. Tasks framed as generation (summarization, question answering with free-form answers, dialog) require a different architecture.
Older transformer scale: 110M to 340M parameters is small by modern standards. GatorTron, BioMedLM, and Med-PaLM operate at billions to hundreds of billions of parameters and substantially outperform BioBERT on harder tasks.
English-only: PubMed and PMC are predominantly English, so BioBERT does not transfer to multilingual biomedical settings.
Static knowledge: like all pre-trained encoders, BioBERT's knowledge is frozen at training time. New literature requires re-pre-training to be incorporated.
Not a clinical-text expert: BioBERT is trained on biomedical literature, not on clinical notes (which are stylistically very different, with abbreviations and incomplete sentences). Models such as ClinicalBERT, BlueBERT, and GatorTron explicitly target the clinical setting and tend to perform better there.^[5]
No safety guarantees for clinical use: BioBERT outputs should not be used for direct patient care without expert validation, regulatory clearance, and appropriate monitoring.

Influence and adoption

BioBERT has been one of the most cited papers in biomedical NLP, with citation counts on Google Scholar exceeding 8,000 by 2024. Its influence shows up in three forms.

First, it established the domain-specific BERT paradigm. After BioBERT, many technical domains and languages saw analogous models (LegalBERT, FinBERT, ScholarBERT, MathBERT, AraBERT, JuriBERT, and more). The recipe of "take BERT, continue pre-training on your domain's text" became standard practice, even after PubMedBERT showed that the from-scratch alternative often does better.^[7]

Second, it became the default baseline. Almost every biomedical NLP paper published since 2019 reports BioBERT numbers as a comparison point, including the papers that propose alternative architectures. The Microsoft BLURB benchmark, introduced with PubMedBERT in 2021, formalized this by including BioBERT alongside several other models in a standardized leaderboard for biomedical NLP.^[7]

Third, it remains in active production use. Despite the rise of larger and generative models, BioBERT (and its close derivatives) is widely deployed in biomedical search, semantic indexing, clinical NLP pipelines, and systematic-review tooling. Its small size, predictable behavior, and strong performance on classification and tagging make it a reliable workhorse.

Recent context (2024-2026)

The biomedical NLP landscape has shifted considerably since BioBERT's release. Generative LLMs, including GPT-4, GPT-4o, Claude 3.5 and 3.7 Sonnet, Gemini 1.5 and 2.0, and the dedicated medical models Med-PaLM 2 and Med-Gemini, have become competitive or superior on many biomedical tasks, especially question answering, summarization, and reasoning. Generative decoder-only biomedical models such as BioGPT, BioMedLM, PMC-LLaMA, and Meditron can produce free text, and the larger ones offer in-context learning capabilities that BioBERT lacks.

At the same time, BioBERT is still widely used in 2024 and 2025 for several reasons. Supervised fine-tuning of a small encoder remains cheaper and more reliable than zero-shot prompting of a frontier LLM for high-volume tasks like NER over millions of PubMed abstracts. Encoder embeddings remain better suited to dense retrieval than autoregressive models. Regulatory and reproducibility constraints in clinical settings often favor smaller open models with stable behavior over closed frontier APIs. And many existing biomedical NLP systems, especially in pharmaceutical companies and clinical informatics groups, are built on BioBERT or close relatives and would not be replaced lightly.

The broader trajectory is clearly toward larger generative biomedical models trained on combinations of PubMed, PMC, clinical notes, medical textbooks, and curated reasoning data. BioBERT's specific architecture is increasingly historical, but the principle it established (that domain-specific pre-training matters in biomedicine, and that PubMed and PMC are the right places to start) continues to shape every new biomedical foundation model.

References

1. Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2020). BioBERT: a pre-trained biomedical language representation model for biomedical text mining. *Bioinformatics*, 36(4), 1234-1240.
2. Lee, J., et al. (2019). BioBERT: a pre-trained biomedical language representation model for biomedical text mining. arXiv:1901.08746.
3. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT.
4. Beltagy, I., Lo, K., & Cohan, A. (2019). SciBERT: A Pretrained Language Model for Scientific Text. EMNLP.
5. Alsentzer, E., et al. (2019). Publicly Available Clinical BERT Embeddings. Clinical NLP Workshop.
6. Peng, Y., Yan, S., & Lu, Z. (2019). Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets (BlueBERT). BioNLP.
7. Gu, Y., et al. (2021). Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing (PubMedBERT). *ACM Transactions on Computing for Healthcare*.
8. Liu, Y., et al. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692.
9. Liu, F., et al. (2021). Self-Alignment Pretraining for Biomedical Entity Representations (SapBERT). NAACL.
10. Gururangan, S., et al. (2020). Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. ACL.
11. Shin, H.-C., et al. (2020). BioMegatron: Larger Biomedical Domain Language Model. EMNLP.
12. Yang, X., et al. (2022). GatorTron: A Large Clinical Language Model to Unlock Patient Information from Unstructured Electronic Health Records. arXiv:2203.03540 / *npj Digital Medicine*.
13. Luo, R., et al. (2022). BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining. *Briefings in Bioinformatics*.
14. Bolton, E., et al. (2024). BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text. arXiv:2403.18421.
15. Wu, C., et al. (2023). PMC-LLaMA: Towards Building Open-source Language Models for Medicine. arXiv:2304.14454.
16. Singhal, K., et al. (2023). Large Language Models Encode Clinical Knowledge (Med-PaLM). *Nature*.
17. DMIS Lab (Korea University). `dmis-lab/biobert` GitHub repository.
18. Naver. `naver/biobert-pretrained` GitHub repository.
19. Hugging Face. `dmis-lab/biobert-v1.1` model card.
