2018 · NLP · Transformer · Pre-training
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT (Bidirectional Encoder Representations from Transformers) is a Transformer encoder pre-trained with two objectives: masked language modeling (MLM), which predicts randomly masked input tokens, and next sentence prediction (NSP), which classifies whether two segments are consecutive in the source text.
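The MLM objective corrupts a fraction of the input and trains the model to recover the original tokens. Below is a minimal sketch of the corruption step as described in the paper: 15% of positions are selected, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. The toy vocabulary and the function name `mask_tokens` are illustrative, not from the paper or any library.

```python
import random

# Hypothetical toy vocabulary; real BERT uses a ~30k-token WordPiece vocab.
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Apply BERT-style MLM corruption to a token list.

    Selects ~mask_prob of positions; of those, 80% become [MASK],
    10% become a random vocabulary token, 10% stay unchanged.
    Returns (corrupted, labels), where labels holds the original token
    at selected positions and None elsewhere (only selected positions
    contribute to the MLM loss).
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # model is trained to predict the original token
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK_TOKEN        # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)  # 10%: random token
            # else: 10% keep the original token unchanged
    return corrupted, labels

if __name__ == "__main__":
    tokens = ["the", "cat", "sat", "on", "the", "mat"]
    corrupted, labels = mask_tokens(tokens, seed=0)
    print(corrupted)  # e.g. ["the", "[MASK]", "sat", "on", "the", "mat"]
    print(labels)     # e.g. [None, "cat", None, None, None, None]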