
Transformers: State-of-the-Art Natural Language Processing

1172 Citations · 2020
Thomas Wolf, Lysandre Debut, Victor Sanh


Abstract

v4.10.0: LayoutLM-v2, LayoutXLM, BEiT

LayoutLM-v2 and LayoutXLM

Four new models are released as part of the LayoutLM-v2 implementation: <code>LayoutLMv2ForSequenceClassification</code>, <code>LayoutLMv2Model</code>, <code>LayoutLMv2ForTokenClassification</code> and <code>LayoutLMv2ForQuestionAnswering</code>, in PyTorch.

The LayoutLMv2 model was proposed in LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. LayoutLMv2 improves on LayoutLM to obtain state-of-the-art results across several document image understanding benchmarks.

- Add LayoutLMv2 + LayoutXLM #12604 (@NielsRogge)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=layoutlmv2

BEiT

Three new models are released as part of the BEiT implementation: <code>BeitModel</code>, <code>BeitForMaskedImageModeling</code>, and <code>BeitForImageClassification</code>, in PyTorch.

The BEiT model was proposed in BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong and Furu Wei. Inspired by BERT, BEiT is the first paper to make self-supervised pre-training of Vision Transformers (ViTs) outperform supervised pre-training. Rather than pre-training the model to predict the class of an image (as done in the original ViT paper), BEiT models are pre-trained to predict visual tokens from the codebook of OpenAI's DALL-E model, given masked patches.

- Add BEiT #12994 (@NielsRogge)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=beit

Speech improvements

The Wav2Vec2 and HuBERT models now have a sequence classification head available.

- Add Wav2Vec2 & Hubert ForSequenceClassification #13153 (@anton-l)

DeBERTa in TensorFlow (@kamalkraj)

The DeBERTa and DeBERTa-v2 models have been converted from PyTorch to TensorFlow.
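The new classes follow the usual Transformers API. As a quick illustration, here is a minimal sketch that instantiates a tiny, randomly initialised <code>BeitForImageClassification</code> directly from a config; the hyperparameter values below are arbitrary toy sizes, not those of a released checkpoint, and real use would instead call <code>from_pretrained</code> with one of the Hub checkpoints linked above.

```python
import torch
from transformers import BeitConfig, BeitForImageClassification

# Toy-sized config: arbitrary illustrative values, not a released checkpoint.
config = BeitConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    image_size=32,
    patch_size=8,
    num_labels=3,
)
model = BeitForImageClassification(config)

# One fake 3-channel 32x32 image; in practice a feature extractor would
# produce these pixel values from a real image.
pixel_values = torch.randn(1, 3, 32, 32)
logits = model(pixel_values=pixel_values).logits  # shape: (1, num_labels)
```

The same config-first pattern works for <code>BeitModel</code> and <code>BeitForMaskedImageModeling</code>, which is convenient for smoke-testing a pipeline before downloading full pre-trained weights.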
- Deberta tf #12972 (@kamalkraj)
- Deberta_v2 tf #13120 (@kamalkraj)

Flax model additions

EncoderDecoder, DistilBERT, and ALBERT now have support in Flax!

- FlaxEncoderDecoder allowing Bert2Bert and Bert2GPT2 in Flax #13008 (@ydshieh)
- FlaxDistilBERT #13324 (@kamalkraj)
- FlaxAlBERT #13294 (@kamalkraj)

TensorFlow examples

A new example has been added in TensorFlow: multiple choice! Data collators have become framework agnostic and can now work for both TensorFlow and NumPy on top of PyTorch.

- Add TF multiple choice example #12865 (@Rocketknight1)
- TF/Numpy variants for all DataCollator classes #13105 (@Rocketknight1)

Auto API refactor

The Auto APIs have been disentangled from all the other model modules of the Transformers library, so you can now safely import the Auto classes without importing all the models (and possibly getting errors if your setup is not compatible with one specific model). The actual model classes are only imported when needed.

- Disentangle auto modules from other modeling files #13023 (@sgugger)
- Fix AutoTokenizer when no fast tokenizer is available #13336 (@sgugger)

Slight breaking change

When loading some kinds of corrupted state dictionaries, the <code>PreTrainedModel.from_pretrained</code> method would sometimes silently ignore weights. This now raises a real error.

- Fix from_pretrained with corrupted state_dict #12939 (@sgugger)

General improvements and bugfixes

- Improving pipeline tests #12784 (@Narsil)
- Pin git python to <3.1.19 #12858 (@patrickvonplaten)
- [tests] fix logging_steps requirements #12860 (@stas00)
- [Sequence Feature Extraction] Add truncation #12804 (@patrickvonplaten)
- add <code>classifier_dropout</code> to classification heads #12794 (@PhilipMay)
- Fix barrier for SM distributed #12853 (@sgugger)
- Add possibility to ignore imports in test_fecther #12801 (@sgugger)
- Add accelerate to examples requirements #12888 (@sgugger)
- Fix documentation of BigBird tokenizer #12889 (@sgugger)
- Better heuristic for token-classification pipeline. #12611 (@Narsil)
- Fix push_to_hub for TPUs #12895 (@sgugger)
- <code>Seq2SeqTrainer</code> set max_length and num_beams only when non None #12899 (@cchen-dialpad)
- [FLAX] Minor fixes in CLM example #12914 (@stefan-it)
- Correct validation_split_percentage argument from int (ex:5) to float (0.05) #12897 (@Elysium1436)
- Fix typo in the example of MobileBertForPreTraining #12919 (@buddhics)
- Add option to set max_len in run_ner #12929 (@sgugger)
- Fix QA examples for roberta tokenizer #12928 (@sgugger)
- Print defaults when using --help for scripts #12930 (@sgugger)
- Fix StoppingCriteria ABC signature #12918 (@willfrey)
- Add missing @classmethod decorators #12927 (@willfrey)
- fix distiller.py #12910 (@chutaklee)
- Update generation_logits_process.py #12901 (@willfrey)
- Update generation_logits_process.py #12900 (@willfrey)
- Update tokenization_auto.py #12896 (@willfrey)
- Fix docstring typo in tokenization_auto.py #12891 (@willfrey)
- [Flax] Correctly Add MT5 #12988 (@patrickvonplaten)
- ONNX v2 raises an Exception when using PyTorch < 1.8.0 #12933 (@mfuntowicz)
- Moving feature-extraction pipeline to new testing scheme #12843 (@Narsil)
- Add CpmTokenizerFast #12938 (@JetRunner)
- fix typo in gradient_checkpointing arg #12855 (@21jun)
- Log Azure ML metrics only for rank 0 #12766 (@harshithapv)
- Add substep end callback method #12951 (@wulu473)
- Add multilingual documentation support #12952 (@JetRunner)
- Fix division by zero in NotebookProgressPar #12953 (@sgugger)
- [FLAX] Minor fixes in LM example #12947 (@stefan-it)
- Prevent <code>Trainer.evaluate()</code> crash when using only tensorboardX #12963 (@aphedges)
- Fix typo in example of DPRReader #12954 (@tadejsv)
- Place BigBirdTokenizer in sentencepiece-only objects #12975 (@sgugger)
- fix typo in example/text-classification README #12974 (@fullyz)
- Fix template for inputs docstrings #12976 (@sgugger)
- fix <code>Trainer.train(resume_from_checkpoint=False)</code> is causing an exception #12981 (@PhilipMay)
- Cast logits from bf16 to fp32 at the end of TF_T5 #12332 (@szutenberg)
- Update CANINE test #12453 (@NielsRogge)
- pad_to_multiple_of added to DataCollatorForWholeWordMask #12999 (@Aktsvigun)
- [Flax] Align jax flax device name #12987 (@patrickvonplaten)
- [Flax] Correct flax docs #12782 (@patrickvonplaten)
- T5: Create position related tensors directly on device instead of CPU #12846 (@armancohan)
- Skip ProphetNet test #12462 (@LysandreJik)
- Create perplexity.rst #13004 (@sashavor)
- GPT-Neo ONNX export #12911 (@michaelbenayoun)
- Update generate method - Fix floor_divide warning #13013 (@nreimers)
- [Flax] Correct pt to flax conversion if from base to head #13006 (@patrickvonplaten)
- [Flax T5] Speed up t5 training #13012 (@patrickvonplaten)
- FX submodule naming fix #13016 (@michaelbenayoun)
- T5 with past ONNX export #13014 (@michaelbenayoun)
- Fix ONNX test: Put smaller ALBERT model #13028 (@LysandreJik)
- Tpu tie weights #13030 (@sgugger)
- Use min version for huggingface-hub dependency #12961 (@lewtun)
- tfhub.de -> tfhub.dev #12565 (@abhishekkrthakur)
- [Flax] Refactor gpt2 & bert example docs #13024 (@patrickvonplaten)
- Add MBART to models exportable with ONNX #13049 (@LysandreJik)
- Add to ONNX docs #13048 (@LysandreJik)
- Fix small typo in M2M100 doc #13061 (@SaulLu)
- Add try-except for torch_scatter #13040 (@JetRunner)
- docs: add HuggingArtists to community notebooks #13050 (@AlekseyKorshuk)
- Fix ModelOutput instantiation form dictionaries #13067 (@sgugger)
- Roll out the test fetcher on push tests #13055 (@sgugger)
- Fix fallback of test_fetcher #13071 (@sgugger)
- Revert to all tests whil we debug what's wrong #13072 (@sgugger)
- Use original key for label in DataCollatorForTokenClassification #13057 (@ibraheem-moosa)
- [Doctest] Setup, quicktour and task_summary #13078 (@sgugger)
- Add VisualBERT demo notebook #12263 (@gchhablani)
- Install git #13091 (@LysandreJik)
- Fix classifier dropout in AlbertForMultipleChoice #13087 (@ibraheem-moosa)
- Doctests job #13088 (@LysandreJik)
- Fix VisualBert Embeddings #13017 (@gchhablani)
- Proper import for unittest.mock.patch #13085 (@sgugger)
- Reactive test fecthers on scheduled test with proper git install #13097 (@sgugger)
- Change a parameter name in FlaxBartForConditionalGeneration.decode() #13074 (@ydshieh)
- [Flax/JAX] Run jitted tests at every commit #13090 (@patrickvonplaten)
- Rely on huggingface_hub for common tools #13100 (@sgugger)
- [FlaxCLIP] allow passing params to image and text feature methods #13099 (@patil-suraj)
- Ci last fix #13103 (@sgugger)
- Improve type checker performance #13094 (@bschnurr)
- Fix VisualBERT docs #13106 (@gchhablani)
- Fix CircleCI nightly tests #13113 (@sgugger)
- Create py.typed #12893 (@willfrey)
- Fix flax gpt2 hidden states #13109 (@ydshieh)
- Moving fill-mask pipeline to new testing scheme #12943 (@Narsil)
- Fix omitted lazy import for xlm-prophetnet #13052 (@minwhoo)
- Fix classifier dropout in bertForMultipleChoice #13129 (@mandelbrot-walker)
- Fix frameworks table so it's alphabetical #13118 (@osanseviero)
- [Feature Processing Sequence] Remove duplicated code #13051 (@patrickvonplaten)
- Ci continue through smi failure #13140 (@LysandreJik)
- Fix missing <code>seq_len</code> in <code>electra</code> model when <code>inputs_embeds</code> is used. #13128 (@sararb)
- Optimizes ByT5 tokenizer #13119 (@Narsil)
- Add splinter #12955 (@oriram)
- [AutoFeatureExtractor] Fix loading of local folders if config.json exists #13166 (@patrickvonplaten)
- Fix generation docstrings regarding input_ids=None #12823 (@jvamvas)
- Update namespaces inside torch.utils.data to the latest. #13167 (@qqaatw)
- Fix the loss calculation of ProphetNet #13132 (@StevenTang1998)
- Fix LUKE tests #13183 (@NielsRogge)
- Add min and max question length options to TapasTokenizer #12803 (@NielsRogge)
- SageMaker: Fix sagemaker DDP & metric logs #13181 (@philschmid)
- correcting group beam search function output score bug #13211 (@sourabh112)
- Change how "additional_special_tokens" argument in the ".from_pretrained" method of the tokenizer is taken into account #13056 (@SaulLu)
- remove unwanted control-flow code from DeBERTa-V2 #13145 (@kamalkraj)
- Fix load_tf_weights alias. #13159 (@qqa
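The effect of the Auto API refactor described in these notes can be observed directly: importing the Auto classes no longer imports every model's modeling file, and a model module is only loaded the first time it is actually needed. A minimal sketch, assuming transformers (at v4.10.0 or later) is installed and run in a fresh interpreter; the BERT config values below are arbitrary toy sizes chosen only to avoid downloading weights:

```python
import sys
from transformers import AutoConfig, AutoModel

# With the disentangled Auto modules, BERT's modeling file should not have
# been imported merely by importing the Auto classes.
bert_loaded_before = "transformers.models.bert.modeling_bert" in sys.modules

# Arbitrary toy sizes, just to build a model without downloading weights.
config = AutoConfig.for_model(
    "bert", vocab_size=100, hidden_size=64, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=128,
)
model = AutoModel.from_config(config)  # the BERT module is imported here, on demand
bert_loaded_after = "transformers.models.bert.modeling_bert" in sys.modules
print(type(model).__name__)  # BertModel
```

This lazy loading is what makes it safe to import the Auto classes even when some model's optional dependencies (e.g. sentencepiece or torch-scatter) are missing from your environment.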
