Release notes
Published 8/31/2021
Minor release. Contains breaking changes.

Four new models are released as part of the LayoutLMv2 implementation: LayoutLMv2ForSequenceClassification, LayoutLMv2Model, LayoutLMv2ForTokenClassification, and LayoutLMv2ForQuestionAnswering, in PyTorch.
The LayoutLMv2 model was proposed in LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, and Lidong Zhou. LayoutLMv2 improves on LayoutLM and obtains state-of-the-art results across several document image understanding benchmarks.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=layoutlmv2
Three new models are released as part of the BEiT implementation: BeitModel, BeitForMaskedImageModeling, and BeitForImageClassification, in PyTorch.
The BEiT model was proposed in BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong and Furu Wei. Inspired by BERT, BEiT is the first paper that makes self-supervised pre-training of Vision Transformers (ViTs) outperform supervised pre-training. Rather than pre-training the model to predict the class of an image (as done in the original ViT paper), BEiT models are pre-trained to predict visual tokens from the codebook of OpenAI’s DALL-E model given masked patches.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=beit
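As a quick illustration, the new BEiT classes can be instantiated from a configuration without downloading a checkpoint. The tiny configuration below is hypothetical, chosen only for shape-checking; real pre-trained weights come from the Hub checkpoints above.

```python
import torch
from transformers import BeitConfig, BeitForImageClassification

# Hypothetical tiny configuration, for illustration only.
config = BeitConfig(
    image_size=32,
    patch_size=8,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=10,
)
model = BeitForImageClassification(config)
model.eval()

# A random batch of one 32x32 RGB image.
pixel_values = torch.randn(1, 3, 32, 32)
logits = model(pixel_values=pixel_values).logits
print(logits.shape)  # torch.Size([1, 10])
```

For real use, `BeitForImageClassification.from_pretrained(...)` with one of the Hub checkpoints replaces the random-weight model above.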
The Wav2Vec2 and HuBERT models now have a sequence classification head available.
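A minimal sketch of the new audio classification head, again with a hypothetical tiny configuration so it runs without a checkpoint (the conv/attention sizes below are illustrative, not a real model):

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2ForSequenceClassification

# Hypothetical tiny configuration; real models load from Hub checkpoints.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(32, 32),
    conv_stride=(2, 2),
    conv_kernel=(3, 3),
    num_labels=3,
)
model = Wav2Vec2ForSequenceClassification(config)
model.eval()

# A random batch of one raw waveform (400 samples).
input_values = torch.randn(1, 400)
logits = model(input_values=input_values).logits
print(logits.shape)  # torch.Size([1, 3])
```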
The DeBERTa and DeBERTa-v2 models have been converted from PyTorch to TensorFlow.
EncoderDecoder, DistilBERT, and ALBERT now have support in Flax!
A new example has been added in TensorFlow: multiple choice! Data collators have become framework agnostic and can now produce batches for TensorFlow and NumPy in addition to PyTorch.
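For instance, the shared collators now accept a `return_tensors` argument. A minimal sketch, assuming transformers and NumPy are installed:

```python
from transformers import default_data_collator

features = [
    {"input_ids": [0, 1, 2], "label": 0},
    {"input_ids": [3, 4, 5], "label": 1},
]

# The same collator can now produce NumPy arrays instead of torch tensors;
# "tf" and "pt" are the other accepted values.
batch = default_data_collator(features, return_tensors="np")
print(batch["input_ids"].shape)  # (2, 3)
```

The collator also renames the `label` key to `labels`, matching what the models expect.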
The Auto APIs have been disentangled from all the other model modules of the Transformers library, so you can now safely import the Auto classes without importing all the models (and potentially hitting errors if your setup is not compatible with one specific model). The actual model classes are only imported when needed.
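A small sketch of the lazy behavior described above; the specific module path checked below is an internal detail used only for illustration (LayoutLMv2's modeling file is a convenient probe because it has heavy extra dependencies):

```python
import sys

# Importing the Auto classes no longer imports every model module.
from transformers import AutoConfig, AutoModel

# Model-specific modules are resolved lazily, so a concrete modeling
# file is absent from sys.modules right after the import above.
lazy_ok = "transformers.models.layoutlmv2.modeling_layoutlmv2" not in sys.modules
print(lazy_ok)  # True
```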
When loading certain kinds of corrupted model state dictionaries, the PreTrainedModel.from_pretrained method would sometimes silently ignore weights. This now raises a real error.
Pin git python to <3.1.19 #12858 (@patrickvonplaten)
[Sequence Feature Extraction] Add truncation #12804 (@patrickvonplaten)
add classifier_dropout to classification heads #12794 (@PhilipMay)
Add possibility to ignore imports in test_fecther #12801 (@sgugger)
Better heuristic for token-classification pipeline. #12611 (@Narsil)
Seq2SeqTrainer set max_length and num_beams only when non None #12899 (@cchen-dialpad)
[FLAX] Minor fixes in CLM example #12914 (@stefan-it)
Correct validation_split_percentage argument from int (ex:5) to float (0.05) #12897 (@Elysium1436)
Fix typo in the example of MobileBertForPreTraining #12919 (@buddhics)
Print defaults when using --help for scripts #12930 (@sgugger)
Add missing @classmethod decorators #12927 (@willfrey)
fix distiller.py #12910 (@chutaklee)
Fix docstring typo in tokenization_auto.py #12891 (@willfrey)
[Flax] Correctly Add MT5 #12988 (@patrickvonplaten)
ONNX v2 raises an Exception when using PyTorch < 1.8.0 #12933 (@mfuntowicz)
Moving feature-extraction pipeline to new testing scheme #12843 (@Narsil)
Add CpmTokenizerFast #12938 (@JetRunner)
Log Azure ML metrics only for rank 0 #12766 (@harshithapv)
Add multilingual documentation support #12952 (@JetRunner)
Fix division by zero in NotebookProgressPar #12953 (@sgugger)
[FLAX] Minor fixes in LM example #12947 (@stefan-it)
Prevent Trainer.evaluate() crash when using only tensorboardX #12963 (@aphedges)
Place BigBirdTokenizer in sentencepiece-only objects #12975 (@sgugger)
fix typo in example/text-classification README #12974 (@fullyz)
fix Trainer.train(resume_from_checkpoint=False) is causing an exception #12981 (@PhilipMay)
Cast logits from bf16 to fp32 at the end of TF_T5 #12332 (@szutenberg)
Update CANINE test #12453 (@NielsRogge)
pad_to_multiple_of added to DataCollatorForWholeWordMask #12999 (@Aktsvigun)
[Flax] Align jax flax device name #12987 (@patrickvonplaten)
[Flax] Correct flax docs #12782 (@patrickvonplaten)
T5: Create position related tensors directly on device instead of CPU #12846 (@armancohan)
Skip ProphetNet test #12462 (@LysandreJik)
GPT-Neo ONNX export #12911 (@michaelbenayoun)
Update generate method - Fix floor_divide warning #13013 (@nreimers)
[Flax] Correct pt to flax conversion if from base to head #13006 (@patrickvonplaten)
[Flax T5] Speed up t5 training #13012 (@patrickvonplaten)
FX submodule naming fix #13016 (@michaelbenayoun)
T5 with past ONNX export #13014 (@michaelbenayoun)
Fix ONNX test: Put smaller ALBERT model #13028 (@LysandreJik)
Use min version for huggingface-hub dependency #12961 (@lewtun)
tfhub.de -> tfhub.dev #12565 (@abhishekkrthakur)
[Flax] Refactor gpt2 & bert example docs #13024 (@patrickvonplaten)
Add MBART to models exportable with ONNX #13049 (@LysandreJik)
Add to ONNX docs #13048 (@LysandreJik)
Add try-except for torch_scatter #13040 (@JetRunner)
docs: add HuggingArtists to community notebooks #13050 (@AlekseyKorshuk)
Fix ModelOutput instantiation form dictionaries #13067 (@sgugger)
Revert to all tests whil we debug what's wrong #13072 (@sgugger)
Use original key for label in DataCollatorForTokenClassification #13057 (@ibraheem-moosa)
[Doctest] Setup, quicktour and task_summary #13078 (@sgugger)
Add VisualBERT demo notebook #12263 (@gchhablani)
Install git #13091 (@LysandreJik)
Fix classifier dropout in AlbertForMultipleChoice #13087 (@ibraheem-moosa)
Doctests job #13088 (@LysandreJik)
Fix VisualBert Embeddings #13017 (@gchhablani)
Reactive test fecthers on scheduled test with proper git install #13097 (@sgugger)
Change a parameter name in FlaxBartForConditionalGeneration.decode() #13074 (@ydshieh)
[Flax/JAX] Run jitted tests at every commit #13090 (@patrickvonplaten)
[FlaxCLIP] allow passing params to image and text feature methods #13099 (@patil-suraj)
Fix VisualBERT docs #13106 (@gchhablani)
Moving fill-mask pipeline to new testing scheme #12943 (@Narsil)
Fix omitted lazy import for xlm-prophetnet #13052 (@minwhoo)
Fix classifier dropout in bertForMultipleChoice #13129 (@mandelbrot-walker)
Fix frameworks table so it's alphabetical #13118 (@osanseviero)
[Feature Processing Sequence] Remove duplicated code #13051 (@patrickvonplaten)
Ci continue through smi failure #13140 (@LysandreJik)
Fix missing seq_len in electra model when inputs_embeds is used. #13128 (@sararb)
[AutoFeatureExtractor] Fix loading of local folders if config.json exists #13166 (@patrickvonplaten)
Fix generation docstrings regarding input_ids=None #12823 (@jvamvas)
Update namespaces inside torch.utils.data to the latest. #13167 (@qqaatw)
Fix the loss calculation of ProphetNet #13132 (@StevenTang1998)
Fix LUKE tests #13183 (@NielsRogge)
Add min and max question length options to TapasTokenizer #12803 (@NielsRogge)
SageMaker: Fix sagemaker DDP & metric logs #13181 (@philschmid)
correcting group beam search function output score bug #13211 (@sourabh112)
Change how "additional_special_tokens" argument in the ".from_pretrained" method of the tokenizer is taken into account #13056 (@SaulLu)
remove unwanted control-flow code from DeBERTa-V2 #13145 (@kamalkraj)
Add RemBert to AutoTokenizer #13224 (@LysandreJik)
Allow local_files_only for fast pretrained tokenizers #13225 (@BramVanroy)
fix AutoModel.from_pretrained(..., torch_dtype=...) #13209 (@stas00)
Bump notebook from 6.1.5 to 6.4.1 in /examples/research_projects/lxmert #13226 (@dependabot[bot])
Remove side effects of disabling gradient computaiton #13257 (@LysandreJik)
Replace assert statement with if condition and raise ValueError #13263 (@nishprabhu)
Better notification service #13267 (@LysandreJik)
Fix failing Hubert test #13261 (@LysandreJik)
Add CLIP tokenizer to AutoTokenizer #13258 (@LysandreJik)
Some model_types cannot be in the mapping #13259 (@LysandreJik)
Add require flax to MT5 Flax test #13260 (@LysandreJik)
Migrating conversational pipeline tests to new testing format #13114 (@Narsil)
fix tokenizer_class_from_name for models with - in the name #13251 (@stas00)
Add error message concerning revision #13266 (@BramVanroy)
Move image-classification pipeline to new testing #13272 (@Narsil)
[Hotfix] Fixing the test (warnings was incorrect.) #13278 (@Narsil)
Moving question_answering tests to the new testing scheme. Had to tweak a little some ModelTesterConfig for pipelines. #13277 (@Narsil)
Moving summarization pipeline to new testing format. #13279 (@Narsil)
Moving table-question-answering pipeline to new testing. #13280 (@Narsil)
Moving table-question-answering pipeline to new testing #13281 (@Narsil)
Moving text2text-generation to new pipeline testing mecanism #13283 (@Narsil)
Add DINO conversion script #13265 (@NielsRogge)
Moving text-generation pipeline to new testing framework. #13285 (@Narsil)
Moving token-classification pipeline to new testing. #13286 (@Narsil)
examples: add keep_linebreaks option to CLM examples #13150 (@stefan-it)
Moving translation pipeline to new testing scheme. #13297 (@Narsil)
Fix BeitForMaskedImageModeling #13275 (@NielsRogge)
Moving zero-shot-classification pipeline to new testing. #13299 (@Narsil)
Fixing mbart50 with return_tensors argument too. #13301 (@Narsil)
[Flax] Correct all return tensors to numpy #13307 (@patrickvonplaten)
examples: only use keep_linebreaks when reading TXT files #13320 (@stefan-it)
Slow tests - run rag token in half precision #13304 (@patrickvonplaten)
[Slow tests] Disable Wav2Vec2 pretraining test for now #13303 (@patrickvonplaten)
Announcing the default model used by the pipeline (with a link). #13276 (@Narsil)
use float 16 in causal mask and masked bias #13194 (@hwijeen)
Improve documentation of pooler_output in ModelOutput #13228 (@navjotts)
neptune.ai logger: add ability to connect to a neptune.ai run #13319 (@fcakyon)
Update label2id in the model config for run_glue #13334 (@sgugger)
Fall back to observed_batch_size when the dataloader does not know the batch_size. #13188 (@mbforbes)
Fixes #12941 where use_auth_token not been set up early enough #13205 (@bennimmo)
Correct wrong function signatures on the docs website #13198 (@qqaatw)
Add missing module spec #13321 (@laurahanu)
Use DS callable API to allow hf_scheduler + ds_optimizer #13216 (@tjruwase)
[Testing] Add Flax Tests on GPU, Add Speech and Vision to Flax & TF tests #13313 (@patrickvonplaten)
Fixing a typo in the data_collator documentation #13309 (@Serhiy-Shekhovtsov)
Add GPT2ForTokenClassification #13290 (@tucan9389)
Doc mismatch fixed #13345 (@Apoorvgarg-creator)
Handle nested dict/lists of tensors as inputs in the Trainer #13338 (@sgugger)
Set missing seq_length variable when using inputs_embeds with ALBERT & Remove code duplication #13152 (@olenmg)
TF CLM example fix typo #13002 (@Rocketknight1)
Add generate kwargs to Seq2SeqTrainingArguments #13339 (@sgugger)