release notes
Published 2/4/2021
Contains breaking changes.

Two new models are released as part of the Wav2Vec2 implementation: Wav2Vec2Model and Wav2Vec2ForMaskedLM, available in PyTorch.
Wav2Vec2 is a multi-modal model that combines speech and text. It is the first multi-modal model of its kind to be welcomed into Transformers.
The Wav2Vec2 model was proposed in wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=wav2vec2
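A minimal sketch of running the new model on raw audio. The configuration below is a hypothetical tiny setup chosen for illustration only; the released checkpoints on the Hub are far larger:

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Hypothetical tiny configuration, for illustration only -- real
# checkpoints from the Hub use much larger sizes.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(16, 16),    # two feature-extractor conv layers
    conv_kernel=(10, 3),
    conv_stride=(5, 2),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=4,
)
model = Wav2Vec2Model(config).eval()

# The model consumes the raw waveform directly: batch of 1,
# one second of 16 kHz audio.
waveform = torch.randn(1, 16000)
with torch.no_grad():
    hidden_states = model(waveform).last_hidden_state
print(hidden_states.shape)  # (batch, frames, hidden_size)
```

Note that, unlike text models, the input is a float waveform rather than token IDs; the convolutional feature extractor downsamples it into frames before the transformer layers run.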
Available notebooks:
Contributions:
Future Additions
The ConvBERT model was proposed in ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
Six new models are released as part of the ConvBERT implementation: ConvBertModel, ConvBertForMaskedLM, ConvBertForSequenceClassification, ConvBertForTokenClassification, ConvBertForQuestionAnswering and ConvBertForMultipleChoice. These models are available both in PyTorch and TensorFlow.
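As a sketch of the new PyTorch classes, a ConvBertModel can be instantiated and run on token IDs like any BERT-style encoder. The tiny configuration below is hypothetical, for illustration only:

```python
import torch
from transformers import ConvBertConfig, ConvBertModel

# Hypothetical tiny configuration; released checkpoints are full-sized.
config = ConvBertConfig(
    vocab_size=100,
    embedding_size=64,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
)
model = ConvBertModel(config).eval()

input_ids = torch.tensor([[1, 5, 9, 2, 7, 3, 8, 4]])  # batch of 1, 8 tokens
with torch.no_grad():
    hidden_states = model(input_ids).last_hidden_state
print(hidden_states.shape)  # (batch, seq_len, hidden_size)
```

The task-specific classes (ConvBertForMaskedLM, ConvBertForSequenceClassification, etc.) accept the same configuration and inputs, adding the corresponding head on top.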
Contributions:
The BORT model was proposed in Optimal Subarchitecture Extraction for BERT by Amazon's Adrian de Wynter and Daniel J. Perry. It is an optimal subset of architectural parameters for the BERT architecture, which the authors refer to as "Bort".
The BORT model can be loaded directly into the BERT architecture; therefore, all BERT model heads are available for BORT.
Contributions:
When running a script with Trainer on Amazon SageMaker with SageMaker's data parallelism library enabled, Trainer will automatically use the smdistributed library. All maintained examples have been tested with this functionality. Here is an overview of the SageMaker data parallelism library.
A new Community Page has been added to the docs. It contains all the notebooks contributed by the community, as well as some community projects built around Transformers. Feel free to open a PR if you want your project to be showcased!
DeBERTa now has more model heads available.
BART, mBART, Marian, Pegasus and Blenderbot now have decoder-only model architectures. They can therefore be used in decoder-only settings.
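As a sketch of the decoder-only setting, a causal-LM variant such as BartForCausalLM can be instantiated and run on token IDs alone, with no encoder input. The tiny configuration below is hypothetical, for illustration only:

```python
import torch
from transformers import BartConfig, BartForCausalLM

# Hypothetical tiny configuration, for illustration only.
config = BartConfig(
    vocab_size=100,
    d_model=32,
    decoder_layers=2,
    decoder_attention_heads=2,
    decoder_ffn_dim=64,
    encoder_layers=1,  # encoder settings are unused by the decoder-only head
    encoder_attention_heads=2,
    encoder_ffn_dim=64,
)
model = BartForCausalLM(config).eval()

# Decoder-only usage: plain token IDs in, next-token logits out.
input_ids = torch.tensor([[2, 5, 9, 7, 3, 8]])
with torch.no_grad():
    logits = model(input_ids).logits
print(logits.shape)  # (batch, seq_len, vocab_size)
```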
None.
- past_key_values in GPT-2 #9596 (@forest1988)
- report_to training arguments to control the integrations used #9735 (@sgugger)
- Trainer.hyperparameter_search docstring #9762 (@sorami)
- skip_special_tokens=True to FillMaskPipeline #9783 (@Narsil)
- test_head_masking = True flags in test files #9858 (@stancld)
- return_full_text parameter to TextGenerationPipeline #9852 (@Narsil)
- from_slow in fast tokenizers build and fixes some bugs #9987 (@sgugger)
- encoder_no_repeat_ngram_size to generate #9984 (@Narsil)
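For instance, encoder_no_repeat_ngram_size can now be passed to generate() to forbid the decoder from copying any n-gram that appears in the encoder input. The sketch below uses a hypothetical, randomly initialized tiny BART, so the generated tokens are meaningless; it only illustrates the argument:

```python
import torch
from transformers import BartConfig, BartForConditionalGeneration

# Hypothetical tiny, randomly initialized model -- for illustrating the
# generate() argument only; meaningful output requires a trained checkpoint.
config = BartConfig(
    vocab_size=50,
    d_model=32,
    encoder_layers=1,
    decoder_layers=1,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=64,
    decoder_ffn_dim=64,
)
model = BartForConditionalGeneration(config).eval()

input_ids = torch.tensor([[0, 10, 11, 12, 2]])
# encoder_no_repeat_ngram_size=2 bans any bigram from the encoder input
# from being repeated in the decoded sequence.
out = model.generate(input_ids, max_length=10, encoder_no_repeat_ngram_size=2)
print(out.shape)
```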