release notes
Published 6/16/2022
This is a minor release and contains breaking changes.
You can now use the big model inference of Accelerate directly in any call to from_pretrained by specifying device_map="auto" (or your own device_map). The model will automatically be loaded across your GPU(s), with whatever doesn't fit on them offloaded to CPU RAM, or even to the hard drive if you don't have enough RAM. Your model can then be used normally for inference with nothing else to do.
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained(
"bigscience/T0pp", revision="sharded", device_map="auto"
)
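A short follow-up sketch, continuing from the snippet above (the prompt and the assumption that the first layers landed on GPU 0 are illustrative): the computed placement is stored on the model as hf_device_map, and generation then works as usual.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")

# Where each submodule ended up: a GPU index, "cpu", or "disk"
print(model.hf_device_map)

# Send the inputs to the device holding the first layers (assumed to be GPU 0 here)
inputs = tokenizer("Is this review positive or negative? Review: great movie!", return_tensors="pt").to(0)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))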
The BLOOM model has been proposed with its various versions through the BigScience Workshop. The architecture of BLOOM is essentially similar to GPT-3 (an auto-regressive model for next-token prediction), but it has been trained on 46 different languages, including code.
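A minimal usage sketch (the small bigscience/bloom-560m checkpoint and the prompt are illustrative assumptions; any BLOOM checkpoint on the Hub works the same way):
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloom-560m"  # illustrative; larger BLOOM variants load the same way
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("The BigScience Workshop trained BLOOM on", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))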
The Convolutional vision Transformer (CvT) improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.
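A minimal image-classification sketch, assuming the microsoft/cvt-13 checkpoint and a local image file (both illustrative):
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, CvtForImageClassification

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/cvt-13")
model = CvtForImageClassification.from_pretrained("microsoft/cvt-13")

image = Image.open("cat.jpg")  # any RGB image
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])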
GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models.
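At 20 billion parameters the fp16 weights come to roughly 40GB, so loading benefits from the big model inference described above; a sketch (the prompt and dtype choice are illustrative):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# Let Accelerate place layers across GPUs, CPU RAM and disk as needed
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b", device_map="auto", torch_dtype=torch.float16
)

inputs = tokenizer("GPT-NeoX-20B is an open-source language model that", return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))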
LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).
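A minimal sketch of feeding a document to the model, assuming pre-extracted words and boxes and the microsoft/layoutlmv3-base checkpoint (all illustrative; when apply_ocr is left enabled the processor runs OCR itself):
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification

processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = LayoutLMv3ForTokenClassification.from_pretrained("microsoft/layoutlmv3-base", num_labels=2)

image = Image.new("RGB", (224, 224), "white")  # stand-in for a document scan
words = ["Invoice", "total:", "42.00"]  # pre-extracted OCR words (illustrative)
boxes = [[10, 10, 80, 30], [10, 40, 60, 60], [70, 40, 120, 60]]  # boxes on a 0-1000 scale

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
outputs = model(**encoding)
print(outputs.logits.shape)  # (batch, sequence_length, num_labels)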
LeViT improves the Vision Transformer (ViT) in performance and efficiency by a few architectural differences such as activation maps with decreasing resolutions in Transformers and the introduction of an attention bias to integrate positional information.
The LongT5 model is an extension of the T5 model that enables using one of two efficient attention mechanisms: (1) local attention, or (2) transient-global attention. It can handle input sequences of up to 16,384 tokens.
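A sketch of running a long input through the model, assuming the google/long-t5-tglobal-base checkpoint (the repeated dummy text is only there to produce a long sequence, and this base checkpoint is not fine-tuned for summarization, so the output is just a placeholder):
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

long_document = "summarize: " + "The LongT5 model handles long inputs. " * 500
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=16384)
summary_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))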
The M-CTC-T model is a 1B-param transformer encoder, with a CTC head over 8065 character labels and a language identification head over 60 language ID labels. It is trained on Common Voice (version 6.1, December 2020 release) and VoxPopuli. After training on Common Voice and VoxPopuli, the model is trained on Common Voice only. The labels are unnormalized character-level transcripts (punctuation and capitalization are not removed). The model takes as input Mel filterbank features from a 16 kHz audio signal.
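A minimal transcription sketch, assuming the speechbrain/m-ctc-t-large checkpoint and a placeholder array of silence in place of real 16 kHz audio:
import torch
from transformers import MCTCTProcessor, MCTCTForCTC

processor = MCTCTProcessor.from_pretrained("speechbrain/m-ctc-t-large")
model = MCTCTForCTC.from_pretrained("speechbrain/m-ctc-t-large")

raw_speech = torch.zeros(16000).numpy()  # 1 second of 16 kHz audio (placeholder silence)
inputs = processor(raw_speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1)))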
The Trajectory Transformer is used for deep reinforcement learning. To use it, you need to create sequences from the actions, states and rewards of all previous timesteps. The model treats all these elements together as one big sequence (a trajectory).
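A plain-PyTorch sketch of what such a trajectory sequence looks like (the dimensions are illustrative, and the real model additionally discretizes these continuous values into tokens):
import torch

states = torch.randn(10, 17)   # 10 timesteps of 17-dim observations (e.g. a MuJoCo env)
actions = torch.randn(10, 6)   # 6-dim continuous actions
rewards = torch.randn(10, 1)   # scalar reward per timestep

# One trajectory = [s_0, a_0, r_0, s_1, a_1, r_1, ...] laid out as a single flat sequence
trajectory = torch.cat([states, actions, rewards], dim=-1).reshape(1, -1)
print(trajectory.shape)  # (1, 10 * (17 + 6 + 1)) = (1, 240)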
The Wav2Vec2-Conformer was added to an updated version of fairseq S2T: Fast Speech-to-Text Modeling with fairseq. It requires more parameters than Wav2Vec2, but also yields an improved word error rate.
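A minimal CTC transcription sketch, assuming the facebook/wav2vec2-conformer-rope-large-960h-ft checkpoint and placeholder silence in place of real 16 kHz audio:
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ConformerForCTC

checkpoint = "facebook/wav2vec2-conformer-rope-large-960h-ft"
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ConformerForCTC.from_pretrained(checkpoint)

speech = torch.zeros(16000).numpy()  # 1 second of 16 kHz mono audio (placeholder silence)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1)))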
Data2VecVision (for semantic segmentation), OPT and Swin are now available in TensorFlow.
OPT is now available in Flax.
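For instance (the facebook/opt-125m checkpoint and the prompt are illustrative), OPT can now be loaded in either framework:
from transformers import AutoTokenizer, TFOPTForCausalLM, FlaxOPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# TensorFlow (add from_pt=True if only PyTorch weights are hosted for a checkpoint)
tf_model = TFOPTForCausalLM.from_pretrained("facebook/opt-125m")
tf_ids = tf_model.generate(tokenizer("Hello, my name is", return_tensors="tf").input_ids, max_length=20)
print(tokenizer.decode(tf_ids[0], skip_special_tokens=True))

# Flax
flax_model = FlaxOPTForCausalLM.from_pretrained("facebook/opt-125m")
flax_ids = flax_model.generate(tokenizer("Hello, my name is", return_tensors="np").input_ids, max_length=20).sequences
print(tokenizer.decode(flax_ids[0], skip_special_tokens=True))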
A community effort has been started to translate the documentation into two new languages: Italian and Portuguese.
- BloomForSequenceClassification and BloomForTokenClassification classes by @haileyschoelkopf in #17639
- RepoNotFoundError when not authenticated by @SBrandeis in #17651
- float16 by @Narsil in #17637
- top_k argument to text-classification pipeline by @Narsil in #17606
- train_new_from_iterator in the case of byte-level tokenizers by @SaulLu in #17549
- pt-to-tf by @gante in #17588
- tokenizer type annotation in pipeline(...) by @willfrey in #17500
- PreTrainedTokenizerBase.add_tokens() by @Witiko in #17119
- test_inference_no_head by @ydshieh in #17395
- imageGPT auto feature extractor by @Narsil in #16871
- device_map="auto" to OPT by @sgugger in #17382
- batch_size test to QA pipeline by @Narsil in #17330
- max_seq_len in QA pipeline by @Narsil in #17316
- test_torch_encode_plus_sent_to_model by @SaulLu in #17231
The following contributors have made significant changes to the library over the last release: