v4.31.0

release notes

Published 7/18/2023

MinorContains breaking changes

Release notes

New models

Llama v2

Llama 2 was proposed in LLaMA: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron et al. It builds upon the Llama architecture adding Grouped Query Attention for efficient inference.

Add support for Llama 2 by @ArthurZucker in #24891

Musicgen

The MusicGen model was proposed in the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.

MusicGen is a single stage auto-regressive Transformer model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. The text descriptions are passed through a frozen text encoder model to obtain a sequence of hidden-state representations. MusicGen is then trained to predict discrete audio tokens, or audio codes, conditioned on these hidden-states. These audio tokens are then decoded using an audio compression model, such as EnCodec, to recover the audio waveform.

Through an efficient token interleaving pattern, MusicGen does not require a self-supervised semantic representation of the text/audio prompts, thus eliminating the need to cascade multiple models to predict a set of codebooks (e.g. hierarchically or upsampling). Instead, it is able to generate all the codebooks in a single forward pass.

Add Musicgen by @sanchit-gandhi in #24109

Bark

Bark is a transformer-based text-to-speech model proposed by Suno AI in suno-ai/bark.

Add bark by @ylacombe in #24086

MMS

The MMS model was proposed in Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Add MMS CTC Fine-Tuning by @patrickvonplaten in #24281

EnCodec

The EnCodec neural codec model was proposed in High Fidelity Neural Audio Compression by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.

Add EnCodec model by @hollance in #23655

InstructBLIP

The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.

Add InstructBLIP by @NielsRogge in #23460

Umt5

The UMT5 model was proposed in UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.

[Umt5] Add google's umt5 to transformers by @ArthurZucker in #24477

MRA

The MRA model was proposed in Multi Resolution Analysis (MRA) for Approximate Self-Attention by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, and Vikas Singh.

Add Multi Resolution Analysis (MRA) by @novice03 in #24513

ViViT

The Vivit model was proposed in ViViT: A Video Vision Transformer by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid. The paper proposes one of the first successful pure-transformer based set of models for video understanding.

Add ViViT by @jegork in #22518

Python 3.7

The last version to support Python 3.7 was 4.30.x, as it reached end-of-life on June 27, 2023 and is no longer supported by the Python Software Foundation.

⚠️ Time to say goodbye to py37 by @ydshieh in #24091

PyTorch 1.9

The last version to support PyTorch 1.9 was 4.30.x. As it has been more than 2 years, and we're looking forward to using features available in PyTorch 1.10 and up, we do not support PyTorch 1.9 for v4.31 and up.

Byebye pytorch 1.9 by @ydshieh in #24080

RoPE scaling

This PR adds RoPE scaling to the LLaMa and GPTNeoX families of models. It allows us to extrapolate and go beyond the original maximum sequence length (e.g. 2048 tokens on LLaMA), without fine-tuning. It offers two strategies:

Linear scaling
Dynamic NTK scaling

Llama/GPTNeoX: add RoPE scaling by @gante in #24653

Agents

Tools now return a type that is specific to agents. This type can return a serialized version of itself (a string), that either points to a file on-disk or to the object's content. This should make interaction with text-based systems much simpler.

Tool types by @LysandreJik in #24032

Tied weights load

Models with potentially tied weights dropped off some keys from the state dict even when the weights were not tied. This has now been fixed and more generally, the whole experience of loading a model with state dict that don't match exactly should be improved in this release.

Tied weights load by @sgugger in #24310
Clean load keys by @sgugger in #24505

Whisper word-level timestamps

This PR adds a method of predicting timestamps at the word (or even token) level, by analyzing the cross-attentions and applying dynamic time warping.

add word-level timestamps to Whisper by @hollance in #23205

Auto model addition

A new auto model is added, AutoModelForTextEncoding. It is to be used when you want to extract the text encoder from an encoder-decoder architecture.

[AutoModel] Add AutoModelForTextEncoding by @sanchit-gandhi in #24305

Model deprecation

Transformers is growing a lot and to ease a bit the burden of maintenance on our side, we have taken the decision to deprecate models that are not used a lot. Those models will never actually disappear from the library, but we will stop testing them or accepting PRs modifying them. (enfin ça The criteria to identify models to deprecate was less than 1,000 unique downloads in the last 30 days for models that are at least one year old. The list of deprecated models is:

BORT
M-CTC-T
MMBT
RetriBERT
TAPEX
Trajectory Transformer
VAN

Deprecate models by @sgugger in #24787

Breaking changes

Fixes an issue with stripped spaces for the T5 family tokenizers. If this impacts negatively inference/training with your models, please let us know by opening an issue.

⚠️⚠️[T5Tokenize] Fix T5 family tokenizers⚠️⚠️ by @ArthurZucker in #24565

Bugfixes and improvements

add trust_remote_code option to CLI download cmd by @radames in #24097
Fix typo in Llama docstrings by @Kh4L in #24020
Avoid GPT-2 daily CI job OOM (in TF tests) by @ydshieh in #24106
[Lllama] Update tokenization code to ensure parsing of the special tokens [core] by @ArthurZucker in #24042
PLAM => PaLM by @xingener in #24129
[bnb] Fix bnb config json serialization by @younesbelkada in #24137
Correctly build models and import call_context for older TF versions by @Rocketknight1 in #24138
Generate: PT's top_p enforces min_tokens_to_keep when it is 1 by @gante in #24111
fix bugs with trainer by @pacman100 in #24134
Fix TF Rag OOM issue by @ydshieh in #24122
Fix SAM OOM issue on CI by @ydshieh in #24125
Fix XGLM OOM on CI by @ydshieh in #24123
[SAM] Fix sam slow test by @younesbelkada in #24140
[lamaTokenizerFast] Update documentation by @ArthurZucker in #24132
[BlenderBotSmall] Update doc example by @ArthurZucker in #24092
Fix Pipeline CI OOM issue by @ydshieh in #24124
[documentation] grammatical fixes in image_classification.mdx by @LiamSwayne in #24141
Fix typo in streamers.py by @freddiev4 in #24144
[tests] fix bitsandbytes import issue by @stas00 in #24151
Avoid OOM in doctest CI by @ydshieh in #24139
Fix Wav2Vec2 CI OOM by @ydshieh in #24190
Fix push to hub by @NielsRogge in #24187
Change ProgressCallback to use dynamic_ncols=True by @gmlwns2000 in #24101
[i18n]Translated "attention.mdx" to korean by @kihoon71 in #23878
Generate: force caching on the main model, in assisted generation by @gante in #24177
Fix device issue in OpenLlamaModelTest::test_model_parallelism by @ydshieh in #24195
Update GPTNeoXLanguageGenerationTest by @ydshieh in #24193
typo: fix typos in CONTRIBUTING.md and deepspeed.mdx by @zsj9509 in #24184
Generate: detect special architectures when loaded from PEFT by @gante in #24198
🌐 [i18n-KO] Translated tasks_summary.mdx to Korean by @kihoon71 in #23977
🚨🚨🚨 Replace DataLoader logic for Accelerate in Trainer, remove unneeded tests 🚨🚨🚨 by @muellerzr in #24028
Fix _load_pretrained_model by @SunMarc in #24200
Fix steps bugs in no trainer examples by @Ethan-yt in #24197
Skip RWKV test in past CI by @ydshieh in #24204
Remove unnecessary aten::to overhead in llama by @fxmarty in #24203
Update WhisperForAudioClassification doc example by @ydshieh in #24188
Finish dataloader integration by @muellerzr in #24201
Add the number of model test failures to slack CI report by @ydshieh in #24207
fix: TextIteratorStreamer cannot work with pipeline by @yuanwu2017 in #23641
Update (TF)SamModelIntegrationTest by @ydshieh in #24199
Improving error message when using use_safetensors=True. by @Narsil in #24232
Safely import pytest in testing_utils.py by @amyeroberts in #24241
fix overflow when training mDeberta in fp16 by @sjrl in #24116
deprecate use_mps_device by @pacman100 in #24239
Tied params cleanup by @sgugger in #24211
[Time Series] use mean scaler when scaling is a boolean True by @kashif in #24237
TF: standardize test_model_common_attributes for language models by @gante in #23457
Generate: GenerationConfig can overwrite attributes at from_pretrained time by @gante in #24238
Add torch >=1.12 requirement for Tapas by @ydshieh in #24251
Update urls in warnings for rich rendering by @IvanReznikov in #24136
Fix how we detect the TF package by @Rocketknight1 in #24255
Stop storing references to bound methods via tf.function by @Rocketknight1 in #24146
Skip GPT-J fx tests for torch < 1.12 by @ydshieh in #24256
docs wrt using accelerate launcher with trainer by @pacman100 in #24250
update FSDP save and load logic by @pacman100 in #24249
Fix URL in comment for contrastive loss function by @taepd in #24271
QA doc: import torch before it is used by @ByronHsu in #24228
Skip some TQAPipelineTests tests in past CI by @ydshieh in #24267
TF: CTRL with native embedding layers by @gante in #23456
Adapt Wav2Vec2 conversion for MMS lang identification by @patrickvonplaten in #24234
Update check of core deps by @sgugger in #24277
Pix2StructImageProcessor requires torch>=1.11.0 by @ydshieh in #24270
Fix Debertav2 embed_proj by @WissamAntoun in #24205
Clean up old Accelerate checks by @sgugger in #24279
Fix bug in slow tokenizer conversion, make it a lot faster by @stephantul in #24266
Fix check_config_attributes: check all configuration classes by @ydshieh in #24231
Fix LLaMa beam search when using parallelize by @FeiWang96 in #24224
remove unused is_decoder parameter in DetrAttention by @JayL0321 in #24226
Split common test from core tests by @sgugger in #24284
[fix] bug in BatchEncoding.getitem by @flybird1111 in #24293
Fix image segmentation tool bug by @amyeroberts in #23897
[Docs] Improve docs for MMS loading of other languages by @patrickvonplaten in #24292
Update README_zh-hans.md by @CooperFu in #24181
deepspeed init during eval fix by @pacman100 in #24298
[EnCodec] Changes for 32kHz ckpt by @sanchit-gandhi in #24296
[Docs] Fix the paper URL for MMS model by @hitchhicker in #24302
Update tokenizer_summary.mdx (grammar) by @belladoreai in #24286
Beam search type by @jprivera44 in #24288
Make can_generate as class method by @ydshieh in #24299
Update test versions on README.md by @sqali in #24307
[SwitchTransformers] Fix return values by @ArthurZucker in #24300
Fix functional TF Whisper and modernize tests by @Rocketknight1 in #24301
Big TF test cleanup by @Rocketknight1 in #24282
Fix ner average grouping with no groups by @Narsil in #24319
Fix ImageGPT doc example by @amyeroberts in #24317
Add test for proper TF input signatures by @Rocketknight1 in #24320
Adding ddp_broadcast_buffers argument to Trainer by @TevenLeScao in #24326
error bug on saving distributed optim state when using data parallel by @xshaun in #24108
🌐 [i18n-KO] Fixed tutorial/preprocessing.mdx by @sim-so in #24156
pin apex to a speicifc commit (for DeepSpeed CI docker image) by @ydshieh in #24351
byebye Hub connection timeout by @ydshieh in #24350
Clean up disk sapce during docker image build for transformers-pytorch-gpu by @ydshieh in #24346
Fix KerasMetricCallback: pass generate_kwargs even if use_xla_generation is False by @Kripner in #24333
Fix device issue in SwitchTransformers by @ydshieh in #24352
Update MMS integration docs by @vineelpratap in #24311
Make AutoFormer work with previous torch version by @ydshieh in #24357
Fix ImageGPT doctest by @amyeroberts in #24353
Fix link to documentation in Install from Source by @SoyGema in #24336
docs: add BentoML to awesome-transformers by @aarnphm in #24344
[Doc Fix] Fix model name path in the transformers doc for AutoClasses by @riteshghorse in #24329
Fix the order in GPTNeo's docstring by @qgallouedec in #24358
Respect explicitly set framework parameter in pipeline by @denis-ismailaj in #24322
Allow passing kwargs through to TFBertTokenizer by @Rocketknight1 in #24324
Fix resuming PeftModel checkpoints in Trainer by @llohann-speranca in #24274
TensorFlow CI fixes by @Rocketknight1 in #24360
Update tiny models for pipeline testing. by @ydshieh in #24364
[modelcard] add audio classification to task list by @sanchit-gandhi in #24363
[Whisper] Make tests faster by @sanchit-gandhi in #24105
Rename test to be more accurate by @sgugger in #24374
Add a check in ImageToTextPipeline._forward by @ydshieh in #24373
[Tokenizer doc] Clarification about add_prefix_space by @ArthurZucker in #24368
style: add BitsAndBytesConfig repr function by @aarnphm in #24331
Better test name and enable pipeline test for pix2struct by @ydshieh in #24377
Skip a tapas (tokenization) test in past CI by @ydshieh in #24378
[Whisper Docs] Nits by @ArthurZucker in #24367
[GPTNeoX] Nit in config by @ArthurZucker in #24349
[Wav2Vec2 - MMS] Correct directly loading adapters weights by @patrickvonplaten in #24335
Migrate doc files to Markdown. by @sgugger in #24376
Update deprecated torch.ger by @kit1980 in #24387
[docs] Fix NLLB-MoE links by @stevhliu in #24388
Add ffmpeg for doc_test_job on CircleCI by @ydshieh in #24397
byebye Hub connection timeout - Recast by @ydshieh in #24399
fix type annotation for debug arg by @Bearnardd in #24033
[Trainer] Fix optimizer step on PyTorch TPU by @cowanmeg in #24389
Fix gradient checkpointing + fp16 autocast for most models by @younesbelkada in #24247
Clean up dist import by @muellerzr in #24402
Check auto mappings could be imported via from transformers by @ydshieh in #24400
Remove redundant code from TrainingArgs by @muellerzr in #24401
Explicit arguments in from_pretrained by @ydshieh in #24306
[ASR pipeline] Check for torchaudio by @sanchit-gandhi in #23953
TF safetensors reduced mem usage by @Rocketknight1 in #24404
Skip test_conditional_generation_pt_pix2struct in Past CI (torch < 1.11) by @ydshieh in #24417
[bnb] Fix bnb serialization issue with new release by @younesbelkada in #24416
Revert "Fix gradient checkpointing + fp16 autocast for most models" by @younesbelkada in #24420
Fix save_cache version in config.yml by @ydshieh in #24419
Update RayTune doc link for Hyperparameter tuning by @JoshuaEPSamuel in #24422
TF CI fix for Segformer by @Rocketknight1 in #24426
Refactor hyperparameter search backends by @alexmojaki in #24384
Clarify batch size displayed when using DataParallel by @sgugger in #24430
Save site-packages as cache in CircleCI job by @ydshieh in #24424
[llama] Fix comments in weights converter by @weimingzha0 in #24436
[Trainer] Fix .to call on 4bit models by @younesbelkada in #24444
fix the grad_acc issue at epoch boundaries by @pacman100 in #24415
Replace python random with torch.rand to enable dynamo.export by @BowenBao in #24434
Fix typo by @siryuon in #24440
Fix some TFWhisperModelIntegrationTests by @ydshieh in #24428
fixes issue when saving fsdp via accelerate's FSDP plugin by @pacman100 in #24446
Allow dict input for audio classification pipeline by @sanchit-gandhi in #23445
Update JukeboxConfig.from_pretrained by @ydshieh in #24443
Improved keras imports by @Rocketknight1 in #24448
add missing alignment_heads to Whisper integration test by @hollance in #24487
Fix tpu_metrics_debug by @cowanmeg in #24452
Update AlbertModel type annotation by @amyeroberts in #24450
[pipeline] Fix str device issue by @younesbelkada in #24396
when resume from peft checkpoint, the model should be trainable by @sywangyi in #24463
deepspeed z1/z2 state dict fix by @pacman100 in #24489
Update InstructBlipModelIntegrationTest by @ydshieh in #24490
Update token_classification.md by @condor-cp in #24484
Add support for for loops in python interpreter by @sgugger in #24429
[InstructBlip] Add accelerate support for instructblip by @younesbelkada in #24488
Compute dropout_probability only in training mode by @ydshieh in #24486
Fix 'local_rank' AttiributeError in Trainer class by @mocobeta in #24297
Compute dropout_probability only in training mode (SpeechT5) by @ydshieh in #24498
Fix link in utils by @SoyGema in #24501
🚨🚨 Fix group beam search by @hukuda222 in #24407
Generate: group_beam_search requires diversity_penalty>0.0 by @gante in #24456
Generate: min_tokens_to_keep has to be >= 1 by @gante in #24453
Fix TypeError: Object of type int64 is not JSON serializable by @xiaoli in #24340
Fix poor past ci by @ydshieh in #24485
🌐 [i18n-KO] Translated tflite.mdx to Korean by @0525hhgus in #24435
use accelerate autocast in jit eval path, since mix precision logic is… by @sywangyi in #24460
Update hyperparameter_search.py by @pacman100 in #24515
[T5] Add T5ForQuestionAnswering and MT5ForQuestionAnswering by @sjrl in #24481
set model to training mode before accelerate.prepare by @sywangyi in #24520
Update huggingface_hub commit sha by @ydshieh in #24527
Find module name in an OS-agnostic fashion by @sgugger in #24526
Fix LR scheduler based on bs from auto bs finder by @muellerzr in #24521
[Mask2Former] Remove SwinConfig by @NielsRogge in #24259
Allow backbones not in backbones_supported - Maskformer Mask2Former by @amyeroberts in #24532
Fix Typo by @tony9402 in #24530
Finishing tidying keys to ignore on load by @sgugger in #24535
Add bitsandbytes support for gpt2 models by @DarioSucic in #24504
⚠️ Time to say goodbye to py37 by @ydshieh in #24091
Unpin DeepSpeed and require DS >= 0.9.3 by @ydshieh in #24541
Allow for warn_only selection in enable_full_determinism by @Frank995 in #24496
Fix typing annotations for FSDP and DeepSpeed in TrainingArguments by @mryab in #24549
Update PT/TF weight conversion after #24030 by @ydshieh in #24547
Update EncodecIntegrationTest by @ydshieh in #24553
[gpt2-int8] Add gpt2-xl int8 test by @younesbelkada in #24543
Fix processor init bug if image processor undefined by @amyeroberts in #24554
[InstructBlip] Add instruct blip int8 test by @younesbelkada in #24555
Update PT/Flax weight conversion after #24030 by @ydshieh in #24556
Make PT/Flax tests could be run on GPU by @ydshieh in #24557
Update masked_language_modeling.md by @condor-cp in #24560
Fixed OwlViTModel inplace operations by @pasqualedem in #24529
Update old existing feature extractor references by @amyeroberts in #24552
Fix Typo by @tony9402 in #24559
Fix annotations by @tony9402 in #24571
Docs: 4 bit doc corrections by @gante in #24572
Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments" by @sgugger in #24574
Update some torchscript tests after #24505 by @ydshieh in #24566
Removal of deprecated vision methods and specify deprecation versions by @amyeroberts in #24570
Fix ESM models buffers by @sgugger in #24576
Check all objects are equally in the main __init__ file by @ydshieh in #24573
Fix annotations by @tony9402 in #24582
fix peft ckpts not being pushed to hub by @pacman100 in #24578
Udate link to RunHouse hardware setup documentation. by @BioGeek in #24590
Show a warning for missing attention masks when pad_token_id is not None by @hackyon in #24510
Make (TF) CI faster (test only a subset of model classes) by @ydshieh in #24592
Speed up TF tests by reducing hidden layer counts by @Rocketknight1 in #24595
[several models] improve readability by @stas00 in #24585
Use protobuf 4 by @ydshieh in #24599
Limit Pydantic to V1 in dependencies by @lig in #24596
🌐 [i18n-KO] Translated perplexity.mdx to Korean by @HanNayeoniee in #23850
[Time-Series] Added blog-post to tips by @elisim in #24482
Pin Pillow for now by @ydshieh in #24633
Fix loading dataset docs link in run_translation.py example by @SoyGema in #24594
Generate: multi-device support for contrastive search by @gante in #24635
Generate: force cache with inputs_embeds forwarding by @gante in #24639
precompiled_charsmap checking before adding to the normalizers' list for XLNetTokenizerFast conversion. by @shahad-mahmud in #24618
Fix audio feature extractor deps by @sanchit-gandhi in #24636
llama fp16 torch.max bug fix by @prathikr in #24561
documentation_tests.txt - sort filenames alphabetically by @amyeroberts in #24647
Update warning messages reffering to post_process_object_detection by @rafaelpadilla in #24649
Add finetuned_from property in the autogenerated model card by @sgugger in #24528
Make warning disappear for remote code in pipelines by @sgugger in #24603
Fix EncodecModelTest::test_multi_gpu_data_parallel_forward by @ydshieh in #24663
Fix VisionTextDualEncoderIntegrationTest by @ydshieh in #24661
Add is_torch_mps_available function to utils by @NripeshN in #24660
Unpin huggingface_hub by @ydshieh in #24667
Fix model referenced and results in documentation. Model mentioned was inaccessible by @rafaelpadilla in #24609
Add Nucleotide Transformer notebooks and restructure notebook list by @Rocketknight1 in #24669
LlamaTokenizer should be picklable by @icyblade in #24681
Add dropouts to GPT-NeoX by @ZHAOTING in #24680
DeepSpeed/FSDP ckpt saving utils fixes and FSDP training args fixes by @pacman100 in #24591
Avoid import sentencepiece_model_pb2 in utils.__init__.py by @ydshieh in #24689
Fix integration with Accelerate and failing test by @muellerzr in #24691
[MT5] Fix CONFIG_MAPPING issue leading it to load umt5 class by @ArthurZucker in #24678
Fix flaky test_for_warning_if_padding_and_no_attention_mask by @ydshieh in #24706
Whisper: fix prompted max length by @gante in #24666
Enable conversational pipeline for GPTSw3Tokenizer by @saattrupdan in #24648
[T5] Adding model_parallel = False to T5ForQuestionAnswering and MT5ForQuestionAnswering by @sjrl in #24684
Docs: change some input_ids doc reference from BertTokenizer to AutoTokenizer by @gante in #24730
add link to accelerate doc by @SunMarc in #24601
[Patch-t5-tokenizer] Patches the changes on T5 to make sure previous behaviour is still valide for beginning of words by @ArthurZucker in #24622
Fix typo in LocalAgent by @jamartin9 in #24736
fix: Text splitting in the BasicTokenizer by @connor-henderson in #22280
Docs: add kwargs type to fix formatting by @gante in #24733
add gradient checkpointing for distilbert by @jordane95 in #24719
Skip keys not in the state dict when finding mismatched weights by @sgugger in #24749
Fix non-deterministic Megatron-LM checkpoint name by @janEbert in #24674
[InstructBLIP] Fix bos token of LLaMa checkpoints by @NielsRogge in #24492
Skip some slow tests for doctesting in PRs (Circle)CI by @ydshieh in #24753
Fix lr scheduler not being reset on reruns by @muellerzr in #24758
🐛 Handle empty gen_kwargs for seq2seq trainer prediction_step function by @gkumbhat in #24759
Allow existing configs to be registered by @sgugger in #24760
Unpin protobuf in docker file (for daily CI) by @ydshieh in #24761
Fix eval_accumulation_steps leading to incorrect metrics by @muellerzr in #24756
Add MobileVitV2 to doctests by @amyeroberts in #24771
Docs: Update logit processors call docs by @gante in #24729
Replacement of 20 asserts with exceptions by @Baukebrenninkmeijer in #24757
Update default values of bos/eos token ids in CLIPTextConfig by @ydshieh in #24773
Fix pad across processes dim in trainer and not being able to set the timeout by @muellerzr in #24775
gpt-bigcode: avoid zero_ to support Core ML by @pcuenca in #24755
Remove WWT from README by @LysandreJik in #24672
Rm duplicate pad_across_processes by @muellerzr in #24780
Revert "Unpin protobuf in docker file (for daily CI)" by @ydshieh in #24800
Removing unnecessary device=device in modeling_llama.py by @Liyang90 in #24696
[fix] Change the condition of ValueError in "convert_checkpoint_from_transformers_to_megatron" by @SeongBeomLEE in #24769
[DOC] Clarify relationshi load_best_model_at_end and save_total_limit by @BramVanroy in #24614
Upgrade jax/jaxlib/flax pin versions by @ydshieh in #24791
Fix MobileVitV2 doctest checkpoint by @amyeroberts in #24805
Skip torchscript tests for MusicgenForConditionalGeneration by @ydshieh in #24782
Generate: add SequenceBiasLogitsProcessor by @gante in #24334
Add accelerate version in transformers-cli env by @amyeroberts in #24806
Fix typo 'submosules' by @dymil in #24809
Remove Falcon docs for the release until TGI is ready by @Rocketknight1 in #24808
Update setup.py to be compatible with pipenv by @georgiemathews in #24789
Use _BaseAutoModelClass's register method by @fadynakhla in #24810
Run hub tests by @sgugger in #24807
Copy code when using local trust remote code by @sgugger in #24785
Fixing double use_auth_token.pop (preventing private models from being visible). by @Narsil in #24812
set correct model input names for gptsw3tokenizer by @DarioSucic in #24788
Check models used for common tests are small by @sgugger in #24824
[🔗 Docs] Fixed Incorrect Migration Link by @kadirnar in #24793
deprecate sharded_ddp training argument by @statelesshz in #24825
🌐 [i18n-KO] Translated custom_tools.mdx to Korean by @sim-so in #24580
Remove unused code in GPT-Neo by @namespace-Pt in #24826
Add Multimodal heading and Document question answering in task_summary.mdx by @y3sar in #23318
Fix is_vision_available by @ydshieh in #24853
Fix comments for _merge_heads by @bofenghuang in #24855
fix broken links in READMEs by @younesbelkada in #24861
Add TAPEX to the list of deprecated models by @sgugger in #24859
Fix token pass by @sgugger in #24862

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@hollance
- [WIP] add EnCodec model (#23655)
- add word-level timestamps to Whisper (#23205)
- add missing alignment_heads to Whisper integration test (#24487)
@sim-so
- 🌐 [i18n-KO] Fixed tutorial/preprocessing.mdx (#24156)
- 🌐 [i18n-KO] Translated custom_tools.mdx to Korean (#24580)
@novice03
- Add Multi Resolution Analysis (MRA) (New PR) (#24513)
@jegork
- Add ViViT (#22518)