release notes
Published 8/22/2023
Minor release. Contains breaking changes.

The IDEFICS model was proposed in OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh
IDEFICS is the first open state-of-the-art visual language model at the 80B scale!
The model accepts arbitrary sequences of image and text and produces text, similarly to a multimodal ChatGPT.
Blogpost: hf.co/blog/idefics Playground: HuggingFaceM4/idefics_playground
MPT has been added and is now officially supported within Transformers. The repositories from MosaicML have been updated to work best with the model integration within Transformers.
[MPT] Add MosaicML's MPT model to transformers by @ArthurZucker & @younesbelkada in #24629

GPTQ quantization is now supported in Transformers, through the optimum library. The backend relies on the auto_gptq library, from which we use the GPTQ and QuantLinear classes.
See below for an example of the API, quantizing a model using the new GPTQConfig configuration utility.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer, group_size=128, desc_act=False)
# works also with device_map (cpu offload works but not disk offload)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, quantization_config=config)
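To build intuition for what the bits and group_size parameters control, here is a toy round-to-nearest group-wise quantizer in plain NumPy. Note that this is not the actual GPTQ algorithm, which additionally compensates quantization error using second-order information; the helper below is purely illustrative.

```python
import numpy as np

def quantize_groupwise(w, bits=4, group_size=8):
    # Round-to-nearest asymmetric quantization, one scale/zero-point per group.
    qmax = 2 ** bits - 1
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / qmax
    scale[scale == 0] = 1.0          # guard against constant groups
    zero = np.round(-lo / scale)
    q = np.clip(np.round(groups / scale + zero), 0, qmax)  # integer codes
    return ((q - zero) * scale).reshape(w.shape)           # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
w_dq = quantize_groupwise(w, bits=4, group_size=8)
err = np.abs(w - w_dq).max()         # small reconstruction error
```

Lower bits shrinks storage but coarsens the grid; a smaller group_size gives each scale fewer weights to cover, reducing error at the cost of more metadata.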
Most models under the TheBloke namespace with the GPTQ suffix should be supported. For example, to load the GPTQ-quantized model TheBloke/Llama-2-13B-chat-GPTQ, simply run (after installing the latest optimum and auto-gptq libraries):
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "TheBloke/Llama-2-13B-chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
For more information about this feature, we recommend taking a look at the following announcement blogpost: https://huggingface.co/blog/gptq-integration
A new pipeline, dedicated to text-to-audio and text-to-speech models, has been added to Transformers. It currently supports the 3 text-to-audio models integrated into transformers: SpeechT5ForTextToSpeech, MusicGen and Bark.
See below for an example:
from transformers import pipeline
pipe = pipeline(model="suno/bark")
output = pipe("Hey it's HuggingFace on the phone!")
audio = output["audio"]
sampling_rate = output["sampling_rate"]
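The returned dictionary can be written straight to disk. As a sketch that avoids downloading the model, the snippet below fakes the pipeline output with a 440 Hz sine wave (a hypothetical stand-in for output["audio"] and output["sampling_rate"]) and saves it as 16-bit PCM using only the standard-library wave module:

```python
import math
import struct
import wave

# Hypothetical stand-in for the pipeline result:
sampling_rate = 16000
audio = [math.sin(2 * math.pi * 440 * t / sampling_rate) for t in range(sampling_rate)]

# Scale the float waveform in [-1, 1] to 16-bit PCM and write a mono WAV file.
with wave.open("speech.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)               # 2 bytes = 16-bit samples
    f.setframerate(sampling_rate)
    f.writeframes(struct.pack(f"<{len(audio)}h", *(int(s * 32767) for s in audio)))
```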
Classifier-Free Guidance decoding is a text generation technique developed by EleutherAI, announced in this paper. With this technique, you can increase prompt adherence in generation. You can also set it up with negative prompts, ensuring your generation doesn't go in specific directions. See its docs for usage instructions.
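The core of the technique is simple logit arithmetic: the model is run with and without the prompt (or with a negative prompt), and the two distributions are combined at each decoding step. A minimal NumPy sketch of the usual CFG formula, with made-up logit values for illustration:

```python
import numpy as np

def cfg_logits(cond, uncond, guidance_scale):
    # scale > 1 moves probability mass toward the prompt; used with a
    # negative prompt, the same formula steers generation away from it.
    return uncond + guidance_scale * (cond - uncond)

cond = np.array([2.0, 0.5, -1.0])     # logits conditioned on the prompt
uncond = np.array([1.0, 1.0, 1.0])    # logits without the prompt
guided = cfg_logits(cond, uncond, guidance_scale=1.5)
```

With guidance_scale=1.0 the formula reduces to the ordinary conditional logits.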
A new task guide going into Visual Question Answering has been added to Transformers.
We continue the deprecation of models that was introduced in https://github.com/huggingface/transformers/pull/24787.
By deprecating, we indicate that we will stop maintaining such models; there is no intention of actually removing them or breaking support (they might one day move into a separate repo or onto the Hub, but we would still add the necessary imports to preserve backward compatibility). The main point is that we stop testing those models. This choice is driven by model usage, and aims to ease the burden on our CI so that it can focus on more critical aspects of the library.
There are ongoing efforts to translate the transformers documentation into other languages. These efforts are driven by groups independent of Hugging Face, and their work is greatly appreciated, as it further lowers the barrier of entry to ML and Transformers.
If you'd like to kickstart such an effort or help out on an existing one, please feel free to reach out by opening an issue.
- tasks/document_question_answering.md to Korean by @jungnerd in #24588
- quicktour.md by @wonhyeongseo in #24664
- serialization.md by @wonhyeongseo in #24686
- testing.md to Korean by @Sunmin0520 in #24900
- perf_train_cpu.md to Korean by @seank021 in #24911
- <tf_xla>.md to Korean by @54data in #24904
- perf_hardware.md to Korean by @augustinLib in #24966
- hpo_train.md to Korean by @harheem in #24968
- perf_infer_cpu.md to Korean by @junejae in #24920
- transformers_agents.md to Korean by @sim-so in #24881
- perf_infer_gpu_many.md to Korean by @heuristicwave in #24943
- perf_infer_gpu_one.md to Korean by @eenzeenee in #24978
- add_tensorflow_model.md to Korean by @keonju2 in #25017
- perf_train_cpu_many.md to Korean by @nuatmochoi in #24923
- add_new_model.md to Korean by @mjk0618 in #24957
- model_summary.md to Korean by @0525hhgus in #24625
- philosophy.md to Korean by @TaeYupNoh in #25010
- perf_train_tpu_tf.md to Korean by @0525hhgus in #25433

Addition of an input_data_format argument to image transforms and ImageProcessor methods, allowing the user to explicitly set the data format of the images being processed. This enables processing of images with a non-standard number of channels (e.g. 4) and removes errors which occurred when the data format was inferred but the channel dimension was ambiguous.
import numpy as np
from transformers import ViTImageProcessor
img = np.random.randint(0, 256, (4, 6, 3))  # 4-channel image in channels-first layout
image_processor = ViTImageProcessor()
inputs = image_processor(img, image_mean=0, image_std=1, input_data_format="channels_first")
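To see why inference can be ambiguous, consider the (4, 6, 3) array above: both the first axis (4) and the last axis (3) look like plausible channel counts. A small illustrative sketch (the infer_channel_dim helper is hypothetical, not the actual Transformers implementation):

```python
import numpy as np

img = np.random.randint(0, 256, (4, 6, 3))

def infer_channel_dim(arr, channel_counts=(1, 3, 4)):
    # Naive inference, for illustration only: an axis "looks like" channels
    # when its size is a typical channel count. Here both ends qualify.
    first = arr.shape[0] in channel_counts
    last = arr.shape[-1] in channel_counts
    if first and last:
        return "ambiguous"
    return "channels_first" if first else "channels_last"

print(infer_channel_dim(img))  # ambiguous: 4 channels first, or 3 channels last?
```

Passing input_data_format="channels_first" resolves the ambiguity explicitly: a 4-channel, 6x3 image.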
torch.scaled_dot_product_attention & Flash Attention

Many users are not aware that it is possible to force torch.scaled_dot_product_attention to dispatch to Flash Attention kernels. This leads to considerable speedups and memory savings, and is also compatible with quantized models. We decided to make this explicit to users in the documentation.
In a nutshell, one can just run:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")
# convert the model to BetterTransformer
model.to_bettertransformer()
input_text = "Hello my dog is cute and"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
to enable Flash Attention in their model. However, this feature does not support padding yet.
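For reference, the computation these fused kernels accelerate is plain scaled dot-product attention. A NumPy sketch of the math (the Flash Attention kernel computes the same result without ever materializing the full score matrix, which is where the memory savings come from):

```python
import numpy as np

def sdpa(q, k, v):
    # softmax(Q K^T / sqrt(d)) V, computed naively with the full score matrix.
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(2, 5, 8)) for _ in range(3))
out = sdpa(q, k, v)   # (batch, seq, head_dim) -> same shape as q
```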
Users will no longer encounter CPU RAM OOM when using FSDP to train very large models in multi-GPU or multi-node multi-GPU settings.

Users no longer have to pass fsdp_transformer_layer_cls_to_wrap, as the code now uses _no_split_modules by default, which is available for most of the popular models. DeepSpeed ZeRO-3 init now works properly with the Accelerate launcher + Trainer.
Trainer class

The default optimizer in the Trainer class has been updated to adamw_torch rather than our own adamw_hf, as the official Torch optimizer is more robust and fixes some issues.
In order to keep the old behavior, ensure that you pass "adamw_hf" as the optim value in your TrainingArguments.
adamw_hf to adamw_torch 🚨🚨🚨 by @muellerzr in #25109

There was an issue with the definition of the rescale of values with ViVit and EfficientNet. These have been fixed, but will result in different model outputs for both of these models. To understand the change and see what needs to be done to obtain previous results, please take a look at the following PR.
The EfficientNetForImageClassification model class did not follow conventions and added a softmax to the model logits. This was removed so that it respects the convention set by other models.
In order to obtain previous results, pass the model logits through a softmax.
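Concretely, a generic softmax over the logits recovers the probabilities that the old EfficientNetForImageClassification returned (the logit values below are made up for illustration):

```python
import numpy as np

def softmax(logits, axis=-1):
    # Stable softmax: shift by the max before exponentiating.
    z = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1]])  # hypothetical classifier logits
probs = softmax(logits)               # what older versions returned directly
```

The predicted class is unchanged either way, since softmax is monotonic; only the reported values differ.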
Some SPM models had issues with their management of added tokens; namely Llama and T5, among others, were behaving incorrectly. These have been updated in https://github.com/huggingface/transformers/pull/25224.
An option to obtain the previous behavior was added through the legacy flag, as explained in the PR linked above.
[SPM] Finish fix spm models 🚨🚨🚨 by @ArthurZucker in #25224

- use_cache=True by @ydshieh in #24893
- test_model_parallelism for FalconModel by @ydshieh in #24914
- [Llama2] replace self.pretraining_tp with self.config.pretraining_tp by @younesbelkada in #24906
- image_processing_vilt.py wrong default documented by @stas00 in #24931
- main_input_name in src/transformers/keras_callbacks.py by @ydshieh in #24916
- LogitsProcessor class by @shauray8 in #24848
- [RWKV] Add Gradient Checkpointing support for RWKV by @younesbelkada in #24955
- Parameter.ds_numel by @apoorvkh in #24942
- [LlamaConfig] Nit: pad token should be None by default by @ArthurZucker in #24958
- llama tokenization doctest by @ydshieh in #24990
- [bnb] Add simple check for bnb import by @younesbelkada in #24995
- [Llama] remove persistent inv_freq tensor by @ArthurZucker in #24998
- [logging.py] set default stderr path if None by @ArthurZucker in #25033
- TrainingArgs to wandb.config without sanitization. by @parambharat in #25035
- [8bit] Fix 8bit corner case with Blip2 8bit by @younesbelkada in #25047
- [RWKV] Add note in doc on RwkvStoppingCriteria by @ArthurZucker in #25055
- TF32 flag for PyTorch cuDNN backend by @XuehaiPan in #25075
- per_gpu_eval_batch_size with per_device_eval_batch_size in readme of multiple-choice task by @statelesshz in #25078
- [generate] Only warn users if the generation_config's max_length is set to the default value by @ArthurZucker in #25030
- [ForSequenceClassification] Support left padding by @ArthurZucker in #24979
- [TF] Also apply patch to support left padding by @ArthurZucker in #25085
- test_model_is_small by @connor-henderson in #25087
- [PreTrainedTokenizerFast] Keep properties from fast tokenizer by @ArthurZucker in #25053
- MusicgenForConditionalGeneration tests by @ydshieh in #25091
- [T5, MT5, UMT5] Add [T5, MT5, UMT5]ForSequenceClassification by @sjrl in #24726
- PvtModelIntegrationTest::test_inference_fp16 by @ydshieh in #25106
- use_auth_token -> token by @ydshieh in #25083
- [T5/LlamaTokenizer] default legacy to None to not always warn by @ArthurZucker in #25131
- [MptConfig] support from pretrained args by @ArthurZucker in #25116
- token things by @ydshieh in #25146
- .push_to_hub and cleanup get_full_repo_name usage by @Wauplin in #25120
- use_auth_token -> token in example scripts by @ydshieh in #25167
- [Mpt] Fix mpt slow test by @younesbelkada in #25170
- [InstructBlip] Fix instructblip slow test by @younesbelkada in #25171
- _prepare_output_docstrings by @ydshieh in #25202
- [PreTrainedModel] Wrap cuda and to method correctly by @younesbelkada in #25206
- all_model_classes in FlaxBloomGenerationTest by @ydshieh in #25211
- [pipeline] revisit device check for pipeline by @younesbelkada in #25207
- [Pix2Struct] Fix pix2struct cross attention by @younesbelkada in #25200
- [Docs/quantization] Clearer explanation on how things work under the hood. + remove outdated info by @younesbelkada in #25216
- [MPT] Add require_bitsandbytes on MPT integration tests by @younesbelkada in #25201
- [Detr] Fix detr BatchNorm replacement issue by @younesbelkada in #25230
- token argument in example scripts by @ydshieh in #25172
- pytest_options={"rA": None} in CI by @ydshieh in #25263
- num_hidden_layers=2 🚀🚀🚀 by @ydshieh in #25266
- pytest_num_workers=8 for torch/tf jobs by @ydshieh in #25274
- report_to logging integrations in docstring by @tomaarsen in #25281
- bark could have tiny model by @ydshieh in #25290
- trust_remote_code in example scripts by @Jackmin801 in #25248
- Repository to upload_folder by @sgugger in #25095
- NoRepeatNGramLogitsProcessor Example for LogitsProcessor class by @Rishab26 in #25186
- torch.compile() for vision models by @merveenoyan in #24748
- test_model_parallelism by @ydshieh in #25359
- token in example template by @ydshieh in #25351
- torch_job worker(s) crashing by @ydshieh in #25374
- token by @ydshieh in #25382
- OneFormerModelTest.test_model_with_labels by @ydshieh in #25383
- TopPLogitsWarper by @chiral-carbon in #25361
- device_map is passed by @gante in #25413
- torch.compile() docs by @merveenoyan in #25432
- examples to tests to run when setup.py is modified by @ydshieh in #25437
- main on PRs/branches if setup.py is not modified by @ydshieh in #25445
- main on PRs/branches" by @ydshieh in #25466
- auxiliary_head is None in UperNetPreTrainedModel by @mmurray in #25514
- MaskFormerModelIntegrationTest OOM by @ydshieh in #25544
- torch.fx tests on nightly CI by @ydshieh in #25549
- test_onnx_runtime_optimize for now by @ydshieh in #25560
- [Docs] Fix un-rendered images by @younesbelkada in #25561
- TRANSFORMERS_TEST_DEVICE by @vvvm23 in #25506
- test_beam_search_xla_generate_simple for T5 by @ydshieh in #25566
- [resize_embedding] Introduce pad_to_multiple_of and guidance by @ArthurZucker in #25088
- [SwitchTransformers] Remove unused module by @ArthurZucker in #25427
- [NllbMoe] Update code to properly support loss computation by @ArthurZucker in #25429
- [Tests] Fix failing 8bit test by @younesbelkada in #25564
- test_contrastive_generate for TFXLNet by @ydshieh in #25574
- [Docs / BetterTransformer] Added more details about flash attention + SDPA by @younesbelkada in #25265
- .cuda with .to(torch_device) in tests by @vvvm23 in #25571
- [split_special_tokens] Add support for split_special_tokens argument to encode by @ArthurZucker in #25081
- [Llama] remove prompt and fix prefix finetuning by @ArthurZucker in #25565
- [TokenizerFast] Fix setting prefix space in init by @ArthurZucker in #25563
- resize_token_embeddings by @SunMarc in #25596

The following contributors have made significant changes to the library over the last release:
- testing.md to Korean (#24900)
- [T5, MT5, UMT5] Add [T5, MT5, UMT5]ForSequenceClassification (#24726)
- trust_remote_code in example scripts (#25248)
- add_new_model.md to Korean (#24957)