Release notes
Published 6/27/2024
Minor release. Contains breaking changes.

The Gemma2 model was proposed in Gemma2: Open Models Based on Gemini Technology and Research by the Gemma2 team, Google. Gemma2 models are trained on 6T tokens and released in two versions, 2B and 7B.
The abstract from the paper is the following:
This work introduces Gemma2, a new family of open language models demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma2 outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of our model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.
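A minimal generation sketch; the checkpoint id below is an assumption, so substitute the id of the released Gemma2 checkpoint you want from the Hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b"  # assumed checkpoint id; check the Hub for the released ones
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```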
The RT-DETR model was proposed in DETRs Beat YOLOs on Real-time Object Detection by Wenyu Lv, Yian Zhao, Shangliang Xu, Jinman Wei, Guanzhong Wang, Cheng Cui, Yuning Du, Qingqing Dang, Yi Liu.
RT-DETR is an object detection model that stands for “Real-Time DEtection Transformer.” This model is designed to perform object detection tasks with a focus on achieving real-time performance while maintaining high accuracy. Leveraging the transformer architecture, which has gained significant popularity in various fields of deep learning, RT-DETR processes images to identify and locate multiple objects within them.
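A hedged inference sketch; the class and checkpoint names below are assumptions based on this release, so verify them against the model docs:

```python
import requests
import torch
from PIL import Image
from transformers import RTDetrImageProcessor, RTDetrForObjectDetection

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Assumed checkpoint id for the release; substitute the one from the docs.
processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_r50vd")
model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes into (score, label, box) triples above a threshold.
results = processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.5
)
```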
The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.
InstructBLIP uses the same architecture as BLIP-2 with a tiny but important difference: it also feeds the text prompt (instruction) to the Q-Former.
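A minimal sketch, assuming the Salesforce/instructblip-vicuna-7b checkpoint (one of the released ones); note that the instruction text is consumed by both the Q-Former and the language model:

```python
import requests
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

model_id = "Salesforce/instructblip-vicuna-7b"  # assumed checkpoint id
processor = InstructBlipProcessor.from_pretrained(model_id)
model = InstructBlipForConditionalGeneration.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The instruction is passed as text; InstructBLIP also routes it to the Q-Former.
inputs = processor(images=image, text="What is unusual about this image?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```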
The LLaVa-NeXT-Video model was proposed in LLaVA-NeXT: A Strong Zero-shot Video Understanding Model by Yuanhan Zhang, Bo Li, Haotian Liu, Yong Jae Lee, Liangke Gui, Di Fu, Jiashi Feng, Ziwei Liu, Chunyuan Li. LLaVa-NeXT-Video improves upon LLaVa-NeXT by fine-tuning on a mix of video and image data, thus increasing the model's performance on videos.
LLaVA-NeXT shows surprisingly strong zero-shot performance in understanding video content, thanks to the AnyRes technique it uses. AnyRes represents a high-resolution image as a set of multiple lower-resolution images, and this generalizes naturally to video, since a video can be considered a set of frames (similar to the set of images in LLaVa-NeXT). The current version makes use of AnyRes and applies supervised fine-tuning (SFT) on top of LLaVA-NeXT with video data to achieve better video understanding. The model is currently SOTA among open-source models on the VideoMME benchmark.
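A hedged usage sketch; the class and checkpoint names are assumptions, and the random array below stands in for real decoded video frames:

```python
import numpy as np
from transformers import LlavaNextVideoProcessor, LlavaNextVideoForConditionalGeneration

model_id = "llava-hf/LLaVA-NeXT-Video-7B-hf"  # assumed checkpoint id
processor = LlavaNextVideoProcessor.from_pretrained(model_id)
model = LlavaNextVideoForConditionalGeneration.from_pretrained(model_id)

# 8 dummy frames of shape (H, W, C); in practice, decode frames from a real video.
clip = np.random.randint(0, 255, size=(8, 336, 336, 3), dtype=np.uint8)

# Assumed prompt format for the vicuna-based checkpoint; check the model card.
prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"
inputs = processor(text=prompt, videos=clip, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=60)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```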
A very significant change makes its way into the transformers codebase, introducing a new way to add models to transformers. We recommend reading the description of the PR below, but here is the gist of it:
The diff_converter tool replaces our old # Copied from statements, while keeping our core transformers philosophy:
- single model, single file
- explicit code
- standardization of modeling code
- readable and educative code
- simple code
- least amount of modularity
This additionally unlocks the ability to very quickly see the differences between new architectures that get developed. While many architectures are similar, the "single model, single file" policy can obfuscate the changes. With this diff converter, we want to make the changes between architectures very explicit.
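As a purely illustrative sketch (the exact file layout and converter invocation are described in the PR), a diff file subclasses an existing model and states only the differences, which the converter then expands into a full, single-file modeling module:

```python
# Hypothetical diff file: names and conventions here are illustrative only.
from transformers.models.llama.modeling_llama import LlamaAttention, LlamaModel

class MyNewModelAttention(LlamaAttention):
    # Override only what actually differs from Llama; everything else is
    # inherited, so the delta between the architectures stays explicit and small.
    pass

class MyNewModelModel(LlamaModel):
    # Inherits the full Llama implementation; the converter expands this into
    # a complete, self-contained modeling file for the new architecture.
    pass
```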
We've made major updates to our support for tool-use and RAG models. We can now automatically generate JSON schema descriptions for Python functions which are suitable for passing to tool models, and we've defined a standard API for tool models which should allow the same tool inputs to be used with many different models. Models will need updates to their chat templates to support the new API, and we're targeting the Nous-Hermes, Command-R and Mistral/Mixtral model families for support in the very near future. Please see the updated chat template docs for more information.
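As a sketch of the schema generation, assuming the helper is exposed as transformers.utils.get_json_schema (check the chat template docs for the exact location), a typed, docstring-documented Python function can be turned into a tool description:

```python
# Hedged sketch: the import path is an assumption; see the chat template docs.
from transformers.utils import get_json_schema

def get_current_temperature(location: str, unit: str = "celsius") -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The city and country, e.g. "Paris, France"
        unit: The unit to return the temperature in
    """
    return 22.0  # dummy value; a real tool would call a weather API

# Returns a JSON schema dict suitable for passing to tool-use chat templates.
print(get_json_schema(get_current_temperature))
```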
If you are the owner of a model that supports tool use, but you're not sure how to update its chat template to support the new API, feel free to reach out to us for assistance with the update, for example on the Hugging Face Discord server. Ping Matt and yell key phrases like "chat templates" and "Jinja" and your issue will probably get resolved.
We have furthered support for GGUF files, enabling fine-tuning within the Python/HF ecosystem before converting models back to the GGUF/GGML/llama.cpp ecosystem.
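A hedged sketch of the first half of that round trip, loading a GGUF file into a standard transformers model (repo and file names below are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "some-org/some-model-GGUF"   # placeholder repo id
gguf_file = "model.Q4_K_M.gguf"     # placeholder file name

# gguf_file= dequantizes the GGUF weights into a regular PyTorch model.
tokenizer = AutoTokenizer.from_pretrained(repo, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo, gguf_file=gguf_file)
# ...fine-tune as usual, then convert back with llama.cpp's conversion scripts.
```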
A new optimizer has been added to the Trainer.
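The notes don't name the optimizer here; whichever it is, Trainer optimizers are selected via the optim string in TrainingArguments. A generic sketch with a long-standing value; substitute the new optimizer's key from the PR:

```python
from transformers import TrainingArguments

# "adamw_torch" is a standard, pre-existing value; replace it with the key
# of the newly added optimizer (see the release PR) to opt in.
args = TrainingArguments(output_dir="out", optim="adamw_torch")
```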
Several improvements have been made to quantization: a new cache, the quantized KV cache, is added, offering the ability to quantize the cache of generative models and further reduce memory requirements.
Additionally, the documentation related to quantization has been entirely redone, with the aim of helping users choose the best quantization method for their use case.
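A hedged sketch of opting into the quantized KV cache at generation time; the checkpoint id is illustrative, and the cache_config backend/bit-width values are assumptions (a quantization backend such as quanto must be installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai-community/gpt2"  # any generative checkpoint; id is illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quantized KV cache", return_tensors="pt")
# cache_implementation="quantized" swaps the default cache for the quantized one;
# the backend/nbits values below are assumptions, not the only supported ones.
out = model.generate(
    **inputs,
    max_new_tokens=20,
    cache_implementation="quantized",
    cache_config={"backend": "quanto", "nbits": 4},
)
```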
New instance segmentation examples have been added by @qubvel.
As a notable improvement to the HF vision models that leverage backbones, we enable loading HF pretrained model weights to serve as the backbone, with the following API:
```python
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

# Use a pretrained Hub checkpoint (here ResNet-50) as the backbone...
config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True)
# ...and build the segmentation model around it.
model = MaskFormerForInstanceSegmentation(config)
```
Additionally, we thank @Cyrilvallez for diving into our generate() method and greatly reducing its memory requirements 🔥🔥🔥 (#30536).

Both the ConversationalPipeline and the Conversation object have been deprecated for a while, and are due for removal in 4.42, which is the upcoming version.
The TextGenerationPipeline is recommended for this use case, and it now accepts inputs in the OpenAI API format.
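A minimal sketch of the chat-style input, using a placeholder model id; any chat model with a chat template should work:

```python
from transformers import pipeline

# Placeholder model id; substitute a chat model that has a chat template.
pipe = pipeline("text-generation", model="some-org/some-chat-model")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"},
]
print(pipe(messages, max_new_tokens=50))
```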
A duplicate softmax application in FLAVA attention has been removed. This is likely to cause a small change in outputs, so it is flagged with 🚨 as results will change slightly.
The ignore_index attribute of the loss is updated to -100.

timm being updated: recent updates to timm changed the type of the attribute model.feature_info.out_indices. Previously, out_indices would reflect the type passed to the create_model call, i.e. either tuple or list. Now, this value is always a tuple.
As lists are more useful and consistent for us (we cannot save tuples in configs; they must be converted to lists first), we instead choose to cast out_indices to always be a list.
This is potentially a slight breaking change for users who create models and rely on out_indices being a tuple. As this only happens when a new model is created, and not when it is saved and reloaded (because of the config), we think it has a low chance of having much impact.
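For illustration only (not library code), the normalization amounts to:

```python
# timm now always returns out_indices as a tuple; transformers casts it to a
# list so that it can be serialized into configs.
out_indices = (0, 1, 2, 3)        # as exposed by timm's model.feature_info
out_indices = list(out_indices)   # cast applied on the transformers side
assert out_indices == [0, 1, 2, 3]
```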
Bugfixes and improvements:

- mamba slow forward by @vasqu in #30691
- tokenizer_class = "AutoTokenizer" Llava Family by @ArthurZucker in #30912
- optimum-benchmark by @ydshieh in #30615
- torch.use_deterministic_algorithms for XPU by @faaany in #30774
- MptIntegrationTests expected outputs by @ydshieh in #30989
- uv==0.1.45 by @ydshieh in #31006
- test_model_parallelism device-agnostic by @faaany in #30844
- test_model_parallelism for 2 model test classes by @ydshieh in #31067
- @main by @ydshieh in #31065
- ninja from docker image build by @ydshieh in #31080
- accelerate as a hard requirement by @younesbelkada in #31090
- OPTForQuestionAnswering by @younesbelkada in #31092
- test_multi_gpu_data_parallel_forward for vit and deit by @ydshieh in #31086
- HF_HUB_OFFLINE + fix has_file in offline mode by @Wauplin in #31016
- transformers-cli env reporting by @statelesshz in #31003
- load_in_8bit with bnb config by @younesbelkada in #31136
- IS_GITHUB_CI by @younesbelkada in #31147
- [GemmaModel] fix small typo by @ArthurZucker in #31202
- test_compile_static_cache by @ydshieh in #30991
- mistral.py::Mask4DTestHard by @ydshieh in #31212
- MistralIntegrationTest by @ydshieh in #31231
- BlipModel by @younesbelkada in #31235
- name 'torch' is not defined in bitsandbytes integration by @jamesbraza in #31243
- benchmark job in push-important-models.yml by @ydshieh in #31259
- [SwitchTransformer] Significant performance improvement on MoE blocks by @ranggihwang in #31173
- cached_download to hf_hub_download in remaining occurrences by @Wauplin in #31284
- str should be used not int when setting env variables by @statelesshz in #31272
- decoder_attention_mask shape by @ylacombe in #28071
- inputs_embeds padding logger.warning to logger.warning_once by @naimenz in #31411
- tokenizer being popped twice by @gante in #31427
- TestDeepSpeedModelZoo device-agnostic by @faaany in #31402
- dataloader_persistent_workers=True by @bastienlc in #30627
- Qwen2ForTokenClassification by @kevinhu in #31440
- generate call from local path by @gante in #31470
- PreTrainedTokenizerFast loading time when there are many added tokens by @ydshieh in #31404
- metric_for_best_model errors by @tomaarsen in #31450
- [GPT2] Add SDPA support by @vasqu in #31172
- test_config_object to test_ds_config_object by @faaany in #31403
- torch.compile support for AQLM by @younesbelkada in #31473
- wandb integration with SetFit model by @timothepearce in #30021
- tokenization_utils_base.py's docstring by @sadra-barikbin in #31510
- spectrogram_batch by @ravenouse in #27159
- TrainingArguments by @qgallouedec in #31503
- _no_split_module by @zucchini-nlp in #31566
- i18n by @SauravMaheshkar in #31584
- self.projection call in VivitTubeletEmbeddings by @v-iashin in #31632
- [GPT-NeoX] Add SDPA support by @vasqu in #31031
- past_key_values passed as kwargs by @gante in #31644

The following contributors have made significant changes to the library over the last release:
- @ravenouse: spectrogram_batch (#27159)