Release notes
Published 2/17/2025
Minor release. Contains breaking changes.

Helium-1 preview is a lightweight language model with 2B parameters, targeting edge and mobile devices. It supports the following languages: English, French, German, Italian, Portuguese, and Spanish.
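A minimal sketch of running it with the text-generation pipeline; the checkpoint id below is an assumption, so verify it on the Hub:

```python
from transformers import pipeline

# Minimal sketch: the checkpoint id is an assumption; check the Hub for
# the official Helium-1 preview repository.
generator = pipeline("text-generation", model="kyutai/helium-1-preview-2b")
print(generator("Hello, my name is", max_new_tokens=20)[0]["generated_text"])
```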
The Qwen2.5-VL model is an update to Qwen2-VL from the Qwen team at Alibaba Group.
The abstract from this update is the following:
Qwen2.5-VL marks a major step forward from Qwen2-VL, built upon the latest Qwen2.5 LLM. We’ve accelerated training and testing through the strategic implementation of window attention within the ViT. The ViT architecture itself has been refined with SwiGLU and RMSNorm, aligning it more closely with the LLM’s structure. A key innovation is the expansion of native dynamic resolution to encompass the temporal dimension, in addition to spatial aspects. Furthermore, we’ve upgraded MRoPE, incorporating absolute time alignment on the time axis to allow the model to effectively capture temporal dynamics, regardless of frame rate, leading to superior video understanding.
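As a sketch, Qwen2.5-VL can be exercised through the image-text-to-text pipeline; the checkpoint id and image URL below are illustrative assumptions:

```python
from transformers import pipeline

# Minimal sketch: checkpoint id and image URL are illustrative.
pipe = pipeline("image-text-to-text", model="Qwen/Qwen2.5-VL-3B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/image.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
print(pipe(text=messages, max_new_tokens=40))
```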
The SuperGlue model was proposed in SuperGlue: Learning Feature Matching with Graph Neural Networks by Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.
This model matches two sets of interest points detected across a pair of images. Paired with the SuperPoint model, it can be used to match two images and estimate the pose between them. It is useful for tasks such as image matching and homography estimation.
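A minimal matching sketch, assuming the magic-leap-community/superglue_outdoor checkpoint and two local photos of the same scene:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Assumed inputs: two local photos of the same scene.
image1 = Image.open("view_a.jpg")
image2 = Image.open("view_b.jpg")

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")

inputs = processor([image1, image2], return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # matched keypoints and matching scores
```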
The Granite Vision model is a variant of LLaVA-NeXT, leveraging a Granite language model alongside a SigLIP visual encoder. It utilizes multiple concatenated vision hidden states as its image features, similar to VipLlava. It also uses a larger set of image grid pinpoints than the original LLaVA-NeXT models to support additional aspect ratios.
Zamba2 is a large language model (LLM) trained by Zyphra, and made available under an Apache 2.0 license.
Zamba2-1.2B, Zamba2-2.7B and Zamba2-7B are hybrid models combining state-space models (specifically Mamba) with transformer blocks, and were trained using next-token prediction. Zamba2 uses shared transformer layers after every 6 Mamba blocks, and uses the Mistral v0.1 tokenizer. We arrived at this architecture after a series of ablations at small scales. Zamba2-1.2B, Zamba2-2.7B and Zamba2-7B were pre-trained on between 2T and 3T tokens.
GOT-OCR2 works on a wide range of tasks, including plain document OCR, scene text OCR, formatted document OCR, and even OCR for tables, charts, mathematical formulas, geometric shapes, molecular formulas and sheet music. While this implementation of the model will only output plain text, the outputs can be further processed to render the desired format, with packages like pdftex, mathpix, matplotlib, tikz, verovio or pyecharts. The model can also be used for interactive OCR, where the user can specify the region to be recognized by providing the coordinates or the color of the region’s bounding box.
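A minimal plain-text OCR sketch; the checkpoint id and the processor's acceptance of a file path are assumptions based on the model card, so verify both:

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

# Minimal sketch, assuming the "stepfun-ai/GOT-OCR-2.0-hf" checkpoint id.
processor = AutoProcessor.from_pretrained("stepfun-ai/GOT-OCR-2.0-hf")
model = AutoModelForImageTextToText.from_pretrained("stepfun-ai/GOT-OCR-2.0-hf")

# Assumption: the processor accepts a file path, URL, or PIL image here.
inputs = processor("path/to/scanned_page.png", return_tensors="pt")
generated = model.generate(**inputs, do_sample=False, max_new_tokens=256)
text = processor.decode(
    generated[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(text)
```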
DAB-DETR is an enhanced variant of Conditional DETR. It utilizes dynamically updated anchor boxes to provide both a reference query point (x, y) and a reference anchor size (w, h), improving cross-attention computation. This new approach achieves 45.7% AP when trained for 50 epochs with a single ResNet-50 model as the backbone.
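A minimal sketch with the object-detection pipeline; the checkpoint id is an assumption, so verify it on the Hub:

```python
from transformers import pipeline

# Minimal sketch: the checkpoint id is an assumption.
detector = pipeline("object-detection", model="IDEA-Research/dab-detr-resnet-50")
for det in detector("path/to/street_scene.jpg"):
    print(det["label"], round(det["score"], 3), det["box"])
```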
DepthPro is a foundation model for zero-shot metric monocular depth estimation, designed to generate high-resolution depth maps with remarkable sharpness and fine-grained details. It employs a multi-scale Vision Transformer (ViT)-based architecture, where images are downsampled, divided into patches, and processed using a shared Dinov2 encoder. The extracted patch-level features are merged, upsampled, and refined using a DPT-like fusion stage, enabling precise depth estimation.
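A minimal sketch with the depth-estimation pipeline; the checkpoint id is an assumption, so verify it on the Hub:

```python
from transformers import pipeline

# Minimal sketch: the checkpoint id is an assumption.
estimator = pipeline("depth-estimation", model="apple/DepthPro-hf")
result = estimator("path/to/photo.jpg")
result["depth"].save("depth_map.png")  # PIL image of the predicted depth
```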

RT-DETRv2 is an improved Real-Time DEtection TRansformer (RT-DETR). It refines RT-DETR by introducing selective multi-scale feature extraction and a discrete sampling operator for broader deployment compatibility. These improvements yield a 0.3 to 1.4 point increase in mAP on the COCO dataset, all while maintaining the same parameter count and frames-per-second (FPS) performance.

Transformers' CLI welcomes a new command: chat. This command starts a conversation with the model of your choosing directly in your terminal.
This feature exists in TRL and has been migrated to transformers for easier usage.
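As a quick illustration, you can start a session with something like `transformers-cli chat --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct`; the flag name is an assumption carried over from the TRL version of the command, and the model id is illustrative, so check `transformers-cli chat --help` for the exact interface.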
Ongoing work aims to standardize the image processors so that their APIs are equivalent. Additionally, each processor is gaining a fast variant so that image processing is never a bottleneck in the pipelines.
In this release, several processors have been standardized and have received fast versions.
The DPT image processor did not support segmentation_maps, accepting only images. This has been fixed.
This adds an argument to the preprocess method; users passing arguments positionally to that method may therefore see changed behavior. We recommend using keyword arguments with such methods so that newly added parameters do not affect existing calls.
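A sketch of the recommended keyword-argument style; the checkpoint id and dummy inputs are illustrative:

```python
import numpy as np
from PIL import Image
from transformers import DPTImageProcessor

# Illustrative inputs: a dummy image and segmentation map.
image = Image.fromarray(np.zeros((480, 640, 3), dtype=np.uint8))
seg_map = np.zeros((480, 640), dtype=np.uint8)

processor = DPTImageProcessor.from_pretrained("Intel/dpt-large-ade")
# Keyword arguments protect against behavior changes when new positional
# parameters (such as segmentation_maps) are added to preprocess.
inputs = processor(images=image, segmentation_maps=seg_map, return_tensors="pt")
```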
Segmentation maps support for DPT image processor by @simonreise in #34345

The problem_type in the config.json file was read incorrectly by the pipeline, mapping single-label to multi-label losses and vice-versa. This has been fixed.
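For reference, problem_type is set on the model config; a minimal sketch, with an illustrative model id:

```python
from transformers import AutoModelForSequenceClassification

# problem_type selects the loss: "single_label_classification" uses
# cross-entropy, "multi_label_classification" uses BCE-with-logits.
# The model id is illustrative.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=3,
    problem_type="multi_label_classification",
)
```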
The pull request description is the easiest way to understand the problem, why it exists, and how it is solved.
The ignore_index property of the Llava configuration has been removed, as it served no purpose.
Quantization has received several improvements and fixes, including the contribution of FP8 quantization and the HIGGS quantization interface.
Additionally, we're replacing the AutoGPTQ implementation with GPTQModel from ModelCloud.
GPTQModel originated as a major refactor of AutoGPTQ, but is now a full drop-in replacement with a cleaner API, up-to-date model support, faster inference, and higher-quality quants.
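For reference, a minimal quantization sketch; the model id is illustrative, and optimum plus a GPTQ backend (gptqmodel) must be installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

# Minimal sketch: quantize a small model to 4-bit GPTQ. With GPTQModel
# installed, it is used as the backend in place of AutoGPTQ.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", quantization_config=quant_config, device_map="auto"
)
```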
Generation-related changes:
- max_length by @gante in #36120
- generate-related objects and methods scheduled for removal in v4.48 by @gante in #35677
- GenerationConfig(cache_implementation="static") by @gante in #35679
- SequenceBiasLogitsProcessor by @gante in #35699
- torch.compile(model.forward) as a fast test by @gante in #34544

Pipelines have received several bug fixes and improvements which are detailed below.
- test_custom_4d_attention_mask by @ydshieh in #35606
- EarlyStoppingCallback not require load_best_model_at_end by @muellerzr in #35101
- test_beam_search_low_memory by @ydshieh in #35611
- MobileNetV1ModelTest::test_batching_equivalence for now by @ydshieh in #35614
- [Phi] bias should be True by @ArthurZucker in #35650
- [Compile] Only test compiling model forward pass by @ArthurZucker in #35658
- zero_shot_image_classification documentation guide link in SigLIP by @aretrace in #35671
- Trainer cannot correctly call torch_jit_model_eval by @Wanguy in #35722
- pt_to_tf by @gante in #35672
- check_circleci_user job by @Sai-Suraj-27 in #32866
- MimiModel with DeepSpeed ZeRO-3 by @anferico in #34735
- PeftModel by @ambroser53 in #35680
- MimiModel with DeepSpeed ZeRO-3" by @eustlb in #35755
- self-comment-ci.yml by @ydshieh in #35548
- timm import behaviour by @rwightman in #35800
- test_batching_equivalence's flakiness by @ydshieh in #35729
- TimmWrapper by @ariG23498 in #35744
- timm tag to timm-wrapper models. by @pcuenca in #35794
- get_cached_models by @Wauplin in #35809
- docs/source/ar/tasks/masked_language_modeling.md into Arabic by @AhmedAlmaghz in #35198
- benchmark code by @gante in #35730
- self-comment-ci.yml by @ydshieh in #35816
- working-directory in self-comment-ci.yml by @ydshieh in #35833
- head_dim in config extracted from Gemma2 GGUF model by @Isotr0py in #35818
- [tests] remove some flash attention class tests by @ArthurZucker in #35817
- num_logits_to_keep as Tensor + add flag by @Cyrilvallez in #35757
- test_pipelines_video_classification that was always failing by @CalOmnie in #35842
- Rocketknight1 to self-comment-ci.yml by @ydshieh in #35881
- _supports_static_cache = True for some model classes by @ydshieh in #34975
- test_generated_length_assisted_generation by @keyboardAnt in #34935
- unwrap_and_save_reload_schedule to use weights_only=False by @ydshieh in #35952
- squad_convert_example_to_features to work with numpy v2 by @ydshieh in #35955
- test_assisted_decoding_matches_greedy_search by @ydshieh in #35951
- transformers-pytorch-deepspeed-latest-gpu by @ydshieh in #35940
- Tester object has no attribute '_testMethodName' by @faaany in #35781
- TimmBackboneModelTest::test_batching_equivalence by @ydshieh in #35971
- benchmark.yml by @ydshieh in #35974
- generation / quantization) by @ydshieh in #35341
- self-comment-ci.yml by @ydshieh in #36030
- Qwen2VLImageProcessorFast into Qwen2VLProcessor by @yeliudev in #35987
- past_key_values by @yaswanth19 in #35890
- test_flash_attn_2_can_dispatch_composite_models by @ydshieh in #36050
- trainer.md by @faaany in #36066
- perf_infer_gpu_one.md by @faaany in #36087
- torch.export and fix some vision models by @qubvel in #35124
- output_dir Optional in TrainingArguments #27866 by @sambhavnoobcoder in #35735
- PretrainedConfig and PreTrainedModel by @hmellor in #36091
- test_initialization for VitPoseBackboneModelTest for now by @ydshieh in #36154
- get_default_model_revision by @MarcoGorelli in #35982
- DataCollatorForMultipleChoice from the docs to the package by @bauwenst in #34763
- check_repository_consistency run faster by MP by @ydshieh in #36175
- test-save-trainer by @zucchini-nlp in #36191

The following contributors have made significant changes to the library over the last release:
- @bauwenst: DataCollatorForMultipleChoice from the docs to the package (#34763)