release notes
Published 3/15/2023
Minor release. Contains breaking changes.

BridgeTower: the goal of this model is to build a bridge between each uni-modal encoder and the cross-modal encoder, enabling comprehensive and detailed interaction at each layer of the cross-modal encoder. It achieves remarkable performance on various downstream tasks with almost negligible additional parameters and computational costs.
The Whisper model was integrated a few releases ago. This release brings significant performance optimizations when generating with timestamps, made possible by rewriting Whisper's generate() function, which now uses the generation_config and implements batched timestamp prediction. The language and task can now also be set when calling generate(). For more details about this refactoring, check out this colab.
Notably, Whisper is now also supported in Flax 🚀 thanks to @andyehrenberg! More Whisper-related commits:
- WhisperModelTest by @ydshieh in #21883
- do_normalize by @ArthurZucker in #21263
- model_split_percents for WhisperModelTest by @ydshieh in #21922
- WhisperFeatureExtractor by @bofenghuang in #21938
- WhisperEncoderModelTest by @ydshieh in #22060
- [Whisper] add get_input_embeddings to WhisperForAudioClassification by @younesbelkada in #22133

DETA (short for Detection Transformers with Assignment) improves Deformable DETR by replacing the one-to-one bipartite Hungarian matching loss with the one-to-many label assignments used in traditional detectors with non-maximum suppression (NMS). This leads to significant gains of up to 2.5 mAP.
The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder.
XLM-V is a multilingual language model with a one-million-token vocabulary, trained on 2.5TB of data from Common Crawl (the same as XLM-R).
BLIP-2 leverages frozen pre-trained image encoders and large language models (LLMs) by training a lightweight, 12-layer Transformer encoder in between them, achieving state-of-the-art performance on various vision-language tasks. Most notably, BLIP-2 improves upon Flamingo, an 80 billion parameter model, by 8.7% on zero-shot VQAv2 with 54x fewer trainable parameters.
X-MOD extends multilingual masked language models like XLM-R to include language-specific modular components (language adapters) during pre-training. For fine-tuning, the language adapters in each transformer layer are frozen.
ERNIE-M is a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on the model performance.
The Textless Vision-Language Transformer (TVLT) is a model that uses raw visual and audio inputs for vision-and-language representation learning, without using text-specific modules such as tokenization or automatic speech recognition (ASR). It can perform various audiovisual and vision-language tasks like retrieval, question answering, etc.
CLAP (Contrastive Language-Audio Pretraining) is a neural network trained on a variety of (audio, text) pairs. It can be instructed to predict the most relevant text snippet for a given audio clip, without directly optimizing for the task. The CLAP model uses a Swin Transformer to extract audio features from a log-Mel spectrogram input, and a RoBERTa model to extract text features. Both the text and audio features are then projected into a latent space of identical dimension. The dot product between the projected audio and text features is used as a similarity score.
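The scoring step described above can be illustrated in plain Python. This is a toy sketch with made-up three-dimensional embeddings (real CLAP projections are much higher-dimensional); the helper name and values are purely illustrative.

```python
import math

def cosine_similarity(a, b):
    """Dot product of L2-normalized vectors: the audio-text similarity score."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

# Hypothetical projected embeddings for one audio clip and two text snippets.
audio_embed = [0.2, 0.9, 0.1]
text_embeds = {
    "a dog barking": [0.1, 0.95, 0.05],
    "rain falling": [0.9, 0.1, 0.3],
}

# Pick the most relevant text snippet for the audio clip.
best = max(text_embeds, key=lambda t: cosine_similarity(audio_embed, text_embeds[t]))
print(best)  # prints "a dog barking"
```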
- [CLAP] Fix few broken things by @younesbelkada in #21670

GPTSAN is a Japanese language model using Switch Transformer. It has the same structure as the Prefix LM model introduced in the T5 paper, and supports both text generation and masked language modeling tasks. It can similarly be fine-tuned for translation or summarization.
EfficientNets are a family of image classification models that achieve state-of-the-art accuracy while being an order of magnitude smaller and faster than previous models.
ALIGN is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image classification. ALIGN features a dual-encoder architecture with EfficientNet as its vision encoder and BERT as its text encoder, and learns to align visual and text representations with contrastive learning. Unlike previous work, ALIGN leverages a massive noisy dataset and shows that the scale of the corpus can be used to achieve SOTA representations with a simple recipe.
Informer is a method for long-sequence time-series forecasting. It introduces a Probabilistic Attention mechanism that selects the "active" queries rather than the "lazy" queries, yielding a sparse Transformer that mitigates the quadratic compute and memory requirements of vanilla attention.
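The active/lazy distinction can be sketched with a simplified query-selection step: rank queries by how far their attention scores are from uniform, and keep only the top few. This is a heavily simplified toy, not Informer's actual implementation; the max-minus-mean measurement and all numbers below are illustrative.

```python
def sparsity_measurement(scores):
    """Max-minus-mean proxy for how far a query's scores are from uniform."""
    return max(scores) - sum(scores) / len(scores)

def select_active_queries(score_matrix, top_u):
    """Keep only the top_u queries with the most concentrated attention scores."""
    ranked = sorted(range(len(score_matrix)),
                    key=lambda i: sparsity_measurement(score_matrix[i]),
                    reverse=True)
    return sorted(ranked[:top_u])

# Rows: per-query unnormalized attention scores over 4 keys (made-up numbers).
scores = [
    [0.1, 0.1, 0.1, 0.1],  # "lazy" query: near-uniform scores
    [5.0, 0.1, 0.1, 0.1],  # "active" query: sharply peaked scores
    [0.2, 0.2, 0.3, 0.2],  # nearly uniform
]
print(select_active_queries(scores, top_u=1))  # prints [1]
```

Only the selected queries would get full attention computed; the rest fall back to a cheap approximation, which is where the sub-quadratic cost comes from.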
safetensors is a safe serialization format for tensors, which has been supported in transformers as a first-class citizen for the past few versions.
This change adds the ability to explicitly force the from_pretrained method to use (or not use) safetensors. This unlocks a few use cases, notably the possibility of loading only from this format, limiting security risks.
Example of usage:
from transformers import AutoModel
# As of version v4.27.0, this loads the `pytorch_model.bin` by default if `safetensors` is not installed.
# It loads the `model.safetensors` file if `safetensors` is installed.
model = AutoModel.from_pretrained('bert-base-cased')
# This forces the load from the `model.safetensors` file.
model = AutoModel.from_pretrained('bert-base-cased', use_safetensors=True)
# This forces the load from the `pytorch_model.bin` file.
model = AutoModel.from_pretrained('bert-base-cased', use_safetensors=False)
This PR adds a "variant" keyword argument to PyTorch's from_pretrained and save_pretrained so that multiple weight variants can be saved in the model repo.
Example of usage with the model hosted in this folder on the Hub:
from transformers import CLIPTextModel
path = "huggingface/the-no-branch-repo" # or ./text_encoder if local
# Loads the `fp16` variant. This loads the `pytorch_model.fp16.bin` file from this folder.
model = CLIPTextModel.from_pretrained(path, subfolder="text_encoder", variant="fp16")
# This loads the no-variant checkpoint, loading the `pytorch_model.bin` file from this folder.
model = CLIPTextModel.from_pretrained(path, subfolder="text_encoder")
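The naming scheme behind the variant argument can be summarized with a small illustrative helper (this is not the library's internal function, just a sketch of the convention shown above: the variant is inserted between the base weights name and the extension).

```python
def weights_file_name(variant=None, base="pytorch_model", ext="bin"):
    """Build the checkpoint file name for a given variant, e.g. pytorch_model.fp16.bin."""
    if variant is None:
        return f"{base}.{ext}"
    return f"{base}.{variant}.{ext}"

print(weights_file_name())        # prints "pytorch_model.bin"
print(weights_file_name("fp16"))  # prints "pytorch_model.fp16.bin"
```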
The bitsandbytes integration has been overhauled and now offers a new configuration class: BitsAndBytesConfig.
Read more about it in the documentation.
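A configuration sketch of the new API is shown below. This is not runnable without a CUDA GPU and the bitsandbytes package, and the checkpoint and parameter values are illustrative only; see the documentation for the authoritative options.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 8-bit quantization settings.
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,  # outlier threshold for mixed int8/fp16 matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    device_map="auto",
    quantization_config=quantization_config,
)
```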
- [bnb] Introducing BitsAndBytesConfig by @younesbelkada in #21579
- [bnb] fix bnb decoders bug by @younesbelkada in #21688

This PR enables the user to make use of the PyTorch/XLA implementation of FSDP, including the newly added auto-wrap feature. Four arguments have been added to training_args.py to facilitate this functionality:

- xla_fsdp: a string containing the location of a .json file which specifies the FSDP arguments the user wants to use when wrapping their model.
- xla_fsdp_min_num_params: an int which sets a size-based automatic wrapping policy: any module with at least xla_fsdp_min_num_params parameters is FSDP-wrapped.
- xla_fsdp_transformer_layer_cls_to_wrap: a list of (case-sensitive) strings which sets a layer-class-based automatic wrapping policy: any module whose name matches one of the listed strings is FSDP-wrapped.
- xla_fsdp_grad_ckpt: a bool which determines whether gradient checkpointing is enabled for the automatically wrapped layers.

Generate
This PR standardizes beam search behavior across all three frameworks through early_stopping. PyTorch is unchanged, but TensorFlow and Flax users will see a significant speedup if they keep the default generation parameters.
There are, however, minor differences in the outputs of the .generate method with beam search on TensorFlow and Flax. These differences should be very small and come with significant speedups, but if they break your workflow, we recommend downgrading to a previous version and letting us know in a GitHub issue so that we can investigate.
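The early_stopping semantics being standardized can be sketched with a toy stopping check. This is a simplified illustration, not the library's implementation: the function name, signature, and the length-penalty heuristic shown are assumptions made for the example.

```python
def should_stop(finished_scores, running_scores, num_beams,
                early_stopping, cur_len, length_penalty=1.0):
    """Toy beam-search stopping rule illustrating early_stopping semantics."""
    if len(finished_scores) < num_beams:
        # Not enough finished hypotheses yet; keep searching.
        return False
    if early_stopping:
        # early_stopping=True: stop as soon as num_beams hypotheses are finished.
        return True
    # early_stopping=False: stop only when no running beam can still beat
    # the worst finished hypothesis (scores are sums of log-probs, so negative).
    worst_finished = min(finished_scores)
    best_possible = max(running_scores) / (cur_len ** length_penalty)
    return best_possible <= worst_finished
```

With two finished beams out of num_beams=2, early_stopping=True stops immediately, while early_stopping=False keeps going as long as a running beam could still achieve a better length-penalized score.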
Single model initialization
Model initialization had issues which made it incoherent across models and across initialization techniques. This is technically a bugfix, but as it may result in your models being initialized with different values, we think it best to highlight it here.
This PR deprecates the parallelize API, which was replaced by accelerate months ago. We recommend loading the model with the device_map attribute set to balanced to obtain the previous behavior.
Setting your own device_map is still permitted, but it needs to be a dictionary from module name to device, for example:
device_map = {'h.0': 0, 'h.1': 1, ...}
A new pipeline focused on zero-shot audio classification is added to the repository.
The task and model summaries have been refactored to take into account the larger number of tasks and models we now have.
- [t5] Fix T5 inference in float16 + bnb error by @younesbelkada in #21281
- TrainingArguments.label_names docs to reflect the correct default value behaviour by @fredtcaroli in #21288
- ImageProcessor in place of FeatureExtractor for pipelines by @Narsil in #20851
- oneformer by @Narsil in #21292
- EfficientFormer by @ydshieh in #21294
- OneFormerModelIntegrationTest expected values by @ydshieh in #21295
- Blenderbot doctest by @younesbelkada in #21297
- past in prepare inputs for generation by @ArthurZucker in #21296
- model_class.__name__ and compare against XXX_MAPPING_NAMES by @ydshieh in #21304
- utils/documentation_tests.txt by @ydshieh in #21315
- TFEncoderDecoder tests by @ydshieh in #21301
- compute_transition_scores examples by @gante in #21323
- Perceiver doctest by @younesbelkada in #21318
- RobertaPreLayerNorm doctest by @ydshieh in #21337
- GitModelIntegrationTest.test_batched_generation device issue by @ydshieh in #21362
- max_length and max_new_tokens coexistence by @gante in #21347
- [run_(clm|mlm).py examples] add streaming dataset support by @stas00 in #21343
- layer_norm_eps in some models by @ydshieh in #21336
- max_position_embeddings or max_target_positions by @gante in #21389
- Graphormer and fix its torchscript test failures by @ydshieh in #21380
- inputs_embeds by @gante in #21405
- 1.13.1 in push/schedule CI by @ydshieh in #21421
- [bnb] Fine-tuning HF 8-bit models by @younesbelkada in #21290
- is_flaky by @ydshieh in #21426
- inputs_embeds support for .generate() with BLOOM models by @akreal in #21430
- ConvBertModelTest test by @ydshieh in #21438
- SpeechT5ForSpeechToSpeechIntegrationTests device issue by @ydshieh in #21460
- PushToHubCallback import in Share a model docs by @ireneisdoomed in #21457
- more_itertools dependency by @Narsil in #21473
- prepare_inputs_for_generation by @gante in #21477
- past in favor of past_key_values by @ArthurZucker in #21443
- [Doc] Fix int8 docs by @younesbelkada in #21487
- GPT2TokenizerFast to the list of tokenizers to use for OPT by @ArthurZucker in #20823
- compute_transition_scores by @gante in #21341
- report_to none by @stas00 in #21505
- image_processor in pipeline by @Narsil in #21513
- eos_token_ids in model.generate(...) by @tokestermw in #21461
- __len__ method to _LazyAutoMapping by @ydshieh in #21522
- .generate() signature == PT .generate() signature by @gante in #21525
- .generate() can now be exported with dynamic length by @gante in #21474
- [pipeline] A simple fix for half-precision & 8bit models by @younesbelkada in #21479
- torch_dtype="auto" to look up config.torch_dtype first, expand docs by @stas00 in #21524
- config.hidden_size by @stas00 in #21504
- [Blip2] Add int8 support for blip2-flan-t5-xxl by @younesbelkada in #21574
- inputs_embeds support when generating with GPT-J by @dimitry12 in #21575
- test_constrained_beam_search_generate_dict_output by @gante in #21561
- [bnb] Let's make the daily CI green 🍏 by @younesbelkada in #21597
- requires_grad on input embedding to train on top of frozen layers by @younesbelkada in #21598
- "max_length is reached." from InfNaNLogitsProcessor documentation by @mmcdermott in #21634
- [ImageProcessor] Refactor default mean & std to OPENAI_CLIP_MEAN & OPENAI_CLIP_STD by @younesbelkada in #21425
- [BLIP] update blip path on slow tests by @younesbelkada in #21476
- PROCESSOR_MAPPING_NAMES and add tests by @ydshieh in #21703
- get_class_in_module by @ydshieh in #21709
- [MBart] Fix cross attention mask check by @younesbelkada in #21730
- gptsan_japanese from doctest list to avoid GPU OOM by @ydshieh in #21722
- BigBirdForQuestionAnswering by @ydshieh in #21723
- ErnieMEmbeddings device issue by @ydshieh in #21726
- GPTSanJapaneseModel by @ydshieh in #21731
- [GPTNeo] Fix gradient checkpointing bug by @younesbelkada in #21733
- max_length and num_beams by @bofenghuang in #21740
- concrete_args from outside available by @lygztq in #21775
- [tests] add accelerate marker by @younesbelkada in #21743
- PerceiverFourierPositionEncoding with fp16 by @fxmarty in #21787
- ruff==0.0.253 by @ydshieh in #21828
- logger.warning_once and use it for grad checkpointing code by @stas00 in #21804
- MobileViTModelTest to TFMobileViTModelTest by @ydshieh in #21825
- [T5] Fix torchquant issue by @younesbelkada in #21843
- [Blip2] Add Blip2Model by @younesbelkada in #21817
- [Blip2] Fix Blip-2 multi gpu by @younesbelkada in #21707
- PipelineTestCaseMeta 🚀 by @ydshieh in #21516
- [Blip] Fix blip doctest by @younesbelkada in #21868
- test_load_default_pipelines_pt for ClapModel by @ydshieh in #21886
- inputs_embeds functionality when generating with BioGPT by @sidkiblawi in #21889
- d_kv by @ArthurZucker in #21896
- BridgeTowerModelTest by @ydshieh in #21908
- repo_utils_job by @ydshieh in #21928
- AlignModelTest tests by @ydshieh in #21923
- check_repo.py due to missing backends by @ydshieh in #21930
- XLMProphetNetModelIntegrationTest by @ydshieh in #21957
- torch.allclose for some tests by @ydshieh in #21966
- test_xglm_sample by @ydshieh in #21975
- Jukebox tests by @ydshieh in #21984
- notification_service.py by @ydshieh in #21992
- test_multi_gpu_data_parallel_forward for some model tests by @ydshieh in #21991
- AudioClassificationPipelineTests::test_small_model_pt for PT 2.0.0 by @ydshieh in #22023
- [bnb] Fix bnb error message by @younesbelkada in #22026
- text_config_dict and vision_config_dict being saved for CLIP-like models by @ydshieh in #22035
- BridgeTower tests slow for now by @ydshieh in #22039
- huggingface_hub warnings in CI report by @ydshieh in #22054
- image_processing_donut to match code by @vermouthmjl in #22033
- [Blip2] skip accelerate test by @younesbelkada in #22124
- is_pipeline_test_to_skip to specific model test classes by @ydshieh in #21999
- --optim adamw_torch_fused for pt-2.0+ by @stas00 in #22144

The following contributors have made significant changes to the library over the last release: