release notes
Published 6/8/2023
Minor release. Contains breaking changes.
Transformers has just reached 100k stars on GitHub. To celebrate, we wanted to highlight 100 projects in the vicinity of transformers, so we have created an awesome-transformers page to do just that.
We accept PRs to add projects to the list!
By leveraging the bitsandbytes library by @TimDettmers, we add 4-bit support to transformers models!
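As a rough illustration, 4-bit loading might look like the sketch below. It assumes the `BitsAndBytesConfig` options of the bitsandbytes integration (`load_in_4bit`, `bnb_4bit_quant_type`, `bnb_4bit_use_double_quant`); the option values are illustrative choices rather than recommended settings, and the helper names `four_bit_config_kwargs` and `load_model_in_4bit` are hypothetical. Actually loading a model additionally requires a CUDA GPU with `bitsandbytes` and `accelerate` installed.

```python
def four_bit_config_kwargs():
    """Keyword arguments for a 4-bit BitsAndBytesConfig (illustrative values)."""
    return {
        "load_in_4bit": True,               # quantize supported linear layers to 4 bits at load time
        "bnb_4bit_quant_type": "nf4",       # 4-bit data type: "nf4" or "fp4"
        "bnb_4bit_use_double_quant": True,  # also quantize the quantization constants
    }


def load_model_in_4bit(model_id):
    """Load a causal LM with 4-bit weights.

    Imports are local so this module can be imported on a machine
    without transformers or a GPU stack installed.
    """
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(**four_bit_config_kwargs())
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the weights on available devices
    )
```

Calling `load_model_in_4bit` with a Hub checkpoint name of your choice would then return the quantized model.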
The Agents framework has been improved and continues to be stabilized. Among bug fixes, here are the important new features that were added:
transformers instead of relying on APIs.
AzureOpenAiAgent class to support Azure OpenAI agents.
The safetensors library is a safe serialization framework for machine learning tensors. It has been audited and will become the default serialization framework for several organizations (Hugging Face, EleutherAI, Stability AI).
It has now become a core dependency of transformers.
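To give a feel for why this kind of format can be loaded safely, here is a simplified, standard-library-only sketch of the safetensors file layout: an 8-byte little-endian header length, a JSON header describing each tensor, then the raw tensor bytes. Loading only parses JSON and slices bytes; nothing is unpickled or executed. The function names are hypothetical and this is not the library's implementation, which performs additional validation.

```python
import json
import struct


def write_safetensors(path, tensors):
    """Write tensors given as {name: (dtype_str, shape, raw_bytes)}."""
    header = {}
    chunks = []
    offset = 0
    for name, (dtype, shape, raw) in tensors.items():
        # Offsets are relative to the start of the byte buffer that follows the header.
        header[name] = {
            "dtype": dtype,
            "shape": list(shape),
            "data_offsets": [offset, offset + len(raw)],
        }
        offset += len(raw)
        chunks.append(raw)
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # unsigned 64-bit header size
        f.write(header_bytes)
        for chunk in chunks:
            f.write(chunk)


def read_safetensors(path):
    """Read the file back as {name: (dtype_str, shape, raw_bytes)}."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
        data = f.read()
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry, not a tensor
            continue
        begin, end = info["data_offsets"]
        tensors[name] = (info["dtype"], info["shape"], data[begin:end])
    return tensors
```

For example, a 2x2 float32 tensor can be passed as `("F32", (2, 2), struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))`. In practice you would use the library's own save and load helpers rather than hand-rolled code like this.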
The SwiftFormer paper introduces a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations in the self-attention computation with linear element-wise multiplications. A series of models called ‘SwiftFormer’ is built based on this, which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed. Even their small variant achieves 78.5% top-1 ImageNet1K accuracy with only 0.8 ms latency on iPhone 14, which is more accurate and 2× faster compared to MobileViT-v2.
Autoformer augments the Transformer as a deep decomposition architecture, which can progressively decompose the trend and seasonal components during the forecasting process.
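The trend/seasonal split described above can be sketched in a few lines. This is a minimal pure-Python illustration that assumes the trend is estimated with a moving average whose ends are padded by repeating the first and last values; the function names and the default kernel size are illustrative, and this is not the model's actual code.

```python
def moving_average(series, kernel_size):
    """Centered moving average with edge padding, so the output has the input's length."""
    pad_front = (kernel_size - 1) // 2
    pad_back = kernel_size - 1 - pad_front
    padded = [series[0]] * pad_front + list(series) + [series[-1]] * pad_back
    return [sum(padded[i:i + kernel_size]) / kernel_size for i in range(len(series))]


def decompose(series, kernel_size=25):
    """Split a series into (seasonal, trend), where seasonal = series - trend."""
    trend = moving_average(series, kernel_size)
    seasonal = [x - t for x, t in zip(series, trend)]
    return seasonal, trend
```

By construction, adding the two components back together recovers the original series. A decomposition architecture applies this kind of split repeatedly inside the network rather than once as preprocessing.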
MobileViTV2 is the second version of MobileViT, constructed by replacing the multi-headed self-attention in MobileViT with separable self-attention.
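For intuition, separable self-attention can be sketched with plain Python lists. The sketch below follows the general idea of scoring each token with a single scalar, pooling the keys into one context vector, and broadcasting that vector to every token, which costs time linear in the number of tokens. The weight arguments and function names are hypothetical, and the final output projection is omitted for brevity.

```python
import math


def linear(vec, weight):
    """Multiply a length-d vector by a d x m weight matrix given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, weight)) for j in range(len(weight[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def separable_self_attention(tokens, w_score, w_key, w_value):
    """tokens: k vectors of size d; w_score: length d; w_key, w_value: d x d matrices."""
    # 1) One scalar score per token, normalized across the k tokens.
    scores = softmax([sum(t * w for t, w in zip(tok, w_score)) for tok in tokens])
    # 2) Context vector: score-weighted sum of the key projections.
    keys = [linear(tok, w_key) for tok in tokens]
    context = [sum(s * key[j] for s, key in zip(scores, keys)) for j in range(len(keys[0]))]
    # 3) Share the context with every token: element-wise product with ReLU(value).
    values = [[max(0.0, v) for v in linear(tok, w_value)] for tok in tokens]
    return [[v * c for v, c in zip(val, context)] for val in values]
```

Unlike multi-headed self-attention, no k x k attention matrix is ever formed; each token interacts with the others only through the shared context vector.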
PerSAM proposes a minimal modification to SAM to allow DreamBooth-like personalization, making it possible to segment concepts in new images using just one example.
We add support for loading timm weights within the AutoBackbone API in transformers. timm models can be instantiated through the TimmBackbone class, and then used with any vision model that needs a backbone.
We add conditional text generation to the image-to-text pipeline, allowing the model to continue an initial text prompt conditioned on an image.
A major rework of the internals of the Trainer is underway, leveraging accelerate instead of redefining the same functionality in transformers. This should unify both frameworks and lead to increased interoperability and more efficient development.
accelerator.prepare by @pacman100 in #23914
chore: allow protobuf 3.20.3 requirement by @jose-turintech in #22759
Bring back the PR Refactor doctests + add CI to main by @ydshieh in #23271
[gpt] Gpt2 fix half precision causal mask by @younesbelkada in #23256
Temporary tolerance fix for flaky whipser PT-TF equiv. test by @amyeroberts in #23257
Add top_k argument to post-process of conditional/deformable-DETR by @CreatlV in #22787
transformers-cli -> huggingface-cli by @AlpinDale in #23276
Temporarily increase tol for PT-FLAX whisper tests by @amyeroberts in #23288
Added missing " in CHAT_PROMPT_TEMPLATE by @galatolofederico in #23287
Convert numpy arrays to lists before saving the evaluation metrics as json by @harisankar95 in #23268
skip test_run_squad_no_trainer for now by @ydshieh in #23302
Add gradient_checkpointing parameter to FlaxWhisperEncoder by @raghavanone in #23300
Agents extras by @LysandreJik in #23301
Fix typo in gradio-tools docs by @freddyaboulton in #23305
Remove LanguageIdentificationTool in __init__.py as we don't have it yet by @ydshieh in #23326
Fix docker image (caused by tensorflow_text) by @ydshieh in #23321
Compute the mask in-place, with less memory reads, and on CUDA on XLNetLMHeadModel by @lezcano in #23332
Only add files with modification outside doc blocks by @ydshieh in #23327
[docs] Fix Agents and Tools docstring by @stevhliu in #23313
Handle padding warning in generation when using inputs_embeds by @zrthxn in #23131
replaced assert with raise ValueError for t5, switch_transformers, pix2struct, mt5, longt5, gptsan_japanese. by @susnato in #23273
Use cu118 with cudnn >= 8.6 in docker file by @ydshieh in #23339
Removing one of the twice defined position_embeddings in LongFormer by @GregorySenay in #23343
Typo suggestion by @richardachen in #23360
Fix OwlViTForObjectDetection.image_guided_detection doc example by @ydshieh in #23370
Revert "Only add files with modification outside doc blocks" by @ydshieh in #23371
[Bugfix] OPTDecoderLayer does not return attentions when gradient_checkpointing and training is enabled. by @gmlwns2000 in #23367
Skip failing AlignModelTest::test_multi_gpu_data_parallel_forward by @ydshieh in #23374
Fix test typos - audio feature extractors by @LWprogramming in #23310
Added type hints for Graphormer pytorch version by @dewasahu2003 in #23073
Replace NumPy Operations with JAX NumPy Equivalents for JIT Compilation Compatibility by @gojiteji in #23356
Use mkstemp to replace deprecated mktemp by @ready-research in #23372
Update test_batched_inference_image_captioning_conditioned by @ydshieh in #23391
OPT/BioGPT: Improved attention mask shape exception by @gante in #23270
Fix chat prompt in HFAgent by @IvanSedykh in #23335
🌐 [i18n-KO] Translated asr.mdx to Korean by @sim-so in #23106
[Pix2Struct] Add conditional generation on docstring example by @younesbelkada in #23399
Generate: faster can_generate check on TF and Flax by @gante in #23398
[AutoModel] fix torch_dtype=auto in from_pretrained by @stas00 in #23379
Docs: add link to assisted generation blog post by @gante in #23397
Replace appends with list comprehension. by @ttsugriy in #23359
Why crash the whole run when HFHub gives a 50x error? by @ropoctl in #23320
Run doctest (in PRs) only when some doc example(s) are modified by @ydshieh in #23387
Update ConvNextV2ModelIntegrationTest::test_inference_image_classification_head by @ydshieh in #23402
Use dict.items to avoid unnecessary lookups. by @ttsugriy in #23415
[SAM] fix sam slow test by @younesbelkada in #23376
Return early once stop token is found. by @ttsugriy in #23421
[Reland] search model buffers for dtype as the last resort by @cyyever in #23319
Add Missing tokenization test [electra] by @IMvision12 in #22997
Small fixes and link in the README by @LysandreJik in #23428
TF: embeddings out of bounds check factored into function by @gante in #23427
Encoder-Decoder: add informative exception when the decoder is not compatible by @gante in #23426
Remove hardcoded prints in Trainer by @hugoabonizio in #23432
Fix device issue in SwiftFormerModelIntegrationTest::test_inference_image_classification_head by @ydshieh in #23435
Generate: skip left-padding tests on old models by @gante in #23437
remove unnecessary print in gpt neox sequence classifier by @cfhammill in #23433
🌐 [i18n-KO] Translated tasks/zero_shot_object_detection.mdx to Korean by @HanNayeoniee in #23430
Fix (skip) a pipeline test for RwkvModel by @ydshieh in #23444
Fix DecisionTransformerConfig doctring by @joaoareis in #23450
Make RwkvModel accept attention_mask but discard it internally by @ydshieh in #23442
Less flaky test_assisted_decoding_matches_greedy_search by @ydshieh in #23451
Add an option to log result from the Agent by @sgugger in #23454
fix bug in group_texts function, that was inserting short batches by @BodaSadalla98 in #23429
feat: Whisper prompting by @connor-henderson in #22496
Remove .data usages in optimizations.py by @alanwaketan in #23417
TF port of the Segment Anything Model (SAM) by @Rocketknight1 in #22970
[RWKV] Rwkv fix for 8bit inference by @younesbelkada in #23468
Use config to set name and description if not present by @sgugger in #23473
Fix PretrainedConfig min_length docstring by @joaoareis in #23471
Fix: Change tensors to integers for torch.dynamo and torch.compile compatibility by @loevlie in #23475
[Blip] Remove redundant shift right by @younesbelkada in #23153
Fix confusing transformers installation in CI by @ydshieh in #23465
Fix tests/repo_utils/test_get_test_info.py by @ydshieh in #23485
Debug example code for MegaForCausalLM by @Tylersuard in #23382
Fix tensor device while attention_mask is not None by @zspo in #23538
Fix accelerate logger bug by @younesbelkada in #23650
Bugfix: LLaMA layer norm incorrectly changes input type and consumers lots of memory by @TimDettmers in #23535
Fix wav2vec2 is_batched check to include 2-D numpy arrays by @LWprogramming in #23223
changing the requirements to a cpu torch version that works by @sshahrokhi in #23483
Fix SAM tests and use smaller checkpoints by @Rocketknight1 in #23656
small fix to remove unused eos in processor when it's not used. by @Narsil in #23408
Fix typo in a parameter name for open llama model by @aaalexlit in #23637
🌐 [i18n-KO] Translated tasks/monocular_depth_estimation.mdx to Korean by @HanNayeoniee in #23621
[SAM] Fixes pipeline and adds a dummy pipeline test by @younesbelkada in #23684
TF version compatibility fixes by @Rocketknight1 in #23663
[Blip] Fix blip doctest by @younesbelkada in #23698
is_batched fix for remaining 2-D numpy arrays by @LWprogramming in #23309
Skip TFCvtModelTest::test_keras_fit_mixed_precision for now by @ydshieh in #23699
fix: load_best_model_at_end error when load_in_8bit is True by @dkqkxx in #23443
add GPTJ/bloom/llama/opt into model list and enhance the jit support by @sywangyi in #23291
Paged Optimizer + Lion Optimizer for Trainer by @TimDettmers in #23217
Export to ONNX doc refocused on using optimum, added tflite by @MKhalusova in #23434
fix: use bool instead of uint8/byte in Deberta/DebertaV2/SEW-D to make it compatible with TensorRT by @uchuhimo in #23683
Better TF docstring types by @Rocketknight1 in #23477
TF SAM memory reduction by @Rocketknight1 in #23732
fix: delete duplicate sentences in document_question_answering.mdx by @jungnerd in #23735
fix: Whisper generate, move text_prompt_ids trim up for max_new_tokens calculation by @connor-henderson in #23724
Overhaul TF serving signatures + dummy inputs by @Rocketknight1 in #23234
[Whisper] Reduce batch size in tests by @sanchit-gandhi in #23736
Fix the regex in get_imports to support multiline try blocks and excepts with specific exception types by @dakinggg in #23725
Remove the last few TF serving sigs by @Rocketknight1 in #23738
Fix pip install --upgrade accelerate command in modeling_utils.py by @tloen in #23747
Fix psuh_to_hub in Trainer when nothing needs pushing by @sgugger in #23751
Revamp test selection for the example tests by @sgugger in #23737
[LongFormer] code nits, removed unused parameters by @ArthurZucker in #23749
[Nllb-Moe] Fix nllb moe accelerate issue by @younesbelkada in #23758
[OPT] Doc nit, using fast is fine by @ArthurZucker in #23789
Update trainer.mdx class_weights example by @amitportnoy in #23787
no_cuda does not take effect in non distributed environment by @sywangyi in #23795
Enable code-specific revision for code on the Hub by @sgugger in #23799
add type hint in pipeline model argument by @y3sar in #23740
TF SAM shape flexibility fixes by @Rocketknight1 in #23842
🌐 [i18n-KO] Translated fast_tokenizers.mdx to Korean by @KIHOON71 in #22956
[i18n-KO] Translated video_classification.mdx to Korean by @KIHOON71 in #23026
🌐 [i18n-KO] Translated troubleshooting.mdx to Korean by @0525hhgus in #23166
Adds a FlyteCallback by @peridotml in #23759
Update collating_graphormer.py by @clefourrier in #23862
[LlamaTokenizerFast] nit update post_processor on the fly by @ArthurZucker in #23855
#23388 Issue: Update RoBERTa configuration by @vijethmoudgalya in #23863
[from_pretrained] imporve the error message when _no_split_modules is not defined by @ArthurZucker in #23861
Editing issue with pickle def with lambda function by @Natyren in #23869
Adds AutoProcessor.from_pretrained support for MCTCTProcessor by @Ubadub in #23856
🌐 [i18n-KO] Translated pad_truncation.mdx to Korean by @sim-so in #23823
Fix bug leading to missing token in GPTSanJapaneseTokenizer by @passaglia in #23883
Fix last instances of kbit -> quantized by @sgugger in #23797
fix(configuration_llama): add keys_to_ignore_at_inference to LlamaConfig by @calico-1226 in #23891
Fix Trainer when model is loaded on a different GPU by @sgugger in #23792
Support shared tensors by @thomasw21 in #23871
ensure banned_mask and indices in same device by @cauyxy in #23901
Unpin numba by @sanchit-gandhi in #23162
[bnb] add warning when no linear by @younesbelkada in #23894
fix: Replace add_prefix_space in get_prompt_ids with manual space for FastTokenizer compatibility by @connor-henderson in #23796
[RWKV] Fix RWKV 4bit by @younesbelkada in #23910
add conditional statement for auxiliary loss calculation by @harisankar95 in #23899
Raise error if loss can't be calculated - ViT MIM by @amyeroberts in #23872
Bug fix - flip_channel_order for channels first images by @amyeroberts in #23701
Update the update metadata job to use upload_folder by @sgugger in #23917
[PushToHub] Make it possible to upload folders by @NielsRogge in #23920
Skip device placement for past key values in decoder models by @sgugger in #23919
[Flax Whisper] Update decode docstring by @sanchit-gandhi in #23908
Effectively allow encoder_outputs input to be a tuple in pix2struct by @fxmarty in #23932
rename DocumentQuestionAnsweringTool parameter input to match docstring by @Adam-D-Lewis in #23939
Update stale.yml to use HuggingFaceBot by @LysandreJik in #23941
Make TF ESM inv_freq non-trainable like PyTorch by @Rocketknight1 in #23940
Revert "Update stale.yml to use HuggingFaceBot" by @LysandreJik in #23943
#23675 Registering Malay language by @soongbren in #23689
Modify device_map behavior when loading a model using from_pretrained by @SunMarc in #23922
use _make_causal_mask in clip/vit models by @kashif in #23942
Fix ReduceLROnPlateau object has no attribute 'get_last_lr' by @wasupandceacar in #23944
[MMS] Scaling Speech Technology to 1,000+ Languages | Add attention adapter to Wav2Vec2 by @patrickvonplaten in #23813
add new mms functions to doc by @patrickvonplaten in #23954
🌐 [i18n-KO] Translated object_detection.mdx to Korean by @KIHOON71 in #23164
Trainer: fixed evaluate raising KeyError for ReduceLROnPlateau by @claudius-kienle in #23952
[Whisper Tokenizer] Skip special tokens when decoding with timestamps by @sanchit-gandhi in #23945
Add an option to reduce compile() console spam by @Rocketknight1 in #23938
Fix typo in doc comment of BitsAndBytesConfig by @ledyba in #23978
Skip test_multi_gpu_data_parallel_forward for MobileViTV2ModelTest by @ydshieh in #24017
Auto tokenizer registration by @Bearnardd in #23965
expose safe_serialization argument in the pipeline API by @yessenzhar in #23775
Pix2Struct: fix wrong broadcast axis of attention mask in visual encoder by @affjljoo3581 in #23976
TensorBoard callback no longer adds hparams by @bri25yu in #23999
🌐 [i18n-KO] Translated tasks_explained.mdx to Korean by @0525hhgus in #23844
🌐 [i18n-KO] Translated language-modeling.mdx by @wonhyeongseo in #23969
🌐 [i18n-KO] Translated bertology.mdx to Korean by @wonhyeongseo in #23968
Use TruncatedNormal from Keras initializers by @hvaara in #24036
Prevent ZeroDivisionError on trainer.evaluate if model and dataset are tiny by @tomaarsen in #24049
Modification of one text example file should trigger said test by @sgugger in #24051
Tiny fix for check_self_hosted_runner.py by @ydshieh in #24052
Reduce memory usage in TF building by @Rocketknight1 in #24046
Move TF building to an actual build() method by @Rocketknight1 in #23760
Use new parametrization based weight norm if available by @ezyang in #24030
bring back filtered_test_list_cross_tests.txt by @ydshieh in #24055
Fix device placement for model-parallelism in generate for encoder/de… by @sgugger in #24025
Generate: increase left-padding test atol by @gante in #23448
[Wav2Vec2] Fix torch srcipt by @patrickvonplaten in #24062
Add support for non-rust implemented tokenization for __getitem__ method. by @jacklanda in #24039
Support PEFT models when saving the model using trainer by @younesbelkada in #24073
[Hub] Add safe_serialization in push_to_hub by @younesbelkada in #24074
Fix is_optimum_neuron_available by @michaelbenayoun in #23961
[bnb] Fix bnb skip modules by @younesbelkada in #24043
Make the TF dummies even smaller by @Rocketknight1 in #24071
Fix expected value in tests of the test fetcher by @sgugger in #24077
Update delete_doc_comment_trigger.yml by @mishig25 in #24084
Do not prepare lr scheduler as it as the right number of steps by @sgugger in #24088
Fix a tiny typo in WhisperForConditionalGeneration::generate docstring by @sadra-barikbin in #24045
[Trainer] Correct behavior of _load_best_model for PEFT models by @younesbelkada in #24103
The following contributors have made significant changes to the library over the last release: