release notes
Published 8/22/2023
Minor release. Contains breaking changes.

The IDEFICS model was proposed in OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh
IDEFICS is the first open state-of-the-art visual language model at the 80B scale!
The model accepts arbitrary sequences of image and text and produces text, similarly to a multimodal ChatGPT.
Blogpost: hf.co/blog/idefics Playground: HuggingFaceM4/idefics_playground
MPT has been added and is now officially supported within Transformers. The repositories from MosaicML have been updated to work best with the model integration within Transformers.
[MPT] Add MosaicML's MPT model to transformers by @ArthurZucker & @younesbelkada in #24629

GPTQ quantization is now supported in Transformers, through the optimum library. The backend relies on the auto_gptq library, from which we use the GPTQ and QuantLinear classes.
See below for an example of the API, quantizing a model using the new GPTQConfig configuration utility.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer, group_size=128, desc_act=False)
# works also with device_map (cpu offload works but not disk offload)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, quantization_config=config)
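To build intuition for what the bits and group_size parameters control, here is a toy round-to-nearest group-wise quantizer in plain NumPy. Note that this is not the actual GPTQ algorithm, which additionally compensates quantization error using second-order information; the helper below is purely illustrative.

```python
import numpy as np

def quantize_groupwise(w, bits=4, group_size=8):
    # Round-to-nearest asymmetric quantization, one scale/zero-point per group.
    qmax = 2 ** bits - 1
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / qmax
    scale[scale == 0] = 1.0          # guard against constant groups
    zero = np.round(-lo / scale)
    q = np.clip(np.round(groups / scale + zero), 0, qmax)  # integer codes
    return ((q - zero) * scale).reshape(w.shape)           # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
w_dq = quantize_groupwise(w, bits=4, group_size=8)
err = np.abs(w - w_dq).max()         # small reconstruction error
```

Lower bits shrinks storage but coarsens the grid; a smaller group_size gives each scale fewer weights to cover, reducing error at the cost of more metadata.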
Most models under the TheBloke namespace with the GPTQ suffix should be supported. For example, to load the GPTQ-quantized model TheBloke/Llama-2-13B-chat-GPTQ, simply run (after installing the latest optimum and auto-gptq libraries):
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "TheBloke/Llama-2-13B-chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
For more information about this feature, we recommend taking a look at the following announcement blogpost: https://huggingface.co/blog/gptq-integration
A new pipeline, dedicated to text-to-audio and text-to-speech models, has been added to Transformers. It currently supports the 3 text-to-audio models integrated into transformers: SpeechT5ForTextToSpeech, MusicGen and Bark.
See below for an example:
from transformers import pipeline
pipe = pipeline(model="suno/bark")
output = pipe("Hey it's HuggingFace on the phone!")
audio = output["audio"]
sampling_rate = output["sampling_rate"]
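The returned dictionary can be written straight to disk. As a sketch that avoids downloading the model, the snippet below fakes the pipeline output with a 440 Hz sine wave (a hypothetical stand-in for output["audio"] and output["sampling_rate"]) and saves it as 16-bit PCM using only the standard-library wave module:

```python
import math
import struct
import wave

# Hypothetical stand-in for the pipeline result:
sampling_rate = 16000
audio = [math.sin(2 * math.pi * 440 * t / sampling_rate) for t in range(sampling_rate)]

# Scale the float waveform in [-1, 1] to 16-bit PCM and write a mono WAV file.
with wave.open("speech.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)               # 2 bytes = 16-bit samples
    f.setframerate(sampling_rate)
    f.writeframes(struct.pack(f"<{len(audio)}h", *(int(s * 32767) for s in audio)))
```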
Classifier-Free Guidance decoding is a text generation technique developed by EleutherAI, announced in this paper. With this technique, you can increase prompt adherence in generation. You can also set it up with negative prompts, ensuring your generation doesn't go in specific directions. See its docs for usage instructions.
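The core of the technique is simple logit arithmetic: the model is run with and without the prompt (or with a negative prompt), and the two distributions are combined at each decoding step. A minimal NumPy sketch of the usual CFG formula, with made-up logit values for illustration:

```python
import numpy as np

def cfg_logits(cond, uncond, guidance_scale):
    # scale > 1 moves probability mass toward the prompt; used with a
    # negative prompt, the same formula steers generation away from it.
    return uncond + guidance_scale * (cond - uncond)

cond = np.array([2.0, 0.5, -1.0])     # logits conditioned on the prompt
uncond = np.array([1.0, 1.0, 1.0])    # logits without the prompt
guided = cfg_logits(cond, uncond, guidance_scale=1.5)
```

With guidance_scale=1.0 the formula reduces to the ordinary conditional logits.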
A new task guide going into Visual Question Answering has been added to Transformers.
We continue the deprecation of models that was introduced in https://github.com/huggingface/transformers/pull/24787.
By deprecating, we indicate that we will stop maintaining such models; there is no intention of actually removing them or breaking support (they might one day move into a separate repo or onto the Hub, but we would still add the necessary imports to preserve backward compatibility). The main point is that we stop testing those models. This choice is driven by model usage, and aims to ease the burden on our CI so that it can focus on more critical aspects of the library.
There are ongoing efforts to translate the transformers documentation into other languages. These efforts are driven by groups independent of Hugging Face, and their work is greatly appreciated, as it further lowers the barrier of entry to ML and Transformers.
If you'd like to kickstart such an effort or help out on an existing one, please feel free to reach out by opening an issue.
- tasks/document_question_answering.md to Korean by @jungnerd in #24588
- quicktour.md by @wonhyeongseo in #24664
- serialization.md by @wonhyeongseo in #24686
- testing.md to Korean by @Sunmin0520 in #24900
- perf_train_cpu.md to Korean by @seank021 in #24911
- <tf_xla>.md to Korean by @54data in #24904
- perf_hardware.md to Korean by @augustinLib in #24966
- hpo_train.md to Korean by @harheem in #24968
- perf_infer_cpu.md to Korean by @junejae in #24920
- transformers_agents.md to Korean by @sim-so in #24881
- perf_infer_gpu_many.md to Korean by @heuristicwave in #24943
- perf_infer_gpu_one.md to Korean by @eenzeenee in #24978
- add_tensorflow_model.md to Korean by @keonju2 in #25017
- perf_train_cpu_many.md to Korean by @nuatmochoi in #24923
- add_new_model.md to Korean by @mjk0618 in #24957
- model_summary.md to Korean by @0525hhgus in #24625
- philosophy.md to Korean by @TaeYupNoh in #25010
- perf_train_tpu_tf.md to Korean by @0525hhgus in #25433

Addition of an input_data_format argument to image transforms and ImageProcessor methods, allowing the user to explicitly set the data format of the images being processed. This enables processing of images with a non-standard number of channels (e.g. 4) and removes errors which occurred when the data format was inferred but the channel dimension was ambiguous.
import numpy as np
from transformers import ViTImageProcessor
img = np.random.randint(0, 256, (4, 6, 3))  # 4-channel image in channels-first layout
image_processor = ViTImageProcessor()
inputs = image_processor(img, image_mean=0, image_std=1, input_data_format="channels_first")
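To see why inference can be ambiguous, consider the (4, 6, 3) array above: both the first axis (4) and the last axis (3) look like plausible channel counts. A small illustrative sketch (the infer_channel_dim helper is hypothetical, not the actual Transformers implementation):

```python
import numpy as np

img = np.random.randint(0, 256, (4, 6, 3))

def infer_channel_dim(arr, channel_counts=(1, 3, 4)):
    # Naive inference, for illustration only: an axis "looks like" channels
    # when its size is a typical channel count. Here both ends qualify.
    first = arr.shape[0] in channel_counts
    last = arr.shape[-1] in channel_counts
    if first and last:
        return "ambiguous"
    return "channels_first" if first else "channels_last"

print(infer_channel_dim(img))  # ambiguous: 4 channels first, or 3 channels last?
```

Passing input_data_format="channels_first" resolves the ambiguity explicitly: a 4-channel, 6x3 image.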
torch.scaled_dot_product_attention & Flash Attention

Many users are not aware that it is possible to force torch.scaled_dot_product_attention to dispatch to Flash Attention kernels. This leads to considerable speedups and memory savings, and is also compatible with quantized models. We decided to make this explicit to users in the documentation.
In a nutshell, one can just run:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")
# convert the model to BetterTransformer
model.to_bettertransformer()
input_text = "Hello my dog is cute and"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
to enable Flash Attention in their model. However, this feature does not support padding yet.
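For reference, the computation these fused kernels accelerate is plain scaled dot-product attention. A NumPy sketch of the math (the Flash Attention kernel computes the same result without ever materializing the full score matrix, which is where the memory savings come from):

```python
import numpy as np

def sdpa(q, k, v):
    # softmax(Q K^T / sqrt(d)) V, computed naively with the full score matrix.
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(2, 5, 8)) for _ in range(3))
out = sdpa(q, k, v)   # (batch, seq, head_dim) -> same shape as q
```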
Users will no longer encounter CPU RAM OOM when using FSDP to train very large models in multi-GPU or multi-node multi-GPU settings.

Users no longer have to pass fsdp_transformer_layer_cls_to_wrap, as the code now uses _no_split_modules by default, which is available for most of the popular models. DeepSpeed ZeRO-3 init now works properly with the Accelerate launcher + Trainer.
Trainer class

The default optimizer in the Trainer class has been updated to adamw_torch rather than our own adamw_hf, as the official Torch optimizer is more robust and fixes some issues.
In order to keep the old behavior, ensure that you pass "adamw_hf" as the optim value in your TrainingArguments.
adamw_hf to adamw_torch 🚨🚨🚨 by @muellerzr in #25109

There was an issue with the definition of the rescale of values with ViVit and EfficientNet. These have been fixed, but will result in different model outputs for both of these models. To understand the change and see what needs to be done to obtain previous results, please take a look at the following PR.
The EfficientNetForImageClassification model class did not follow conventions and added a softmax to the model logits. This was removed so that it respects the convention set by other models.
In order to obtain previous results, pass the model logits through a softmax.
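Concretely, a generic softmax over the logits recovers the probabilities that the old EfficientNetForImageClassification returned (the logit values below are made up for illustration):

```python
import numpy as np

def softmax(logits, axis=-1):
    # Stable softmax: shift by the max before exponentiating.
    z = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1]])  # hypothetical classifier logits
probs = softmax(logits)               # what older versions returned directly
```

The predicted class is unchanged either way, since softmax is monotonic; only the reported values differ.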
Some SPM models had issues with their management of added tokens; namely Llama and T5, among others, were behaving incorrectly. These have been updated in https://github.com/huggingface/transformers/pull/25224.
An option to obtain the previous behavior was added through the legacy flag, as explained in the PR linked above.
[SPM] Finish fix spm models 🚨🚨🚨 by @ArthurZucker in #25224

- use_cache=True by @ydshieh in #24893
- test_model_parallelism for FalconModel by @ydshieh in #24914
- [Llama2] replace self.pretraining_tp with self.config.pretraining_tp by @younesbelkada in #24906
- image_processing_vilt.py wrong default documented by @stas00 in #24931
- main_input_name in src/transformers/keras_callbacks.py by @ydshieh in #24916
- LogitsProcessor class by @shauray8 in #24848
- [RWKV] Add Gradient Checkpointing support for RWKV by @younesbelkada in #24955
- Parameter.ds_numel by @apoorvkh in #24942
- [LlamaConfig] Nit: pad token should be None by default by @ArthurZucker in #24958
- llama tokenization doctest by @ydshieh in #24990
- [bnb] Add simple check for bnb import by @younesbelkada in #24995
- [Llama] remove persistent inv_freq tensor by @ArthurZucker in #24998
- [logging.py] set default stderr path if None by @ArthurZucker in #25033
- TrainingArgs to wandb.config without sanitization. by @parambharat in #25035
- [8bit] Fix 8bit corner case with Blip2 8bit by @younesbelkada in #25047
- [RWKV] Add note in doc on RwkvStoppingCriteria by @ArthurZucker in #25055
- TF32 flag for PyTorch cuDNN backend by @XuehaiPan in #25075
- per_gpu_eval_batch_size with per_device_eval_batch_size in readme of multiple-choice task by @statelesshz in #25078
- [generate] Only warn users if the generation_config's max_length is set to the default value by @ArthurZucker in #25030
- [ForSequenceClassification] Support left padding by @ArthurZucker in #24979
- [TF] Also apply patch to support left padding by @ArthurZucker in #25085
- test_model_is_small by @connor-henderson in #25087
- [PreTrainedTokenizerFast] Keep properties from fast tokenizer by @ArthurZucker in #25053
- MusicgenForConditionalGeneration tests by @ydshieh in #25091
- [T5, MT5, UMT5] Add [T5, MT5, UMT5]ForSequenceClassification by @sjrl in #24726
- PvtModelIntegrationTest::test_inference_fp16 by @ydshieh in #25106
- use_auth_token -> token by @ydshieh in #25083
- [T5/LlamaTokenizer] default legacy to None to not always warn by @ArthurZucker in #25131
- [MptConfig] support from pretrained args by @ArthurZucker in #25116
- token things by @ydshieh in #25146
- .push_to_hub and cleanup get_full_repo_name usage by @Wauplin in #25120
- use_auth_token -> token in example scripts by @ydshieh in #25167
- [Mpt] Fix mpt slow test by @younesbelkada in #25170
- [InstructBlip] Fix instructblip slow test by @younesbelkada in #25171
- _prepare_output_docstrings by @ydshieh in #25202
- [PreTrainedModel] Wrap cuda and to method correctly by @younesbelkada in #25206
- all_model_classes in FlaxBloomGenerationTest by @ydshieh in #25211
- [pipeline] revisit device check for pipeline by @younesbelkada in #25207
- [Pix2Struct] Fix pix2struct cross attention by @younesbelkada in #25200
- [Docs/quantization] Clearer explanation on how things work under the hood. + remove outdated info by @younesbelkada in #25216
- [MPT] Add require_bitsandbytes on MPT integration tests by @younesbelkada in #25201
- [Detr] Fix detr BatchNorm replacement issue by @younesbelkada in #25230
- token argument in example scripts by @ydshieh in #25172
- pytest_options={"rA": None} in CI by @ydshieh in #25263
- num_hidden_layers=2 🚀🚀🚀 by @ydshieh in #25266
- pytest_num_workers=8 for torch/tf jobs by @ydshieh in #25274
- report_to logging integrations in docstring by @tomaarsen in #25281
- bark could have tiny model by @ydshieh in #25290
- trust_remote_code in example scripts by @Jackmin801 in #25248
- Repository to upload_folder by @sgugger in #25095
- NoRepeatNGramLogitsProcessor Example for LogitsProcessor class by @Rishab26 in #25186
- torch.compile() for vision models by @merveenoyan in #24748
- test_model_parallelism by @ydshieh in #25359
- token in example template by @ydshieh in #25351
- torch_job worker(s) crashing by @ydshieh in #25374
- token by @ydshieh in #25382
- OneFormerModelTest.test_model_with_labels by @ydshieh in #25383
- TopPLogitsWarper by @chiral-carbon in #25361
- device_map is passed by @gante in #25413
- torch.compile() docs by @merveenoyan in #25432
- examples to tests to run when setup.py is modified by @ydshieh in #25437
- main on PRs/branches if setup.py is not modified by @ydshieh in #25445
- main on PRs/branches" by @ydshieh in #25466
- auxiliary_head is None in UperNetPreTrainedModel by @mmurray in #25514
- MaskFormerModelIntegrationTest OOM by @ydshieh in #25544
- torch.fx tests on nightly CI by @ydshieh in #25549
- test_onnx_runtime_optimize for now by @ydshieh in #25560
- [Docs] Fix un-rendered images by @younesbelkada in #25561
- TRANSFORMERS_TEST_DEVICE by @vvvm23 in #25506
- test_beam_search_xla_generate_simple for T5 by @ydshieh in #25566
- [resize_embedding] Introduce pad_to_multiple_of and guidance by @ArthurZucker in #25088
- [SwitchTransformers] Remove unused module by @ArthurZucker in #25427
- [NllbMoe] Update code to properly support loss computation by @ArthurZucker in #25429
- [Tests] Fix failing 8bit test by @younesbelkada in #25564
- test_contrastive_generate for TFXLNet by @ydshieh in #25574
- [Docs / BetterTransformer] Added more details about flash attention + SDPA by @younesbelkada in #25265
- .cuda with .to(torch_device) in tests by @vvvm23 in #25571
- [split_special_tokens] Add support for split_special_tokens argument to encode by @ArthurZucker in #25081
- [Llama] remove prompt and fix prefix finetuning by @ArthurZucker in #25565
- [TokenizerFast] Fix setting prefix space in init by @ArthurZucker in #25563
- resize_token_embeddings by @SunMarc in #25596

The following contributors have made significant changes to the library over the last release:
- testing.md to Korean (#24900)
- [T5, MT5, UMT5] Add [T5, MT5, UMT5]ForSequenceClassification (#24726)
- trust_remote_code in example scripts (#25248)
- add_new_model.md to Korean (#24957)