release notes
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Published 1/22/2024
Minor: contains new features
Qwen2 is the new model series of large language models from the Qwen team. Previously, the Qwen series was released, including Qwen-72B, Qwen-1.8B, Qwen-VL, Qwen-Audio, etc.
Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, grouped-query attention, a mixture of sliding window attention and full attention, etc. Additionally, we have an improved tokenizer adapted to multiple natural languages and code.
Phi-2 is a transformer language model trained by Microsoft with exceptionally strong performance for its small size of 2.7 billion parameters. It was previously available as a custom code model, but has now been fully integrated into transformers.
phi-2 example by @susnato in #28392
softmax_scale in PhiFlashAttention2. by @gugarosa in #28537
The SigLIP model was proposed in Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer. SigLIP proposes to replace the loss function used in CLIP with a simple pairwise sigmoid loss. This results in better performance in terms of zero-shot classification accuracy on ImageNet.
The VipLlava model was proposed in Making Large Multimodal Models Understand Arbitrary Visual Prompts by Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee.
VipLlava enhances the training protocol of Llava by marking images and interacting with the model using natural cues like a “red bounding box” or “pointed arrow” during training.
The FastSpeech2Conformer model was proposed in the paper Recent Developments On Espnet Toolkit Boosted By Conformer by Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, and Yuekai Zhang.
FastSpeech 2 is a non-autoregressive model for text-to-speech (TTS) synthesis, which builds upon FastSpeech, showing improvements in training speed, inference speed and voice quality. It consists of a variance adapter; duration, energy and pitch predictors; and a waveform and mel-spectrogram decoder.
The Wav2Vec2-BERT model was proposed in Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team from Meta AI.
This model was pre-trained on 4.5M hours of unlabeled audio data covering more than 143 languages. It requires fine-tuning before it can be used for downstream tasks such as Automatic Speech Recognition (ASR) or Audio Classification.
Enables saving and loading transformers models in 4-bit format: you can now push bitsandbytes 4-bit weights to the Hugging Face Hub. To save 4-bit models and push them to the Hub, install the latest bitsandbytes package from PyPI (pip install -U bitsandbytes), load your model in 4-bit precision, and call save_pretrained / push_to_hub. For example:
from transformers import AutoModelForCausalLM

model_id = "facebook/opt-125m"
# Load the model in 4-bit precision (requires bitsandbytes).
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
# Push the 4-bit weights to the Hub.
model.push_to_hub("ybelkada/opt-125m-bnb-4bit")
[Docs] Add 4-bit serialization docs by @younesbelkada in #28182
Enable passing in 4D attention masks to models that support it. This is useful for reducing the memory footprint of certain generation tasks.
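To make the idea of a 4D mask concrete, here is a minimal pure-Python sketch that builds a mask of shape (batch, 1, query_len, kv_len) for two sequences packed into a single row, so that each token attends only to earlier tokens of its own sequence. The helper name `packed_causal_mask` and the 1/0 convention are assumptions for illustration, not part of transformers; check the documentation of the model you use for the mask format it expects.

```python
def packed_causal_mask(segment_lengths):
    """Build a (1, 1, total, total) nested-list mask for sequences packed in one row.

    mask[0][0][q][k] == 1 means query position q may attend to key position k.
    Hypothetical helper for illustration; not part of transformers.
    """
    # Give every position the index of the segment it belongs to.
    segment_ids = []
    for seg, length in enumerate(segment_lengths):
        segment_ids.extend([seg] * length)
    total = len(segment_ids)
    # Allow attention only within the same segment and only to earlier positions.
    rows = [
        [
            1 if (segment_ids[q] == segment_ids[k] and k <= q) else 0
            for k in range(total)
        ]
        for q in range(total)
    ]
    return [[rows]]  # add the batch and head dimensions


mask = packed_causal_mask([2, 3])
for row in mask[0][0]:
    print(row)
```

A 2D mask can only say which positions are padding; the extra dimensions are what let the mask differ per query position, as in the block-diagonal pattern printed here.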
attention_mask support by @poedator in #27539
Ability to customise which modules are quantized and which are not.
[Awq] Enable the possibility to skip quantization for some target modules by @younesbelkada in #27950
modules_in_block_to_quantize arg in GPTQconfig by @SunMarc in #27956
Added fused modules support
[Awq] Add llava fused modules support by @younesbelkada in #28239
[Mixtral / Awq] Add mixtral fused modules for Awq by @younesbelkada in #28240
[Llava / Vip-Llava] Add SDPA into llava by @younesbelkada in #28107
[Mixtral & Mistral] Add support for sdpa by @ArthurZucker in #28133
All decoding strategies (temperature fallback, compression/log-prob/no-speech threshold, ...) of OpenAI's long-form transcription (see: https://github.com/openai/whisper or section 4.5 in paper) have been added. Unlike https://github.com/openai/whisper, Transformers long-form transcription is fully compatible with pure FP16 and batching!
For more information see: https://github.com/huggingface/transformers/pull/27658.
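The temperature fallback strategy mentioned above can be sketched in a few lines of plain Python. This is a simplified illustration, not the Transformers implementation: `decode_fn` is a hypothetical stand-in for decoding one audio segment, the default thresholds and temperature schedule are assumptions modelled on the reference Whisper settings, and the no-speech check is left out.

```python
import zlib


def compression_ratio(text):
    """Ratio of raw to zlib-compressed size; highly repetitive text scores high."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def decode_with_fallback(
    decode_fn,
    temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # assumed schedule
    compression_ratio_threshold=2.4,              # assumed default
    logprob_threshold=-1.0,                       # assumed default
):
    """Retry decoding at higher temperatures until the quality checks pass.

    decode_fn(temperature) -> (text, avg_logprob) is hypothetical.
    """
    result = None
    for temperature in temperatures:
        text, avg_logprob = decode_fn(temperature)
        result = (text, avg_logprob, temperature)
        too_repetitive = compression_ratio(text) > compression_ratio_threshold
        too_uncertain = avg_logprob < logprob_threshold
        if not (too_repetitive or too_uncertain):
            break  # this attempt passed both checks
    return result  # the last attempt if none passed


def fake_decode(temperature):
    # Toy decoder: degenerate, repetitive output at temperature 0, fine afterwards.
    if temperature == 0.0:
        return "la " * 200, -0.3
    return "The quick brown fox jumps over the lazy dog.", -0.4


text, avg_logprob, used_temperature = decode_with_fallback(fake_decode)
print(used_temperature, text)
```

The compression ratio acts as a cheap detector for the repetition loops that greedy decoding can fall into, which is why the first toy attempt is rejected and the second is kept.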
Assisted generation was reworked to accept arbitrary sources of candidate sequences. This enabled us to smoothly integrate ngram speculation, and opens the door for new candidate generation methods. Additionally, we've added the speculative decoding strategy on top of assisted generation: when you call assisted generation with an assistant model and do_sample=True, you'll benefit from the faster speculative decoding sampling 🏎️💨
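As a rough picture of what a candidate source can look like, the sketch below proposes candidates by n-gram lookup: it finds an earlier occurrence of the last few tokens and suggests the tokens that followed it. The main model then keeps only the candidates it agrees with. Both function names are hypothetical and the verification step is simplified to greedy decoding; this is not the code used by generate.

```python
def ngram_candidates(tokens, ngram_size=2, num_candidates=3):
    """Propose continuation tokens by matching the last n-gram earlier in the sequence."""
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Search backwards so the most recent earlier match wins; skip the tail itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            follow = tokens[start + ngram_size:start + ngram_size + num_candidates]
            if follow:
                return follow
    return []


def accepted_prefix(candidates, model_choices):
    """Keep candidates up to the first disagreement with the main model's own choices."""
    accepted = []
    for candidate, choice in zip(candidates, model_choices):
        if candidate != choice:
            break
        accepted.append(candidate)
    return accepted


history = [5, 6, 7, 8, 9, 5, 6]
candidates = ngram_candidates(history)
print(candidates)
# Pretend the main model, checking all candidates in one pass, would pick 7, 8, 1.
print(accepted_prefix(candidates, [7, 8, 1]))
```

Every accepted candidate is a token the main model did not have to generate in its own sequential step, which is where the speed-up comes from.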
assisted_decoding now accepts arbitrary candidate generators by @gante in #27751
generate for the assistant by @gante in #28031
Adding pickle protection via weights_only=True in the torch.load calls.
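To see why this protection matters, the sketch below uses only the standard library: a pickle file can instruct the loader to call any importable function, and a restricted unpickler that refuses to resolve globals blocks that. This is an analogy built on Python's pickle module, not PyTorch's implementation; `SafeUnpickler` and `safe_loads` are illustrative names.

```python
import io
import pickle


class SafeUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global (class or function)."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data):
    return SafeUnpickler(io.BytesIO(data)).load()


# Plain containers and numbers need no globals, so they load fine.
weights = {"layer.weight": [0.1, 0.2], "layer.bias": [0.0]}
print(safe_loads(pickle.dumps(weights)))


class Payload:
    # A malicious file can ask pickle to call any importable function on load.
    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling",))


try:
    safe_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as err:
    print("rejected:", err)
```

Here the payload only calls print, but the same mechanism could invoke any function, which is the risk of loading untrusted checkpoints with an unrestricted unpickler.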
Unlike PyTorch, TensorFlow models build their weights "lazily" after model initialization, using the shape of their inputs to figure out what their weight shapes should be. We previously needed a full forward pass through TF models to ensure that all layers received an input they could use to build their weights, but with this change we now have proper build() methods that can correctly infer shapes and build model weights. This avoids a whole range of potential issues, as well as significantly accelerating model load times.
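The difference between the two approaches can be shown with a toy layer in plain Python. `ToyDense` is a made-up class for illustration and does not reflect the actual TensorFlow or Transformers code: on the lazy path the weights only exist after data has been pushed through, while an explicit build creates them directly from a known shape.

```python
class ToyDense:
    """Toy layer whose weight shape depends on the size of its input."""

    def __init__(self, units):
        self.units = units
        self.kernel = None  # not created yet: the input size is unknown

    def build(self, input_dim):
        # Create the weights from a known shape, without running any data through.
        self.kernel = [[0.0] * self.units for _ in range(input_dim)]

    def __call__(self, inputs):
        if self.kernel is None:
            # Lazy path: infer the shape from the first real input.
            self.build(len(inputs))
        return [
            sum(x * self.kernel[i][j] for i, x in enumerate(inputs))
            for j in range(self.units)
        ]


lazy = ToyDense(units=2)
lazy([1.0, 2.0, 3.0])        # weights appear only after a forward pass
explicit = ToyDense(units=2)
explicit.build(input_dim=3)  # weights built directly from the known shape
print(len(lazy.kernel), len(explicit.kernel))
```

With an explicit build, checkpoint weights can be assigned as soon as the shapes are known, without first constructing dummy inputs for a forward pass.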
The last version to support PyTorch 1.10 was 4.36.x. As it has been more than 2 years, and we're looking forward to using features available in PyTorch 1.11 and up, we do not support PyTorch 1.10 for v4.37 (i.e. we don't run the tests against torch 1.10).
You can now add custom tags to your model before pushing it to the Hub! This lets you filter models that carry that tag on the Hub with a simple URL filter. For example, to find models that have the trl tag, you can search: https://huggingface.co/models?other=trl&sort=created
[core/ FEAT] Add the possibility to push custom tags using PreTrainedModel itself by @younesbelkada in #28405 - e.g.
from transformers import AutoModelForCausalLM
model_name = "HuggingFaceM4/tiny-random-LlamaForCausalLM"
model = AutoModelForCausalLM.from_pretrained(model_name)
model.add_model_tags(["tag-test"])
model.push_to_hub("llama-tagged")
[Mixtral] Change mistral op order by @younesbelkada in #27955
[Tokenizer Serialization] Fix the broken serialisation by @ArthurZucker in #27099
[Whisper] raise better errors by @ArthurZucker in #27971
[CI slow] Fix expected values by @ArthurZucker in #27999
[SeamlessM4TTokenizer] Safe import by @ArthurZucker in #28026
[core / modeling] Fix training bug with PEFT + GC by @younesbelkada in #28031
test_retain_grad_hidden_states_attentions is flaky by @gante in #28035
[FA-2] Fix fa-2 issue when passing config to from_pretrained by @younesbelkada in #28043
[Modeling / Mixtral] Fix GC + PEFT issues with Mixtral by @younesbelkada in #28061
[Mixtral] update conversion script to reflect new changes by @younesbelkada in #28068
test_retain_grad_hidden_states_attentions by @ylacombe in #28060
low_cpu_mem_usage Flag Conflict with DeepSpeed Zero 3 in from_pretrained for Models with keep_in_fp32_modules" by @kotarotanahashi in #27762
DISABLE_TELEMETRY is used by @Wauplin in #28113
[Mixtral] Fix loss + nits by @ArthurZucker in #28115
CLIPConfig by @ydshieh in #28108
input_embeds docstring in encoder-decoder architectures by @gante in #28168
docs/source/en/perf_infer_gpu_one.md by @ydshieh in #28198
training_args.py fix missing import with accelerate with version accelerate==0.20.1 by @michaelfeil in #28171
feature_extractor_type when loading an image processor file by @ydshieh in #28195
[Llava] Fix llava index errors by @younesbelkada in #28032
from_pretrained under ZeRO-3 by @XuehaiPan in #28245
_merge_input_ids_with_image_features for llava model by @VictorSanh in #28333
DeepSpeed when using auto find batch size by @muellerzr in #28088
cache_dir for evaluate.load() in example scripts by @aphedges in #28422
TFTrainer by @gante in #28483
[chore] Update warning text, a word was missing by @tomaarsen in #28017
finetuned_from if it is a local path by @ydshieh in #28482
task arg in load_dataset in image-classification example by @regisss in #28408
[TokenizationUtils] Fix add_special_tokens when the token is already there by @ArthurZucker in #28520
[TokenizationRoformerFast] Fix the save and loading by @ArthurZucker in #28527
[SpeechT5Tokenization] Add copied from and fix the convert_tokens_to_string to match the fast decoding scheme by @ArthurZucker in #28522
Processor by @ydshieh in #27761
weights_only only if torch >= 1.13 by @ydshieh in #28506
[Core Tokenization] Support a fix for spm fast models by @ArthurZucker in #26678
LoggingLevel context manager in 3 tests by @ydshieh in #28575
processor_config.json if a processor has no extra attribute by @ydshieh in #28584
The following contributors have made significant changes to the library over the last release: