Release notes
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
Published 5/10/2023
Minor release. Contains breaking changes.
Transformers Agent is a new API that lets you use the library and Diffusers by prompting an agent (a large language model) in natural language. The agent then outputs code using a set of predefined tools, leveraging the appropriate (and state-of-the-art) models for the task the user wants to perform. It is fully multimodal and extensible by the community. Learn more in the docs.
SAM (Segment Anything Model) was proposed in Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
The model can be used to predict segmentation masks of any object of interest given an input image.
- [SAM] Correct arxiv link by @younesbelkada in #22886
- [SAM] Change to facebook/sam-vit-base by @younesbelkada in #22891
- [SAM] Add sam doc by @younesbelkada in #22984
- DocumentQuestionAnsweringPipeline only for fast ⚡ tokenizers by @ydshieh in #22745
- automatic-mask-generation pipeline for Segment Anything Model (SAM) by @ArthurZucker in #22840

RWKV suggests a tweak in the traditional Transformer attention to make it linear. This way, the model can be used as a recurrent network: passing inputs for timestep 0 and timestep 1 together is the same as passing inputs at timestep 0, then inputs at timestep 1 along with the state of timestep 0 (see example below).
This can be more efficient than a regular Transformer and can deal with sentences of any length (even if the model uses a fixed context length for training).
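The chunked-vs-sequential equivalence can be sketched with a toy linear-attention recurrence. This is illustrative pure Python, not the actual RWKV WKV kernel: the scalar state and fixed decay factor stand in for RWKV's learned, per-channel time decay.

```python
import math

def step(state, k, v, decay=0.9):
    # Carry a (numerator, denominator) state over past timesteps; the
    # exponential decay plays the role of RWKV's learned time decay.
    num, den = state
    num = decay * num + math.exp(k) * v
    den = decay * den + math.exp(k)
    return (num, den), num / den

def run(keys, values, state=(0.0, 0.0)):
    outs = []
    for k, v in zip(keys, values):
        state, out = step(state, k, v)
        outs.append(out)
    return state, outs

# Passing timesteps 0 and 1 together...
_, outs_joint = run([0.1, 0.5], [1.0, 2.0])
# ...gives the same outputs as passing timestep 0 alone, then timestep 1
# along with the carried state of timestep 0.
state0, outs0 = run([0.1], [1.0])
_, outs1 = run([0.5], [2.0], state=state0)
print(outs_joint == outs0 + outs1)  # → True
```

Because each step only reads the fixed-size state, the sequential path costs the same per token no matter how long the sequence grows.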
The FocalNet model was proposed in Focal Modulation Networks by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. FocalNets completely replace self-attention (used in models like ViT and Swin) by a focal modulation mechanism for modeling token interactions in vision. The authors claim that FocalNets outperform self-attention based models with similar computational costs on the tasks of image classification, object detection, and segmentation.
The Open-Llama model was proposed in Open-Llama project by community developer s-JoL.
The model is mainly based on LLaMA with some modifications: memory-efficient attention from xFormers, stable embedding from BLOOM, and shared input-output embeddings from PaLM. The model is pre-trained on both Chinese and English, which gives it better performance on Chinese language tasks.
Assisted generation is a new technique that lets you speed up generation with large language models by using a smaller model as an assistant. The assistant model is the one doing multiple forward passes, while the LLM merely validates the tokens proposed by the assistant. This can lead to speed-ups of up to 10x!
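The draft-then-validate loop can be sketched with toy next-token functions standing in for the two models. This is an illustrative sketch only: all names are made up, and the real implementation works on logits and key/value caches rather than recomputing tokens one by one.

```python
def assisted_generate(target_next, assistant_next, prompt, max_new_tokens=8, lookahead=4):
    """Toy speculative decoding: the small assistant drafts a chunk of tokens;
    the large target model checks them and keeps the agreeing prefix."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The assistant cheaply drafts `lookahead` candidate tokens.
        ctx, draft = list(tokens), []
        for _ in range(lookahead):
            draft.append(assistant_next(ctx))
            ctx.append(draft[-1])
        # 2. The target model validates the draft, keeping the agreeing prefix.
        ctx = list(tokens)
        rejected = False
        for t in draft:
            if target_next(ctx) != t:
                rejected = True
                break
            tokens.append(t)
            ctx.append(t)
        # 3. On the first disagreement, take the target model's own token,
        #    so the output is identical to plain greedy decoding.
        if rejected:
            tokens.append(target_next(ctx))
    return tokens[len(prompt):len(prompt) + max_new_tokens]

# Toy "models": next token is (last token + 1) mod 5; the assistant agrees,
# so whole drafted chunks are accepted at once.
target = lambda ctx: (ctx[-1] + 1) % 5
assistant = lambda ctx: (ctx[-1] + 1) % 5
print(assisted_generate(target, assistant, [0], max_new_tokens=6))  # → [1, 2, 3, 4, 0, 1]
```

The speed-up comes from step 2: validating a chunk of drafted tokens needs only one pass of the large model per chunk, while the output stays exactly what greedy decoding with the large model alone would produce.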
To avoid duplicating the model code in multiple repos when using the code-on-the-Hub feature, loading such models will now record in their config the repo in which the code lives. This way there is a single source of truth for the code of models on the Hub.
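Concretely, the saved config can point back at the repo that defines the custom class. A hypothetical sketch of what such a config entry might look like (the repo, module, and class names here are made up for illustration):

```json
{
  "auto_map": {
    "AutoModelForCausalLM": "some-user/some-custom-model--modeling_custom.CustomModelForCausalLM"
  }
}
```

With the origin repo recorded in `auto_map`, downstream copies of the model can keep loading the class from that one repo instead of each carrying their own copy of the code.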
This release has three breaking changes compared to version v4.28.0.
The first one focuses on fixing training issues for Pix2Struct. This slightly affects the results, but should result in the model training much better.
- [Pix2Struct] Attempts to fix training issues 🚨🚨🚨 by @younesbelkada in #23004

The second one aligns the ignore index in the LUKE model with other models in the library. This breaks the convention that models should stick to their original implementation, but it was necessary in order to align with the other models in the library.
Finally, the third breaking change aims to harmonize the training procedure for most recent additions to transformers: it is now the user's responsibility to fill the padding tokens of the labels with the correct value. This PR addresses the issue that was raised by other architectures such as LUKE or Pix2Struct.
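In practice this means masking the label positions that correspond to padding before computing the loss. A minimal sketch (the helper name is made up; -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`):

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def mask_padding_labels(labels, pad_token_id):
    """Return a copy of `labels` with padding positions replaced by
    IGNORE_INDEX so they do not contribute to the training loss."""
    return [IGNORE_INDEX if t == pad_token_id else t for t in labels]

print(mask_padding_labels([42, 7, 19, 0, 0], pad_token_id=0))
# → [42, 7, 19, -100, -100]
```

If the labels are left with raw padding token ids instead, the loss is computed over padding as well, which skews training.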
- [Blip] remove labels masking by @younesbelkada in #23024
- torch_dtype to str when saved_model=True in save_pretrained for TF models by @ydshieh in #22740
- training.mdx to Korean by @gabrielwithappy in #22670
- DS_BUILD_AIO=1 by @ydshieh in #22741
- Deta in #22437 by @ydshieh in #22750
- serving_output for TF composite models (encoder-decoder like models) by @ydshieh in #22743
- sequence_classification.mdx to Korean by @0525hhgus in #22655
- CpmAnt model by @ydshieh in #22766
- tutorial/proprecssing.mdx to Korean by @sim-so in #22578
- test_word_time_stamp_integration for Wav2Vec2ProcessorWithLMTest by @ydshieh in #22800
- custom_models.mdx to Korean by @HanNayeoniee in #22534
- _toctree.yml by @jungnerd in #22549
- tasks/translation.mdx to Korean by @wonhyeongseo in #22805
- LayoutLMv2 and LayoutLMv3 in some pipeline tests by @ydshieh in #22774
- PartialState as the device handler in the Trainer by @muellerzr in #22752
- auto_tutorial, training by @gabrielwithappy in #22796
- main by @ydshieh in #22823
- test_eos_token_id_int_and_list_top_k_top_sampling by @ydshieh in #22826
- accelerate@main in CI by @ydshieh in #22859
- is_symbolic_tensor predicate by @hvaara in #22878
- FillMaskPipelineTests by @ydshieh in #22894
- accelerate.mdx to Korean by @0525hhgus in #22830
- tasks/masked_language_modeling.mdx to Korean by @HanNayeoniee in #22838
- tasks/summarization.mdx to Korean by @sim-so in #22783
- test_codegen_sample_max_time as flaky by @ydshieh in #22953
- stride is too high in TokenClassificationPipeline by @boyleconnor in #22942
- _load_pretrained_model by @hanrui1sensetime in #22947
- run_scripts.mdx to Korean by @HanNayeoniee in #22793
- create_a_model doc to Korean by @gabrielwithappy in #22754
- accelerete@main in PyTorch Past CI jobs by @ydshieh in #22963
- DeepSpeed CI job link in Past CI by @ydshieh in #22967
- tasks/masked_language_modeling.mdx by @HanNayeoniee in #22965
- [DocTest] Fix correct checkpoint by @younesbelkada in #22988
- serialization.mdx to Korean by @wonhyeongseo in #22806
- tasks/image_captioning.mdx to Korean by @sim-so in #22943
- token_classification.mdx to Korean by @0525hhgus in #22945
- [PEFT] Add HFTracer support for PEFT by @younesbelkada in #23006
- [Pix2Struct] Fix pix2struct doctest by @younesbelkada in #23023
- multilingual.mdx to Korean by @HanNayeoniee in #23008
- test_offline_mode_pipeline_exception by @ydshieh in #23022
- BridgeTowerModelTester by @ydshieh in #23029
- _test_xla_generate less flaky by @ydshieh in #22996
- model_sharing.mdx to Korean by @0525hhgus in #22991
- bigbird test file by @ydshieh in #23040
- BridgeTower by @ydshieh in #23039
- BioGPTForSequenceClassification by @awinml in #22253
- convnext init by @IMvision12 in #23078
- tasks/image_classification.mdx to Korean by @0525hhgus in #23048
- tasks/question_answering.mdx to Korean by @jungnerd in #23012
- tasks/zero_shot_image_classification.mdx to Korean by @HanNayeoniee in #23065
- torchscript.mdx to Korean by @sim-so in #23060
- [Flava] Fix flava torch.distributed.nn.functional import all_gather issue by @younesbelkada in #23108
- Pix2Struct model to set Pix2StructTextModel to is_decoder=True by @gbarello-uipath in #23051
- [Doctest] Fix pix2struct doctest by @younesbelkada in #23121
- X | Y syntax for HfArgumentParser for Python 3.10+ by @XuehaiPan in #23126
- _toctree.yml by @HanNayeoniee in #23112
- symbolic_trace by @regisss in #23105
- [GPT-J] Fix causal mask dtype by @younesbelkada in #23147
- max_length warning when it is not set by @gante in #23139
- multiple_choice.mdx by @gabrielwithappy in #23064
- no_trainer scripts to pre-train Vision Transformers by @awinml in #23156
- logging_steps, eval_steps, and save_steps by @konstantinjdobler in #23235
- from_config by @DyeKuu in #23246
- tensorflow-probability in docker files by @ydshieh in #23260

The following contributors have made significant changes to the library over the last release:
- sequence_classification.mdx to Korean (#22655)
- accelerate.mdx to Korean (#22830)
- token_classification.mdx to Korean (#22945)
- model_sharing.mdx to Korean (#22991)
- tasks/image_classification.mdx to Korean (#23048)
- custom_models.mdx to Korean (#22534)
- tasks/masked_language_modeling.mdx to Korean (#22838)
- run_scripts.mdx to Korean (#22793)
- tasks/masked_language_modeling.mdx (#22965)
- multilingual.mdx to Korean (#23008)
- tasks/zero_shot_image_classification.mdx to Korean (#23065)
- _toctree.yml (#23112)