Release notes
Published 7/27/2022
Minor release. Contains breaking changes.

The TensorFlow text generation method can now be wrapped with tf.function and compiled to XLA. You should be able to achieve up to a 100x speedup this way. See our blog post and our benchmarks. You can also see XLA generation in action in our example notebooks, particularly for summarization and translation.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Main changes with respect to the original generate workflow: `tf.function` and `pad_to_multiple_of`
xla_generate = tf.function(model.generate, jit_compile=True)
tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}

# The first prompt will be slow (compiling), the others will be very fast!
input_prompts = [
    f"translate English to {language}: I have four cats and three dogs."
    for language in ["German", "French", "Romanian"]
]
for input_prompt in input_prompts:
    tokenized_inputs = tokenizer([input_prompt], **tokenization_kwargs)
    generated_text = xla_generate(**tokenized_inputs, max_new_tokens=32)
    print(tokenizer.decode(generated_text[0], skip_special_tokens=True))
- max_length by @gante in #18018
- tf.TensorArray by @gante in #17801

The OWL-ViT model (short for Vision Transformer for Open-World Localization) was proposed in Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs. It can be used to query an image with one or multiple text queries to search for and detect target objects described in text.
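A minimal sketch of querying an image with text, assuming the google/owlvit-base-patch32 checkpoint (the checkpoint name and example image URL are assumptions):

import requests
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]  # one list of text queries per image
inputs = processor(text=texts, images=image, return_tensors="pt")
outputs = model(**inputs)
# outputs.logits scores each (query, box) pair; outputs.pred_boxes holds the predicted boxes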
The NLLB model was presented in No Language Left Behind: Scaling Human-Centered Machine Translation by Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No Language Left Behind (NLLB) is a model capable of delivering high-quality translations directly between any pair of 200+ languages, including low-resource languages like Asturian, Luganda, Urdu, and more.
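A minimal translation sketch, assuming the facebook/nllb-200-distilled-600M checkpoint and its "eng_Latn"/"fra_Latn" language codes (both assumptions):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

inputs = tokenizer("I have four cats and three dogs.", return_tensors="pt")
# Force the target language code as the first generated token
generated = model.generate(**inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"))
print(tokenizer.decode(generated[0], skip_special_tokens=True))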
The MobileViT model was proposed in MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari. MobileViT introduces a new layer that replaces local processing in convolutions with global processing using transformers.
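A rough image-classification sketch, assuming the apple/mobilevit-small checkpoint (the checkpoint name and image URL are assumptions):

import requests
from PIL import Image
from transformers import MobileViTFeatureExtractor, MobileViTForImageClassification

feature_extractor = MobileViTFeatureExtractor.from_pretrained("apple/mobilevit-small")
model = MobileViTForImageClassification.from_pretrained("apple/mobilevit-small")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])  # predicted class label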
The Nezha model was proposed in NEZHA: Neural Contextualized Representation for Chinese Language Understanding by Junqiu Wei et al. NEZHA is a language model based on BERT with a collection of proven improvements, including Functional Relative Positional Encoding as an effective positional encoding scheme, the Whole Word Masking strategy, Mixed Precision Training, and the LAMB optimizer.
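A masked-language-modeling sketch, assuming the sijunhe/nezha-cn-base checkpoint (an assumption; since NEZHA is a Chinese LM, the input is Chinese):

from transformers import AutoTokenizer, NezhaForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("sijunhe/nezha-cn-base")
model = NezhaForMaskedLM.from_pretrained("sijunhe/nezha-cn-base")

inputs = tokenizer("巴黎是[MASK]国的首都。", return_tensors="pt")  # "Paris is the capital of [MASK]."
logits = model(**inputs).logits
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_index].argmax(-1)))  # most likely fill for the masked token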
The GroupViT model was proposed in GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, and Xiaolong Wang. Inspired by CLIP, GroupViT is a vision-language model that can perform zero-shot semantic segmentation over any given set of vocabulary categories.
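A minimal zero-shot image-text matching sketch, assuming the nvidia/groupvit-gcc-yfcc checkpoint and its CLIP-style processor (both assumptions):

import requests
from PIL import Image
from transformers import CLIPProcessor, GroupViTModel

processor = CLIPProcessor.from_pretrained("nvidia/groupvit-gcc-yfcc")
model = GroupViTModel.from_pretrained("nvidia/groupvit-gcc-yfcc")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.logits_per_image.softmax(dim=-1))  # image-text similarity over the label set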
The MVP model was proposed in MVP: Multi-task Supervised Pre-training for Natural Language Generation by Tianyi Tang, Junyi Li, Wayne Xin Zhao, and Ji-Rong Wen. MVP is a generative language model, pre-trained on a labeled corpus drawn from 45 datasets covering seven generation tasks. For each task, the model is further pre-trained with task-specific soft prompts to stimulate its capacity to perform that task.
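A short summarization sketch, assuming the RUCAIBox/mvp checkpoint and the MvpTokenizer/MvpForConditionalGeneration classes (the checkpoint name is an assumption):

from transformers import MvpTokenizer, MvpForConditionalGeneration

tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp")
model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")

inputs = tokenizer("Summarize: Hugging Face released a new version of Transformers with many new models.", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))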
The CodeGen model was proposed in A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen is an autoregressive language model for program synthesis trained sequentially on The Pile, BigQuery, and BigPython.
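A minimal program-synthesis sketch, assuming the Salesforce/codegen-350M-mono checkpoint (an assumption):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

inputs = tokenizer("def hello_world():", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(generated[0]))  # the prompt continued with synthesized code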
The UL2 model was presented in Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, and Donald Metzler. UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms. UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.
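A minimal sketch, assuming the google/ul2 checkpoint, a very large T5-style model (the checkpoint name and the "[S2S]" mode prefix are assumptions based on the model card):

from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/ul2")
model = T5ForConditionalGeneration.from_pretrained("google/ul2")  # ~20B parameters; needs substantial memory

# The leading "[S2S]" token selects the sequence-to-sequence denoising mode
inputs = tokenizer("[S2S] Mr. Dursley was the director of a firm called <extra_id_0>", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))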
This adds the ability to create custom pipelines, host them on the Hub, and share them with everyone else. As with the code-on-the-Hub feature for models, tokenizers, etc., users have to pass trust_remote_code=True when they want to use such a pipeline. Beyond this, the best way to get familiar with the feature is to look at the added documentation.
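A minimal loading sketch (the repository name is a hypothetical placeholder):

from transformers import pipeline

# "username/my-custom-pipeline" is hypothetical; it would ship its own pipeline code on the Hub
pipe = pipeline(model="username/my-custom-pipeline", trust_remote_code=True)
print(pipe("Some example input"))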
This adds a CLI to convert PyTorch weights into TensorFlow weights, validate the converted outputs, and (optionally) open a PR on the model repository.
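The intended invocation looks roughly like the following (the exact flag names are an assumption; the model name is illustrative):

transformers-cli pt-to-tf --model-name facebook/bart-base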
The following models have been ported to be used in TensorFlow: SegFormer, DeiT, ResNet and RegNet.
Additionally, our TF models now support loading sharded checkpoints:
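A rough sketch of saving and reloading a sharded TF checkpoint, assuming save_pretrained accepts a max_shard_size argument mirroring the PyTorch API (an assumption; the size value is illustrative):

from transformers import TFAutoModel

model = TFAutoModel.from_pretrained("bert-base-uncased")
model.save_pretrained("sharded-bert", max_shard_size="200MB")  # weights split across multiple shard files
reloaded = TFAutoModel.from_pretrained("sharded-bert")  # shards are reassembled transparently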
The following models have been ported to be used in JAX:
Additionally, our JAX models now support loading sharded checkpoints:
The following models now have a brand new head for new tasks:
A continued community effort provides ONNX converters for an increasing number of models.
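For most supported architectures, an export can be run with the transformers.onnx package; a minimal sketch (the checkpoint name is illustrative):

python -m transformers.onnx --model=distilbert-base-uncased onnx/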
The community effort to translate the documentation into several languages continues.
- KerasMetricCallback to use XLA generation by @Rocketknight1 in #18265
- --make-reports by @ydshieh in #18250
- saved_model=True by @amyeroberts in #18153
- take_along_axis is computed in DeBERTa to stop confusing XLA by @Rocketknight1 in #18256
- (...)
- EncoderDecoder models by @gante in #18097
- no_trainer CI by @muellerzr in #18242
- LayoutXLM docstrings by @qqaatw in #17038
- device_map directly in pipeline(..) function by @Narsil in #17902
- text-generation pipeline by @Narsil in #18131
- pt_to_tf test by @gante in #18108
- TFGenerationMixin.seed_generator so it's not created at import by @Rocketknight1 in #18044
- bias keyword argument in TFDebertaEmbeddings by @WissamAntoun in #17940
- hf-internal-testing) by @mishig25 in #17939
- return_all_scores introduced in #17606 by @Narsil in #17906
- group_texts function, drop last block if smaller than block_size by @billray0259 in #17908
- test_number_of_steps_in_training_with_ipex by @ydshieh in #17889
- test_inference_instance_segmentation_head by @ydshieh in #17872
- test_multi_gpu_data_parallel_forward for MaskFormer by @ydshieh in #17864
- create_commit by @gante in #17755
- generate, to Seq2SeqTrainer methods evaluate and predict by @eranhirs in #17805
- top_k_top_p_filtering having unexpected behavior by @unifyh in #17744

The following contributors have made significant changes to the library over the last release: