release notes
Published 6/16/2022
This is a minor release and contains breaking changes.
You can now use the big model inference of Accelerate directly in any call to from_pretrained by specifying device_map="auto" (or your own device_map). The model will automatically be loaded across your GPU(s), with whatever doesn't fit on them offloaded to CPU RAM, or even to the hard drive if you don't have enough RAM. Your model can then be used normally for inference with nothing else to do.
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained(
"bigscience/T0pp", revision="sharded", device_map="auto"
)
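A short follow-up sketch, continuing from the snippet above (the prompt and the assumption that the first layers landed on GPU 0 are illustrative): the computed placement is stored on the model as hf_device_map, and generation then works as usual.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")

# Where each submodule ended up: a GPU index, "cpu", or "disk"
print(model.hf_device_map)

# Send the inputs to the device holding the first layers (assumed to be GPU 0 here)
inputs = tokenizer("Is this review positive or negative? Review: great movie!", return_tensors="pt").to(0)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))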
The BLOOM model has been proposed with its various versions through the BigScience Workshop. The architecture of BLOOM is essentially similar to GPT-3 (an auto-regressive model for next-token prediction), but it has been trained on 46 different languages, including code.
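A minimal usage sketch (the small bigscience/bloom-560m checkpoint and the prompt are illustrative assumptions; any BLOOM checkpoint on the Hub works the same way):
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloom-560m"  # illustrative; larger BLOOM variants load the same way
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("The BigScience Workshop trained BLOOM on", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))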
The Convolutional vision Transformer (CvT) improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.
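A minimal image-classification sketch, assuming the microsoft/cvt-13 checkpoint and a local image file (both illustrative):
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, CvtForImageClassification

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/cvt-13")
model = CvtForImageClassification.from_pretrained("microsoft/cvt-13")

image = Image.open("cat.jpg")  # any RGB image
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])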
GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models.
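At 20 billion parameters the fp16 weights come to roughly 40GB, so loading benefits from the big model inference described above; a sketch (the prompt and dtype choice are illustrative):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# Let Accelerate place layers across GPUs, CPU RAM and disk as needed
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b", device_map="auto", torch_dtype=torch.float16
)

inputs = tokenizer("GPT-NeoX-20B is an open-source language model that", return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))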
LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).
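A minimal sketch of feeding a document to the model, assuming pre-extracted words and boxes and the microsoft/layoutlmv3-base checkpoint (all illustrative; when apply_ocr is left enabled the processor runs OCR itself):
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification

processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = LayoutLMv3ForTokenClassification.from_pretrained("microsoft/layoutlmv3-base", num_labels=2)

image = Image.new("RGB", (224, 224), "white")  # stand-in for a document scan
words = ["Invoice", "total:", "42.00"]  # pre-extracted OCR words (illustrative)
boxes = [[10, 10, 80, 30], [10, 40, 60, 60], [70, 40, 120, 60]]  # boxes on a 0-1000 scale

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
outputs = model(**encoding)
print(outputs.logits.shape)  # (batch, sequence_length, num_labels)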
LeViT improves the Vision Transformer (ViT) in performance and efficiency by a few architectural differences such as activation maps with decreasing resolutions in Transformers and the introduction of an attention bias to integrate positional information.
The LongT5 model is an extension of the T5 model that enables using one of two efficient attention mechanisms: (1) local attention, or (2) transient-global attention. It can handle input sequences of up to 16,384 tokens.
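A sketch of running a long input through the model, assuming the google/long-t5-tglobal-base checkpoint (the repeated dummy text is only there to produce a long sequence, and this base checkpoint is not fine-tuned for summarization, so the output is just a placeholder):
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

long_document = "summarize: " + "The LongT5 model handles long inputs. " * 500
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=16384)
summary_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))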
The M-CTC-T model is a 1B-param transformer encoder, with a CTC head over 8065 character labels and a language identification head over 60 language ID labels. It is trained on Common Voice (version 6.1, December 2020 release) and VoxPopuli. After training on Common Voice and VoxPopuli, the model is trained on Common Voice only. The labels are unnormalized character-level transcripts (punctuation and capitalization are not removed). The model takes as input Mel filterbank features from a 16 kHz audio signal.
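A minimal transcription sketch, assuming the speechbrain/m-ctc-t-large checkpoint and a placeholder array of silence in place of real 16 kHz audio:
import torch
from transformers import MCTCTProcessor, MCTCTForCTC

processor = MCTCTProcessor.from_pretrained("speechbrain/m-ctc-t-large")
model = MCTCTForCTC.from_pretrained("speechbrain/m-ctc-t-large")

raw_speech = torch.zeros(16000).numpy()  # 1 second of 16 kHz audio (placeholder silence)
inputs = processor(raw_speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1)))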
The Trajectory Transformer is used for deep reinforcement learning. To use it, you need to create sequences from the actions, states and rewards of all previous timesteps. The model treats all these elements together as one big sequence (a trajectory).
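A plain-PyTorch sketch of what such a trajectory sequence looks like (the dimensions are illustrative, and the real model additionally discretizes these continuous values into tokens):
import torch

states = torch.randn(10, 17)   # 10 timesteps of 17-dim observations (e.g. a MuJoCo env)
actions = torch.randn(10, 6)   # 6-dim continuous actions
rewards = torch.randn(10, 1)   # scalar reward per timestep

# One trajectory = [s_0, a_0, r_0, s_1, a_1, r_1, ...] laid out as a single flat sequence
trajectory = torch.cat([states, actions, rewards], dim=-1).reshape(1, -1)
print(trajectory.shape)  # (1, 10 * (17 + 6 + 1)) = (1, 240)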
The Wav2Vec2-Conformer was added to an updated version of fairseq S2T: Fast Speech-to-Text Modeling with fairseq. It requires more parameters than Wav2Vec2, but also yields an improved word error rate.
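A minimal CTC transcription sketch, assuming the facebook/wav2vec2-conformer-rope-large-960h-ft checkpoint and placeholder silence in place of real 16 kHz audio:
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ConformerForCTC

checkpoint = "facebook/wav2vec2-conformer-rope-large-960h-ft"
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ConformerForCTC.from_pretrained(checkpoint)

speech = torch.zeros(16000).numpy()  # 1 second of 16 kHz mono audio (placeholder silence)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1)))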
Data2VecVision (for semantic segmentation), OPT and Swin are now available in TensorFlow.
OPT is now available in Flax.
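For instance (the facebook/opt-125m checkpoint and the prompt are illustrative), OPT can now be loaded in either framework:
from transformers import AutoTokenizer, TFOPTForCausalLM, FlaxOPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# TensorFlow (add from_pt=True if only PyTorch weights are hosted for a checkpoint)
tf_model = TFOPTForCausalLM.from_pretrained("facebook/opt-125m")
tf_ids = tf_model.generate(tokenizer("Hello, my name is", return_tensors="tf").input_ids, max_length=20)
print(tokenizer.decode(tf_ids[0], skip_special_tokens=True))

# Flax
flax_model = FlaxOPTForCausalLM.from_pretrained("facebook/opt-125m")
flax_ids = flax_model.generate(tokenizer("Hello, my name is", return_tensors="np").input_ids, max_length=20).sequences
print(tokenizer.decode(flax_ids[0], skip_special_tokens=True))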
A community effort has been started to translate the documentation into two new languages: Italian and Portuguese.
- BloomForSequenceClassification and BloomForTokenClassification classes by @haileyschoelkopf in #17639
- RepoNotFoundError when not authenticated by @SBrandeis in #17651
- float16 by @Narsil in #17637
- top_k argument to text-classification pipeline by @Narsil in #17606
- train_new_from_iterator in the case of byte-level tokenizers by @SaulLu in #17549
- pt-to-tf by @gante in #17588
- tokenizer type annotation in pipeline(...) by @willfrey in #17500
- PreTrainedTokenizerBase.add_tokens() by @Witiko in #17119
- test_inference_no_head by @ydshieh in #17395
- imageGPT auto feature extractor by @Narsil in #16871
- device_map="auto" to OPT by @sgugger in #17382
- batch_size test to QA pipeline by @Narsil in #17330
- max_seq_len in QA pipeline by @Narsil in #17316
- test_torch_encode_plus_sent_to_model by @SaulLu in #17231
The following contributors have made significant changes to the library over the last release: