release notes
Published 2/4/2021
Contains breaking changes.

Two new models are released as part of the Wav2Vec2 implementation: Wav2Vec2Model and Wav2Vec2ForMaskedLM, available in PyTorch.
Wav2Vec2 is a multi-modal model that combines speech and text. It is the first multi-modal model of its kind to be welcomed into Transformers.
The Wav2Vec2 model was proposed in wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=wav2vec2
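A minimal sketch of running the new model on raw audio. The configuration below is a hypothetical tiny setup chosen for illustration only; the released checkpoints on the Hub are far larger:

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Hypothetical tiny configuration, for illustration only -- real
# checkpoints from the Hub use much larger sizes.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(16, 16),    # two feature-extractor conv layers
    conv_kernel=(10, 3),
    conv_stride=(5, 2),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=4,
)
model = Wav2Vec2Model(config).eval()

# The model consumes the raw waveform directly: batch of 1,
# one second of 16 kHz audio.
waveform = torch.randn(1, 16000)
with torch.no_grad():
    hidden_states = model(waveform).last_hidden_state
print(hidden_states.shape)  # (batch, frames, hidden_size)
```

Note that, unlike text models, the input is a float waveform rather than token IDs; the convolutional feature extractor downsamples it into frames before the transformer layers run.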
Available notebooks:
Contributions:
Future Additions
The ConvBERT model was proposed in ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
Six new models are released as part of the ConvBERT implementation: ConvBertModel, ConvBertForMaskedLM, ConvBertForSequenceClassification, ConvBertForTokenClassification, ConvBertForQuestionAnswering and ConvBertForMultipleChoice. These models are available both in PyTorch and TensorFlow.
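As a sketch of the new PyTorch classes, a ConvBertModel can be instantiated and run on token IDs like any BERT-style encoder. The tiny configuration below is hypothetical, for illustration only:

```python
import torch
from transformers import ConvBertConfig, ConvBertModel

# Hypothetical tiny configuration; released checkpoints are full-sized.
config = ConvBertConfig(
    vocab_size=100,
    embedding_size=64,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
)
model = ConvBertModel(config).eval()

input_ids = torch.tensor([[1, 5, 9, 2, 7, 3, 8, 4]])  # batch of 1, 8 tokens
with torch.no_grad():
    hidden_states = model(input_ids).last_hidden_state
print(hidden_states.shape)  # (batch, seq_len, hidden_size)
```

The task-specific classes (ConvBertForMaskedLM, ConvBertForSequenceClassification, etc.) accept the same configuration and inputs, adding the corresponding head on top.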
Contributions:
The BORT model was proposed in Optimal Subarchitecture Extraction for BERT by Amazon's Adrian de Wynter and Daniel J. Perry. It is an optimal subset of architectural parameters for the BERT architecture, which the authors refer to as "Bort".
The BORT model can be loaded directly into the BERT architecture; therefore, all BERT model heads are available for BORT.
Contributions:
When running a script with Trainer on Amazon SageMaker with SageMaker's data parallelism library enabled, Trainer will automatically use the smdistributed library. All maintained examples have been tested with this functionality. Here is an overview of the SageMaker data parallelism library.
A new Community Page has been added to the docs. It contains all the notebooks contributed by the community, as well as some community projects built around Transformers. Feel free to open a PR if you want your project to be showcased!
DeBERTa now has more model heads available.
BART, mBART, Marian, Pegasus and Blenderbot now have decoder-only model architectures. They can therefore be used in decoder-only settings.
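As a sketch of the decoder-only setting, a causal-LM variant such as BartForCausalLM can be instantiated and run on token IDs alone, with no encoder input. The tiny configuration below is hypothetical, for illustration only:

```python
import torch
from transformers import BartConfig, BartForCausalLM

# Hypothetical tiny configuration, for illustration only.
config = BartConfig(
    vocab_size=100,
    d_model=32,
    decoder_layers=2,
    decoder_attention_heads=2,
    decoder_ffn_dim=64,
    encoder_layers=1,  # encoder settings are unused by the decoder-only head
    encoder_attention_heads=2,
    encoder_ffn_dim=64,
)
model = BartForCausalLM(config).eval()

# Decoder-only usage: plain token IDs in, next-token logits out.
input_ids = torch.tensor([[2, 5, 9, 7, 3, 8]])
with torch.no_grad():
    logits = model(input_ids).logits
print(logits.shape)  # (batch, seq_len, vocab_size)
```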
None.
- past_key_values in GPT-2 #9596 (@forest1988)
- report_to training arguments to control the integrations used #9735 (@sgugger)
- Trainer.hyperparameter_search docstring #9762 (@sorami)
- skip_special_tokens=True to FillMaskPipeline #9783 (@Narsil)
- test_head_masking = True flags in test files #9858 (@stancld)
- return_full_text parameter to TextGenerationPipeline #9852 (@Narsil)
- from_slow in fast tokenizers build and fixes some bugs #9987 (@sgugger)
- encoder_no_repeat_ngram_size to generate #9984 (@Narsil)
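For instance, encoder_no_repeat_ngram_size can now be passed to generate() to forbid the decoder from copying any n-gram that appears in the encoder input. The sketch below uses a hypothetical, randomly initialized tiny BART, so the generated tokens are meaningless; it only illustrates the argument:

```python
import torch
from transformers import BartConfig, BartForConditionalGeneration

# Hypothetical tiny, randomly initialized model -- for illustrating the
# generate() argument only; meaningful output requires a trained checkpoint.
config = BartConfig(
    vocab_size=50,
    d_model=32,
    encoder_layers=1,
    decoder_layers=1,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=64,
    decoder_ffn_dim=64,
)
model = BartForConditionalGeneration(config).eval()

input_ids = torch.tensor([[0, 10, 11, 12, 2]])
# encoder_no_repeat_ngram_size=2 bans any bigram from the encoder input
# from being repeated in the decoded sequence.
out = model.generate(input_ids, max_length=10, encoder_no_repeat_ngram_size=2)
print(out.shape)
```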