release notes
Published 1/13/2021
Minor release. Contains breaking changes.

Four new models are released as part of the LED implementation: LEDModel, LEDForConditionalGeneration, LEDForSequenceClassification, and LEDForQuestionAnswering, in PyTorch. The first two models have a TensorFlow version.
LED is the encoder-decoder variant of the Longformer model by allenai.
The LED model was proposed in Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, and Arman Cohan.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=led
Available notebooks:
Contributions:
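The new LED classes can be instantiated and run directly. A minimal sketch below uses a tiny, randomly initialized configuration so it runs without downloading weights; all hyperparameter values are illustrative assumptions, not those of a released checkpoint such as allenai/led-base-16384.

```python
import torch
from transformers import LEDConfig, LEDForConditionalGeneration

# Tiny, randomly initialized LED for illustration only; these hyperparameters
# are assumptions, not those of a released checkpoint.
config = LEDConfig(
    vocab_size=100,
    d_model=32,
    encoder_layers=2,
    decoder_layers=2,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=64,
    decoder_ffn_dim=64,
    max_encoder_position_embeddings=128,
    max_decoder_position_embeddings=128,
    attention_window=8,  # local attention window size (must be even)
)
model = LEDForConditionalGeneration(config)
model.eval()

input_ids = torch.ones((1, 8), dtype=torch.long)          # dummy encoder input
decoder_input_ids = torch.ones((1, 2), dtype=torch.long)  # dummy decoder input
with torch.no_grad():
    out = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
print(out.logits.shape)  # (batch, decoder_length, vocab_size)
```

For real use you would load a pretrained checkpoint from the Hub instead of a random configuration.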
The PyTorch generation function can now return:

- scores: the logits generated at each step
- attentions: all attention weights at each generation step
- hidden_states: all hidden states at each generation step

simply by adding return_dict_in_generate to the config or passing it as an input to .generate().
Tweet:
Notebooks for a better explanation:
PR:
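The structured generation output described above can be sketched as follows. To keep the example self-contained it builds a tiny, randomly initialized GPT-2 (the config values are illustrative assumptions) rather than downloading a checkpoint:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel, set_seed

# Tiny, randomly initialized GPT-2 so the sketch runs without downloading
# weights; the config values are illustrative assumptions.
set_seed(0)
config = GPT2Config(vocab_size=100, n_positions=64, n_embd=32, n_layer=2, n_head=2)
model = GPT2LMHeadModel(config)
model.eval()

input_ids = torch.tensor([[1, 2, 3]])
out = model.generate(
    input_ids,
    max_length=8,
    return_dict_in_generate=True,  # structured output instead of a plain tensor
    output_scores=True,
    output_attentions=True,
    output_hidden_states=True,
)
print(len(out.scores))  # one logits tensor per generated token
```

Without return_dict_in_generate, .generate() returns only the generated token ids; with it, scores, attentions, and hidden_states become accessible as attributes of the returned object.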
The TensorFlow versions of the BERT-like models have been updated and are now twice as fast as the previous versions.
This version introduces a new API for TensorFlow saved models, which can now be exported with model.save_pretrained("path", saved_model=True) and easily loaded into a TensorFlow Serving environment.
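The export call can be sketched as below. A tiny, randomly initialized TF BERT is used so the example is self-contained; the hyperparameters and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
from transformers import BertConfig, TFBertModel

# Tiny, randomly initialized TF BERT; hyperparameters are illustrative assumptions.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = TFBertModel(config)
model(np.ones((1, 4), dtype=np.int32))  # build the model with a dummy batch

with tempfile.TemporaryDirectory() as tmp:
    # saved_model=True additionally writes a TensorFlow SavedModel
    model.save_pretrained(tmp, saved_model=True)
    # the SavedModel lands in a versioned subdirectory, ready for TF Serving
    exported = os.path.isdir(os.path.join(tmp, "saved_model", "1"))
print(exported)
```

The versioned `saved_model/1` layout is what TensorFlow Serving expects as a model base path.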
Initial support for DeepSpeed to accelerate distributed training on several GPUs. This is an experimental feature that hasn't been fully tested yet, but early results are very encouraging (see this comment). Stay tuned for more details in the coming weeks!
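With this integration, a Trainer-based training script can be pointed at a DeepSpeed configuration file. Below is a minimal illustrative config using standard DeepSpeed keys; the specific values are assumptions for the sketch, not recommendations:

```json
{
  "train_batch_size": 8,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2
  }
}
```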
The encoder-decoder version of the templates is now part of Transformers! Adding an encoder-decoder model is made very easy with this addition. More information can be found in the README.
The initialization process has been changed to import only what is required. Therefore, when using only PyTorch models, TensorFlow will not be imported, and vice versa. In the best cases, importing a transformers model now takes only a few hundred milliseconds (~200ms), compared to a few seconds (~3s) in previous versions.
Some models now have improved documentation. The LayoutLM model has seen a general overhaul in its documentation thanks to @NielsRogge.
The tokenizer-only models BERTweet, HerBERT, and PhoBERT now have their own documentation pages thanks to @Qbiwan.
There are no breaking changes between the previous version and this one. This will be the first version to require TensorFlow >= 2.3.
- label_smoothing_factor training arg #9282 (@sgugger)
- past_key_values returned as a tuple of tuples by default #9381 (@patrickvonplaten)
- --model_parallel #9451 (@stas00)
- prepare_seq2seq_batch #9524 (@sgugger)