Release notes
Published 4/6/2021
Minor release; contains breaking changes.

Seven new models are released as part of the BigBird implementation: BigBirdModel, BigBirdForPreTraining, BigBirdForMaskedLM, BigBirdForCausalLM, BigBirdForSequenceClassification, BigBirdForMultipleChoice, and BigBirdForQuestionAnswering, in PyTorch.
BigBird is a sparse-attention-based transformer which extends Transformer-based models, such as BERT, to much longer sequences. In addition to sparse attention, BigBird also applies global attention as well as random attention to the input sequence.
The BigBird model was proposed in Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
It is released with an accompanying blog post: Understanding BigBird's Block Sparse Attention
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=big_bird
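As a quick illustration, here is a minimal sketch of loading one of the new classes from a Hub checkpoint (assuming the google/bigbird-roberta-base checkpoint; any checkpoint from the filter link above should work the same way):

```python
# Minimal sketch: encode an input with BigBird.
# Assumes the "google/bigbird-roberta-base" checkpoint from the Hub.
from transformers import BigBirdModel, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdModel.from_pretrained("google/bigbird-roberta-base")

inputs = tokenizer(
    "BigBird extends BERT-like models to much longer sequences.",
    return_tensors="pt",
)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```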
Two new models are released as part of the GPT Neo implementation: GPTNeoModel and GPTNeoForCausalLM, in PyTorch.
GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. EleutherAI's primary goal is to replicate a GPT-3 DaVinci-sized model and open-source it to the public.
The implementation within Transformers is a GPT2-like causal language model trained on the Pile dataset.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=gpt_neo
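As a quick illustration, here is a minimal generation sketch (assuming the EleutherAI/gpt-neo-1.3B checkpoint; any checkpoint from the filter link above should work the same way):

```python
# Minimal sketch: text generation with GPT Neo.
# Assumes the "EleutherAI/gpt-neo-1.3B" checkpoint from the Hub.
from transformers import GPT2Tokenizer, GPTNeoForCausalLM

tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

input_ids = tokenizer(
    "GPT Neo is a family of transformer language models",
    return_tensors="pt",
).input_ids
generated = model.generate(input_ids, do_sample=True, max_length=50)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```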
Features have been added to existing examples, and new examples have been added.

Thanks to the amazing contributions of @bhadreshpsavani, all Trainer-based examples are now standardized: each supports the predict stage and returns/saves metrics in the same fashion.

In addition, examples that completely expose the training loop, based on the accelerate library, are now part of the repository, for easy customization if you want to try a new research idea (a minimal sketch of the pattern follows the list below):

- examples/multiple-choice/run_swag_no_trainer.py #10934 (@stancld)
- examples/run_ner_no_trainer.py #10902 (@stancld)
- examples/language_modeling/run_mlm_no_trainer.py #11001 (@hemildesai)
- examples/language_modeling/run_clm_no_trainer.py #11026 (@hemildesai)
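A minimal sketch of the accelerate-based training loop these no_trainer scripts expose; the toy model and random data below are placeholders, where the real scripts build them from Transformers models and datasets:

```python
# Minimal sketch of the training-loop pattern in the *_no_trainer.py examples.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(16, 2)                              # stand-in for a Transformers model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
train_dataloader = DataLoader(dataset, batch_size=8)

# accelerate handles device placement and (if configured) distributed wrapping
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

loss_fn = torch.nn.CrossEntropyLoss()
model.train()
for inputs, labels in train_dataloader:
    loss = loss_fn(model(inputs), labels)
    accelerator.backward(loss)                              # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Because the loop itself lives in the script rather than inside Trainer, any step (loss, scheduling, logging, evaluation) can be edited directly.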
The Trainer now supports SageMaker model parallelism out of the box; as a consequence, the old SageMakerTrainer is deprecated and will be removed in version 5.
Flax support has been widened to cover all model heads of the BERT architecture, alongside a general conversion script for PyTorch checkpoints to be used in Flax.
Auto models now have a Flax implementation.
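For instance, a minimal sketch of running a standard BERT checkpoint through its Flax implementation (bert-base-cased is just an example checkpoint; from_pt=True converts the PyTorch weights on the fly):

```python
# Minimal sketch: load a BERT checkpoint into its Flax implementation.
from transformers import BertTokenizerFast, FlaxBertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = FlaxBertModel.from_pretrained("bert-base-cased", from_pt=True)  # convert PyTorch weights

inputs = tokenizer("Flax support now covers all BERT model heads.", return_tensors="np")
outputs = model(**inputs)
print(outputs[0].shape)  # last hidden state
```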
pipeline.framework would actually contain a fully qualified model. #10970 (@Narsil)