release notes
Published 5/12/2022
Minor release. Contains breaking changes. Disclaimer: this release is the first release with no Python 3.6 support.
The OPT model was proposed in Open Pre-trained Transformer Language Models by Meta AI. OPT is a series of open-sourced large causal language models which perform comparably to GPT-3.
The FLAVA model was proposed in FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela, and was accepted at CVPR 2022.
The paper aims at creating a single unified foundation model which can work across vision, language, and vision-and-language multimodal tasks.
The YOLOS model was proposed in You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, and Wenyu Liu. YOLOS proposes to leverage the plain Vision Transformer (ViT) for object detection, inspired by DETR. It turns out that a base-sized encoder-only Transformer can achieve 42 AP on COCO, similar to DETR and to much more complex frameworks such as Faster R-CNN.
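The core idea can be sketched in a few lines: an image is cut into patches that become a flat token sequence, and learnable detection tokens are appended so the encoder's output at those positions can be decoded into boxes. This is an illustrative toy (names and shapes are made up, not the YOLOS implementation):

```python
# Illustrative sketch of the YOLOS input layout (not the actual model code):
# an image becomes a flat sequence of patch tokens, and [DET] placeholder
# tokens are appended; the encoder's outputs at the [DET] positions would
# then be decoded into boxes and classes.

def patchify(image, patch):
    """Split an H x W image (list of lists) into non-overlapping
    patch x patch blocks, flattened row-major into a sequence."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            block = [image[top + i][left + j]
                     for i in range(patch) for j in range(patch)]
            patches.append(block)
    return patches

def build_sequence(image, patch, num_det_tokens):
    """Patch tokens followed by [DET] placeholders."""
    tokens = patchify(image, patch)
    tokens += [["DET"]] * num_det_tokens
    return tokens

# A 4x4 "image" with 2x2 patches -> 4 patch tokens, plus 2 detection tokens.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
seq = build_sequence(img, patch=2, num_det_tokens=2)
```

In the real model each patch block would be linearly embedded and position embeddings added; the point here is only that detection becomes a pure sequence-to-sequence problem.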
The RegNet model was proposed in Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
The authors design search spaces to perform Neural Architecture Search (NAS). They start from a high-dimensional search space and iteratively reduce it by empirically applying constraints based on the best-performing models sampled from the current search space.
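The refinement loop described above can be sketched as follows. This is a toy illustration, not the paper's code: the objective function and all parameter names are stand-ins, and the "constraint" is simply shrinking each range to the span covered by the best samples.

```python
import random

# Toy sketch of iterative design-space refinement: sample configurations
# from a space, keep the best-scoring ones, and shrink each parameter's
# range to the span those best samples cover. Repeat.
random.seed(0)

def sample(space):
    return {k: random.uniform(lo, hi) for k, (lo, hi) in space.items()}

def score(cfg):
    # Stand-in objective: pretend accuracy peaks at width=64, depth=16.
    return -((cfg["width"] - 64) ** 2 + (cfg["depth"] - 16) ** 2)

def refine(space, rounds=3, samples=200, keep=20):
    for _ in range(rounds):
        best = sorted((sample(space) for _ in range(samples)),
                      key=score, reverse=True)[:keep]
        space = {k: (min(c[k] for c in best), max(c[k] for c in best))
                 for k in space}
    return space

space = {"width": (8.0, 256.0), "depth": (1.0, 64.0)}
refined = refine(space)
```

Each round the space tightens around the region that produces good models, which is the empirical mechanism RegNet uses to arrive at a compact, well-behaved design space.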
The TAPEX model was proposed in TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. TAPEX pre-trains a BART model to solve synthetic SQL queries, after which it can be fine-tuned to answer natural language questions related to tabular data, as well as perform table fact checking.
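Because BART consumes plain text, the table has to be flattened into a single sequence alongside the question. A minimal sketch of this linearization step (the exact separator tokens here are illustrative and may not match the TAPEX repository):

```python
# Sketch of TAPEX-style table linearization: question + flattened table
# become one text sequence for the BART encoder. Separators are illustrative.

def linearize(question, headers, rows):
    parts = [question, "col :", " | ".join(headers)]
    for i, row in enumerate(rows, start=1):
        parts += [f"row {i} :", " | ".join(str(v) for v in row)]
    return " ".join(parts)

text = linearize(
    "how many people live in paris?",
    ["city", "population"],
    [["paris", 2100000], ["lyon", 500000]],
)
```

The decoder then generates the answer (or an entailment label for fact checking) directly as text.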
The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.
The vision model is added in v4.19.0.
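The training setup described above can be illustrated numerically. This is a toy one-parameter "encoder", not the real model: the teacher is an exponential moving average (EMA) of the student, the teacher's representation of the full input serves as the target, and the student predicts it from a masked view.

```python
# Minimal numeric sketch of the data2vec objective (illustrative only):
# predict the teacher's contextualized representation at masked positions.

def encode(weight, xs):
    # Toy "contextualized" representation: each output mixes the whole input.
    mean = sum(xs) / len(xs)
    return [weight * (x + mean) for x in xs]

data = [1.0, 2.0, 3.0, 4.0]
masked = [1.0, 2.0, 0.0, 4.0]          # position 2 is masked out
student, teacher, tau = 0.5, 0.6, 0.999

target = encode(teacher, data)          # teacher sees the unmasked input
pred = encode(student, masked)          # student sees the masked view
loss = (pred[2] - target[2]) ** 2       # regression at the masked position
# ... a gradient step on `student` would go here ...
teacher = tau * teacher + (1 - tau) * student   # EMA keeps teacher close
```

Note that the target at position 2 depends on the whole input through the mean, which is what "contextualized latent representation" means, as opposed to predicting the raw masked value itself.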
PyTorch recently upstreamed Fairscale's FSDP into PyTorch Distributed with additional optimizations, and this release integrates it into the Trainer API.
FSDP enables distributed training at scale. It is a wrapper for sharding module parameters across data-parallel workers, inspired by Xu et al. as well as ZeRO Stage 3 from DeepSpeed. PyTorch FSDP will focus more on production readiness and long-term support, including better integration with ecosystems and improvements in performance, usability, reliability, debuggability, and composability.
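The sharding idea itself is simple to sketch. This is a conceptual toy, not PyTorch's implementation: each worker permanently stores only its shard of a layer's parameters, and the full parameter set exists only transiently, reassembled ("all-gathered") right before it is needed for compute.

```python
# Conceptual sketch of FSDP-style parameter sharding (illustrative only):
# shard a flat parameter list across workers, reassemble it just-in-time.

def shard(params, world_size):
    """Split a flat parameter list into one contiguous shard per worker."""
    per = -(-len(params) // world_size)  # ceiling division
    return [params[i * per:(i + 1) * per] for i in range(world_size)]

def all_gather(shards):
    """Reassemble the full parameter list from every worker's shard."""
    return [p for s in shards for p in s]

params = list(range(10))            # pretend these are a layer's weights
shards = shard(params, world_size=4)
full = all_gather(shards)           # done just before forward/backward
```

Because each worker holds only 1/world_size of the parameters (and, in the real FSDP, of the gradients and optimizer state as well), peak memory per device drops roughly in proportion to the number of workers.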
New example scripts were added for image classification and semantic segmentation. Both now have versions that leverage the Trainer API and Accelerate.
To continue democratizing good machine learning, we're making the Transformers documentation more accessible to non-English speakers; starting with Spanish (572M speakers worldwide).
Notable fixes and improvements:
- DataCollatorWithPadding by @secsilm in #16662
- SpmConverter for sentencepiece's model using the byte fallback feature by @SaulLu in #16629
- no_trainer scripts by @muellerzr in #16703
- Tapex in table question answering pipeline by @Narsil in #16663
- [from_pretrained] Raise a warning if model weights are not in float32 by @sanchit-gandhi in #16762
- from_pretrained(..., low_cpu_mem_usage=True) + tests by @stas00 in #16657
- LayoutLMv2 tokenization docstrings by @qqaatw in #16187
- run_clm.py seeking text column name twice by @dandelin in #16624
- attention_mask on gpt2 by @wiio12 in #16829
- Speech2TextTokenizer by Speech2TextFeatureExtractor in some docstrings by @SaulLu in #16835
- num_return_sequences>1 by @Narsil in #16828
- array key in raw dictionaries in ASR pipeline by @Narsil in #16827
- convert_file_size_to_int by @mariosasko in #16891
- distributed_concat with scalar tensor by @Yard1 in #16963
- decoder_module by @sanchit-gandhi in #17036
- token_type_ids by @deutschmn in #17082
- mobilebert onnx configs by @manandey in #17029
- question_answering pipeline by @Narsil in #17143
- Trainer by @Yard1 in #17166

The following contributors have made significant changes to the library over the last release: