release notes
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Published 4/13/2023
Minor release. Contains breaking changes.
The LLaMA model was proposed in LLaMA: Open and Efficient Foundation Language Models. It is a collection of foundation language models ranging from 7B to 65B parameters. You can request access to the weights, then use the conversion script to generate a checkpoint compatible with Hugging Face.
Pix2Struct is a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct has been fine-tuned on various tasks and datasets, ranging from image captioning and visual question answering (VQA) over different inputs (books, charts, science diagrams) to captioning UI components, and others.
transformers by @younesbelkada in #22528
MEGA proposes a new approach to self-attention: each encoder layer has a multi-headed exponential moving average in addition to a single head of standard dot-product attention, giving the attention mechanism stronger positional biases. This allows MEGA to perform competitively with Transformers on standard benchmarks, including LRA, while having significantly fewer parameters. MEGA's compute efficiency also allows it to scale to very long sequences, making it an attractive option for long-document NLP tasks.
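For intuition, the moving-average component can be sketched as follows. This is a toy illustration only, assuming a simple scalar damping factor; the real MEGA layer uses a learned, multi-dimensional damped EMA, and the function below is not the library implementation.

```python
import numpy as np

def ema_smooth(x, alpha):
    """Exponential moving average over the sequence dimension.

    x: (seq_len, d_model) array of token embeddings.
    alpha: damping factor in (0, 1).

    Each position mixes the current input with a running average of all
    earlier positions, which injects a recency/positional bias into the
    representation before attention is applied.
    """
    out = np.zeros_like(x)
    h = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        h = alpha * x[t] + (1 - alpha) * h  # damped running average
        out[t] = h
    return out
```

Because the recursion only carries a fixed-size state, the cost is linear in sequence length, which is the property that lets this style of model scale to very long documents.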
The model is an optimized GPT2 model with support for Multi-Query Attention.
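A minimal sketch of the idea behind Multi-Query Attention (hypothetical toy code, not the model's actual implementation): all query heads attend against a single shared key/value head, which shrinks the key/value cache by a factor of the head count during inference.

```python
import numpy as np

def multi_query_attention(q, k, v):
    """Toy multi-query attention: many query heads, one shared K/V head.

    q: (n_heads, seq, d_head) per-head queries.
    k, v: (seq, d_head) single key/value projections shared by all heads.
    """
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (n_heads, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)         # softmax over keys
    return weights @ v                                # (n_heads, seq, d_head)
```

In standard multi-head attention, `k` and `v` would each carry an `n_heads` dimension; dropping it is the whole trick.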
The mixture of experts version of the NLLB release has been added to the library.
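For intuition, a mixture-of-experts layer routes each token to a small subset of expert feed-forward networks, so capacity grows without every token paying for every parameter. The sketch below uses top-1 (Switch-style) routing for brevity; it is illustrative only and does not mirror the actual NLLB-MoE router.

```python
import numpy as np

def top1_moe(x, gate_w, experts):
    """Toy top-1 (Switch-style) mixture-of-experts layer.

    x: (tokens, d) inputs; gate_w: (d, n_experts) router weights;
    experts: list of (d, d) expert weight matrices. Each token runs
    through only the expert with the highest gate probability, scaled
    by that probability.
    """
    logits = x @ gate_w                               # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)             # router softmax
    choice = probs.argmax(-1)                         # winning expert per token
    out = np.zeros_like(x)
    for i, e in enumerate(choice):
        out[i] = probs[i, e] * (x[i] @ experts[e])
    return out, choice
```

Only the chosen expert's weights touch each token, which is why MoE models can hold far more parameters than a dense model with the same per-token compute.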
NLLB-MoE Adds the moe model by @ArthurZucker in #22024
[bnb] Let's make serialization of int8 models possible by @younesbelkada in #22177
You can now push 8bit models and/or load 8bit models directly from the Hub, save memory, and load your 8bit models faster! An example repo here
Notes from the PR:
The BLIP image processor incorrectly passed the dimensions to resize in the order (width, height). This has been reordered to be correct.
In most cases this won't have an effect, as the default height and width are the same. However, it is not backwards compatible for custom configurations with different height and width settings, or for direct calls to the resize method with different height and width values.
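The pitfall is easy to reproduce with a toy nearest-neighbour resize (a hypothetical helper, not the BLIP processor code) that expects (height, width): passing (width, height) silently produces transposed output dimensions, which is exactly why the bug was invisible for square defaults.

```python
import numpy as np

def resize_nearest(img, size):
    """Toy nearest-neighbour resize. `size` is (height, width),
    matching the usual array-layout convention."""
    h, w = size
    rows = np.arange(h) * img.shape[0] // h   # source row per output row
    cols = np.arange(w) * img.shape[1] // w   # source col per output col
    return img[rows][:, cols]

img = np.zeros((32, 64, 3), dtype=np.uint8)   # 32 pixels tall, 64 wide
ok = resize_nearest(img, (16, 48))            # correct: (height, width)
bad = resize_nearest(img, (48, 16))           # bug: (width, height) swapped
```

`ok` has shape (16, 48, 3) as intended, while `bad` comes out (48, 16, 3): no error is raised, the image is just resized to the wrong dimensions.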
The main issue was the placement of the prefix and suffix tokens of the NLLB tokenizer.
Previous behaviour:
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[13374, 1398, 4260, 4039, 248130, 2, 256047]
>>> # 2: '</s>'
>>> # 256047 : 'eng_Latn'
New behaviour:
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[256047, 13374, 1398, 4260, 4039, 248130, 2]
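The change can be sketched with a toy helper (illustrative only, not the tokenizer's real code): the language code moves from after the end-of-sequence token to the front of the sequence.

```python
def build_nllb_inputs(token_ids, lang_id, eos_id, legacy=False):
    """Toy sketch of the prefix/suffix fix.

    legacy=True reproduces the old layout, where the language code was
    appended after EOS; the new layout prepends it before the tokens.
    """
    if legacy:
        return token_ids + [eos_id, lang_id]   # old: tokens ... </s> eng_Latn
    return [lang_id] + token_ids + [eos_id]    # new: eng_Latn tokens ... </s>
```

With the example ids above (`2` is `'</s>'` and `256047` is `'eng_Latn'`), the helper reproduces both layouts shown in the snippets.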
In case you have pipelines that were relying on the old behavior, here is how you would enable it once again:
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", legacy_behaviour=True)
[NLLB Tokenizer] Fix the prefix tokens 🚨🚨🚨 by @ArthurZucker in #22313
The BLIP model is now available in TensorFlow.
This PR adds the possibility to export TF generate with a TF-native tokenizer: the full pipeline in a single TF graph.
A new task guide has been added, focusing on depth estimation.
- MgpstrModelIntegrationTest by @ydshieh in #22195
- [XGLM] Add accelerate support for XGLM by @younesbelkada in #22207
- dash==2.8.1 for now for daily CI by @ydshieh in #22227
- dash==2.8.1 for now for daily CI" by @ydshieh in #22233
- TFCvtModel by @gcuder in #22267
- max_memory for device_map strategies by @sgugger in #22311
- generate(synced_gpus=True, ...) by @stas00 in #22242
- [MBart] Add accelerate support for MBart by @younesbelkada in #22309
- torch<1.10 by @stas00 in #22370
- cmake dependencies in CI by @gante in #22383
- [bnb] Force requires_grad to be False by @younesbelkada in #22396
- causal_mask is created directly on device by @jeffra in #22378
- [bnb] fix bnb failing test by @younesbelkada in #22439
- [Generate] Add conditional generation for multimodal models by @younesbelkada in #22424
- [Pix2Struct] Fix slow test by @younesbelkada in #22448
- model_type update for auto mapping by @ArthurZucker in #22470
- max_position_embeddings by @gante in #22471
- eos_token_id < 0 checks in generate() from ValueError to warning by @lewtun in #22472
- Wav2Vec2ProcessorWithLM doc example by @ydshieh in #22474
- TextIteratorStreamer (streamer for gradio) by @gante in #22501
- [Trainer] Force is_model_parallel when model is loaded in multiple GPUs using accelerate by @younesbelkada in #22532
- [T5] Enable naive Pipeline Parallelism training for T5 by @younesbelkada in #22535
- distutils usage by @XuehaiPan in #22531
- pyproject.toml by @XuehaiPan in #22539
- [bnb] Fix typo by @younesbelkada in #22556
- _no_split_modules for Whisper model by @pacman100 in #22486
- TextIteratorStreamer timeout by @gante in #22576
- accelerate_tests mark warnings by @gante in #22585
- _toctree.yml by @wonhyeongseo in #22581
- pipeline_model_mapping systematically by @ydshieh in #22180
- [bnb] 8bit models should not be converted to DDP by @younesbelkada in #22628
- [Blip] Fix slow tests and doctests with correct values by @younesbelkada in #22632
- autoclass_tutorial to Korean and Fix the typo of quicktour by @gabrielwithappy in #22533
- MegaModel CI by @ydshieh in #22652
- pipeline_tutorial.mdx to Korean by @wonhyeongseo in #22508
- MarkupLM tests' expected values by @ydshieh in #22667
- torch.distributed group initialization for torch_neuron disabled when optimum-neuron is installed by @michaelbenayoun in #22728

The following contributors have made significant changes to the library over the last release: