release notes
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models spanning text, vision, audio, and multimodal tasks, for both inference and training.
Published 2/5/2026
Minor release. Contains breaking changes.

K-EXAONE is a large-scale multilingual language model developed by LG AI Research. Built on a Mixture-of-Experts architecture, K-EXAONE has 236 billion total parameters, with 23 billion active during inference. Evaluations across various benchmarks show that K-EXAONE excels at reasoning, agentic capabilities, general knowledge, multilingual understanding, and long-context processing.
PP-DocLayoutV3 is a unified and high-efficiency model designed for comprehensive layout analysis. It addresses the challenges of complex physical distortions—such as skewing, curving, and adverse lighting—by integrating instance segmentation and reading order prediction into a single, end-to-end framework.
Youtu-LLM is a new, small yet powerful LLM: it contains only 1.96B parameters, supports a 128k-token context, and has native agentic abilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size on commonsense, STEM, coding, and long-context capabilities; in agent-related testing, it surpasses larger models and can genuinely complete multiple end-to-end agent tasks.
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.
🚨 T5Gemma2 model structure (#43633) - Ensures the attention implementation is propagated to all sub-configs. config.encoder.text_config was not getting its attention implementation set because it is not passed to PreTrainedModel.__init__. Since the model structure cannot be changed without breaking compatibility, a call to self.adjust_attn_implementation was manually re-added in the modeling code.
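The fix above boils down to walking nested configs and setting the attention implementation on each one. A minimal sketch of that idea, with illustrative class and attribute names (not Transformers' actual config API):

```python
# Hypothetical sketch: recursively propagate an attention implementation
# to every nested sub-config, mirroring the fix described above.
# Class and attribute names are illustrative, not Transformers' real API.

class Config:
    def __init__(self, attn_implementation=None, **sub_configs):
        self.attn_implementation = attn_implementation
        self.sub_configs = sub_configs  # e.g. {"text_config": Config(...)}

def adjust_attn_implementation(config, impl):
    """Set `impl` on this config and on all nested sub-configs."""
    config.attn_implementation = impl
    for sub in config.sub_configs.values():
        adjust_attn_implementation(sub, impl)

encoder = Config(text_config=Config())
root = Config(encoder=encoder)
adjust_attn_implementation(root, "sdpa")
print(encoder.sub_configs["text_config"].attn_implementation)  # sdpa
```

Without the recursive call, the inner text_config would keep attn_implementation=None, which is the bug the PR closes.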
🚨 Generation cache preparation (#43679) - Refactors cache initialization in generation to ensure sliding window configurations are now properly respected. Previously, some models (like Afmoe) created caches without passing the model config, causing sliding window limits to be ignored. This is breaking because models with sliding window attention will now enforce their window size limits during generation, which may change generation behavior or require adjusting sequence lengths in existing code.
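The behavior change can be pictured with a toy cache: once the model config (and its sliding_window value) reaches cache construction, the window limit is actually enforced. This is an illustrative sketch, not Transformers' Cache API:

```python
# Illustrative sketch (not Transformers' actual Cache classes): a cache
# built from a config carrying `sliding_window` now enforces that limit,
# which is the breaking behavior change described above.
from collections import deque

class SlidingWindowCache:
    def __init__(self, config):
        # Previously some models built caches without passing the config,
        # so sliding_window was silently ignored; passing it makes the
        # limit effective.
        window = getattr(config, "sliding_window", None)
        self.keys = deque(maxlen=window)

    def append(self, key):
        self.keys.append(key)

class Cfg:
    sliding_window = 4

cache = SlidingWindowCache(Cfg())
for position in range(10):
    cache.append(position)
print(list(cache.keys))  # only the last 4 positions survive: [6, 7, 8, 9]
```

Code that implicitly relied on the old unlimited-cache behavior may need its sequence lengths adjusted accordingly.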
🚨 Delete duplicate code in backbone utils (#43323) - Cleans up the backbone utilities. There are currently five different config attributes that decide which backbone to load; most are redundant and can be merged. After this PR there is only one, config.backbone_config, as a single source of truth. Models load the backbone from its config and load pretrained weights only if the checkpoint has weights saved, following the same pattern as other composite models. A few config arguments are removed as a result.
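The "single source of truth" idea can be sketched as a resolver that consults only backbone_config instead of several competing attributes. The helper and config below are hypothetical, not the library's actual loading code:

```python
# Hypothetical before/after sketch of collapsing several backbone-related
# config attributes into one `backbone_config` source of truth. The names
# here are illustrative, not Transformers' real backbone utilities.

def resolve_backbone(config):
    """Old code consulted several attributes; now only backbone_config."""
    backbone_config = getattr(config, "backbone_config", None)
    if backbone_config is None:
        raise ValueError("`backbone_config` must be set on the config")
    return backbone_config

class ModelConfig:
    # A nested backbone spec, here as a plain dict for illustration.
    backbone_config = {"model_type": "resnet", "depths": [3, 4, 6, 3]}

print(resolve_backbone(ModelConfig())["model_type"])  # resnet
```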
🚨 Refactor DETR to updated standards (#41549) - Standardizes the DETR model to bring it closer to the other vision models in the library.
🚨 Fix floating-point precision in JanusImageProcessor resize (#43187) - Replaces an int() with round(); expect slight numerical differences.
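The kind of off-by-one the fix addresses: int() truncates toward zero while round() rounds to nearest, so a computed resize dimension can differ by one pixel. The numbers below are illustrative, not taken from Janus itself:

```python
# Demonstrates why swapping int() for round() changes resize outputs:
# int() truncates, round() rounds to nearest, so target dimensions can
# differ by one pixel. Values are illustrative, not from Janus.

def target_size_int(height, scale):
    return int(height * scale)    # old behavior: truncation

def target_size_round(height, scale):
    return round(height * scale)  # new behavior: nearest integer

h, scale = 99, 0.35  # 99 * 0.35 = 34.65
print(target_size_int(h, scale), target_size_round(h, scale))  # 34 35
```

Hence the warning above: pixel-level output differences are expected after upgrading.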
🚨 Remove deprecated AnnotionFormat (#42983) - Removes a misnamed class in favour of AnnotationFormat.
- [feat] Allow loading T5Gemma2Encoder with AutoModel (#43559) by @tomaarsen
- image_sizes input param (#43678) by @kaixuanliu
- [Attn] Fixup interface usage after refactor (#43706) by @vasqu
- num_frames in ASR pipeline (#43546) by @jiqing-feng
- PreTrainedTokenizerBase (#43675) by @tarekziade
- FP8Expert for DeepSeek R1 (#43616) by @yiliu30
- [HunYuan] Fix RoPE init (#43411) by @vasqu
- [Sam] Fixup training flags (#43567) by @vasqu
- process_bad_commit_report.py: avoid items to appear in null author in the report (#43662) by @ydshieh
- KeyError in check_bad_commit.py (#43655) by @ydshieh
- tied_weight_keys in-place (#43619) by @zucchini-nlp
- [Rope] Revert #43410 and make inheritance implicit again (#43620) by @vasqu
- make_batched_video with 5D arrays (#43486) by @zucchini-nlp
- utils/fetch_hub_objects_for_ci.py: avoid too many requests and/or timeout (#43584) by @ydshieh
- MistralConverter.extract_vocab_merges_from_model (#43557) by @tarekziade
- templates folder (#43536) by @Cyrilvallez
- [Modular] Allow to add new bases that are not present in the inherited class (#43556) by @vasqu
- pad_token_id (#43453) by @Sai-Suraj-27
- [RoPE] Make explicit inheritance (#43410) by @vasqu
- ShieldGemma2IntegrationTest::test_model (#43343) by @sywangyi
- SamHQModelIntegrationTest::test_inference_mask_generation_batched_points_batched_images for XPU (#43511) by @sywangyi
- super() (#43280) by @zucchini-nlp
- pytest-random-order for reproducible test randomization (#43483) by @tarekziade
- markuplm & perception_lm integration tests (#43464) by @Sai-Suraj-27

The following contributors have made significant changes to the library over the last release:
- PreTrainedTokenizerBase (#43675)
- MistralConverter.extract_vocab_merges_from_model (#43557)
- pytest-random-order for reproducible test randomization (#43483)
- [Attn] Fixup interface usage after refactor (#43706)
- [HunYuan] Fix RoPE init (#43411)
- [Sam] Fixup training flags (#43567)
- [Rope] Revert #43410 and make inheritance implicit again (#43620)
- [Modular] Allow to add new bases that are not present in the inherited class (#43556)
- [RoPE] Make explicit inheritance (#43410)
- process_bad_commit_report.py: avoid items to appear in null author in the report (#43662)
- KeyError in check_bad_commit.py (#43655)
- utils/fetch_hub_objects_for_ci.py: avoid too many requests and/or timeout (#43584)