release notes
Published 4/22/2026
Minor
Contains breaking changes
OpenAI Privacy Filter is a bidirectional token-classification model for detecting and masking personally identifiable information (PII) in text. It is intended for high-throughput data sanitization workflows where teams need a fast, context-aware, tunable model that they can run on-premises. The model labels an input sequence in a single forward pass, predicting a probability distribution over 8 privacy-related output categories for each input token, then decodes coherent spans with a constrained Viterbi procedure.
Links: Documentation
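To make the span-decoding step for Privacy Filter more concrete, here is a minimal sketch of constrained Viterbi decoding over BIO-style tags. The tag set (`O`, `B-NAME`, `I-NAME`), the transition rule, and the toy probabilities are assumptions chosen for illustration; these notes do not specify the model's actual label scheme or constraints.

```python
import math

# Hypothetical BIO tag set; the model's real categories are not listed in these notes.
TAGS = ["O", "B-NAME", "I-NAME"]


def allowed(prev: str, cur: str) -> bool:
    """Assumed constraint: an I-X tag may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True


def constrained_viterbi(log_probs: list[list[float]]) -> list[str]:
    """Return the highest-scoring tag path that respects `allowed`.

    log_probs[t][j] is the per-token log-probability of TAGS[j] at position t.
    """
    n = len(TAGS)
    # A sequence may not start inside a span.
    score = [
        log_probs[0][j] if not TAGS[j].startswith("I-") else -math.inf
        for j in range(n)
    ]
    backpointers = []
    for t in range(1, len(log_probs)):
        new_score, pointers = [], []
        for j in range(n):
            # Best previous tag among those allowed to precede TAGS[j].
            best_i = max(
                range(n),
                key=lambda i: score[i] if allowed(TAGS[i], TAGS[j]) else -math.inf,
            )
            prev = score[best_i] if allowed(TAGS[best_i], TAGS[j]) else -math.inf
            new_score.append(prev + log_probs[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    return [TAGS[j] for j in reversed(path)]


# Toy scores for 3 tokens: a per-token argmax would give O, I-NAME, I-NAME,
# which starts a span with an I- tag and is therefore invalid.
probs = [
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
log_probs = [[math.log(p) for p in row] for row in probs]
decoded = constrained_viterbi(log_probs)
print(decoded)  # ['B-NAME', 'I-NAME', 'I-NAME']
```

The point of the constraint is that the decoder trades a slightly lower per-token score for a path that forms a well-formed span.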
[Privacy Filter] Add model (#45580) by @vasqu in #45580
Qianfan-OCR is a 4B-parameter end-to-end document intelligence model developed by Baidu that performs direct image-to-text conversion without traditional multi-stage OCR pipelines. It supports a broad range of prompt-driven tasks including structured document parsing, table extraction, chart understanding, document question answering, and key information extraction all within one unified model. The model features a unique "Layout-as-Thought" capability that generates structured layout representations before producing final outputs, making it particularly effective for complex documents with mixed element types.
Links: Documentation | Paper
SAM3-LiteText is a lightweight variant of SAM3 that replaces the heavy SAM3 text encoder (353M parameters) with a compact MobileCLIP-based text encoder optimized through knowledge distillation, while keeping the SAM3 ViT-H image encoder intact. This reduces text encoder parameters by up to 88% while maintaining segmentation performance comparable to the original model. The model enables efficient vision-language segmentation by addressing the redundancy found in text prompting for segmentation tasks.
Links: Documentation | Paper
SLANet and SLANet_plus are lightweight models for table structure recognition in documents and natural scenes. They improve accuracy and inference speed by combining a CPU-friendly lightweight backbone network (PP-LCNet), a high-low-level feature fusion module (CSP-PAN), and a feature decoding module (SLA Head) that aligns structural and positional information. SLANet was developed by the Baidu PaddlePaddle Vision Team as part of its table structure recognition solutions.
Links: Documentation
The internal rotary_fn is no longer registered as a hidden kernel function, so any code referencing self.rotary_fn(...) within an Attention module will break and must be updated to call the function directly instead.
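As a rough illustration of the rotary_fn migration, the sketch below contrasts the old pattern with a direct call. The function name `apply_rotary_pos_emb`, its signature, and the toy list-based math are assumptions for illustration only (real modules operate on tensors); check the modeling file of the affected model for the actual function to call.

```python
def rotate_half(x: list[float]) -> list[float]:
    """Return (-x2, x1), where x1 and x2 are the first and second halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rotary_pos_emb(q, k, cos, sin):
    """Stand-in for a module-level rotary function (name assumed for illustration)."""
    def rotate(x):
        return [a * c + b * s for a, b, c, s in zip(x, rotate_half(x), cos, sin)]
    return rotate(q), rotate(k)


class Attention:
    """Toy attention module; only the rotary call is shown."""

    def forward(self, q, k, cos, sin):
        # Before this release (breaks now, since the attribute is no longer registered):
        #   q, k = self.rotary_fn(q, k, cos, sin)
        # After: call the function directly.
        return apply_rotary_pos_emb(q, k, cos, sin)


q, k = Attention().forward([1.0, 2.0], [3.0, 4.0], cos=[0.0, 0.0], sin=[1.0, 1.0])
print(q, k)  # [-2.0, 1.0] [-4.0, 3.0]
```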
The transformers serve command received several enhancements, including a new /v1/completions endpoint for legacy text completion, multimodal support for audio and video inputs, improved tool-calling via parse_response, proper forwarding of tool_calls/tool_call_id fields, a 400 error on model mismatch when the server is pinned to a specific model, and fixes for the response API. Documentation was also updated to cover new serving options such as --compile and --model-timeout.
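A client for the new /v1/completions endpoint might look like the sketch below. The base URL, the model id, and the request and response field names (taken from the OpenAI legacy completions format) are assumptions; consult the serving documentation for the exact schema.

```python
import json
import urllib.error
import urllib.request

# Assumed local address; adjust to wherever `transformers serve` is listening.
BASE_URL = "http://localhost:8000"


def build_completion_request(model: str, prompt: str, max_tokens: int = 32) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    # Field names follow the OpenAI legacy completions format (assumed here).
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(model: str, prompt: str) -> str:
    """Send the request and return the first completion's text."""
    request = build_completion_request(model, prompt)
    try:
        with urllib.request.urlopen(request) as response:
            body = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # For example, the requested model differs from the one the server is pinned to.
            raise ValueError("Server rejected the request: " + err.read().decode()) from err
        raise
    # Response shape assumed to match the OpenAI legacy completions format.
    return body["choices"][0]["text"]


# Hypothetical model id, used only to show the request that would be sent.
request = build_completion_request("my-org/my-model", "Hello")
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect without a running server; `complete(...)` performs the actual call.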
transformers serve (#44558) by @rain-1 in [#44558]
transformers serve is pinned (#45443) by @qgallouedec in [#45443]
parse_response (#45485) by @SunMarc in [#45485]
tool_calls/tool_call_id in processor inputs (#45418) by @qgallouedec in [#45418]
Several vision-related bug fixes were applied in this release, including correcting Qwen2.5-VL temporal RoPE scaling for still images, fixing missing/mismatched image processor backends for Emu3 and BLIP, resolving modular image processor class duplication, and preventing accelerate from incorrectly splitting vision encoders in PeVideo/PeAudioVideo models. Image loading performance was also improved by leveraging torchvision's native decode_image in the torchvision backend, yielding up to ~17% speedup over PIL-based loading.
decode_image to load images in the torchvision backend (#45195) by @yonigozlan in [#45195]
Fixed several bugs affecting distributed training, including silently wrong results or NaN loss with Expert Parallelism, NaN weights on non-rank-0 FSDP processes, and a resize failure in PP-DocLayoutV3; additionally added support for loading adapters with Tensor Parallelism, added MoE to the Gemma4 TP plan, and published documentation for TP training.
Fixed a docstring typo in streamer classes, resolved a Kimi-K2.5 tokenizer regression and _patch_mistral_regex AttributeError, and patched a streaming generation crash for Qwen3VLProcessor caused by incorrect _tokenizer attribute access. Additional housekeeping included moving the GPT-SW3 instruct tokenizer to an internal testing repo and fixing a global state leak in the tokenizer registry during tests.
[Tokenizers] Move gpt sw3 tokenizer out (#45404) by @vasqu in [#45404]
test_processors (#45318) by @tarekziade in [#45318]
Cache handling was improved for Gemma4 and Gemma3n models by dissociating KV state sharing from the Cache class, ensuring KV states are always shared regardless of whether a Cache is used. Additionally, the image cache for Paddle models was updated to align with the latest API.
Audio models gained vLLM compatibility through targeted fixes across several model implementations. Reliability also improved: audio file downloads now retry with exponential back-off, the text-to-speech pipeline no longer crashes when a generation config contains None values, and the Kyutai Speech-To-Text test failures were corrected.
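For readers unfamiliar with the retry pattern mentioned for audio downloads, the following is a generic sketch of exponential back-off using only the standard library. It is not the library's implementation; the function names, attempt count, and delays are all assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def retry_with_backoff(fetch, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `fetch()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except (urllib.error.URLError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


def download_audio(url: str) -> bytes:
    """Download a file, retrying transient network errors."""
    def fetch():
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read()
    return retry_with_backoff(fetch)


# Demonstration with a fake fetcher that fails twice before succeeding.
calls = {"n": 0}
delays = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise urllib.error.URLError("temporary failure")
    return b"audio-bytes"


# Record the delays instead of actually sleeping.
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)  # b'audio-bytes' [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the demonstration instant while leaving the real delay in place for `download_audio`.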
text-to-speech pipeline crash when generation config contains None values (#45107) by @jiqing-feng in [#45107]
[Privacy Filter] Add model (#45580) by @vasqu in [#45580]
pass (inherits from DSV3 MoE) (#45572) by @casinca in [#45572]
DeepseekV3MoE and remote official implementation (#45441) by @casinca in [#45441]
prepare_decoder_input_ids_from_labels (#45516) by @Tokarak in [#45516]
TextToAudioPipeline missing <bos> token (#45525) by @jiqing-feng in [#45525]
[Conversion Mapping] Small fixups (#45483) by @vasqu in [#45483]
get_image_size method (#45461) by @JiauZhang in [#45461]
[fix] Always early return for non-Mistral models in _patch_mistral_regex (#45444) by @tomaarsen in [#45444]
[fix] Make Qwen2_5OmniProcessor warning a lot less noisy via warning_once (#45455) by @tomaarsen in [#45455]
step3_vl to MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS (#45449) by @hmellor in [#45449]
[fix] PEFT integration fixes preventing save/load & integration (#45428) by @tomaarsen in [#45428]
apply_chat_template crash on tool_call messages without content (#45348) by @qgallouedec in [#45348]
trackio integration to use Buckets and "freeze" Space after training (#45329) by @abidlabs in [#45329]
cohere_asr: fix device issue for test_model_parallel_beam_search (#45214) by @kaixuanliu in [#45214]
[transformers] prefix in non-verbose mode (#45316) by @zucchini-nlp in [#45316]
pi0 model (#45011) by @kaixuanliu in [#45011]
grouped_mm (#45001) by @Sai-Suraj-27 in [#45001]
Wav2Vec2Config.vocab_size type to allow None (#45108) by @jiqing-feng in [#45108]
hasattr(torch.backends.cudnn, "conv") to conftest.py (#45263) by @ydshieh in [#45263]
SmolVLM video processor resize using wrong interpolation after backend refactor (#45258) by @ydshieh in [#45258]
Qwen2IntegrationTest (#45268) by @ydshieh in [#45268]
torch.backends.cudnn.conv.fp32_precision explicitly. (#45248) by @ydshieh in [#45248]
torch 2.11 (#45243) by @ydshieh in [#45243]
get_test_info.py (related to tiny model creation) (#45238) by @ydshieh in [#45238]
test_register_result_handler (#45188) by @SunMarc in [#45188]
The following contributors have made significant changes to the library over the last release:
transformers serve (#44558)
test_processors (#45318)
hasattr(torch.backends.cudnn, "conv") to conftest.py (#45263)
SmolVLM video processor resize using wrong interpolation after backend refactor (#45258)
Qwen2IntegrationTest (#45268)
torch.backends.cudnn.conv.fp32_precision explicitly. (#45248)
torch 2.11 (#45243)
get_test_info.py (related to tiny model creation) (#45238)