release notes
release notes
Published 5/8/2025
Pre-ReleasePre-releaseA new model is added to transformers: BitNet
It is added on top of the v4.51.3 release, and can be installed from the following tag: v4.51.3-BitNet-preview.
In order to install this version, please install with the following command:
pip install git+https://github.com/huggingface/transformers@v4.51.3-BitNet-preview
If fixes are needed, they will be applied to this release; this installation may therefore be considered as stable and improving.
As the tag implies, this tag is a preview of the BitNet model. This tag is a tagged version of the main branch and does not follow semantic versioning. This model will be included in the next minor release: v4.52.0.
Trained on a corpus of 4 trillion tokens, this model demonstrates that native 1-bit LLMs can achieve performance comparable to leading open-weight, full-precision models of similar size, while offering substantial advantages in computational efficiency (memory, energy, latency).
BitNet can be found on the Huggingface Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "microsoft/bitnet-b1.58-2B-4T"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16
)
# Apply the chat template
messages = [
{"role": "system", "content": "You are a helpful AI assistant."},
{"role": "user", "content": "How are you?"},
]
chat_input = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Generate response
chat_outputs = model.generate(chat_input, max_new_tokens=50)
response = tokenizer.decode(chat_outputs[0][chat_input.shape[-1]:], skip_special_tokens=True) # Decode only the response part
print("\nAssistant Response:", response)
release notes
Published 5/8/2025
Pre-ReleasePre-releaseA new model is added to transformers: BitNet
It is added on top of the v4.51.3 release, and can be installed from the following tag: v4.51.3-BitNet-preview.
In order to install this version, please install with the following command:
pip install git+https://github.com/huggingface/transformers@v4.51.3-BitNet-preview
If fixes are needed, they will be applied to this release; this installation may therefore be considered as stable and improving.
As the tag implies, this tag is a preview of the BitNet model. This tag is a tagged version of the main branch and does not follow semantic versioning. This model will be included in the next minor release: v4.52.0.
Trained on a corpus of 4 trillion tokens, this model demonstrates that native 1-bit LLMs can achieve performance comparable to leading open-weight, full-precision models of similar size, while offering substantial advantages in computational efficiency (memory, energy, latency).
BitNet can be found on the Huggingface Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "microsoft/bitnet-b1.58-2B-4T"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16
)
# Apply the chat template
messages = [
{"role": "system", "content": "You are a helpful AI assistant."},
{"role": "user", "content": "How are you?"},
]
chat_input = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Generate response
chat_outputs = model.generate(chat_input, max_new_tokens=50)
response = tokenizer.decode(chat_outputs[0][chat_input.shape[-1]:], skip_special_tokens=True) # Decode only the response part
print("\nAssistant Response:", response)
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.