v4.51.3-BitNet-preview

release notes

Published 5/8/2025

Pre-release

Release notes

A new model has been added to transformers: BitNet. It is built on top of the v4.51.3 release and can be installed from the tag v4.51.3-BitNet-preview.

To install this version, run the following command:

pip install git+https://github.com/huggingface/transformers@v4.51.3-BitNet-preview

If fixes are needed, they will be applied to this release; this installation can therefore be considered stable and improving.

As the tag name implies, this is a preview of the BitNet model. It is a tagged version of the main branch and does not follow semantic versioning. The model will be included in the next minor release, v4.52.0.
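Because the preview tag does not follow semantic versioning, the reported version number alone may not tell you whether BitNet support is present. The following is a minimal sketch of one way to check an environment after installing. The helper names `installed_version` and `has_submodule` are hypothetical, and the module path `transformers.models.bitnet` is an assumption about where the BitNet code lives, not something stated in these notes.

```python
import importlib.util
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_submodule(module_path: str) -> bool:
    """Return True if `module_path` can be located by the import system."""
    try:
        return importlib.util.find_spec(module_path) is not None
    except ImportError:
        # Raised when a parent package (e.g. transformers) cannot be imported.
        return False


if __name__ == "__main__":
    print("transformers version:", installed_version("transformers"))
    # Assumption: BitNet support is exposed under transformers.models.bitnet.
    print("BitNet module found:", has_submodule("transformers.models.bitnet"))
```

If the second line prints `False`, the environment is probably still using a build without the preview changes, and rerunning the pip command above in the active environment is a reasonable first step.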

BitNet

Trained on a corpus of 4 trillion tokens, this model demonstrates that native 1-bit LLMs can achieve performance comparable to leading open-weight, full-precision models of similar size, while offering substantial advantages in computational efficiency (memory, energy, latency).
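To give a rough sense of the memory advantage mentioned above, here is a back-of-the-envelope sketch comparing idealized weight storage at 16 bits per weight with ternary weights at log2(3) ≈ 1.58 bits per weight. The parameter count of 2 billion is an assumption read off the "2B" in the model name, the function `weight_bytes` is hypothetical, and the calculation ignores activations, embeddings kept at higher precision, and packing overhead, so actual memory use will differ.

```python
import math


def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Idealized storage for `num_params` weights at `bits_per_weight` bits each."""
    return num_params * bits_per_weight / 8


# Assumption: roughly 2 billion parameters, taken from the "2B" in the model name.
NUM_PARAMS = 2_000_000_000

bf16_bytes = weight_bytes(NUM_PARAMS, 16)
# A ternary weight {-1, 0, +1} carries log2(3) bits of information.
ternary_bytes = weight_bytes(NUM_PARAMS, math.log2(3))

print(f"bf16 weights:    {bf16_bytes / 1e9:.2f} GB")
print(f"ternary weights: {ternary_bytes / 1e9:.2f} GB")
print(f"ratio:           {bf16_bytes / ternary_bytes:.1f}x")
```

This is an upper bound on the savings from weight storage alone; it says nothing about the memory footprint of any particular runtime or loading configuration.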

Usage example

BitNet can be found on the Hugging Face Hub.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16
)

# Apply the chat template
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "How are you?"},
]
chat_input = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)

# Generate response
chat_outputs = model.generate(chat_input, max_new_tokens=50)
response = tokenizer.decode(chat_outputs[0][chat_input.shape[-1]:], skip_special_tokens=True) # Decode only the response part
print("\nAssistant Response:", response)
