Quantization_1
June 6, 2025
10:28
Quantization is normally an inference-time technique; for training, full precision (FP32/BF16) plus gradient clipping/scaling is recommended.
In practice, however, with only 16 GB of VRAM, an unquantized model forces a very small batch_size (e.g. 1), which makes train_loss unstable.
Combining 4-bit quantization with BF16 training is not recommended; that combination is very likely to produce errors or a failed run. Prefer full-precision training, or quantize only the model used for inference.
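A standard way to get an effective large batch within 16 GB without quantizing is gradient accumulation (not mentioned in the note; a minimal sketch with a toy `nn.Linear` and hypothetical sizes). Scaling each micro-batch loss by the number of accumulation steps makes the accumulated gradient match the full-batch gradient, so a size-1 micro-batch no longer means a size-1 effective batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 1)
loss_fn = nn.MSELoss()
x = torch.randn(16, 8)
y = torch.randn(16, 1)
accum_steps = 16  # effective batch = accum_steps * micro_batch_size(=1)

# Accumulated gradient from 16 micro-batches of size 1:
# memory stays at the size-1 footprint.
model.zero_grad()
for i in range(accum_steps):
    loss = loss_fn(model(x[i:i+1]), y[i:i+1]) / accum_steps  # scale so grads average
    loss.backward()  # gradients accumulate in .grad across micro-batches
accum_grad = model.weight.grad.clone()

# Reference gradient from one full batch of 16.
model.zero_grad()
loss_fn(model(x), y).backward()
full_grad = model.weight.grad.clone()
# accum_grad and full_grad match up to float rounding, so calling
# optimizer.step() once per accumulation cycle is equivalent.
```

In a real training loop, `optimizer.step()` and `optimizer.zero_grad()` would run once per `accum_steps` micro-batches instead of once per batch.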
When training with SentenceTransformer (model: stella_en_1.5B) under the quantization settings below, mine_hard_negatives reported NaN similarity for the positives, and train_loss was 0.
Metric   Positive  Negative  Difference
Count       6,775    13,550
Mean          nan    0.4492         nan
Median        nan    0.4470         nan
Std           nan    0.0342         nan
Min           nan    0.3391         nan
25%           nan    0.4241         nan
50%           nan    0.4470         nan
75%           nan    0.4739         nan
Max           nan    0.5625         nan
import torch
from transformers import BitsAndBytesConfig
from sentence_transformers import SentenceTransformer

# 4-bit NF4 quantization with BF16 compute (the config that produced the NaNs)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = SentenceTransformer(
    params.model_name,  # stella_en_1.5B
    trust_remote_code=True,
    model_kwargs={
        "quantization_config": bnb_config,
        # "device_map": "auto",
    },
)
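A cheap sanity check before mining hard negatives is to encode a few sentences and test the output for NaN, since NaN embeddings propagate into NaN cosine similarities like the ones in the table above. A minimal sketch (`has_nan` is a hypothetical helper; the arrays simulate a healthy and a broken output, and in practice the input would be `model.encode(sentences)`):

```python
import numpy as np

def has_nan(embeddings: np.ndarray) -> bool:
    """Return True if any entry of the embedding matrix is NaN."""
    return bool(np.isnan(embeddings).any())

# Simulated encode() outputs: a quantization failure typically
# surfaces as entire rows of NaN, not isolated entries.
ok_emb = np.random.rand(2, 4).astype(np.float32)
bad_emb = np.full((2, 4), np.nan, dtype=np.float32)
```

Running this check right after loading the quantized model would have flagged the problem before any training step.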
Found a related thread on Reddit:
- Weird, what model is this? Some models use special methods that cannot just be quanted. But what model is it?
- Stella 1.5 billion. It's an embedding model.
- GGUF only really works for chat models, not for prediction models like this. Try quanting BERT and the same thing will happen.
So the issue is likely the stella model itself: it is an embedding model, not a chat model, and models like this cannot simply be quantized.
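Given that conclusion, the safer path is to drop the BitsAndBytes config entirely and load the model in BF16, which halves memory versus FP32 without the NF4 quantization that produced the NaNs. A sketch of the alternative load (a config fragment, not from the note; `params.model_name` as above):

```python
import torch
from sentence_transformers import SentenceTransformer

# Unquantized BF16 load: smaller than FP32, no 4-bit weight compression.
model = SentenceTransformer(
    params.model_name,  # stella_en_1.5B
    trust_remote_code=True,
    model_kwargs={"torch_dtype": torch.bfloat16},
)
```

If BF16 alone still does not fit, combining it with the gradient-accumulation workaround above is preferable to re-enabling 4-bit quantization for this model.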
Created with OneNote.