Quantization_1

2025/6/6

10:28

Quantization is normally used for inference; for training, full precision (FP32/BF16) plus gradient clipping/scaling is recommended.
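
In a SentenceTransformer fine-tuning setup this typically maps onto the training arguments below (a minimal sketch; the values are illustrative, and with fp16=True the Trainer adds automatic loss scaling):

    from sentence_transformers import SentenceTransformerTrainingArguments

    args = SentenceTransformerTrainingArguments(
        output_dir="output/stella-ft",    # hypothetical output path
        bf16=True,                        # BF16 training instead of quantization
        max_grad_norm=1.0,                # gradient clipping
        per_device_train_batch_size=16,   # illustrative value
    )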

In actual training, however, with only 16 GB of GPU memory, batch_size can only be set to a very small value (e.g. 1) unless the model is quantized, which makes train_loss unstable.

Using 4-bit quantization and BF16 training together is not recommended; the combination is very likely to cause errors or a failed run. Prefer full-precision training, or quantize only the model used for inference.
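
A sketch of that split (assumed here, not part of the original setup): load the model in BF16 for training, and apply 4-bit quantization only to the finished checkpoint used for inference.

    import torch
    from sentence_transformers import SentenceTransformer

    # Training: no quantization; BF16 weights roughly halve memory vs FP32.
    train_model = SentenceTransformer(
        params.model_name,  # same placeholder as in the config below
        trust_remote_code=True,
        model_kwargs={"torch_dtype": torch.bfloat16},
    )
    # Inference: quantize the trained checkpoint separately (see the 4-bit config below).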

 

When training with SentenceTransformer on the stella_en_1.5B model using the quantization settings below, mine_hard_negatives reports nan for the positive similarities, and train_loss stays at 0:

Metric       Positive       Negative     Difference
Count           6,775         13,550              
Mean              nan         0.4492            nan
Median            nan         0.4470            nan
Std               nan         0.0342            nan
Min               nan         0.3391            nan
25%               nan         0.4241            nan
50%               nan         0.4470            nan
75%               nan         0.4739            nan
Max               nan         0.5625            nan

 

    import torch
    from transformers import BitsAndBytesConfig
    from sentence_transformers import SentenceTransformer

    # 4-bit NF4 quantization with double quantization; compute in BF16
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
    )

    model = SentenceTransformer(
        params.model_name,  # stella_en_1.5B
        trust_remote_code=True,
        model_kwargs={
            "quantization_config": bnb_config,
            # "device_map": "auto",
        },
    )
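
For reference, the mining step looks roughly like this (a sketch; the data loading and argument values are assumptions, only mine_hard_negatives and the model above come from the note):

    from datasets import load_dataset
    from sentence_transformers.util import mine_hard_negatives

    # Hypothetical (anchor, positive) pair dataset; the real data source is not recorded here.
    pairs = load_dataset("json", data_files="pairs.jsonl", split="train")

    triplets = mine_hard_negatives(
        pairs,
        model,            # the 4-bit quantized SentenceTransformer loaded above
        num_negatives=2,  # 2 negatives per positive matches 13,550 vs 6,775 in the stats above
    )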

A related question found on Reddit:

- weird, what model is this? Some models use special methods that cannot just be quanted. But what model is it?

- Stella 1.5 billion. It's an embedding model.

- GGUF only really works for chat models, not for prediction models like this. Try quanting BERT and the same thing will happen.

So it is probably an issue with the stella model itself: it is an embedding model rather than a chat model, and models like this cannot simply be quantized.
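
A quick way to confirm this (a sketch, not from the note) is to check whether the 4-bit model already emits NaNs when encoding:

    import numpy as np

    emb = model.encode(["a test sentence", "another test sentence"])
    print("any NaN in embeddings:", np.isnan(emb).any())
    print("pairwise cosine similarity:", model.similarity(emb, emb))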

 

 

 

 
