BGE

2025-08-01

20:09

BGE-M3

BGE-M3 is a powerful, composite embedding model distinguished by its versatility in:

  • Multi-Functionality: It can simultaneously perform the three common retrieval functionalities of an embedding model: dense retrieval, multi-vector retrieval, and sparse retrieval.
  • Multi-Linguality: It can support more than 100 working languages.
  • Multi-Granularity: It is able to process inputs of different granularities, spanning from short sentences to long documents of up to 8192 tokens.

 

 

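Dense retrieval is ordinary [CLS] pooling: the hidden state of the [CLS] token is taken as the embedding of the whole text, and the relevance score is the similarity between the query vector and the passage vector.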
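A minimal sketch of the dense score in plain Python, using toy hidden states instead of real model output. The function names are hypothetical, and L2-normalizing before the dot product (cosine similarity) is an assumption; the note only says "vector similarity".

```python
import math

def cls_pool(hidden_states):
    """Return the hidden state of the first token, assumed to be [CLS]."""
    return hidden_states[0]

def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dense_score(query_hidden, passage_hidden):
    """Dot product of the normalized [CLS] vectors of query and passage."""
    q = l2_normalize(cls_pool(query_hidden))
    p = l2_normalize(cls_pool(passage_hidden))
    return sum(a * b for a, b in zip(q, p))

# Toy hidden states: one row per token, row 0 plays the role of [CLS].
query_hidden = [[3.0, 4.0], [1.0, 0.0]]
passage_hidden = [[4.0, 3.0], [0.0, 1.0], [1.0, 1.0]]
dense = dense_score(query_hidden, passage_hidden)
```

Only row 0 of each matrix affects the score; the other token states are ignored by this mode.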

 

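Sparse retrieval works on the tokens that the query and the passage have in common. For each shared token, the embedding at the corresponding position in the query and in the passage is passed through a linear layer followed by a ReLU activation to obtain a weight. The similarity score is the sum, over the shared tokens, of the query-side weight multiplied by the passage-side weight.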
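The sparse score can be sketched as follows, again with toy numbers. The names `lexical_weights` and `sparse_score` and the linear-layer parameters `w` and `b` are hypothetical. Two assumptions are made here: each shared token contributes the product of its query-side and passage-side weights, and a token that occurs more than once keeps its largest weight.

```python
def relu(x):
    return max(0.0, x)

def lexical_weights(token_ids, hidden_states, w, b=0.0):
    """Map each token id to relu(w . h + b), keeping the max for repeated tokens."""
    weights = {}
    for tok, h in zip(token_ids, hidden_states):
        wt = relu(sum(wi * hi for wi, hi in zip(w, h)) + b)
        # Zero weights are dropped, so the result stays sparse.
        if wt > weights.get(tok, 0.0):
            weights[tok] = wt
    return weights

def sparse_score(q_weights, p_weights):
    """Sum of query weight times passage weight over the shared tokens."""
    shared = q_weights.keys() & p_weights.keys()
    return sum(q_weights[t] * p_weights[t] for t in shared)

w = [1.0, -1.0]  # toy linear layer with a single output unit
q_weights = lexical_weights([10, 20], [[2.0, 0.0], [0.0, 1.0]], w)
p_weights = lexical_weights([10, 30, 10], [[1.0, 0.0], [3.0, 1.0], [3.0, 0.0]], w)
sparse = sparse_score(q_weights, p_weights)
```

In this toy case only token 10 is shared, so the score is its query weight (2.0) times its largest passage weight (3.0).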

 

 

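Multi-vector retrieval works the same way as ColBERT: each query token is scored against every passage token, the maximum similarity is kept for each query token, and these maxima are averaged.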
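A small sketch of this late-interaction score, assuming the per-token vectors are already projected and normalized so that a dot product serves as the similarity. The function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def multi_vector_score(query_vecs, passage_vecs):
    """For each query token take the max similarity over passage tokens, then average."""
    per_token_max = [max(dot(q, p) for p in passage_vecs) for q in query_vecs]
    return sum(per_token_max) / len(per_token_max)

# Toy per-token vectors: one row per token.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
passage_vecs = [[0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]
multi = multi_vector_score(query_vecs, passage_vecs)
```

The first query token matches best with the first passage token (0.8) and the second with the second passage token (1.0), so the average is 0.9.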

 

 
