Problems Encountered
May 11, 2025
12:06
1. Error when importing vllm:
ImportError: /workspace/vllm-abo/vllm/_C.abi3.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSsb
Summary: vllm and the installed torch version are incompatible; letting `pip install vllm` pull in its matching torch fixes it.
Solution:
The torch CUDA build is wrong: uninstall the existing torch and install a different torch CUDA build.
On Kaggle the preinstalled version is torch 2.5.1+cu124; uninstall it first, then install torch 2.5.1+cu121.
!pip uninstall -y torch torchvision torchaudio
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
!pip show torch
Then install the other libraries:
!pip install svgwrite
!pip install cairosvg bitsandbytes
!pip install git+https://github.com/openai/CLIP.git
!pip install -q opencv-python scikit-image pillow
!pip install vtracer
!pip install google_re2
!pip install vllm
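After the installs, a quick sanity check confirms that the resolved torch/vllm pair actually import together. This is a minimal sketch for the Kaggle notebook environment; the printed versions are simply whatever pip resolved, not pinned values:
# minimal sanity check: the undefined-symbol ImportError surfaces at import time,
# so importing both packages and printing their versions is enough to verify the fix
import importlib.metadata as md

print("torch:", md.version("torch"))
print("vllm:", md.version("vllm"))

import torch
import vllm  # should no longer raise the _ZN5torch3jit... undefined symbol error

print("CUDA available:", torch.cuda.is_available())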
Approach that did not work: installing the other libraries first, then uninstalling and reinstalling torch, and finally installing vllm still raises the error. Torch must be uninstalled and reinstalled first, presumably because torch is a base dependency of the other libraries?
!pip install svgwrite
!pip install cairosvg bitsandbytes
!pip install git+https://github.com/openai/CLIP.git
!pip install -q opencv-python scikit-image pillow
!pip install vtracer
!pip install google_re2
!pip uninstall -y torch torchvision torchaudio
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
!pip show torch
!pip install vllm
This order does not work.
Summary: the torch CUDA build is wrong; switch to a compatible torch version. The likely cause is that the torch version vllm uses differs from the torch version pulled in by vllm's dependencies.
Update: installing vllm makes pip uninstall the older dependencies (e.g., it uninstalls torch 2.5.1 and installs torch 2.6.0), so the `!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121` step gets overridden and has no lasting effect.
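To see up front whether `pip install vllm` will replace the torch you just installed, you can compare the torch requirement declared by the installed vllm distribution with the current torch version. A hedged sketch using importlib.metadata (run after vllm is installed):
# list the torch-related requirements declared by the installed vllm wheel
# and compare them with the torch version currently in the environment
import importlib.metadata as md

vllm_reqs = md.requires("vllm") or []
torch_reqs = [r for r in vllm_reqs if r.lower().startswith("torch")]
print("vllm declares:", torch_reqs)          # e.g. a pinned torch==x.y.z
print("installed torch:", md.version("torch"))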
Cause of the error:
Reference: https://github.com/vllm-project/vllm/issues/13608
I had the same problem on a GH200 cluster, I think it is caused by a mismatch of torch versions used by some libraries during their installation. To avoid this, try to make whatever library you are installing use the existing torch version already installed (e.g., with FLASH_ATTENTION_SKIP_CUDA_BUILD=1, --no-build-isolation). For example, the following is how I was able to finally run VLLM without this issue:
pip3 install packaging
pip3 install wheel
# install torch if you haven't yet
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip3 install accelerate bitsandbytes datasets evaluate huggingface-hub peft tokenizers transformers trl
pip3 install ray[default,client]
pip3 install triton
pip3 install nvidia-cutlass
# flash attention, using existing torch
NVCC_APPEND_FLAGS='-allow-unsupported-compiler' MAX_JOBS=4 FLASH_ATTENTION_SKIP_CUDA_BUILD=1 pip3 install flash-attn --no-build-isolation --use-pep517
# VLLM, using existing torch
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout ed6e9075d31e32c8548b480a47d1ffb77da1f54c
python3 use_existing_torch.py
pip3 install -r requirements-build.txt
pip3 install --editable . --no-build-isolation
cd ../
# I had some issues with these libraries, maybe you can skip right away to xformers installation
pip3 uninstall -y $(pip3 list --format=freeze | grep opencv)
pip3 install opencv-python-headless
pip3 uninstall -y pynvml
pip3 install nvidia-ml-py
pip3 install compressed-tensors
# xformers, using existing torch
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
MAX_JOBS=4 pip3 install --use-pep517 . --no-build-isolation
cd ../
I guess the problem is that vLLM makes use of some compiled library from its dependencies that doesn't work with the latest pytorch. What I did to resolve this:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
pip install nvidia-cutlass
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
python setup.py install
git clone https://github.com/flashinfer-ai/flashinfer.git --recursive
cd flashinfer
FLASHINFER_ENABLE_AOT=1 pip install -e . -v --no-deps --no-build-isolation
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
MAX_JOBS=4 pip3 install --use-pep517 . --no-build-isolation
cd vllm
python3 use_existing_torch.py
pip3 install -r requirements/build.txt
pip3 install --editable . --no-build-isolation
However, for me, I had performance issues with pytorch 2.8 and triton, so I switched to PyTorch 2.7 with CUDA 12.6, and everything is normal now.
2. Bug: the clip and diffusers libraries cannot be used together with vllm ★★★★★
At first I could not locate the error: the notebook ran fine in the interactive environment but failed when submitting a version. The Log view did not reveal the cause; you have to look at the output section.
This is because, on submission, Kaggle forcibly runs kaggle_evaluation.test(package.Model) by default.
Error message:
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:
This is vllm's Python multiprocessing runtime error; the official docs have a solution:
https://docs.vllm.ai/en/stable/getting_started/troubleshooting.html
Related GitHub issue:
https://github.com/vllm-project/vllm/issues/5637
However, after wrapping the code in `if __name__ == '__main__':`, the error went away only because the `llm = vllm.LLM()` statement was never executed in the first place, so this is not a real fix.
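To make that point explicit, this is roughly what happens with the guard (a sketch; the model name is a placeholder). When kaggle_evaluation imports the module in another process, `__name__` is the module's name rather than `'__main__'`, so the guarded block is silently skipped:
import vllm

if __name__ == "__main__":
    # runs when the notebook executes this file directly, but is skipped when
    # kaggle_evaluation imports the module in a child process, which is why
    # the error "disappears" without the LLM ever being created
    llm = vllm.LLM(model="Qwen/Qwen2-7B-Instruct")  # placeholder model name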
Another notebook also used vllm but did not raise this error, so I compared the outputs of the two notebooks and found that the failing notebook's output log contained the following warning, which the working notebook did not:
WARNING 05-12 03:21:27 [utils.py:2382] We must use the `spawn` multiprocessing start method. Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html#python-multiprocessing for more information. Reason: CUDA is initialized
The key phrase is `Reason: CUDA is initialized`. After googling it, I found a related GitHub issue, "import transformer_engine initializes CUDA" (https://github.com/NVIDIA/TransformerEngine/issues/872), i.e., importing certain libraries initializes CUDA. I then traced the culprit to `from diffusers import StableDiffusionPipeline, DDIMScheduler`: this import initializes CUDA, and once CUDA is initialized, vllm raises the error shown above.
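A quick way to confirm this diagnosis in your own notebook is to check whether a CUDA context already exists right before constructing the LLM. A minimal sketch; `torch.cuda.is_initialized()` should still be False at that point:
import torch

# if this prints True before vllm.LLM(...) is called, some earlier import
# (e.g. diffusers) has already initialized CUDA, and vllm will be forced
# onto the 'spawn' start method inside the evaluation process
print("CUDA initialized:", torch.cuda.is_initialized())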
However, moving the diffusers import after `vllm.LLM()` still did not work: a new error appeared, although it was still related to vllm's multiprocessing.
★ Here comes the key point!
When I looked at the log section of the notebook runs again, I found that the failing notebook's log contained the following two entries:
INFO 05-12 08:21:08 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 05-12 08:21:10 [__init__.py:239] Automatically detected platform cuda.
while the working notebook's log contained only one:
INFO 05-12 08:21:10 [__init__.py:239] Automatically detected platform cuda.
This INFO is produced when importing vllm. Comparing the notebooks, the difference was the position of `import vllm`: the working notebook imports vllm first and the other packages afterwards, while the failing notebook imports the other packages first and vllm last. I then traced it to `import clip`: importing clip causes the subsequent `import vllm` to emit the INFO `[importing.py:53] Triton module has been replaced with a placeholder`. After switching to importing vllm before clip, the problem was finally solved.
Summary: this vllm multiprocessing error actually hides two separate bugs!
One is an incompatibility between diffusers and vllm: `from diffusers import StableDiffusionPipeline, DDIMScheduler` initializes CUDA, and if CUDA is already initialized, vllm raises the multiprocessing error. Fix: run `llm_model = vllm.LLM()` first, then import diffusers.
The other is an incompatibility between clip and vllm: `import clip` causes a later `import vllm` to log `[importing.py:53] Triton module has been replaced with a placeholder`, which in turn leads to the multiprocessing error. Fix: import vllm first, then `import clip`. (A combined sketch of both fixes follows after the summary below.)
PS: neither bug shows up in the interactive environment; they only appear when your code package is imported and run in another process, i.e., during kaggle_evaluation.test(Model).
Summary: vllm is picky about the torch version (it works best with the specific torch version it was built against), and its multiprocessing setup is fragile and easy to break.
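Putting both fixes together, the import order inside the inference module (the one kaggle_evaluation loads in a separate process) looks roughly like this. This is a hedged sketch: the class name `Model`, the model checkpoints, and the constructor arguments are placeholders, not the actual competition code.
import vllm  # must come before `import clip`, otherwise vllm later logs
             # "Triton module has been replaced with a placeholder" and the
             # multiprocessing error appears under the evaluation harness


class Model:
    def __init__(self):
        # build the LLM first, before anything that initializes CUDA
        self.llm = vllm.LLM(model="Qwen/Qwen2-7B-Instruct", dtype="half")  # placeholder checkpoint

        # defer imports that initialize CUDA (diffusers) or interfere with
        # Triton detection (clip) until after the vllm engine exists
        import clip
        from diffusers import StableDiffusionPipeline, DDIMScheduler

        self.clip_model, self.clip_preprocess = clip.load("ViT-B/32")
        self.sd_pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5"  # placeholder checkpoint
        )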
3. About the `num_return_sequences` parameter of transformers' `model.generate`: with or without this parameter the return value is indexable like a list of sequences; without it there is only one sequence, so you can take `outputs[0]` directly. With `num_return_sequences` set, generation becomes much slower!
With num_return_sequences:
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,  # nucleus sampling
    num_return_sequences=2,
)

outputs = model.generate(
    input_ids=input_ids, max_length=40, temperature=0.7, num_return_sequences=3, do_sample=True
)  # generate 3 candidates using sampling
for i in range(3):  # 3 output sequences were generated
    print(f"Generated {i}: {tokenizer.decode(outputs[i], skip_special_tokens=True)}")
Without num_return_sequences:
outputs = model.generate(max_length=40) # do greedy decoding
print(f"Generated: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")