CUDA multiprocessing

June 4, 2025

9:52

Notes on using both of Kaggle's T4 GPUs at the same time for large-model inference:

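When CUDA is used with multiprocessing, mp.set_start_method('spawn') must be called; otherwise a CUDA initialization error is raised. Once 'spawn' is set, however, child processes cannot import the __main__ module because of how __main__ works in a Jupyter notebook, so starting processes from a notebook fails. The only option is to save the code to a .py file and then run that file.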
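A small sketch of why the notebook case fails: a spawned child has to rebuild __main__ from a file path or a module spec, and a notebook kernel's __main__ normally has neither. The helper name spawn_can_reimport is made up for this note, and the check is only a heuristic modeled on what multiprocessing does, not an official API.

```python
import sys
import types


def spawn_can_reimport(main_module) -> bool:
    """Heuristic (an assumption of this note): a 'spawn' child can rebuild
    __main__ only if the parent's __main__ has a file path or a module spec."""
    has_file = getattr(main_module, "__file__", None) is not None
    has_spec = getattr(main_module, "__spec__", None) is not None
    return has_file or has_spec


# A stand-in for a notebook's __main__: no __file__, and __spec__ is None.
notebook_like = types.ModuleType("__main__")

# A stand-in for __main__ when running `python tmp.py`.
script_like = types.ModuleType("__main__")
script_like.__file__ = "tmp.py"

if __name__ == "__main__":
    print("notebook-like __main__:", spawn_can_reimport(notebook_like))
    print("script-like __main__:", spawn_can_reimport(script_like))
    print("current __main__:", spawn_can_reimport(sys.modules["__main__"]))
```

Functions defined in a notebook cell live in that file-less __main__, which is why saving them to a .py file makes them reachable from the child process.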

from IPython.core.magic import register_cell_magic

@register_cell_magic
def save2file(line, cell):
    'save python code block to a file'
    with open(line, 'wt') as fd:
        fd.write(cell)

%%save2file tmp.py
import time
import kagglehub
import torch
from tqdm import tqdm
import multiprocessing as mp
from diffusers import StableDiffusionPipeline

def generate_n_images(pipe, prompt, n):
    # Generate n images; the results are discarded, only the timing matters here
    for i in tqdm(range(n)):
        image = pipe(prompt).images[0]

if __name__ == "__main__":
    mp.set_start_method('spawn')  # must be set before any other code runs

    sd_model_path = kagglehub.model_download('stabilityai/stable-diffusion-v2/PyTorch/1/1')

    # One pipeline per GPU
    pipe1 = StableDiffusionPipeline.from_pretrained(sd_model_path)
    pipe1.to('cuda:0')
    pipe2 = StableDiffusionPipeline.from_pretrained(sd_model_path)
    pipe2.to('cuda:1')

    prompt = "a lighthouse overlooking the ocean"
    time1 = time.time()

    # Define the processes
    p_list = []
    p1 = mp.Process(target=generate_n_images, args=(pipe1, prompt, 2))
    p_list.append(p1)
    p2 = mp.Process(target=generate_n_images, args=(pipe2, prompt, 2))
    p_list.append(p2)

    # Start the work
    for p in p_list:
        p.start()

    # Wait for all processes to finish
    for p in p_list:
        p.join()

    time2 = time.time()
    print("Elapsed time:", time2 - time1)
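To run the saved file from a notebook cell, something like the sketch below should work; it is roughly what `!python tmp.py` does. The helper name run_python is made up for this note. It starts a fresh interpreter, so the script gets a real file-backed __main__.

```python
import subprocess
import sys


def run_python(args):
    """Run the current Python interpreter with the given arguments and
    return the CompletedProcess (stdout and stderr captured as text)."""
    return subprocess.run(
        [sys.executable, *args],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; for the script above
    # this would be run_python(["tmp.py"]).
    result = run_python(["-c", "print('hello from a fresh interpreter')"])
    print(result.stdout, end="")
    print("exit code:", result.returncode)
```

Capturing the output means tqdm progress bars only show up after the script exits; dropping capture_output=True streams them live instead.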

There is also a more elegant approach. In the Kaggle "Drawing with LLMs" competition, the 5th-place solution used

os.system('CUDA_VISIBLE_DEVICES=0 CONFIG="0" CPU_ONLY="1" USE_VQA="0" PORT="8000" AEST_SVG_PATH="svg_new_0.709.svg" python /tmp/server_cpu_gpu.py &')

to start a server process, and then

sdxl_proc = mp.Process(target=api_worker_sdxl, args=(f"{self.sdxl_url}/generate_sd_image", sdxl_queue, vtracer_queue, gpu_queue, 1))

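This way, starting the child process does not involve CUDA at all; the child only has to talk to the server's port. Compare this with p1 = mp.Process(target=generate_n_images, args=(pipe1, prompt, 2)), which passes pipe, a CUDA-related object, in the Process arguments.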
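A minimal sketch of that client-side pattern, to make the idea concrete. This is not the 5th-place code: api_worker, post_json, the {"prompt": ...} payload and the None stop signal are all assumptions made up for this note. The point is only that the worker receives plain, picklable arguments (a URL and queues) and never touches CUDA.

```python
import json
import queue
import urllib.request


def post_json(url, payload, timeout=60):
    """POST a JSON payload and return the decoded JSON response."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def api_worker(url, in_queue, out_queue, send=post_json):
    """Forward prompts from in_queue to the server and put each response
    on out_queue. A None item is the (hypothetical) stop signal."""
    while True:
        prompt = in_queue.get()
        if prompt is None:
            break
        out_queue.put(send(url, {"prompt": prompt}))


if __name__ == "__main__":
    # Dry run with a fake sender, so no server is needed.
    def fake_send(url, payload):
        return {"url": url, "echo": payload["prompt"]}

    q_in, q_out = queue.Queue(), queue.Queue()
    q_in.put("a lighthouse overlooking the ocean")
    q_in.put(None)
    api_worker("http://localhost:8000/generate", q_in, q_out, send=fake_send)
    print(q_out.get())
```

With a real server, the same function could be used as the target of mp.Process with mp.Queue objects in place of queue.Queue, since every argument is picklable.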