[DO NOT MERGE][llm][ci] Test vllm's CUDA_VISIBLE_DEVICES fix #64189
jeffreywang88 wants to merge 1 commit
… release tests

Clone the cvd-fix branch (jeffreywang88/vllm), which is vllm-project/vllm#45026's net diff cherry-picked onto releases/v0.23.0, and overlay its 21 runtime vllm/*.py files onto the installed vllm 0.23.0 wheel, in both docker/ray-llm/Dockerfile and ci/docker/llm.build.Dockerfile.

PR #45026 ("stop setting CUDA_VISIBLE_DEVICES internally, add --device-ids") supersedes the vllm-cuda-visible-devices-patch (vLLM #44466), which took the opposite approach. The two cannot coexist, so the patch step, its *.wanda.yaml srcs entries, and the patch file are removed.

The PR's sm100_cutlass_mla_kernel.cu change is omitted (it needs a wheel recompile and only affects SM100/MLA).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
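The overlay step described in the commit message can be sketched in isolation. This is a minimal illustration, not the Dockerfile code from the PR: the directory names under DEMO_ROOT, the two-file list, and the EXPECTED count are all hypothetical stand-ins (the real overlay copies 21 files from the cloned branch into the installed package). It copies each file, confirms the copy is byte-identical with cmp, and fails if the number of overlaid files is not the expected one.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in directories; the real build uses the cloned
# cvd-fix branch and the installed vllm package directory.
DEMO_ROOT="${TMPDIR:-/tmp}/cvd-overlay-verify-demo"
SRC="${DEMO_ROOT}/clone/vllm"
DST="${DEMO_ROOT}/site-packages/vllm"
rm -rf "${DEMO_ROOT}"
mkdir -p "${SRC}/v1/worker" "${DST}/v1/worker"

# Simulate a stale installed copy and a patched copy from the clone.
for f in v1/worker/gpu_worker.py v1/worker/worker_base.py; do
    echo "old" > "${DST}/${f}"
    echo "new" > "${SRC}/${f}"
done

FILES="v1/worker/gpu_worker.py v1/worker/worker_base.py"
EXPECTED=2   # stand-in value; the commit message mentions 21 files
COPIED=0
for f in ${FILES}; do
    cp "${SRC}/${f}" "${DST}/${f}"
    # Fail the build if the installed file does not match the overlay source.
    cmp -s "${SRC}/${f}" "${DST}/${f}" || { echo "mismatch: ${f}" >&2; exit 1; }
    COPIED=$((COPIED + 1))
done

# Guard against a file silently dropping out of the list.
[ "${COPIED}" -eq "${EXPECTED}" ] || { echo "expected ${EXPECTED}, copied ${COPIED}" >&2; exit 1; }
echo "overlaid ${COPIED} files"
```

Because the script runs under `set -e`, a missing source file makes `cp` fail and aborts the run, which is the behavior one would likely want in an image build.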
Code Review
This pull request replaces the old vllm-cuda-visible-devices-patch by cloning a specific branch of vllm (cvd-fix) and overlaying the modified Python files directly onto the installed vllm package in both ci/docker/llm.build.Dockerfile and docker/ray-llm/Dockerfile. The review feedback recommends adding mkdir -p before copying these files to ensure that any missing target subdirectories are created, thereby preventing potential build failures.
    v1/executor/ray_utils.py \
    v1/worker/gpu_worker.py \
    v1/worker/worker_base.py; do
        cp "/tmp/vllm-cvd-overlay/vllm/${f}" "${VLLM_SITE}/${f}"
If any of the target subdirectories (such as v1/engine or distributed/kv_transfer/...) do not exist in the installed vllm package, the cp command will fail with a No such file or directory error. Creating the destination directory structure using mkdir -p before copying avoids this potential build failure.
mkdir -p "$(dirname "${VLLM_SITE}/${f}")"
cp "/tmp/vllm-cvd-overlay/vllm/${f}" "${VLLM_SITE}/${f}"
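The suggestion can be tried outside the Docker build. The following is a minimal sketch under stated assumptions: DEMO_ROOT and OVERLAY_SRC are hypothetical names, temporary directories stand in for the cloned overlay and the installed package, and the file list is shortened to the three paths visible in the diff. The destination starts with no subdirectories, so a bare cp would fail; mkdir -p creates each parent directory first.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-ins for /tmp/vllm-cvd-overlay/vllm and the installed package.
DEMO_ROOT="${TMPDIR:-/tmp}/cvd-overlay-mkdir-demo"
OVERLAY_SRC="${DEMO_ROOT}/overlay/vllm"
VLLM_SITE="${DEMO_ROOT}/site-packages/vllm"
rm -rf "${DEMO_ROOT}"

# Populate the fake overlay; VLLM_SITE is deliberately left without subdirectories.
mkdir -p "${OVERLAY_SRC}/v1/executor" "${OVERLAY_SRC}/v1/worker" "${VLLM_SITE}"
for f in v1/executor/ray_utils.py v1/worker/gpu_worker.py v1/worker/worker_base.py; do
    echo "# overlay ${f}" > "${OVERLAY_SRC}/${f}"
done

for f in \
    v1/executor/ray_utils.py \
    v1/worker/gpu_worker.py \
    v1/worker/worker_base.py; do
    # Create the parent directory if the installed package lacks it.
    mkdir -p "$(dirname "${VLLM_SITE}/${f}")"
    cp "${OVERLAY_SRC}/${f}" "${VLLM_SITE}/${f}"
done
```

One trade-off worth noting: creating missing directories keeps the build green, but a missing directory in the installed wheel may also signal that the overlay and the wheel are from different versions, so a build could reasonably choose to fail instead.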
Release test with latest fix (
TEST ONLY; DO NOT MERGE