Reminder
System Info
The logs show: sleep: error while loading shared libraries: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short
My YAML configuration is below, with the key settings marked in bold:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llamafactory
  namespace: xxx
  labels:
    workload.user.cattle.io/workloadselector: apps.deployment-jmai-llamafactory
spec:
  replicas: 1
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-jmai-llamafactory
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0 # ensure zero-downtime updates
    type: RollingUpdate
  template:
    metadata:
      labels:
        xxx
    spec:
      schedulerName: volcano
      runtimeClassName: nvidia
      terminationGracePeriodSeconds: 30
      containers:
        - name: llamafactory
          image: goharbor.jomoo.cn/llmos-ai/llamafactory:0.9.5
          imagePullPolicy: IfNotPresent
          command:
            - llamafactory-cli
            - webui
            - '--host'
            - '0.0.0.0'
            - '--port'
            - '7860'
          ports:
            - containerPort: 7860
              name: http
              protocol: TCP
          env:
            - name: NVIDIA_VISIBLE_DEVICES
              value: all
            - name: NVIDIA_DRIVER_CAPABILITIES
              value: compute,utility
            - name: NVIDIA_DISABLE_REQUIRE
              value: 'true'
            - name: LD_LIBRARY_PATH
              value: >-
                /usr/lib/wsl/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu
          resources:
            limits:
              cpu: '16'
              memory: 16000Mi
              volcano.sh/vgpu-memory: '12288'
              volcano.sh/vgpu-number: '1'
            requests:
              cpu: '4'
              memory: 8000Mi
              volcano.sh/vgpu-memory: '12288'
              volcano.sh/vgpu-number: '1'
          securityContext:
            privileged: false
          volumeMounts:
            - mountPath: /dev/shm
              name: dshm
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 8Gi
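For context on the error: the dynamic loader reports "file too short" when a library file is smaller than an ELF header, which usually means the libcuda.so.1 visible inside the container is empty or truncated. Below is a minimal diagnostic sketch, not something I have confirmed as the root cause here, that walks the LD_LIBRARY_PATH from the manifest and classifies each libcuda.so.1 it finds. The helper names `check_shared_library` and `scan_library_path` are hypothetical, and the 64-byte threshold assumes a 64-bit ELF header.

```python
import os

ELF_MAGIC = b"\x7fELF"
ELF64_HEADER_SIZE = 64  # assumption: 64-bit ELF, as on x86_64


def check_shared_library(path):
    """Classify a library path as 'missing', 'too short', 'not ELF', or 'ok'."""
    if not os.path.exists(path):  # also covers dangling symlinks
        return "missing"
    real = os.path.realpath(path)
    if os.path.getsize(real) < ELF64_HEADER_SIZE:
        return "too short"
    with open(real, "rb") as f:
        if f.read(4) != ELF_MAGIC:
            return "not ELF"
    return "ok"


def scan_library_path(ld_library_path, name="libcuda.so.1"):
    """Check `name` in every directory of a colon-separated search path."""
    results = {}
    for directory in ld_library_path.split(":"):
        if directory:
            candidate = os.path.join(directory, name)
            results[candidate] = check_shared_library(candidate)
    return results


if __name__ == "__main__":
    search_path = os.environ.get("LD_LIBRARY_PATH", "/usr/lib/x86_64-linux-gnu")
    for path, status in scan_library_path(search_path).items():
        print(f"{status:10s} {path}")
```

Running this inside the pod (if a shell can be started at all) would show whether /usr/lib/x86_64-linux-gnu/libcuda.so.1 is reported as "too short", and whether any other directory on the path holds a usable copy.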
Reproduction
Others
No response