跳转至

将 Ollama 部署到 Kubernetes

先决条件

  • Ollama:https://ollama.com/download
  • Kubernetes 集群。此示例将使用 Google Kubernetes Engine。

步骤

1. 创建 Ollama 命名空间、守护集和服务

kubectl apply -f cpu.yaml

2. 将 Ollama 服务端口转发以便本地连接和使用

kubectl -n ollama port-forward service/ollama 11434:80

3. 拉取并运行模型,例如 orca-mini:3b

ollama run orca-mini:3b

(可选)硬件加速

在 Kubernetes 中使用硬件加速需要 NVIDIA 的 k8s-device-plugin。更多细节请访问链接。

配置完成后,创建一个支持 GPU 的 Ollama 部署。

kubectl apply -f gpu.yaml

源码

cpu.yaml

---
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  selector:
    matchLabels:
      name: ollama
  template:
    metadata:
      labels:
        name: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - name: http
          containerPort: 11434
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  type: ClusterIP
  selector:
    name: ollama
  ports:
  - port: 80
    name: http
    targetPort: http
    protocol: TCP

gpu.yaml

---
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      name: ollama
  template:
    metadata:
      labels:
        name: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        env:
        - name: PATH
          value: /usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        - name: LD_LIBRARY_PATH
          value: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: compute,utility
        ports:
        - name: http
          containerPort: 11434
          protocol: TCP
        resources:
          limits:
            nvidia.com/gpu: 1
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  type: ClusterIP
  selector:
    name: ollama
  ports:
  - port: 80
    name: http
    targetPort: http
    protocol: TCP

仓库地址

kuberenets