Running Ollama on Fly.io GPU Instances

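Ollama runs on Fly.io GPU instances with little to no configuration. If you don't have access to GPUs yet, you'll need to apply for access on the waitlist. Once you're accepted, you'll receive an email with instructions on how to get started.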

Create a new app with fly apps create:

fly apps create

Then, in a new folder, create a fly.toml file that looks like this:

app = "sparkling-violet-709"
primary_region = "ord"
vm.size = "a100-40gb" # see https://fly.io/docs/gpus/gpu-quickstart/ for more info

[build]
  image = "ollama/ollama"

[http_service]
  internal_port = 11434
  force_https = false
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

[mounts]
  source = "models"
  destination = "/root/.ollama"
  initial_size = "100gb"

Next, allocate a new private IPv6 address for your app:

fly ips allocate-v6 --private

Then deploy your app:

fly deploy

Finally, you can access it interactively from a new Fly.io Machine:

fly machine run -e OLLAMA_HOST=http://your-app-name.flycast --shell ollama/ollama
$ ollama run openchat:7b-v3.5-fp16
>>> How do I bake chocolate chip cookies?
 To bake chocolate chip cookies, follow these steps:

1. Preheat the oven to 375°F (190°C) and line a baking sheet with parchment paper or silicone baking mat.

2. In a large bowl, mix together 1 cup of unsalted butter (softened), 3/4 cup granulated sugar, and 3/4
cup packed brown sugar until light and fluffy.

3. Add 2 large eggs, one at a time, to the butter mixture, beating well after each addition. Stir in 1
teaspoon of pure vanilla extract.

4. In a separate bowl, whisk together 2 cups all-purpose flour, 1/2 teaspoon baking soda, and 1/2 teaspoon
salt. Gradually add the dry ingredients to the wet ingredients, stirring until just combined.

5. Fold in 2 cups of chocolate chips (or chunks) into the dough.

6. Drop rounded tablespoons of dough onto the prepared baking sheet, spacing them about 2 inches apart.

7. Bake for 10-12 minutes, or until the edges are golden brown. The centers should still be slightly soft.

8. Allow the cookies to cool on the baking sheet for a few minutes before transferring them to a wire rack
to cool completely.

Enjoy your homemade chocolate chip cookies!

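With this setup (auto_stop_machines and auto_start_machines both enabled in fly.toml), the Machine stops automatically when you are done using it and starts again the next time you access it. This is a good way to save money on GPU instances while they sit idle. If you want a persistent wake-on-use connection to your Ollama instance, you can set up a connection to your Fly network using WireGuard. You can then reach your Ollama instance at http://your-app-name.flycast.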
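
Once the Flycast address is reachable from your machine (for example over WireGuard), you can also talk to Ollama over its HTTP API instead of the interactive shell. The following is a minimal sketch, not an official client: it uses only the Python standard library, the helper names build_generate_request and generate are made up for illustration, your-app-name is the same placeholder as above, and the 300-second timeout is a guess meant to cover a stopped Machine starting up and loading the model.

```python
import json
import urllib.request

# Assumption: replace with your own app name. This address only resolves
# from inside your Fly private network (for example over WireGuard).
OLLAMA_URL = "http://your-app-name.flycast"


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the generated text.

    The timeout is deliberately generous (an assumption, tune it): the first
    request after an idle period has to wait for the Machine to start.
    """
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # With "stream": False, Ollama returns one JSON object whose
    # "response" field holds the full completion.
    return body["response"]
```

Calling generate(OLLAMA_URL, "openchat:7b-v3.5-fp16", "How do I bake chocolate chip cookies?") should return text similar to the session above, provided the model has already been pulled onto the volume (the earlier ollama run does this).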