Import a model
This guide walks through importing a GGUF, PyTorch or Safetensors model.
Importing (GGUF)
Step 1: Write a Modelfile
Start by creating a Modelfile. This file is the blueprint for your model, specifying weights, parameters, prompt templates and more.
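For example, a minimal Modelfile for a GGUF file on disk looks like this (the filename is a placeholder; point FROM at your own weights):
# FROM specifies the local GGUF weights to build the model from (placeholder filename)
FROM ./mistral-7b-v0.1.Q4_0.gguf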
(Optional) Many chat models require a prompt template in order to answer correctly. A default prompt template can be specified with the TEMPLATE instruction in the Modelfile:
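For example, an instruction-tuned Mistral model answers best with [INST]-style tags; the template below is a sketch, assuming your model expects that format:
FROM ./mistral-7b-v0.1.Q4_0.gguf
# {{ .Prompt }} is replaced with the user's input at runtime
TEMPLATE "[INST] {{ .Prompt }} [/INST]"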
Step 2: Create the Ollama model
Finally, create a model from your Modelfile:
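Here example is a placeholder name for your model, and -f points at the Modelfile written in Step 1:
# "example" is a placeholder model name; substitute your own
ollama create example -f Modelfile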
Step 3: Run your model
Next, test the model with ollama run:
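The prompt is arbitrary; example is the placeholder name used above:
ollama run example "What is your favourite condiment?"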
Importing (PyTorch & Safetensors)
Importing from PyTorch and Safetensors is a longer process than importing from GGUF. Improvements that make it easier are a work in progress.
Setup
First, clone the ollama/ollama repo:
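For example, cloning over HTTPS into a local ollama directory:
git clone https://github.com/ollama/ollama.git ollama
cd ollama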
and then fetch its llama.cpp submodule:
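From the repository root, initialize and fetch only the llm/llama.cpp submodule:
git submodule init
git submodule update llm/llama.cpp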
Next, install the Python dependencies:
# create and activate an isolated Python environment for the conversion scripts
python3 -m venv llm/llama.cpp/.venv
source llm/llama.cpp/.venv/bin/activate
pip install -r llm/llama.cpp/requirements.txt
Then build the quantize tool:
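The tool is built from the submodule's Makefile:
make -C llm/llama.cpp quantize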
Clone the HuggingFace repository (optional)
If the model is currently hosted in a HuggingFace repository, first clone that repository to download the raw model.
Install Git LFS, verify it's installed, and then clone the model's repository:
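For example, using mistralai/Mistral-7B-Instruct-v0.1 as a stand-in for your model's repository:
git lfs install
git lfs --version
# clone the raw weights into a local directory named "model" (placeholder repo URL)
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 model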
Convert the model
Note: some model architectures require using specific convert scripts. For example, Qwen models require running convert-hf-to-gguf.py instead of convert.py.
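A typical invocation converts the weights cloned into ./model to a single f16 GGUF file; the paths and output name below are placeholders:
python llm/llama.cpp/convert.py ./model --outtype f16 --outfile converted.bin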
Quantize the model
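For example, quantizing the converted f16 file down to q4_0 (see the quantization reference at the end of this guide); both file names are placeholders:
llm/llama.cpp/quantize converted.bin quantized.bin q4_0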
Step 3: Write a Modelfile
Next, create a Modelfile for your model:
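A minimal sketch, assuming the quantized.bin produced in the previous step and a Mistral-style prompt format:
# quantized.bin is the output of the quantize step above
FROM quantized.bin
TEMPLATE "[INST] {{ .Prompt }} [/INST]"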
Step 4: Create the Ollama model
Finally, create a model from your Modelfile:
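As before, example is a placeholder model name:
ollama create example -f Modelfile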
Step 5: Run your model
Next, test the model with ollama run:
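Again, the prompt is arbitrary:
ollama run example "What is your favourite condiment?"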
Publishing your model (optional – early alpha)
Publishing models is in early alpha. If you'd like to publish your model to share with others, follow these steps:
- Create an account
- Copy your Ollama public key:
  - macOS: cat ~/.ollama/id_ed25519.pub
  - Windows: type %USERPROFILE%\.ollama\id_ed25519.pub
  - Linux: cat /usr/share/ollama/.ollama/id_ed25519.pub
- Add your public key to your Ollama account
Next, copy your model to your username's namespace:
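Replace <your username> with your Ollama account name; example is the model created earlier:
ollama cp example <your username>/example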
Then push the model:
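Using the same namespaced name:
ollama push <your username>/example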
After publishing, your model will be available at https://ollama.com/<your username>/example.
Quantization reference
The quantization options are as follows (from highest to lowest level of quantization). Note: some architectures such as Falcon do not support K quants.
q2_K
q3_K
q3_K_S
q3_K_M
q3_K_L
q4_0 (recommended)
q4_1
q4_K
q4_K_S
q4_K_M
q5_0
q5_1
q5_K
q5_K_S
q5_K_M
q6_K
q8_0
f16