Ollama 提供了强大的 REST API,使开发者能够方便地与大语言模型进行交互。通过 Ollama API,用户可以发送请求并接收模型生成的响应,应用于自然语言处理、文本生成等任务。本文将详细介绍生成补全、对话生成的基本操作,并对创建模型、复制模型、删除模型等常见操作也进行了说明。
POST /api/generate
使用指定的模型生成给定提示的响应。这是一个流式端点,因此会有一系列响应。最终的响应对象将包括来自请求的统计信息和其他数据。
model
: (必需)模型名称prompt
: 要生成响应的提示suffix
: 模型响应后的文本images
: (可选)一个base64编码的图像列表(用于多模态模型,如llava
)
高级参数(可选):
format
: 返回响应的格式。目前唯一接受的值是json
options
: 其他模型参数,如temperature
、seed
等system
: 系统消息template
: 要使用的提示模板context
: 从先前对/generate
的请求中返回的上下文参数,可以用于保持简短的对话记忆stream
: 如果设置为false
,响应将作为单个响应对象返回,而不是一系列对象流raw
: 如果设置为true
,将不会对提示进行任何格式化。如果您在请求API时指定了完整的模板提示,可以选择使用raw
参数keep_alive
: 控制模型在请求后保留在内存中的时间(默认:5m
)
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "为什么草是绿的?"
}'
Tip
如果在 Windows 操作系统中使用 curl
命令,请下载 curl for Windows, 解压缩文件,然后找到该文件目录下的 bin 子文件, 复制文件地址添加环境变量。
在命令行窗口(注意不是 PowerShell)使用以下命令,检查是否成功添加。
curl --help
显示如下,表示成功添加。
Tip
在 Windows 命令行窗口下使用 curl
请求命令时,注意使用转义双引号。示例命令如下。
curl http://localhost:11434/api/generate -d "{\"model\": \"llama3.1\", \"prompt\": \"为什么草是绿的\"}"
显示如下表示请求成功。
返回的是一个 JSON 对象流:
{
"model":"llama3.1",
"created_at":"2024-08-08T02:54:08.184732629Z",
"response":"植物",
"done":false
}
流中的最终响应还包括有关生成的附加数据:
- context: 用于此响应的对话编码,可以在下一个请求中发送以保持对话记忆
- total_duration: 生成响应所花费的时间(纳秒)
- load_duration: 加载模型所花费的时间(纳秒)
- prompt_eval_count: 提示中的标记数量
- prompt_eval_duration: 评估提示所花费的时间(纳秒)
- eval_count: 响应中的标记数量
- eval_duration: 生成响应所花费的时间(纳秒)
- response: 如果响应是流式的则为空,如果不是流式的,这将包含完整的响应要计算响应生成速度(每秒生成的标记数,token/s), 即
eval_count
/eval_duration
* 10^9。
最终响应:
{
"model":"llama3.1",
"created_at":"2024-08-08T02:54:10.819603411Z",
"response":"",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":8655401792,
"load_duration":5924129727,
"prompt_eval_count":17,
"prompt_eval_duration":29196000,
"eval_count":118,
"eval_duration":2656329000
}
将 stream
设置为 false
,可以一次收到所有回复。
示例请求
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "为什么草是绿的?",
"stream": false
}'
示例响应
{
"model":"llama3.1",
"created_at":"2024-08-08T07:13:34.418567351Z",
"response":"答案:叶子含有大量的叶绿素。",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":2902435095,
"load_duration":2605831520,
"prompt_eval_count":17,
"prompt_eval_duration":29322000,
"eval_count":13,
"eval_duration":266499000
}
当 format
设置为 json
时,输出将是 JSON 格式。但请注意,要在 prompt
里指示模型以 JSON 格式响应,否则模型可能生成大量空格。
示例请求
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "为什么草是绿的?以JSON格式输出答案",
"format": "json",
"stream": false
}'
示例响应
{
"model":"llama3.1",
"created_at":"2024-08-08T07:21:24.950883454Z",
"response":"{\n \"颜色原因\": \"叶子中含有光合作用所需的叶绿素\",\n \"作用\": \"进行光合作用吸收太阳能\"\n}",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":3492279981,
"load_duration":2610591203,
"prompt_eval_count":22,
"prompt_eval_duration":28804000,
"eval_count":40,
"eval_duration":851206000
}
response
的值将是一个包含类似于以下内容的 JSON 的字符串:
{
"颜色原因": "叶子中含有光合作用所需的叶绿素",
"作用": "进行光合作用吸收太阳能"
}
要向多模态模型(如 llava
或 bakllava
)提交图像,请提供 base64 编码的 images
列表:
示例请求
curl http://localhost:11434/api/generate -d '{
"model": "llava",
"prompt":"描述这张图片",
"stream": false,
"images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"
]
}'
示例响应
{
"model":"llava",
"created_at":"2024-08-08T07:33:55.481713465Z",
"response":" The image shows a cartoon of an animated character that resembles a cute pig with large eyes and a smiling face. It appears to be in motion, indicated by the lines extending from its arms and tail, giving it a dynamic feel as if it is waving or dancing. The style of the image is playful and simplistic, typical of line art or stickers. The character's design has been stylized with exaggerated features such as large ears and a smiling expression, which adds to its charm. ",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":2960501550,
"load_duration":4566012,
"prompt_eval_count":1,
"prompt_eval_duration":758437000,
"eval_count":108,
"eval_duration":2148818000
}
将 seed
设置为一个固定数值,得到可复现的输出:
示例请求
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "为什么草是绿的?",
"stream": false,
"options": {
"seed": 1001
}
}'
示例响应
{
"model":"llama3.1",
"created_at":"2024-08-08T07:42:28.397780058Z",
"response":"答案:因为叶子中含有大量的氯离子。",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":404791556,
"load_duration":18317351,
"prompt_eval_count":17,
"prompt_eval_duration":22453000,
"eval_count":16,
"eval_duration":321267000}
POST /api/chat
使用指定模型生成聊天中的下一条消息。这也是一个流式端点,因此会有一系列响应。如果将 "stream"
设置为 false
,则可以禁用流式传输。最终的响应对象将包括请求的统计信息和附加数据。
model
:(必需)模型名称messages
:聊天消息,这可以用于保持聊天记忆tools
:模型支持使用的工具。需要将stream
设置为false
message
对象具有以下字段:
role
:消息的角色,可以是system
、user
、assistant
或tool
content
:消息的内容images
(可选):要包含在消息中的图像列表(对于如llava
之类的多模态模型)tool_calls
(可选):模型想要使用的工具列表
高级参数(可选):
format
:返回响应的格式。目前唯一接受的值是json
options
:其他模型参数,如temperature
、seed
等stream
:如果为false
,响应将作为单个响应对象返回,而不是一系列对象流keep_alive
:控制请求后模型在内存中保持加载的时间(默认:5m
)
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"messages": [
{
"role": "user",
"content": "为什么草是绿的?"
}
]
}'
返回 JSON 对象流:
{
"model":"llama3.1",
"created_at":"2024-08-08T03:54:36.933701041Z",
"message":{
"role":"assistant",
"content":"因为"
},
"done":false
}
最终响应:
{
"model":"llama3.1",
"created_at":"2024-08-08T03:54:37.187621765Z",
"message":{
"role":"assistant",
"content":""
},
"done_reason":"stop",
"done":true,
"total_duration":5730533217,
"load_duration":5370535786,
"prompt_eval_count":17,
"prompt_eval_duration":29621000,
"eval_count":13,
"eval_duration":273810000
}
非流式输出、JSON模式、多模态输入、可复现输出的参数设置与 回答API
的一致。
发送带有对话历史记录的聊天消息。可以使用相同的方法开始多轮对话或思维链提示。
示例请求
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"messages": [
{
"role": "user",
"content": "为什么草是绿色的?"
},
{
"role": "assistant",
"content": "因为草里面含有叶绿素。"
},
{
"role": "user",
"content": "为什么叶绿素让草看起来是绿色的?"
}
],
"stream": false
}'
示例响应
{
"model":"llama3.1",
"created_at":"2024-08-08T07:53:28.849517802Z",
"message":{
"role":"assistant",
"content":"这是一个更复杂的问题!\n\n叶绿素是一种称为黄素的色素,这些色素可以吸收光能。在日光下,绿色草叶中的叶绿素会吸收蓝光和红光,但反射出黄色和绿色的光,所以我们看到草看起来是绿色的。\n\n简单来说,叶绿素让草看起来是绿色的,因为它反射了我们的眼睛可以看到的绿光,而不反射我们看到的其他颜色。"
},
"done_reason":"stop",
"done":true,
"total_duration":5065572138,
"load_duration":2613559070,
"prompt_eval_count":48,
"prompt_eval_duration":37825000,
"eval_count":106,
"eval_duration":2266694000}
POST /api/create
推荐将 modelfile
设置为 Modelfile 的内容,而不仅仅是设置 path
。远程模型创建还必须使用 创建Blob 显式地创建所有文件 blobs、字段(如 FROM
和 ADAPTER
),并将该值设置为响应中指示的路径。
name
:要创建的模型名称modelfile
(可选):Modelfile 的内容stream
(可选):如果为false
,响应将作为单个响应对象返回,而不是对象流path
(可选):Modelfile 的路径
curl http://localhost:11434/api/create -d '{
"name": "mario",
"modelfile": "FROM llama3\nSYSTEM You are mario from Super Mario Bros."
}'
一串 JSON 对象。注意,最后的 JSON 对象显示 "status": "success"
提示创建成功。
{"status":"reading model metadata"}
{"status":"creating system layer"}
{"status":"using already created layer sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2"}
{"status":"using already created layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b"}
{"status":"using already created layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d"}
{"status":"using already created layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988"}
{"status":"using already created layer sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9"}
{"status":"writing layer sha256:df30045fe90f0d750db82a058109cecd6d4de9c90a3d75b19c09e5f64580bb42"}
{"status":"writing layer sha256:f18a68eb09bf925bb1b669490407c1b1251c5db98dc4d3d81f3088498ea55690"}
{"status":"writing manifest"}
{"status":"success"}
HEAD /api/blobs/:digest
确保用于 FROM 或 ADAPTER 字段的文件 blob 存在于服务器上。这是在检查你的 Ollama 服务器而不是 Ollama.ai。
digest
:blob 的 SHA256 摘要
curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
如果 blob 存在则返回 "200 OK",如果不存在则返回 "404 Not Found"。
POST /api/blobs/:digest
从服务器上的文件创建一个 blob。返回服务器文件路径。
digest
:文件的预期 SHA256 摘要
curl -T model.bin -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
如果 blob 成功创建则返回 201 Created,如果使用的摘要不符合预期则返回 400 Bad Request。
POST /api/copy
复制一个模型,使用另一个名称复制已有模型。
curl http://localhost:11434/api/copy -d '{
"source": "llama3.1",
"destination": "llama3-backup"
}'
如果成功则返回 "200 OK",如果源模型不存在则返回 "404 Not Found"。
DELETE /api/delete
删除模型及其数据。
name
:要删除的模型名称
curl -X DELETE http://localhost:11434/api/delete -d '{
"name": "llama3.1"
}'
如果成功则返回 "200 OK",如果要删除的模型不存在则返回 "404 Not Found"。
GET /api/ps
列出当前加载到内存中的模型。
curl http://localhost:11434/api/ps
{
"models":[
{
"name":"llama3.1:latest",
"model":"llama3.1:latest",
"size":6654289920,
"digest":"75382d0899dfaaa6ce331cf680b72bd6812c7f05e5158c5f2f43c6383e21d734",
"details":{
"parent_model":"",
"format":"gguf",
"family":"llama",
"families":["llama"],
"parameter_size":"8.0B",
"quantization_level":"Q4_0"
},
"expires_at":"2024-08-08T14:06:52.883023476+08:00",
"size_vram":6654289920
}
]
}
GET /api/tags
列出本地可用的模型。
curl http://localhost:11434/api/tags
{
"models":[
{
"name":"llama3.1:latest",
"model":"llama3.1:latest",
"modified_at":"2024-08-07T17:54:22.533937636+08:00",
"size":4661230977,
"digest":"75382d0899dfaaa6ce331cf680b72bd6812c7f05e5158c5f2f43c6383e21d734",
"details":{
"parent_model":"",
"format":"gguf",
"family":"llama",
"families":["llama"],
"parameter_size":"8.0B",
"quantization_level":"Q4_0"
}
}
]
}
POST /api/show
显示有关模型的信息,包括详细信息、modelfile、模板、参数、许可证、系统提示。
name
:要显示的模型名称verbose
(可选):如果设置为true
,则返回详细响应字段的完整数据
curl http://localhost:11434/api/show -d '{
"name": "llama3.1"
}'
{
"license":"...",
"modelfile":"...",
"parameters":"...",
"template":"...",
"details":{
"parent_model":"",
"format":"gguf",
"family":"llama",
"families":["llama"],
"parameter_size":"8.0B",
"quantization_level":"Q4_0"
},
"model_info":{
"general.architecture":"llama",
"general.basename":"Meta-Llama-3.1",
"general.file_type":2,
"general.finetune":"Instruct",
"general.languages":["en","de","fr","it","pt","hi","es","th"],
"general.license":"llama3.1",
"general.parameter_count":8030261312,
"general.quantization_version":2,
"general.size_label":"8B",
"general.tags":["facebook","meta","pytorch","llama","llama-3","text-generation"],
"general.type":"model",
"llama.attention.head_count":32,
"llama.attention.head_count_kv":8,
"llama.attention.layer_norm_rms_epsilon":0.00001,
"llama.block_count":32,
"llama.context_length":131072,
"llama.embedding_length":4096,
"llama.feed_forward_length":14336,
"llama.rope.dimension_count":128,
"llama.rope.freq_base":500000,
"llama.vocab_size":128256,
"tokenizer.ggml.bos_token_id":128000,
"tokenizer.ggml.eos_token_id":128009,
"tokenizer.ggml.merges":null,
"tokenizer.ggml.model":"gpt2",
"tokenizer.ggml.pre":"llama-bpe",
"tokenizer.ggml.token_type":null,
"tokenizer.ggml.tokens":null
},
"modified_at":"2024-08-07T17:54:22.533937636+08:00"
}
POST /api/pull
从 ollama
库下载模型。中断的拉取操作会从断点继续下载,多个调用将共享相同的下载进度。
name
:要拉取的模型名称insecure
(可选):允许对库进行不安全连接。建议仅在开发期间,从自己的库中拉取时使用此选项。stream
(可选):如果为false
,响应将作为单个响应对象返回,而不是对象流。
curl http://localhost:11434/api/pull -d '{
"name": "llama3.1"
}'
如果 stream
未指定或设置为 true
,则返回一串 JSON 对象:
第一个对象是清单:
{
"status": "pulling manifest"
}
然后是一系列下载响应。在下载完成之前,可能不会包含 completed
键。要下载的文件数量取决于清单中指定的层数。
{
"status": "downloading digestname",
"digest": "digestname",
"total": 2142590208,
"completed": 241970
}
在所有文件下载完成后,最后的响应是:
{
"status": "verifying sha256 digest"
}
{
"status": "writing manifest"
}
{
"status": "removing any unused layers"
}
{
"status": "success"
}
如果 stream
设置为 false,则响应是一个单一的 JSON 对象:
{
"status": "success"
}
POST /api/push
将模型上传到模型库。需要先注册 ollama.ai 并添加公钥。
name
:要推送的模型名称,格式为<namespace>/<model>:<tag>
insecure
(可选):允许对库进行不安全连接。仅在开发期间推送到自己的库时使用此选项。stream
(可选):如果为false
,响应将作为单个响应对象返回,而不是对象流。
curl http://localhost:11434/api/push -d '{
"name": "mattw/pygmalion:latest"
}'
如果 stream
未指定或设置为 true
,则返回一串 JSON 对象:
{ "status": "retrieving manifest" }
然后是一系列上传响应:
{
"status": "starting upload",
"digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
"total": 1928429856
}
最后,当上传完成时:
{"status":"pushing manifest"}
{"status":"success"}
如果 stream
设置为 false
,则响应是一个单一的 JSON 对象:
{ "status": "success" }
POST /api/embed
从模型生成嵌入。
model
:要从中生成嵌入的模型名称input
:要生成嵌入的文本或文本列表
高级参数:
truncate
:截断每个输入的末尾以适应上下文长度。如果为false
并且超过上下文长度,则返回错误。默认值为true
options
:其他模型参数,如temperature
、seed
等keep_alive
:控制请求后模型在内存中保持加载的时间(默认:5m
)
curl http://localhost:11434/api/embed -d '{
"model": "llama3.1",
"input": "为什么草是绿的?"
}'
{
"model":"llama3.1",
"embeddings":[[
-0.008059342,-0.013182715,0.019781841,0.012018124,-0.024847334,
-0.0031902494,-0.02714767,0.015282277,0.060032737,...
]],
"total_duration":3041671009,
"load_duration":2864335471,
"prompt_eval_count":7}
curl http://localhost:11434/api/embed -d '{
"model": "llama3.1",
"input": ["为什么草是绿的?","为什么天是蓝的?"]
}'
{
"model":"llama3.1",
"embeddings":[[
-0.008471201,-0.013031566,0.019300476,0.011618419,-0.025197424,
-0.0024164673,-0.02669075,0.015766116,0.059984162,...
],[
-0.012765694,-0.012822924,0.015915949,0.006415892,-0.02327763,
0.004859615,-0.017922137,0.019488193,0.05638235,...
]],
"total_duration":195481419,
"load_duration":1318886,
"prompt_eval_count":14
}
Ollama API 在发生错误时会返回相应的错误代码和消息。常见错误包括:
- 400 Bad Request:请求格式错误。
- 404 Not Found:请求的资源不存在。
- 500 Internal Server Error:服务器内部错误。