# How to deploy the Llama3 large language model and Node-RED in Docker
# Before you begin:
# Environment requirements:
- Docker and Docker Compose are installed.
- The Llama3 model file has been downloaded and placed in the project root.
- The Docker images for Llama3 and Node-RED are already prepared, or will be pulled from the official registry.
# Steps
First, the project root contains the following files and directories:
project/
│
├── docker-compose.yml                  # Docker Compose configuration file
├── llamasvr.py                         # startup script for the Llama3 service
└── Llama-3-ELYZA-JP-8B-q4_k_m.gguf     # Llama3 model file
Write the docker-compose.yml file. Below is the Docker Compose file used to deploy Llama3 and Node-RED together:
version: '3'
services:
  llama3:
    image: llama3i                     # the Llama3 image you are using
    container_name: llama3_container
    ports:
      - "8000:8000"                    # port of the Llama3 service
    volumes:
      - .:/app                         # mount the current directory at /app in the container
    networks:
      - mynetwork
    command: python /app/llamasvr.py   # start the Llama3 service
  nodered:
    image: nodered/node-red            # official Node-RED image
    container_name: nodered_container
    ports:
      - "1880:1880"                    # Node-RED's default port
    networks:
      - mynetwork
networks:
  mynetwork:
    driver: bridge                     # shared network so the two containers can reach each other
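Because the model file is loaded when the llama3 container starts, Node-RED may come up before the Llama3 service is ready. If you want Compose to at least order the container startup, a hedged sketch of an optional addition to the `nodered` service (note that `depends_on` only orders container creation; it does not wait for the model to finish loading):

```yaml
services:
  nodered:
    depends_on:
      - llama3   # create the llama3 container before nodered
```

If the first requests from Node-RED fail, simply retry after the model has loaded.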
Llama3 server startup script (llamasvr.py):
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

from llama_cpp import Llama

# Load the Llama model
llm = Llama(model_path="/app/Llama-3-ELYZA-JP-8B-q4_k_m.gguf", n_ctx=4096)

class LlamaServer(BaseHTTPRequestHandler):
    def do_POST(self):
        content_length = int(self.headers['Content-Length'])
        post_data = self.rfile.read(content_length)
        data = json.loads(post_data)

        # Parse the input text
        input_text = data.get('input_text', '')

        # Call the Llama model to generate a reply
        result = llm(input_text, max_tokens=256)
        generated_text = result["choices"][0]["text"]

        # Return the generated result
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps({"generated_text": generated_text}).encode('utf-8'))

def run():
    server_address = ('', 8000)
    httpd = HTTPServer(server_address, LlamaServer)
    print('Running Llama server...')
    httpd.serve_forever()

if __name__ == '__main__':
    run()
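Once the container is up, you can exercise the endpoint from the host before wiring up Node-RED. A minimal client sketch using only the standard library (the helper names `build_payload`, `parse_response`, and `query_llama` are illustrative, not part of the server above; the URL assumes the port mapping from the compose file):

```python
import json
import urllib.request

def build_payload(input_text):
    """Encode the JSON body the server expects: {"input_text": ...}."""
    return json.dumps({"input_text": input_text}).encode("utf-8")

def parse_response(body):
    """Extract the generated text from the server's JSON reply."""
    return json.loads(body)["generated_text"]

def query_llama(input_text, url="http://localhost:8000"):
    """POST a prompt to the Llama3 server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=build_payload(input_text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return parse_response(resp.read())

if __name__ == "__main__":
    print(query_llama("Hello Llama3"))
```

This mirrors exactly what the Node-RED flow below does: set a JSON payload, POST it, and read `generated_text` from the response.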
# Build and start the containers
Build and start the services with the following command:

docker-compose up --build
# Configure the flow in Node-RED
- Open Node-RED (by default at http://localhost:1880).
- Add the following nodes:
  - `inject` node: triggers the request.
  - `function` node: sets the request's JSON data, e.g. `{"input_text": "Hello Llama3"}`.
  - `http request` node: set the URL to `http://llama3_container:8000` and the method to `POST`.
  - `debug` node: inspects the returned result.
Complete flow JSON (importable into Node-RED; it includes the function node code):
[
{
"id": "f6f2187d.f17ca8",
"type": "tab",
"label": "Flow 1",
"disabled": false,
"info": ""
},
{
"id": "inject_trigger",
"type": "inject",
"z": "f6f2187d.f17ca8",
"name": "Trigger Request",
"props": [
{
"p": "payload"
}
],
"repeat": "",
"crontab": "",
"once": false,
"onceDelay": 0.1,
"topic": "",
"payload": "",
"payloadType": "date",
"x": 160,
"y": 140,
"wires": [
[
"set_payload"
]
]
},
{
"id": "set_payload",
"type": "function",
"z": "f6f2187d.f17ca8",
"name": "Set Payload",
"func": "msg.headers = {\n \"Content-Type\": \"application/json\"\n};\n\nmsg.payload = {\n \"input_text\": \"方世昊\"\n};\n\nreturn msg;",
"outputs": 1,
"timeout": "",
"noerr": 0,
"initialize": "",
"finalize": "",
"libs": [],
"x": 340,
"y": 140,
"wires": [
[
"791174083fe1a4c6"
]
]
},
{
"id": "791174083fe1a4c6",
"type": "http request",
"z": "f6f2187d.f17ca8",
"name": "POST Request",
"method": "POST",
"ret": "obj",
"paytoqs": "ignore",
"url": "http://llama3_container:8000",
"tls": "",
"persist": false,
"proxy": "",
"insecureHTTPParser": false,
"authType": "",
"senderr": false,
"headers": [],
"x": 540,
"y": 140,
"wires": [
[
"debug_output"
]
]
},
{
"id": "debug_output",
"type": "debug",
"z": "f6f2187d.f17ca8",
"name": "Response",
"active": true,
"tosidebar": true,
"console": false,
"tostatus": false,
"complete": "payload",
"targetType": "msg",
"statusVal": "",
"statusType": "auto",
"x": 720,
"y": 140,
"wires": []
}
]
- You can now communicate with the model: enter the prompt you want to ask in the Node-RED editor, trigger the flow, and wait for the model's response to appear in the debug sidebar.