写在前面

  • 因离线环境为arm64环境,且不好复制相应信息,这边docker环境部署及之前的相应内容均在X86上演示。arm64的相应命令也会同步贴出,请注意辨别。

系统环境

openEuler-22.03-LTS-x86_64
Authorized users only. All activities may be monitored and reported.
Last login: Mon Jul 28 17:35:39 2025
Welcome to 5.10.0-60.18.0.50.oe2203.x86_64
System information as of time:  2025年 07月 28日 星期一 17:36:17 CST
System load:    0.08
Processes:      239
Memory used:    7.9%
Swap used:      0%
Usage On:       3%
IP address:     192.168.229.131
Users online:   2

安装基础软件包

1.更改国内源

sudo cp /etc/yum.repos.d/openEuler.repo /etc/yum.repos.d/openEuler.repo.bak
# 备份原有软件源配置文件

sudo vim /etc/yum.repos.d/openEuler.repo
# 编辑软件源配置文件

将文件中的原有仓库地址替换为国内镜像源,例如替换为华为云镜像源

[OS]
name=OS
baseurl=https://repo.huaweicloud.com/openeuler/openEuler-22.03-LTS/OS/$basearch/
enabled=1
gpgcheck=1
gpgkey=https://repo.huaweicloud.com/openeuler/openEuler-22.03-LTS/OS/$basearch/RPM-GPG-KEY-openEuler

[everything]
name=everything
baseurl=https://repo.huaweicloud.com/openeuler/openEuler-22.03-LTS/everything/$basearch/
enabled=1
gpgcheck=1
gpgkey=https://repo.huaweicloud.com/openeuler/openEuler-22.03-LTS/everything/$basearch/RPM-GPG-KEY-openEuler

[EPOL]
name=EPOL
baseurl=https://repo.huaweicloud.com/openeuler/openEuler-22.03-LTS/EPOL/$basearch/
enabled=1
gpgcheck=1
gpgkey=https://repo.huaweicloud.com/openeuler/openEuler-22.03-LTS/OS/$basearch/RPM-GPG-KEY-openEuler

清除并生成缓存

sudo dnf clean all
sudo dnf makecache

源地址:https://forum.openeuler.org/t/topic/768

2.安装必要工具链

yum install tar wget curl vim nano -y

安装docker

在其他能联网的机器上准备对应包
1753750351-949137-image.png

演示机使用:X86:docker-28.3.2.tgz

离线机使用:ARM64:docker-28.3.2.tgz

  • 上传到对应目录

1753750460-20052-image.png

cd /home/
tar -zxvf docker-28.3.2.tgz 
# 解压
 chmod -R +x docker
# 递归赋予可执行权限
cp docker/* /usr/bin
# 拷贝文件到系统目录

编写docker服务文件

vim /etc/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target
chmod +x /etc/systemd/system/docker.service
# 赋执行权限
systemctl daemon-reload
# 重新加载配置文件
systemctl enable docker.service
# 开机启动
systemctl start docker
# 启动docker
systemctl status docker
# docker状态
systemctl restart docker
# 重启docker服务

配置加速地址并更改存储路径

mkdir -p /etc/docker /home/docker
vim /etc/docker/daemon.json

填写加速地址以及存储

{
"data-root": "/home/docker",
"registry-mirrors": ["https://docker.m.daocloud.io", "https://docker.imgdb.de", "https://docker-0.unsee.tech", "https://docker.hlmirror.com", "https://cjie.eu.org"]
}
sudo systemctl daemon-reload
# 重新加载配置
sudo systemctl restart docker
# 重启 Docker 服务

安装docker-compose


github地址


演示机使用:
X86:docker-compose-linux-x86_64


离线机使用:
ARM64:docker-compose-linux-aarch64

  • 上传到对应目录

x86:

# 复制到系统目录
cp docker-compose-linux-x86_64 /usr/local/bin/docker-compose
# 赋予执行权限
chmod +x /usr/local/bin/docker-compose
# 验证安装
docker-compose --version

arm64:

# 复制到系统目录
cp docker-compose-linux-aarch64 /usr/local/bin/docker-compose
# 赋予执行权限
chmod +x /usr/local/bin/docker-compose
# 验证安装
docker-compose --version

输出内容:

[root@ola home]# docker-compose --version
Docker Compose version v2.39.1

镜像部署

镜像获取

mindie_2.0.T17.B010-800I-A2-py3.11-openeuler24.03-lts-aarch64.tar.gz

  • PS:以下均在arm架构实体机上操作

镜像导入

docker load -i mindie_2.0.T17.B010-800I-A2-py3.11-openeuler24.03-lts-aarch64.tar.gz

驱动获取

固件与驱动:商用版
固件与驱动:社区版

驱动安装

因为我这边离线机器都是安装好了的,这里略过,网上一大堆教程的。

驱动校验

在宿主机上执行

npu-smi info

看到以下信息即可:

npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.0.3                 Version: 24.1.0.3                                             |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B2               | OK            | 101.0       40                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          5037 / 65536         |
.......

准备transformers-4.51.0

transformers-4.51.0
然后上传到指定宿主机目录(自行解压),我这里以/apps/mindie/为例

模型准备

模型拉取参阅:乌班图安装pipx以及配置魔塔终端工具

配置准备

  • 注意事项:
    1. ipAddress如果是0.0.0.0需要allowAllZeroIpListening=true
    1. worldSize要和npuDeviceIds 匹配。他们用来控制多卡并行推理。npuDeviceIds通常指的是逻辑卡(不管怎么挂卡都是0,1,2...这样的id)。但是如果你的容器是--privileged的话全部卡都会挂进来,那它就得指实体卡id的。
    1. 如果没证书 httpsEnabled要关上
  • 4.maxPrefillTokens >= maxSeqLen > maxInputTokenLen
    示例:"maxSeqLen" : 20480,"maxInputTokenLen" : 16384,"maxPrefillTokens" : 32678,"maxIterTimes" : 32768,
    1. maxIterTimes会限制迭代次数。如果很小,可能模型会戛然而止。如果不想特别设置最好和 maxSeqLen一致
  • 以下是我能正常启动的config.json配置文件:
{
    "Version" : "1.0.0",

    "ServerConfig" :
    {
        "ipAddress" : "0.0.0.0",
        "managementIpAddress" : "127.0.0.2",
        "port" : 1025,
        "managementPort" : 1026,
        "metricsPort" : 1027,
        "allowAllZeroIpListening" : true,
        "maxLinkNum" : 1000,
        "httpsEnabled" : false,
        "fullTextEnabled" : false,
        "tlsCaPath" : "security/ca/",
        "tlsCaFile" : ["ca.pem"],
        "tlsCert" : "security/certs/server.pem",
        "tlsPk" : "security/keys/server.key.pem",
        "tlsPkPwd" : "security/pass/key_pwd.txt",
        "tlsCrlPath" : "security/certs/",
        "tlsCrlFiles" : ["server_crl.pem"],
        "managementTlsCaFile" : ["management_ca.pem"],
        "managementTlsCert" : "security/certs/management/server.pem",
        "managementTlsPk" : "security/keys/management/server.key.pem",
        "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
        "managementTlsCrlPath" : "security/management/certs/",
        "managementTlsCrlFiles" : ["server_crl.pem"],
        "kmcKsfMaster" : "tools/pmt/master/ksfa",
        "kmcKsfStandby" : "tools/pmt/standby/ksfb",
        "inferMode" : "standard",
        "interCommTLSEnabled" : true,
        "interCommPort" : 1121,
        "interCommTlsCaPath" : "security/grpc/ca/",
        "interCommTlsCaFiles" : ["ca.pem"],
        "interCommTlsCert" : "security/grpc/certs/server.pem",
        "interCommPk" : "security/grpc/keys/server.key.pem",
        "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
        "interCommTlsCrlPath" : "security/grpc/certs/",
        "interCommTlsCrlFiles" : ["server_crl.pem"],
        "openAiSupport" : "vllm",
        "tokenTimeout" : 600,
        "e2eTimeout" : 600,
        "distDPServerEnabled":false
    },

    "BackendConfig" : {
        "backendName" : "mindieservice_llm_engine",
        "modelInstanceNumber" : 1,
        "npuDeviceIds" : [[0,1]],
        "tokenizerProcessNumber" : 8,
        "multiNodesInferEnabled" : false,
        "multiNodesInferPort" : 1120,
        "interNodeTLSEnabled" : true,
        "interNodeTlsCaPath" : "security/grpc/ca/",
        "interNodeTlsCaFiles" : ["ca.pem"],
        "interNodeTlsCert" : "security/grpc/certs/server.pem",
        "interNodeTlsPk" : "security/grpc/keys/server.key.pem",
        "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
        "interNodeTlsCrlPath" : "security/grpc/certs/",
        "interNodeTlsCrlFiles" : ["server_crl.pem"],
        "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
        "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
        "ModelDeployConfig" :
        {
            "maxSeqLen" : 2560,
            "maxInputTokenLen" : 2048,
            "truncation" : false,
            "ModelConfig" : [
                {
                    "modelInstanceType" : "Standard",
                    "modelName" : "Qwen3-32B",
                    "modelWeightPath" : "/data/Qwen3-32B",
                    "worldSize" : 2,
                    "cpuMemSize" : 5,
                    "npuMemSize" : -1,
                    "backendType" : "atb",
                    "trustRemoteCode" : false
                }
            ]
        },

        "ScheduleConfig" :
        {
            "templateType" : "Standard",
            "templateName" : "Standard_LLM",
            "cacheBlockSize" : 128,

            "maxPrefillBatchSize" : 50,
            "maxPrefillTokens" : 8192,
            "prefillTimeMsPerReq" : 150,
            "prefillPolicyType" : 0,

            "decodeTimeMsPerReq" : 50,
            "decodePolicyType" : 0,

            "maxBatchSize" : 200,
            "maxIterTimes" : 512,
            "maxPreemptCount" : 0,
            "supportSelectBatch" : false,
            "maxQueueDelayMicroseconds" : 5000
        }
    }
}

容器创建

以两块算力卡为例

docker run -it -d --shm-size=128g \
    --privileged \
    --name mindie-Qwen3-32B \
    --device=/dev/davinci_manager \
    --device=/dev/hisi_hdc \
    --device=/dev/devmm_svm \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --net=host \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
    -v /usr/local/sbin:/usr/local/sbin:ro \
    -v /apps/sharedstorage/Qwen3-32B:/data/Qwen3-32B:ro \
    -v /apps/mindie/transformers-4.51.0/:/root/mindie2/transformers-4.51.0:ro \
    -v /apps/mindie/config/config.json:/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json:ro \
    mindie:2.0.T17.B010-800I-A2-py3.11-openeuler24.03-lts-aarch64 bash

进入容器

docker exec -it mindie-Qwen3-32B bash
cd /root/mindie2/transformers-4.51.0/
pip install --no-index --find-links=. transformers==4.51.0
# 升级transformers

启动模型

# 进入启动路径
cd /usr/local/Ascend/mindie/latest/mindie-service/bin/
# 启动推理服务
./mindieservice_daemon

见到以下内容为启动成功:

环境调用测试

curl "http://127.0.0.1:1025/v1/chat/completions" -H "Content-Type: application/json" \
    -H "Authorization: Bearer xxx" \
    -d '{
        "model": "Qwen3-32B",
        "stream": true,
        "messages": [
            {
                "role": "user",
                "content": "/no_think 你好."
            }
        ]
    }'

Oneapi部署封装接口

获取镜像

oneapi

镜像拉取及导入

docker pull --platform linux/arm64 justsong/one-api:v0.6.10
docker save justsong/one-api:v0.6.10 -o one-api:v0.6.10.tar
# 以上在X86中运行拉取
docker load -i one-apiv0.6.10.tar
# 在arm64中导入

创建docker-compose.yaml

services:
  one-api:
    image: ghcr.io/songquanpeng/one-api:v0.6.10
    container_name: one-api
    restart: always
    ports:
      - "3000:3000"
    environment:
      - TZ=Asia/Shanghai
      - TIKTOKEN_CACHE_DIR=/data/cache
    volumes:
      - ./data:/data
      - ./cache:/data/cache

容器启动

docker-compose up -d

Onerapi面板

web访问:http://IP:3000/

1753760122-404333-image.png

默认的登录信息:

  • 账号:root
  • 密码:123456

1753768559-409895-image.png

Onerapi配置

按照以下图例配置
1753768693-901848-image.png
1753768873-355262-image.png
1753768947-355042-image.png

  • PS:这里实际上的端口号应该是之前截图的1025,截图的时候没改过来

1753768999-189381-image.png
1753769125-985938-image.png
1753769242-613892-image.png

Onerapi接口测试

curl "http://127.0.0.1:3000/v1/chat/completions" -H "Content-Type: application/json" \
    -H "Authorization: Bearer sk-8di5qDTX0DHJETY3Ab7fBdD4424782A26ERH41FBH1Dd" \
    -d '{
        "model": "Qwen3-32B",
        "stream": true,
        "messages": [
            {
                "role": "user",
                "content": "/no_think 你好."
            }
        ]
    }'

正常情况下就能通了

搭建远程下载环境

tar -xzf python3.9-arm64.tar.gz
# 这里使用的是自己预编译的python3.9
cd python-aarch64/bin
./python3.9 -m http.server 8888

1754291187252.png
1754291236060.png

其他问题解决

1.oneapi添加多个不同来源相同同模型的时候区分办法

2.dify添加oneapi封装后的接口报错网络错误解决

现象描述:oneapi的端口映射到了宿主机8000,dify启动后通过web添加oneapi添加了key封装后模型的时候报错 internet error,查询且实践后得到解决方案,特此记录。

# 步骤 1:创建一个自定义网络(例如叫 dify-oneapi)

docker network create dify-oneapi

# 步骤 2:把 one-api 加入到该网络(无需重建容器)

docker network connect dify-oneapi one-api

# 步骤 3:把 需要加入的容器也接入同一个网络
#先查看 需要加入的的容器名(命令:docker ps),然后执行:

docker network connect dify-oneapi jw-nginx-1
docker network connect dify-oneapi jw-worker_beat-1
docker network connect dify-oneapi jw-plugin_daemon-1
docker network connect dify-oneapi jw-worker-1
docker network connect dify-oneapi jw-api-1
docker network connect dify-oneapi jw-web-1
docker network connect dify-oneapi jw-db-1
docker network connect dify-oneapi jw-sandbox-1
docker network connect dify-oneapi jw-redis-1
docker network connect dify-oneapi jw-ssrf_proxy-1
docker network connect dify-oneapi jw-weaviate-1
docker network connect dify-oneapi one-api
docker network connect dify-oneapi docker-nginx-1
docker network connect dify-oneapi docker-api-1
docker network connect dify-oneapi docker-worker_beat-1
docker network connect dify-oneapi docker-worker-1
docker network connect dify-oneapi docker-plugin_daemon-1
docker network connect dify-oneapi docker-sandbox-1
docker network connect dify-oneapi docker-web-1
docker network connect dify-oneapi docker-db-1
docker network connect dify-oneapi docker-redis-1
docker network connect dify-oneapi docker-ssrf_proxy-1
docker network connect dify-oneapi docker-weaviate-1

# 步骤 4:在 Dify 中添加插件地址改为:
http://one-api:3000/v1
# 注意此时用的是 容器名 one-api,Docker 会自动通过 DNS 解析它。3000为oneapi容器内部端口

参上处理后,问题解决。

3.创建独立的dify容器

docker compose -p xx up -d

4.dify重置管理员邮箱

docker exec -it docker-api-1 flask reset-email

然后输入旧的邮箱和新的邮箱即可

参考教程

1.华为官方构建Qwen3-32B说明
2.昇腾910B部署千问3(Qwen3)大模型

最后修改:2025 年 08 月 15 日
如果觉得我的文章对你有用,请随意赞赏