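Ultralytics recently added official support for exporting models to RKNN, and the whole workflow is a lot simpler than driving rknn-toolkit2 by hand. There are still a few pitfalls, so I'm writing them down here. I used YOLO11 and haven't checked whether the same steps work for YOLOv8 and other versions; feedback in the comments is welcome.
PS: If you need screens, cases or heatsinks for Radxa or Orange Pi boards, have a look at my Xianyu shop (coder4).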
1 Model conversion on the PC
Environment
python -m venv ./ultralytics-env
source ./ultralytics-env/bin/activate
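Dependencies (run pip inside the activated venv without sudo: sudo typically replaces PATH with its secure_path, so it would install into the system Python instead of the venv)
pip install ultralytics
pip install rknn-toolkit2 onnxslim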
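You need to download a model first, e.g. yolo11s.pt (the download URL is in the CPU comparison in section 2).
Convert the model. My board is an RK3566; RK3588 is supported as well. The target chip is selected with name: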
yolo export model=./yolo11s.pt format=rknn name=rk3566
ls yolo11s_rknn_model
metadata.yaml yolo11s-rk3566.rknn
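Before copying the directory to the board, a quick sanity check can catch an incomplete export. This is a minimal sketch that assumes the layout shown by ls above (a metadata.yaml plus exactly one <model>-<platform>.rknn file); check_rknn_export is a hypothetical helper, not part of Ultralytics.

```shell
# Sanity-check an Ultralytics RKNN export directory and print its target platform.
# Assumption: the directory holds metadata.yaml plus exactly one
# <model>-<platform>.rknn file, as in the ls output above.
check_rknn_export() {
  local dir="$1" count=0 found="" f base
  if [ ! -f "$dir/metadata.yaml" ]; then
    echo "missing $dir/metadata.yaml" >&2
    return 1
  fi
  for f in "$dir"/*.rknn; do
    [ -f "$f" ] || continue        # an unmatched glob stays literal; skip it
    count=$((count + 1))
    found="$f"
  done
  if [ "$count" -ne 1 ]; then
    echo "expected one .rknn file in $dir, found $count" >&2
    return 1
  fi
  base=$(basename "$found" .rknn)  # e.g. yolo11s-rk3566
  echo "${base##*-}"               # text after the last "-", e.g. rk3566
}
```

For the export above, check_rknn_export yolo11s_rknn_model should print rk3566.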
The exported model currently has a fixed input size (dynamic shapes are not supported), but you can choose that size with imgsz:
yolo export model=./yolo11s.pt format=rknn name=rk3566 imgsz=320
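Because the input size is baked in at export time, the imgsz passed to yolo predict later has to match it. A hedged sketch for reading the size back out of metadata.yaml, assuming Ultralytics stores it under a top-level imgsz key as a YAML list (block or flow style); rknn_model_imgsz is a hypothetical helper name.

```shell
# Print the input size recorded in an export's metadata.yaml, e.g. "320x320".
# Assumption: the size is stored under a top-level imgsz key, either as a
# block list ("imgsz:" followed by "- 320" lines) or as "imgsz: [320, 320]".
rknn_model_imgsz() {
  awk '
    /^imgsz:/ {
      rest = $0
      sub(/^imgsz:[ \t]*/, "", rest)
      if (rest != "") {                 # flow style: [320, 320]
        gsub(/\[/, "", rest); gsub(/]/, "", rest); gsub(/ /, "", rest)
        gsub(/,/, "x", rest)
        print rest
        exit
      }
      in_list = 1                       # block style: values follow as "- N" lines
      next
    }
    in_list && /^[ \t]*-[ \t]*[0-9]+/ {
      v = $0
      gsub(/[^0-9]/, "", v)
      out = (out == "") ? v : (out "x" v)
      next
    }
    in_list { exit }                    # first non-list line ends the block
    END { if (out != "") print out }
  ' "$1/metadata.yaml"
}
```

On the board, rknn_model_imgsz ~/yolo11s_rknn_model tells you which imgsz= to pass to yolo predict.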
Copy the model directory to the board with the NPU:
scp -r yolo11s_rknn_model board_ip:~/
2 Inference on the RKNN board
First enable the NPU. The steps differ from board to board; mine is a Radxa ROCK 3C, where it goes like this:
# on rk board
# enable npu
sudo rsetup
# overlay -> manage -> Enable NPU, then reboot so the overlay takes effect
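After the reboot you can check whether the NPU actually showed up. A minimal sketch, assuming the NPU registers a devfreq node whose name ends in .npu (fde40000.npu on this RK3566, see section 4); other SoCs may use a different address, and find_npu_devfreq is a hypothetical helper.

```shell
# Locate the NPU's devfreq node and print its path, or fail with a hint.
# Assumption: the node name ends in ".npu" (fde40000.npu on the RK3566).
find_npu_devfreq() {
  local root="${1:-/sys/class/devfreq}" d
  for d in "$root"/*.npu; do
    if [ -d "$d" ]; then           # sysfs entries are symlinks; -d follows them
      echo "$d"
      return 0
    fi
  done
  echo "no *.npu devfreq node under $root; is the NPU overlay enabled?" >&2
  return 1
}
```

Called without arguments it searches /sys/class/devfreq.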
Install the packages:
sudo pip install ultralytics rknn-toolkit-lite2
Run a prediction:
yolo predict model='./yolo11s_rknn_model' source='https://ultralytics.com/images/bus.jpg'
Found https://ultralytics.com/images/bus.jpg locally at bus.jpg
image 1/1 /home/radxa/bus.jpg: 640x640 4 persons, 1 bus, 950.2ms
Speed: 30.4ms preprocess, 950.2ms inference, 15.5ms postprocess per image at shape (1, 3, 640, 640)
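To compare runs without reading the logs by eye, the inference time can be pulled out of the Speed: line and turned into frames per second. A small sketch, assuming the Speed: line format shown above; inference_ms and ms_to_fps are hypothetical helper names.

```shell
# Pull the inference time (ms) out of an Ultralytics "Speed:" summary line,
# e.g. "Speed: 30.4ms preprocess, 950.2ms inference, 15.5ms postprocess ..."
inference_ms() {
  printf '%s\n' "$1" | sed -n 's/.*[ ,]\([0-9][0-9.]*\)ms inference.*/\1/p'
}

# Convert a per-image time in ms to frames per second (two decimals).
ms_to_fps() {
  awk -v ms="$1" 'BEGIN { printf "%.2f\n", 1000 / ms }'
}
```

For example: line=$(yolo predict model='./yolo11s_rknn_model' source='https://ultralytics.com/images/bus.jpg' 2>&1 | grep 'Speed:'), then ms_to_fps "$(inference_ms "$line")". The 2>&1 is there only because I have not checked whether Ultralytics logs to stdout or stderr.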
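Almost 1 s (950 ms) per image is really slow.
Try again with a smaller input size (note that the model must also be exported with the same imgsz, as in the imgsz=320 export in section 1):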
yolo predict model='./yolo11s_rknn_model' source='https://ultralytics.com/images/bus.jpg' imgsz=320
image 1/1 /home/radxa/bus.jpg: 320x320 5 persons, 1 bus, 230.5ms
Speed: 11.9ms preprocess, 230.5ms inference, 9.6ms postprocess per image at shape (1, 3, 320, 320)
About 230 ms, a big improvement; 320 is usually good enough.
For comparison, the CPU:
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11s.pt
yolo predict model='./yolo11s.pt' source='https://ultralytics.com/images/bus.jpg' imgsz=320 device=cpu
image 1/1 /home/radxa/bus.jpg: 320x256 5 persons, 1 bus, 1845.9ms
Speed: 11.0ms preprocess, 1845.9ms inference, 8.1ms postprocess per image at shape (1, 3, 320, 256)
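At 320 the CPU needs about 1.85 s per image, so the NPU is roughly 8x faster (1845.9 / 230.5 ≈ 8.0). Not bad. Note that the CPU run letterboxed the image to 320x256 rather than 320x320, so it even processed fewer pixels.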
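The speedup figures are plain ratios of the logged inference times. A tiny sketch (the helper name speedup is hypothetical) that also checks the 640 to 320 gain on the NPU against the 4x drop in pixel count (640² / 320² = 4):

```shell
# Ratio of two inference times in ms, printed with one decimal.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 1845.9 230.5   # CPU vs NPU at imgsz=320: prints 8.0
speedup 950.2 230.5    # NPU at 640 vs 320: prints 4.1
```

The 4.1 is close to 4, which is what you would expect if NPU inference time scales roughly with the number of input pixels.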
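3 Errors
Sometimes NPU inference fails with an error saying the versions don't match; in that case update the runtime library librknnrt.so: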
cd /usr/lib64
sudo su
mv librknnrt.so librknnrt.so.bk
wget https://github.com/airockchip/rknn-toolkit2/raw/refs/heads/master/rknpu2/runtime/Linux/librknn_api/aarch64/librknnrt.so -O aarch64-linux-gnu/librknnrt.so.2.3.0
ln -s aarch64-linux-gnu/librknnrt.so.2.3.0 ./librknnrt.so
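Before and after swapping the library it helps to see which runtime version is actually installed. A sketch under two assumptions: the library embeds a "librknnrt version: X.Y.Z" string (the runtime prints one at start-up), and the symlink layout matches the commands above. Both function names are hypothetical; swap_rknnrt only automates the mv / ln -s steps and refuses to overwrite an existing .bk backup. Compare the result with pip show rknn-toolkit-lite2.

```shell
# Print the version string embedded in a librknnrt.so file.
# Assumption: the binary contains "librknnrt version: X.Y.Z";
# grep -a searches the binary as if it were text.
rknnrt_version() {
  grep -a -o 'librknnrt version: [0-9][0-9.]*' "$1" | head -n 1 |
    sed 's/^librknnrt version: //'
}

# Automate the mv / ln -s steps above: back up $1/librknnrt.so and symlink it
# to $2 (a path relative to $1, like aarch64-linux-gnu/librknnrt.so.2.3.0).
# Refuses to clobber an existing .bk backup. Run as root.
swap_rknnrt() {
  local libdir="$1" new="$2"
  if [ ! -f "$libdir/$new" ]; then
    echo "new library $libdir/$new not found" >&2
    return 1
  fi
  if [ -e "$libdir/librknnrt.so.bk" ] || [ -L "$libdir/librknnrt.so.bk" ]; then
    echo "$libdir/librknnrt.so.bk already exists, not overwriting it" >&2
    return 1
  fi
  if [ -e "$libdir/librknnrt.so" ] || [ -L "$libdir/librknnrt.so" ]; then
    mv "$libdir/librknnrt.so" "$libdir/librknnrt.so.bk" || return 1
  fi
  ln -s "$new" "$libdir/librknnrt.so"   # relative target, resolved inside $libdir
}
```

Usage after the wget: rknnrt_version /usr/lib64/librknnrt.so, then swap_rknnrt /usr/lib64 aarch64-linux-gnu/librknnrt.so.2.3.0, then rknnrt_version again.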
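4 Performance tuning
By default the RK NPU scales its frequency dynamically, so with single-image inference the clock never ramps up in time and performance stays low.
Check the frequencies: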
cat /sys/class/devfreq/fde40000.npu/available_frequencies
200000000 297000000 400000000 600000000 700000000 800000000 900000000
cat /sys/class/devfreq/fde40000.npu/governor
rknpu_ondemand
cat /sys/class/devfreq/fde40000.npu/cur_freq
0
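The raw values are in Hz, which is hard to scan. A minimal sketch that prints the same devfreq files in MHz, defaulting to the fde40000.npu node from this RK3566; npu_freq_report is a hypothetical name, and it also reads min_freq and max_freq, which are standard devfreq sysfs attributes.

```shell
# Summarise an NPU devfreq node in MHz. $1 defaults to the RK3566 node above.
# governor, cur_freq, min_freq, max_freq and available_frequencies are the
# standard devfreq sysfs files; frequencies are in Hz.
npu_freq_report() {
  local dir="${1:-/sys/class/devfreq/fde40000.npu}" hz
  echo "governor: $(cat "$dir/governor")"
  echo "current:  $(( $(cat "$dir/cur_freq") / 1000000 )) MHz"
  echo "min/max:  $(( $(cat "$dir/min_freq") / 1000000 ))/$(( $(cat "$dir/max_freq") / 1000000 )) MHz"
  printf 'available:'
  for hz in $(cat "$dir/available_frequencies"); do
    printf ' %d' $((hz / 1000000))
  done
  printf ' MHz\n'
}
```

With the values above, the available line reads 200 297 400 600 700 800 900 MHz.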
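Here we leave the governor as it is and simply raise the minimum frequency. Writing to sysfs needs root, and the setting is lost on reboot:
echo 600000000 | sudo tee /sys/class/devfreq/fde40000.npu/min_freq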
Result:
image 1/4 /home/radxa/bus_images/bus1.jpg: 320x320 5 persons, 1 bus, 116.0ms
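The command above writes min_freq without any checks. A slightly safer sketch verifies that the value is one of the advertised steps and not above max_freq before writing it; run it as root. set_npu_min_freq is a hypothetical helper, and like the plain write its effect is lost on reboot.

```shell
# Set the NPU's min_freq after validating the value. Run as root.
# $1: devfreq directory (e.g. /sys/class/devfreq/fde40000.npu), $2: frequency in Hz.
set_npu_min_freq() {
  local dir="$1" freq="$2" ok="" hz max
  for hz in $(cat "$dir/available_frequencies"); do
    if [ "$hz" = "$freq" ]; then ok=1; fi
  done
  if [ -z "$ok" ]; then
    echo "$freq is not listed in $dir/available_frequencies" >&2
    return 1
  fi
  max=$(cat "$dir/max_freq")
  if [ "$freq" -gt "$max" ]; then
    echo "$freq is above max_freq ($max)" >&2
    return 1
  fi
  echo "$freq" > "$dir/min_freq"
}
```

From a root shell: set_npu_min_freq /sys/class/devfreq/fde40000.npu 600000000.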
Setting it to 8.... gives:
image 1/4 /home/radxa/bus_images/bus1.jpg: 320x320 5 persons, 1 bus, 108.6ms
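That is roughly the limit of the RK3566 (108.6 ms per image is about 9.2 fps), which is in line with the RKNN Model Zoo benchmark of about 10 fps.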
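Single timings like these jump around a bit, so when predicting on a folder (bus_images in the logs above) it is worth averaging all per-image times. A sketch assuming each result line looks like the image i/N ... 116.0ms lines above; mean_image_ms is a hypothetical helper, and it averages every line as-is, including any warm-up cost on the first image.

```shell
# Average the per-image inference times in Ultralytics predict output (stdin).
# Assumption: result lines look like
#   image 1/4 /home/radxa/bus_images/bus1.jpg: 320x320 5 persons, 1 bus, 116.0ms
mean_image_ms() {
  awk '
    { sub(/\r$/, "") }                        # tolerate CRLF line endings
    /^image [0-9]+\/[0-9]+ / && match($0, /[0-9.]+ms$/) {
      sum += substr($0, RSTART, RLENGTH - 2)  # drop the trailing "ms"
      n++
    }
    END {
      if (n > 0)
        printf "%.1f ms over %d images (%.1f fps)\n", sum / n, n, 1000 * n / sum
    }
  '
}
```

For example, assuming the images sit in ~/bus_images: yolo predict model='./yolo11s_rknn_model' source=bus_images imgsz=320 2>&1 | mean_image_ms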