Nvidia Jetson Orin Nano & Ultralytics.

In this post I'll describe how to install Ultralytics on the Jetson Orin Nano so that you can export a model and run it for inference.

My use case was the following:

  1. Train a YOLO model for image recognition on a PC with an Nvidia GPU.
  2. Copy the trained model to the Jetson Orin Nano.
  3. Export the model to TensorRT format.
  4. Run inference.

Since Ultralytics is a Python package, I set everything up in a virtual environment. Using a venv is beneficial because it keeps the libraries specific to the project you are working on from leaking into the global namespace. This keeps your global environment clean and prevents version-incompatibility issues between libraries (think DLL hell). Situations where different libraries pull in conflicting dependencies are common across the Python ecosystem, especially in the ML/data science space.

First I followed this article from the Ultralytics team, but I ran into several errors during the install process. It took me several days to get everything working, which is why I decided to share what I learned along the way.

Important note: installing JetPack on the Orin Nano also installs, in the global space, multiple libraries compiled specifically for the Orin Nano architecture. Understanding this was critical to getting everything to work.
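For example, with JetPack installed you can import TensorRT from the system Python without installing anything extra (tensorrt is just one example of a globally installed library; the exact set depends on your JetPack version):

python3 -c "import tensorrt; print(tensorrt.__version__)"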

I used one of the publicly available chess datasets to train my model.
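Training itself is standard Ultralytics usage and happens on the PC. A minimal sketch, assuming the dataset is described by a YAML file (chess.yaml, the epoch count, and the image size are placeholders for whatever your dataset needs):

from ultralytics import YOLO

# Start from a pretrained YOLO11 nano checkpoint and fine-tune it on the chess dataset
model = YOLO("yolo11n.pt")
model.train(data="chess.yaml", epochs=100, imgsz=640)

The best checkpoint ends up at runs/detect/train*/weights/best.pt, and that best.pt is the file that later gets copied to the Jetson.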

Setting up the venv, packages, and Ultralytics.

Here is a step-by-step guide:
1. Install the venv module:

sudo apt install python3.10-venv

  2. Create the virtual environment using --system-site-packages. This is the critical part.
💡
The --system-site-packages option, used when creating a Python virtual environment, allows the venv to inherit the packages installed in the global system site-packages directory.

python -m venv .venv --system-site-packages
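You can verify that the flag took effect by looking at the pyvenv.cfg file generated inside the venv:

cat .venv/pyvenv.cfg

Its output should contain the line include-system-site-packages = true.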

  3. Now, as usual, activate the created venv:

source .venv/bin/activate

I usually double-check that the venv was activated by running which python.
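It should print a path inside the project's .venv directory, something like this (the exact prefix depends on where your project lives):

which python
# /home/user/my-project/.venv/bin/python

And thanks to --system-site-packages, the JetPack-provided libraries should now be importable from inside the venv as well:

python -c "import tensorrt; print(tensorrt.__version__)"

If this import fails inside the venv, the environment was most likely created without the flag.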

  4. I put together the following requirements.txt to install all the required components. This was also tricky: the Nvidia forums contain links both to old servers that are no longer supported and to the new server where the compiled libraries are published. Note: Nvidia updates the compiled libraries from time to time, so if you get errors related to the library links, just check their server for updated links.
--extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v61
tensorflow==2.16.1+nv24.08

# torch 2.8.0
https://pypi.jetson-ai-lab.io/jp6/cu126/+f/62a/1beee9f2f1470/torch-2.8.0-cp310-cp310-linux_aarch64.whl#sha256=62a1beee9f2f147076a974d2942c90060c12771c94740830327cae705b2595fc
# torchaudio 2.8.0
https://pypi.jetson-ai-lab.io/jp6/cu126/+f/81a/775c8af36ac85/torchaudio-2.8.0-cp310-cp310-linux_aarch64.whl#sha256=81a775c8af36ac859fb3f4a1b2f662d5fcf284a835b6bb4ed8d0827a6aa9c0b7
# torchvision 0.23.0
https://pypi.jetson-ai-lab.io/jp6/cu126/+f/907/c4c1933789645/torchvision-0.23.0-cp310-cp310-linux_aarch64.whl#sha256=907c4c1933789645ebb20dd9181d40f8647978e6bd30086ae7b01febb937d2d1

onnx==1.16.2
onnxslim>=0.1.67
# onnx runtime
https://pypi.jetson-ai-lab.io/jp6/cu126/+f/4eb/e6a8902dc7708/onnxruntime_gpu-1.23.0-cp310-cp310-linux_aarch64.whl#sha256=4ebe6a8902dc7708434b2e1541b3fe629ebf434e16ab5537d1d6a622b42c622b
ultralytics[export]
  5. Install the packages:
    pip install --upgrade -r requirements.txt
  6. Give it some time, and everything should install without errors.
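Once pip finishes, a quick sanity check can confirm the install (just a sketch that imports the heavy pieces and checks that the GPU is visible):

import torch
import tensorflow as tf
import onnxruntime as ort
import ultralytics

print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
print("tensorflow:", tf.__version__)
print("onnxruntime providers:", ort.get_available_providers())
print("ultralytics:", ultralytics.__version__)

On the Orin Nano, torch.cuda.is_available() should print True, and the onnxruntime providers should include CUDAExecutionProvider.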

Converting the model

I copied the trained model from my PC to the Jetson Orin Nano and used the following Python script to export it to TensorRT:

from ultralytics import YOLO

# Load the trained PyTorch weights and export a TensorRT engine with FP16 precision
model = YOLO("./weights/best.pt")
model.export(format="engine", half=True)

Note: I didn't include the device="dla:0" parameter, as the Jetson Orin Nano only has a GPU and no DLA cores. If you have a more advanced device, it may be worth using its DLA capabilities by including this parameter.
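For reference, on a device that does have DLA cores (e.g. a Jetson Orin NX or AGX Orin) the export call would look roughly like this; I couldn't test it myself, since the Orin Nano has no DLA:

from ultralytics import YOLO

model = YOLO("./weights/best.pt")
# Target the first DLA core; DLA runs in reduced precision, so keep half=True (FP16)
model.export(format="engine", half=True, device="dla:0")

Here is the export output on my Orin Nano: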

Ultralytics 8.3.235 🚀 Python-3.10.12 torch-2.8.0 CUDA:0 (Orin, 7620MiB)
YOLO11n summary (fused): 100 layers, 2,583,322 parameters, 0 gradients, 6.3 GFLOPs

PyTorch: starting from 'weights/best.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 10, 8400) (5.2 MB)

ONNX: starting export with onnx 1.16.2 opset 20...
ONNX: slimming with onnxslim 0.1.78...
ONNX: export success ✅ 2.7s, saved as 'weights/best.onnx' (10.1 MB)

TensorRT: starting export with TensorRT 10.3.0...
[12/10/2025-20:01:50] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 702, GPU 3433 (MiB)
[12/10/2025-20:01:52] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +927, GPU +912, now: CPU 1672, GPU 4390 (MiB)
[12/10/2025-20:01:52] [TRT] [I] ----------------------------------------------------------------
[12/10/2025-20:01:52] [TRT] [I] Input filename:   weights/best.onnx
[12/10/2025-20:01:52] [TRT] [I] ONNX IR version:  0.0.9
[12/10/2025-20:01:52] [TRT] [I] Opset version:    20
[12/10/2025-20:01:52] [TRT] [I] Producer name:    pytorch
[12/10/2025-20:01:52] [TRT] [I] Producer version: 2.8.0
[12/10/2025-20:01:52] [TRT] [I] Domain:
[12/10/2025-20:01:52] [TRT] [I] Model version:    0
[12/10/2025-20:01:52] [TRT] [I] Doc string:
[12/10/2025-20:01:52] [TRT] [I] ----------------------------------------------------------------
TensorRT: input "images" with shape(1, 3, 640, 640) DataType.FLOAT
TensorRT: output "output0" with shape(1, 10, 8400) DataType.FLOAT
TensorRT: building FP16 engine as weights/best.engine
[12/10/2025-20:01:52] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[12/10/2025-20:08:48] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[12/10/2025-20:08:52] [TRT] [I] Total Host Persistent Memory: 533200
[12/10/2025-20:08:52] [TRT] [I] Total Device Persistent Memory: 1024
[12/10/2025-20:08:52] [TRT] [I] Total Scratch Memory: 1382400
[12/10/2025-20:08:52] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 165 steps to complete.
[12/10/2025-20:08:52] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 20.0584ms to assign 10 blocks to 165 nodes requiring 9523712 bytes.
[12/10/2025-20:08:52] [TRT] [I] Total Activation Memory: 9523200
[12/10/2025-20:08:52] [TRT] [I] Total Weights Memory: 5252484
[12/10/2025-20:08:53] [TRT] [I] Engine generation completed in 420.449 seconds.
[12/10/2025-20:08:53] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 137 MiB
[12/10/2025-20:08:53] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 2936 MiB
TensorRT: export success ✅ 426.1s, saved as 'weights/best.engine' (8.3 MB)

Export complete (427.3s)

Now the model is ready to be used for inference.

Inference

For inference I took the chess set I have and shot several photos with my phone. Then I used these photos to check the model's performance and accuracy.

Here is the Python script:

from ultralytics import YOLO
import cv2
from pathlib import Path

model_path = Path("weights/best.engine")

# The task must be passed explicitly: a TensorRT .engine file carries no task metadata
model = YOLO(model_path, task="detect")

image_folder = Path("chess_images")
image_paths = sorted(image_folder.glob("*.jpg"))

print("start infer")
for img_path in image_paths:
    print("-" * 88)
    print(f"Processing {img_path}")
    src = cv2.imread(str(img_path))
    result = model.predict(source=src, save=False, save_txt=False)
    for r in result:
        names = r.names  # class id -> class name mapping
        for box in r.boxes:
            cls_id = int(box.cls[0])
            conf = float(box.conf[0])
            print(f"Class: {names[cls_id]}, confidence: {conf}")
    print("-" * 88)
    print()
And here is the output:

start infer

Loading weights/best.engine for TensorRT inference...
[12/10/2025-20:18:36] [TRT] [I] Loaded engine size: 8 MiB
[12/10/2025-20:18:36] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +9, now: CPU 0, GPU 14 (MiB)

--------------------------------------------------------------------------------
Processing chess_images/white_knight.jpg

0: 640x640 1 pawn, 1 rook, 12.7ms
Speed: 7.8ms preprocess, 12.7ms inference, 9.0ms postprocess per image at shape (1, 3, 640, 640)
Class: rook, confidence: 0.44677734375
Class: pawn, confidence: 0.44677734375
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Processing chess_images/balck_rook.jpg

0: 640x640 1 rook, 11.9ms
Speed: 4.5ms preprocess, 11.9ms inference, 5.5ms postprocess per image at shape (1, 3, 640, 640)
Class: rook, confidence: 0.9892578125
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Processing chess_images/white_pawn.jpg

0: 640x640 1 pawn, 11.8ms
Speed: 4.3ms preprocess, 11.8ms inference, 5.1ms postprocess per image at shape (1, 3, 640, 640)
Class: pawn, confidence: 0.9697265625
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Processing chess_images/black_king.jpg

0: 640x640 1 king, 11.8ms
Speed: 4.4ms preprocess, 11.8ms inference, 5.2ms postprocess per image at shape (1, 3, 640, 640)
Class: king, confidence: 0.93701171875
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Processing chess_images/black_queen.jpg

0: 640x640 1 queen, 11.8ms
Speed: 4.4ms preprocess, 11.8ms inference, 5.2ms postprocess per image at shape (1, 3, 640, 640)
Class: queen, confidence: 0.83984375
--------------------------------------------------------------------------------
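If you also want to inspect the detections visually, each result object can draw the boxes onto the image. A small addition inside the per-image loop above, right after printing the detections (the annotated_ filename prefix is just my choice for this sketch):

# inside "for r in result:", after printing the detections
annotated = r.plot()  # returns a copy of the image with boxes and labels drawn (BGR)
cv2.imwrite(f"annotated_{img_path.name}", annotated)

Alternatively, passing save=True to model.predict() makes Ultralytics save the annotated images itself under runs/detect/predict*.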

I hope this article is helpful. Don't hesitate to reach out to me if you have any questions.

Thank you.