# Installation

NanoLLM has a complex set of dependencies, so the currently recommended installation method is to run the Docker container image built by jetson-containers. First, clone and install that repo:

```bash
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh
```

Then you can start the `nano_llm` container like this:

```bash
jetson-containers run $(autotag nano_llm)
```

This will automatically pull/run the latest container image compatible with your version of JetPack-L4T (e.g. `dustynv/nano_llm:r36.2.0`). For other versions, see the Release Notes and Containers list.

## Container Images

The latest builds are from the `main` branch, with monthly releases and bi-weekly point releases:

| Version | JetPack 5                       | JetPack 6                       |
|---------|---------------------------------|---------------------------------|
| `main`  | `dustynv/nano_llm:r35.4.1`      | `dustynv/nano_llm:r36.2.0`      |
| `24.5`  | `dustynv/nano_llm:24.5-r35.4.1` | `dustynv/nano_llm:24.5-r36.2.0` |
| `24.4`  | `dustynv/nano_llm:24.4-r35.4.1` | `dustynv/nano_llm:24.4-r36.2.0` |

The full list can be found in the Release Notes. To run a specific version of NanoLLM:

```bash
jetson-containers run dustynv/nano_llm:24.5-r36.2.0  # stable release for JP6 instead of main
```

## Running Models

Once in the container, you should be able to import `nano_llm` in a Python3 interpreter and run the various example commands from the docs, like:

```bash
python3 -m nano_llm.chat --model meta-llama/Llama-2-7b-chat-hf --api=mlc --quantization q4f16_ft
```
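
A quick way to sanity-check the install from the Python3 interpreter is to load a model and generate a short completion. This is a minimal sketch following the `NanoLLM.from_pretrained()` pattern from the NanoLLM docs; the prompt and `max_new_tokens` value here are just placeholder examples, and exact parameters may vary between releases:

```python
from nano_llm import NanoLLM

# load the model (it gets downloaded and quantized on the first run)
model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # HuggingFace repo, or path to a local checkpoint
    api='mlc',                        # inference backend
    quantization='q4f16_ft',          # quantization method
)

# generate() streams the output tokens back as they are produced
response = model.generate("Once upon a time,", max_new_tokens=128)

for token in response:
    print(token, end='', flush=True)
```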

Or you can run the container & chat command in one go like this:

```bash
jetson-containers run \
  --env HUGGINGFACE_TOKEN=hf_abc123def \
  $(autotag nano_llm) \
  python3 -m nano_llm.chat --api=mlc \
    --model meta-llama/Llama-2-7b-chat-hf \
    --quantization q4f16_ft
```

Setting your `$HUGGINGFACE_TOKEN` is needed for models that require authentication to download (like Llama-2).

## Building In Other Containers

You can either build NanoLLM on top of your own container by using yours as the base image, or use NanoLLM as the base image in your Dockerfile. When doing the former, use the `--base` argument to `jetson-containers/build.sh` to build it off your container:

```bash
jetson-containers/build.sh --base my_container:latest --name my_container:llm nano_llm
```

Doing so will also install all the needed dependencies on top of your container (including CUDA, PyTorch, and the LLM inference APIs). Your base container should be built on the same version of Ubuntu as JetPack.

And in the event that you want to build your own container on top of NanoLLM, thereby skipping its build process, you can simply use a `FROM` statement (like `FROM dustynv/nano_llm:r36.2.0`) at the top of your Dockerfile. Or you can make your own jetson-containers package for it.
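
For example, a minimal Dockerfile sketch along those lines; the `my_app` paths, `requirements.txt`, and entrypoint below are purely hypothetical placeholders for your own application:

```dockerfile
# Build on top of NanoLLM, skipping its build process.
# Use the tag that matches your JetPack-L4T version (see the table above).
FROM dustynv/nano_llm:r36.2.0

# Hypothetical application files: substitute your own code and dependencies
COPY my_app/ /opt/my_app/
RUN pip3 install -r /opt/my_app/requirements.txt

CMD ["python3", "/opt/my_app/main.py"]
```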