Welcome to NanoLLM!
NanoLLM is a lightweight, high-performance library using optimized inference APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends. It can be used to build responsive, low-latency interactive agents that can be deployed on Jetson.
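As a sketch of what that looks like in practice, here is a minimal example of loading a quantized model and streaming a response with the `NanoLLM` API. It assumes the `nano_llm` package is installed on a supported Jetson device; the model name, backend, and quantization setting are illustrative, so check the Models section for the options your setup supports.

```python
from nano_llm import NanoLLM

# Load a quantized model (model, api, and quantization values are examples --
# see the Models section for what is supported on your device)
model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # HuggingFace model to download/quantize
    api="mlc",                        # optimized inference backend
    quantization="q4f16_ft",          # 4-bit weight quantization
)

# Stream the generated tokens as they are produced
for token in model.generate("What is the capital of France?", streaming=True):
    print(token, end="", flush=True)
```

Streaming the tokens rather than waiting for the full completion is what keeps interactive agents feeling low-latency on edge hardware.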
Benchmarks
For more info, see the Benchmarks on Jetson AI Lab.
Model Support
LLM
* Llama
* Mistral
* Mixtral
* GPT-2
* GPT-NeoX

SLM
* StableLM
* Phi-2
* Gemma
* TinyLlama
* ShearedLlama
* OpenLLama

VLM
* Llava
* VILA
* NousHermes/Obsidian

Speech
* Riva ASR
* Riva TTS
* Piper TTS
* XTTS
See the Models section for more info and API documentation.
Containers
Currently supported on Jetson Orin and JetPack 5/6. Containers are built by jetson-containers with images available on DockerHub. These are the monthly releases (there are also point releases):
Version | JetPack 5 | JetPack 6
---|---|---
main | |
24.5 | |
24.4 | |
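Assuming the jetson-containers tools are installed on the device, one common way to pull and start the container is via its `run`/`autotag` helpers (a sketch of that workflow; see the Installation Guide for the authoritative steps):

```shell
# Download (if needed) and run the NanoLLM container interactively;
# autotag resolves a dustynv/nano_llm image matching the local JetPack/L4T version
jetson-containers run $(autotag nano_llm)
```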
See the Release Notes and Installation Guide for info about running the containers and samples.
Videos
For more background on generative AI applications on edge devices, visit the Jetson AI Lab.