Welcome to NanoLLM!
NanoLLM is a lightweight, high-performance library using optimized inference APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends. It can be used to build responsive, low-latency interactive agents that can be deployed on Jetson.
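As a sketch of what that looks like in practice, here is a minimal example of loading a quantized model and streaming a response with the `NanoLLM` API. It assumes the `nano_llm` package is installed on a supported Jetson device; the model name, backend, and quantization setting are illustrative, so check the Models section for the options your setup supports.

```python
from nano_llm import NanoLLM

# Load a quantized model (model, api, and quantization values are examples --
# see the Models section for what is supported on your device)
model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # HuggingFace model to download/quantize
    api="mlc",                        # optimized inference backend
    quantization="q4f16_ft",          # 4-bit weight quantization
)

# Stream the generated tokens as they are produced
for token in model.generate("What is the capital of France?", streaming=True):
    print(token, end="", flush=True)
```

Streaming the tokens rather than waiting for the full completion is what keeps interactive agents feeling low-latency on edge hardware.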
Benchmarks
For more info, see the Benchmarks on Jetson AI Lab.
Model Support
LLM
* Llama
* Mistral
* Mixtral
* GPT-2
* GPT-NeoX

SLM
* StableLM
* Phi-2
* Gemma
* TinyLlama
* ShearedLlama
* OpenLLama

VLM
* Llava
* VILA
* NousHermes/Obsidian

Speech
* Riva ASR
* Riva TTS
* Piper TTS
* XTTS
See the Models section for more info and API documentation.
Containers
Currently supported on Jetson Orin and JetPack 5/6. Containers are built by jetson-containers with images available on DockerHub. These are the monthly releases (there are also point releases):
Version | JetPack 5 | JetPack 6
---|---|---
main | |
24.5 | |
24.4 | |
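Assuming the jetson-containers tools are installed on the device, one common way to pull and start the container is via its `run`/`autotag` helpers (a sketch of that workflow; see the Installation Guide for the authoritative steps):

```shell
# Download (if needed) and run the NanoLLM container interactively;
# autotag resolves a dustynv/nano_llm image matching the local JetPack/L4T version
jetson-containers run $(autotag nano_llm)
```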
See the Release Notes and Installation Guide for info about running the containers and samples.
Videos
For more background on generative AI applications on edge devices, visit the Jetson AI Lab.