Welcome to NanoLLM!
NanoLLM is a lightweight, high-performance library that uses optimized inference APIs for quantized LLMs, multimodal models, speech services, vector databases with RAG, and web frontends. It can be used to build responsive, low-latency interactive agents that can be deployed on Jetson.
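As a sketch of what this looks like in practice, the snippet below loads a quantized model and streams a reply. The model name, backend, and quantization values are illustrative choices, and the code assumes NanoLLM is installed on a Jetson device; see the Models section for the full API.

```python
# Hedged usage sketch -- assumes a Jetson device with NanoLLM installed.
# The model name, api backend, and quantization setting are example values.
from nano_llm import NanoLLM

model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # any supported model from the list below
    api="mlc",                        # optimized inference backend
    quantization="q4f16_ft",          # 4-bit weights, fp16 activations
)

# generate() can return a streaming iterator of output tokens,
# which keeps perceived latency low for interactive agents
response = model.generate("Once upon a time,", streaming=True, max_new_tokens=128)

for token in response:
    print(token, end="", flush=True)
```

Streaming token-by-token rather than waiting for the full completion is what makes the low-latency interactive use cases above practical.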
Benchmarks
For more info, see the Benchmarks on Jetson AI Lab.
Model Support
LLM
Llama
Mistral
Mixtral
GPT-2
GPT-NeoX
SLM
StableLM
Phi-2
Gemma
TinyLlama
ShearedLlama
OpenLLama
VLM
Llava
VILA
NousHermes/Obsidian
Speech
Riva ASR
Riva TTS
XTTS
See the Models section for more info and API documentation.
Platform Support
Currently built for Jetson Orin and JetPack 6. Containers are provided by jetson-containers.
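A minimal sketch of getting started with the prebuilt containers, assuming a JetPack 6 device with Docker configured (the exact image tag is selected automatically and may vary by JetPack version):

```shell
# Hedged example -- assumes a Jetson Orin running JetPack 6 with Docker set up.
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh

# run the NanoLLM container; autotag selects an image matching your JetPack version
jetson-containers run $(autotag nano_llm)
```

Running inside the container avoids building the quantized inference backends from source on-device.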
Videos
For more background on generative AI applications on edge devices, visit the Jetson AI Lab.