Welcome to NanoLLM!

NanoLLM is a lightweight, high-performance library that uses optimized inference APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends. It can be used to build responsive, low-latency interactive agents deployable on Jetson.
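As a sketch of what this looks like in practice, the snippet below loads a quantized model and streams a completion through NanoLLM's `NanoLLM.from_pretrained()` loader and `generate()` call. The model name and quantization settings are illustrative choices, not the only options, and actually running it requires a Jetson device with NanoLLM installed:

```python
# Sketch: load a quantized LLM and stream a completion with NanoLLM.
# Requires a Jetson device with NanoLLM installed; the model and
# quantization arguments below are illustrative examples.
from nano_llm import NanoLLM

model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # HuggingFace model to download and quantize
    api='mlc',                    # inference backend
    quantization='q4f16_ft',      # 4-bit weights, fp16 activations
)

# generate() returns a streaming iterator of output tokens
response = model.generate("Once upon a time,", max_new_tokens=128)

for token in response:
    print(token, end='', flush=True)
```

Streaming the tokens as they are produced, rather than waiting for the full completion, is what keeps interactive agents feeling low-latency.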

Benchmarks

For more info, see the Benchmarks page on the Jetson AI Lab.

Model Support

  • LLM

    • Llama

    • Mistral

    • Mixtral

    • GPT-2

    • GPT-NeoX

  • SLM

    • StableLM

    • Phi-2

    • Gemma

    • TinyLlama

    • ShearedLlama

    • OpenLLama

  • VLM

    • Llava

    • VILA

    • NousHermes/Obsidian

  • Speech

    • Riva ASR

    • Riva TTS

    • XTTS

See the Models section for more info and API documentation.

Platform Support

NanoLLM is currently built for Jetson Orin devices running JetPack 6. Containers are provided by jetson-containers.
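A typical way to start the container with the jetson-containers tools is shown below; the chat module, model name, and quantization flags are illustrative and assume a Jetson Orin with JetPack 6:

```shell
# Pull and run a NanoLLM container on a Jetson device.
# The autotag helper selects an image matching the installed JetPack version.
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.chat --api mlc \
    --model meta-llama/Llama-2-7b-chat-hf \
    --quantization q4f16_ft
```

Using the container avoids building the inference backends and their CUDA dependencies from source on the device.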

Videos

For more background on generative AI applications on edge devices, visit the Jetson AI Lab.