Welcome to NanoLLM!

NanoLLM is a lightweight, high-performance library that uses optimized inference APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends. It can be used to build responsive, low-latency interactive agents that can be deployed on Jetson.

Benchmarks

For more info, see the Benchmarks on Jetson AI Lab.

Model Support

  • LLM

    • Llama

    • Mistral

    • Mixtral

    • GPT-2

    • GPT-NeoX

  • SLM

    • StableLM

    • Phi-2

    • Gemma

    • TinyLlama

    • ShearedLlama

    • OpenLLama

  • VLM

    • Llava

    • VILA

    • NousHermes/Obsidian

  • Speech

    • Riva ASR

    • Riva TTS

    • Piper TTS

    • XTTS

See the Models section for more info and API documentation.
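As a taste of the high-level API, a minimal sketch of loading a quantized model and streaming a completion is below. It assumes a Jetson device with the NanoLLM container running; the model name and quantization settings are illustrative examples, not the only supported values:

```python
from nano_llm import NanoLLM

# Load (and quantize, on first run) a model from HuggingFace Hub or a local path.
# 'mlc' is one of the supported inference backends; 'q4f16_ft' is an example
# 4-bit MLC quantization preset.
model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    api="mlc",
    quantization="q4f16_ft",
)

# Generate a streaming response and print tokens as they arrive.
response = model.generate("Once upon a time,", max_new_tokens=128)

for token in response:
    print(token, end="", flush=True)
```

Streaming token-by-token like this is what keeps interactive agents feeling low-latency even on edge hardware.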

Containers

Currently supported on Jetson Orin and JetPack 5/6. Containers are built by jetson-containers with images available on DockerHub. These are the monthly releases (there are also point releases):

Version   JetPack 5                        JetPack 6
main      dustynv/nano_llm:r35.4.1         dustynv/nano_llm:r36.2.0
24.5      dustynv/nano_llm:24.5-r35.4.1    dustynv/nano_llm:24.5-r36.2.0
24.4      dustynv/nano_llm:24.4-r35.4.1    dustynv/nano_llm:24.4-r36.2.0

See the Release Notes and Installation Guide for info about running the containers and samples.
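For reference, the usual way to start one of these containers with jetson-containers looks like the following (a deployment sketch; it assumes jetson-containers is installed on the device and its tools are on your PATH):

```shell
# autotag selects the nano_llm image matching the installed JetPack/L4T version,
# and jetson-containers run wraps `docker run` with the flags Jetson needs
# (GPU access, shared memory, mounted model cache).
jetson-containers run $(autotag nano_llm)
```

A specific monthly release can be run instead by passing its full image tag from the table above in place of `$(autotag nano_llm)`.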

Videos



For more background on generative AI applications in edge devices, visit the Jetson AI Lab.