Welcome to NanoLLM!

NanoLLM is a lightweight, high-performance library that uses optimized inference APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends. It can be used to build responsive, low-latency interactive agents deployable on Jetson.
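As a sketch of what this looks like in practice, the snippet below loads a quantized model and streams a completion through NanoLLM's `NanoLLM.from_pretrained()` loader and `generate()` call. The model name and quantization settings are illustrative choices, not the only options, and actually running it requires a Jetson device with NanoLLM installed:

```python
# Sketch: load a quantized LLM and stream a completion with NanoLLM.
# Requires a Jetson device with NanoLLM installed; the model and
# quantization arguments below are illustrative examples.
from nano_llm import NanoLLM

model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # HuggingFace model to download and quantize
    api='mlc',                    # inference backend
    quantization='q4f16_ft',      # 4-bit weights, fp16 activations
)

# generate() returns a streaming iterator of output tokens
response = model.generate("Once upon a time,", max_new_tokens=128)

for token in response:
    print(token, end='', flush=True)
```

Streaming the tokens as they are produced, rather than waiting for the full completion, is what keeps interactive agents feeling low-latency.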

Benchmarks

For more info, see the Benchmarks page on the Jetson AI Lab.

Model Support

  • LLM

    • Llama

    • Mistral

    • Mixtral

    • GPT-2

    • GPT-NeoX

  • SLM

    • StableLM

    • Phi-2

    • Gemma

    • TinyLlama

    • ShearedLlama

    • OpenLLama

  • VLM

    • Llava

    • VILA

    • NousHermes/Obsidian

  • Speech

    • Riva ASR

    • Riva TTS

    • XTTS

See the Models section for more info and API documentation.

Platform Support

NanoLLM is currently built for Jetson Orin devices running JetPack 6. Containers are provided by jetson-containers.
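A typical way to start the container with the jetson-containers tools is shown below; the chat module, model name, and quantization flags are illustrative and assume a Jetson Orin with JetPack 6:

```shell
# Pull and run a NanoLLM container on a Jetson device.
# The autotag helper selects an image matching the installed JetPack version.
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.chat --api mlc \
    --model meta-llama/Llama-2-7b-chat-hf \
    --quantization q4f16_ft
```

Using the container avoids building the inference backends and their CUDA dependencies from source on the device.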

Videos

For more background on generative AI applications on edge devices, visit the Jetson AI Lab.