====================
Welcome to NanoLLM!
====================

`NanoLLM `_ is a lightweight, high-performance library using optimized inferencing APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends. It can be used to build responsive, low-latency interactive agents that can be deployed on Jetson.

----------
Benchmarks
----------

For more info, see the `Benchmarks `_ on Jetson AI Lab.

-------------
Model Support
-------------

* LLM

  * Llama
  * Mistral
  * Mixtral
  * GPT-2
  * GPT-NeoX

* SLM

  * StableLM
  * Phi-2
  * Gemma
  * TinyLlama
  * ShearedLlama
  * OpenLLama

* VLM

  * Llava
  * VILA
  * NousHermes/Obsidian

* Speech

  * Riva ASR
  * Riva TTS
  * Piper TTS
  * XTTS

See the :ref:`Models` section for more info and API documentation.

----------
Containers
----------

Currently supported on Jetson Orin and JetPack 5/6. Containers are built by `jetson-containers `_ with images available on `DockerHub `_. These are the monthly releases (there are also point releases):

.. include:: containers.rst

See the :ref:`Release Notes` and :ref:`Installation Guide ` for info about running the containers and samples.

------
Videos
------

For more background on generative AI applications on edge devices, visit the `Jetson AI Lab `_.

.. toctree::
   :maxdepth: 3
   :caption: Documentation:

   install.md
   models.md
   chat.md
   multimodal.md
   plugins.md
   agents.md
   webserver.md
   utilities.md
   releases.md