Utilities
This page documents the utility functions that NanoLLM provides, including audio/image manipulation, tensor format conversion, and argument parsing.
To use these, import them from nano_llm.utils (for example, from nano_llm.utils import convert_tensor).
Tensor Conversion
Audio
- convert_audio(samples, dtype=<class 'numpy.int16'>)[source]
Convert between audio datatypes like float<->int16 and apply sample re-scaling. If the samples are a raw bytes array, they are assumed to be in int16 format. Audio samples can be passed as a byte buffer, numpy ndarray, or torch.Tensor. Converted byte buffers are returned as ndarray; otherwise the output has the same object type as the input.
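The re-scaling between int16 and float samples can be illustrated with a minimal standard-library sketch. This is not the convert_audio implementation: the helper names are hypothetical, the scale factor of 32768 is an assumed convention, and real code would typically operate on ndarray or torch.Tensor instead of Python lists.

```python
from array import array

# Assumed full-scale value for int16 audio; the factor NanoLLM uses may differ.
INT16_SCALE = 32768.0


def int16_bytes_to_float(raw: bytes) -> list[float]:
    """Interpret raw bytes as native-endian int16 samples and scale to [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(raw)
    return [s / INT16_SCALE for s in samples]


def float_to_int16(samples: list[float]) -> array:
    """Scale floats to int16, clipping values that would overflow the int16 range."""
    out = array("h")
    for s in samples:
        value = int(round(s * INT16_SCALE))
        out.append(max(-32768, min(32767, value)))
    return out


# Round trip: raw int16 bytes -> floats -> int16 samples
raw = array("h", [0, 16384, -32768]).tobytes()
floats = int16_bytes_to_float(raw)
restored = float_to_int16(floats)
```

Clipping on the way back to int16 matters because a float sample of exactly 1.0 would otherwise scale to 32768, which is one past the largest int16 value.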
- audio_silent(samples, threshold=0.0)[source]
Detect if the audio samples are silent or muted.
If threshold < 0, false is returned (silence detection is disabled). If threshold > 0, the audio’s average RMS is compared to the threshold. If threshold = 0, the samples are checked for any non-zero value (faster than computing RMS).
Returns true if audio levels are below threshold, otherwise false.
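The three threshold cases can be sketched in plain Python as follows. The function name is_silent is hypothetical, the strict "less than" comparison and the handling of empty input are assumptions, and the actual audio_silent may compute its RMS differently.

```python
import math


def is_silent(samples, threshold=0.0):
    """Sketch of the documented threshold rules (not the NanoLLM implementation)."""
    if threshold < 0:
        # Negative threshold: silence detection is disabled.
        return False
    if threshold == 0:
        # Zero threshold: silent only if every sample is exactly zero.
        return not any(s != 0 for s in samples)
    if len(samples) == 0:
        # Assumption: an empty buffer counts as silent.
        return True
    # Positive threshold: compare the root-mean-square level to the threshold.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold


quiet = is_silent([0.01, -0.01], threshold=0.05)
loud = is_silent([0.5, -0.5], threshold=0.05)
```

The zero-threshold branch avoids the squaring and square root entirely, which is why the documentation describes it as faster than the RMS check.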
Images
- load_image(path)[source]
Load an image from a local path, or from a URL, in which case the image is downloaded first.
- Parameters:
path (str) – either a path or URL to the image.
- Returns:
PIL.Image instance
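How load_image distinguishes a URL from a local path is not described here. One plausible approach, shown only as an assumption, is to inspect the scheme of the string; the helper name looks_like_url is hypothetical.

```python
from urllib.parse import urlparse


def looks_like_url(path: str) -> bool:
    """Return True if the string has an http or https scheme (assumed rule)."""
    # Checking for specific schemes avoids treating a Windows drive
    # letter such as "C:" as a URL scheme.
    return urlparse(path).scheme in ("http", "https")


remote = looks_like_url("https://example.com/cat.jpg")
local = looks_like_url("/data/images/cat.jpg")
```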
- is_image(image)[source]
Returns true if the object is a PIL.Image, np.ndarray, torch.Tensor, or jetson_utils.cudaImage.
- cuda_image(image)[source]
Convert an image from PIL.Image, np.ndarray, torch.Tensor, or __gpu_array_interface__ to a jetson_utils.cudaImage on the GPU (avoiding memory copies when possible).
Argument Parsing
- class ArgParser(extras=['model', 'chat', 'generation', 'log'], **kwargs)[source]
Bases: ArgumentParser
Dynamically adds extra command-line args that are commonly used by various subsystems.
- Defaults = ['model', 'chat', 'generation', 'log']
The default options for model loading, chat, generation config, and logging.
- Audio = ['audio_input', 'audio_output']
Audio device I/O options
- Video = ['video_input', 'video_output']
Video streaming I/O options
- Riva = ['asr', 'tts']
ASR/TTS model options
- __init__(extras=['model', 'chat', 'generation', 'log'], **kwargs)[source]
Populate an argparse.ArgumentParser with additional options as specified by the provided extras.
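The pattern of adding options according to a list of extras can be sketched with a small argparse subclass. ExtrasParser is a hypothetical stand-in, and every flag it defines is a placeholder chosen for illustration, not an actual ArgParser option.

```python
import argparse


class ExtrasParser(argparse.ArgumentParser):
    """Hypothetical stand-in for ArgParser; all flags below are placeholders."""

    Defaults = ["model", "log"]
    Audio = ["audio_input"]

    def __init__(self, extras=Defaults, **kwargs):
        super().__init__(**kwargs)
        # Each named extra contributes its own group of options.
        if "model" in extras:
            self.add_argument("--model", type=str, default=None,
                              help="placeholder model option")
        if "log" in extras:
            self.add_argument("--log-level", type=str, default="info",
                              help="placeholder logging option")
        if "audio_input" in extras:
            self.add_argument("--audio-input-device", type=int, default=None,
                              help="placeholder audio device option")


# The extras lists are plain Python lists, so they can be concatenated.
parser = ExtrasParser(extras=ExtrasParser.Defaults + ExtrasParser.Audio)
args = parser.parse_args(["--model", "my-model", "--audio-input-device", "2"])
```

Because Defaults, Audio, Video, and Riva are documented as ordinary lists, combining them in the same way when constructing an ArgParser seems reasonable, though that usage is an assumption here.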
- static parse_prompt_args(prompts, chat=True)[source]
Parse the prompt command-line argument and return a list of prompts. It’s assumed that the argparse argument was created like this:
parser.add_argument('--prompt', action='append', nargs='*')
If the prompt text is ‘default’, then default chat prompts will be assigned if chat=True (otherwise default completion prompts).
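With action='append' and nargs='*', argparse produces a nested list: one inner list per occurrence of --prompt. The sketch below shows that shape using only the standard library; the flattening step is an assumption about what a caller might want and is not taken from parse_prompt_args itself.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--prompt", action="append", nargs="*")

args = parser.parse_args(
    ["--prompt", "Hello", "there", "--prompt", "What is 2+2?"]
)

# args.prompt holds one inner list per --prompt occurrence,
# or None if the flag was never given.
nested = args.prompt

# Flatten into a single list of prompt strings (illustrative only).
flat = [text for group in (nested or []) for text in group]
```

Note that an unquoted multi-word prompt is split into separate items by the shell, so each prompt should normally be quoted on the command line.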