Utilities

This page documents the utility functions that NanoLLM provides, including audio and image manipulation, tensor format conversion, and argument parsing.

To use them, import them from nano_llm.utils (for example, from nano_llm.utils import convert_tensor).

Tensor Conversion

convert_dtype(dtype, to='np'): Convert a string, numpy type, or torch.dtype to either a numpy or PyTorch dtype.

convert_tensor(tensor, return_tensors='pt', device=None, dtype=None, **kwargs): Convert tensors between numpy, torch, and other formats.

Audio

convert_audio(samples, dtype=numpy.int16): Convert between audio datatypes like float<->int16 and apply sample re-scaling. If the samples are a raw bytes array, they are assumed to be in int16 format. Supports audio samples as a byte buffer, numpy ndarray, or torch.Tensor. Converted byte buffers are returned as an ndarray; otherwise the result has the same object type as the input.
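The re-scaling described above can be illustrated with a minimal standard-library sketch. This is not NanoLLM's implementation: the helper names int16_bytes_to_float and float_to_int16 are hypothetical, and the full-scale factor of 32768 and the clipping behavior are assumptions chosen for illustration.

```python
import array

INT16_SCALE = 32768.0  # assumed full-scale value for int16 <-> float re-scaling


def int16_bytes_to_float(raw):
    """Interpret raw bytes as native-endian int16 samples and scale them to [-1, 1)."""
    samples = array.array('h')  # 'h' = signed 16-bit integer
    samples.frombytes(raw)
    return [s / INT16_SCALE for s in samples]


def float_to_int16(samples):
    """Scale floats in [-1, 1] to int16, clipping values that would overflow."""
    out = array.array('h')
    for s in samples:
        value = int(round(s * INT16_SCALE))
        out.append(max(-32768, min(32767, value)))
    return out


# A raw byte buffer is treated as int16, mirroring the documented assumption.
raw = array.array('h', [0, 16384, -32768]).tobytes()
floats = int16_bytes_to_float(raw)

# Converting back: +1.0 has no exact int16 representation, so it is clipped.
ints = float_to_int16([0.0, 0.5, 1.0])
```

With the library itself, the equivalent call would presumably be convert_audio(samples, dtype=...), which also handles ndarray and torch.Tensor inputs.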

audio_rms(samples): Compute the average audio RMS (returns a float between 0 and 1).

audio_silent(samples, threshold=0.0)

Detect if the audio samples are silent or muted.

If threshold < 0, false is returned (silence detection is disabled). If threshold > 0, the audio’s average RMS is compared to the threshold. If threshold = 0, the samples are checked for any non-zero values (faster than computing the RMS).

Returns true if audio levels are below threshold, otherwise false.
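The three threshold cases can be sketched in plain Python as follows. This is an illustrative reimplementation rather than the library's code: the names rms and is_silent are hypothetical, samples are assumed to be floats normalized to [-1, 1], and treating a level exactly equal to the threshold as silent is an assumption.

```python
import math


def rms(samples):
    """Root-mean-square level of float samples normalized to [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(samples, threshold=0.0):
    """Mirror the documented cases for negative, zero, and positive thresholds."""
    if threshold < 0:
        return False                # silence detection disabled
    if threshold == 0:
        return not any(samples)     # cheap check: silent only if every sample is zero
    return rms(samples) <= threshold  # assumption: inclusive comparison


quiet = [0.01, -0.01, 0.01, -0.01]
loud = [0.5, -0.5, 0.5, -0.5]
```

A threshold of 0 only catches fully muted streams, so a small positive threshold is the likelier choice for real microphone input, which rarely produces exact zeros.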

Images

load_image(path)

Load an image from a local path or URL that will be downloaded.

Parameters: path (str) – either a local path or a URL to the image.
Returns: PIL.Image instance

is_image(image): Returns true if the object is a PIL.Image, np.ndarray, torch.Tensor, or jetson_utils.cudaImage.

cuda_image(image): Convert an image from PIL.Image, np.ndarray, torch.Tensor, or __gpu_array_interface__ to a jetson_utils.cudaImage on the GPU (without using memory copies when possible).

torch_image(image, dtype=None, device=None): Convert the image to a type that is compatible with PyTorch (torch.Tensor, ndarray, PIL.Image).

image_size(image): Returns the dimensions of the image as a (height, width, channels) tuple.

Argument Parsing

class ArgParser(extras=['model', 'chat', 'generation', 'log'], **kwargs)

Bases: ArgumentParser

Dynamically adds extra command-line args that are commonly used by various subsystems.

Defaults = ['model', 'chat', 'generation', 'log']: The default options for model loading, chat, generation config, and logging.

Audio = ['audio_input', 'audio_output']: Audio device I/O options

Video = ['video_input', 'video_output']: Video streaming I/O options

Riva = ['asr', 'tts']: ASR/TTS model options

__init__(extras=['model', 'chat', 'generation', 'log'], **kwargs): Populate an argparse.ArgumentParser with additional options as specified by the provided extras.

parse_args(**kwargs): Override for parse_args() that does some additional configuration.
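The general pattern of an ArgumentParser subclass that adds option groups by name and post-processes the parsed result can be sketched with the standard library alone. Everything below is hypothetical: the class name ExtrasParser, the option names --model and --log-level, and the upper-casing step are illustrative assumptions, not the options or configuration that ArgParser actually applies.

```python
import argparse


class ExtrasParser(argparse.ArgumentParser):
    """Hypothetical stand-in: adds groups of options selected by name."""

    def __init__(self, extras=('model', 'log'), **kwargs):
        super().__init__(**kwargs)
        # Each entry in `extras` switches on one group of related options.
        if 'model' in extras:
            self.add_argument('--model', type=str, default=None)  # illustrative option
        if 'log' in extras:
            self.add_argument('--log-level', type=str, default='info',
                              choices=['debug', 'info', 'warning', 'error'])

    def parse_args(self, **kwargs):
        # Parse normally, then apply extra configuration to the namespace.
        args = super().parse_args(**kwargs)
        if hasattr(args, 'log_level'):
            args.log_level = args.log_level.upper()  # illustrative post-processing
        return args


parser = ExtrasParser(extras=['model', 'log'])
args = parser.parse_args(args=['--model', 'my-model', '--log-level', 'debug'])

# Omitting a group leaves its options undefined on the parser.
model_only = ExtrasParser(extras=['model']).parse_args(args=[])
```

Grouping options this way lets each program request only the subsystems it uses, which keeps the generated --help output short.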

static parse_prompt_args(prompts, chat=True)

Parse the prompt command-line argument and return a list of prompts. It is assumed that the argparse argument was created like this:

parser.add_argument('--prompt', action='append', nargs='*')

If the prompt text is ‘default’, the default chat prompts are assigned when chat=True (otherwise the default completion prompts are used).
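With action='append' and nargs='*', argparse produces a list of lists (one inner list per --prompt occurrence), which is why a parsing step is needed. The sketch below shows that shape and one possible way to flatten it. The flatten_prompts helper and the two default prompt lists are hypothetical placeholders, not NanoLLM's actual defaults or logic.

```python
import argparse

# Hypothetical placeholder defaults, for illustration only.
DEFAULT_CHAT_PROMPTS = ['Hello, what can you do?']
DEFAULT_COMPLETION_PROMPTS = ['Once upon a time,']


def flatten_prompts(prompts, chat=True):
    """Flatten the nested list from argparse and expand the 'default' keyword."""
    if prompts is None:
        return None  # --prompt was never given on the command line
    flat = [text for group in prompts for text in group]
    if flat and flat[0].lower() == 'default':
        return list(DEFAULT_CHAT_PROMPTS if chat else DEFAULT_COMPLETION_PROMPTS)
    return flat


parser = argparse.ArgumentParser()
parser.add_argument('--prompt', action='append', nargs='*')

# Each --prompt occurrence appends its own list of values.
args = parser.parse_args(['--prompt', 'What is 2+2?', '--prompt', 'Tell me a joke'])
nested = args.prompt
prompts = flatten_prompts(nested)
```

Here nested is [['What is 2+2?'], ['Tell me a joke']], and the flattened result is a plain list of two prompt strings.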