# API reference

> Complete API documentation for the GenieX Python SDK.

The GenieX Python SDK follows the same design patterns as Hugging Face `transformers` — use `AutoModel*.from_pretrained()` to load a model, then call `.generate()` for inference.

***

## AutoModelForCausalLM

Factory for loading causal language models — both text-only and multimodal. Returns a [`GenieXLLM`](#geniexllm) for text-only models, or a [`GenieXVLM`](#geniexvlm) when a multimodal model is detected (e.g. `phi4_multimodal`, `qwen3.5-vl`, `gemma4`).

```python theme={"dark"}
from geniex import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ai-hub-models/Qwen3-4B-Instruct",
    device_map="auto",
)
```

### `from_pretrained()`

| Parameter            | Type          | Default    | Description                                                                                                                                               |
| -------------------- | ------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model_name_or_path` | `str`         | *required* | Hugging Face repo ID, short alias (e.g. `"qwen3"`), or local path.                                                                                        |
| `device_map`         | `str`         | `"auto"`   | `"auto"` picks the first available runtime + compute unit. Also accepts `"<runtime>"` (runtime) or `"<runtime>:<compute_unit>"` (runtime + compute unit). |
| `precision`          | `str \| None` | `None`     | Precision (quantization) variant (e.g. `"Q4_K_M"`). Filters files when downloading from Hub.                                                              |
| `mmproj_path`        | `str \| None` | `None`     | Path to the multimodal projector file. Auto-resolved from Hub; pass explicitly to force VLM mode.                                                         |

<Expandable title="Additional parameters">
  | Parameter    | Type          | Default | Description                                                                         |
  | ------------ | ------------- | ------- | ----------------------------------------------------------------------------------- |
  | `model_name` | `str \| None` | `None`  | Override the registry model name (e.g. `"granite4"` for Qualcomm AI Engine Direct). |
  | `hf_token`   | `str \| None` | `None`  | Hugging Face bearer token for gated models.                                         |
  | `max_tokens` | `int`         | —       | Maximum tokens for generation.                                                      |
</Expandable>

**Returns:** [`GenieXLLM`](#geniexllm) or [`GenieXVLM`](#geniexvlm) (auto-detected based on the model)
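
The three `device_map` forms listed above can be illustrated with a small parser. This is only a sketch of the string format: `split_device_map` is a hypothetical helper, not part of the SDK, and the SDK's own parsing may differ. The compute unit ID `"npu"` mentioned in the usage note is an assumed placeholder.

```python
from typing import Optional, Tuple


def split_device_map(device_map: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a device_map string into (runtime, compute_unit).

    "auto"                     -> (None, None), i.e. let the SDK choose
    "<runtime>"                -> (runtime, None)
    "<runtime>:<compute_unit>" -> (runtime, compute_unit)
    """
    if device_map == "auto":
        return None, None
    runtime, sep, compute_unit = device_map.partition(":")
    # Reject an empty runtime ("" or ":x") and a dangling separator ("x:").
    if not runtime or (sep and not compute_unit):
        raise ValueError(f"malformed device_map: {device_map!r}")
    return runtime, (compute_unit or None)
```

For example, `split_device_map("qairt")` yields `("qairt", None)`, and `split_device_map("qairt:npu")` yields `("qairt", "npu")`. Valid runtime and compute unit IDs should come from `geniex.get_runtime_list()` and `geniex.get_compute_unit_list(runtime)`.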

***

## AutoModelForVision2Seq

Factory for loading vision-language / multimodal models. Returns a [`GenieXVLM`](#geniexvlm) instance.

```python theme={"dark"}
from geniex import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained("ai-hub-models/Qwen2.5-7B-Instruct", device_map="qairt")
```

### `from_pretrained()`

Accepts the same parameters as [`AutoModelForCausalLM.from_pretrained()`](#from_pretrained). The one most relevant to vision-language models is repeated here:

| Parameter     | Type          | Default | Description                                                                     |
| ------------- | ------------- | ------- | ------------------------------------------------------------------------------- |
| `mmproj_path` | `str \| None` | `None`  | Path to the multimodal projector file. Auto-resolved when downloading from Hub. |

**Returns:** [`GenieXVLM`](#geniexvlm)

***

## GenieXLLM

Text-only language model instance returned by `AutoModelForCausalLM.from_pretrained()`.

### `generate()`

Run text generation from a formatted prompt string.

```python theme={"dark"}
output = model.generate(prompt, max_new_tokens=256)
print(output.text)
```

| Parameter        | Type    | Default    | Description                                                                            |
| ---------------- | ------- | ---------- | -------------------------------------------------------------------------------------- |
| `prompt`         | `str`   | *required* | The formatted prompt string (use `model.tokenizer.apply_chat_template()` to build it). |
| `max_new_tokens` | `int`   | `512`      | Maximum number of tokens to generate.                                                  |
| `temperature`    | `float` | `0.7`      | Sampling temperature.                                                                  |
| `top_p`          | `float` | `0.9`      | Nucleus sampling threshold.                                                            |
| `top_k`          | `int`   | `40`       | Top-k sampling.                                                                        |
| `min_p`          | `float` | `0.0`      | Minimum probability threshold.                                                         |
| `stream`         | `bool`  | `False`    | If `True`, returns a [`TextIteratorStreamer`](#textiteratorstreamer) instead.          |

<Expandable title="Penalty and advanced parameters">
  | Parameter            | Type                | Default | Description                                     |
  | -------------------- | ------------------- | ------- | ----------------------------------------------- |
  | `repetition_penalty` | `float`             | `1.1`   | Repetition penalty.                             |
  | `presence_penalty`   | `float`             | `0.0`   | Presence penalty.                               |
  | `frequency_penalty`  | `float`             | `0.0`   | Frequency penalty.                              |
  | `seed`               | `int`               | `-1`    | Random seed (`-1` = random).                    |
  | `stop`               | `list[str] \| None` | `None`  | Stop sequences.                                 |
  | `grammar`            | `str \| None`       | `None`  | GBNF grammar string for constrained generation. |
  | `json_mode`          | `bool`              | `False` | Force JSON output.                              |
</Expandable>

**Returns:** [`GenerateOutput`](#generateoutput) (or [`TextIteratorStreamer`](#textiteratorstreamer) when `stream=True`)
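
As a minimal sketch of how the chat template and `generate()` fit together, the hypothetical helper below (`chat_once` is not part of the SDK) formats one user message, applies a few illustrative sampling defaults, and returns the generated text. It uses only the calls documented on this page. The default values chosen inside the helper are assumptions for the example, not recommendations.

```python
from typing import Any


def chat_once(model: Any, user_text: str, **generate_kwargs: Any) -> str:
    """Format a single user message with the chat template and generate a reply."""
    if generate_kwargs.get("stream"):
        # With stream=True, generate() returns a streamer that has no .text yet.
        raise ValueError("chat_once() only supports non-streaming generation")
    messages = [{"role": "user", "content": user_text}]
    prompt = model.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Illustrative defaults; explicit keyword arguments override them.
    params = {"max_new_tokens": 256, "temperature": 0.7, "seed": 42}
    params.update(generate_kwargs)
    output = model.generate(prompt, **params)
    return output.text
```

With a loaded model this could be called as `chat_once(model, "What is 2+2?", max_new_tokens=64)`.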

### `reset()`

Resets conversation state and clears the KV cache.

### `save_kv_cache(path)` / `load_kv_cache(path)`

Save or load the key-value cache to/from a file path (`str`).
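
One possible use is to persist the cache between runs. The sketch below assumes that `load_kv_cache()` restores exactly the state written by `save_kv_cache()`; this page does not specify when a saved cache is valid to reuse (for example, across different prompts or models), so treat that as something to verify. `restore_or_save_cache` is a hypothetical helper, not an SDK function.

```python
import os
from typing import Any


def restore_or_save_cache(model: Any, cache_path: str, prefix_prompt: str) -> bool:
    """Load a saved KV cache if one exists; otherwise build and save one.

    Returns True when an existing cache file was loaded.
    """
    if os.path.exists(cache_path):
        model.load_kv_cache(cache_path)
        return True
    # No cache yet: run the prefix through the model once, then persist the state.
    model.generate(prefix_prompt, max_new_tokens=1)
    model.save_kv_cache(cache_path)
    return False
```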

### `close()`

Releases the model handle and frees resources. Also supports context-manager usage:

```python theme={"dark"}
with AutoModelForCausalLM.from_pretrained("qwen3") as model:
    output = model.generate(prompt)
```

***

## GenieXVLM

Vision-language model instance returned by `AutoModelForVision2Seq.from_pretrained()`, or by `AutoModelForCausalLM.from_pretrained()` when a multimodal model is detected.

### `generate()`

Same parameters as [`GenieXLLM.generate()`](#generate) plus:

| Parameter | Type                | Default | Description                                        |
| --------- | ------------------- | ------- | -------------------------------------------------- |
| `images`  | `list[str] \| None` | `None`  | List of image file paths for the model to process. |
| `audios`  | `list[str] \| None` | `None`  | List of audio file paths for the model to process. |

```python theme={"dark"}
output = model.generate(prompt, images=["/path/to/image.jpg"], max_new_tokens=256)
print(output.text)
```

**Returns:** [`GenerateOutput`](#generateoutput) (or [`TextIteratorStreamer`](#textiteratorstreamer) when `stream=True`)
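
Because `images` takes file paths, it can be useful to check them before starting inference. The following is a sketch under that assumption; `generate_with_images` is a hypothetical wrapper, and how the SDK itself reports a missing file is not documented here.

```python
from pathlib import Path
from typing import Any, Sequence


def generate_with_images(
    model: Any, prompt: str, image_paths: Sequence[str], **kwargs: Any
) -> Any:
    """Check that every image file exists, then call generate() with images=..."""
    missing = [p for p in image_paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"image file(s) not found: {missing}")
    # Extra keyword arguments (max_new_tokens, stream, ...) pass straight through.
    return model.generate(prompt, images=list(image_paths), **kwargs)
```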

### `reset()` / `close()`

Same as [`GenieXLLM`](#geniexllm).

***

## ModelTokenizer

Accessed via `model.tokenizer`. Provides a `transformers`-compatible chat template interface.

### `apply_chat_template()`

Formats a list of chat messages using the model's built-in chat template.

```python theme={"dark"}
messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = model.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)
```

| Parameter               | Type                        | Default    | Description                                                                                                                                                                                                                                                                                                                                                            |
| ----------------------- | --------------------------- | ---------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `messages`              | `list[dict]`                | *required* | List of message dicts with `"role"` and `"content"` keys.                                                                                                                                                                                                                                                                                                              |
| `tokenize`              | `bool`                      | `False`    | Must be `False` — standalone tokenization is not supported.                                                                                                                                                                                                                                                                                                            |
| `add_generation_prompt` | `bool`                      | `True`     | Whether to append the generation prompt suffix.                                                                                                                                                                                                                                                                                                                        |
| `enable_thinking`       | `bool \| None`              | `None`     | Controls thinking mode. `None` (default) and `True` both let a thinking-capable model think and are no-ops on non-thinking models. `False` asks a thinking-capable model to skip its thinking turn; on non-thinking models it is forced to `True` with a warning, because the suppression block is out-of-distribution (OOD) for them. Capability is auto-detected and exposed as `model.supports_thinking`. |
| `tools`                 | `list[dict] \| str \| None` | `None`     | Tool definitions as a list of dicts or pre-serialised JSON string.                                                                                                                                                                                                                                                                                                     |

**Returns:** `str` — formatted prompt ready for `model.generate()`.
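
The `enable_thinking` rules above suggest passing `False` only when the model actually supports thinking, which avoids the warning on non-thinking models. A minimal sketch of that check follows; `build_prompt` is a hypothetical helper, and it relies only on `model.supports_thinking` and the documented `apply_chat_template()` parameters.

```python
from typing import Any, Dict, List, Optional


def build_prompt(model: Any, messages: List[Dict[str, str]], think: bool = True) -> str:
    """Apply the chat template, suppressing thinking only where the model supports it."""
    enable_thinking: Optional[bool] = None  # None keeps the default behaviour
    if not think and getattr(model, "supports_thinking", False):
        enable_thinking = False
    return model.tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=enable_thinking,
    )
```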

***

## Output classes

### GenerateOutput

Returned by `model.generate()`.

| Attribute  | Type                          | Description                                             |
| ---------- | ----------------------------- | ------------------------------------------------------- |
| `text`     | `str`                         | The generated text (thinking tags stripped if present). |
| `thinking` | `str \| None`                 | The model's reasoning content, or `None`.               |
| `profile`  | [`ProfileData`](#profiledata) | Performance metrics.                                    |

### ProfileData

| Attribute          | Type          | Description                                       |
| ------------------ | ------------- | ------------------------------------------------- |
| `ttft`             | `int`         | Time to first token (ms).                         |
| `prompt_tokens`    | `int`         | Number of prompt tokens.                          |
| `generated_tokens` | `int`         | Number of generated tokens.                       |
| `prefill_speed`    | `float`       | Prefill speed (tokens/s).                         |
| `decode_speed`     | `float`       | Decode speed (tokens/s).                          |
| `stop_reason`      | `str \| None` | Why generation stopped (e.g. `"eos"`, `"limit"`). |

<Expandable title="All timing fields">
  | Attribute     | Type  | Description                  |
  | ------------- | ----- | ---------------------------- |
  | `prompt_time` | `int` | Prompt processing time (ms). |
  | `decode_time` | `int` | Decode time (ms).            |
</Expandable>
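
The fields above can be combined into a one-line summary for logging. This is a sketch: `summarize_profile` is a hypothetical helper, and the total it prints assumes `prompt_time` and `decode_time` do not overlap, which this page does not state.

```python
from typing import Any


def summarize_profile(profile: Any) -> str:
    """Render the main ProfileData fields as a one-line summary."""
    # Assumption: the two timing fields are disjoint, so their sum is the total.
    total_ms = profile.prompt_time + profile.decode_time
    return (
        f"ttft={profile.ttft} ms, "
        f"prompt={profile.prompt_tokens} tok @ {profile.prefill_speed:.1f} tok/s, "
        f"generated={profile.generated_tokens} tok @ {profile.decode_speed:.1f} tok/s, "
        f"total={total_ms} ms, stop={profile.stop_reason}"
    )
```

Typical usage would be `print(summarize_profile(output.profile))` after a call to `model.generate()`.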

### TextIteratorStreamer

Returned by `model.generate(..., stream=True)`. Yields decoded text chunks as they are generated.

```python theme={"dark"}
streamer = model.generate(prompt, max_new_tokens=256, stream=True)
for chunk in streamer:
    print(chunk, end="", flush=True)

final = streamer.output  # GenerateOutput available after iteration
```

| Method / Property | Description                                                    |
| ----------------- | -------------------------------------------------------------- |
| `__iter__()`      | Yields `str` chunks as they are generated.                     |
| `output`          | `GenerateOutput \| None` — available after iteration finishes. |
| `cancel()`        | Stop generation at the next token boundary.                    |

***

## Model manager

The same model manager the CLI uses is available programmatically via `geniex.model_manager`.

```python theme={"dark"}
from geniex import model_manager as mm

MODEL = "Qwen/Qwen3-0.6B-GGUF"
mm.pull(MODEL)
print(f"pull complete: {MODEL}")
paths = mm.get_paths(MODEL)
print(f"model path:    {paths}")
local_models = mm.list_models()
print(f"local models:  {local_models}")
mm.remove(MODEL)
print(f"model removed: {MODEL}")
```

| Function                | Description                                          |
| ----------------------- | ---------------------------------------------------- |
| `pull(model_name, ...)` | Download a model by alias or `org/repo[:precision]`. |
| `list_models()`         | Returns `list[str]` of cached model names.           |
| `get_paths(model_name)` | Returns `ModelPaths` with resolved local file paths. |
| `get_type(model_name)`  | Returns `"llm"` or `"vlm"`.                          |
| `resolve_alias(alias)`  | Resolves a short alias to canonical `org/repo`.      |
| `remove(model_name)`    | Delete a cached model from disk.                     |
| `clean()`               | Remove all cached models. Returns count removed.     |

<Expandable title="pull() parameters">
  | Parameter     | Type               | Default    | Description                                                               |
  | ------------- | ------------------ | ---------- | ------------------------------------------------------------------------- |
  | `model_name`  | `str`              | *required* | `org/repo` or a short alias.                                              |
  | `precision`   | `str \| None`      | `None`     | Precision (quantization) hint (e.g. `"Q4_K_M"`).                          |
  | `hub`         | `str`              | `"auto"`   | `"auto"` \| `"hf"` \| `"localfs"`.                                        |
  | `local_path`  | `str \| None`      | `None`     | Source directory (required when `hub="localfs"`).                         |
  | `hf_token`    | `str \| None`      | `None`     | Hugging Face bearer token.                                                |
  | `on_progress` | `Callable \| None` | `None`     | Callback `(files: list[FileProgress]) -> bool`; return `False` to cancel. |
</Expandable>
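
The `on_progress` contract (return `False` to cancel) can be used to put a time limit on a download. The sketch below does not read any `FileProgress` fields, because they are not documented on this page; `make_deadline_callback` is a hypothetical helper.

```python
import time
from typing import Any, Callable, List


def make_deadline_callback(timeout_s: float) -> Callable[[List[Any]], bool]:
    """Build an on_progress callback that cancels the download after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s

    def on_progress(files: List[Any]) -> bool:
        # Returning False asks pull() to cancel; True lets it continue.
        return time.monotonic() < deadline

    return on_progress
```

It would be passed as `mm.pull(MODEL, on_progress=make_deadline_callback(600))` to give up after roughly ten minutes.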

<Expandable title="ModelPaths attributes">
  | Attribute        | Type          | Description                           |
  | ---------------- | ------------- | ------------------------------------- |
  | `model_path`     | `str`         | Path to the main model file.          |
  | `model_dir`      | `str`         | Directory containing the model.       |
  | `model_name`     | `str`         | Canonical model name.                 |
  | `runtime`        | `str`         | Runtime identifier.                   |
  | `mmproj_path`    | `str \| None` | Multimodal projector path (VLM only). |
  | `tokenizer_path` | `str \| None` | Tokenizer file path.                  |
  | `compute_unit`   | `str \| None` | Compute unit identifier.              |
</Expandable>
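
A common pattern is to download a model only when it is not cached yet. The sketch below takes the manager as an argument (pass `geniex.model_manager`) and assumes that `list_models()` reports names in the same form you pass in; if you use short aliases, that may not hold. `ensure_cached` is a hypothetical helper.

```python
from typing import Any


def ensure_cached(manager: Any, model_name: str) -> Any:
    """Pull model_name only if it is not already cached, then return its ModelPaths."""
    # Assumption: cached names match model_name exactly (no alias translation).
    if model_name not in manager.list_models():
        manager.pull(model_name)
    return manager.get_paths(model_name)
```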

***

## SDK functions

| Function                                | Description                                                                                         |
| --------------------------------------- | --------------------------------------------------------------------------------------------------- |
| `geniex.init()`                         | Initialize the SDK. Called automatically on first model load.                                       |
| `geniex.deinit()`                       | Shut down the SDK and release resources.                                                            |
| `geniex.version()`                      | Returns the SDK version string.                                                                     |
| `geniex.get_runtime_list()`             | Returns `list[str]` of available runtime IDs.                                                       |
| `geniex.get_compute_unit_list(runtime)` | Returns `list[tuple[str, str]]` of `(compute_unit, compute_unit_name)` pairs for the given runtime. |
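
The two discovery functions can be combined to enumerate every `"<runtime>:<compute_unit>"` string that `device_map` would accept. This is a sketch: `list_device_maps` is a hypothetical helper that takes the SDK module as an argument (pass `geniex`), and whether `geniex.init()` must be called first is not stated on this page.

```python
from typing import Any, List


def list_device_maps(sdk: Any) -> List[str]:
    """Return every "<runtime>:<compute_unit>" string the SDK reports as available."""
    device_maps = []
    for runtime in sdk.get_runtime_list():
        # Each entry is a (compute_unit, compute_unit_name) pair; only the ID is used.
        for compute_unit, _display_name in sdk.get_compute_unit_list(runtime):
            device_maps.append(f"{runtime}:{compute_unit}")
    return device_maps
```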

