
# API Reference

> GenieX Android SDK: runtime and compute unit selection, model management, and LLM/VLM inference APIs.

## **Runtime and Compute Unit Selection**

### Runtime

Select the inference runtime with `runtime_id`:

```kotlin theme={"dark"}
val runtime_id: String?   // "llama_cpp" | "qairt" | null
```

| `runtime_id`  | Runtime                                 | Model format                         | Compute units                  |
| ------------- | --------------------------------------- | ------------------------------------ | ------------------------------ |
| `"llama_cpp"` | llama.cpp + GGML Hexagon backend        | GGUF                                 | CPU / Adreno GPU / Hexagon NPU |
| `"qairt"`     | Qualcomm® AI Engine Direct              | Precompiled bin from Qualcomm AI Hub | Hexagon NPU only               |
| `null`        | Detected by the SDK from the model path | n/a                                  | n/a                            |

The constants are defined in [`RuntimeIdValue`](https://github.com/qualcomm/GenieX/blob/main/bindings/android/app/src/main/java/com/geniex/sdk/bean/InputPluginBase.kt) (`LLAMA_CPP`, `QAIRT`).

### Compute Unit

The compute unit alias passed to the native SDK's `geniex_resolve_device`.

```kotlin theme={"dark"}
val compute_unit: String?   // "cpu" | "gpu" | "npu" | null
```

| Alias   | Behavior                                                 |
| ------- | -------------------------------------------------------- |
| `null`  | Runtime default: `npu` for both `llama_cpp` and `qairt`. |
| `"npu"` | Hexagon NPU acceleration. Recommended on Snapdragon.     |
| `"gpu"` | Adreno GPU via OpenCL (`llama_cpp` only).                |
| `"cpu"` | CPU only. Forces `nGpuLayers = 0`.                       |

<Note>
  Qualcomm AI Engine Direct supports the NPU only. Passing `"cpu"` or `"gpu"` for a Qualcomm AI Hub model logs a warning and falls back to the NPU; it does not raise an error.
</Note>
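
The alias rules above can be summarized in a few lines of plain Kotlin. This is only a sketch for reasoning about which unit a request ends up on: `ResolvedUnit` and `resolveComputeUnit` are hypothetical names, and the real resolution happens inside the native SDK, not in this helper.

```kotlin
// Hypothetical helper: mirrors the documented alias rules, not the SDK's actual resolver.
data class ResolvedUnit(val unit: String, val warning: String? = null)

fun resolveComputeUnit(runtimeId: String, computeUnit: String?): ResolvedUnit {
    // null means "use the runtime default", which is npu for both runtimes.
    val requested = computeUnit ?: "npu"
    require(requested in setOf("cpu", "gpu", "npu")) { "Unknown compute unit: $requested" }
    return when (runtimeId) {
        // llama.cpp accepts all three aliases as given.
        "llama_cpp" -> ResolvedUnit(requested)
        // Qualcomm AI Engine Direct is NPU-only: other aliases fall back with a warning.
        "qairt" -> if (requested == "npu") {
            ResolvedUnit("npu")
        } else {
            ResolvedUnit("npu", "qairt ignores '$requested'; falling back to npu")
        }
        else -> throw IllegalArgumentException("Unknown runtime_id: $runtimeId")
    }
}
```

For example, `resolveComputeUnit("qairt", "gpu")` yields the `npu` unit together with a warning string, while `resolveComputeUnit("llama_cpp", "gpu")` keeps `gpu` unchanged.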

***

## **Model Manager**

Models are pulled onto the device by the built-in Rust model manager. **Do not** `adb push` weights manually; use `ModelManagerWrapper` instead.

### `ModelManagerWrapper`

```kotlin theme={"dark"}
// Init (idempotent — safe to call on every Activity.onCreate).
GenieXSdk.getInstance().init(context)

// Pull with streaming progress.
ModelManagerWrapper.pullFlow(
    ModelPullInput(
        model_name = "unsloth/Qwen3-0.6B-GGUF",
        precision  = "Q4_0",
        hub        = HubSource.HUGGINGFACE,
    )
).collect { event ->
    when (event) {
        is ModelManagerWrapper.PullEvent.Progress  -> { /* update UI */ }
        ModelManagerWrapper.PullEvent.Completed    -> { /* done */ }
        is ModelManagerWrapper.PullEvent.Error     -> { /* show error */ }
    }
}

// Resolve on-disk paths for a previously-pulled model.
val paths: ModelPaths? = ModelManagerWrapper.getPaths("unsloth/Qwen3-0.6B-GGUF")

// Inventory / cleanup.
ModelManagerWrapper.list()              // List<String>
ModelManagerWrapper.remove("org/repo") // 0 = success
ModelManagerWrapper.clean()            // wipe all cached models
```

### `ModelPullInput`

```kotlin theme={"dark"}
data class ModelPullInput(
    val model_name:   String,                     // "org/repo" or alias
    val precision:    String? = null,             // precision (quantization) e.g. "Q4_0", "Q4_K_M"
    val hub:          HubSource = HubSource.AUTO, // AUTO routes by model_name
    val local_path:   String? = null,             // only when hub == LOCALFS
    val hf_token:     String? = null,             // falls back to GENIEX_HFTOKEN env
    val chipset:      String? = null,             // required for Qualcomm AI Hub on Android (e.g. "SM8750")
    val display_name: String? = null,
)
```

### `HubSource`

```kotlin theme={"dark"}
enum class HubSource(val value: Int) {
    AUTO(0),          // routes by prefix (e.g. ai-hub-models/* → AIHUB)
    HUGGINGFACE(1),
    MODELSCOPE(2),
    AIHUB(3),
    VOLCES(4),
    LOCALFS(127),
}
```

### `ModelPaths`

Returned by `getPaths()`. Its fields can be passed directly to `LlmCreateInput` / `VlmCreateInput`:

```kotlin theme={"dark"}
data class ModelPaths(
    val model_path:     String,
    val model_dir:      String,
    val model_name:     String,
    val runtime_id:     String,            // authoritative; prefer over UI selection
    val mmproj_path:    String? = null,    // VLM projection weights
    val tokenizer_path: String? = null,
    val compute_unit:   String? = null,
)
```

<Note>
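  Pulling a Qualcomm AI Hub model on Android **requires** an explicit `chipset`. The Rust side detects the chipset automatically only on Snapdragon Windows. Use `"SM8750"` for Snapdragon 8 Elite and `"SM8850"` for Snapdragon 8 Elite Gen 5.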
</Note>
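
Because a missing `chipset` is easy to overlook, an app may want to check the request before starting a pull. The sketch below is one way to do that in plain Kotlin. `Hub`, `CHIPSET_BY_SOC`, `targetsAiHub`, and `chipsetProblem` are hypothetical names that are not part of the SDK, and the only `AUTO` routing rule assumed is the documented `ai-hub-models/*` prefix.

```kotlin
// Local stand-in for the SDK's HubSource enum so the sketch compiles on its own.
enum class Hub { AUTO, HUGGINGFACE, MODELSCOPE, AIHUB, VOLCES, LOCALFS }

// Chipset codes quoted in this section; extend the map for other SoCs as needed.
val CHIPSET_BY_SOC = mapOf(
    "Snapdragon 8 Elite" to "SM8750",
    "Snapdragon 8 Elite Gen 5" to "SM8850",
)

// Hypothetical check: true when the pull will be served by Qualcomm AI Hub.
fun targetsAiHub(modelName: String, hub: Hub): Boolean =
    hub == Hub.AIHUB || (hub == Hub.AUTO && modelName.startsWith("ai-hub-models/"))

// Returns an error message when chipset is missing, or null when the request looks valid.
fun chipsetProblem(modelName: String, hub: Hub, chipset: String?): String? {
    if (targetsAiHub(modelName, hub) && chipset.isNullOrBlank()) {
        return "chipset is required for $modelName on Android, e.g. SM8750"
    }
    return null
}
```

Calling `chipsetProblem(...)` before building a `ModelPullInput` lets the UI surface the problem immediately instead of waiting for the pull to fail.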

***

## **Data Structures**

### `LlmCreateInput`

```kotlin theme={"dark"}
data class LlmCreateInput(
    val model_name:     String,
    val model_path:     String,
    val tokenizer_path: String? = null,
    val config:         ModelConfig,
    val runtime_id:     String? = null,
    val compute_unit:   String? = null,
)
```

### `VlmCreateInput`

```kotlin theme={"dark"}
data class VlmCreateInput(
    val model_name:   String,
    val model_path:   String,
    val mmproj_path:  String? = null,     // vision projection weights (GGUF VLMs)
    val config:       ModelConfig,
    val runtime_id:   String? = null,
    val compute_unit: String? = null,
)
```

### `ModelConfig`

```kotlin theme={"dark"}
data class ModelConfig(
    var nCtx:                  Int     = 2048,     // context size; 0 = model default
    var nThreads:              Int     = 8,
    var nThreadsBatch:         Int     = 8,
    var nBatch:                Int     = 2048,
    var nUBatch:               Int     = 512,
    var nSeqMax:               Int     = 1,
    var nGpuLayers:            Int     = 0,
    val chat_template_path:    String  = "",
    val chat_template_content: String  = "",
    val max_tokens:            Int     = 2048,
    val enable_thinking:       Boolean = false,
    val verbose:               Boolean = false,
)
```

<Note>
  The JNI layer rewrites `nGpuLayers` according to `compute_unit`: `cpu` forces it to 0 and `npu` forces it to 999. `gpu` keeps the value you pass; set it to 999 to offload all layers.
</Note>
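
The rewrite rule can be expressed as a small pure function, which is handy for predicting the value the native layer will end up with. This is a sketch only: `effectiveGpuLayers` is a hypothetical name, and treating `null` as `npu` is an assumption based on the documented runtime default rather than confirmed JNI behavior.

```kotlin
// Hypothetical mirror of the documented rewrite; the real logic lives in the JNI layer.
fun effectiveGpuLayers(computeUnit: String?, requested: Int): Int =
    when (computeUnit ?: "npu") {   // assumption: null resolves to the npu default first
        "cpu" -> 0                  // forced: nothing is offloaded
        "npu" -> 999                // forced: every layer is offloaded
        "gpu" -> requested          // preserved: the caller decides
        else -> throw IllegalArgumentException("Unknown compute unit: $computeUnit")
    }
```

With `compute_unit = "gpu"` and the default `nGpuLayers = 0`, the function returns 0, which is why the GPU path needs an explicit `nGpuLayers = 999` for full offload.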

### `ChatMessage`

```kotlin theme={"dark"}
data class ChatMessage(
    var role:    String,   // "system" | "user" | "assistant"
    var content: String,
)
```

### `VlmChatMessage` / `VlmContent`

```kotlin theme={"dark"}
data class VlmChatMessage(
    val role:     String?,                 // "system" | "user" | "assistant"
    val contents: List<VlmContent>,
)

data class VlmContent(
    val type: String?,                     // "text" | "image"
    val text: String?,                     // text content, or absolute file path for image
)
```
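
Since an image entry carries its absolute file path in `text`, the image paths of a conversation can be gathered with a simple filter. The sketch below redeclares the two data classes locally so it compiles on its own, and `imagePathsOf` is a hypothetical helper. It illustrates the data layout and is not a description of how `injectMediaPathsToConfig` is implemented.

```kotlin
// Local copies of the two data classes so the sketch is self-contained.
data class VlmContent(val type: String?, val text: String?)
data class VlmChatMessage(val role: String?, val contents: List<VlmContent>)

// Hypothetical helper: gathers every image path in message order.
fun imagePathsOf(messages: List<VlmChatMessage>): Array<String> =
    messages
        .flatMap { it.contents }          // flatten all content parts
        .filter { it.type == "image" }    // keep image entries only
        .mapNotNull { it.text }           // for images, text holds the file path
        .toTypedArray()
```

The resulting array and its `size` have the same shape as the `imagePaths` and `imageCount` fields of `GenerationConfig`.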

### `GenerationConfig`

```kotlin theme={"dark"}
data class GenerationConfig(
    var maxTokens:     Int              = 32,
    var stopWords:     Array<String>?   = null,
    var stopCount:     Int              = 0,
    var nPast:         Int              = 0,
    var samplerConfig: SamplerConfig?   = null,
    var imagePaths:    Array<String>?   = null,
    var imageCount:    Int              = 0,
    var audioPaths:    Array<String>?   = null,
    var audioCount:    Int              = 0,
)
```

<Note>
  `maxTokens` defaults to **32**. Most use cases need a larger value (for example, `maxTokens = 2048`).
</Note>

### `LlmStreamResult`

```kotlin theme={"dark"}
sealed class LlmStreamResult {
    data class Token(val text: String)              : LlmStreamResult()
    data class Completed(val profile: ProfilingData) : LlmStreamResult()
    data class Error(val throwable: Throwable)      : LlmStreamResult()
}
```
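
A stream of these results reduces naturally to either the full generated text or the first error. The sketch below shows that reduction over a plain `Iterable` so it runs without coroutines. The sealed class is redeclared locally, `ProfilingData` is reduced to an empty placeholder, and `collectText` is a hypothetical helper; in an app the same `when` would sit inside a `Flow.collect` block.

```kotlin
// Local stand-ins mirroring the sealed class; ProfilingData is only a placeholder here.
class ProfilingData
sealed class LlmStreamResult {
    data class Token(val text: String) : LlmStreamResult()
    data class Completed(val profile: ProfilingData) : LlmStreamResult()
    data class Error(val throwable: Throwable) : LlmStreamResult()
}

// Hypothetical reducer: concatenates tokens until the stream completes or fails.
fun collectText(results: Iterable<LlmStreamResult>): Result<String> {
    val text = StringBuilder()
    for (result in results) {
        when (result) {
            is LlmStreamResult.Token -> text.append(result.text)
            is LlmStreamResult.Completed -> return Result.success(text.toString())
            is LlmStreamResult.Error -> return Result.failure(result.throwable)
        }
    }
    // The stream ended without a terminal event; treat that as a failure.
    return Result.failure(IllegalStateException("stream ended without Completed"))
}
```

Treating a stream that ends without `Completed` as a failure is a design choice of this sketch; an app could equally return the partial text.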

***

## **llama.cpp (GGUF Models)**

Runs any GGUF model on the CPU, Adreno GPU, or Hexagon NPU. The compute unit is selected with `compute_unit`.

### LLM

```kotlin theme={"dark"}
val paths = ModelManagerWrapper.getPaths("unsloth/Qwen3-0.6B-GGUF")
    ?: error("Model not downloaded")

LlmWrapper.builder()
    .llmCreateInput(
        LlmCreateInput(
            model_name   = paths.model_name,
            model_path   = paths.model_path,
            config       = ModelConfig(nCtx = 4096),
            runtime_id   = "llama_cpp",
            compute_unit = null,       // null → npu (recommended on Snapdragon)
        )
    )
    .build()
    .onSuccess { llmWrapper = it }
    .onFailure { println("Error: ${it.message}") }

val chat = arrayListOf(ChatMessage("user", "What is AI?"))

llmWrapper.applyChatTemplate(chat.toTypedArray(), null, false).onSuccess { t ->
    llmWrapper.generateStreamFlow(t.formattedText, GenerationConfig(maxTokens = 2048)).collect { result ->
        when (result) {
            is LlmStreamResult.Token     -> print(result.text)
            is LlmStreamResult.Completed -> println("\nDone")
            is LlmStreamResult.Error     -> println("Error: ${result.throwable}")
        }
    }
}
```

#### Compute Unit Variants

| Target                       | `compute_unit`    | Notes                                         |
| ---------------------------- | ----------------- | --------------------------------------------- |
| Snapdragon NPU (recommended) | `"npu"` or `null` | Hexagon NPU acceleration.                     |
| Adreno GPU (OpenCL)          | `"gpu"`           | Set `nGpuLayers = 999` to offload all layers. |
| CPU only                     | `"cpu"`           | Works on any ARM64 chip.                      |

### VLM

A GGUF VLM needs two artifacts: the LLM weights (`model_path`) and the vision projection (`mmproj_path`). Both come from `getPaths()`:

```kotlin theme={"dark"}
val paths = ModelManagerWrapper.getPaths("unsloth/Qwen3-VL-2B-Instruct-GGUF")
    ?: error("Model not downloaded")

VlmWrapper.builder()
    .vlmCreateInput(
        VlmCreateInput(
            model_name   = paths.model_name,
            model_path   = paths.model_path,
            mmproj_path  = paths.mmproj_path,
            config       = ModelConfig(nCtx = 4096),
            runtime_id   = "llama_cpp",
            compute_unit = null,
        )
    )
    .build()
    .onSuccess { vlmWrapper = it }

val msg = VlmChatMessage(
    role     = "user",
    contents = listOf(
        VlmContent("image", "/storage/emulated/0/Pictures/example.jpg"),
        VlmContent("text",  "Describe this image."),
    ),
)
val chat = arrayListOf(msg)

vlmWrapper.applyChatTemplate(chat.toTypedArray(), null, false).onSuccess { t ->
    val gen = vlmWrapper.injectMediaPathsToConfig(chat.toTypedArray(), GenerationConfig(maxTokens = 2048))
    vlmWrapper.generateStreamFlow(t.formattedText, gen).collect { result ->
        when (result) {
            is LlmStreamResult.Token     -> print(result.text)
            is LlmStreamResult.Completed -> println("\nDone")
            is LlmStreamResult.Error     -> println("Error: ${result.throwable}")
        }
    }
}
```

<Note>
  Always pass `t.formattedText` (the prompt after the chat template has been applied) to `generateStreamFlow`, **not** the raw user text. The native pipeline treats the prompt as already formatted.
</Note>

***

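## **Qualcomm® AI Hub Models (NPU via Qualcomm AI Engine Direct)**

Precompiled models from Qualcomm AI Hub. They run on the NPU only and are tied to a specific chipset (`SM8750` = Snapdragon 8 Elite, `SM8850` = Snapdragon 8 Elite Gen 5).

### Downloading a Qualcomm AI Hub Model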

```kotlin theme={"dark"}
ModelManagerWrapper.pullFlow(
    ModelPullInput(
        model_name = "ai-hub-models/Qwen2.5-VL-7B-Instruct",
        hub        = HubSource.AUTO,     // AUTO routes `ai-hub-models/*` to Qualcomm AI Hub
        chipset    = "SM8750",           // REQUIRED on Android
    )
).collect { event ->
    when (event) {
        is ModelManagerWrapper.PullEvent.Progress  -> updateProgressBar(event.files)
        ModelManagerWrapper.PullEvent.Completed    -> println("done")
        is ModelManagerWrapper.PullEvent.Error     -> println("err ${event.code}: ${event.message}")
    }
}
```

### Supported Models

| Modality | Hub repository                         |
| -------- | -------------------------------------- |
| LLM      | `ai-hub-models/Qwen3-4B-Instruct-2507` |
| VLM      | `ai-hub-models/Qwen2.5-VL-7B-Instruct` |

### LLM

```kotlin theme={"dark"}
val paths = ModelManagerWrapper.getPaths("ai-hub-models/Qwen3-4B-Instruct-2507")
    ?: error("Model not downloaded")

LlmWrapper.builder()
    .llmCreateInput(
        LlmCreateInput(
            model_name   = paths.model_name,
            model_path   = paths.model_path,
            config       = ModelConfig(max_tokens = 2048, enable_thinking = false),
            runtime_id   = "qairt",
            compute_unit = null,           // null → NPU (only option for Qualcomm AI Engine Direct)
        )
    )
    .build()
    .onSuccess { llmWrapper = it }
    .onFailure { println("Error: ${it.message}") }

val chat = arrayListOf(ChatMessage("user", "What is AI?"))

llmWrapper.applyChatTemplate(chat.toTypedArray(), null, false).onSuccess { t ->
    llmWrapper.generateStreamFlow(t.formattedText, GenerationConfig()).collect { result ->
        when (result) {
            is LlmStreamResult.Token     -> print(result.text)
            is LlmStreamResult.Completed -> println("\nDone")
            is LlmStreamResult.Error     -> println("Error: ${result.throwable}")
        }
    }
}
```

<Note>
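  Qualcomm AI Engine Direct rejects `nGpuLayers != 0` and `nCtx != 0` with `PARAM_NOT_SUPPORTED`, because the KV cache and context length are fixed at compile time by the Qualcomm AI Hub model package. Leave both at their defaults and adjust only `max_tokens` / `enable_thinking`.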
</Note>

### VLM

```kotlin theme={"dark"}
val paths = ModelManagerWrapper.getPaths("ai-hub-models/Qwen2.5-VL-7B-Instruct")
    ?: error("Model not downloaded")

VlmWrapper.builder()
    .vlmCreateInput(
        VlmCreateInput(
            model_name   = paths.model_name,
            model_path   = paths.model_path,
            mmproj_path  = paths.mmproj_path,
            config       = ModelConfig(max_tokens = 2048, enable_thinking = false),
            runtime_id   = "qairt",
            compute_unit = null,
        )
    )
    .build()
    .onSuccess { vlmWrapper = it }

val msg = VlmChatMessage(
    role     = "user",
    contents = listOf(
        VlmContent("image", "/storage/emulated/0/Pictures/cat.jpg"),
        VlmContent("text",  "What's in this image?"),
    ),
)
val chat = arrayListOf(msg)

vlmWrapper.applyChatTemplate(chat.toTypedArray(), null, false).onSuccess { t ->
    val gen = vlmWrapper.injectMediaPathsToConfig(chat.toTypedArray(), GenerationConfig(maxTokens = 2048))
    vlmWrapper.generateStreamFlow(t.formattedText, gen).collect { result ->
        when (result) {
            is LlmStreamResult.Token     -> print(result.text)
            is LlmStreamResult.Completed -> println("\nDone")
            is LlmStreamResult.Error     -> println("Error: ${result.throwable}")
        }
    }
}
```

<Note>
  Pass the **chat-templated prompt** (`t.formattedText`) to `generateStreamFlow`, never the raw user text. Qualcomm AI Engine Direct VLMs treat the prompt as already formatted, so raw text produces degraded output.
</Note>

***

## **Need Help?**

<CardGroup cols={2}>
  <Card title="GitHub Issues" icon="github" href="https://github.com/qualcomm/GenieX/issues">
    File bugs, request features, or browse open issues.
  </Card>

  <Card title="Slack" icon="slack" href="https://aihub.qualcomm.com/community/slack">
    Developer collaboration and resources.
  </Card>
</CardGroup>
