# Platforms & runtimes

> Snapdragon platforms supported by GenieX and which runtime to pick on each.

GenieX runs exclusively on Qualcomm Snapdragon; there is no x86 or non-Snapdragon ARM build. To get started, pick the **Snapdragon platform** you'll run on, then pick a **runtime** that matches the model you want to run.

## **Snapdragon platforms**

GenieX is supported across three Snapdragon families — compute, mobile, and IoT — covering Windows ARM64, Android, and Linux ARM64.

| OS                                                                 | Interfaces                                                                                                                                       |
| ------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Windows ARM64** *(Compute / Copilot+ PC)*                        | <ul><li>[CLI](/en/run/cli/quickstart)</li><li>[Python SDK](/en/run/python/quickstart)</li><li>[Local server](/en/run/cli/local-server)</li></ul> |
| **Android** *(Mobile)*                                             | <ul><li>[Android SDK](/en/run/android/quickstart) (Kotlin, Maven Central)</li></ul>                                                              |
| **Linux ARM64** *(Dragonwing IoT — cameras, robotics, industrial)* | <ul><li>[Native install](/en/run/cli/install)</li><li>[Docker](/en/run/linux/install)</li></ul>                                                  |

For the full list of supported chipsets, see the [Qualcomm AI Hub device list](https://workbench.aihub.qualcomm.com/docs/hub/devices.html).

<Tip>**No device on hand?** Sign in to [Qualcomm Device Cloud (QDC)](https://qdc.qualcomm.com/) for remote sessions on Snapdragon X, Snapdragon 8 Elite, and Dragonwing QCS9075. See the [QDC walkthrough in the FAQ](/en/resources/faq#qdc-qualcomm-device-cloud).</Tip>

## **GenieX runtimes**

GenieX ships with two runtimes so you get both **broad model coverage** and **peak Snapdragon performance** in one stack:

* **`llama_cpp`** — any GGUF model on Hugging Face, running on Hexagon NPU, Adreno GPU, or CPU through Qualcomm's GGML Hexagon backend. The widest model selection.
* **`qairt`** ([Qualcomm® AI Engine Direct](https://www.qualcomm.com/developer/software/qualcomm-ai-engine-direct-sdk)) — pre-compiled bundles from [Qualcomm AI Hub](https://aihub.qualcomm.com/models/), compiled and quantized per chipset and pinned to the Hexagon NPU. The fastest path when your model is on Qualcomm AI Hub.

<Note>**Qualcomm AI Engine Direct** is the official name for what is also known as the *Qualcomm AI Engine Direct SDK*, *Qualcomm AI Runtime*, and historically *QAIRT*. These docs use the official name in prose; `qairt` appears only as the runtime identifier.</Note>

|                                          | **llama.cpp**                            | **Qualcomm AI Engine Direct**                      |
| ---------------------------------------- | ---------------------------------------- | -------------------------------------------------- |
| **Model format**                         | GGUF (any community model)               | Qualcomm AI Hub pre-compiled bundles               |
| **Compute units**                        | NPU / GPU / CPU                          | NPU only                                           |
| **Precisions (Quantizations) picked by** | You (`Q4_0`, `Q8_0`, `F16`, …)           | Pre-quantized in the bundle                        |
| **Best for**                             | Bringing your own GGUF from Hugging Face | Highest NPU performance on Qualcomm® AI Hub Models |

Pick **`llama_cpp`** for any GGUF from Hugging Face, or when you need CPU/GPU fallback (for example, on IoT devices without a Hexagon NPU, referred to as HTP in these docs). Pick **`qairt`** for the fastest NPU path on models published to Qualcomm AI Hub.
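
As a rough illustration, the rule of thumb above can be written as a small decision function. This is only a sketch: `pick_runtime` and its two boolean arguments are hypothetical names made up for this example and are not part of the GenieX CLI or SDKs.

```python
def pick_runtime(on_ai_hub: bool, needs_cpu_or_gpu: bool) -> str:
    """Return the runtime identifier suggested by the rule of thumb.

    on_ai_hub:        the model is published as a Qualcomm AI Hub bundle.
    needs_cpu_or_gpu: the target must be able to run on CPU or GPU
                      (for example, a device without a Hexagon NPU).
    """
    # qairt is NPU only, so any CPU/GPU requirement rules it out.
    if on_ai_hub and not needs_cpu_or_gpu:
        return "qairt"
    # Everything else goes through GGUF on llama_cpp.
    return "llama_cpp"


if __name__ == "__main__":
    print(pick_runtime(on_ai_hub=True, needs_cpu_or_gpu=False))
    print(pick_runtime(on_ai_hub=False, needs_cpu_or_gpu=False))
```

The function encodes only the two criteria named in the text; a real choice may also depend on which precisions and context lengths are available for a given model.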

### Defaults

If you don't pass a compute unit:

| Runtime     | Default compute unit                                                     |
| ----------- | ------------------------------------------------------------------------ |
| `llama_cpp` | `hybrid` (HTP + CPU per-tensor scheduling — the fast path on Snapdragon) |
| `qairt`     | `npu`                                                                    |

Some model families override this — for example, `gpt-oss` falls back to `npu` on `llama_cpp` because it can't run on the hybrid path.
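
The default resolution described here can be sketched in a few lines of Python. `resolve_compute`, `DEFAULT_COMPUTE`, and `FAMILY_OVERRIDES` are illustrative names, not GenieX APIs. The override table lists only the `gpt-oss` case named above, and because this page does not say how GenieX detects a model family, the sketch takes the family as a plain argument.

```python
from typing import Optional

# Defaults from the table above.
DEFAULT_COMPUTE = {"llama_cpp": "hybrid", "qairt": "npu"}

# Only gpt-oss is named in the text; other overrides may exist.
FAMILY_OVERRIDES = {("llama_cpp", "gpt-oss"): "npu"}


def resolve_compute(runtime: str, family: str, compute: Optional[str] = None) -> str:
    """Return the compute unit to use when the caller may have omitted it."""
    if compute is not None:
        # An explicit choice is passed through unchanged in this sketch.
        return compute
    # A family override wins over the runtime default.
    return FAMILY_OVERRIDES.get((runtime, family), DEFAULT_COMPUTE[runtime])


if __name__ == "__main__":
    print(resolve_compute("llama_cpp", "some-family"))
    print(resolve_compute("llama_cpp", "gpt-oss"))
```

Keeping overrides in a separate table from the defaults mirrors the wording of the text: the default is a property of the runtime, and the override is an exception keyed on the model family.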

***

## **llama.cpp**

The `llama_cpp` runtime executes **any GGUF model** through llama.cpp with Qualcomm's GGML Hexagon backend. Pull any community GGUF from Hugging Face and run it on Snapdragon NPU, Adreno GPU, or pure CPU.

### Compute units

`--compute` maps to the underlying hardware as follows:

| Alias                | Effect                                                                                                              |
| -------------------- | ------------------------------------------------------------------------------------------------------------------- |
| `npu`                | Pin to Hexagon NPU (`HTP0`). Best NPU-only path.                                                                    |
| `gpu`                | Adreno GPU via OpenCL.                                                                                              |
| `cpu`                | Pure CPU. Forces `nGpuLayers = 0`.                                                                                  |
| `hybrid` *(default)* | Empty `device_id` + `n_gpu_layers=999` — llama.cpp's per-tensor HTP+CPU scheduler. **The fast path on Snapdragon.** |

The precision you pick at `geniex pull` time also determines where the model lands — see [Precisions (Quantizations) Supported](/en/models/supported#precisions-quantizations-supported).
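
For reference, the alias table above can be read as a lookup from alias to the llama.cpp settings it implies. The sketch below is a model of that table, not GenieX code: `LlamaCppPlacement`, `PLACEMENTS`, and `placement_for` are hypothetical names, and any field the table does not state is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LlamaCppPlacement:
    device_id: Optional[str]     # None means the table does not state it
    n_gpu_layers: Optional[int]  # None means the table does not state it
    note: str


PLACEMENTS = {
    "npu": LlamaCppPlacement("HTP0", None, "pinned to the Hexagon NPU"),
    "gpu": LlamaCppPlacement(None, None, "Adreno GPU via OpenCL"),
    "cpu": LlamaCppPlacement(None, 0, "pure CPU"),
    # Empty device_id plus 999 layers selects per-tensor HTP+CPU scheduling.
    "hybrid": LlamaCppPlacement("", 999, "per-tensor HTP+CPU scheduling"),
}


def placement_for(alias: str = "hybrid") -> LlamaCppPlacement:
    """Look up the settings implied by a compute alias (default: hybrid)."""
    try:
        return PLACEMENTS[alias]
    except KeyError:
        # Rejecting unknown aliases is a choice of this sketch only.
        raise ValueError(f"unknown compute alias: {alias!r}") from None


if __name__ == "__main__":
    for name in PLACEMENTS:
        print(name, placement_for(name))
```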

***

## **Qualcomm AI Engine Direct**

The `qairt` runtime executes pre-compiled bundles from **Qualcomm AI Hub** through [Qualcomm® AI Engine Direct](https://www.qualcomm.com/developer/software/qualcomm-ai-engine-direct-sdk). It is NPU-only, and each bundle is compiled and quantized for a specific Snapdragon chipset, which typically makes it the fastest NPU path when your model is on Qualcomm AI Hub.

### Compute units

Qualcomm AI Engine Direct is **NPU only**.

| Alias             | Effect                                           |
| ----------------- | ------------------------------------------------ |
| `npu` *(default)* | Pin `HTP0` — the only supported path.            |
| `cpu` / `gpu`     | Coerced to `npu` with a warning. Never an error. |
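
The coercion rule in the table can be modeled with Python's standard `warnings` module. This is a sketch of the documented behavior, not the runtime's actual code: `coerce_qairt_compute` is a hypothetical name, the warning text is invented, and raising on values the table does not list (such as `hybrid`) is an assumption of this example.

```python
import warnings


def coerce_qairt_compute(requested: str = "npu") -> str:
    """Return the compute unit qairt would use for a requested alias."""
    if requested == "npu":
        return "npu"
    if requested in ("cpu", "gpu"):
        # Documented behavior: warn and fall back to npu, never fail.
        warnings.warn(
            f"qairt is NPU only; coercing {requested!r} to 'npu'",
            stacklevel=2,
        )
        return "npu"
    # Not covered by the table; rejecting it is this sketch's assumption.
    raise ValueError(f"unsupported compute alias for qairt: {requested!r}")


if __name__ == "__main__":
    print(coerce_qairt_compute("gpu"))
```

Because the documented behavior is a warning rather than an error, a script that passes `cpu` or `gpu` keeps running on the NPU; check the warning output if the compute unit matters for your measurements.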

### Runtime constraints

The bundle has its **precision, context length, and KV cache size baked in** — none can be changed at runtime. On Android, `nGpuLayers != 0` and `nCtx != 0` are rejected with `PARAM_NOT_SUPPORTED`; leave both at defaults and tune `max_tokens` / `enable_thinking` only. To change precision or context length, get a different bundle from [Qualcomm AI Hub](https://aihub.qualcomm.com/models/).
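
A pre-flight check along these lines can catch the Android rejection before a request reaches the runtime. It is written in Python for consistency with the other sketches on this page, and it is not the Android SDK: `ParamNotSupported` and `check_qairt_android_params` are hypothetical names. Treating `0` as the default for both parameters is inferred from the rule above.

```python
class ParamNotSupported(ValueError):
    """Stand-in for the PARAM_NOT_SUPPORTED rejection described above."""


def check_qairt_android_params(n_gpu_layers: int = 0, n_ctx: int = 0) -> None:
    """Raise if a qairt request sets a parameter the bundle has baked in."""
    if n_gpu_layers != 0:
        # qairt is NPU only, so layer offload cannot be tuned.
        raise ParamNotSupported("PARAM_NOT_SUPPORTED: leave nGpuLayers at 0")
    if n_ctx != 0:
        # Context length is fixed when the bundle is compiled.
        raise ParamNotSupported("PARAM_NOT_SUPPORTED: leave nCtx at 0")


if __name__ == "__main__":
    check_qairt_android_params()  # defaults pass silently
    try:
        check_qairt_android_params(n_ctx=4096)
    except ParamNotSupported as err:
        print(err)
```

Settings such as `max_tokens` and `enable_thinking` are deliberately absent from the check, since the text lists them as the values that remain tunable.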

