> ## Documentation Index
> Fetch the complete documentation index at: https://geniex.aihub.qualcomm.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Platforms & runtimes

> Snapdragon platforms supported by GenieX and which runtime to pick on each.

GenieX runs exclusively on Qualcomm Snapdragon; there is no x86 or non-Snapdragon ARM build. To get going, pick the **Snapdragon platform** you'll run on, then pick a **runtime** that matches the model you want to run.

## **Snapdragon platforms**

GenieX is supported across three Snapdragon families (compute, mobile, and IoT), covering Windows ARM64, Android, and Linux ARM64.

| OS | Interfaces |
| --- | --- |
| **Windows ARM64** *(Compute / Copilot+ PC)* | [CLI](/en/run/cli/quickstart) · [Python SDK](/en/run/python/quickstart) · [Local server](/en/run/cli/local-server) |
| **Android** *(Mobile)* | [Android SDK](/en/run/android/quickstart) (Kotlin, Maven Central) |
| **Linux ARM64** *(Dragonwing IoT: cameras, robotics, industrial)* | [Native install](/en/run/cli/install) · [Docker](/en/run/linux/install) |

For the full list of supported chipsets, see the [Qualcomm AI Hub device list](https://workbench.aihub.qualcomm.com/docs/hub/devices.html).

**No device on hand?** Sign in to [Qualcomm Developer Cloud (QDC)](https://qdc.qualcomm.com/) for remote sessions on Snapdragon X, 8 Elite, and Dragonwing QCS9075. See the [QDC walkthrough in the FAQ](/en/resources/faq#qdc-qualcomm-device-cloud).

## **GenieX runtimes**

GenieX ships with two runtimes so you get both **broad model coverage** and **peak Snapdragon performance** in one stack:

* **`llama_cpp`**: any GGUF model on Hugging Face, running on Hexagon NPU, Adreno GPU, or CPU through Qualcomm's GGML Hexagon backend. The widest model selection.
* **`qairt`** ([Qualcomm® AI Engine Direct](https://www.qualcomm.com/developer/software/qualcomm-ai-engine-direct-sdk)): pre-compiled bundles from [Qualcomm AI Hub](https://aihub.qualcomm.com/models/), compiled and quantized per chipset and pinned to the Hexagon NPU. The fastest path when your model is on Qualcomm AI Hub.

**Qualcomm AI Engine Direct** (also known as the *Qualcomm AI Engine Direct SDK*, *Qualcomm AI Runtime*, and historically *QAIRT*) is the official name. Throughout these docs we use the official name.

| | **llama.cpp** | **Qualcomm AI Engine Direct** |
| --- | --- | --- |
| **Model format** | GGUF (any community model) | Qualcomm AI Hub pre-compiled bundles |
| **Compute units** | NPU / GPU / CPU | NPU only |
| **Precisions (Quantizations) picked by** | You (`Q4_0`, `Q8_0`, `F16`, …) | Pre-quantized in the bundle |
| **Best for** | Bringing your own GGUF from Hugging Face | Highest NPU performance on Qualcomm® AI Hub Models |

Pick **`llama_cpp`** for any GGUF from Hugging Face, or when you need CPU/GPU fallback (e.g. IoT devices without an HTP, the Hexagon Tensor Processor). Pick **`qairt`** for the fastest NPU path on models published to Qualcomm AI Hub.

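
The selection guidance above can be sketched as a small decision helper. This is an illustrative sketch only: `ModelInfo` and `pick_runtime` are hypothetical names and are not part of the GenieX CLI or SDKs. Only the runtime identifiers `llama_cpp` and `qairt` come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical description of the model you want to run."""
    is_gguf: bool                   # a GGUF file, e.g. from Hugging Face
    on_ai_hub: bool                 # a pre-compiled bundle exists on Qualcomm AI Hub
    needs_cpu_or_gpu: bool = False  # target lacks an NPU, or you want CPU/GPU fallback


def pick_runtime(model: ModelInfo) -> str:
    """Return "qairt" or "llama_cpp" following the guidance on this page."""
    # qairt is NPU only, so any CPU/GPU requirement rules it out.
    if model.needs_cpu_or_gpu:
        if not model.is_gguf:
            raise ValueError("CPU/GPU execution needs a GGUF model (llama_cpp)")
        return "llama_cpp"
    # A published bundle is the fastest NPU path.
    if model.on_ai_hub:
        return "qairt"
    # Otherwise bring your own GGUF.
    if model.is_gguf:
        return "llama_cpp"
    raise ValueError("no runtime fits: need a GGUF file or an AI Hub bundle")
```

For example, `pick_runtime(ModelInfo(is_gguf=True, on_ai_hub=True))` returns `"qairt"`, while the same model with `needs_cpu_or_gpu=True` returns `"llama_cpp"`.
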
### Defaults

If you don't pass a compute unit:

| Runtime | Default compute unit |
| --- | --- |
| `llama_cpp` | `hybrid` (HTP + CPU per-tensor scheduling; the fast path on Snapdragon) |
| `qairt` | `npu` |

Some model families override this. For example, `gpt-oss` falls back to `npu` on `llama_cpp` because it can't run on the hybrid path.

***

## **llama.cpp**

The `llama_cpp` runtime executes **any GGUF model** through llama.cpp with Qualcomm's GGML Hexagon backend. Pull any community GGUF from Hugging Face and run it on Snapdragon NPU, Adreno GPU, or pure CPU.

### Compute units

`--compute` maps to the underlying hardware as follows:

| Alias | Effect |
| --- | --- |
| `npu` | Pin to Hexagon NPU (`HTP0`). Best NPU-only path. |
| `gpu` | Adreno GPU via OpenCL. |
| `cpu` | Pure CPU. Forces `nGpuLayers = 0`. |
| `hybrid` *(default)* | Empty `device_id` + `n_gpu_layers=999`: llama.cpp's per-tensor HTP+CPU scheduler. **The fast path on Snapdragon.** |

The precision you pick at `geniex pull` time also determines where the model lands; see [Precisions (Quantizations) Supported](/en/models/supported#precisions-quantizations-supported).

***

## **Qualcomm AI Engine Direct**

The `qairt` runtime executes pre-compiled bundles from **Qualcomm AI Hub** through [Qualcomm® AI Engine Direct](https://www.qualcomm.com/developer/software/qualcomm-ai-engine-direct-sdk). It is NPU-only, with the bundle compiled and quantized for a specific Snapdragon chipset, which is typically the fastest NPU path when your model is on Qualcomm AI Hub.

### Compute units

Qualcomm AI Engine Direct is **NPU only**.

| Alias | Effect |
| --- | --- |
| `npu` *(default)* | Pin `HTP0`, the only supported path. |
| `cpu` / `gpu` | Coerced to `npu` with a warning. Never an error. |

### Runtime constraints

The bundle has its **precision, context length, and KV cache size baked in**; none can be changed at runtime. On Android, `nGpuLayers != 0` and `nCtx != 0` are rejected with `PARAM_NOT_SUPPORTED`; leave both at defaults and tune `max_tokens` / `enable_thinking` only. To change precision or context length, get a different bundle from [Qualcomm AI Hub](https://aihub.qualcomm.com/models/).
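
The compute-unit rules on this page (per-runtime defaults, the `gpt-oss` override, `qairt` coercion, and the Android parameter check) can be summarized in a short Python sketch. It only models the documented behavior: `resolve_compute` and `check_qairt_android_params` are hypothetical helpers, not GenieX APIs, and the points the page leaves unspecified are marked as assumptions in the comments.

```python
import warnings

# Defaults from the table on this page.
DEFAULT_COMPUTE = {"llama_cpp": "hybrid", "qairt": "npu"}
LLAMA_CPP_ALIASES = {"npu", "gpu", "cpu", "hybrid"}
# Families documented as unable to use the hybrid path on llama_cpp.
NO_HYBRID_FAMILIES = {"gpt-oss"}


def resolve_compute(runtime, requested=None, family=None):
    """Return the compute unit a run would end up on (hypothetical helper)."""
    if runtime not in DEFAULT_COMPUTE:
        raise ValueError(f"unknown runtime: {runtime!r}")

    if runtime == "qairt":
        if requested in ("cpu", "gpu"):
            # Documented: coerced to npu with a warning, never an error.
            warnings.warn(f"qairt is NPU only; using 'npu' instead of {requested!r}")
        elif requested not in (None, "npu"):
            # Assumption: the page does not say how other values are handled.
            raise ValueError(f"unsupported compute unit for qairt: {requested!r}")
        return "npu"

    if requested is None:
        # Assumption: the family override applies only when no unit is passed.
        return "npu" if family in NO_HYBRID_FAMILIES else DEFAULT_COMPUTE[runtime]
    if requested not in LLAMA_CPP_ALIASES:
        raise ValueError(f"unsupported compute unit for llama_cpp: {requested!r}")
    return requested


def check_qairt_android_params(n_gpu_layers=0, n_ctx=0):
    """Mirror the Android rule: non-zero nGpuLayers or nCtx is rejected."""
    if n_gpu_layers != 0 or n_ctx != 0:
        # Assumption: the real SDK reports PARAM_NOT_SUPPORTED in its own way.
        raise ValueError("PARAM_NOT_SUPPORTED")
```

Calling `resolve_compute("llama_cpp")` yields `"hybrid"`, `resolve_compute("llama_cpp", family="gpt-oss")` yields `"npu"`, and `resolve_compute("qairt", "gpu")` yields `"npu"` after emitting a warning.
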
