Platforms & runtimes - Qualcomm® AI Hub GenieX

GenieX runs exclusively on Qualcomm Snapdragon; there is no x86 or non-Snapdragon ARM build. To get started, pick the Snapdragon platform you’ll run on, then pick a runtime that matches the model you want to run.

Snapdragon platforms

GenieX is supported across three Snapdragon families — compute, mobile, and IoT — covering Windows ARM64, Android, and Linux ARM64.

OS	Interfaces
Windows ARM64 (Compute / Copilot+ PC)	CLI, Python SDK, Local server
Android (Mobile)	Android SDK (Kotlin, Maven Central)
Linux ARM64 (Dragonwing IoT: cameras, robotics, industrial)	Native install, Docker

For the full list of supported chipsets, see the Qualcomm AI Hub device list.

No device on hand? Sign in to Qualcomm Developer Cloud (QDC) for remote sessions on Snapdragon X, 8 Elite, and Dragonwing QCS9075. See the QDC walkthrough in the FAQ.

GenieX runtimes

GenieX ships with two runtimes so you get both broad model coverage and peak Snapdragon performance in one stack:

llama_cpp — any GGUF model on Hugging Face, running on Hexagon NPU, Adreno GPU, or CPU through Qualcomm’s GGML Hexagon backend. The widest model selection.
qairt (Qualcomm® AI Engine Direct) — pre-compiled bundles from Qualcomm AI Hub, compiled and quantized per chipset and pinned to the Hexagon NPU. The fastest path when your model is on Qualcomm AI Hub.

Qualcomm AI Engine Direct is the official name of this runtime. It is also known as the Qualcomm AI Engine Direct SDK, Qualcomm AI Runtime, and historically QAIRT. These docs use the official name in prose; the runtime identifier you pass to GenieX remains qairt.

	llama.cpp	Qualcomm AI Engine Direct
Model format	GGUF (any community model)	Qualcomm AI Hub pre-compiled bundles
Compute units	NPU / GPU / CPU	NPU only
Precisions (Quantizations) picked by	You (`Q4_0`, `Q8_0`, `F16`, …)	Pre-quantized in the bundle
Best for	Bringing your own GGUF from Hugging Face	Highest NPU performance on Qualcomm® AI Hub Models

Pick llama_cpp for any GGUF from Hugging Face, or when you need CPU/GPU fallback (e.g. IoT devices without a Hexagon Tensor Processor, HTP). Pick qairt for the fastest NPU path on models published to Qualcomm AI Hub.
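
The selection rule above can be sketched as a small helper. This is an illustration only: `choose_runtime` and its parameters are hypothetical names, not part of the GenieX SDK, and the sketch assumes that a model not published on Qualcomm AI Hub is available as a GGUF.

```python
def choose_runtime(on_ai_hub: bool, needs_cpu_or_gpu: bool = False) -> str:
    """Return the runtime name suggested by the guidance above.

    on_ai_hub:        the model is published as a pre-compiled Qualcomm AI Hub bundle
    needs_cpu_or_gpu: the target device needs CPU/GPU fallback (for example, no HTP)
    """
    # qairt is NPU-only, so any CPU/GPU requirement rules it out.
    if needs_cpu_or_gpu or not on_ai_hub:
        return "llama_cpp"
    return "qairt"
```

For example, `choose_runtime(on_ai_hub=True)` returns `"qairt"`, while the same model on a device that needs CPU fallback resolves to `"llama_cpp"`.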

Defaults

If you don’t pass a compute unit:

Runtime	Default compute unit
`llama_cpp`	`hybrid` (HTP + CPU per-tensor scheduling — the fast path on Snapdragon)
`qairt`	`npu`

Some model families override this — for example, gpt-oss falls back to npu on llama_cpp because it can’t run on the hybrid path.
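
As a rough mental model, default resolution might look like the sketch below. The function, the dictionaries, and the substring match on the model name are all assumptions made for illustration; GenieX's actual lookup logic is not documented on this page.

```python
# Illustrative only: none of these names come from the GenieX SDK.
DEFAULT_COMPUTE = {"llama_cpp": "hybrid", "qairt": "npu"}

# (runtime, model family) -> compute unit that replaces the runtime default.
# gpt-oss is the one override named on this page.
FAMILY_OVERRIDES = {("llama_cpp", "gpt-oss"): "npu"}


def resolve_compute_unit(runtime, model_name, requested=None):
    """Pick a compute unit when the caller may not have passed one."""
    if requested is not None:
        # In this sketch an explicit choice is returned unchanged.
        return requested
    for (rt, family), unit in FAMILY_OVERRIDES.items():
        # Assumption: a family is detected by substring match on the model name.
        if rt == runtime and family in model_name.lower():
            return unit
    return DEFAULT_COMPUTE[runtime]
```

The sketch covers only the omitted-argument case; how an explicit request interacts with a family override is not specified here.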

llama.cpp

The llama_cpp runtime executes any GGUF model through llama.cpp with Qualcomm’s GGML Hexagon backend. Pull any community GGUF from Hugging Face and run it on Snapdragon NPU, Adreno GPU, or pure CPU.

Compute units

--compute maps to the underlying hardware as follows:

Alias	Effect
`npu`	Pin to Hexagon NPU (`HTP0`). Best NPU-only path.
`gpu`	Adreno GPU via OpenCL.
`cpu`	Pure CPU. Forces `nGpuLayers = 0`.
`hybrid` (default)	Empty `device_id` + `n_gpu_layers=999` — llama.cpp’s per-tensor HTP+CPU scheduler. The fast path on Snapdragon.
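
The table can be read as a lookup from alias to backend settings. The sketch below encodes only the values the table states and leaves everything else as `None`; the `ComputeSettings` type and `settings_for` helper are hypothetical and exist purely to make the mapping concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComputeSettings:
    device_id: Optional[str]     # None where the table does not state a device id
    n_gpu_layers: Optional[int]  # None where the table does not state a value
    note: str


LLAMA_CPP_COMPUTE = {
    "npu": ComputeSettings("HTP0", None, "Hexagon NPU only"),
    "gpu": ComputeSettings(None, None, "Adreno GPU via OpenCL"),
    "cpu": ComputeSettings(None, 0, "pure CPU"),
    "hybrid": ComputeSettings("", 999, "per-tensor HTP+CPU scheduling"),
}


def settings_for(alias: str = "hybrid") -> ComputeSettings:
    """Look up the settings for a --compute alias; hybrid is the default."""
    try:
        return LLAMA_CPP_COMPUTE[alias]
    except KeyError:
        valid = ", ".join(sorted(LLAMA_CPP_COMPUTE))
        raise ValueError(f"unknown compute alias {alias!r}; expected one of: {valid}") from None
```

Note that `hybrid` uses an empty device id rather than a missing one, which is why the sketch distinguishes `""` from `None`.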

The precision you pick at geniex pull time also determines where the model lands — see Precisions (Quantizations) Supported.

Qualcomm AI Engine Direct

The qairt runtime executes pre-compiled bundles from Qualcomm AI Hub through Qualcomm® AI Engine Direct. NPU-only, with the bundle compiled and quantized for a specific Snapdragon chipset — typically the fastest NPU path when your model is on Qualcomm AI Hub.

Compute units

Qualcomm AI Engine Direct is NPU only.

Alias	Effect
`npu` (default)	Pin `HTP0` — the only supported path.
`cpu` / `gpu`	Coerced to `npu` with a warning. Never an error.
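
The coercion behaviour in the table can be modelled in a few lines. This is a sketch, not the SDK's implementation: `coerce_qairt_compute` is a hypothetical name, the warning text is invented, and raising on aliases the table does not list is an assumption.

```python
import warnings


def coerce_qairt_compute(requested=None):
    """Return the compute unit qairt would use for a requested alias."""
    if requested in (None, "npu"):
        return "npu"
    if requested in ("cpu", "gpu"):
        # Per the table: coerced with a warning, never an error.
        warnings.warn(
            f"qairt is NPU-only; compute unit {requested!r} coerced to 'npu'",
            stacklevel=2,
        )
        return "npu"
    # Assumption: the page does not say how other aliases are handled.
    raise ValueError(f"unknown compute alias {requested!r}")
```

Callers that pass `cpu` or `gpu` still get a working NPU session, so code shared between the two runtimes does not need a special case for qairt.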

Runtime constraints

The bundle has its precision, context length, and KV cache size baked in — none can be changed at runtime. On Android, nGpuLayers != 0 and nCtx != 0 are rejected with PARAM_NOT_SUPPORTED; leave both at defaults and tune max_tokens / enable_thinking only. To change precision or context length, get a different bundle from Qualcomm AI Hub.
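
To make the Android constraint concrete, here is a Python model of the check. The real check lives in the Android SDK (Kotlin); the exception class, function name, and message below are hypothetical, and only the parameter names and the `PARAM_NOT_SUPPORTED` code come from this page.

```python
class ParamNotSupported(Exception):
    """Stand-in for the PARAM_NOT_SUPPORTED error described above."""
    code = "PARAM_NOT_SUPPORTED"


def validate_qairt_params(n_gpu_layers=0, n_ctx=0):
    """Reject any non-default value for the settings baked into the bundle."""
    fixed = {"nGpuLayers": n_gpu_layers, "nCtx": n_ctx}
    offending = [name for name, value in fixed.items() if value != 0]
    if offending:
        raise ParamNotSupported(
            f"{ParamNotSupported.code}: {', '.join(offending)} must stay at 0 for qairt bundles"
        )
```

Calling `validate_qairt_params()` with no arguments passes, while `validate_qairt_params(n_ctx=4096)` raises, mirroring the advice to leave both values at their defaults.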
