GenieX runs exclusively on Qualcomm Snapdragon; there is no x86 or non-Snapdragon ARM build. To get started, pick the Snapdragon platform you’ll run on, then pick a runtime that matches the model you want to run.
GenieX is supported across three Snapdragon families — compute, mobile, and IoT — covering Windows ARM64, Android, and Linux ARM64.
| OS | Interfaces |
|---|---|
| Windows ARM64 (Compute / Copilot+ PC) | |
| Android (Mobile) | |
| Linux ARM64 (Dragonwing IoT — cameras, robotics, industrial) | |
For the full list of supported chipsets, see the Qualcomm AI Hub device list.
GenieX runtimes
GenieX ships with two runtimes so you get both broad model coverage and peak Snapdragon performance in one stack:
llama_cpp — any GGUF model on Hugging Face, running on Hexagon NPU, Adreno GPU, or CPU through Qualcomm’s GGML Hexagon backend. The widest model selection.
qairt (Qualcomm® AI Engine Direct) — pre-compiled bundles from Qualcomm AI Hub, compiled and quantized per chipset and pinned to the Hexagon NPU. The fastest path when your model is on Qualcomm AI Hub.
Qualcomm AI Engine Direct is the official name; it is also known as the Qualcomm AI Engine Direct SDK, Qualcomm AI Runtime, and historically QAIRT. Throughout these docs we use the official name in prose, while the runtime itself is still selected as qairt.
| | llama.cpp | Qualcomm AI Engine Direct |
|---|---|---|
| Model format | GGUF (any community model) | Qualcomm AI Hub pre-compiled bundles |
| Compute units | NPU / GPU / CPU | NPU only |
| Precisions (Quantizations) picked by | You (Q4_0, Q8_0, F16, …) | Pre-quantized in the bundle |
| Best for | Bringing your own GGUF from Hugging Face | Highest NPU performance on Qualcomm® AI Hub Models |
Pick llama_cpp for any GGUF from Hugging Face, or when you need CPU/GPU fallback (e.g. IoT devices without HTP). Pick qairt for the fastest NPU path on models published to Qualcomm AI Hub.
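The selection rule above can be written down as a small decision helper. This is only a sketch of the guidance in prose form: `choose_runtime` and the format labels `"gguf"` and `"ai_hub_bundle"` are hypothetical names introduced for illustration, not part of GenieX.

```python
def choose_runtime(model_format: str, needs_cpu_or_gpu: bool = False) -> str:
    """Return "llama_cpp" or "qairt" following the guidance above.

    model_format is a hypothetical label: "gguf" for a Hugging Face GGUF,
    "ai_hub_bundle" for a Qualcomm AI Hub pre-compiled bundle.
    """
    fmt = model_format.strip().lower()
    if fmt == "gguf":
        # llama_cpp covers NPU, GPU, and CPU, so fallback is always available.
        return "llama_cpp"
    if fmt == "ai_hub_bundle":
        if needs_cpu_or_gpu:
            # qairt is NPU only; a bundle cannot satisfy a CPU/GPU requirement.
            raise ValueError("AI Hub bundles run on the NPU only; use a GGUF build")
        return "qairt"
    raise ValueError(f"unknown model format: {model_format!r}")
```

For example, `choose_runtime("gguf", needs_cpu_or_gpu=True)` selects `llama_cpp`, which matches the advice for IoT devices without HTP.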
Defaults
If you don’t pass a compute unit:
| Runtime | Default compute unit |
|---|---|
| llama_cpp | hybrid (HTP + CPU per-tensor scheduling; the fast path on Snapdragon) |
| qairt | npu |
Some model families override this — for example, gpt-oss falls back to npu on llama_cpp because it can’t run on the hybrid path.
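The default-resolution behavior described here can be modeled in a few lines. This is a minimal sketch, not GenieX’s implementation: `resolve_compute` and the override table are hypothetical, and it assumes an explicitly passed compute unit is used as given, which the text above does not spell out.

```python
from typing import Optional

# Defaults taken from the table above.
DEFAULT_COMPUTE = {"llama_cpp": "hybrid", "qairt": "npu"}

# Hypothetical override table. Only the gpt-oss entry is stated in the docs.
FAMILY_OVERRIDES = {("llama_cpp", "gpt-oss"): "npu"}


def resolve_compute(runtime: str,
                    requested: Optional[str] = None,
                    model_family: Optional[str] = None) -> str:
    """Pick the compute unit when the caller may not have passed one."""
    if requested is not None:
        # Assumption: an explicit choice is passed through unchanged.
        return requested
    # A family-specific override wins over the runtime default.
    return FAMILY_OVERRIDES.get((runtime, model_family), DEFAULT_COMPUTE[runtime])
```

With no compute unit passed, `resolve_compute("llama_cpp")` yields `hybrid`, while `resolve_compute("llama_cpp", model_family="gpt-oss")` yields `npu`.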
llama.cpp
The llama_cpp runtime executes any GGUF model through llama.cpp with Qualcomm’s GGML Hexagon backend. Pull any community GGUF from Hugging Face and run it on Snapdragon NPU, Adreno GPU, or pure CPU.
Compute units
--compute maps to the underlying hardware as follows:
| Alias | Effect |
|---|---|
| npu | Pin to Hexagon NPU (HTP0). Best NPU-only path. |
| gpu | Adreno GPU via OpenCL. |
| cpu | Pure CPU. Forces nGpuLayers = 0. |
| hybrid (default) | Empty device_id + n_gpu_layers=999: llama.cpp’s per-tensor HTP+CPU scheduler. The fast path on Snapdragon. |
The precision you pick at geniex pull time also determines where the model lands — see Precisions (Quantizations) Supported.
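To make the alias table easier to reason about, here is a sketch that encodes it as a lookup. `ComputePlan`, `LLAMA_CPP_COMPUTE`, and `plan_for` are hypothetical names; only values the table states are filled in, and `None` marks a setting the table does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComputePlan:
    target: str                          # hardware described in the table
    device_id: Optional[str] = None      # None = not specified by the table
    n_gpu_layers: Optional[int] = None   # None = not specified by the table


LLAMA_CPP_COMPUTE = {
    "npu": ComputePlan("Hexagon NPU", device_id="HTP0"),
    "gpu": ComputePlan("Adreno GPU via OpenCL"),
    "cpu": ComputePlan("CPU", n_gpu_layers=0),
    # Empty device_id plus 999 layers selects the per-tensor HTP+CPU scheduler.
    "hybrid": ComputePlan("HTP + CPU", device_id="", n_gpu_layers=999),
}


def plan_for(alias: str = "hybrid") -> ComputePlan:
    """Look up the settings for a compute alias; hybrid is the default."""
    try:
        return LLAMA_CPP_COMPUTE[alias]
    except KeyError:
        raise ValueError(f"unknown compute alias: {alias!r}") from None
```

Keeping unspecified settings as `None` rather than guessing a value avoids implying behavior the table does not document.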
Qualcomm AI Engine Direct
The qairt runtime executes pre-compiled bundles from Qualcomm AI Hub through Qualcomm® AI Engine Direct. NPU-only, with the bundle compiled and quantized for a specific Snapdragon chipset — typically the fastest NPU path when your model is on Qualcomm AI Hub.
Compute units
Qualcomm AI Engine Direct is NPU only.
| Alias | Effect |
|---|---|
| npu (default) | Pin to HTP0, the only supported path. |
| cpu / gpu | Coerced to npu with a warning. Never an error. |
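The coercion rule in the table amounts to "warn, then run on the NPU anyway". A minimal sketch of that behavior follows; `coerce_qairt_compute` is a hypothetical helper, and rejecting aliases other than npu, cpu, and gpu is an assumption, since the table does not say how they are handled.

```python
import warnings
from typing import Optional


def coerce_qairt_compute(requested: Optional[str] = None) -> str:
    """Return the compute unit qairt will actually use."""
    if requested is None or requested == "npu":
        return "npu"
    if requested in ("cpu", "gpu"):
        # Documented behavior: coerce to npu with a warning, never an error.
        warnings.warn(
            f"qairt is NPU only; ignoring compute={requested!r} and using npu",
            stacklevel=2,
        )
        return "npu"
    # Assumption: any other alias is treated as a caller mistake.
    raise ValueError(f"unknown compute alias: {requested!r}")
```

Callers that treat warnings as errors (for example under `python -W error`) would see the coercion surface as an exception, so the sketch only mirrors the documented behavior under default warning filters.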
Runtime constraints
The bundle has its precision, context length, and KV cache size baked in; none of them can be changed at runtime. On Android, a non-zero nGpuLayers or nCtx is rejected with PARAM_NOT_SUPPORTED, so leave both at their defaults and tune only max_tokens and enable_thinking. To change precision or context length, get a different bundle from Qualcomm AI Hub.
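A pre-flight check can catch the Android restriction before a request reaches the runtime. The sketch below is illustrative only: `ParamNotSupported` and `check_qairt_android_params` are hypothetical names, and it reads the text as meaning that 0 is the default for both nGpuLayers and nCtx.

```python
class ParamNotSupported(ValueError):
    """Stand-in for the PARAM_NOT_SUPPORTED error returned by the runtime."""


def check_qairt_android_params(n_gpu_layers: int = 0, n_ctx: int = 0) -> None:
    """Raise if a setting baked into the bundle is overridden on Android."""
    candidates = (("nGpuLayers", n_gpu_layers), ("nCtx", n_ctx))
    # Any non-zero value counts as an attempted override.
    rejected = [name for name, value in candidates if value != 0]
    if rejected:
        raise ParamNotSupported("PARAM_NOT_SUPPORTED: " + ", ".join(rejected))
```

Calling `check_qairt_android_params()` with no arguments passes silently, while `check_qairt_android_params(n_ctx=4096)` raises and names nCtx in the message.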