GenieX runs exclusively on Qualcomm Snapdragon; there is no x86 or non-Snapdragon ARM build. To get started, pick the Snapdragon platform you’ll run on, then pick the runtime that matches the model you want to run.

Snapdragon platforms

GenieX is supported across three Snapdragon families — compute, mobile, and IoT — covering Windows ARM64, Android, and Linux ARM64.
| OS | Interfaces |
| Windows ARM64 (Compute / Copilot+ PC) | |
| Android (Mobile) | |
| Linux ARM64 (Dragonwing IoT: cameras, robotics, industrial) | |
For the full list of supported chipsets, see the Qualcomm AI Hub device list.
No device on hand? Sign in to Qualcomm Developer Cloud (QDC) for remote sessions on Snapdragon X, 8 Elite, and Dragonwing QCS9075. See the QDC walkthrough in the FAQ.

GenieX runtimes

GenieX ships with two runtimes so you get both broad model coverage and peak Snapdragon performance in one stack:
  • llama_cpp — any GGUF model on Hugging Face, running on Hexagon NPU, Adreno GPU, or CPU through Qualcomm’s GGML Hexagon backend. The widest model selection.
  • qairt (Qualcomm® AI Engine Direct) — pre-compiled bundles from Qualcomm AI Hub, compiled and quantized per chipset and pinned to the Hexagon NPU. The fastest path when your model is on Qualcomm AI Hub.
Qualcomm AI Engine Direct is the official name of the technology behind the qairt runtime. It is also known as the Qualcomm AI Engine Direct SDK and Qualcomm AI Runtime, and historically as QAIRT. These docs use the official name throughout.
| | llama.cpp | Qualcomm AI Engine Direct |
| Model format | GGUF (any community model) | Qualcomm AI Hub pre-compiled bundles |
| Compute units | NPU / GPU / CPU | NPU only |
| Precisions (Quantizations) picked by | You (Q4_0, Q8_0, F16, …) | Pre-quantized in the bundle |
| Best for | Bringing your own GGUF from Hugging Face | Highest NPU performance on Qualcomm® AI Hub Models |
Pick llama_cpp for any GGUF from Hugging Face, or when you need CPU/GPU fallback (e.g. IoT devices without HTP). Pick qairt for the fastest NPU path on models published to Qualcomm AI Hub.
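The selection rule above can be written as a small decision function. This is a minimal sketch for illustration only: `pick_runtime` and its parameters are hypothetical names, not part of GenieX.

```python
def pick_runtime(on_ai_hub: bool, needs_cpu_or_gpu: bool = False) -> str:
    """Return 'qairt' or 'llama_cpp' following the guidance above.

    Hypothetical helper for illustration; not a GenieX API.
    """
    # qairt is NPU only, so any CPU/GPU requirement rules it out.
    if needs_cpu_or_gpu:
        return "llama_cpp"
    # A pre-compiled Qualcomm AI Hub bundle is the fastest NPU path when one exists.
    if on_ai_hub:
        return "qairt"
    # Otherwise bring a community GGUF from Hugging Face.
    return "llama_cpp"


print(pick_runtime(on_ai_hub=True))                         # qairt
print(pick_runtime(on_ai_hub=True, needs_cpu_or_gpu=True))  # llama_cpp
print(pick_runtime(on_ai_hub=False))                        # llama_cpp
```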

Defaults

If you don’t pass a compute unit:
| Runtime | Default compute unit |
| llama_cpp | hybrid (HTP + CPU per-tensor scheduling; the fast path on Snapdragon) |
| qairt | npu |
Some model families override this — for example, gpt-oss falls back to npu on llama_cpp because it can’t run on the hybrid path.
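To make the default resolution concrete, here is a sketch of how a caller might model it. The function name, the override table layout, and the family key `"gpt-oss"` as a lookup string are assumptions. The sketch also assumes an explicitly passed compute unit is returned unchanged, which the text above does not confirm.

```python
from typing import Optional

# Per-runtime defaults, as documented above.
DEFAULT_COMPUTE = {"llama_cpp": "hybrid", "qairt": "npu"}

# Model-family overrides of the default. Only the gpt-oss case is documented;
# the (runtime, family) key layout is an assumption made for this sketch.
FAMILY_OVERRIDES = {("llama_cpp", "gpt-oss"): "npu"}


def resolve_compute(runtime: str,
                    compute: Optional[str] = None,
                    family: Optional[str] = None) -> str:
    """Pick the compute unit used when the caller may have omitted one."""
    if compute is not None:
        # Assumption: an explicit choice is passed through unchanged.
        return compute
    return FAMILY_OVERRIDES.get((runtime, family), DEFAULT_COMPUTE[runtime])


print(resolve_compute("llama_cpp"))                    # hybrid
print(resolve_compute("llama_cpp", family="gpt-oss"))  # npu
print(resolve_compute("qairt"))                        # npu
```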

llama.cpp

The llama_cpp runtime executes any GGUF model through llama.cpp with Qualcomm’s GGML Hexagon backend. Pull any community GGUF from Hugging Face and run it on Snapdragon NPU, Adreno GPU, or pure CPU.

Compute units

--compute maps to the underlying hardware as follows:
| Alias | Effect |
| npu | Pin to Hexagon NPU (HTP0). Best NPU-only path. |
| gpu | Adreno GPU via OpenCL. |
| cpu | Pure CPU. Forces nGpuLayers = 0. |
| hybrid (default) | Empty device_id + n_gpu_layers=999: llama.cpp’s per-tensor HTP+CPU scheduler. The fast path on Snapdragon. |
The precision you pick at geniex pull time also determines where the model lands — see Precisions (Quantizations) Supported.
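As a reading aid, the --compute aliases for llama_cpp can be expressed as a lookup table. This is a sketch, not the GenieX implementation: the dictionary keys (`device`, `backend`, `n_gpu_layers`) are illustrative labels chosen here, and only values stated in the alias table are filled in.

```python
# Keys such as "device" and "backend" are illustrative labels, not GenieX
# or llama.cpp option names; the values come from the alias table.
LLAMA_CPP_COMPUTE = {
    "npu": {"device": "HTP0"},                      # pinned to the Hexagon NPU
    "gpu": {"backend": "OpenCL"},                   # Adreno GPU
    "cpu": {"n_gpu_layers": 0},                     # nothing offloaded
    "hybrid": {"device": "", "n_gpu_layers": 999},  # per-tensor HTP+CPU scheduling
}


def placement_for(alias: str = "hybrid") -> dict:
    """Return the settings implied by a --compute alias (hybrid by default)."""
    if alias not in LLAMA_CPP_COMPUTE:
        raise ValueError(f"unknown compute alias: {alias!r}")
    return dict(LLAMA_CPP_COMPUTE[alias])  # copy so callers cannot mutate the table


print(placement_for())       # {'device': '', 'n_gpu_layers': 999}
print(placement_for("cpu"))  # {'n_gpu_layers': 0}
```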

Qualcomm AI Engine Direct

The qairt runtime executes pre-compiled bundles from Qualcomm AI Hub through Qualcomm® AI Engine Direct. NPU-only, with the bundle compiled and quantized for a specific Snapdragon chipset — typically the fastest NPU path when your model is on Qualcomm AI Hub.

Compute units

Qualcomm AI Engine Direct is NPU only.
| Alias | Effect |
| npu (default) | Pin HTP0, the only supported path. |
| cpu / gpu | Coerced to npu with a warning. Never an error. |
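The coerce-with-a-warning behavior can be sketched as follows. The function name and warning text are hypothetical. How the runtime treats values other than npu, cpu, and gpu is not documented here, so raising for them is an assumption of this sketch.

```python
import warnings
from typing import Optional


def coerce_qairt_compute(compute: Optional[str] = None) -> str:
    """Map a requested compute unit to what the qairt runtime would use."""
    if compute is None or compute == "npu":
        return "npu"  # the default and only supported path
    if compute in ("cpu", "gpu"):
        # Documented behavior: warn and continue on the NPU, never fail.
        warnings.warn(f"qairt is NPU only; {compute!r} coerced to 'npu'",
                      stacklevel=2)
        return "npu"
    # Assumption: anything else is treated as a caller mistake.
    raise ValueError(f"unknown compute alias: {compute!r}")


print(coerce_qairt_compute())       # npu
print(coerce_qairt_compute("gpu"))  # npu (and emits a UserWarning)
```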

Runtime constraints

The bundle has its precision, context length, and KV cache size baked in — none can be changed at runtime. On Android, nGpuLayers != 0 and nCtx != 0 are rejected with PARAM_NOT_SUPPORTED; leave both at defaults and tune max_tokens / enable_thinking only. To change precision or context length, get a different bundle from Qualcomm AI Hub.
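A client-side pre-check can catch the rejected parameters before a request reaches the device. The sketch below is hypothetical. It assumes that 0 is the default for both nGpuLayers and nCtx and that either nonzero value is rejected on its own. `ParamNotSupported` is a stand-in for the PARAM_NOT_SUPPORTED error, not a real GenieX type.

```python
class ParamNotSupported(ValueError):
    """Stand-in for the PARAM_NOT_SUPPORTED error code (hypothetical type)."""


def check_qairt_params(n_gpu_layers: int = 0, n_ctx: int = 0) -> None:
    """Raise if a baked-in bundle setting is overridden.

    Assumes 0 means "use the bundle's value" for both parameters.
    """
    rejected = [name
                for name, value in (("nGpuLayers", n_gpu_layers), ("nCtx", n_ctx))
                if value != 0]
    if rejected:
        raise ParamNotSupported("PARAM_NOT_SUPPORTED: " + ", ".join(rejected))


check_qairt_params()  # defaults pass silently

try:
    check_qairt_params(n_ctx=4096)
except ParamNotSupported as err:
    print(err)  # PARAM_NOT_SUPPORTED: nCtx
```

Settings such as max_tokens and enable_thinking are outside this check because they remain tunable at runtime.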