Prerequisites
- The Python SDK installed — see Install.
- Familiarity with runtime choice: qairt for Qualcomm AI Hub Models, llama_cpp for any GGUF.
transformers — load with AutoModelForCausalLM.from_pretrained(), then call .generate().
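The runtime choice described in the prerequisites can be sketched as a small helper. This is an illustration, not part of the SDK: the function name pick_runtime, the rule that a GGUF model is recognised by "gguf" in its name, and the model references are all hypothetical.

```python
def pick_runtime(model_ref: str) -> str:
    """Return the runtime name suggested for a model reference.

    Assumption: a GGUF model can be recognised by "gguf" appearing in its
    file or repository name; anything else is treated as a Qualcomm AI Hub
    bundle. Adjust the check to match how your models are named.
    """
    if "gguf" in model_ref.lower():
        return "llama_cpp"
    return "qairt"


# Hypothetical model references, used only for illustration.
print(pick_runtime("example-org/example-model-GGUF"))    # llama_cpp
print(pick_runtime("example-org/example-model-bundle"))  # qairt
```

A name-based check is only a heuristic; if you already know which kind of model you have, choose the runtime explicitly.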
LLM inference (GGUF)
Any GGUF model from Hugging Face runs via llama_cpp. Model weights are downloaded on first use.
LLM inference (QAIRT)
Pre-compiled bundles from Qualcomm AI Hub run entirely on the Hexagon NPU via the qairt runtime. Use device_map="qairt" (or "npu"). Model weights are downloaded on first use.
VLM inference (QAIRT)
Download a sample image first:
Jupyter notebook walkthrough
For laptop users, follow the step-by-step Jupyter notebook at examples/python/windows.ipynb. It covers environment setup and inference end-to-end.
Next steps
API reference
All classes, methods, and parameters for the Python SDK.
Models
Supported models, GGUF on Hugging Face, and self-converted Qualcomm AI Engine Direct bundles.