# Quickstart

> Run your first model on Windows ARM64 with the GenieX Python SDK.

## **Prerequisites**

* The Python SDK is installed. See [Installation](/cn/run/python/install).
* You understand how to [choose a runtime](/cn/get-started/platforms#geniex-运行环境): `qairt` for Qualcomm AI Hub models, `llama_cpp` for any GGUF model.

The SDK follows the design of Hugging Face `transformers`. You load a model with `AutoModelForCausalLM.from_pretrained()` and then call `.generate()`.

## **LLM Inference (GGUF)**

Any GGUF model from Hugging Face can run through the `llama_cpp` runtime. Model weights are downloaded automatically on first use.

```python theme={"dark"}
from geniex import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-GGUF",  # HF repo id of a GGUF model, or a local .gguf path
    device_map="auto",       # "auto" | "cpu" | "gpu" | "npu" | "hybrid"
                             # | "" | ":"
                             # auto -> hybrid for llama_cpp, npu for qairt
)

messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = model.tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
)

# Single-shot generation
output = model.generate(prompt, max_new_tokens=256)
print(output.text)
print(f"[{output.profile.generated_tokens} tok, "
      f"{output.profile.decode_speed:.1f} tok/s, stop={output.profile.stop_reason}]")

# Streaming generation
streamer = model.generate(prompt, max_new_tokens=256, stream=True)
for chunk in streamer:
    print(chunk, end="", flush=True)

model.close()
```

## **LLM Inference (QAIRT)**

Precompiled model packages from [Qualcomm AI Hub](https://aihub.qualcomm.com/models/) run entirely on the Hexagon NPU through the `qairt` runtime. Use `device_map="qairt"` (or `"npu"`). Model weights are downloaded automatically on first use.

```python theme={"dark"}
from geniex import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ai-hub-models/Qwen3-4B",  # Qualcomm AI Hub model ID
    device_map="qairt",        # runs on the NPU only
)

messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = model.tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
)

# Single-shot generation
output = model.generate(prompt, max_new_tokens=256)
print(output.text)
print(f"[{output.profile.generated_tokens} tok, "
      f"{output.profile.decode_speed:.1f} tok/s, stop={output.profile.stop_reason}]")

# Streaming generation
streamer = model.generate(prompt, max_new_tokens=256, stream=True)
for chunk in streamer:
    print(chunk, end="", flush=True)

model.close()
```

## **VLM Inference (QAIRT)**

First, download the sample image:

```bash theme={"dark"}
curl -o demo.jpg https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-geniex/demo.jpg
```

Then run inference:

```python theme={"dark"}
import os

from geniex import AutoModelForCausalLM

image_path = os.path.abspath("demo.jpg")

model = AutoModelForCausalLM.from_pretrained(
    "ai-hub-models/Qwen2.5-VL-7B-Instruct",  # Qualcomm AI Hub VLM model package
    device_map="qairt",
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image_path},
        {"type": "text", "text": "Describe the image."},
    ],
}]
prompt = model.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

streamer = model.generate(prompt, images=[image_path], max_new_tokens=256, stream=True)
for chunk in streamer:
    print(chunk, end="", flush=True)

model.close()
```

## **Jupyter Notebook Tutorial**

For laptop users, [examples/python/windows.ipynb](https://github.com/qualcomm/GenieX/blob/main/examples/python/windows.ipynb) provides a step-by-step Jupyter Notebook that covers environment setup and end-to-end inference.

## **Next Steps**

* All classes, methods, and parameters of the Python SDK.
* Supported models, GGUF models on Hugging Face, and Qualcomm AI Engine Direct model packages that you convert yourself.
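The streaming loops in the examples print each chunk but do not keep the full response. If you also need the complete text afterwards (for example, to log it), a small helper can echo and accumulate the chunks in one pass. This is a minimal sketch. `stream_and_collect` is a hypothetical helper, not part of the GenieX API, and it assumes that each chunk yielded by the streamer is a `str`, which is what the `print(chunk, end="", flush=True)` loops suggest.

```python
from typing import Iterable


def stream_and_collect(streamer: Iterable[str]) -> str:
    """Print streamed chunks as they arrive and return the full text.

    Hypothetical helper (not part of GenieX); assumes each chunk is a str.
    """
    parts = []
    for chunk in streamer:
        print(chunk, end="", flush=True)  # show each chunk immediately
        parts.append(chunk)               # keep it for the final result
    print()  # end the output line once the stream is exhausted
    return "".join(parts)
```

With a loaded `model` and `prompt` from one of the examples, you would call `reply = stream_and_collect(model.generate(prompt, max_new_tokens=256, stream=True))` in place of the `for` loop. You would still call `model.close()` afterwards.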
