# Quickstart

> Run your first model on Windows ARM64 with the GenieX Python SDK.

## **Prerequisites**

* The Python SDK is installed; see [Installation](/cn/run/python/install).
* You are familiar with [choosing a runtime](/cn/get-started/platforms#geniex-运行环境): `qairt` for Qualcomm AI Hub models, `llama_cpp` for any GGUF model.
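
The runtime choice above can be expressed as a small helper. This is a minimal sketch, not part of the SDK: `pick_device_map` is a hypothetical name, the `ai-hub-models/` prefix and the `-GGUF` suffix are naming conventions inferred from the examples on this page, and passing `"llama_cpp"` as a `device_map` value assumes the `"<runtime>"` form accepts that runtime name.

```python
def pick_device_map(model_ref: str) -> str:
    """Guess a device_map value from a model reference (illustrative heuristic)."""
    # Assumption: Qualcomm AI Hub bundles use the "ai-hub-models/" namespace,
    # as in the examples on this page.
    if model_ref.startswith("ai-hub-models/"):
        return "qairt"
    # Assumption: a local .gguf file, or a repo whose name ends in "-GGUF",
    # is a GGUF model that the llama_cpp runtime can load.
    lowered = model_ref.lower()
    if lowered.endswith(".gguf") or lowered.endswith("-gguf"):
        return "llama_cpp"
    # Anything else: let the SDK decide.
    return "auto"


for ref in ("ai-hub-models/Qwen3-4B", "Qwen/Qwen3-0.6B-GGUF", "some/other-model"):
    print(ref, "->", pick_device_map(ref))
```

The result would be passed as `device_map=` to `from_pretrained()`; when in doubt, `"auto"` leaves the decision to the SDK.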

The SDK follows the design of Hugging Face `transformers`: load a model with `AutoModelForCausalLM.from_pretrained()`, then call `.generate()`.

## **LLM inference (GGUF)**

Any GGUF model from Hugging Face can run through the `llama_cpp` runtime. Model weights are downloaded automatically on first use.

```python theme={"dark"}
from geniex import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-GGUF",     # HF repo id of a GGUF model, or a local .gguf path
    device_map="auto",          # "auto" | "cpu" | "gpu" | "npu" | "hybrid"
                                # | "<runtime>" | "<runtime>:<compute-unit>"
                                # auto -> hybrid for llama_cpp, npu for qairt
)

messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = model.tokenizer.apply_chat_template(
    messages, add_generation_prompt=True,
)

# Single-shot generation
output = model.generate(prompt, max_new_tokens=256)
print(output.text)
print(f"[{output.profile.generated_tokens} tok, "
      f"{output.profile.decode_speed:.1f} tok/s, stop={output.profile.stop_reason}]")

# Streaming generation
streamer = model.generate(prompt, max_new_tokens=256, stream=True)
for chunk in streamer:
    print(chunk, end="", flush=True)

model.close()
```
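
The example calls `model.close()` as its last statement, so an exception during generation would skip it. One way to guard against that is the standard library's `contextlib.closing`, which calls `.close()` on exit from the `with` block. The sketch below uses a hypothetical `FakeModel` stand-in so it runs without the SDK; it assumes streamed chunks are plain strings, and `stream_to_text` is an illustrative helper, not an SDK function.

```python
from contextlib import closing


class FakeModel:
    """Hypothetical stand-in for a loaded model; not part of the GenieX SDK."""

    def __init__(self):
        self.closed = False

    def generate(self, prompt, max_new_tokens=256, stream=False):
        # Yields text chunks, imitating the streaming loop shown above.
        yield from ["2 + 2 ", "= ", "4"]

    def close(self):
        self.closed = True


def stream_to_text(model, prompt, max_new_tokens=256):
    """Print chunks as they arrive and return the concatenated text."""
    parts = []
    for chunk in model.generate(prompt, max_new_tokens=max_new_tokens, stream=True):
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)


fake = FakeModel()
with closing(fake) as model:  # close() runs even if generation raises
    text = stream_to_text(model, "What is 2+2?")
```

With a real model, the object returned by `from_pretrained()` would take the place of `fake`.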

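## **LLM inference (QAIRT)**

Precompiled model bundles from [Qualcomm AI Hub](https://aihub.qualcomm.com/models/) run entirely on the Hexagon NPU through the `qairt` runtime. Use `device_map="qairt"` (or `"npu"`). Model weights are downloaded automatically on first use.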

```python theme={"dark"}
from geniex import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ai-hub-models/Qwen3-4B",   # Qualcomm AI Hub model ID
    device_map="qairt",         # run on the NPU only
)

messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = model.tokenizer.apply_chat_template(
    messages, add_generation_prompt=True,
)

# Single-shot generation
output = model.generate(prompt, max_new_tokens=256)
print(output.text)
print(f"[{output.profile.generated_tokens} tok, "
      f"{output.profile.decode_speed:.1f} tok/s, stop={output.profile.stop_reason}]")

# Streaming generation
streamer = model.generate(prompt, max_new_tokens=256, stream=True)
for chunk in streamer:
    print(chunk, end="", flush=True)

model.close()
```
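
When comparing runtimes, a client-side measurement can complement the `output.profile` fields printed above. The following sketch times any iterable of text chunks, so it works with the streaming loop as written; `timed_stream` is a hypothetical helper, and it counts chunks, which are not necessarily tokens.

```python
import time


def timed_stream(chunks):
    """Consume text chunks; return (text, seconds_to_first_chunk, chunk_count)."""
    start = time.perf_counter()
    first_chunk_delay = None
    parts = []
    for chunk in chunks:
        if first_chunk_delay is None:
            # Latency until the first chunk arrives, measured on the client.
            first_chunk_delay = time.perf_counter() - start
        parts.append(chunk)
    return "".join(parts), first_chunk_delay, len(parts)


# Stand-in stream; with the SDK this would be the streamer returned by
# model.generate(prompt, max_new_tokens=256, stream=True).
text, delay, count = timed_stream(iter(["Hello", ", ", "world"]))
print(f"{count} chunks, first chunk after {delay * 1000:.2f} ms")
```

If the stream yields nothing, the delay is returned as `None`, so callers should check for that before formatting it.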

## **VLM inference (QAIRT)**

First, download the sample image:

```bash theme={"dark"}
curl -o demo.jpg https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-geniex/demo.jpg
```

Then run inference:

```python theme={"dark"}
import os
from geniex import AutoModelForCausalLM

image_path = os.path.abspath("demo.jpg")

model = AutoModelForCausalLM.from_pretrained(
    "ai-hub-models/Qwen2.5-VL-7B-Instruct",  # Qualcomm AI Hub VLM model bundle
    device_map="qairt",
)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image_path},
        {"type": "text", "text": "Describe the image."},
    ],
}]
prompt = model.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

streamer = model.generate(prompt, images=[image_path], max_new_tokens=256, stream=True)
for chunk in streamer:
    print(chunk, end="", flush=True)

model.close()
```
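
The same image path appears twice in the example above, once inside `messages` and once in `images=[...]`. A small helper can build both from one input and fail early if the file is missing. This is a sketch under stated assumptions: `build_image_messages` is a hypothetical name, and the message shape is copied from the example rather than taken from an SDK schema.

```python
import os
import tempfile


def build_image_messages(image_path, question):
    """Return (absolute_path, messages) in the shape used by the VLM example."""
    abs_path = os.path.abspath(image_path)
    if not os.path.isfile(abs_path):
        # Fail before loading the model if the download step was skipped.
        raise FileNotFoundError(f"image not found: {abs_path}")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": abs_path},
            {"type": "text", "text": question},
        ],
    }]
    return abs_path, messages


# Demonstration with a temporary placeholder file instead of demo.jpg.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    demo_path = tmp.name
abs_path, messages = build_image_messages(demo_path, "Describe the image.")
os.remove(demo_path)
print(messages[0]["content"][1]["text"])
```

The returned `abs_path` would then be passed as `images=[abs_path]` to `model.generate()`, keeping the two references in sync.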

## **Jupyter Notebook tutorial**

Laptop users can follow the step-by-step Jupyter Notebook at [examples/python/windows.ipynb](https://github.com/qualcomm/GenieX/blob/main/examples/python/windows.ipynb), which covers environment setup and end-to-end inference.

## **Next steps**

<CardGroup cols={2}>
  <Card title="API reference" href="/cn/run/python/api-reference" icon="book">
    All classes, methods, and parameters of the Python SDK.
  </Card>

  <Card title="Models" href="/cn/models/supported" icon="cube">
    Supported models, GGUF models on Hugging Face, and Qualcomm AI Engine Direct model bundles that you convert yourself.
  </Card>
</CardGroup>

