# Quickstart

> Run your first model with the GenieX Python SDK on Windows ARM64.

## **Prerequisites**

* The Python SDK installed — see [Install](/en/run/python/install).
* Familiarity with the [runtime choice](/en/get-started/platforms#geniex-runtimes): `qairt` for Qualcomm AI Hub Models, `llama_cpp` for any GGUF model.

The SDK follows the same design as Hugging Face `transformers` — load with `AutoModelForCausalLM.from_pretrained()`, then call `.generate()`.

## **LLM inference (GGUF)**

Any GGUF model from Hugging Face runs via `llama_cpp`. Model weights are downloaded on first use.

```python theme={"dark"}
from geniex import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-GGUF",     # HF repo id of a GGUF model, or a local .gguf path
    device_map="auto",          # "auto" | "cpu" | "gpu" | "npu" | "hybrid"
                                # | "<runtime>" | "<runtime>:<compute-unit>"
                                # auto -> hybrid for llama_cpp, npu for qairt
)

messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = model.tokenizer.apply_chat_template(
    messages, add_generation_prompt=True,
)

# One-shot
output = model.generate(prompt, max_new_tokens=256)
print(output.text)
print(f"[{output.profile.generated_tokens} tok, "
      f"{output.profile.decode_speed:.1f} tok/s, stop={output.profile.stop_reason}]")

# Streaming
streamer = model.generate(prompt, max_new_tokens=256, stream=True)
for chunk in streamer:
    print(chunk, end="", flush=True)

model.close()
```
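
The streaming loop above prints each chunk and then discards it. If you also need the full reply afterwards, one option is to accumulate the chunks while printing them. The sketch below assumes only what the example shows: that iterating the streamer yields printable text chunks. `collect_stream` is a hypothetical helper, not part of the GenieX SDK, and a plain iterator stands in for the streamer so the snippet runs without a model.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[str], echo: bool = True) -> str:
    """Optionally print each chunk as it arrives; return the concatenated text."""
    parts = []
    for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(str(chunk))
    return "".join(parts)


# Stand-in for `model.generate(prompt, max_new_tokens=256, stream=True)`.
fake_streamer = iter(["2 + 2 ", "equals ", "4."])
reply = collect_stream(fake_streamer, echo=False)
```

With a real model, passing the streamer returned by `model.generate(..., stream=True)` in place of `fake_streamer` should behave the same way, provided the chunks are strings.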

## **LLM inference (QAIRT)**

Pre-compiled bundles from [Qualcomm AI Hub](https://aihub.qualcomm.com/models/) run entirely on the Hexagon NPU via the `qairt` runtime. Use `device_map="qairt"` (or `"npu"`). Model weights are downloaded on first use.

```python theme={"dark"}
from geniex import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ai-hub-models/Qwen3-4B",   # Qualcomm AI Hub model id
    device_map="qairt",         # NPU-only
)

messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = model.tokenizer.apply_chat_template(
    messages, add_generation_prompt=True,
)

# One-shot
output = model.generate(prompt, max_new_tokens=256)
print(output.text)
print(f"[{output.profile.generated_tokens} tok, "
      f"{output.profile.decode_speed:.1f} tok/s, stop={output.profile.stop_reason}]")

# Streaming
streamer = model.generate(prompt, max_new_tokens=256, stream=True)
for chunk in streamer:
    print(chunk, end="", flush=True)

model.close()
```
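
Both LLM examples end with `model.close()`, which is skipped if an exception is raised before that line. A minimal sketch of one way to guarantee cleanup uses the standard library's `contextlib.closing`, which calls `.close()` when the `with` block exits, whether normally or through an error. `FakeModel` is a hypothetical stand-in so the snippet runs without the SDK; this page does not say whether the SDK's model object can be used in a `with` statement directly, so the sketch does not assume it.

```python
from contextlib import closing


class FakeModel:
    """Hypothetical stand-in exposing only the methods this sketch relies on."""

    def __init__(self):
        self.closed = False

    def generate(self, prompt, **kwargs):
        return f"echo: {prompt}"

    def close(self):
        self.closed = True


model = FakeModel()  # in real code: AutoModelForCausalLM.from_pretrained(...)

try:
    with closing(model) as m:
        m.generate("What is 2+2?")
        raise RuntimeError("simulated failure mid-generation")
except RuntimeError:
    pass

released = model.closed  # close() ran even though the block raised
```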

## **VLM inference (QAIRT)**

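Vision-language models take an image alongside the text prompt: reference the image in the chat message and pass its path to `generate()` through `images`. Download a sample image first: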

```bash theme={"dark"}
curl -o demo.jpg https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-geniex/demo.jpg
```
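
If `curl` is unavailable or behaves differently in your shell (in some Windows PowerShell versions `curl` is an alias for `Invoke-WebRequest`, which takes different flags), the same download can be sketched in Python with the standard library alone. `download` is a hypothetical helper, not part of the GenieX SDK, and it does no retry or partial-file cleanup.

```python
import shutil
import urllib.request
from pathlib import Path

IMAGE_URL = "https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-geniex/demo.jpg"


def download(url: str, dest: str = "demo.jpg") -> Path:
    """Fetch `url` to `dest` unless the file already exists; return the absolute path."""
    path = Path(dest).resolve()
    if not path.exists():
        with urllib.request.urlopen(url) as response, open(path, "wb") as out:
            shutil.copyfileobj(response, out)
    return path


# Usage (performs a network request):
#   image_path = str(download(IMAGE_URL))
```

The returned path is absolute, so it can be used in place of the `os.path.abspath("demo.jpg")` call in the inference example.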

Then run inference:

```python theme={"dark"}
import os
from geniex import AutoModelForCausalLM

image_path = os.path.abspath("demo.jpg")

model = AutoModelForCausalLM.from_pretrained(
    "ai-hub-models/Qwen2.5-VL-7B-Instruct",  # Qualcomm AI Hub VLM bundle
    device_map="qairt",
)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image_path},
        {"type": "text", "text": "Describe the image."},
    ],
}]
prompt = model.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

streamer = model.generate(prompt, images=[image_path], max_new_tokens=256, stream=True)
for chunk in streamer:
    print(chunk, end="", flush=True)

model.close()
```

## **Jupyter notebook walkthrough**

For laptop users, follow the step-by-step Jupyter notebook at [examples/python/windows.ipynb](https://github.com/qualcomm/GenieX/blob/main/examples/python/windows.ipynb) — it covers environment setup and inference end-to-end.

## **Next steps**

<CardGroup cols={2}>
  <Card title="API reference" href="/en/run/python/api-reference" icon="book">
    All classes, methods, and parameters for the Python SDK.
  </Card>

  <Card title="Models" href="/en/models/supported" icon="cube">
    Supported models, GGUF on Hugging Face, and self-converted Qualcomm AI Engine Direct bundles.
  </Card>
</CardGroup>

