CLI reference - Qualcomm® AI Hub GenieX

Model inference

`geniex pull`

Download a model and store it locally.

geniex pull <model-name>[:<precision>]

| Flag | Description |
| --- | --- |
| `--model-hub` | Model source: `aihub` \| `hf` \| `localfs`. Auto-detected when omitted. |
| `--local-path` | Path to a local directory or AI Hub `.zip` file. Implies `--model-hub localfs`. |
| `--model-type` | Model type: `llm` \| `vlm`. Auto-detected when omitted. |

Pulling from a local path:

geniex pull local/my-model --local-path /path/to/model-dir

`geniex pull` copies the files into the GenieX cache. After a successful pull, you can delete the source to avoid keeping two copies.

Precision (Quantization) (llama.cpp only)

For GGUF models, the CLI prompts you to pick a precision:

Choose a precision version to download
> Q4_0       [1.2 GiB] (default)
  Q8_0       [2.0 GiB]
  F16        [3.8 GiB]

Q4_0 has the best Hexagon NPU support. See Precisions (Quantizations) Supported.

Qualcomm AI Hub Models are pre-quantized, so there is no precision to choose.
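
Given the `<model-name>[:<precision>]` syntax shown earlier, a precision can presumably be named up front instead of being chosen at the prompt. The sketch below is illustrative only: `model_ref` is a hypothetical helper, not part of GenieX, and the command is printed rather than executed. This section does not say which precisions are published for a given model.

```shell
# Hypothetical helper (not part of GenieX): build a "<model-name>[:<precision>]" reference.
# $1 = model name, $2 = optional precision such as Q4_0.
model_ref() {
  if [ -n "${2:-}" ]; then
    printf '%s:%s\n' "$1" "$2"   # precision given: append ":<precision>"
  else
    printf '%s\n' "$1"           # no precision: GGUF models would trigger the prompt
  fi
}

# Print the command instead of running it, so the sketch is safe to try anywhere.
echo "geniex pull $(model_ref unsloth/Qwen3-0.6B-GGUF Q4_0)"
```

Drop the `echo` wrapper to run the pull for real once you have confirmed that the precision exists for the model.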

`geniex infer` — LLM

Launch an interactive chat session with a language model.

geniex infer ai-hub-models/Qwen3-4B

Thinking mode — control whether the model shows reasoning before responding:

geniex infer ai-hub-models/Qwen3-4B --think         # show reasoning steps
geniex infer ai-hub-models/Qwen3-4B --think=false   # respond directly

Compute unit selection: use `--compute` to pick which compute unit runs the model (default: `npu`):

# llama.cpp models support all compute units
geniex infer unsloth/Qwen3.5-0.8B-GGUF --compute npu
geniex infer unsloth/Qwen3.5-0.8B-GGUF --compute gpu
geniex infer unsloth/Qwen3.5-0.8B-GGUF --compute cpu

# Qualcomm AI Hub Models only support NPU
geniex infer ai-hub-models/Qwen3-4B --compute npu

Qualcomm AI Hub Models run on NPU only. Using `--compute cpu` or `--compute gpu` returns an error.
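
A wrapper script can apply the NPU-only rule before invoking the CLI. This is a minimal sketch: `pick_compute` is a hypothetical helper, and it assumes that the `ai-hub-models/` prefix identifies Qualcomm AI Hub Models. The examples on this page suggest that, but the page does not state it.

```shell
# Hypothetical helper (not part of GenieX): choose a --compute value the model accepts.
# $1 = model name, $2 = requested compute unit (npu | gpu | cpu), optional.
pick_compute() {
  case "$1" in
    ai-hub-models/*)
      # Assumption: this prefix marks Qualcomm AI Hub Models, which are NPU-only.
      if [ "${2:-npu}" != "npu" ]; then
        echo "note: $1 runs on NPU only; ignoring '$2'" >&2
      fi
      echo npu
      ;;
    *)
      # Otherwise honor the request, falling back to the documented default.
      echo "${2:-npu}"
      ;;
  esac
}

model=unsloth/Qwen3.5-0.8B-GGUF
# Printed rather than executed.
echo "geniex infer $model --compute $(pick_compute "$model" gpu)"
```

Falling back to `npu` with a note on stderr is one design choice. Exiting with an error would mirror the CLI's own behavior more closely.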

`geniex infer` — VLM

Run vision-language inference with text-only or image input:

geniex infer ai-hub-models/Qwen2.5-VL-7B-Instruct-GGUF

For text-only input, launch the session and chat. For image input, include the image's absolute path in your prompt, or drag the file into your terminal:

Describe this picture </full/path/to/image.png>

`geniex serve`

Start the OpenAI-compatible local server. See Local server for the API.

geniex serve

Configuration flags

These flags can be passed to `geniex infer` to control model loading and generation.

Sampler flags

Control how the model selects tokens during generation.

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| `--temperature` | float | - | Sampling temperature. Higher values increase randomness. |
| `--top-p` | float | - | Top-p (nucleus) sampling threshold. |
| `--top-k` | int | - | Top-k sampling. Only consider the top-k most likely tokens. |
| `--min-p` | float | - | Min-p sampling threshold. |
| `--repetition-penalty` | float | `1` | Penalize repeated tokens. Values > 1 reduce repetition. |
| `--presence-penalty` | float | - | Penalize tokens that have appeared at all. |
| `--frequency-penalty` | float | - | Penalize tokens proportional to their frequency. |
| `--seed` | int | - | Random seed for reproducible outputs. |
| `--grammar-path` | string | - | Path to a GBNF grammar file for constrained generation. |
| `--grammar-string` | string | - | Inline grammar in GBNF string format. |
| `--enable-json` | - | - | Force JSON-only output. |
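
As an illustration of combining sampler flags, the sketch below writes a tiny GBNF grammar and prints an `infer` command that uses it. The flag values, the file name, and the choice of model are assumptions made for the example. The space-separated `--flag value` form follows the `--compute npu` examples above, and whether grammar flags apply to every backend is not stated here.

```shell
# Write a tiny GBNF grammar that only permits the answers "yes" or "no".
grammar_file="${TMPDIR:-/tmp}/yes-no.gbnf"
printf 'root ::= "yes" | "no"\n' > "$grammar_file"

# Illustrative values: low temperature plus a fixed seed for repeatable output.
sampler_flags="--temperature 0.2 --top-p 0.9 --seed 42"

# Printed rather than executed, so this runs without GenieX installed.
echo "geniex infer unsloth/Qwen3-0.6B-GGUF $sampler_flags --grammar-path $grammar_file"
```

A fixed `--seed` gives reproducible outputs, as the table above states, while the grammar restricts which tokens the model may emit at all.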

Model flags

Control model loading, context, and generation limits.

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| `-n`, `--ngl` | int | `999` | Number of layers to offload to GPU. |
| `--nctx` | int | `4096` | Context window size (max input + output tokens). |
| `--max-tokens` | int | `2048` | Maximum tokens to generate per response. |
| `--stop` | string[] | - | Stop sequences (can be specified multiple times). |
| `--stop-file` | string | - | File containing stop sequences (one per line). |
| `--think` / `--think=false` | bool | `true` | Enable or disable thinking mode for reasoning models. |
| `-s`, `--system-prompt` | string | - | System prompt to set model behavior. |
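
Because `--nctx` covers input and output together, the room left for the prompt shrinks as `--max-tokens` grows. The sketch below works through that arithmetic with the documented defaults and prepares a stop-sequence file. `prompt_budget`, the file name, and the stop strings are hypothetical, and the subtraction assumes the full `--max-tokens` budget is actually generated.

```shell
# Hypothetical helper: tokens left for the prompt when the whole generation budget is used.
# $1 = context window (--nctx), $2 = generation limit (--max-tokens).
prompt_budget() {
  echo $(( $1 - $2 ))
}

nctx=4096        # documented default for --nctx
max_tokens=2048  # documented default for --max-tokens
echo "prompt budget: $(prompt_budget "$nctx" "$max_tokens") tokens"

# Stop sequences for --stop-file, one per line (illustrative strings).
stop_file="${TMPDIR:-/tmp}/geniex-stops.txt"
printf '%s\n' 'User:' '###' > "$stop_file"

# Printed rather than executed.
echo "geniex infer ai-hub-models/Qwen3-4B --nctx $nctx --max-tokens $max_tokens --stop-file $stop_file"
```

With the defaults, 2048 tokens remain for the system prompt, the conversation history, and the new user input combined.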

Utility commands

| Command | Description | Example |
| --- | --- | --- |
| `geniex list` | Display all downloaded models with their names and sizes. | `geniex list` |
| `geniex remove <model>` | Remove a specific local model by name. | `geniex remove unsloth/Qwen3-0.6B-GGUF` |
| `geniex clean` | Delete all locally cached models. | `geniex clean` |
| `geniex infer -h` | Show help for `geniex infer`. | `geniex infer -h` |
| `geniex serve -h` | Show help for `geniex serve`. | `geniex serve -h` |
