# CLI Reference

> All GenieX CLI commands and flags, with usage examples.

## **Model inference**

### **`geniex pull`**

Downloads a model and stores it locally.

```powershell
geniex pull [:]
```

| Flag           | Description                                                                      |
| -------------- | -------------------------------------------------------------------------------- |
| `--model-hub`  | Model source: `aihub` \| `hf` \| `localfs`. Auto-detected when omitted.          |
| `--local-path` | Path to a local directory or an AI Hub `.zip` file. Implies `--model-hub localfs`. |
| `--model-type` | Model type: `llm` \| `vlm`. Auto-detected when omitted.                          |

**Pull from a local path:**

```powershell
geniex pull local/my-model --local-path /path/to/model-dir
```

`pull` copies the files into the GenieX cache. Once the pull succeeds, you can safely delete the source files to avoid keeping two copies.

**Precision (quantization) (llama.cpp only)**

For GGUF models, the CLI prompts you to choose a precision:

```powershell
Choose a precision version to download
> Q4_0 [1.2 GiB] (default)
  Q8_0 [2.0 GiB]
  F16 [3.8 GiB]
```

`Q4_0` has the best support on the Hexagon NPU. See [Supported precisions (quantization)](/cn/models/supported#支持的精度量化) for details.

Qualcomm AI Hub models are pre-quantized, so there is nothing to choose.

### **`geniex infer`: LLM**

Starts an interactive chat with a language model.

```powershell
geniex infer ai-hub-models/Qwen3-4B
```

**Thinking mode**: controls whether the model shows its reasoning before it replies:

```powershell
geniex infer ai-hub-models/Qwen3-4B --think          # show reasoning steps
geniex infer ai-hub-models/Qwen3-4B --think=false    # respond directly
```

**Compute unit selection** (via `--compute`): chooses the compute unit that runs the model (default: `npu`):

```powershell
# llama.cpp models support all compute units
geniex infer unsloth/Qwen3.5-0.8B-GGUF --compute npu
geniex infer unsloth/Qwen3.5-0.8B-GGUF --compute gpu
geniex infer unsloth/Qwen3.5-0.8B-GGUF --compute cpu

# Qualcomm AI Hub Models only support NPU
geniex infer ai-hub-models/Qwen3-4B --compute npu
```

Qualcomm AI Hub models run only on the NPU. Passing `--compute cpu` or `--compute gpu` returns an error.

### **`geniex infer`: VLM**

Runs inference with a vision-language model, using text-only or image input:

```bash
geniex infer ai-hub-models/Qwen2.5-VL-7B-Instruct-GGUF
```

For text-only input, start the session and chat as usual. To use an image, provide its **absolute path** or drag the image file directly into the terminal:

```bash
Describe this picture
```

## **`geniex serve`**

Starts a local server compatible with the OpenAI protocol. See [Local server](/cn/run/cli/local-server) for the API.

```bash
geniex serve
```

## **Configuration flags**

The following flags can be passed to `geniex infer` to control model loading and generation behavior.

### Sampler flags

Control how the model selects tokens during generation.

| Flag                   | Type   | Default | Description                                                     |
| ---------------------- | ------ | ------- | --------------------------------------------------------------- |
| `--temperature`        | float  | -       | Sampling temperature. Higher values produce more random output. |
| `--top-p`              | float  | -       | Top-p (nucleus sampling) threshold.                             |
| `--top-k`              | int    | -       | Top-k sampling. Only the k most probable tokens are considered. |
| `--min-p`              | float  | -       | Min-p sampling threshold.                                       |
| `--repetition-penalty` | float  | `1`     | Repetition penalty. Values > 1 reduce repetition.               |
| `--presence-penalty`   | float  | -       | Penalizes tokens that have already appeared.                    |
| `--frequency-penalty`  | float  | -       | Penalizes tokens in proportion to their frequency.              |
| `--seed`               | int    | -       | Random seed, for reproducible output.                           |
| `--grammar-path`       | string | -       | Path to a GBNF grammar file used to constrain generation.       |
| `--grammar-string`     | string | -       | Inline grammar string in GBNF format.                           |
| `--enable-json`        | -      | -       | Forces JSON-only output.                                        |

### Model flags

Control model loading, context, and generation limits.

| Flag                        | Type      | Default | Description                                              |
| --------------------------- | --------- | ------- | -------------------------------------------------------- |
| `-n`, `--ngl`               | int       | `999`   | Number of layers to offload to the GPU.                  |
| `--nctx`                    | int       | `4096`  | Context window size (maximum input + output tokens).     |
| `--max-tokens`              | int       | `2048`  | Maximum number of tokens generated per response.         |
| `--stop`                    | string\[] | -       | Stop sequence (can be specified multiple times).         |
| `--stop-file`               | string    | -       | File containing stop sequences (one per line).           |
| `--think` / `--think=false` | bool      | `true`  | Enables or disables thinking mode for reasoning models.  |
| `-s`, `--system-prompt`     | string    | -       | System prompt that sets the model's behavior.            |

## **Utility commands**

| Command           | Description                                          | Example                                 |
| ----------------- | ---------------------------------------------------- | --------------------------------------- |
| `geniex list`     | Shows the name and size of every downloaded model.   | `geniex list`                           |
| `geniex remove `  | Deletes the named local model.                       | `geniex remove unsloth/Qwen3-0.6B-GGUF` |
| `geniex clean`    | Deletes all locally cached models.                   | `geniex clean`                          |
| `geniex infer -h` | Shows help for `geniex infer`.                       | `geniex infer -h`                       |
| `geniex serve -h` | Shows help for `geniex serve`.                       | `geniex serve -h`                       |
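
The flags on this page can be combined in a single `geniex infer` invocation. The following is a minimal dry-run sketch, not part of GenieX: a hypothetical helper named `build_infer_cmd` that assembles a command line from documented flags and prints it instead of running it. The flag values are illustrative, the space-separated `--flag value` form is assumed from the `--compute npu` examples, and treating the `ai-hub-models/` prefix as the marker for a Qualcomm AI Hub model is also an assumption.

```bash
# Hypothetical dry-run helper: prints a `geniex infer` command, does not run it.
# Only flags documented on this page are used; the values are illustrative.
build_infer_cmd() {
  model="$1"
  compute="${2:-npu}"   # documented default compute unit is npu

  # Assumption: the ai-hub-models/ prefix identifies Qualcomm AI Hub models,
  # which run on the NPU only, so reject other compute units up front.
  case "$model" in
    ai-hub-models/*)
      if [ "$compute" != "npu" ]; then
        echo "error: $model runs on NPU only" >&2
        return 1
      fi
      ;;
  esac

  echo "geniex infer $model --compute $compute --temperature 0.7 --top-p 0.9 --seed 42 --max-tokens 512 --think=false"
}

# Print a command for a llama.cpp (GGUF) model on the GPU.
build_infer_cmd unsloth/Qwen3.5-0.8B-GGUF gpu
```

Calling `build_infer_cmd ai-hub-models/Qwen3-4B gpu` returns a non-zero status without printing a command, mirroring the NPU-only restriction described for Qualcomm AI Hub models. Pipe the printed line to a shell only after checking it against `geniex infer -h`.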
