Model ID:
qwen/qwen3.5-9b | Parameters: 9B | Released: 2026-03-10Overview
Qwen 3.5 9B is a compact yet capable open-source language model designed to balance performance and efficiency. It supports multilingual understanding and generation, long-context inference up to 131,072 tokens, and multimodal input with text and image support while producing text outputs. Compared with larger models, it offers lower serving cost and faster inference, making it well suited for chat assistants, coding help, document summarization, visual understanding, and other real-time AI applications.Air API Playground
Try the model in the playground.
Deploy with Container
Deploy with AIR Container.
API Usage Guide
Learn how to use the API.
Pricing
| Input | Output |
|---|---|
| $0.05 / 1M tokens | $0.15 / 1M tokens |
Key Features
- Efficient 9B model suitable for cost-sensitive and latency-sensitive deployments
- Strong multilingual understanding and generation
- Supports long-context inference up to 131,072 tokens
- Supports text and image input with text output (max 1 image per request; sending 2+ images returns a 400 error)
- Includes support for reasoning, tools, streaming, vision, json_mode, and logprobs
- Good balance between performance, latency, and serving cost
- Compatible with optimized inference runtimes such as vLLM
- Provided in FP8 format for efficient deployment
Use Cases
General Q&A
Handle general-purpose Q&A with efficient reasoning and contextual understanding.Code Generation & Assistance
Generate and explain code efficiently for practical development tasks.Summarization & Analysis
Efficient summarization and light analytical tasks for long documents.Visual Understanding
Understand image inputs and generate text-based explanations or insights.Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
messages | array | Required | - | List of input messages for chat-based generation. Supports text and image content. |
max_tokens | integer | Optional | - | Maximum number of output tokens to generate |
temperature | number | Optional | 1 | Sampling temperature (0.0-2.0) |
top_p | number | Optional | 1 | Nucleus sampling threshold |
top_k | integer | Optional | - | Limits sampling to the top-k most likely tokens |
min_p | number | Optional | - | Minimum probability threshold for token sampling |
frequency_penalty | number | Optional | 0 | Penalty for token frequency |
presence_penalty | number | Optional | 0 | Penalty for token presence |
repetition_penalty | number | Optional | - | Penalty for repeated token generation |
stop | string | array | Optional | - | Stop sequence(s) where generation will terminate |
seed | integer | Optional | - | Random seed for reproducible sampling |
stream | boolean | Optional | false | Enable streaming responses |
Model Details
| Property | Value |
|---|---|
| Context Length | 131,072 |
| Max Output Length | 131,072 |
| Quantization | fp8 |
| Input Modalities | text, image (max 1 image per request) |
| Output Modalities | text |
| Supported Features | tools, reasoning, streaming, vision, json_mode, logprobs |
Quick Start
Get your API key
Generate an API key from your AirCloud account.
Tags
open-source conversational 9B reasoning multilingual efficient vision
