Model ID: qwen/qwen3.5-9b | Parameters: 9B | Released: 2026-03-10

Overview

Qwen 3.5 9B is a compact yet capable open-source language model designed to balance performance and efficiency. It supports multilingual understanding and generation, long-context inference up to 131,072 tokens, and multimodal input (text and images) with text output. Compared with larger models, it offers lower serving cost and faster inference, making it well suited for chat assistants, coding help, document summarization, visual understanding, and other real-time AI applications.

Air API Playground

Try the model in the playground.

Deploy with Container

Deploy with AIR Container.

API Usage Guide

Learn how to use the API.

Pricing

Input: $0.05 / 1M tokens
Output: $0.15 / 1M tokens
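At these rates, per-request cost is simple arithmetic: input tokens at $0.05 per million plus output tokens at $0.15 per million. A minimal sketch (the helper name is ours, not part of the API):

```python
def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at Qwen 3.5 9B's listed rates."""
    INPUT_RATE = 0.05 / 1_000_000   # USD per input token
    OUTPUT_RATE = 0.15 / 1_000_000  # USD per output token
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. 10,000 input tokens and 2,000 output tokens
print(f"${request_cost_usd(10_000, 2_000):.6f}")  # → $0.000800
```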

Key Features

  • Efficient 9B model suitable for cost-sensitive and latency-sensitive deployments
  • Strong multilingual understanding and generation
  • Supports long-context inference up to 131,072 tokens
  • Supports text and image input with text output (max 1 image per request; sending 2+ images returns a 400 error)
  • Includes support for reasoning, tools, streaming, vision, json_mode, and logprobs
  • Good balance between performance, latency, and serving cost
  • Compatible with optimized inference runtimes such as vLLM
  • Provided in FP8 format for efficient deployment
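Because the model accepts at most one image per request, it helps to build the message payload with that constraint in mind. The sketch below assumes the OpenAI-style content-parts convention (a `text` part plus an `image_url` part) used with the chat completions endpoint from the Quick Start; the exact part shape is an assumption and may differ on your deployment.

```python
def build_vision_payload(prompt: str, image_url: str) -> dict:
    """Build a chat payload with one text part and one image part.

    The model accepts at most one image per request (sending 2+ images
    returns a 400 error), so this helper deliberately takes a single URL.
    The content-part shape follows the common OpenAI-compatible
    convention, which is an assumption here.
    """
    return {
        "model": "qwen/qwen3.5-9b",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

payload = build_vision_payload("Describe this chart.",
                               "https://example.com/chart.png")
```

The payload can then be sent to the chat completions endpoint exactly as in the Quick Start example.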

Use Cases

General Q&A

Handle general-purpose Q&A with efficient reasoning and contextual understanding.
Explain how transformers work in simple terms and give a real-world analogy.

Code Generation & Assistance

Generate and explain code efficiently for practical development tasks.
Write a Python function to check if a number is prime and explain the logic.
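One reasonable answer to this prompt is a trial-division check, shown here as a sketch of the kind of output to expect:

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False           # 0, 1, and negatives are not prime
    if n < 4:
        return True            # 2 and 3 are prime
    if n % 2 == 0:
        return False           # even numbers greater than 2 are composite
    i = 3
    while i * i <= n:          # only divisors up to sqrt(n) need checking
        if n % i == 0:
            return False
        i += 2                 # skip even candidates
    return True

print([x for x in range(20) if is_prime(x)])  # → [2, 3, 5, 7, 11, 13, 17, 19]
```

The logic: any composite n has a factor no greater than √n, so checking 2 and then odd numbers up to √n is sufficient.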

Summarization & Analysis

Efficient summarization and light analytical tasks for long documents.

Visual Understanding

Understand image inputs and generate text-based explanations or insights.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| messages | array | Required | - | List of input messages for chat-based generation. Supports text and image content. |
| max_tokens | integer | Optional | - | Maximum number of output tokens to generate |
| temperature | number | Optional | 1 | Sampling temperature (0.0-2.0) |
| top_p | number | Optional | 1 | Nucleus sampling threshold |
| top_k | integer | Optional | - | Limits sampling to the top-k most likely tokens |
| min_p | number | Optional | - | Minimum probability threshold for token sampling |
| frequency_penalty | number | Optional | 0 | Penalty for token frequency |
| presence_penalty | number | Optional | 0 | Penalty for token presence |
| repetition_penalty | number | Optional | - | Penalty for repeated token generation |
| stop | string \| array | Optional | - | Stop sequence(s) where generation will terminate |
| seed | integer | Optional | - | Random seed for reproducible sampling |
| stream | boolean | Optional | false | Enable streaming responses |
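With `stream` set to true, responses typically arrive as server-sent events, where each `data:` line carries a JSON chunk with an incremental text delta and `data: [DONE]` ends the stream. That framing follows the common OpenAI-compatible convention and is an assumption here; the sketch below parses simulated chunks rather than a live connection:

```python
import json

def extract_stream_text(sse_lines):
    """Concatenate text deltas from OpenAI-style SSE chunks (assumed format)."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue                      # ignore comments and blank keep-alives
        data = line[len("data: "):]
        if data == "[DONE]":              # sentinel that ends the stream
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:            # first chunk may carry only the role
            parts.append(delta["content"])
    return "".join(parts)

# Simulated stream:
lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print(extract_stream_text(lines))  # → Hello, world
```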

Model Details

| Property | Value |
|---|---|
| Context Length | 131,072 |
| Max Output Length | 131,072 |
| Quantization | fp8 |
| Input Modalities | text, image (max 1 image per request) |
| Output Modalities | text |
| Supported Features | tools, reasoning, streaming, vision, json_mode, logprobs |
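Since `json_mode` is listed among the supported features, a common OpenAI-compatible way to request JSON-only output is a `response_format` field of `{"type": "json_object"}`. That field name is an assumption based on the prevailing convention, not confirmed by this page; a minimal payload sketch:

```python
def build_json_mode_payload(prompt: str) -> dict:
    """Build a chat payload requesting JSON-only output.

    The response_format key follows the common OpenAI-compatible
    convention; treat it as an assumption for this deployment.
    """
    return {
        "model": "qwen/qwen3.5-9b",
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},
    }
```

Prompts used with JSON mode should still state explicitly that the answer must be a JSON object.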

Quick Start

1. Get your API key

Generate an API key from your AirCloud account.

2. Run the code

Replace YOUR_API_KEY with your actual key and choose your preferred language.
import requests

# Call the chat completions endpoint (replace YOUR_API_KEY with your key).
response = requests.post(
    "https://external.aieev.cloud:5007/ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "qwen/qwen3.5-9b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.7,
    },
)
response.raise_for_status()  # fail fast on HTTP errors (e.g. a bad key)

# Print the assistant's reply from the first choice.
result = response.json()
print(result["choices"][0]["message"]["content"])

Tags

open-source conversational 9B reasoning multilingual efficient vision