Model ID: qwen/qwen3.5-35b-a3b | Parameters: 35B (A3B MoE) | Released: 2026-02-26
Overview
Qwen 3.5 35B (A3B) is a mixture-of-experts (MoE) large language model designed to deliver strong performance with efficient inference. It uses sparse activation, activating only a subset of parameters per token, which enables a better balance between capability and serving cost. Compared to smaller models such as 9B variants, it offers improved reasoning, coding, and analytical performance for more complex workloads. The model supports long-context inference up to 262,144 tokens and multimodal input (text and image), making it suitable for advanced assistants, backend automation, multimodal understanding, and large-scale inference systems.
Air API Playground
Try the model in the playground.
Deploy with Container
Deploy with AIR Container.
API Usage Guide
Learn how to use the API.
Pricing
| Input | Output |
|---|---|
| $0.1625 / 1M tokens | $1.3 / 1M tokens |
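Using the per-token prices above, the cost of a single request can be estimated. A minimal sketch (prices hardcoded from the table; check current pricing before relying on it):

```python
# Estimate request cost from the per-token prices in the table above.
INPUT_PRICE_PER_M = 0.1625   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.30    # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 50K-token prompt with a 2K-token response.
print(f"${estimate_cost(50_000, 2_000):.6f}")  # → $0.010725
```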
Key Features
- Mixture-of-Experts (MoE, A3B) architecture with sparse activation for efficient scaling
- Stronger reasoning and coding performance compared to smaller models such as 9B variants
- Supports long-context inference up to 262,144 tokens
- Multimodal capability with text and image input support (max 1 image per request; sending 2+ images returns a 400 error)
- Strong multilingual understanding and generation
- Efficient inference through sparse expert activation
- Compatible with high-throughput serving engines such as vLLM
- Provided in FP8 format for efficient deployment
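A minimal chat-completion request body for this model, assuming an OpenAI-compatible API shape (the field names here are illustrative, not confirmed by this page):

```python
import json

# Minimal chat request body; field names follow the common
# OpenAI-compatible convention (an assumption, not confirmed here).
payload = {
    "model": "qwen/qwen3.5-35b-a3b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the benefits of MoE models."},
    ],
    "max_tokens": 512,
    "temperature": 0.7,
}
print(json.dumps(payload, indent=2))
```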
Use Cases
Complex Q&A
Handle complex multi-step reasoning and analytical queries.
Advanced Code Generation
Generate production-level code and system design explanations.
Deep Analysis & Summarization
Perform deeper document understanding and insight extraction.
Visual Understanding
Analyze images and extract insights through multimodal reasoning.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| messages | array | Required | - | List of messages for chat-based generation |
| max_tokens | integer | Optional | - | Maximum number of tokens to generate |
| temperature | number | Optional | 1 | Sampling temperature (0.0-2.0) |
| top_p | number | Optional | 1 | Nucleus sampling threshold |
| frequency_penalty | number | Optional | 0 | Penalty for token frequency |
| presence_penalty | number | Optional | 0 | Penalty for token presence |
| stream | boolean | Optional | false | Enable streaming responses |
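The table above can be mirrored in a small request-builder sketch: defaults match the documented values, `messages` is the only required field, and the documented temperature range is validated client-side (a convenience assumption, not an API requirement):

```python
def build_request(messages, *, max_tokens=None, temperature=1.0,
                  top_p=1.0, frequency_penalty=0.0,
                  presence_penalty=0.0, stream=False):
    """Assemble a request body using the parameters documented above.

    Defaults mirror the parameters table; only `messages` is required.
    """
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    body = {
        "model": "qwen/qwen3.5-35b-a3b",
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
        "stream": stream,
    }
    if max_tokens is not None:
        body["max_tokens"] = max_tokens  # omitted entirely when unset
    return body
```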
Model Details
| Property | Value |
|---|---|
| Context Length | 262,144 |
| Max Output Length | 262,144 |
| Quantization | fp8 |
| Input Modalities | text, image (max 1 image per request) |
| Output Modalities | text |
| Supported Features | tools, reasoning, streaming, vision, json_mode, logprobs |
| Sampling Parameters | min_p, temperature, presence_penalty, repetition_penalty, stop, top_p, top_k, frequency_penalty, seed |
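Since requests with two or more images return a 400 error, it can help to check the image count before sending. A sketch assuming the common OpenAI-style content-part format for multimodal messages (an assumption, not confirmed by this page):

```python
# Count image parts in a message list; this model accepts at most one
# image per request (2+ images are rejected with a 400 error).
def count_images(messages):
    return sum(
        1
        for m in messages
        for part in (m["content"] if isinstance(m["content"], list) else [])
        if part.get("type") == "image_url"
    )

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this chart."},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/chart.png"}},
    ],
}]
assert count_images(messages) == 1  # a second image would trigger a 400
```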
Quick Start
Get your API key
Generate an API key from your AirCloud account.
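Once you have a key, pass it as a bearer token in the request headers. A minimal sketch that assembles (but does not send) a request; the environment-variable name is illustrative, and the endpoint URL should come from the API usage guide:

```python
import json
import os

# Read the API key from the environment rather than hardcoding it.
# "AIRCLOUD_API_KEY" is an illustrative variable name, not an official one.
API_KEY = os.environ.get("AIRCLOUD_API_KEY", "<your-api-key>")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {
    "model": "qwen/qwen3.5-35b-a3b",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# POST `headers` and json.dumps(payload) to the chat-completions
# endpoint from your dashboard, using any HTTP client (requests, httpx).
print(json.dumps({"model": payload["model"],
                  "headers_set": sorted(headers)}))
```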
Tags
open-source conversational 35B reasoning multilingual moe high-performance
