Model ID: qwen/qwen3-tts-customvoice | Parameters: 1.7B | Released: 2026-01-22
Overview
Qwen3-TTS-12Hz-1.7B-CustomVoice is a multilingual text-to-speech model from the Qwen3 family. With 1.7 billion parameters, it synthesizes high-quality speech in English, Chinese, Japanese, and Korean. The model ships with 9 preset voices and supports custom voice cloning. Operating at a 12Hz token rate, it generates audio efficiently while keeping the output natural-sounding.
Air API Playground
Try the model in the playground.
Deploy with Container
Deploy with AIR Container.
API Usage Guide
Learn how to use the API.
Pricing
| Input | Output |
|---|---|
| $0 / 1M tokens | $0 / 1M tokens |
Key Features
- 1.7B parameter model with high-quality multilingual speech synthesis
- Supports English, Chinese, Japanese, and Korean
- 9 diverse preset voices with custom voice capability
- 12Hz token rate for efficient audio generation
- Built on Qwen3 architecture with strong language understanding
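To make the 12Hz token rate concrete, the sketch below estimates how many token steps a clip of a given length would need. It takes the stated rate at face value (12 steps per second of audio); the function name is illustrative, and the real codec may emit more than one token per step, so treat the result as a rough lower bound rather than a billing figure.

```python
import math

# Token rate stated in the model description (steps per second of audio).
TOKEN_RATE_HZ = 12


def estimate_token_steps(duration_seconds: float, rate_hz: float = TOKEN_RATE_HZ) -> int:
    """Rough number of token steps needed to cover `duration_seconds` of audio."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Round up so a partial step at the end of the clip is still counted.
    return math.ceil(duration_seconds * rate_hz)


if __name__ == "__main__":
    # A 10-second clip at 12 steps per second.
    print(estimate_token_steps(10))
```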
Use Cases
Narration Generation
Generate natural voice narration for video content and audiobooks.
Input Text: Life is like a box of chocolates. You never know what you’re gonna get.
Voice Announcements
Create voice announcements and notifications with various voice styles.
Input Text: Your order has been confirmed and will be delivered within 3 business days.
Conversational AI Voice
Generate natural voice responses for chatbots and virtual assistants.
Input Text: I’d be happy to help you with that! Let me check your account details.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| input | string | Required | - | Text to convert to speech |
| voice | enum | Optional | "serena" | Voice preset |
| response_format | enum | Optional | "mp3" | Output audio format |
| speed | number | Optional | 1 | Speech speed multiplier (0.25-4.0) |
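The parameters above can be assembled into a request body along the following lines. This is a minimal sketch: the helper name `build_speech_payload` is hypothetical, only the four documented fields are included, and the defaults and the speed range come from the table. How the model ID is passed is not specified here, so it is left out.

```python
def build_speech_payload(
    text: str,
    voice: str = "serena",
    response_format: str = "mp3",
    speed: float = 1.0,
) -> dict:
    """Build a request body from the documented parameters (hypothetical helper)."""
    # `input` is the only required parameter and must carry some text.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("input must be a non-empty string")
    # The table gives 0.25-4.0 as the allowed speed multiplier range.
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


if __name__ == "__main__":
    payload = build_speech_payload(
        "Your order has been confirmed and will be delivered within 3 business days.",
        speed=1.25,
    )
    print(payload)
```

Validating the speed range on the client side is a design choice in this sketch; the API may perform its own validation and report errors differently.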
Quick Start
Get your API key
Generate an API key from your AirCloud account.
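Once you have a key, a request could be prepared roughly as follows. Everything specific here is an assumption for illustration: the endpoint URL is a placeholder, the Bearer authorization scheme is a common convention rather than something this page confirms, and `build_request` is a hypothetical helper. Check the API Usage Guide for the actual endpoint and headers.

```python
import json
import urllib.request

# Placeholder endpoint: an assumption for illustration, not a documented URL.
SPEECH_URL = "https://api.example.com/v1/audio/speech"


def build_request(api_key: str, payload: dict, url: str = SPEECH_URL) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request carrying the API key."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed auth scheme; the real API may expect a different header.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("your-api-key", {"input": "Hello", "voice": "serena"})
    print(req.get_method(), req.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and write the response bytes to a file with an extension matching `response_format`. Keeping the key in an environment variable rather than in source code is generally safer.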
Tags
open-source tts 1.7B custom-voice multilingual multi-voice
