# ZenCreator Public API — Skill for AI Agents

Programmatic access to ZenCreator's AI generation tools: image generation, face swap, upscaling,
video generation, lipsync and more. All generations are asynchronous: start, poll, fetch outputs.

## Authentication

Send `Authorization: Bearer <API_KEY>` with every request. Keys look like `zc_live_<key_id>_<secret>`
and are created at https://app.zencreator.pro/api-keys (scopes: `read`, `generate`).

- Base URL: https://api.zencreator.pro
- OpenAPI JSON: https://api.zencreator.pro/api/public/openapi.json
- Human documentation: https://api.zencreator.pro/api/public/docs

## Quickstart

1. `GET /api/public/v1/tools` — pick a tool; its `input_schema` (JSON Schema) defines the `input` payload.
2. `POST /api/public/v1/generations` with body `{"tool": "<name>", "input": {...}}`.
   Invalid input returns `422` with the offending field; nothing is charged.
3. Poll `GET /api/public/v1/generations/{id}` every 5-10s until status is `succeeded`, `partial` or `failed`.
   On `failed`/`partial` the `error` field explains why.
4. `GET /api/public/v1/generations/{id}/result` — fetch the outputs; each output has a `url` and a `download_url`.
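Steps 3 and 4 amount to a polling loop. Below is a minimal Python sketch using only the standard library. It assumes the status response is a JSON object with a top-level `status` field. The helper names (`fetch_status`, `wait_for_generation`) and the 600-second timeout are illustrative choices, not part of the API. The HTTP call is injected, so the loop can be tested without a network.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.zencreator.pro"
TERMINAL_STATUSES = {"succeeded", "partial", "failed"}


def fetch_status(api_key: str, generation_id: str) -> dict:
    """GET /api/public/v1/generations/{id}; assumes a JSON object with a `status` field."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/public/v1/generations/{generation_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_generation(
    get_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,  # hypothetical client-side limit, not an API setting
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until a terminal status is reached and return that last status body."""
    waited = 0.0
    while True:
        body = get_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body  # on failed/partial, body["error"] explains why
        if waited >= timeout:
            raise TimeoutError(f"generation still {body.get('status')!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Typical use: `wait_for_generation(lambda: fetch_status(api_key, generation_id))`. Then request `/result` only if the returned status is `succeeded` or `partial`.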

Tools referencing images/video/audio take **asset IDs**: upload via
`POST /api/public/v1/assets` (multipart: `media_type`, `file`; max 50 MB) and pass the returned `asset_id`.
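The following sketch shows an asset upload as a hand-built `multipart/form-data` body with the two documented fields, `media_type` and `file`. Accepted `media_type` values are not listed here, so `"image"` in the usage is an assumption; check the OpenAPI schema. The size pre-check assumes 50 MB means 50 MiB. The server stays authoritative and answers oversized files with `413 payload_too_large`. Filenames containing quotes are not escaped in this sketch.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple

BASE_URL = "https://api.zencreator.pro"
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # client-side pre-check only


def build_multipart(media_type: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode the `media_type` and `file` form fields; returns (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="media_type"\r\n\r\n'
        f"{media_type}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_asset(api_key: str, path: str, media_type: str) -> str:
    """POST /api/public/v1/assets and return the `asset_id` from the JSON response."""
    data = Path(path).read_bytes()
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 50 MB upload limit")
    body, content_type = build_multipart(media_type, Path(path).name, data)
    req = urllib.request.Request(
        f"{BASE_URL}/api/public/v1/assets",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["asset_id"]
```

For example, `upload_asset(api_key, "photo.png", "image")` returns an ID you can pass as `image_asset`, `ref_asset` and so on.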

## Endpoints

Every `GET` endpoint requires the `read` scope; `POST /generations` and `POST /assets` require `generate`.

- `GET /api/public/v1/tools` — list tools with prices and input schemas
- `GET /api/public/v1/tools/{tool_name}` — one tool
- `POST /api/public/v1/generations` — start a generation (an `Idempotency-Key` header makes retries safe for 24 h)
- `GET /api/public/v1/generations/{id}` — status: `queued` | `processing` | `succeeded` | `partial` | `failed`
- `GET /api/public/v1/generations/{id}/result` — outputs of a finished generation
- `POST /api/public/v1/assets` — upload an input file, returns `asset_id`
- `GET /api/public/v1/loras?model=<model>` — LoRA styles (pass `id` as `lora_id` in tool inputs)
- `GET /api/public/v1/templates` — ready-to-use presets (send a template's `input`, or pass its `id` as `lora_id`)
- `GET /api/public/v1/photoshoot-categories` — categories for the `photoshoot` tool (pass `id` as `category_id`)
- `GET /api/public/v1/balance` — current credit balance
- `GET /api/public/v1/transactions` — credit history (top-ups, spends, refunds)
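The sketch below prepares a `POST /api/public/v1/generations` request. It sends the documented `{"tool": ..., "input": {...}}` body and an `Idempotency-Key` header. Generating the key as a random UUID is an assumption, since the accepted key format is not specified here. `build_generation_request` is a hypothetical helper name.

```python
import json
import urllib.request
import uuid
from typing import Optional

BASE_URL = "https://api.zencreator.pro"


def build_generation_request(
    api_key: str, tool: str, tool_input: dict, idempotency_key: Optional[str] = None
) -> urllib.request.Request:
    """Prepare POST /api/public/v1/generations; the caller sends it with urlopen()."""
    return urllib.request.Request(
        f"{BASE_URL}/api/public/v1/generations",
        data=json.dumps({"tool": tool, "input": tool_input}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Reuse the same key when retrying this exact request (honoured for 24 h).
            "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
        },
    )
```

After a network error, resend the *same* `Request` object or reuse its key. Generating a fresh key would start (and charge for) a second generation.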

## Credits

Prices are listed per tool in the catalog below; for many tools the charge depends on the input
(model, duration, resolution, number of images). Credits are charged on acceptance and
refunded automatically when a generation fails. Balance too low → `402 insufficient_credits`.
Top up at https://app.zencreator.pro/billing.

## Rate Limits

Per API key, 60-second sliding window: read 60/min, generate 10/min,
upload 30/min. Watch `X-RateLimit-Remaining` and `Retry-After` headers.
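A client can avoid most `429` responses by mirroring these limits locally. Below is a sketch of a per-bucket sliding-window throttle plus a `Retry-After` reader. The bucket names map to the documented limits: `read` for `GET`s, `generate` for `POST /generations`, `upload` for `POST /assets`. Only the delay-in-seconds form of `Retry-After` is handled, and the 5-second fallback is an arbitrary assumption.

```python
import collections
import time

LIMITS = {"read": 60, "generate": 10, "upload": 30}  # requests per window, per API key
WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    """Client-side throttle that mirrors the documented per-key sliding windows."""

    def __init__(self, limits=LIMITS, window=WINDOW_SECONDS, clock=time.monotonic):
        self.limits = limits
        self.window = window
        self.clock = clock
        self.sent = {bucket: collections.deque() for bucket in limits}

    def wait_time(self, bucket: str) -> float:
        """Seconds until one more request in `bucket` fits (0.0 if it fits now)."""
        now = self.clock()
        timestamps = self.sent[bucket]
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()  # drop requests that left the window
        if len(timestamps) < self.limits[bucket]:
            return 0.0
        return self.window - (now - timestamps[0])

    def record(self, bucket: str) -> None:
        self.sent[bucket].append(self.clock())


def retry_after_seconds(headers, default: float = 5.0) -> float:
    """Delay from a 429 response's Retry-After header (seconds form only)."""
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default
```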

## Errors

Envelope: `{"error": {"code": "...", "message": "..."}}`. Codes: `missing_api_key`/`invalid_api_key` (401),
`insufficient_credits` (402), `insufficient_scope`/`forbidden`/`user_blocked` (403), `not_found` (404),
`payload_too_large` (413), `unsupported_media_type` (415), `validation_error` (422),
`rate_limit_exceeded` (429), `internal_error` (500). A failed generation is NOT an HTTP error —
it is `200` with `"status": "failed"` and an `error` message.

## Tool Catalog

### upscaler

Increase the resolution of an existing image while restoring or adding detail. Best for low-resolution sources that need to be cleaned up and improved; gains are limited on an already-sharp 4K image. Input: one image asset. Output: one higher-resolution image.

**Versions:**
- `basic` / `basic_safe_face` — Baseline upscale; the `*_safe_face` variant preserves the face.
- `natural_clarity` — Cheap and natural-looking, at any size.
- `premium_realism` / `premium_safe_face` — Photorealistic detail; `*_safe_face` preserves the face.
- `ultra_clarity` — Maximum detail and sharpness.

`megapixels` sets the target size and `creativity` how freely the engine may invent new detail.

**Price** (credits):

| `version` | `megapixels` | Credits |
|---|---|---|
| `basic`, `basic_safe_face` | `4`, `16`, `24`, `32` | 1 |
| `basic`, `basic_safe_face` | `8` | 2 |
| `natural_clarity` | *any* | 1 |
| `premium_realism`, `premium_safe_face` | `4`, `16`, `24`, `32` | 2 |
| `premium_realism`, `premium_safe_face` | `8` | 4 |
| `ultra_clarity` | `4`, `8` | 1 |
| `ultra_clarity` | `16` | 2 |
| `ultra_clarity` | `24` | 3 |
| `ultra_clarity` | `32` | 4 |
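To estimate cost before submitting, the table can be transcribed into a lookup. The sketch below copies the values above. Prices may change, so `GET /api/public/v1/tools` remains the authoritative source. Note that `8` MP costs more than `16`-`32` MP for the `basic` and `premium_*` versions.

```python
# version -> {megapixels: credits}, transcribed from the upscaler price table
UPSCALER_PRICES = {
    "basic": {4: 1, 8: 2, 16: 1, 24: 1, 32: 1},
    "basic_safe_face": {4: 1, 8: 2, 16: 1, 24: 1, 32: 1},
    "natural_clarity": {4: 1, 8: 1, 16: 1, 24: 1, 32: 1},
    "premium_realism": {4: 2, 8: 4, 16: 2, 24: 2, 32: 2},
    "premium_safe_face": {4: 2, 8: 4, 16: 2, 24: 2, 32: 2},
    "ultra_clarity": {4: 1, 8: 1, 16: 2, 24: 3, 32: 4},
}


def upscaler_price(version: str = "basic", megapixels: int = 4) -> int:
    """Credits for one upscale; raises ValueError for unknown combinations."""
    try:
        return UPSCALER_PRICES[version][megapixels]
    except KeyError:
        raise ValueError(f"unsupported combination: {version!r} at {megapixels} MP") from None
```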

Input fields:
- `image_asset` (string; required) — Asset ID of the image to upscale (upload via POST /assets).
- `version` (one of: "basic" | "basic_safe_face" | "natural_clarity" | "premium_realism" | "premium_safe_face" | "ultra_clarity"; default: "basic") — Upscale engine. `basic` — fast, faithful enlargement; `basic_safe_face` — basic but protects facial identity; `premium_realism` — adds photoreal texture and detail; `premium_safe_face` — premium while protecting the face; `natural_clarity` — gentle, natural sharpening; `ultra_clarity` — maximum detail and sharpness. Use a `*_safe_face` version when keeping the exact face matters. Default `basic`.
- `megapixels` (one of: 4 | 8 | 16 | 24 | 32; default: 4) — Target output size in megapixels. Higher values give a larger image; the price depends on both `version` and `megapixels` (for `basic` and `premium_*`, 8 MP costs more than 16-32 MP). Default 4.
- `creativity` (one of: 0 | 1 | 2 | 3 | 4 | 5; default: 3) — How freely the model may invent new detail, 0-5 (0 = stay faithful to the source, 5 = most creative). Default 3.
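The following sketch builds and validates an upscaler `input` object before it is sent. It rejects values outside the documented enums locally, so typos do not cost a round trip. The server's `422` response is still the final word. `upscaler_input` is a hypothetical helper name.

```python
UPSCALER_VERSIONS = {
    "basic", "basic_safe_face", "natural_clarity",
    "premium_realism", "premium_safe_face", "ultra_clarity",
}
UPSCALER_MEGAPIXELS = {4, 8, 16, 24, 32}


def upscaler_input(image_asset: str, version: str = "basic",
                   megapixels: int = 4, creativity: int = 3) -> dict:
    """Return the `input` payload for tool "upscaler", checking enums client-side."""
    if version not in UPSCALER_VERSIONS:
        raise ValueError(f"unknown version {version!r}")
    if megapixels not in UPSCALER_MEGAPIXELS:
        raise ValueError(f"megapixels must be one of {sorted(UPSCALER_MEGAPIXELS)}")
    if creativity not in range(6):
        raise ValueError("creativity must be an integer from 0 to 5")
    return {
        "image_asset": image_asset,
        "version": version,
        "megapixels": megapixels,
        "creativity": creativity,
    }
```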

**Outputs:** `image_asset` (string)

### faceswap

Swap a face on a photo: the face from `face_asset` is placed onto the person in `ref_asset`, keeping the reference image's body, pose and scene. Image only — there is no video face swap. Output: one image with the face replaced.

**Models:**
- `SDXL` *(default)* — Lowest likeness of the set.
- `GENERAL` — Better likeness, but not always stable.
- `GENERAL_ADVANCED` — Improved general swap with stronger identity preservation.
- `FULL_HEAD` — Replaces the entire head, not just the face — use it when the target's hairstyle or head shape differs strongly from the source.

**Price:** 1 credit.

Input fields:
- `ref_asset` (string; required) — Asset ID of the target image whose face will be replaced.
- `face_asset` (string; required) — Asset ID of the image providing the new face.
- `model` (one of: "FULL_HEAD" | "GENERAL" | "GENERAL_ADVANCED" | "SDXL"; default: "SDXL") — Face-swap model; see **Models** above. Default `SDXL`.
- `resolution` (one of: "2K" | "4K"; default: "4K") — Output resolution: `2K` or `4K`. Default `4K`.

**Outputs:** `image_asset` (string)

### by_prompt

Generate an image from a text prompt, with no input image (text-to-image). The entry point for creation: use it when there is no source image and the picture is described in words — building a character or content from scratch, concepts, backgrounds, NSFW from a description, quick drafts and final hero shots. Supports fast/quality modes, batches, aspect ratios and body-shape LoRAs.

**Three content groups (pick by what you need):**
- **Top cloud models, censored** (`GENERAL_NSFW`, `NANO_BANANA`) — highest quality and realism, but they block full NSFW content.
- **Uncensored, not built for full NSFW** (`WAN_2_7_IMAGE`, `QWEN_IMAGE`, `SEEDREAM_5`) — high quality and won't block, but won't create full NSFW content from scratch; they accurately **transform NSFW references** you provide.
- **Local, built for full NSFW** (`SDXL_NSFW`, `FLUX_KLEIN_NSFW`) — slightly lower quality and more artifacts, but real full-NSFW capability.

**Models:**
- `GENERAL_NSFW` *(default, trusted)* — General-purpose workhorse with a good quality/speed/price balance and strong facial likeness; the NSFW version is uncensored but does not produce full NSFW anatomy from text alone (it covers it up). Older model — occasional hand/limb artifacts.
- `GENERAL_SFW` — The same pipeline, SFW only.
- `SDXL_NSFW` *(trusted)* — Best choice for full NSFW anatomy from text alone (it knows anatomy from training). Local model: slightly lower quality, more artifacts. Text-only — does not accept reference images.
- `WAN_2_7_IMAGE` / `WAN_2_7_IMAGE_PRO` — Modern model with higher quality and detail (Pro = top consistency). Renders bodies, scenes and composition more aesthetically and has fewer hand/limb artifacts. Weaker at in-image text. Uncensored, but transforms your NSFW references rather than inventing full NSFW content.
- `WAN` — Legacy WAN image model; prefer `WAN_2_7_IMAGE`.
- `QWEN_IMAGE` / `QWEN_IMAGE_PRO` — Aesthetic results with good facial likeness and few artifacts; great for stylized / illustrative / anime subjects. Pro adds realism. Uncensored; transforms NSFW references.
- `SEEDREAM_5` — Newer generation: better prompt understanding, stronger stylization, better likeness, fewer artifacts. Uncensored; transforms NSFW references.
- `NANO_BANANA` — Among the best for realism, and the only model that reliably renders legible in-image text (posters, signage, captions); strong real-world knowledge. Heavily censored — won't produce even mildly suggestive content. Weaker facial likeness.
- `FLUX_KLEIN_NSFW` *(trusted)* — The most advanced local full-NSFW model: creates uncensored NSFW content and also works with references — bring a character's face and create an action. Slightly lower quality, occasional artifacts.

Models marked *(trusted)* unlock NSFW behaviour only for trusted accounts (or accounts with an NSFW content policy).

**Price** (credits):

| `model` | `mode` | Credits |
|---|---|---|
| `FLUX_KLEIN_NSFW`, `NANO_BANANA`, `QWEN_IMAGE_PRO`, `WAN_2_7_IMAGE_PRO` | *any* | 2 |
| `GENERAL_NSFW`, `GENERAL_SFW`, `QWEN_IMAGE`, `SEEDREAM_5`, `WAN`, `WAN_2_7_IMAGE` | *any* | 1 |
| `SDXL_NSFW` | `fast` | 1 |
| `SDXL_NSFW` | `quality` | 2 |
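As a lookup, the by_prompt table reduces to a flat price per model, except for `SDXL_NSFW`, where `mode` matters. The sketch returns the table value only. The table does not say whether `batch_size` multiplies the charge, so the sketch deliberately does not multiply.

```python
# Flat per-model prices from the by_prompt table; SDXL_NSFW depends on `mode`.
FLAT_PRICES = {
    "FLUX_KLEIN_NSFW": 2, "NANO_BANANA": 2, "QWEN_IMAGE_PRO": 2, "WAN_2_7_IMAGE_PRO": 2,
    "GENERAL_NSFW": 1, "GENERAL_SFW": 1, "QWEN_IMAGE": 1, "SEEDREAM_5": 1,
    "WAN": 1, "WAN_2_7_IMAGE": 1,
}
SDXL_NSFW_PRICES = {"fast": 1, "quality": 2}


def by_prompt_price(model: str = "GENERAL_NSFW", mode: str = "fast") -> int:
    """Credits listed in the by_prompt price table for a model/mode pair."""
    if mode not in SDXL_NSFW_PRICES:
        raise ValueError("mode must be 'fast' or 'quality'")
    if model == "SDXL_NSFW":
        return SDXL_NSFW_PRICES[mode]
    if model not in FLAT_PRICES:
        raise ValueError(f"unknown model {model!r}")
    return FLAT_PRICES[model]
```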

Input fields:
- `positive_prompt` (string; required) — Text describing what to generate.
- `negative_prompt` (string; default: "") — Text describing what to avoid in the image.
- `loras` (object or null; default: null) — Named LoRA styles to apply, as a map of LoRA name to strength (typically 0-1). The allowed names are the keys listed for this field; see the model prompt guide for what each style does.
- `batch_size` (integer; default: 1) — How many images to generate in one call (default 1).
- `model` (one of: "FLUX_KLEIN_NSFW" | "GENERAL_NSFW" | "GENERAL_SFW" | "NANO_BANANA" | "QWEN_IMAGE" | "QWEN_IMAGE_PRO" | "SDXL_NSFW" | "SEEDREAM_5" | "WAN" | "WAN_2_7_IMAGE" | "WAN_2_7_IMAGE_PRO"; default: "GENERAL_NSFW") — Generation model, covering the general/SDXL NSFW, WAN (incl. WAN 2.7 image), Nano Banana, Seedream, Qwen and Flux Klein families; `*_PRO` variants trade speed for higher quality.
- `ratio` (one of: "16:9" | "1:1" | "21:9" | "2:3" | "3:2" | "3:4" | "4:3" | "9:16"; default: "1:1") — Aspect ratio: 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2 or 21:9.
- `mode` (one of: "fast" | "quality"; default: "fast") — Generation mode: `fast` (quicker) or `quality` (slower, higher fidelity). Only `SDXL_NSFW` charges more in `quality` mode.
- `width` (integer or null; default: null) — Optional output width in pixels; if omitted a size is derived from `ratio`.
- `height` (integer or null; default: null) — Optional output height in pixels; if omitted a size is derived from `ratio`.

**Outputs:** `image_asset` (string)

### by_ref

Generate a similar image from a reference photo (image-to-image). With `GENERAL` you can bring a character's face plus a reference photo and get a similar shot featuring **your** character; with `SDXL` you bring only a photo and get a similar one. *Legacy tool — for most real tasks `image_editor` is more flexible and is the recommended choice.*

**Models:**
- `SDXL` *(default)* — Local, NSFW-capable. Input is a photo only.
- `GENERAL` — Higher quality and realism; can carry a character's face into a reference-like shot.

**Price** (credits):

| `model` | `mode` | Credits |
|---|---|---|
| `GENERAL` | *any* | 1 |
| `SDXL` | `fast` | 1 |
| `SDXL` | `quality` | 2 |

Input fields:
- `ref_asset` (string; required) — Asset ID of the reference image to guide the generation.
- `face_asset` (string or null; default: null) — Optional asset ID of a face to preserve in the result.
- `model` (one of: "GENERAL" | "SDXL"; default: "SDXL") — Generation model.
- `mode` (one of: "fast" | "quality"; default: "fast") — Generation mode: `fast` or `quality`. With `SDXL`, `quality` costs 2 credits instead of 1.
- `positive_prompt` (string; default: "") — Text describing what to generate.
- `negative_prompt` (string; default: "") — Text describing what to avoid.
- `batch_size` (integer; default: 4) — Number of images to generate, 1-10. Default 4.
- `strength` (number; default: 1.0) — How strongly the reference image constrains the result, 0.0-1.0 (higher sticks closer to the reference).
- `keep_pose` (boolean; default: false) — Preserve the pose of the reference image. Default false.
- `swap_background` (boolean; default: false) — Replace the background of the reference image. Default false.
- `loras` (object or null; default: null) — Named LoRA styles to apply, as a map of LoRA name to strength (typically 0-1). The allowed names are the keys listed for this field; see the model prompt guide for what each style does.

**Outputs:** `image_assets` (array[string])

### photoshoot

Run a consistent photoshoot of one character from a photo of the face and a photo of the body (without a face): each shot is generated from scratch, so poses come out organically different between frames — unlike `image_editor`, which keeps the source composition. The tool feeds the references through prepared prompt presets and returns a batch of images. Presets are grouped by category, so you can produce a set in a given style or action. Strength: reproducible, satisfying results with no manual prompting — at the cost of fine-grained control.

**Price:** 3 credits per image (multiplied by `number_of_images`).

Input fields:
- `face_ref` (string; required) — Asset ID of the face reference (upload via POST /assets).
- `body_ref` (string; required) — Asset ID of the body/pose reference (upload via POST /assets).
- `prompt` (string; default: "") — Free-text description of the shot (scene, outfit, lighting). Provide `prompt` for a custom shot, or `category_id` to use a curated category instead. If both are given, `category_id` takes precedence.
- `category_id` (string or null; default: null) — ID of a curated photoshoot category (see GET /photoshoot-categories). When set, the shot's prompts are taken from the category and `image_count` images are produced; it takes precedence over `prompt`.
- `image_count` (integer; default: 1) — Total number of images to generate (min 1). Drives how many shots are produced from the chosen `category_id` (or how many times `prompt` is repeated).
- `ratio` (one of: "16:9" | "1:1" | "21:9" | "2:3" | "3:2" | "3:4" | "4:3" | "9:16"; default: "1:1") — Aspect ratio of the output image.
- `width` (integer; default: 1024) — Output width in pixels (default 1024).
- `height` (integer; default: 1024) — Output height in pixels (default 1024).
- `number_of_images` (integer; default: 1) — Parallel images per shot (default 1); leave at 1 — use `image_count` to control the total.

**Outputs:** `image_assets` (array[string])

### facegen

Generate a brand-new face from structured attributes: gender, age, ethnicity, body type, eye/hair/beard colour, hairstyle, beard and makeup. No reference image; returns several variants per request. Strength: full parametric control over the appearance — use it to create a fresh persona to reuse in other tools. For each attribute the accepted values are listed in that field's enum. Niche — used occasionally.

**Price:** 1 credit.

Input fields:
- `gender` (one of: "Female" | "Male"; required) — Gender of the face.
- `age` (one of: "18-25" | "26-35" | "36-45" | "46-55" | "56-65" | "66-75" | "75-85" | "85+"; required) — Apparent age band.
- `origin` (one of: "Central American" | "Central Asian" | "East Asian" | "European" | "Middle Eastern" | "North African" | "North American" | "Oceanian / Pacific Islander" | "South American" | "South Asian" | "Southeast Asian"; required) — Ethnic origin / look.
- `body_type` (one of: "Chubby" | "Curvy" | "Fit" | "Muscular" | "Normal" | "Slim"; default: null) — Body type.
- `eye_color` (one of: "Albino" | "Amber" | "Blue" | "Brown" | "Gray" | "Green" | "Hazel" | "Heterochromia" | "Red" | "Violet"; default: null) — Eye colour.
- `hair_color` (one of: "Black" | "Blonde" | "Blue" | "Brunette" | "Gray" | "Green" | "Honey" | "Ombre" | "Pink" | "Platinum" | "Red" | "Salt and Pepper" | "Violet"; default: null) — Hair colour.
- `hair_type` (one of: "Bald" | "Coily" | "Curly" | "Frizzy" | "Silky" | "Straight" | "Thick" | "Thin" | "Wavy"; default: null) — Hair type, e.g. straight/wavy/curly.
- `hair_style` (one of: "Afro" | "Bob" | "Bowl" | "Braids" | "Bun" | "Buzzcut" | "Caesar" | "Mohawk" | "Pigtails" | "Pixie" | "Ponytail" | "Shag" | "Undercut" | "Updo"; default: null) — Hair style.
- `beard` (one of: "Anchor Beard" | "Balbo Beard" | "Chevron Mustache" | "Chinstrap Beard" | "Circle Beard" | "Corporate Beard" | "Ducktail Beard" | "Friendly Mutton Chops" | "Full Beard" | "Garibaldi Beard" | "Goatee" | "Handlebar Mustache" | "Horseshoe Mustache" | "Imperial Mustache" | "Mutton Chops" | "Pencil Mustache" | "Soul Patch" | "Stubble Beard" | "Van Dyke Beard" | "Zappa Mustache"; default: null) — Beard style.
- `beard_color` (one of: "Black" | "Blonde" | "Blue" | "Brunette" | "Gray" | "Green" | "Honey" | "Ombre" | "Pink" | "Platinum" | "Red" | "Salt and Pepper" | "Violet"; default: null) — Beard colour.
- `makeup` (one of: "Anime Makeup" | "Artistic Makeup" | "Avant-garde Makeup" | "Bohemian Makeup" | "Boho Makeup" | "Classic Makeup" | "Cut Crease Makeup" | "Dewy Makeup" | "Edgy Makeup" | "Festival Makeup" | "Glam Makeup" | "Glowy Makeup" | "Golden Makeup" | "Monochromatic Makeup" | "Natural Makeup" | "Party Makeup" | "Retro Makeup" | "Sunset Eye Makeup" | "Vintage Makeup" | "Watercolor Makeup"; default: null) — Makeup style.
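Because facegen takes only enum-valued attributes, a client can catch bad values before spending a request. The sketch below checks the three required fields plus `body_type` and `eye_color` against their enums as listed above. The remaining optional fields are accepted by name but sent unchecked, to keep the sketch short. `facegen_input` is a hypothetical helper.

```python
FACEGEN_ENUMS = {
    "gender": {"Female", "Male"},
    "age": {"18-25", "26-35", "36-45", "46-55", "56-65", "66-75", "75-85", "85+"},
    "origin": {
        "Central American", "Central Asian", "East Asian", "European", "Middle Eastern",
        "North African", "North American", "Oceanian / Pacific Islander",
        "South American", "South Asian", "Southeast Asian",
    },
    "body_type": {"Chubby", "Curvy", "Fit", "Muscular", "Normal", "Slim"},
    "eye_color": {
        "Albino", "Amber", "Blue", "Brown", "Gray", "Green",
        "Hazel", "Heterochromia", "Red", "Violet",
    },
}
FACEGEN_REQUIRED = ("gender", "age", "origin")
# Optional fields whose enums are not copied into this sketch; values pass unchecked.
FACEGEN_UNCHECKED = {"hair_color", "hair_type", "hair_style", "beard", "beard_color", "makeup"}


def facegen_input(**attrs) -> dict:
    """Return the facegen `input`, dropping None values and validating known enums."""
    missing = [name for name in FACEGEN_REQUIRED if attrs.get(name) is None]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    payload = {}
    for name, value in attrs.items():
        if value is None:
            continue  # optional attributes default to null; just omit them
        if name in FACEGEN_ENUMS:
            if value not in FACEGEN_ENUMS[name]:
                raise ValueError(f"{name}={value!r} is not an accepted value")
        elif name not in FACEGEN_UNCHECKED:
            raise ValueError(f"unknown field {name!r}")
        payload[name] = value
    return payload
```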

**Outputs:** `image_assets` (array[string])

### carousel

Bring an image and get the same subject from different camera angles (up to 10). Use it for social-media carousels and a "3D"/product overview of an object or character. Niche — used occasionally.

**Price:** 1 credit per image (multiplied by `number_of_images`).

Input fields:
- `asset_ref` (string; required) — Asset ID of the subject/reference image.
- `ratio` (one of: "16:9" | "1:1" | "21:9" | "2:3" | "3:2" | "3:4" | "4:3" | "9:16"; default: "1:1") — Aspect ratio of the outputs (e.g. 1:1, 16:9, 9:16).
- `width` (integer; default: 1024) — Output width in pixels (used when `ratio` is not set).
- `height` (integer; default: 1024) — Output height in pixels (used when `ratio` is not set).
- `number_of_images` (integer; default: 4) — Number of angle variations to generate, 1-10. Default 4.

**Outputs:** `image_assets` (array[string])

### videogen

Animate a photo into a video (image-to-video). Cost depends on the model, duration (5-15 s) and resolution (480p-1080p). Supports prompt enhancement, fixed camera, LoRAs and optional audio. Quick guide: general content (best price/quality) — Seedance; NSFW content — `wan@2.7-nsfw` or `wan@2.2-lora`; tasteful content with complex actions — Kling (censored).

**Wan** — the `wan@2.7` line understands prompts best and animates a first frame well (1080p, good physics, up to 5 references):
- `wan@2.7` — Latest line; top prompt understanding and first-frame animation.
- `wan@2.7-nsfw` — Wan 2.7 for NSFW; the best choice when you have a first frame (or a frame with an action) to animate.
- `wan@2.6` / `wan@2.6-flash` — Cheaper and older; **flash** is even cheaper and faster.
- `wan@2.5` — Older generation; prefer `wan@2.6` or `wan@2.7`.
- `wan@2.2` — Animates a first frame; uncensored NSFW base.
- `wan@2.2-lora` — Presets with action-trained LoRAs that turn a first frame into a complex action. Includes "Blink" LoRAs: bring any photo of your character and the frame morphs into the desired NSFW action. The **easiest option for beginners** — no prompt needed, just a photo similar to the example's first frame.

**Kling** — censored, but animates a first frame well; newer versions cost more and understand prompts and complex actions better:
- `kling@2.6` — Top motion and physics, plus native audio (dialogue, effects).
- `kling@2.5` — High quality, cheaper, consistent at volume.
- `kling@2.1` — Stable motion.
- `kling@1.6` — Oldest and cheapest Kling.

**Seedance** — uncensored; best price/quality balance for content:
- `seedance_pro_fast` — Faster and cheaper, less "smart".
- `seedance_pro` — Pricier and smarter.
- `seedance_v1_5_pro` — Best quality and result; joint audio+video, micro-expressions, first+last frame.

**Grok:**
- `grok@4.1` — Censored; animates a first frame, top image-to-video, always with native audio.

Model capabilities differ: see the `model`, `last_frame` and `resolution` field descriptions below for which model supports what.

**Price** (credits):

| `model` | `duration` | `generate_audio` | Credits (480p) | Credits (720p) | Credits (1080p) |
|---|---|---|---|---|---|
| `grok@4.1` | `5` | *any* | 10 | 15 | 15 |
| `grok@4.1` | `6` | *any* | 12 | 18 | 18 |
| `grok@4.1` | `8` | *any* | 16 | 24 | 24 |
| `grok@4.1` | `10` | *any* | 20 | 30 | 30 |
| `grok@4.1` | `12` | *any* | 24 | 36 | 36 |
| `grok@4.1` | `15` | *any* | 30 | 45 | 45 |
| `kling@1.6`, `kling@2.1`, `kling@2.5` | `5`, `6`, `8`, `12`, `15` | *any* | 7 | 7 | 7 |
| `kling@1.6`, `kling@2.1`, `kling@2.5` | `10` | *any* | 13 | 13 | 13 |
| `kling@2.6` | `5` | `true` | 10 | 10 | 10 |
| `kling@2.6` | `5` | `false` | 5 | 5 | 5 |
| `kling@2.6` | `6` | `true` | 12 | 12 | 12 |
| `kling@2.6` | `6` | `false` | 6 | 6 | 6 |
| `kling@2.6` | `8` | `true` | 16 | 16 | 16 |
| `kling@2.6` | `8` | `false` | 8 | 8 | 8 |
| `kling@2.6` | `10` | `true` | 20 | 20 | 20 |
| `kling@2.6` | `10` | `false` | 10 | 10 | 10 |
| `kling@2.6` | `12` | `true` | 24 | 24 | 24 |
| `kling@2.6` | `12` | `false` | 12 | 12 | 12 |
| `kling@2.6` | `15` | `true` | 30 | 30 | 30 |
| `kling@2.6` | `15` | `false` | 15 | 15 | 15 |
| `seedance_pro` | `5` | *any* | 4 | 8 | 15 |
| `seedance_pro`, `seedance_pro_fast` | `6`, `8`, `12`, `15` | *any* | 7 | 7 | 7 |
| `seedance_pro` | `10` | *any* | 8 | 15 | 30 |
| `seedance_pro_fast` | `5` | *any* | 2 | 4 | 7 |
| `seedance_pro_fast` | `10` | *any* | 4 | 8 | 13 |
| `seedance_v1_5_pro` | `5` | `true` | 8 | 8 | 16 |
| `seedance_v1_5_pro` | `5` | `false` | 4 | 4 | 8 |
| `seedance_v1_5_pro` | `6`, `8`, `15` | `true` | 8 | 8 | 8 |
| `seedance_v1_5_pro` | `6`, `8`, `15` | `false` | 4 | 4 | 4 |
| `seedance_v1_5_pro` | `10` | `true` | 8 | 16 | 32 |
| `seedance_v1_5_pro` | `10` | `false` | 4 | 8 | 16 |
| `seedance_v1_5_pro` | `12` | `true` | 8 | 20 | 40 |
| `seedance_v1_5_pro` | `12` | `false` | 4 | 10 | 20 |
| `wan@2.2` | `5`, `6`, `10`, `12`, `15` | *any* | 10 | 10 | 10 |
| `wan@2.2` | `8` | *any* | 13 | 13 | 13 |
| `wan@2.2-lora` | *any* | *any* | 10 | 10 | 10 |
| `wan@2.5`, `wan@2.7` | `5` | *any* | 5 | 10 | 15 |
| `wan@2.5`, `wan@2.7` | `6` | *any* | 6 | 12 | 18 |
| `wan@2.5`, `wan@2.7` | `8` | *any* | 8 | 16 | 24 |
| `wan@2.5`, `wan@2.7` | `10` | *any* | 10 | 20 | 30 |
| `wan@2.5`, `wan@2.7` | `12` | *any* | 12 | 24 | 36 |
| `wan@2.5`, `wan@2.7` | `15` | *any* | 15 | 30 | 45 |
| `wan@2.6` | `5` | *any* | 10 | 10 | 15 |
| `wan@2.6`, `wan@2.7-nsfw` | `6` | *any* | 12 | 12 | 18 |
| `wan@2.6`, `wan@2.7-nsfw` | `8` | *any* | 16 | 16 | 24 |
| `wan@2.6` | `10` | *any* | 20 | 20 | 30 |
| `wan@2.6`, `wan@2.7-nsfw` | `12` | *any* | 24 | 24 | 36 |
| `wan@2.6` | `15` | *any* | 30 | 30 | 45 |
| `wan@2.6-flash` | `5` | `true` | 5 | 5 | 7 |
| `wan@2.6-flash` | `5`, `6` | `false` | 3 | 3 | 4 |
| `wan@2.6-flash` | `6` | `true` | 6 | 6 | 9 |
| `wan@2.6-flash` | `8` | `true` | 8 | 8 | 12 |
| `wan@2.6-flash` | `8` | `false` | 4 | 4 | 6 |
| `wan@2.6-flash` | `10` | `true` | 10 | 10 | 14 |
| `wan@2.6-flash` | `10` | `false` | 5 | 5 | 7 |
| `wan@2.6-flash` | `12` | `true` | 12 | 12 | 17 |
| `wan@2.6-flash` | `12` | `false` | 6 | 6 | 8 |
| `wan@2.6-flash` | `15` | `true` | 15 | 15 | 21 |
| `wan@2.6-flash` | `15` | `false` | 8 | 8 | 10 |
| `wan@2.7-nsfw` | `5` | *any* | 10 | 13 | 17 |
| `wan@2.7-nsfw` | `10` | *any* | 20 | 26 | 34 |
| `wan@2.7-nsfw` | `15` | *any* | 30 | 39 | 51 |

Input fields:
- `ref_asset` (string; required) — Asset ID of the starting image to animate (image-to-video).
- `model` (one of: "grok@4.1" | "kling@1.6" | "kling@2.1" | "kling@2.5" | "kling@2.6" | "seedance_pro" | "seedance_pro_fast" | "seedance_v1_5_pro" | "wan@2.2" | "wan@2.2-lora" | "wan@2.5" | "wan@2.6" | "wan@2.6-flash" | "wan@2.7" | "wan@2.7-nsfw"; default: "wan@2.2") — Video model. Kling: `kling@1.6`, `kling@2.1` (supports last_frame), `kling@2.5`, `kling@2.6`. WAN: `wan@2.2` (image-to-video, no last_frame), `wan@2.2-lora` (same as wan@2.2 but accepts `loras`), `wan@2.5`, `wan@2.6`, `wan@2.6-flash` (faster), `wan@2.7`, `wan@2.7-nsfw`. Seedance: `seedance_pro_fast`, `seedance_pro` (supports last_frame), `seedance_v1_5_pro` (supports last_frame). Grok: `grok@4.1`. Default `wan@2.2`.
- `last_frame` (string or null; default: null) — Asset ID of the target final frame: the video interpolates from `ref_asset` to this frame. Only honoured by models that support it (`kling@2.1`, `seedance_pro`, `seedance_v1_5_pro`); WAN models (incl. wan@2.2 and wan@2.7) do not support a last frame.
- `audio_asset` (string or null; default: null) — Asset ID of an audio track to drive the video, for models that support audio input.
- `prompt` (string; default: "") — Text describing the desired motion and scene.
- `duration` (one of: 5 | 6 | 8 | 10 | 12 | 15; default: 5) — Clip length in seconds: one of 5, 6, 8, 10, 12, 15. The values a model accepts depend on the model. Default 5.
- `negative_prompt` (string; default: "") — Text describing what to avoid in the video.
- `resolution` (one of: "1080p" | "480p" | "720p"; default: "720p") — Output resolution. Most models accept `720p` and `1080p`. `480p` is NOT supported by `wan@2.6`, `wan@2.6-flash`, `wan@2.7`, `wan@2.7-nsfw` or `seedance_v1_5_pro` (use 720p or 1080p for those). `grok@4.1` accepts only `480p` or `720p`. Default `720p`.
- `generate_audio` (boolean; default: true) — Generate an audio track too, for models that support it. Default true.
- `camera_fixed` (boolean; default: false) — Keep the camera fixed instead of letting it move. Default false.
- `prompt_extend` (boolean; default: false) — Let the model auto-expand and enhance your prompt. Default false.
- `loras` (object or null; default: null) — LoRA styles to apply, as a map of LoRA id to scale settings. Only `wan@2.2-lora` uses this; browse compatible ids via GET /loras?model=wan@2.2.
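The model-specific rules scattered across these fields can be checked before submitting. The sketch encodes only the constraints stated above: which models honour `last_frame`, the `480p` exclusions, Grok's resolution limit, and `loras` being used only by `wan@2.2-lora`. Per-model duration limits are not documented here, so the sketch only checks the global duration enum. It returns warnings rather than raising, since an ignored `last_frame` is not necessarily an error.

```python
from typing import List

VIDEO_MODELS = {
    "grok@4.1", "kling@1.6", "kling@2.1", "kling@2.5", "kling@2.6",
    "seedance_pro", "seedance_pro_fast", "seedance_v1_5_pro",
    "wan@2.2", "wan@2.2-lora", "wan@2.5", "wan@2.6", "wan@2.6-flash",
    "wan@2.7", "wan@2.7-nsfw",
}
LAST_FRAME_MODELS = {"kling@2.1", "seedance_pro", "seedance_v1_5_pro"}
NO_480P_MODELS = {"wan@2.6", "wan@2.6-flash", "wan@2.7", "wan@2.7-nsfw", "seedance_v1_5_pro"}
RESOLUTIONS = {"480p", "720p", "1080p"}
DURATIONS = {5, 6, 8, 10, 12, 15}


def videogen_problems(inp: dict) -> List[str]:
    """List documented incompatibilities in a videogen `input` (empty list = none found)."""
    problems = []
    model = inp.get("model", "wan@2.2")
    resolution = inp.get("resolution", "720p")
    if not inp.get("ref_asset"):
        problems.append("ref_asset is required")
    if model not in VIDEO_MODELS:
        problems.append(f"unknown model {model!r}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unknown resolution {resolution!r}")
    if inp.get("duration", 5) not in DURATIONS:
        problems.append("duration must be one of 5, 6, 8, 10, 12, 15")
    if inp.get("last_frame") and model not in LAST_FRAME_MODELS:
        problems.append(f"{model} ignores last_frame")
    if resolution == "480p" and model in NO_480P_MODELS:
        problems.append(f"{model} does not support 480p")
    if model == "grok@4.1" and resolution == "1080p":
        problems.append("grok@4.1 accepts only 480p or 720p")
    if inp.get("loras") and model != "wan@2.2-lora":
        problems.append("only wan@2.2-lora uses loras")
    return problems
```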

**Outputs:** `video_asset` (string)

### image_editor

Edit and composite existing images by prompt — the main, most flexible image tool. Bring references, edit and combine them; bring your character and dress them from a reference photo; keep a product or object **exactly** (fabric, pattern, shape) while changing the scene. It offers both SFW and NSFW models, plus LoRA presets that extend NSFW capability. You are limited mainly by your imagination. Edits inherit the source composition (pose, framing), so for a photoshoot-style series of varied organic poses of one character use `photoshoot` instead.

**Models:**
- `GENERAL_NSFW` *(default, trusted)* — Universal default, uncensored, good facial likeness; general NSFW edits such as outfit or pose changes. Older model — occasional limb artifacts.
- `NANO_BANANA` — High realism and the most precise prompt-driven edits; **required for any edit involving in-image text**. Heavily censored (no NSFW); weaker likeness.
- `QWEN_IMAGE` / `QWEN_IMAGE_PRO` — Aesthetic, good likeness, few artifacts; Pro adds realism. Uncensored; transforms your NSFW references rather than creating full NSFW content from scratch.
- `SEEDREAM_5` — Newer than the default: better prompt understanding, stronger stylization, better likeness, fewer artifacts. Uncensored; transforms NSFW references.
- `WAN_2_7_IMAGE` / `WAN_2_7_IMAGE_PRO` — Aesthetic bodies and composition, few artifacts, precise editing; Pro = higher quality. Uncensored; transforms NSFW references. Weaker at in-image text.
- `FLUX_KLEIN_NSFW` *(trusted)* — Local flagship for full NSFW with references: it knows anatomy and accepts a face reference. Slightly lower quality, occasional artifacts.
- `FLUX_KLEIN_LORA` *(trusted)* — LoRA presets that extend NSFW capability (including undress presets) and style templates; pass a `lora_id` (browse via GET /templates).

Models marked *(trusted)* unlock NSFW behaviour only for trusted accounts (or accounts with an NSFW content policy).

**Price** (credits per image, multiplied by `number_of_images`):

| `model` | Credits |
|---|---|
| `FLUX_KLEIN_LORA` | 3 |
| `FLUX_KLEIN_NSFW`, `GENERAL_NSFW`, `QWEN_IMAGE`, `SEEDREAM_5`, `WAN_2_7_IMAGE` | 1 |
| `NANO_BANANA` | 4 |
| `QWEN_IMAGE_PRO`, `WAN_2_7_IMAGE_PRO` | 2 |

Input fields:
- `image_assets` (array[string]; required) — Asset ID(s) of the source image(s). Pass a single id to edit one image, or several (as an array) to combine them into one result (unless `batch_mode` is set).
- `prompt` (string; required) — Instruction describing the edit to apply.
- `model` (one of: "FLUX_KLEIN_LORA" | "FLUX_KLEIN_NSFW" | "GENERAL_NSFW" | "NANO_BANANA" | "QWEN_IMAGE" | "QWEN_IMAGE_PRO" | "SEEDREAM_5" | "WAN_2_7_IMAGE" | "WAN_2_7_IMAGE_PRO"; default: "GENERAL_NSFW") — Editing model. `GENERAL_NSFW` — general-purpose editor (default, allows NSFW); `NANO_BANANA` — Google Nano Banana; `QWEN_IMAGE` / `QWEN_IMAGE_PRO` — Qwen editors (Pro = higher quality); `SEEDREAM_5` — Seedream; `FLUX_KLEIN_NSFW` — Flux Klein editor; `FLUX_KLEIN_LORA` — Flux Klein with a style template (requires `lora_id`); `WAN_2_7_IMAGE` / `WAN_2_7_IMAGE_PRO` — WAN 2.7 editors (Pro = higher quality).
- `ratio` (one of: "16:9" | "1:1" | "21:9" | "2:3" | "3:2" | "3:4" | "4:3" | "9:16"; default: null) — Aspect ratio (e.g. 1:1, 16:9, 9:16). Overrides width/height when set.
- `width` (integer; default: 1024) — Output width in pixels (default 1024). Used when `ratio` is not set.
- `height` (integer; default: 1024) — Output height in pixels (default 1024). Used when `ratio` is not set.
- `batch_mode` (boolean; default: false) — When several `image_assets` are given, edit each one separately (one result per image) instead of combining them into a single image. Default false.
- `number_of_images` (integer; default: 1) — How many images to generate (default 1).
- `sequential_generation` (boolean; default: false) — Generate the requested images one after another, each conditioned on the previous result (for consistent, story-like sequences) instead of independently. Default false.
- `lora_id` (string or null; default: null) — ID of a LoRA style template to apply (required when `model=FLUX_KLEIN_LORA`). Browse available templates and their ids via GET /templates.
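The sketch below builds an image_editor `input` and estimates its cost from the per-image price table multiplied by `number_of_images`. It enforces the documented rule that `FLUX_KLEIN_LORA` needs a `lora_id`. Other optional fields (`ratio`, `batch_mode`, and so on) pass through unchecked. How `batch_mode` affects billing is not stated, so the price helper ignores it. Both helper names are hypothetical.

```python
EDITOR_PRICES = {  # credits per image, from the image_editor price table
    "FLUX_KLEIN_LORA": 3, "NANO_BANANA": 4, "QWEN_IMAGE_PRO": 2, "WAN_2_7_IMAGE_PRO": 2,
    "FLUX_KLEIN_NSFW": 1, "GENERAL_NSFW": 1, "QWEN_IMAGE": 1, "SEEDREAM_5": 1,
    "WAN_2_7_IMAGE": 1,
}


def image_editor_input(image_assets, prompt, model="GENERAL_NSFW",
                       number_of_images=1, lora_id=None, **extra) -> dict:
    """Return the image_editor `input`; a single asset ID is wrapped in a list."""
    if model not in EDITOR_PRICES:
        raise ValueError(f"unknown model {model!r}")
    if model == "FLUX_KLEIN_LORA" and not lora_id:
        raise ValueError("FLUX_KLEIN_LORA requires lora_id (browse via GET /templates)")
    if isinstance(image_assets, str):
        image_assets = [image_assets]
    payload = {"image_assets": list(image_assets), "prompt": prompt, "model": model,
               "number_of_images": number_of_images, **extra}
    if lora_id is not None:
        payload["lora_id"] = lora_id
    return payload


def image_editor_price(payload: dict) -> int:
    """Per-image price times number_of_images (batch_mode not accounted for)."""
    return EDITOR_PRICES[payload["model"]] * payload.get("number_of_images", 1)
```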

**Outputs:** `image_assets` (array[string])

### collaber

Put two characters together in one frame: bring two reference subjects and an optional background/location photo; the tool combines them into a single scene (1-4 images). Strength: keeps both characters' likeness — a convenient preset for collabs and duets.

**Price:** 1 credit per image (multiplied by `number_of_images`).

Input fields:
- `asset_image_1` (string; required) — Asset ID of the first persona/subject.
- `asset_image_2` (string; required) — Asset ID of the second persona/subject.
- `location_asset` (string or null; default: null) — Optional asset ID of a location/background reference for the scene.
- `prompt` (string or null; default: null) — What the two subjects are doing together and the scene to place them in.
- `ratio` (one of: "16:9" | "1:1" | "21:9" | "2:3" | "3:2" | "3:4" | "4:3" | "9:16"; default: "1:1") — Aspect ratio of the output (e.g. 1:1, 16:9, 9:16).
- `width` (integer; default: 1024) — Output width in pixels; only used when `ratio` is not set (note that `ratio` defaults to `"1:1"`).
- `height` (integer; default: 1024) — Output height in pixels; only used when `ratio` is not set (note that `ratio` defaults to `"1:1"`).
- `number_of_images` (integer; default: 1) — Number of images to generate, 1-4.

**Outputs:** `image_assets` (array[string])
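A minimal sketch of assembling a `collaber` request body for `POST /api/public/v1/generations` and estimating its cost from the price above (1 credit per image). `collaber_request` is a hypothetical helper, and the asset IDs in the usage line are placeholders for IDs returned by `POST /api/public/v1/assets`. Optional fields are omitted rather than sent as `null`, relying on their documented `null` defaults.

```python
from typing import Any, Optional


def collaber_request(
    asset_image_1: str,
    asset_image_2: str,
    prompt: Optional[str] = None,
    location_asset: Optional[str] = None,
    ratio: str = "1:1",
    number_of_images: int = 1,
) -> tuple[dict[str, Any], int]:
    """Hypothetical helper: return (request body, estimated credits)."""
    if not 1 <= number_of_images <= 4:
        raise ValueError("number_of_images must be between 1 and 4")
    inp: dict[str, Any] = {
        "asset_image_1": asset_image_1,
        "asset_image_2": asset_image_2,
        "ratio": ratio,
        "number_of_images": number_of_images,
    }
    # Optional fields are only sent when given; the server defaults them to null.
    if prompt is not None:
        inp["prompt"] = prompt
    if location_asset is not None:
        inp["location_asset"] = location_asset
    body = {"tool": "collaber", "input": inp}
    return body, 1 * number_of_images  # 1 credit per image


body, credits = collaber_request("asset_a", "asset_b", prompt="Two friends dancing", number_of_images=2)
```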

### video2video

Transfer motion or replace a character in a video, using a reference video or an Instagram/TikTok URL as the source. Both SFW and NSFW modes are available, and with `kling_2_6_sfw` the original soundtrack can be kept. Provide the source as exactly one of `video_asset`, `instagram_url` or `tiktok_url`. Output: one video.

**Modes:**
- `kling_2_6_sfw` — Handles character replacement best (censored); billed per second.
- `replace_sfw` / `replace_nsfw` — Replace the person in the source video with your image's subject; `replace_nsfw` is uncensored.
- `animate_sfw` / `animate_nsfw` — Motion transfer / animation of your image; `animate_nsfw` is uncensored.
- `dreamactor_m2` — Character replacement, uncensored; billed per second.

*Note:* the uncensored modes trade away some quality and prompt adherence, so a good result may not come on the first try; you may need to adjust the input (the source video or the character image).

**Price:** depends on `mode` and the duration of the source video (`video_asset`, `instagram_url` or `tiktok_url`):
- `dreamactor_m2` — 1 credit per second of source video, capped at 30 seconds.
- `kling_2_6_sfw` — 2 credits per second of source video, capped at 30 seconds.
- Other modes, by `resolution`:
  - `480p` — flat 7 credits for videos up to 5 seconds, then 1.4 credits per second.
  - `720p` — flat 13 credits for videos up to 5 seconds, then 2.8 credits per second.
  - `1080p` — 2.8 credits per second, regardless of length.
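The tiered price rules above can be turned into a small estimator. This is a sketch under two assumptions the text does not settle: that the 30-second cap limits the billed seconds for `dreamactor_m2` and `kling_2_6_sfw`, and that once a video passes 5 seconds the per-second rate applies to its whole duration (at 480p this gives the same result as charging only the extra seconds on top of the flat fee; at 720p the two readings differ by 1 credit). Fractional seconds are not rounded, which may also differ from actual billing.

```python
PER_SECOND_MODES = {"dreamactor_m2": 1.0, "kling_2_6_sfw": 2.0}
TIERED_MODES = {"replace_sfw", "replace_nsfw", "animate_sfw", "animate_nsfw"}
# resolution -> (flat fee for sources up to 5 s, or None; credits per second)
TIERED_RATES = {"480p": (7.0, 1.4), "720p": (13.0, 2.8), "1080p": (None, 2.8)}
CAP_SECONDS = 30.0  # assumption: caps the billed seconds


def video2video_cost(mode: str, resolution: str, duration_s: float) -> float:
    """Rough credit estimate for one video2video generation."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if mode in PER_SECOND_MODES:
        # Resolution does not affect the price of these two modes.
        return PER_SECOND_MODES[mode] * min(duration_s, CAP_SECONDS)
    if mode not in TIERED_MODES:
        raise ValueError(f"unknown mode: {mode}")
    flat, rate = TIERED_RATES[resolution]
    if flat is not None and duration_s <= 5:
        return flat
    # Assumption: past 5 s, the per-second rate covers the whole duration.
    return rate * duration_s
```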

Input fields:
- `image_asset` (string; required) — Asset ID of the image whose subject is animated or swapped in.
- `video_asset` (string or null; default: null) — Asset ID of the source/driving video (one source only).
- `instagram_url` (string or null; default: null) — Instagram post/reel URL used as the source video (one source only).
- `tiktok_url` (string or null; default: null) — TikTok post/reel URL used as the source video (one source only).
- `mode` (one of: "animate_nsfw" | "animate_sfw" | "dreamactor_m2" | "kling_2_6_sfw" | "replace_nsfw" | "replace_sfw"; default: "replace_sfw") — How the image and source video are combined. `replace_sfw` / `replace_nsfw` — replace the person in the source video with your image's subject; `animate_sfw` / `animate_nsfw` — animate your image using the source video's motion; `kling_2_6_sfw` — Kling 2.6 driven animation; `dreamactor_m2` — DreamActor M2 driven animation.
- `resolution` (one of: "1080p" | "480p" | "720p"; default: "720p") — Output resolution: `480p`, `720p` or `1080p`.
- `keep_original_sound` (boolean; default: true) — Keep the source video's audio. Only applies to `kling_2_6_sfw`.

**Outputs:** `video_asset` (string)
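Because the tool expects exactly one source, a client can enforce that rule before sending. `video2video_request` is a hypothetical helper that returns the JSON body for `POST /api/public/v1/generations`; the asset ID and TikTok URL in the usage line are placeholders. As a design choice, `keep_original_sound` is only forwarded for `kling_2_6_sfw`, the one mode where the schema says it applies.

```python
from typing import Any, Optional

SOURCE_FIELDS = ("video_asset", "instagram_url", "tiktok_url")


def video2video_request(
    image_asset: str,
    mode: str = "replace_sfw",
    resolution: str = "720p",
    keep_original_sound: Optional[bool] = None,
    **sources: Optional[str],
) -> dict[str, Any]:
    """Hypothetical helper: build the request body, enforcing exactly one source."""
    unknown = set(sources) - set(SOURCE_FIELDS)
    if unknown:
        raise ValueError(f"unknown source field(s): {sorted(unknown)}")
    given = {name: value for name, value in sources.items() if value}
    if len(given) != 1:
        raise ValueError("provide exactly one of video_asset, instagram_url, tiktok_url")
    inp: dict[str, Any] = {"image_asset": image_asset, "mode": mode, "resolution": resolution, **given}
    # keep_original_sound only has an effect for kling_2_6_sfw, so it is dropped otherwise.
    if keep_original_sound is not None and mode == "kling_2_6_sfw":
        inp["keep_original_sound"] = keep_original_sound
    return {"tool": "video2video", "input": inp}


body = video2video_request(
    "img_asset_id",
    mode="kling_2_6_sfw",
    keep_original_sound=False,
    tiktok_url="https://www.tiktok.com/@example/video/123",
)
```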

### lipsync

Make a character speak your audio (talking head): provide an audio file and a portrait image to use as the first frame, and get a video in which the character talks or sings in sync. Audio can be up to 35 seconds long; the image must be JPG/PNG under 5 MB. Use it to voice a character or avatar. Niche; used occasionally.

**Price:** 3 credits per second of audio, measured from the uploaded `audio_asset` (audio longer than 35 seconds is rejected).
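To estimate a lipsync charge before uploading, an agent can measure the audio locally. This sketch assumes a WAV file (Python's standard `wave` module reads only WAV) and bills fractional seconds as-is; how the service measures and rounds the duration is not documented here, so treat the result as an estimate. `wav_duration_seconds` and `lipsync_cost` are hypothetical helpers.

```python
import wave

MAX_AUDIO_SECONDS = 35.0  # longer audio is rejected by the tool
CREDITS_PER_SECOND = 3


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file: number of frames divided by the sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def lipsync_cost(duration_s: float) -> float:
    """Estimated credits for one lipsync generation (no rounding applied)."""
    if duration_s > MAX_AUDIO_SECONDS:
        raise ValueError("audio longer than 35 seconds is rejected")
    return CREDITS_PER_SECOND * duration_s
```

For example, `lipsync_cost(12.5)` estimates 37.5 credits for a 12.5-second clip.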

Input fields:
- `image_asset` (string; required) — Asset ID of the portrait image (JPG/PNG, under 5MB, up to 4096x4096).
- `audio_asset` (string; required) — Asset ID of the audio track to lip-sync to (under 35 seconds).
- `prompt` (string; default: "") — Optional text to guide expression/performance.
- `model` (one of: "GENERAL_NSFW"; default: "GENERAL_NSFW") — Lip-sync model; `GENERAL_NSFW` is currently the only option.

**Outputs:** `video_asset` (string)

### undress

Fully removes clothing from the character in an image. Input: one image. Output: one image. A male variant exists as a separate tool for trusted accounts. This is a convenience preset built on a Flux Klein LoRA — the same result is available through `image_editor` with the Flux Klein LoRA presets, including presets that handle paired photos.

**Price:** 3 credits.

Input fields:
- `image_asset` (string; required) — Asset ID of the image to process.

**Outputs:** `image_asset` (string)

### video_upscaler

Upscale a medium-quality video into a sharper, higher-resolution result, up to 5K on the longest side. Use it as a final polish or to restore low-quality footage. Niche; used occasionally.

**Price:** 2 credits x output megapixels x duration in seconds x FPS multiplier (x1 up to 30 fps, x2 up to 60 fps, x3 above). Output size is the source resolution multiplied by `scale_factor`; `"max"` upscales to at most 5120 px on the longest side.
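The price formula above can be turned into an estimator. This is a sketch that assumes "megapixels" means output width x height / 1,000,000 without rounding, that the FPS thresholds are inclusive (30 fps is x1, 60 fps is x2), and that `"max"` scales the longest side to exactly 5120 px while preserving the aspect ratio. Numeric factors are applied literally; any further cap the service may enforce is not modeled. All function names are hypothetical.

```python
ALLOWED_FACTORS = {2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100}
MAX_LONGEST_SIDE = 5120  # target longest side for scale_factor="max"


def fps_multiplier(fps: float) -> int:
    """x1 up to 30 fps, x2 up to 60 fps, x3 above (thresholds assumed inclusive)."""
    if fps <= 30:
        return 1
    if fps <= 60:
        return 2
    return 3


def upscaled_size(width: int, height: int, scale_factor: str) -> tuple[int, int]:
    """Output dimensions: the source size multiplied by the scale factor."""
    if scale_factor == "max":
        factor = MAX_LONGEST_SIDE / max(width, height)
    else:
        factor = int(scale_factor)
        if factor not in ALLOWED_FACTORS:
            raise ValueError(f"unsupported scale_factor: {scale_factor!r}")
    return round(width * factor), round(height * factor)


def upscale_cost(width: int, height: int, duration_s: float, fps: float, scale_factor: str) -> float:
    """2 credits x output megapixels x duration in seconds x FPS multiplier."""
    out_w, out_h = upscaled_size(width, height, scale_factor)
    megapixels = out_w * out_h / 1_000_000
    return 2 * megapixels * duration_s * fps_multiplier(fps)
```

Under these assumptions, doubling a 10-second 1920x1080 clip at 30 fps (output 3840x2160, about 8.29 MP) comes to roughly 165.9 credits.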

Input fields:
- `video_asset` (string; required) — Asset ID of the video to upscale.
- `scale_factor` (string; optional) — How much to enlarge the video. `"max"` upscales to at most 5120 px on the longest side; any other value is a multiplier of the source size, one of 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75 or 100, passed as a string (e.g. `"2"`).

**Outputs:** `video_asset` (string)

### video_merger

Concatenate 2-5 video clips, with an optional transition between them.

**Price:** 1 credit.

Input fields:
- `clips` (array[object]; required) — The 2-5 clips to concatenate, in order; each object gives a clip's `asset_id` together with its trim and duration settings.
- `transition` (one of: "cut" | "dissolve" | "fade" | "slide"; default: "cut") — Transition applied between consecutive clips.
- `keep_audio` (boolean; default: true) — Keep the clips' audio tracks.
- `fps` (integer; default: 30) — Output frame rate: 24 or 30.
- `width` (integer; default: 1280) — Output width in pixels.
- `height` (integer; default: 720) — Output height in pixels.

**Outputs:** `video_asset` (string)
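A minimal sketch of building a `video_merger` request with the documented limits checked locally (2-5 clips, the four transitions, 24 or 30 fps). It assumes a clip object containing only `asset_id` is accepted: the trim and duration keys are left out because their exact names and shape are not spelled out here, so check the tool's `input_schema` from `GET /api/public/v1/tools/video_merger` before relying on it. `merger_request` is a hypothetical helper and the clip IDs are placeholders.

```python
from typing import Any

TRANSITIONS = {"cut", "dissolve", "fade", "slide"}


def merger_request(
    asset_ids: list[str],
    transition: str = "cut",
    keep_audio: bool = True,
    fps: int = 30,
    width: int = 1280,
    height: int = 720,
) -> dict[str, Any]:
    """Hypothetical helper: build a video_merger body with basic checks."""
    if not 2 <= len(asset_ids) <= 5:
        raise ValueError("video_merger takes 2-5 clips")
    if transition not in TRANSITIONS:
        raise ValueError(f"unknown transition: {transition}")
    if fps not in (24, 30):
        raise ValueError("fps must be 24 or 30")
    # Assumption: clips with only asset_id are accepted (trim/duration omitted).
    clips = [{"asset_id": asset_id} for asset_id in asset_ids]
    return {
        "tool": "video_merger",
        "input": {
            "clips": clips,
            "transition": transition,
            "keep_audio": keep_audio,
            "fps": fps,
            "width": width,
            "height": height,
        },
    }


body = merger_request(["clip_1", "clip_2"], transition="fade")
```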
