Pricing that scales from idea to production

Start with the fastest model APIs, boost performance with cost-efficient customization, and evolve to compound AI systems to build powerful applications.

Serverless Text Models

All prices are per 1M tokens, input and output combined.

Up to 4B (Base Model): $0.085
4.1B - 8B (Base Model): $0.17
8.1B - 21B (Base Model): $0.255
21.1B - 41B (Base Model): $0.68
41.1B - 80B (Base Model): $0.765
80.1B - 110B (Base Model): $1.44
MoE 1B - 56B (e.g. Mixtral 8x7B): $0.425
MoE 56.1B - 176B (e.g. DBRX, Mixtral 8x22B): $0.96
DeepSeek V3 (Base Model): $0.72
DeepSeek R1 (Base Model): $6.40
DeepSeek LLM Chat 67B (Base Model): $0.765
Yi Large (Base Model): $2.55
Llama 3 70B (Base Model): $0.88
Meta Llama 3.1 405B (Base Model): $2.55
Mistral 7B (Base Model): $0.25

Note: The prices listed are per 1 million tokens, covering both input and output tokens, and apply across chat, multimodal, language, and code models. This structure lets users estimate costs directly from their token usage in different applications.
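As a quick sanity check on the arithmetic, the per-token billing above can be sketched as a small helper (a hypothetical illustration, not part of any official SDK; the rate is taken from the 4.1B - 8B tier in the table above):

```python
def text_cost(input_tokens: int, output_tokens: int, price_per_million: float) -> float:
    """Cost of a serverless text request: input and output tokens
    are billed at the same per-1M-token rate."""
    return (input_tokens + output_tokens) / 1_000_000 * price_per_million

# Example: 2M input + 1M output tokens on the 4.1B - 8B tier ($0.17 / 1M tokens)
print(round(text_cost(2_000_000, 1_000_000, 0.17), 4))  # -> 0.51
```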

Image Models

All Non-Flux Models (SDXL, Playground, etc.): $0.000104 per step, per image
FLUX.1 [dev]: $0.000425 per step, per image
FLUX.1 [schnell]: $0.0002975 per step, per image
FLUX.1 Canny [dev]: $0.025 per step, per image
FLUX.1 Depth [dev]: $0.025 per step, per image
FLUX.1 Redux [dev]: $0.025 per step, per image
Pixtral 12B: $0.12 per 1M tokens

Note: For image generation models such as SDXL, pricing is based on the number of inference steps, i.e. the denoising iterations involved in the image creation process. All FLUX models share the same per-step pricing structure. Keep in mind that more steps can enhance the quality and detail of the generated images, so it is important to balance cost against desired output quality.
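The per-step billing described in the note can be sketched the same way (a hypothetical helper, not an official API; the rate used is the FLUX.1 [schnell] per-step price from the table above):

```python
def image_cost(steps: int, price_per_step: float, images: int = 1) -> float:
    """Image generation cost: billed per inference (denoising) step,
    for each image generated."""
    return steps * price_per_step * images

# Example: one image at 30 steps with FLUX.1 [schnell] ($0.0002975 per step)
print(round(image_cost(30, 0.0002975), 6))  # -> 0.008925
```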

Speech-to-text Models

Whisper-v3-large: $0.001275 / audio minute (billed per second)
Whisper-v3-large-turbo: $0.000765 / audio minute (billed per second)
Streaming transcription service: $0.00256 / audio minute (billed per second)

Note: For speech-to-text models, billing is based on the duration of the audio input, charged per second. This lets users manage costs efficiently based on the length of the audio they wish to transcribe.
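Since the rate is quoted per audio minute but billed per second, the conversion can be sketched as follows (a hypothetical helper; the rate is the Whisper-v3-large price from the table above):

```python
def transcription_cost(audio_seconds: float, price_per_minute: float) -> float:
    """Speech-to-text cost: the rate is quoted per audio minute,
    but billing is per second, so convert to a per-second rate."""
    return audio_seconds * (price_per_minute / 60)

# Example: 90 seconds of audio with Whisper-v3-large ($0.001275 / audio min)
print(round(transcription_cost(90, 0.001275), 7))  # -> 0.0019125
```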

Embedding Models

Up to 150M: $0.0064 / 1M input tokens
150M - 350M: $0.0128 / 1M input tokens

Note: Pricing for embedding models is determined by the number of input tokens the model processes, so cost scales with the length of the text being embedded.
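Unlike the text models above, only input tokens are billed here, which simplifies the calculation (a hypothetical helper; the rate is the up-to-150M tier price from the table above):

```python
def embedding_cost(input_tokens: int, price_per_million: float) -> float:
    """Embedding cost: only input tokens are billed."""
    return input_tokens / 1_000_000 * price_per_million

# Example: embedding 500k tokens on the up-to-150M tier ($0.0064 / 1M input tokens)
print(round(embedding_cost(500_000, 0.0064), 6))  # -> 0.0032
```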

Fine-tuning Models

Models up to 16B parameters: $0.40 / 1M tokens in training
Models 16.1B - 80B: $2.55 / 1M tokens in training
MoE 1B - 56B (e.g. Mixtral 8x7B): $1.70 / 1M tokens in training
MoE 56.1B - 176B (e.g. DBRX, Mixtral 8x22B): $5.10 / 1M tokens in training
Mistral NeMo: $0.85 / 1M tokens in training
Mistral Small: $2.40 / 1M tokens in training
Codestral: $2.55 / 1M tokens in training

Note: Charges are based on the total number of tokens in your fine-tuning dataset, calculated as the dataset size multiplied by the number of epochs. You are billed only for the fine-tuning process itself: there are no additional fees for deploying fine-tuned models, and inference costs are the same as for the base model. Users can deploy and manage multiple fine-tuned models without incurring extra costs.
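The dataset-size-times-epochs rule in the note can be expressed as a short worked example (a hypothetical helper; the rate is the up-to-16B tier price from the table above):

```python
def finetune_cost(dataset_tokens: int, epochs: int, price_per_million: float) -> float:
    """Fine-tuning cost: billed on total training tokens,
    i.e. dataset size multiplied by the number of epochs."""
    return dataset_tokens * epochs / 1_000_000 * price_per_million

# Example: a 10M-token dataset for 3 epochs on a model up to 16B ($0.40 / 1M tokens)
print(round(finetune_cost(10_000_000, 3, 0.40), 2))  # -> 12.0
```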

On-demand Deployments

A100 80 GB GPU: $2.32 / hour
H100 80 GB GPU: $4.64 / hour
H200 141 GB GPU: $8.492 / hour
AMD MI300X: $4.242 / hour
RTX 6000 48 GB GPU: $2.04 / hour
L40S 48 GB GPU: $2.88 / hour
V100 80 GB GPU: $0.954 / hour
A100 SXM 80 GB GPU: $2.92 / hour

Note: On-demand deployments are charged based on GPU usage, measured in GPU-seconds, and pricing scales proportionally with the number of GPUs used. Users can optimize costs by selecting hardware appropriate to their specific workload requirements and budget constraints.
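The GPU-second billing in the note works out like this (a hypothetical helper; the rate is the H100 hourly price from the table above, converted to a per-second rate):

```python
def deployment_cost(gpu_seconds: float, hourly_rate: float, gpu_count: int = 1) -> float:
    """On-demand deployment cost: billed in GPU-seconds,
    scaling linearly with the number of GPUs."""
    return gpu_seconds * (hourly_rate / 3600) * gpu_count

# Example: two H100 80 GB GPUs ($4.64 / hour each) running for 90 minutes
print(round(deployment_cost(90 * 60, 4.64, gpu_count=2), 2))  # -> 13.92
```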