Multi-Model Inference
POST /v2/models/inference

We expose additional endpoints on request — model training, transcription, labeling, and more.
Run multiple models against the same texts in a single request. Every model executes in parallel — total latency is roughly the same as a single model call. This is the backbone of hierarchical classification.
Real-world scenario
A customer calls about their mobile broadband bill. From a single transcript, you need to know: which product? what issue? what root cause? what sentiment? With multi-model inference, you run 4 models in one request and get all four answers back in the time it takes to run one.
| Dimension | Prediction |
|---|---|
| Product | Mobile Broadband |
| Issue type | Billing |
| Root cause | Incorrect charge |
| Sentiment | Frustrated |
Hierarchical classification
Most real-world use cases require multiple classification dimensions. A single customer interaction carries signal about the product, the issue, the root cause, and the customer's emotional state. Instead of chaining sequential API calls, multi-model inference lets you define all models upfront and get every dimension classified in a single round-trip.
Each model in the model_settings array is independent — it sees the same input texts but applies its own labels and confidence thresholds. You control how many predictions each model returns and which labels to filter for.
Request body
| Parameter | Type | Description |
|---|---|---|
| texts | string[] | Required. Texts to classify. Max 8 per request. |
| model_settings | object[] | Required. Array of model configurations. Each object accepts a model_id plus the optional per-model settings described below. |
Per-model settings
Each model in the array can be configured independently. This is important because different classification tasks have different requirements:
max_predictions
Controls how many labels are returned per text. Set to 1 for single-label tasks like sentiment, or 3 for multi-label tasks where a conversation might touch several topics. Omit to get all labels with their scores.
label_filter
Restricts the response to specific labels. A sentiment model might have 5 labels (Very positive, Positive, Neutral, Negative, Very negative), but your workflow only needs Positive vs. Negative. Filtering reduces payload size and simplifies downstream logic.
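The combined effect of these two settings can be sketched client-side. This is an illustration only — `select_predictions` and the example scores are made up for this sketch, not the server's actual implementation:

```python
def select_predictions(scores, max_predictions=None, label_filter=None):
    """Apply label_filter, then sort by confidence descending,
    then truncate to max_predictions. scores: dict label -> confidence."""
    preds = [{"label": label, "score": score} for label, score in scores.items()]
    if label_filter is not None:
        preds = [p for p in preds if p["label"] in label_filter]
    preds.sort(key=lambda p: p["score"], reverse=True)
    if max_predictions is not None:
        preds = preds[:max_predictions]
    return preds

# Hypothetical raw scores from a 5-label sentiment model.
sentiment_scores = {
    "Very positive": 0.02, "Positive": 0.05, "Neutral": 0.11,
    "Negative": 0.60, "Very negative": 0.22,
}

# Workflow only needs Positive vs. Negative:
# returns Negative (0.60) first, then Positive (0.05).
print(select_predictions(sentiment_scores, label_filter=["Positive", "Negative"]))
```

With `max_predictions=1` and no filter, the same function would return only the top-scoring label, matching the single-label sentiment setup shown in the request example below.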
Request example
Classify a single transcript across four models — product, issue type, root cause, and sentiment — in one call.
POST /v2/models/inference
Authorization: Bearer your-api-key
Content-Type: application/json
{
"texts": [
"Hi, I'm calling about my mobile broadband. I was charged 349 kr this month but my plan is supposed to be 199 kr. This has happened two months in a row and I'm really frustrated. I've been a customer for six years and I'm starting to think about switching."
],
"model_settings": [
{
"model_id": 42,
"max_predictions": 1
},
{
"model_id": 87,
"max_predictions": 2
},
{
"model_id": 156,
"max_predictions": 3
},
{
"model_id": 201,
"label_filter": ["Frustrated", "Angry", "Neutral", "Satisfied"]
}
]
}

The model IDs used in this request:

| Model ID | Name | Purpose | Labels (subset) |
|---|---|---|---|
| 42 | Product | Which product is the customer calling about? | Mobile Broadband, Fixed Line, TV, ... |
| 87 | Issue type | What category of issue is raised? | Billing, Technical, Cancellation, Upgrade, ... |
| 156 | Root cause | What caused the issue? | Incorrect charge, System error, Policy change, ... |
| 201 | Sentiment | How does the customer feel? | Frustrated, Angry, Neutral, Satisfied |
Response
The response groups results by model. Each model returns an array of predictions per input text, sorted by confidence descending.
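A minimal sketch of consuming this shape client-side — the `response` dict here is a trimmed stand-in for the full example that follows, and `top_labels` is an illustrative helper, not part of any official client:

```python
# Trimmed stand-in for a real response body.
response = {
    "results": [
        {"model_id": 42, "model_name": "Product",
         "predictions": [[{"label": "Mobile Broadband", "score": 0.94}]]},
        {"model_id": 201, "model_name": "Sentiment",
         "predictions": [[{"label": "Frustrated", "score": 0.82},
                          {"label": "Angry", "score": 0.14}]]},
    ]
}

def top_labels(response, text_index=0):
    """Map model_name -> highest-confidence label for one input text.
    Relies on predictions being sorted by confidence descending."""
    return {
        r["model_name"]: r["predictions"][text_index][0]["label"]
        for r in response["results"]
        if r["predictions"][text_index]  # skip models with no predictions
    }

print(top_labels(response))  # {'Product': 'Mobile Broadband', 'Sentiment': 'Frustrated'}
```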
{
"results": [
{
"model_id": 42,
"model_name": "Product",
"predictions": [
[
{ "label": "Mobile Broadband", "score": 0.94 }
]
]
},
{
"model_id": 87,
"model_name": "Issue Type",
"predictions": [
[
{ "label": "Billing", "score": 0.91 },
{ "label": "Cancellation", "score": 0.38 }
]
]
},
{
"model_id": 156,
"model_name": "Root Cause",
"predictions": [
[
{ "label": "Incorrect charge", "score": 0.88 },
{ "label": "Recurring billing error", "score": 0.72 },
{ "label": "Price plan mismatch", "score": 0.45 }
]
]
},
{
"model_id": 201,
"model_name": "Sentiment",
"predictions": [
[
{ "label": "Frustrated", "score": 0.82 },
{ "label": "Angry", "score": 0.14 },
{ "label": "Neutral", "score": 0.03 },
{ "label": "Satisfied", "score": 0.01 }
]
]
}
]
}

Performance
All models run in parallel
Total latency is approximately equal to the slowest individual model, not the sum of all models. Running 4 models takes roughly the same time as running 1. This makes multi-model inference the recommended approach for any production pipeline that needs multiple classification dimensions.
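The latency math can be illustrated with a local simulation. Note this is a client-side sketch only — the real fan-out happens server-side, and the model IDs and latencies below are invented for the demonstration:

```python
import concurrent.futures
import time

# Invented per-model latencies in seconds; sequential sum is 0.45s.
LATENCIES = {42: 0.05, 87: 0.08, 156: 0.12, 201: 0.20}

def mock_model(model_id):
    """Stand-in for one model's inference call."""
    time.sleep(LATENCIES[model_id])
    return model_id

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor() as pool:
    results = list(pool.map(mock_model, LATENCIES))
elapsed = time.perf_counter() - start

# Wall time tracks the slowest model (~0.20s), not the 0.45s sum.
print(f"ran {len(results)} models in {elapsed:.2f}s")
```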
When to use multi-model
Contact center analysis — Classify every call by product, issue, root cause, sentiment, and churn risk in one pass. Feed results into dashboards and retention workflows.
Chat/ticket triage — Route incoming tickets to the right team by combining urgency, topic, and language detection in a single request.
Quality assurance — Score agent performance across multiple dimensions (empathy, resolution, compliance) from one transcript.
Batch processing — Classify up to 8 conversations across all your models per request. For higher throughput, send requests concurrently.
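Splitting a large workload into request-sized batches is straightforward. A minimal sketch, assuming the documented limit of 8 texts per request (`chunk` and `MAX_TEXTS_PER_REQUEST` are illustrative names, not part of any official client):

```python
MAX_TEXTS_PER_REQUEST = 8

def chunk(texts, size=MAX_TEXTS_PER_REQUEST):
    """Split texts into request-sized batches, preserving order."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]

transcripts = [f"transcript {i}" for i in range(20)]
batches = chunk(transcripts)
print([len(b) for b in batches])  # [8, 8, 4]
```

Each batch can then be POSTed concurrently (for example, with a thread pool) to raise overall throughput.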