Reference
Need more?
We expose additional endpoints on request — model training, transcription, labeling, and more.
Talk to us →

Models
List your deployed models and fetch details including labels, accuracy metrics, and configuration. Models are the core of Labelf — everything from zero-shot prototypes to production fine-tuned classifiers.
/v2/models List all deployed models in your workspace. Returns model IDs, names, types, and status.
/v2/models/{model_id} Get full details for a specific model — labels, type, training status, and accuracy metrics.
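As an illustration, a minimal client for these two endpoints might look like the sketch below. The base URL and the Bearer-token header scheme are assumptions for this example, not confirmed values; the HTTP transport is injected as a callable so the URL-building logic can be exercised without a live workspace.

```python
import json
from urllib.parse import urljoin

# Assumed base URL for illustration -- substitute your workspace's real one.
BASE_URL = "https://api.labelf.ai"

def model_url(model_id=None):
    """Build /v2/models (list) or /v2/models/{model_id} (detail)."""
    path = "/v2/models" if model_id is None else f"/v2/models/{model_id}"
    return urljoin(BASE_URL, path)

def get_model(fetch, model_id, token="<API_TOKEN>"):
    """Fetch one model object and decode it.

    `fetch` is any callable (url, headers) -> JSON string: wrap
    requests.get(...).text in production, or pass a stub in tests.
    The Bearer scheme here is an assumption, not documented behavior.
    """
    headers = {"Authorization": f"Bearer {token}"}
    return json.loads(fetch(model_url(model_id), headers))
```

In production you would pass something like `lambda url, h: requests.get(url, headers=h).text` as `fetch`; the returned dictionary would have the shape of the model object shown in the response example further down.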
Model types
Labelf supports a progression of model types, from instant prototypes to production-grade classifiers. Each type builds on the previous, letting you start classifying immediately and improve accuracy as you gather data.
Describe and classify
Define your categories in plain text — the model starts classifying immediately with no training data. Ideal for prototyping: describe what "Billing complaint" or "Churn risk" means, deploy, and start getting predictions within minutes. Accuracy is typically 70–85% depending on task complexity.
Active Learning
Label 50–200 examples per class and the model learns your specific domain. Labelf's Active Learning system recommends which examples to label next — it finds model weaknesses and edge cases so each labeled example has maximum impact. Accuracy typically reaches 85–93%.
Custom model
Full custom model trained on your data. Learns domain-specific vocabulary, jargon, and patterns that generic models miss. A telecom fine-tuned model knows that "Hemma Bredband" is a product name, not a description. Highest accuracy (90–97%) and fastest inference latency.
Prompt-tuned
For generative tasks that go beyond classification: summarization, entity extraction, reasoning, and structured output. Uses large language models with custom prompts and guardrails. Ideal for extracting action items from calls, generating ticket summaries, or answering "why did the customer churn?"
Model evaluation
Every model in Labelf comes with built-in evaluation metrics. You see exactly how well your model performs, per class, before deploying to production.
| Metric | What it tells you |
|---|---|
| Confusion matrix | Where the model gets confused — e.g. it mislabels "Billing" as "Cancellation" 12% of the time. Shows you exactly which categories overlap. |
| Precision | When the model says "Churn risk", how often is it right? High precision = fewer false alarms. |
| Recall | Of all actual churn-risk conversations, how many does the model catch? High recall = fewer missed cases. |
| F1 score | Harmonic mean of precision and recall. The single number that tells you overall per-class performance. |
| Confidence threshold | Tune the cutoff per model. Higher threshold = more precise but fewer predictions. Lower = broader coverage but more noise. Labelf shows how each threshold affects your metrics in real time. |
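To make the table concrete, here is a small sketch (plain Python, independent of the Labelf API) of how precision, recall, and F1 fall out of a confusion matrix, and how a confidence threshold trades coverage for precision:

```python
def per_class_metrics(confusion, labels):
    """confusion[i][j] = count of examples whose true label is
    labels[i] and whose predicted label is labels[j]."""
    out = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        predicted = sum(row[k] for row in confusion)  # column sum
        actual = sum(confusion[k])                    # row sum
        p = tp / predicted if predicted else 0.0      # precision
        r = tp / actual if actual else 0.0            # recall
        f1 = 2 * p * r / (p + r) if p + r else 0.0    # harmonic mean
        out[label] = {"precision": p, "recall": r, "f1": f1}
    return out

def apply_threshold(probs, labels, threshold=0.65):
    """Return the top label, or None when the model is not confident
    enough: raising the threshold yields fewer but safer predictions."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else None
```

For example, with a confusion matrix `[[90, 10], [5, 95]]` over ("Billing", "Cancellation"), Billing gets precision 90/95 ≈ 0.95 and recall 90/100 = 0.90; and a prediction of [0.6, 0.4] is suppressed at a 0.65 threshold but kept at 0.5.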
Response example
A model object includes its configuration, label set, training type, deployment status, and accuracy metrics.
{
  "id": 42,
  "name": "Contact Reason v3",
  "type": "fine-tuned",
  "status": "deployed",
  "labels": [
    "Billing",
    "Technical",
    "Cancellation",
    "Upgrade",
    "General inquiry",
    "Complaint"
  ],
  "metrics": {
    "accuracy": 0.94,
    "f1_macro": 0.92,
    "per_class": {
      "Billing": { "precision": 0.96, "recall": 0.93, "f1": 0.94 },
      "Technical": { "precision": 0.91, "recall": 0.95, "f1": 0.93 },
      "Cancellation": { "precision": 0.93, "recall": 0.89, "f1": 0.91 },
      "Upgrade": { "precision": 0.95, "recall": 0.92, "f1": 0.93 },
      "General inquiry": { "precision": 0.88, "recall": 0.91, "f1": 0.89 },
      "Complaint": { "precision": 0.90, "recall": 0.87, "f1": 0.88 }
    }
  },
  "training_examples": 4280,
  "last_trained": "2026-03-15T09:14:00Z",
  "confidence_threshold": 0.65
}

Active Learning
Labeling data is expensive. Active Learning makes every labeled example count by recommending which examples to label next. Instead of randomly sampling from your dataset, the system:
- Finds model weaknesses — surfaces examples where the model is least confident, targeting the decision boundaries between confusable classes
- Samples for diversity — ensures you label examples from different clusters, not just the same type of edge case over and over
- Surfaces rare classes — actively seeks out underrepresented categories that would otherwise take thousands of random samples to find
- Prioritizes impact — ranks examples by expected accuracy gain so each labeling session moves the needle as much as possible
In practice, a skilled annotator can label 200–400 examples per hour using the Labelf UI. With Active Learning, 200 well-chosen examples often outperform 2,000 randomly labeled ones. This means you can go from zero-shot prototype to production-grade model in a single afternoon.
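The first signal above, picking the examples where the model is least confident, can be sketched in a few lines. This is a simplification for illustration only; the actual recommender combines several signals (diversity, rare classes, expected impact) rather than confidence alone.

```python
def least_confident(predictions, k=3):
    """Rank unlabeled examples by model uncertainty and return the k
    best candidates for annotation. `predictions` maps example IDs to
    class-probability lists; a low top probability means the example
    sits near a decision boundary between confusable classes."""
    confidence = lambda ex: max(predictions[ex])
    return sorted(predictions, key=confidence)[:k]
```

Given three examples with top probabilities 0.9, 0.5, and 0.6, the 0.5 example is surfaced first: it is the one whose label the model is least sure about, so a human label there teaches it the most.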
Full lifecycle API — Available on request
The read-only model API documented above is available to all customers. The full lifecycle API — programmatic model creation, training, deployment, retraining, and evaluation — is available to enterprise customers.