ML Atlas

ML Problem Types

How to map a business problem to the right ML task family.

Beginner
  • Price prediction (regression)
  • Spam detection (binary classification)
  • News topic tagging (multiclass)
  • User segmentation (clustering)
  • Intrusion detection (anomaly detection)
  • Search ordering (ranking)
  • Feed personalization (recommendation)
  • Demand planning (forecasting)
01

In Plain English

The problem type fixes the output shape and the evaluation strategy before any model is chosen.

Why It Exists

Different outputs require different losses, metrics, and tradeoffs.

Problem It Solves

Prevents solving the right business question with the wrong ML framing.

Real-Life Analogy

"Before choosing tools, decide if you're painting, drilling, or measuring."

When To Use

  • At project scoping
  • Before feature engineering and model selection

When NOT To Use

  • Never skip this; it is always needed
02

Many early ML failures come from framing the problem wrongly, not from picking the wrong algorithm.

For the same data, framing as classification vs ranking can produce very different outcomes.

Metrics must match the task: RMSE for regression, F1/AUC for classification, NDCG for ranking.

The Metaphor

"Task type is the contract. Models are implementations of that contract."

Beginner Mental Model

First decide output type, then decide model.

03

A problem type is defined by output space Y, objective function, and evaluation metric.

Regression
Predict continuous values.
Binary Classification
Predict one of two classes.
Multiclass Classification
Predict one class among many.
Clustering
Group unlabeled data by similarity.
Anomaly Detection
Detect rare or abnormal patterns.
Ranking
Order items by relevance.
Recommendation
Predict user-item preference.
Forecasting
Predict future values over time.
  1. Define business decision.
  2. Define prediction unit.
  3. Define output and horizon.
  4. Map to problem type.
  5. Select metric tied to decision quality.
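
The five steps above can be captured in a small task spec before any modeling starts. The sketch below is one possible shape, not a standard API: the `TaskSpec` class, its field names, and the 7-day horizon are assumptions made for illustration, and the example values reuse the support-escalation case from this page.

```python
from dataclasses import dataclass


# Hypothetical container for the five scoping steps; field names are illustrative.
@dataclass(frozen=True)
class TaskSpec:
    business_decision: str  # step 1: the decision the prediction will drive
    prediction_unit: str    # step 2: what a single prediction refers to
    output: str             # step 3: what is predicted
    horizon: str            # step 3: how far ahead the prediction applies
    problem_type: str       # step 4: task family
    metric: str             # step 5: metric tied to decision quality

    def summary(self) -> str:
        return f"{self.problem_type} per {self.prediction_unit}, scored by {self.metric}"


spec = TaskSpec(
    business_decision="route risky tickets to senior agents",
    prediction_unit="support ticket",
    output="escalate vs not",
    horizon="within 7 days of ticket creation",  # assumed value for illustration
    problem_type="binary classification",
    metric="recall at fixed precision",
)
print(spec.summary())
```

Writing the spec down first makes it harder to pick a model before the output contract is agreed.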

Inputs: problem statement, available data, and decision objective.

Outputs: task family and baseline metric plan.

01. Labels or weak signals are available when needed.
02. Evaluation data represents target production behavior.
  • Multi-objective tasks
  • Label ambiguity
  • Class imbalance in rare-event tasks
04

This stage prevents costly rework in later modeling and deployment phases.

  • 01. Define label generation logic clearly.
  • 02. Check if labels are stable over time.
  • 03. For ranking/recommendation, define interaction windows.

  • 01. Build a baseline model for the chosen task family.
  • 02. Validate metric alignment with business outcomes.
  • 03. Iterate framing if offline and online goals diverge.
  1. Write task spec
  2. Create baseline dataset
  3. Train baseline
  4. Evaluate against product objective
05
06
Business goal -> Prediction target -> Output type -> Candidate metrics -> Problem type
Goal: reduce support escalations
Task: binary classification (escalate vs not), metric: recall at fixed precision
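
The metric named in this example, recall at fixed precision, can be computed by sweeping candidate thresholds and keeping the best recall among those that meet the precision floor. This is a minimal sketch in plain Python; the function name `recall_at_precision` and the toy labels and scores are assumptions for illustration, not part of any library.

```python
def recall_at_precision(y_true, scores, min_precision):
    """Best recall over thresholds whose precision is at least min_precision."""
    total_pos = sum(y_true)
    best = 0.0
    # Each distinct score is a candidate threshold: predict positive if score >= t.
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(preds, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(preds, y_true) if p and y == 0)
        if tp + fp == 0:
            continue  # no positive predictions, precision undefined
        precision = tp / (tp + fp)
        recall = tp / total_pos if total_pos else 0.0
        if precision >= min_precision:
            best = max(best, recall)
    return best


# Toy data: 1 = ticket escalated, 0 = not escalated.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
result = recall_at_precision(y_true, scores, min_precision=0.75)
print(result)  # 0.75
```

Raising the precision floor to 1.0 on the same data drops the achievable recall to 0.25, which is the tradeoff this metric makes explicit.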
  • Task framing quality dominates early project success.
  • Metrics should proxy real decision quality, not convenience.
  • Using accuracy for highly imbalanced tasks
  • Framing ranking tasks as plain classification
07

Labeled tabular

Excellent

Best for supervised tasks

💡 Ensure label quality.

Unlabeled event logs

Good

Useful for clustering/anomaly tasks

💡 Feature engineering matters heavily.
08

Problem Type Decision Flow

Choose by output: numeric, class, groups, order, or future horizon.

Flowchart recommendation: output numeric -> regression; categorical -> classification; no labels -> clustering/anomaly; ordered list -> ranking; user-item next action -> recommendation; future timestamp values -> forecasting.
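
The flow above can be written as a small lookup function. This is only a sketch: the function name `suggest_problem_type`, the `output_kind` vocabulary, and the order in which the checks run are assumptions, since the page does not say which rule takes precedence when several apply.

```python
def suggest_problem_type(has_labels, output_kind):
    """Map an output description to a task family, following the flow above.

    output_kind uses a hypothetical vocabulary: "numeric", "categorical",
    "ordered_list", "user_item", "future_values", or "none".
    """
    # Assumed precedence: structure of the output first, then label availability.
    if output_kind == "future_values":
        return "forecasting"
    if output_kind == "user_item":
        return "recommendation"
    if output_kind == "ordered_list":
        return "ranking"
    if not has_labels:
        return "clustering/anomaly detection"
    if output_kind == "numeric":
        return "regression"
    if output_kind == "categorical":
        return "classification"
    raise ValueError(f"unknown output kind: {output_kind!r}")


print(suggest_problem_type(True, "numeric"))       # regression
print(suggest_problem_type(True, "categorical"))   # classification
print(suggest_problem_type(False, "none"))         # clustering/anomaly detection
```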

Metric Tradeoff Snapshot

Same model can look different under different metrics.

Compare Accuracy vs F1 on imbalanced data and NDCG vs CTR for ranking.
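
The accuracy vs F1 comparison is easy to reproduce with a degenerate model. The sketch below uses made-up data with 5 positives in 100 examples and a "model" that always predicts the negative class; the helper names are illustrative.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives means F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Imbalanced toy data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100

acc = accuracy(y_true, always_negative)
f1_score = f1(y_true, always_negative)
print(acc, f1_score)  # 0.95 0.0
```

The same predictions look strong under accuracy and useless under F1, because the model never finds a single positive.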
09
  • Clearer System Design

    Task family clarifies data, model, and metric choices.

  • Better Stakeholder Alignment

    Shared language around output and tradeoffs.

  • Ambiguous Boundaries

    Some products need multiple task families combined.

  • Metric Drift Risk

    Business objective may change, requiring re-framing.

10
Search

Ranking

Order results by relevance and freshness.

Media

Recommendation

Personalized content ranking per user.

11

Different task families optimize different objectives.

Classification

Discrete outputs

Predicts class labels

Decision categories are explicit.

Ranking

Can use relevance labels

Optimizes order quality

Top-k ordering matters more than exact class.

Aspect         | Classification | Ranking
Primary Metric | F1/AUC         | NDCG/MAP

Choose based on the product decision you need to automate.

12

RMSE/MAE

Numeric error for regression.

F1/AUC

Classification quality that is less misleading than accuracy under class imbalance.

NDCG

Top-rank relevance quality.
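
For the regression and ranking metrics above, a minimal plain-Python sketch may help show what each one rewards. The data is made up, and the NDCG here uses linear gain (the raw relevance value) with a log2 position discount; other formulations use an exponential gain, so treat this as one common variant rather than the only definition.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the miss."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def dcg(relevances, k):
    """Discounted cumulative gain over the top k positions (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 20.0]  # one large miss on the last item
print(mae(y_true, y_pred))   # 1.5
print(rmse(y_true, y_pred))  # larger than MAE because of the single big error

print(ndcg([3, 2, 1, 0], k=4))  # 1.0, best items are already on top
print(ndcg([0, 1, 2, 3], k=4))  # below 1.0, best items are at the bottom
```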

  1. Confirm business objective
  2. Select task-family metric
  3. Validate against baseline
  • Metric mismatch with product KPI
  • Ignoring threshold strategy in binary tasks

A model with higher AUC may still underperform on the product KPI if its decision threshold is poorly tuned.
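
The gap between AUC and thresholded performance can be shown with a toy scorer whose ranking is perfect but whose scores all sit below 0.5. This is a sketch with made-up data; `auc` is a simple pairwise implementation and `f1_at` is an illustrative helper, neither taken from a library.

```python
def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at(y_true, scores, threshold):
    """F1 when predicting positive for scores >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Positives are ranked above all negatives, but every score is below 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.45, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]

print(auc(y_true, scores))          # 1.0
print(f1_at(y_true, scores, 0.5))   # 0.0, the default cutoff flags nothing
print(f1_at(y_true, scores, 0.32))  # 1.0, a tuned cutoff recovers every positive
```

AUC only measures ordering, so the threshold has to be chosen separately against the decision the product actually makes.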

13
  • × Memorizing algorithms without task framing.
  • × Choosing model before deciding output contract.
  • × Confusing multiclass with multilabel.
  • × Using one metric for all tasks.
14

What kind of bias does this model have?

Bias depends on model assumptions and feature expressiveness.

What kind of variance does it have?

Variance grows with model flexibility and weak regularization.

How does it overfit?

Overfitting usually appears as strong train performance but weaker validation/test behavior.

How do we regularize it?

Use complexity constraints, robust validation, and data-centric cleanup.

What kind of data does it like?

Prefers representative, low-leakage data with stable feature definitions.

What kind of data breaks it?

Breaks under leakage, severe distribution drift, noisy labels, and poorly engineered features.

14

Quick Revision Reference

  • Task type first, model second.
  • Metrics must follow task + business objective.
BCE
  • Project scoping and baseline planning
  • Skipping due to time pressure
Explain regression vs classification vs ranking with concrete examples.
15
16

These questions are designed to break assumptions and expose weak understanding. Most people will answer them wrong on their first attempt. Work through each one carefully.