Key Terminology in Machine Learning

Here are some key terms to know if you are new to machine learning:

Dataset

A dataset is the collection of input data from which an algorithm learns.
For example, a collection of images of handwritten digits is the dataset used to train an algorithm to recognize handwritten digits.
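
As a minimal sketch, a small dataset can be held in Python as a list of records, one per example. The emails below and the field names `sender` and `body` are hypothetical, chosen only for illustration.

```python
# A hypothetical toy dataset: each dictionary is one example (one email).
dataset = [
    {"sender": "alice@example.com", "body": "Meeting moved to 3pm"},
    {"sender": "promo@example.net", "body": "WIN a FREE prize now"},
    {"sender": "bob@example.com", "body": "Lunch tomorrow?"},
    {"sender": "deals@example.net", "body": "FREE gift card, click now"},
]

# Number of examples in the dataset and the fields recorded for each one.
num_examples = len(dataset)
columns = sorted(dataset[0].keys())
print(num_examples, columns)  # 4 ['body', 'sender']
```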

Feature

Features are the input variables a model uses to make predictions. The dimensions (columns) of a dataset are usually called features or variables.
In a spam detector, features might include:
  • the words in the email body
  • the sender's name
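
The sketch below shows one hypothetical way to turn an email into features: counts of a few chosen words in the body, plus the sender's name. The function `extract_features` and the word list `VOCABULARY` are assumptions for illustration, not part of any spam-filtering library.

```python
import re

# Hypothetical list of words to track; a real spam filter would typically
# use a much larger vocabulary derived from its training data.
VOCABULARY = ["free", "prize", "meeting", "click"]

def extract_features(sender: str, body: str) -> dict:
    """Turn one email into a dictionary of features (an illustrative sketch)."""
    # Lowercase the body and split it into alphabetic words.
    words = re.findall(r"[a-z]+", body.lower())
    # One numeric feature per tracked word: how often it appears in the body.
    features = {f"count_{w}": words.count(w) for w in VOCABULARY}
    # One categorical feature: the sender's name.
    features["sender"] = sender
    return features

print(extract_features("Prize Team", "Click now to claim your FREE prize, free!"))
# {'count_free': 2, 'count_prize': 1, 'count_meeting': 0, 'count_click': 1, 'sender': 'Prize Team'}
```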

Label

The label, or target, is the value to predict, for example, the future price of a stock or the type of animal shown in a picture.
In a spam detector, the label is the email's class: spam (1) or not spam (0).
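
As a small illustration, class names can be encoded as numeric labels, with 1 for spam and 0 for not spam. The class names and the mapping `LABEL_CODES` are hypothetical.

```python
# Hypothetical class names attached to each email in a toy dataset.
raw_classes = ["spam", "not spam", "not spam", "spam"]

# Encode each class as a numeric label: 1 for spam, 0 for not spam.
LABEL_CODES = {"not spam": 0, "spam": 1}
labels = [LABEL_CODES[c] for c in raw_classes]
print(labels)  # [1, 0, 0, 1]
```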

Model

A model is an instance of an algorithm that defines the relationship between the input features and the label. In supervised learning, a model is trained on labeled data and then used to make inferences (predictions) on new, unlabeled data.
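
To make the train-then-infer cycle concrete, here is a deliberately tiny model: a one-feature decision stump that learns a threshold on how often the word "free" appears in an email. The training data and the function names `train_threshold_model` and `predict` are made up for this sketch; real spam detectors use many more features and more capable algorithms.

```python
# Hypothetical labeled training data: (times "free" appears, label),
# where label 1 = spam and 0 = not spam. The numbers are invented.
training_data = [(0, 0), (1, 0), (0, 0), (3, 1), (4, 1), (2, 1)]

def train_threshold_model(data):
    """Fit a decision stump: predict spam when count >= threshold.

    Tries every observed feature value as a threshold and keeps the one
    with the fewest errors on the labeled training data.
    """
    best_threshold, best_errors = None, len(data) + 1
    for threshold in sorted({x for x, _ in data}):
        errors = sum((x >= threshold) != bool(y) for x, y in data)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def predict(threshold, x):
    """Inference: apply the learned threshold to a new, unlabeled example."""
    return 1 if x >= threshold else 0

threshold = train_threshold_model(training_data)
print(threshold)  # 2
print([predict(threshold, x) for x in [0, 5, 2]])  # [0, 1, 1]
```

Training only looks at labeled examples; `predict` then needs nothing but the learned threshold and the new example's feature value.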

Regression

A regression model predicts continuous values. Regression models make predictions that answer questions like:
  • What is the value of a house in California?
  • What is the probability that Google's stock price will increase?
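
As a sketch of regression, the snippet below fits a straight line with ordinary least squares to predict a house price from its size. The sizes and prices are invented (and chosen to lie exactly on a line), and `fit_line` is a hypothetical helper, not a library function.

```python
# Hypothetical training data: house size in square feet -> price in dollars.
# The values are invented and lie exactly on price = 100 * size + 50000.
sizes = [1000.0, 1500.0, 2000.0, 2500.0]
prices = [150000.0, 200000.0, 250000.0, 300000.0]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(sizes, prices)
# The prediction is a continuous value, not one of a fixed set of classes.
predicted_price = slope * 1800.0 + intercept
print(slope, intercept, predicted_price)  # 100.0 50000.0 230000.0
```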

Classification

A classification model predicts discrete values. Classification models make predictions that answer questions like:
  • Is a given email message spam or not spam?
  • Is this an image of a dog or a cat?
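
A minimal classification sketch: a nearest-centroid classifier that assigns a new example to the class whose average feature vector is closest. The two numeric features (weight and height) and all values are hypothetical stand-ins; a real dog-versus-cat image classifier would work from pixel data.

```python
import math

# Hypothetical labeled examples: (weight_kg, height_cm) -> class.
# These made-up measurements stand in for real image features.
training_data = [
    ((4.0, 25.0), "cat"),
    ((5.0, 28.0), "cat"),
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
]

def fit_centroids(data):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for (w, h), label in data:
        sw, sh = sums.get(label, (0.0, 0.0))
        sums[label] = (sw + w, sh + h)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sw / counts[label], sh / counts[label])
            for label, (sw, sh) in sums.items()}

def classify(centroids, features):
    """Return the discrete class whose centroid is closest to the features."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

centroids = fit_centroids(training_data)
print(classify(centroids, (6.0, 30.0)))   # cat
print(classify(centroids, (28.0, 58.0)))  # dog
```

Whatever the input, the output is always one of the discrete classes seen in training.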
