The Big Picture
1. Definitions
- Artificial Intelligence is a group of techniques enabling computers to mimic human behavior and intelligence.
- Machine Learning is a subset of AI techniques that use statistical methods to enable machines to improve with experience.
- Deep Learning is a subset of ML techniques that use multi-layered neural networks to model complex patterns in data.
2. Machine Learning Settings
There are three main settings in machine learning:
- Supervised Learning: learn a function that maps inputs to output labels based on example input-output pairs.
- Unsupervised Learning: learn patterns in input data without labeled outputs (clustering, dimensionality reduction).
- Reinforcement Learning: learn a policy to maximize cumulative reward through trial and error in an environment.
In semi-supervised learning, only some of the data is labeled; the aim is to use the existing labels to infer labels for the unlabeled data.
In weakly-supervised learning, the labels are inexact (e.g. coarse-grained or noisy).
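The contrast between the first two settings is easiest to see in code. Below is a minimal sketch, assuming scikit-learn is installed; the iris dataset and the particular models are illustrative choices, not part of these notes. The supervised model consumes both the inputs X and the labels y, while the unsupervised one sees only X.

```python
# Minimal sketch, assuming scikit-learn; dataset and models are illustrative.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: learn a mapping from inputs X to labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised prediction:", clf.predict(X[:1]))

# Unsupervised: find structure in X alone; the labels y are never used.
km = KMeans(n_clusters=3, n_init=10).fit(X)
print("cluster assignments:  ", km.labels_[:5])
```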
3. Machine Learning Tasks
There are two common ML tasks:
- Classification: predict a discrete label from a fixed set of classes (e.g. spam detection, image recognition).
- Regression: predict a continuous value (e.g. house prices, stock prices).
In general:
- More data leads to more accurate predictions.
- Selecting good features is crucial.
- Different classifiers make predictions differently.
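The two tasks differ only in the type of target they predict, as a short sketch makes concrete. This assumes scikit-learn; the synthetic datasets and the chosen models are illustrative, not prescribed by the notes.

```python
# Minimal sketch of the two tasks, assuming scikit-learn.
from sklearn.datasets import make_classification, make_regression
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: the target is a discrete label (here 0 or 1).
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(Xc, yc)
print("class label:", clf.predict(Xc[:1]))   # a discrete class

# Regression: the target is a continuous value.
Xr, yr = make_regression(n_samples=200, n_features=5, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print("real value: ", reg.predict(Xr[:1]))   # a real number
```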
4. The Pipeline
We build a model from a dataset in stages: encode the data as features, train a learning algorithm on them, and evaluate the result.
The dataset is typically split into a training set (used to fit the model) and a test set (used to evaluate it).
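An end-to-end sketch of this pipeline follows, assuming scikit-learn; the dataset, the scaler, and the k-NN classifier are illustrative choices.

```python
# Minimal pipeline sketch, assuming scikit-learn:
# split the dataset, encode/scale features, fit a model, evaluate it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)                           # 4.2: the algorithm
print("test accuracy:", model.score(X_test, y_test))  # 4.4: evaluation
```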
4.1 Feature Encoding
We can represent each input as a vector of features.
Each feature encodes a characteristic / attribute of the input data. However, increasing the number of features can lead to the curse of dimensionality: data becomes sparse, computational complexity increases, and overfitting can occur.
There are various ways of mapping data to features:
- Feature Selection: manually select a subset of relevant features.
- Feature Extraction: automatically transform raw data into features (e.g. PCA, autoencoders).
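A small feature-extraction sketch follows, assuming scikit-learn; PCA and the digits dataset are illustrative choices. It maps 64-dimensional digit images down to 10 features, which is one way to fight the curse of dimensionality.

```python
# Minimal feature-extraction sketch, assuming scikit-learn.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
print("before:", X.shape)            # (1797, 64)

pca = PCA(n_components=10)           # keep the 10 strongest directions
X_reduced = pca.fit_transform(X)
print("after: ", X_reduced.shape)    # (1797, 10)
print("variance kept:", pca.explained_variance_ratio_.sum())
```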
4.2 Machine Learning Algorithm
- Lazy Learning stores the training data and makes predictions based on similarity to training examples (e.g. k-NN). See the sketch after this list.
- Eager Learning builds a model from training data and makes predictions using the model (e.g. decision trees, neural networks).
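A minimal sketch of the contrast, assuming scikit-learn; the dataset and models are illustrative. Note where the work happens: at prediction time for the lazy learner, at training time for the eager one.

```python
# Lazy vs. eager learning, sketched with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Lazy: fit() essentially just stores the training data; the real work
# (distance comparisons) is deferred to prediction time.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Eager: fit() builds an explicit model (the tree); prediction is then a
# cheap walk from root to leaf.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

print(knn.predict(X[:1]), tree.predict(X[:1]))
```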
A non-parametric model's complexity grows with the amount of training data (e.g. k-NN, decision trees). A parametric model has a fixed number of parameters, independent of the dataset size (e.g. linear regression, neural networks).
A linear model separates classes with a hyperplane, so it fits data that is linearly separable. If the data is not, we can apply a feature space transformation to make it so: SVMs do this with the kernel trick, and neural networks with non-linear activation functions. Alternatively, we can combine multiple simple classifiers, as in a decision tree.
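The kernel trick is easy to demonstrate, as in this sketch (assuming scikit-learn; the concentric-circles dataset is an illustrative choice): the circles are not linearly separable, so a linear SVM sits near chance while an RBF-kernel SVM, which implicitly transforms the feature space, separates them almost perfectly.

```python
# Linear vs. kernelized SVM on non-linearly-separable data (scikit-learn).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

linear = SVC(kernel="linear").fit(X, y)   # hyperplane in the original space
rbf = SVC(kernel="rbf").fit(X, y)         # kernel trick: implicit transform

print("linear kernel accuracy:", linear.score(X, y))  # near chance (~0.5)
print("rbf kernel accuracy:   ", rbf.score(X, y))     # near 1.0
```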
4.3 Bias Variance Tradeoff
Underfitting is when the model is too simple to capture the underlying patterns in the data - high bias (low variance). Overfitting is when the model is too complex and captures noise in the training data - high variance (low bias).
We want to strike a balance between bias and variance; this balance is what gives a good fit, as the sketch below illustrates.
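This sketch (assuming NumPy and scikit-learn; the sine-shaped data and the polynomial degrees are illustrative) fits polynomials of increasing degree to noisy data. The low-degree model underfits (high error everywhere), the high-degree model overfits (low training error, high test error), and a moderate degree does best on held-out data.

```python
# Under/overfitting sketch with polynomial regression (NumPy + scikit-learn).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=60)   # noisy sine
X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

for degree in (1, 4, 15):   # too simple, about right, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d}",
          "train MSE:", mean_squared_error(y_train, model.predict(X_train)),
          "test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```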
4.4 Evaluation
For classifiers, we can simply compute the accuracy (the fraction of correct predictions). For regression, we can compute the mean squared error (the average squared difference between predictions and true values).
We compare these metrics to a baseline (random chance or a previous model), as in the sketch below.
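A minimal evaluation sketch, assuming scikit-learn; the datasets and the dummy baselines are illustrative choices. It compares a classifier's accuracy against a random-chance baseline and a regressor's MSE against a predict-the-mean baseline.

```python
# Evaluation against baselines, sketched with scikit-learn.
from sklearn.datasets import load_iris, load_diabetes
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Classification: accuracy vs. a random-chance baseline.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
chance = DummyClassifier(strategy="uniform", random_state=0).fit(X_tr, y_tr)
print("model accuracy:   ", clf.score(X_te, y_te))
print("baseline accuracy:", chance.score(X_te, y_te))

# Regression: mean squared error vs. a predict-the-mean baseline.
X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(X_tr, y_tr)
base = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
print("model MSE:   ", mean_squared_error(y_te, reg.predict(X_te)))
print("baseline MSE:", mean_squared_error(y_te, base.predict(X_te)))
```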