The Big Picture

1. Definitions

2. Machine Learning Settings

There are three main settings in machine learning:

In supervised learning, every input comes with an output label, and the aim is to learn the mapping from inputs to labels.

In unsupervised learning, there are no labels, and the aim is to discover structure in the data (e.g. clusters).

In reinforcement learning, an agent learns from reward signals obtained by interacting with an environment, rather than from explicit labels.

There are also intermediate settings:

In semi-supervised learning, only some of the data is labelled - the aim is to use the existing labels to learn how to annotate the data that is missing labels.

In weakly-supervised learning, the labels are inexact or noisy - for example, coarse image-level labels rather than precise pixel-level annotations.

3. Machine Learning Tasks

There are two common ML tasks:

In classification, the model predicts a discrete class label for each input.

In regression, the model predicts a continuous value for each input.

In general, the goal is to learn a function $f: \mathcal{X} \to \mathcal{Y}$ mapping inputs to outputs.

4. The Pipeline

We build a model $\hat{f}$ to approximate the true function $f$, where $y = f(x)$ and $\hat{y} = \hat{f}(x)$.

The training dataset has sequences $X = \{x_1, \dots, x_n\}$ and labels $Y = \{y_1, \dots, y_n\}$. We also have a testing set of sequences $X'$ and labels $Y'$. The test set is only used to evaluate the model after training.
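As a minimal sketch of holding out a test set (assuming Python with scikit-learn; the toy dataset is made up):

```python
from sklearn.model_selection import train_test_split

# Toy dataset: 6 samples with 2 features each, and binary labels.
X = [[0.1, 1.2], [0.3, 0.8], [1.5, 0.2], [1.7, 0.4], [0.2, 1.1], [1.6, 0.3]]
Y = [0, 0, 1, 1, 0, 1]

# Hold out a third of the data; the test split is only touched after training.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=0
)
```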

4.1 Feature Encoding

We can put $x$ through a feature encoding $\phi$ to create $\phi(x)$, which is a more useful representation of the data. A feature vector is $\phi(x) = (\phi_1(x), \dots, \phi_d(x))$. Each feature should be standardised as $\phi_j \leftarrow \frac{\phi_j - \mu_j}{\sigma_j}$, where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of feature $j$.
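A minimal sketch of standardisation (assuming Python with numpy; the toy feature matrix is made up):

```python
import numpy as np

def standardise(features):
    # Z-score standardisation: subtract each feature's (column's) mean
    # and divide by its standard deviation.
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / sigma

# Toy feature matrix: 3 samples, 2 features on very different scales.
phi = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
phi_std = standardise(phi)  # each column now has mean 0 and std 1
```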

Each feature encodes a characteristic or attribute of the input data. However, increasing the number of features can lead to the curse of dimensionality: data becomes sparse, computational complexity increases, and overfitting can occur.

There are various ways of mapping data to features - for example, one-hot encoding of symbols, counts of sub-patterns, or hand-crafted numerical attributes; a sketch of one-hot encoding follows.
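As one example of such a mapping, here is a hypothetical one-hot encoding of a DNA-style sequence (the four-letter alphabet is an assumption, purely for illustration):

```python
ALPHABET = "ACGT"  # assumed alphabet for this illustration

def one_hot(sequence):
    # Map each symbol to a 4-dimensional indicator vector and
    # concatenate them into one feature vector.
    vector = []
    for symbol in sequence:
        vector.extend(1.0 if symbol == a else 0.0 for a in ALPHABET)
    return vector

print(one_hot("ACG"))  # 12-dimensional feature vector
```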

4.2 Machine Learning Algorithm

A non-parametric model's complexity grows with the data (e.g. k-NN, decision trees). A parametric model has a fixed number of parameters (e.g. linear regression, neural networks).
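A sketch of the contrast (assuming scikit-learn, with a made-up 1D regression dataset): the parametric model stores a fixed set of weights, while k-NN keeps the training points themselves, so its size grows with the dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Parametric: one weight and one intercept, regardless of dataset size.
linear = LinearRegression().fit(X, y)
print(linear.coef_, linear.intercept_)

# Non-parametric: predictions come from the stored training points.
knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
print(knn.predict([[4.5]]))
```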

A linear model separates data with a hyperplane, so it works when the data is linearly separable. When the data is not linearly separable, we can apply a feature space transformation to make it so. This is done implicitly via the kernel trick in SVMs, and in neural networks via non-linear activation functions. Alternatively, we can combine multiple simple classifiers, as in a decision tree.
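A minimal sketch of a feature space transformation (plain Python, with a made-up XOR-style dataset): the classes cannot be separated by a line in 2D, but adding the product feature $x_1 x_2$ makes a separating hyperplane possible.

```python
# XOR-style data: no straight line in 2D separates the two classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

def transform(x1, x2):
    # Map (x1, x2) to (x1, x2, x1 * x2); in this 3D space the linear
    # score x1 + x2 - 2*x1*x2 separates the classes exactly.
    return (x1, x2, x1 * x2)

for (x1, x2), y in zip(points, labels):
    z = transform(x1, x2)
    score = z[0] + z[1] - 2 * z[2]  # 0 for class 0, 1 for class 1 here
    print(z, "label:", y, "score:", score)
```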

4.3 Bias Variance Tradeoff

Underfitting is when the model is too simple to capture the underlying patterns in the data - high bias (low variance). Overfitting is when the model is too complex and captures noise in the training data - high variance (low bias).

We want to strike a balance between bias and variance; a model that achieves this balance is a good fit.
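A sketch of the tradeoff (assuming numpy; the noisy sine data is made up): polynomials of increasing degree fit the same data, moving from underfitting to overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=x.size)  # noisy truth

for degree in (1, 3, 15):
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    # Degree 1 underfits (high bias); degree 15 chases the noise
    # (high variance); degree 3 is closer to a good fit.
    print(degree, round(float(train_mse), 4))
```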

4.4 Evaluation

For classifiers, we can simply compute the accuracy (the fraction of correct predictions). For regression, we can compute the mean squared error $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$.

We compare accuracy to a baseline (random chance or a previous model).
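A sketch of both metrics next to a simple baseline (assuming numpy; here the baseline is majority-class prediction rather than random chance, and all the data is made up):

```python
import numpy as np

# Classification: accuracy vs a majority-class baseline.
y_true = np.array([0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1])
accuracy = np.mean(y_pred == y_true)     # 0.8
majority = np.bincount(y_true).argmax()  # class 0
baseline = np.mean(y_true == majority)   # 0.6

# Regression: mean squared error.
t_true = np.array([1.0, 2.0, 3.0])
t_pred = np.array([1.1, 1.9, 3.2])
mse = np.mean((t_pred - t_true) ** 2)    # ~0.02

print(accuracy, baseline, mse)
```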
