The Big Picture
1. Definitions
- Artificial Intelligence is a group of techniques enabling computers to mimic human behavior and intelligence.
- Machine Learning is a subset of AI techniques that use statistical methods to enable machines to improve with experience.
- Deep Learning is a subset of ML techniques that use multi-layered neural networks to model complex patterns in data.
2. Machine Learning Settings
There are three main settings in machine learning:
- Supervised Learning: learn a function that maps inputs to output labels based on example input-output pairs.
- Unsupervised Learning: learn patterns in input data without labeled outputs (clustering, dimensionality reduction).
- Reinforcement Learning: learn a policy to maximize cumulative reward through trial and error in an environment.
In semi-supervised learning, only some of the data is labeled; the aim is to use the existing labels to infer labels for the unlabeled data.
In weakly-supervised learning, the labels are inexact (e.g. coarse-grained or noisy).
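The contrast between the first two settings is easiest to see in code. Below is a minimal sketch, assuming scikit-learn is installed; the iris dataset and the particular models are illustrative choices, not part of these notes. The supervised model consumes both the inputs X and the labels y, while the unsupervised one sees only X.

```python
# Minimal sketch, assuming scikit-learn; dataset and models are illustrative.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: learn a mapping from inputs X to labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised prediction:", clf.predict(X[:1]))

# Unsupervised: find structure in X alone; the labels y are never used.
km = KMeans(n_clusters=3, n_init=10).fit(X)
print("cluster assignments:  ", km.labels_[:5])
```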
3. Machine Learning Tasks
There are two common ML tasks:
- Classification: predict a discrete label from a fixed set of classes (e.g. spam detection, image recognition).
- Regression: predict a continuous value (e.g. house prices, stock prices).
In general:
- More data leads to more accurate predictions.
- Selecting good features is crucial.
- Different classifiers make predictions differently.
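The two tasks differ only in the type of target they predict, as a short sketch makes concrete. This assumes scikit-learn; the synthetic datasets and the chosen models are illustrative, not prescribed by the notes.

```python
# Minimal sketch of the two tasks, assuming scikit-learn.
from sklearn.datasets import make_classification, make_regression
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: the target is a discrete label (here 0 or 1).
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(Xc, yc)
print("class label:", clf.predict(Xc[:1]))   # a discrete class

# Regression: the target is a continuous value.
Xr, yr = make_regression(n_samples=200, n_features=5, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print("real value: ", reg.predict(Xr[:1]))   # a real number
```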
4. The Pipeline
We build a model from a dataset in stages: encode the data as features, train a learning algorithm on them, and evaluate the result.
The dataset is typically split into a training set (used to fit the model) and a test set (used to evaluate it).
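An end-to-end sketch of this pipeline follows, assuming scikit-learn; the dataset, the scaler, and the k-NN classifier are illustrative choices.

```python
# Minimal pipeline sketch, assuming scikit-learn:
# split the dataset, encode/scale features, fit a model, evaluate it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)                           # 4.2: the algorithm
print("test accuracy:", model.score(X_test, y_test))  # 4.4: evaluation
```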
4.1 Feature Encoding
We can represent each input as a vector of features.
Each feature encodes a characteristic / attribute of the input data. However, increasing the number of features can lead to the curse of dimensionality: data becomes sparse, computational complexity increases, and overfitting can occur.
There are various ways of mapping data to features:
- Feature Selection: manually select a subset of relevant features.
- Feature Extraction: automatically transform raw data into features (e.g. PCA, autoencoders).
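A small feature-extraction sketch follows, assuming scikit-learn; PCA and the digits dataset are illustrative choices. It maps 64-dimensional digit images down to 10 features, which is one way to fight the curse of dimensionality.

```python
# Minimal feature-extraction sketch, assuming scikit-learn.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
print("before:", X.shape)            # (1797, 64)

pca = PCA(n_components=10)           # keep the 10 strongest directions
X_reduced = pca.fit_transform(X)
print("after: ", X_reduced.shape)    # (1797, 10)
print("variance kept:", pca.explained_variance_ratio_.sum())
```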
4.2 Machine Learning Algorithm
- Lazy Learning stores the training data and makes predictions based on similarity to training examples (e.g. k-NN). See the sketch after this list.
- Eager Learning builds a model from training data and makes predictions using the model (e.g. decision trees, neural networks).
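A minimal sketch of the contrast, assuming scikit-learn; the dataset and models are illustrative. Note where the work happens: at prediction time for the lazy learner, at training time for the eager one.

```python
# Lazy vs. eager learning, sketched with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Lazy: fit() essentially just stores the training data; the real work
# (distance comparisons) is deferred to prediction time.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Eager: fit() builds an explicit model (the tree); prediction is then a
# cheap walk from root to leaf.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

print(knn.predict(X[:1]), tree.predict(X[:1]))
```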
A non-parametric model's complexity grows with the amount of training data (e.g. k-NN, decision trees). A parametric model has a fixed number of parameters, independent of the dataset size (e.g. linear regression, neural networks).
A linear model separates classes with a hyperplane, so it fits data that is linearly separable. If the data is not, we can apply a feature space transformation to make it so: SVMs do this with the kernel trick, and neural networks with non-linear activation functions. Alternatively, we can combine multiple simple classifiers, as in a decision tree.
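The kernel trick is easy to demonstrate, as in this sketch (assuming scikit-learn; the concentric-circles dataset is an illustrative choice): the circles are not linearly separable, so a linear SVM sits near chance while an RBF-kernel SVM, which implicitly transforms the feature space, separates them almost perfectly.

```python
# Linear vs. kernelized SVM on non-linearly-separable data (scikit-learn).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

linear = SVC(kernel="linear").fit(X, y)   # hyperplane in the original space
rbf = SVC(kernel="rbf").fit(X, y)         # kernel trick: implicit transform

print("linear kernel accuracy:", linear.score(X, y))  # near chance (~0.5)
print("rbf kernel accuracy:   ", rbf.score(X, y))     # near 1.0
```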
4.3 Bias Variance Tradeoff
Underfitting is when the model is too simple to capture the underlying patterns in the data - high bias (low variance). Overfitting is when the model is too complex and captures noise in the training data - high variance (low bias).
We want to strike a balance between bias and variance; this balance is what gives a good fit, as the sketch below illustrates.
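This sketch (assuming NumPy and scikit-learn; the sine-shaped data and the polynomial degrees are illustrative) fits polynomials of increasing degree to noisy data. The low-degree model underfits (high error everywhere), the high-degree model overfits (low training error, high test error), and a moderate degree does best on held-out data.

```python
# Under/overfitting sketch with polynomial regression (NumPy + scikit-learn).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=60)   # noisy sine
X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

for degree in (1, 4, 15):   # too simple, about right, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d}",
          "train MSE:", mean_squared_error(y_train, model.predict(X_train)),
          "test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```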
4.4 Evaluation
For classifiers, we can simply compute the accuracy (the fraction of correct predictions). For regression, we can compute the mean squared error (the average squared difference between predictions and true values).
We compare these metrics to a baseline (random chance or a previous model), as in the sketch below.
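A minimal evaluation sketch, assuming scikit-learn; the datasets and the dummy baselines are illustrative choices. It compares a classifier's accuracy against a random-chance baseline and a regressor's MSE against a predict-the-mean baseline.

```python
# Evaluation against baselines, sketched with scikit-learn.
from sklearn.datasets import load_iris, load_diabetes
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Classification: accuracy vs. a random-chance baseline.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
chance = DummyClassifier(strategy="uniform", random_state=0).fit(X_tr, y_tr)
print("model accuracy:   ", clf.score(X_te, y_te))
print("baseline accuracy:", chance.score(X_te, y_te))

# Regression: mean squared error vs. a predict-the-mean baseline.
X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(X_tr, y_tr)
base = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
print("model MSE:   ", mean_squared_error(y_te, reg.predict(X_te)))
print("baseline MSE:", mean_squared_error(y_te, base.predict(X_te)))
```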