Day 4 - Regression and Classification using MLPs
MLPs for Regression
- Number of neurons in the input layer depends on data.
- Use one output neuron per output dimension (e.g. one neuron to predict the price of a house, two neurons to predict x and y coordinates).
Choice of activation function
- Allow output of any range of values: no activation function.
- Allow only positive values: ReLU or softplus function.
- Softplus is a smooth variant of ReLU: softplus(z) = ln(1 + e^z). It is close to 0 when z is negative and close to z when z is positive.
- Allow values within a given range:
- Logistic function, then scale, for values between 0 and 1.
- Hyperbolic tangent function, then scale, for values between -1 and 1.
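The output activations above can be sketched in plain Python. This is a minimal illustration; the helper names (`scaled_logistic`, `scaled_tanh`) and the ranges passed to them are my own, not from the notes.

```python
import math

def softplus(z):
    # Smooth variant of ReLU: ln(1 + e^z).
    # Close to 0 for negative z, close to z for positive z.
    return math.log1p(math.exp(z))

def logistic(z):
    # Output in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def scaled_logistic(z, low, high):
    # Rescale the (0, 1) logistic output to any bounded range (low, high).
    return low + (high - low) * logistic(z)

def scaled_tanh(z, low, high):
    # tanh is in (-1, 1); rescale to (low, high).
    return low + (high - low) * (math.tanh(z) + 1.0) / 2.0

print(round(softplus(-10.0), 4))                   # close to 0
print(round(softplus(10.0), 4))                    # close to 10
print(round(scaled_logistic(0.0, 0.0, 100.0), 1))  # midpoint of range: 50.0
```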
Choice of loss function
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE) - suitable when there are outliers in the data
- Huber loss - combination of both of the above:
- Quadratic when the error is smaller than a threshold (δ, usually 1): allows the model to converge faster and be more precise than MAE.
- Linear when the error is larger than the threshold (δ): makes it less sensitive to outliers.
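A minimal sketch of the Huber loss for a single error term, using the standard piecewise definition (quadratic inside the threshold, linear outside):

```python
def huber(error, delta=1.0):
    # Quadratic near zero (like MSE), linear beyond delta (like MAE).
    if abs(error) <= delta:
        return 0.5 * error ** 2
    return delta * (abs(error) - 0.5 * delta)

# Small error: quadratic regime.
print(huber(0.5))   # 0.125
# Large error (e.g. an outlier): grows linearly, not quadratically.
print(huber(10.0))  # 9.5
```

Note how the loss for the outlier-sized error (9.5) is far smaller than the squared error (50) would be, which is what makes Huber less sensitive to outliers.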
Typical MLP architecture for Regression
Hyperparameter | Typical value |
---|---|
Number of neurons in input layer | Equal to number of input features |
Number of hidden layers | Depends on the problem (typically 1 to 5) |
Number of neurons per hidden layer | Depends on the problem (typically 10 to 100) |
Number of neurons in output layer | Equal to number of prediction dimensions |
Hidden layer activation | ReLU or SELU |
Output activation | None; ReLU/softplus for positive outputs; logistic/tanh with scaling for bounded outputs |
Loss function | MSE; MAE/Huber if there are outliers |
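The architecture in the table can be illustrated with a tiny forward pass written from scratch. This is a hand-rolled sketch with made-up weights, just to show the shape of a regression MLP (features in, ReLU hidden layer, linear output); a real network would learn these weights via gradient descent.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    # One fully connected layer: each neuron computes a weighted sum + bias.
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

# Toy network: 2 input features -> 3 ReLU hidden neurons -> 1 linear output.
x = [1.0, 2.0]
hidden = dense(x,
               weights=[[0.1, 0.2], [0.3, -0.4], [-0.5, 0.6]],
               biases=[0.0, 0.1, -0.1],
               activation=relu)
# No output activation: the prediction can take any value.
y_pred = dense(hidden, weights=[[1.0, -1.0, 0.5]], biases=[0.2])
print(y_pred)  # a single unbounded regression prediction
```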
MLPs for Classification
Binary classification
- Use a single output neuron with logistic activation.
- The output, between 0 and 1, can be treated as the estimated probability of the positive class.
- Probability of the negative class = 1 - output of the network.
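The two bullets above amount to the following (the pre-activation value `z` is a made-up example):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 2.0  # hypothetical pre-activation of the single output neuron
p_positive = logistic(z)       # estimated probability of the positive class
p_negative = 1.0 - p_positive  # probability of the negative class
print(round(p_positive, 3), round(p_negative, 3))
```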
Multilabel binary classification
- Use multiple output neurons, each with a logistic activation function.
- Use one output neuron per label.
- The output classes are not exclusive (an instance can belong to multiple classes at once) so the output probabilities don't add up to 1.
Multiclass classification
- The output classes are exclusive (each instance can belong to only one of multiple classes).
- Use one output neuron per class with softmax activation in the output layer.
- All outputs will be between 0 and 1, and add up to 1.
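A small softmax sketch showing both properties above (each output in (0, 1), and all outputs summing to 1). The logits are made-up values; subtracting the maximum is a common numerical-stability trick that does not change the result:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability; result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # one logit per class
print([round(p, 3) for p in probs])  # each probability is in (0, 1)
print(round(sum(probs), 6))          # they add up to 1
```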
Choice of loss function
Cross-entropy loss (a.k.a. log loss) is a good choice for the loss function:
$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)} \log\left(\hat{p}_k^{(i)}\right)$
- $y_k^{(i)}$ is 1 if $k$ is the right class for instance $i$, and 0 otherwise.
- $\hat{p}_k^{(i)}$ is the probability the network predicts for class $k$ for instance $i$.
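For a single instance with a one-hot target, cross-entropy reduces to the negative log of the probability predicted for the true class. A minimal sketch (the example probabilities are made up):

```python
import math

def cross_entropy(y_true, p_pred):
    # y_true: one-hot target (1 for the right class, 0 otherwise)
    # p_pred: predicted probabilities, one per class
    return -sum(y * math.log(p) for y, p in zip(y_true, p_pred) if y > 0)

# Confident and correct prediction -> low loss.
print(round(cross_entropy([0, 1, 0], [0.05, 0.9, 0.05]), 4))
# Confident but wrong prediction -> high loss.
print(round(cross_entropy([1, 0, 0], [0.05, 0.9, 0.05]), 4))
```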
Typical MLP architecture for Classification
Hyperparameter | Binary classification | Multilabel binary classification | Multiclass classification |
---|---|---|---|
Number of input and hidden layers | Same as regression | Same as regression | Same as regression |
Number of neurons in output layer | 1 | 1 per label | 1 per class |
Output layer activation function | Logistic | Logistic | Softmax |
Loss function | Cross entropy | Cross entropy | Cross entropy |