Day 4 - Regression and Classification using MLPs

MLPs for Regression

Choice of activation function

Choice of loss function

Typical MLP architecture for Regression

| Hyperparameter | Typical value |
| --- | --- |
| Number of neurons in input layer | Equal to number of input features |
| Number of hidden layers | Depends on the problem (typically 1 to 5) |
| Number of neurons per hidden layer | Depends on the problem (typically 10 to 100) |
| Number of neurons in output layer | Equal to number of prediction dimensions |
| Hidden layer activation | ReLU or SELU |
| Output activation | None; ReLU/softplus for positive outputs; logistic/tanh (with scaling) for bounded outputs |
| Loss function | MSE; MAE/Huber if there are outliers |
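
As a concrete illustration, here is a minimal Keras sketch of a regression MLP following this table. The sizes (8 input features, two hidden layers of 50 units) and the optimizer are illustrative assumptions, not prescriptions.

```python
import tensorflow as tf

# Minimal regression MLP following the table above.
# Sizes are assumptions for illustration: 8 input features,
# two ReLU hidden layers of 50 units, one prediction dimension.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),                    # one input neuron per feature
    tf.keras.layers.Dense(50, activation="relu"),  # hidden layer 1
    tf.keras.layers.Dense(50, activation="relu"),  # hidden layer 2
    tf.keras.layers.Dense(1),                      # no activation: unbounded output
])
# MSE loss; tf.keras.losses.Huber() is an option if there are outliers.
model.compile(loss="mse", optimizer="adam")
```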

MLPs for Classification

Binary classification

Multilabel binary classification

Multiclass classification

Choice of loss function

Cross-entropy loss (a.k.a. log loss) is a good choice of loss function:

$$-\sum_{c} y_c \cdot \log(p_c)$$

$y_c = 1$ if $c$ is the right class, and 0 otherwise.

$p_c$ is the probability the network predicts for class $c$.
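
For concreteness, here is a tiny worked example in NumPy; the target and predicted probabilities below are made-up values.

```python
import numpy as np

# One sample, three classes. Values are made-up for illustration.
y = np.array([0.0, 1.0, 0.0])  # one-hot target: class 1 is the right class
p = np.array([0.2, 0.7, 0.1])  # predicted probabilities (e.g. softmax output)

# Cross-entropy: only the term for the right class survives, giving -log(0.7).
loss = -np.sum(y * np.log(p))
print(loss)  # ~0.357
```

With a one-hot target the sum reduces to $-\log(p_c)$ for the right class, so the loss approaches 0 as that predicted probability approaches 1 and grows without bound as it approaches 0.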

Typical MLP architecture for Classification

| Hyperparameter | Binary classification | Multilabel binary classification | Multiclass classification |
| --- | --- | --- | --- |
| Number of input and hidden layers | Same as regression | Same as regression | Same as regression |
| Number of neurons in output layer | 1 | 1 per label | 1 per class |
| Output layer activation function | Logistic | Logistic | Softmax |
| Loss function | Cross entropy | Cross entropy | Cross entropy |
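
The output-layer choices in this table translate directly into code. Below is a minimal Keras sketch of the three configurations; the input size (20 features), the hidden layer, and the number of labels/classes (5) are assumptions for illustration.

```python
import tensorflow as tf

# Binary classification: 1 output neuron with logistic (sigmoid) activation.
binary = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
binary.compile(loss="binary_crossentropy", optimizer="adam")

# Multilabel binary classification: 1 neuron per label, each with its own
# sigmoid, so several labels can be "on" at once.
multilabel = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(5, activation="sigmoid"),  # 5 independent labels
])
multilabel.compile(loss="binary_crossentropy", optimizer="adam")

# Multiclass classification: 1 neuron per class with a softmax over classes,
# so the outputs form a probability distribution.
multiclass = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 mutually exclusive classes
])
# Expects one-hot targets; use "sparse_categorical_crossentropy" for integer labels.
multiclass.compile(loss="categorical_crossentropy", optimizer="adam")
```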