Prerequisites: Introduction to DeepLearning4J, knowledge of Java.
DL4J comes with a large number of examples. Based on one of them, our first neural-network code example is an MLP classifier for handwritten digit recognition.
MNIST Classification Task
The neural network in this example takes on the classification task of the MNIST database of handwritten digits. This database consists of numerous handwritten samples of the ten digits. The dimensions of each sample are 28×28 pixels (a total of 784 pixels), and each pixel is represented with a grey-level value between 0 and 255.
The MNIST database is divided into two sets: 60,000 samples that are used for training a model, and 10,000 that are used to test the trained model.
The Suggested Model
The model used in this example is a multilayer perceptron (MLP) with a single hidden layer. The input layer consists of 784 nodes, one for each input pixel, and the output layer has 10 neurons representing the ten digit classes. In between them resides the hidden layer, consisting of 1000 neurons. This setup is depicted in the diagram below:
The Code
Our sample code resides at https://github.com/ai4java/dl4j-examples, which is part of the Ai4Java GitHub account.
The code is set up as a Maven project, and the pom.xml file contains dependencies for DL4J (our deep-learning library of choice), ND4J (the underlying mathematical library) and SLF4J (for logging).
The main file used for this example is the class SingleLayerMLP; let's take a closer look at this class.
Numerical Settings
The first few lines in the main() method are responsible for setting up various numerical values. The rngSeed value serves as a seed for the random number generator used for initializing the model weights as well as for shuffling the dataset samples. Setting the seed to a predetermined value enables us to reproduce the same results whenever we run this example.
final int rngSeed = 123;          // random number seed for reproducibility
final int numOfInputs = 28 * 28;  // numRows * numColumns in the input images
final int numOfOutputs = 10;      // number of output classes ('0'..'9')
final int hiddenLayerSize = 1000; // number of nodes in hidden layer
The next variables control the batch size and the number of epochs used during the training of the model:
final int batchSize = 125; // batch size for each epoch
final int numEpochs = 15;  // number of epochs to perform
Each epoch is a complete pass over all the training samples; the samples, however, are grouped into batches (sometimes referred to as ‘mini-batches’), and each iteration of the training process works on a single batch. In our case, since we have 60,000 training samples and the batch size is set to 125, it takes 480 iterations to complete one epoch.
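This relationship can be expressed as a tiny calculation (an illustration only, not part of the example code; it reuses the batchSize defined above):

// Illustration only: how many iterations make up one epoch
int trainingSamples = 60_000;                          // size of the MNIST training set
int iterationsPerEpoch = trainingSamples / batchSize;  // 60,000 / 125 = 480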
Data Preparation
To prepare the data, samples are first taken from the MNIST database. The classes DataSet and DataSetIterator, which are part of the ND4J library, enable us to create and manipulate datasets; these datasets are then used for training and testing the DL4J neural network. The MnistDataSetIterator class enables us to access the MNIST database directly:
DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);
The second argument of the constructor distinguishes between the train and test portions of the database (true for training, false for testing), while the third argument seeds the random shuffling of the samples.
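Before training, it can be useful to sanity-check the shape of the data produced by the iterator. The lines below are not part of the original example; they are a small sketch that could be added to main() right after the iterators are created (the variable name firstBatch is ours, and getFeatureMatrix() is the same accessor the example uses later during evaluation):

// Sketch (not in the original example): inspect the first training batch
DataSet firstBatch = mnistTrain.next();
log.info("features shape: {}", java.util.Arrays.toString(firstBatch.getFeatureMatrix().shape())); // expected [125, 784]
log.info("labels shape:   {}", java.util.Arrays.toString(firstBatch.getLabels().shape()));        // expected [125, 10]
mnistTrain.reset(); // rewind the iterator so training still sees every batch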
Creating the Neural Network
In the create() method, the MultiLayerConfiguration class encapsulates the information used to create the actual neural network. It uses the Builder class, which presents a fluent interface that helps keep the code concise and readable. After setting the random seed, a couple of parameters are set for the algorithm that is used to train the network. The updater() call determines the algorithm used to update the weights (here, Nesterov momentum with a learning rate of 0.006 and a momentum of 0.9), while l2() sets a regularization coefficient for the weights, helping to keep their values small.
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(rngSeed) // include a random seed for reproducibility
    // use stochastic gradient descent as an optimization algorithm
    .updater(new Nesterovs(0.006, 0.9))
    .l2(1e-4)
Next comes the part that determines the architecture of the neural network, describing it layer by layer. The call to list() returns a ListBuilder instance, ready to accept the list of layer configurations.
    .list()
    // hidden layer:
    .layer(0, new DenseLayer.Builder()
            .nIn(numOfInputs)
            .nOut(hiddenLayerSize)
            .activation(Activation.RELU)
            .weightInit(WeightInit.XAVIER)
            .build())
    // output layer:
    .layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
            .nIn(hiddenLayerSize)
            .nOut(numOfOutputs)
            .activation(Activation.SOFTMAX)
            .weightInit(WeightInit.XAVIER)
            .build())
The first layer described, layer 0, is the hidden layer. It implicitly contains the definition of the input layer by setting the nIn() value to the number of inputs. The nOut() value determines the number of nodes in this layer. The activation function of the neurons in this layer is the rectified linear unit (ReLU), which is currently the most common choice for hidden layers. The weights of this layer are initialized using the Xavier algorithm, which ensures that the initial weights are neither too small nor too large, so signals can propagate through the layers without shutting down or saturating the neurons.
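For reference, both choices are simple to state. The snippet below only illustrates the underlying formulas and is not code from the example; the Xavier variance shown follows the commonly cited Glorot formula, and DL4J's exact implementation details may differ slightly:

// ReLU simply clips negative activations to zero:
//   relu(x) = max(0, x)
static double relu(double x) {
    return Math.max(0.0, x);
}

// Xavier initialization draws initial weights from a distribution whose variance
// depends on the layer's fan-in and fan-out, keeping signal magnitudes roughly
// constant from layer to layer:
//   variance = 2.0 / (nIn + nOut)
static double xavierVariance(int nIn, int nOut) {
    return 2.0 / (nIn + nOut);
}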
The output layer’s parameters differ from those of the hidden layer in two respects:
- This layer is configured with a ‘loss function’ (also called a ‘cost function’), which is used to calculate the ‘error’ between the actual outputs and the desired outputs. The training algorithm attempts to minimize this ‘error’ by propagating it back through the network’s layers and adjusting the various weights according to their contribution to the error. Here, the loss function used is Negative Log Likelihood.
- The activation function is Softmax rather than ReLU. Softmax is often used in classifiers’ output layers, as it results in each of the output neurons producing a value between 0 and 1, while the sum of the outputs is exactly 1. This enables us to interpret each output value as the probability that the class represented by that output is the correct class, or the ‘confidence’ of the network in that class being chosen for the given inputs. The Negative Log Likelihood used as the loss function works well with Softmax, as it amplifies errors where the network’s confidence in the (wrong) output is high (a small numeric sketch of both functions follows this list).
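To make the interaction between Softmax and Negative Log Likelihood concrete, here is a small stand-alone sketch (an illustration only, not code from the example):

// Illustration only: softmax turns raw output scores into probabilities that sum
// to 1, and negative log likelihood penalizes low confidence in the true class.
static double[] softmax(double[] scores) {
    double max = Double.NEGATIVE_INFINITY;
    for (double s : scores) max = Math.max(max, s); // subtract the max for numerical stability
    double sum = 0.0;
    double[] probs = new double[scores.length];
    for (int i = 0; i < scores.length; i++) {
        probs[i] = Math.exp(scores[i] - max);
        sum += probs[i];
    }
    for (int i = 0; i < probs.length; i++) probs[i] /= sum;
    return probs;
}

// Negative log likelihood for a single sample whose true class index is 'label':
static double negativeLogLikelihood(double[] probs, int label) {
    return -Math.log(probs[label]); // large when the probability of the true class is small
}

For example, raw scores of {2.0, 1.0, 0.1} yield probabilities of roughly {0.66, 0.24, 0.10}; if the true class is the first one, the loss is about -ln(0.66) ≈ 0.42, and it grows quickly as that probability shrinks.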
Next come two training-related settings:
    .pretrain(false)
    .backprop(true) // use backpropagation to adjust weights
Pretraining means setting the initial values of the weights based on previous training (e.g. with a small initial dataset). In feed-forward networks such as the one in this example, it is not particularly useful. Backpropagation is the algorithm of choice for our type of network.
Training the Neural Network
In the train() method, a listener is first set that will output the network’s score every 100 iterations. Then the training is carried out by repeatedly calling the fit() method of the model with the same training set, once for each epoch:
// print the score every 100 iterations:
mlp.setListeners(new ScoreIterationListener(100));

for (int i = 0; i < numEpochs; i++) {
    log.info("epoch {}....", i);
    mlp.fit(trainSet);
}
When running the program (by launching the main() method), the relevant output will look similar to this:
INFO [SingleLayerMLP] - Training model....
INFO [SingleLayerMLP] - epoch 0....
INFO [ScoreIterationListener] - Score at iteration 0 is 2.346876953125
INFO [ScoreIterationListener] - Score at iteration 100 is 0.2718331298828125
INFO [ScoreIterationListener] - Score at iteration 200 is 0.2462818603515625
INFO [ScoreIterationListener] - Score at iteration 300 is 0.11042080688476562
INFO [ScoreIterationListener] - Score at iteration 400 is 0.13886257934570312
INFO [SingleLayerMLP] - epoch 1....
INFO [ScoreIterationListener] - Score at iteration 500 is 0.0950164794921875
INFO [ScoreIterationListener] - Score at iteration 600 is 0.14483111572265625
INFO [ScoreIterationListener] - Score at iteration 700 is 0.09323027801513672
INFO [ScoreIterationListener] - Score at iteration 800 is 0.05594816207885742
INFO [ScoreIterationListener] - Score at iteration 900 is 0.10358106994628906
INFO [SingleLayerMLP] - epoch 2....
INFO [ScoreIterationListener] - Score at iteration 1000 is 0.014757573127746582
INFO [ScoreIterationListener] - Score at iteration 1100 is 0.0348522834777832
INFO [ScoreIterationListener] - Score at iteration 1200 is 0.034441551208496096
INFO [ScoreIterationListener] - Score at iteration 1300 is 0.019467761993408203
INFO [ScoreIterationListener] - Score at iteration 1400 is 0.02450384521484375
INFO [SingleLayerMLP] - epoch 3....
INFO [ScoreIterationListener] - Score at iteration 1500 is 0.012069838523864746
INFO [ScoreIterationListener] - Score at iteration 1600 is 0.01041600227355957
...
The ScoreIterationListener is called every 100 iterations, as configured; and as we calculated earlier, it takes 480 iterations to complete one epoch, which is reflected in the output.
The score value shown at each line represents the value of the loss function, or ‘error’. This value is expected to generally decrease during the training, but will not necessarily decrease at every step of the way.
Evaluating the Trained Neural Network
In the evaluate() method, the trained model is evaluated using the test set, which was not used until this point:
Evaluation eval = new Evaluation(numOfOutputs); // create an evaluation object with 10 possible classes
int batchCounter = 0;
while (testSet.hasNext()) {
    DataSet next = testSet.next();
    INDArray output = mlp.output(next.getFeatureMatrix()); // get the network's prediction
    log.info("Evaluating next batch ({})...", batchCounter++);
    eval.eval(next.getLabels(), output); // check the prediction against the true class
}

log.info(eval.stats());
The evaluation is done batch by batch, and the results are accumulated and aggregated by the Evaluation instance:
INFO [SingleLayerMLP] - Evaluating model....
INFO [SingleLayerMLP] - Evaluating next batch (0)...
INFO [SingleLayerMLP] - Evaluating next batch (1)...
INFO [SingleLayerMLP] - Evaluating next batch (2)...
INFO [SingleLayerMLP] - Evaluating next batch (3)...
...
INFO [SingleLayerMLP] - Evaluating next batch (79)...
Then, the stats() method of that instance is called, and it outputs a wealth of information. First, the classification results for each of the digits (0..9) are printed out. This is sometimes referred to as the confusion matrix:
Examples labeled as 0 classified by model as 0: 973 times
Examples labeled as 0 classified by model as 1: 1 times
Examples labeled as 0 classified by model as 2: 1 times
Examples labeled as 0 classified by model as 4: 1 times
Examples labeled as 0 classified by model as 6: 2 times
Examples labeled as 0 classified by model as 7: 1 times
Examples labeled as 0 classified by model as 8: 1 times
Examples labeled as 1 classified by model as 1: 1127 times
Examples labeled as 1 classified by model as 2: 2 times
Examples labeled as 1 classified by model as 3: 1 times
Examples labeled as 1 classified by model as 6: 2 times
Examples labeled as 1 classified by model as 7: 1 times
Examples labeled as 1 classified by model as 8: 2 times
...
Finally, the results are summarized using several standard measures (Accuracy, Precision, Recall and F1 Score), all calculated from the confusion matrix above:
==========================Scores========================================
 # of classes:    10
 Accuracy:        0.9844
 Precision:       0.9844
 Recall:          0.9843
 F1 Score:        0.9844
Precision, recall and F1: macro-averaged (equally weighted avg. of 10 classes)
========================================================================
In our case, these measures are all around 98.4% (or 1.6% error), which is considered reasonable for this type of network on the MNIST database benchmark, as can be seen here.
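As a reminder of how these measures relate to the confusion matrix, here is a minimal per-class sketch (an illustration only; DL4J's Evaluation class computes all of this for us):

// Illustration only: per-class precision, recall and F1 from confusion-matrix counts.
// truePositives:  samples of the class that were classified as that class
// falsePositives: samples of other classes that were classified as this class
// falseNegatives: samples of the class that were classified as something else
static double precision(long truePositives, long falsePositives) {
    return (double) truePositives / (truePositives + falsePositives);
}
static double recall(long truePositives, long falseNegatives) {
    return (double) truePositives / (truePositives + falseNegatives);
}
static double f1(double precision, double recall) {
    return 2.0 * precision * recall / (precision + recall);
}

The summary above then macro-averages these per-class values over the 10 classes, as noted in the last line of the output.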
What’s Next?
A good way to get a better feel for the way neural networks work is to experiment with the values of the various parameters of the model and the training algorithm: for example, the number of nodes in the hidden layer, the l2() and updater() values, the weight-initialization algorithm and the number of epochs. It could also be interesting to add another hidden layer (or several) to the network, as sketched below.
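For example, a second hidden layer could be added by inserting one more DenseLayer into the configuration and renumbering the subsequent layers. A minimal sketch, reusing the names from the example above (the size of 500 is an arbitrary illustrative choice, not a recommendation):

    .list()
    // first hidden layer:
    .layer(0, new DenseLayer.Builder()
            .nIn(numOfInputs)
            .nOut(hiddenLayerSize)
            .activation(Activation.RELU)
            .weightInit(WeightInit.XAVIER)
            .build())
    // second hidden layer (new): its nIn must match the previous layer's nOut
    .layer(1, new DenseLayer.Builder()
            .nIn(hiddenLayerSize)
            .nOut(500) // arbitrary size, for illustration only
            .activation(Activation.RELU)
            .weightInit(WeightInit.XAVIER)
            .build())
    // output layer, now at index 2; its nIn must match the new layer's nOut
    .layer(2, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
            .nIn(500)
            .nOut(numOfOutputs)
            .activation(Activation.SOFTMAX)
            .weightInit(WeightInit.XAVIER)
            .build())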
In future posts we will see how to visualize the training process and the results achieved, and try out more advanced networks to perform the same task.
Thank you for the article. I’m an experienced Java developer and have an interest in using DL4J for an application idea. I’ve been using the MnistClassifier example in DL4J to train on a set of images I’ve extracted from golf scorecards. (First thing I did was write some code to extract the scores into 28×28 images for each player.) Anyway, I’ve had some success with a very small training set – I just need to gather more images from other scorecards to have good data for training. But once that is done, my goal is to preload the model and evaluate “real” scores against the model to then calculate the golfer’s score. So if I have 18 scores (images of scores) but at runtime do not “know” whether they are 4, 5, 6, etc. – so I don’t have the labels for them yet – can you advise as to what API or object/method/approach to utilize? Thanks.
Hello Mike, and thank you for reading and commenting!
Can you provide more information about your experiment?
I am not completely clear about the issue you described. Do you mean that only some of the labels are available at training time?
Regards,
Eyal