MATLAB-based neural networks for image classification
Project overview

This project implements five algorithms in MATLAB to classify characters segmented from automobile license plates:

  • MLP (multi-layer perceptron)
  • CNN (convolutional neural network)
  • LVQ (learning vector quantization)
  • RBF1 with k-means clustering
  • RBF2 with SOM (self-organizing map)

File running order

  1. run figure_preprocessing.m to create a new folder, ass2_processed_data
  2. run data_partition.m to split the dataset in an 8:2 ratio; it returns X_train, X_test, y_train, y_test and saves them as train_test_data.mat
  3. run ass2_CNN.m (this script does not use the processed data; it reads the images directly from the original folder)
  4. run ass2_mlp.m
  5. run ass2_lvq.m
  6. run ass2_rbf_kmean.m
  7. run ass2_rbf_som.m
  8. run ass2_confusion_matrix_summary.m

Components/Scripts inside the project

The script part


This script (figure_preprocessing.m) reads the images in ass2_data and rewrites them to a new folder, ass2_processed_data.

- read both jpeg and jpg files in all subfolders of ass2_data
- rewrite them into another folder, ass2_processed_data, with a uniform format (jpeg) and a uniform naming style (label + number), e.g. A1, B10, C99


This script (data_partition.m) splits the processed data into training and testing datasets in an 8:2 ratio. It also adjusts the image size and the row/column layout to fit the input of the designed networks.

- read images from the processed folder ass2_processed_data
- split the dataset 8:2
- save the split samples and parameters as train_test_data.mat
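
The 8:2 split described above can be sketched roughly as follows. This is a minimal illustration on hypothetical stand-in data (X and y are made up here), using a plain random shuffle; whether data_partition.m shuffles this way or splits per class is not stated, so treat the details as assumptions.

```matlab
% Hypothetical stand-in data: 10 samples with 4 features, labels 1..10.
rng(0);                        % fix the seed so the split is repeatable
X = rand(10, 4);
y = (1:10)';

N      = size(X, 1);
idx    = randperm(N);          % shuffled sample indices
nTrain = round(0.8 * N);       % 80% of the samples go to training

X_train = X(idx(1:nTrain), :);
y_train = y(idx(1:nTrain));
X_test  = X(idx(nTrain+1:end), :);
y_test  = y(idx(nTrain+1:end));

% In the project the result would then be stored, for example:
% save('train_test_data.mat', 'X_train', 'X_test', 'y_train', 'y_test');
```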


This script (ass2_CNN.m) trains the CNN.

- use imageDatastore to store the images and train the model
- use splitEachLabel to split the training and testing datasets
- does not use train_test_data.mat as input, since the CNN follows the standard image examples in the MATLAB documentation
- imageInputLayer([48 24 1])
- Layer 1:
    - kernel size: 3; number of filters: 8; padding: same
    - batchNormalizationLayer
    - ReLU layer
    - max pooling with pool size [2 2] and stride [2 2]
- Layer 2:
    - kernel size: 3; number of filters: 16; padding: same
    - batchNormalizationLayer
    - ReLU layer
    - max pooling with pool size [2 2] and stride [2 2]
- Layer 3:
    - kernel size: 3; number of filters: 32; padding: same
    - batchNormalizationLayer
    - ReLU layer
    - max pooling with pool size [2 2] and stride [2 2]
- Layer 4:
    - fully connected layer with 24 neurons
    - softmaxLayer
- calculate the training and testing accuracy
- return and save the confusion matrix as C_CNN.mat
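
The architecture listed above could be written as a Deep Learning Toolbox layer array along these lines. With 'same' padding the convolutions keep the spatial size, and each 2x2 pooling with stride 2 halves it, so the feature maps go 48x24 -> 24x12 -> 12x6 -> 6x3, giving 6*3*32 = 576 values into the fully connected layer. The final classificationLayer is an assumption added here because trainNetwork needs an output layer; the notes do not mention it.

```matlab
% Sketch of the CNN described above (requires Deep Learning Toolbox).
layers = [
    imageInputLayer([48 24 1])                    % 48x24 grayscale input

    convolution2dLayer(3, 8, 'Padding', 'same')   % Layer 1
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer([2 2], 'Stride', [2 2])

    convolution2dLayer(3, 16, 'Padding', 'same')  % Layer 2
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer([2 2], 'Stride', [2 2])

    convolution2dLayer(3, 32, 'Padding', 'same')  % Layer 3
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer([2 2], 'Stride', [2 2])

    fullyConnectedLayer(24)                       % Layer 4: 24 classes
    softmaxLayer
    classificationLayer];                         % assumed output layer
```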


This script (ass2_mlp.m) trains the MLP.

- load the previously saved dataset train_test_data.mat
- lr_rate = 0.2;
- momentum = 0.4;
- epochs = 1000;
- 3 layers with 50, 100, and 100 neurons
- Hyperbolic tangent sigmoid transfer function (tansig)
- gradient descent to update the weights
- calculate the training and testing accuracy
- return and save the confusion matrix as C_mlp.mat
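
One way to configure a network with these settings is sketched below. It assumes feedforwardnet with the 'traingdm' training function (gradient descent with momentum) and treats the 50/100/100 sizes as hidden layers; the notes do not say which constructor or training function ass2_mlp.m actually uses, so this is only an illustration.

```matlab
% Sketch of an MLP with the listed hyperparameters (requires Deep Learning Toolbox).
net = feedforwardnet([50 100 100], 'traingdm');  % assumed constructor and trainer

for i = 1:3
    net.layers{i}.transferFcn = 'tansig';        % hyperbolic tangent sigmoid
end

net.trainParam.lr     = 0.2;    % learning rate
net.trainParam.mc     = 0.4;    % momentum constant
net.trainParam.epochs = 1000;   % maximum number of epochs

% Training would expect one sample per column, for example:
% net = train(net, X_train', T_train');   % T_train: one-hot targets (hypothetical name)
```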


This script (ass2_lvq.m) trains the LVQ network.

- load the previously saved dataset train_test_data.mat
- set the number of clusters to 360 for the best performance; 24 clusters can be used instead to reduce computation
- lvqnet(24) achieves reasonable performance, but it does not match the performance with 360 clusters
- the trained lvqnet(360) is saved as lvq_360.mat; uncomment the line in the script if you wish to see its performance
- calculate the training and testing accuracy
- return and save the confusion matrix as C_lvq.mat
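
A small sketch of the lvqnet workflow follows, on hypothetical toy data with three classes and a hidden size of 6 so that it trains quickly; the project uses 24 or 360 hidden neurons on the real data. The toolbox expects one sample per column and one-hot targets, so the transposition and the use of ind2vec/vec2ind are assumptions about how the script prepares its inputs.

```matlab
% Toy LVQ example (requires Deep Learning Toolbox).
X = [0 0.1 5 5.1 10 10.1;      % 2 features x 6 samples (columns are samples)
     0 0.1 5 5.1 10 10.1];
y = [1 1 2 2 3 3];             % numeric class labels

T = full(ind2vec(y));          % one-hot targets, one column per sample

net = lvqnet(6);               % hidden size; the project uses 24 or 360
net.trainParam.epochs     = 20;
net.trainParam.showWindow = false;
net = train(net, X, T);

pred = vec2ind(net(X));        % predicted class index per sample
trainAcc = mean(pred == y);    % fraction of correctly classified samples
```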


This script (ass2_rbf_kmean.m) trains the RBF network with k-means clustering.

- load the previously saved dataset train_test_data.mat
- use kmeans to return the center of each cluster; the number of clusters is set to 360
- use the helper function RBF_training_kmeans to calculate W (the weights), sigma (the variance of the RBF kernel), and the coordinates of each cluster's center
- use the returned parameters to compute the training predictions with the helper function RBF_predict
- calculate the training and testing accuracy
- return and save the confusion matrix as C_rbf_kmeans.mat


This script (ass2_rbf_som.m) trains the RBF network with SOM clustering.

- load the previously saved dataset train_test_data.mat
- define the SOM network; the dimension is set to 18*20 so that the number of clusters stays at 360, as before
- coverSteps = 10 %% Number of training steps for initial covering of the input space (default = 100)
- initNeighbor = 80 %% Initial neighborhood size (default = 3)
- topologyFcn = 'hextop' %% Layer topology function (default = 'hextop')
- distanceFcn = 'dist' %%  Neuron distance function (default = 'linkdist')
- once the SOM network has been trained, use the helper function RBF_training_som.m to calculate the weights W, sigma (the variance of the RBF kernel), and the centers of the SOM clusters, which are stored in IW{1,1}
- use the returned parameters to compute the training predictions with the helper function RBF_predict
- return and save the confusion matrix as C_rbf_som.mat
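
The SOM settings listed above map onto the arguments of selforgmap as sketched below. The training and center-extraction lines are left commented out because they need the real data; the variable names used there are assumptions.

```matlab
% SOM configuration with the listed parameters (requires Deep Learning Toolbox).
dimensions   = [18 20];    % 18*20 = 360 neurons, matching the 360 clusters
coverSteps   = 10;         % toolbox default is 100
initNeighbor = 80;         % toolbox default is 3
topologyFcn  = 'hextop';   % toolbox default
distanceFcn  = 'dist';     % toolbox default is 'linkdist'

net = selforgmap(dimensions, coverSteps, initNeighbor, topologyFcn, distanceFcn);

% With the real data (one sample per column) this would continue as:
% net = train(net, X_train');
% C   = net.IW{1,1};       % 360-by-numFeatures matrix of cluster centers
```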


This script (ass2_confusion_matrix_summary.m) summarizes the results.

- load the previously saved .mat files
- display the confusion matrices again
- compare the performance of the different neural networks
- compare the accuracy of the different neural networks
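
For reference, overall and per-class accuracy can be read off a confusion matrix as sketched below. The 3x3 matrix is made up for illustration, and the per-class line assumes rows hold the true classes and columns the predictions (the convention of confusionmat); the variable names stored inside the saved .mat files are not stated in these notes.

```matlab
% Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
C = [5 1 0;
     0 4 2;
     1 0 7];

acc      = trace(C) / sum(C(:));     % correctly classified / all samples
perClass = diag(C) ./ sum(C, 2);     % recall of each true class
```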

The function part (helper functions for reproduction)


- Input: the folder path, in this case the processed folder with renamed images in a uniform jpeg format (ass2_processed_data)
- Output: the data with the numerical labels transformed into one-hot encoded vectors
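
One-hot encoding of numerical labels can be done by indexing into an identity matrix, as in this small sketch. The labels and the class count are made up; the project has 24 classes.

```matlab
% One-hot encode numeric labels by selecting rows of an identity matrix.
y          = [1; 3; 2; 3];     % hypothetical labels in 1..numClasses
numClasses = 3;

I = eye(numClasses);
Y = I(y, :);                   % one row per sample, a single 1 per row
```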


- Input: the folder path, in this case the processed folder with renamed images in a uniform jpeg format (ass2_processed_data)
    - normalize the data into the range 0-1
    - assign a numerical label to each character
    - combine all the processed images into one dataset of size 2400 * 1152
    - 2400 is the number of samples; 1152 is the number of features in one image (48 * 24 pixels)
- Output: the normalized image data (0-1) with the corresponding labels. Labels are in numerical format (1, 2, 3, ..., 24), without one-hot encoding
- This is the function used throughout the project
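
The normalization and flattening step for a single image might look like the sketch below. A random uint8 matrix stands in for an image read with imread, and the column-major flattening order is an assumption.

```matlab
% Stand-in for a 48x24 grayscale image read from ass2_processed_data.
img = uint8(randi([0 255], 48, 24));

x = double(img(:))' / 255;     % 1-by-1152 row vector scaled to [0, 1]

% Stacking one such row per image would give the 2400-by-1152 data matrix.
```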


- Input: data, labels, and the number of clusters for kmeans
- sigma is set to the mean Euclidean distance between cluster centers
- the kernel matrix k is computed by applying radbas to the distances between the samples and the cluster centers, scaled by 2*sigma^2
- the weights W are computed with the pseudo-inverse: W = pinv(k'*k)*k'*labels
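
The training steps above can be sketched on toy data as follows. The Gaussian form exp(-d^2 / (2*sigma^2)) is an assumed reading of the radbas expression, the centers are given by hand instead of coming from kmeans, and all variable names are illustrative.

```matlab
% Toy RBF training: 4 samples, 2 classes, 2 hand-picked centers.
X = [0 0; 0 1; 5 5; 5 6];      % one sample per row
y = [1; 1; 2; 2];              % numeric labels
C = [0 0.5; 5 5.5];            % centers (from kmeans or net.IW{1,1} in the project)

I = eye(2);
T = I(y, :);                   % one-hot targets

% Squared Euclidean distances between rows of A and rows of B
% (uses implicit expansion, available from R2016b).
sqdist = @(A, B) max(sum(A.^2, 2) + sum(B.^2, 2)' - 2 * (A * B'), 0);

% sigma: mean distance over all pairs of distinct centers.
m     = size(C, 1);
DC    = sqrt(sqdist(C, C));
sigma = sum(DC(:)) / (m * (m - 1));

K = exp(-sqdist(X, C) / (2 * sigma^2));   % kernel matrix, samples x centers
W = pinv(K' * K) * K' * T;                % least-squares output weights

% Prediction reuses the same kernel with the trained W, sigma and C.
[~, pred] = max(K * W, [], 2);
```

On this toy set the predicted labels match y; on real data the same K and W computation would be applied to the test samples with the stored centers and sigma.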


- Input: data, labels, and net
- net is a pre-trained SOM network
- the cluster centers are taken from the first layer of the SOM network, net.IW{1,1}
- sigma is set to the mean Euclidean distance between cluster centers
- the kernel matrix k is computed by applying radbas to the distances between the samples and the cluster centers, scaled by 2*sigma^2
- the weights W are computed with the pseudo-inverse: W = pinv(k'*k)*k'*labels


- Input: data, plus the W, sigma, and C obtained previously from either RBF_training_kmeans or RBF_training_som
- Output: vectors of the final prediction
- data can be either training data or testing data


- Input: vecs - matrix of column vectors (returned from RBF_predict.m)
- Output: cls - matrix in which the largest element of each column of vecs is set to 1 and the rest to 0, e.g. vecs = [2 4; 1 5] gives cls = [1 0; 0 1]
- This function returns the most likely label in multi-class classification, in particular when the labels are one-hot encoded
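
A minimal sketch of this winner-take-all step, reproducing the example above; this is one possible implementation, not necessarily the one in the project.

```matlab
% Set the largest entry of each column to 1 and all others to 0.
vecs = [2 4; 1 5];

[~, idx] = max(vecs, [], 1);                       % row of the maximum per column
cls = zeros(size(vecs));
cls(sub2ind(size(vecs), idx, 1:size(vecs, 2))) = 1;
```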


- Input: two matrices of class vectors, t1 and t2
- Computes the percentage of equal columns in t1 and t2; can be used to compute the rate of correctly classified patterns in a pattern recognition application
- Output: number of matching vectors
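
Comparing two matrices of class vectors column by column can be sketched as below; both the count of matching columns and the corresponding percentage are computed, since the notes mention both. The matrices are made up for illustration.

```matlab
% Each column is a one-hot class vector; column 4 differs between t1 and t2.
t1 = [1 0 0 1;
      0 1 0 0;
      0 0 1 0];
t2 = [1 0 0 0;
      0 1 0 1;
      0 0 1 0];

matches = all(t1 == t2, 1);            % true where whole columns agree
nMatch  = sum(matches);                % number of matching vectors
pct     = 100 * nMatch / size(t1, 2);  % percentage of matching vectors
```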

Saved parameters


- the values saved by data_partition.m (train_test_data.mat)
- X_train, X_test, y_train, y_test


- the network trained by SOM (training takes a long time, so it is saved for computational convenience)


- lvq with cluster set to 360


- lvq with cluster set to 24 


- calculate the silhouette value to find a suitable number of clusters k, but the results are not satisfactory
- return the plot of each epoch

Confusion matrix summary:

- CNN_confusion.mat
- C_rbf_som.mat
- C_rbf_kmeans.mat
- C_mlp.mat
- C_lvq.mat
- C_CNN.mat


