Keras Training / Testing
The basic Keras neural network inherits the following methods:
This method trains the model and also supplies the data for a learning curve. The input data defined in the model is split into training and testing sets. It takes the following parameters:
- batch_size = default is 100. This is the number of data points used in each training step. The larger the batch size, the quicker the training, so a bigger batch_size is recommended for large datasets. However, there are trade-offs: each step needs more memory, and very large batches can generalize less well.
- scale = default is False. If set to True, a data scaler is fitted on the training data and stored, and the training data is scaled with it. The testing data is scaled using the same scaler that was fitted on the training data, so no information from the testing data leaks into the scaler. The saved scaler is packaged with the model when it is deployed so it can be used in production.
- scaling_tool = default is "standard". It can also be "min max" or "normalize". Be careful with "normalize": it is only a preprocessing step and is not fitted to any data, so unlike the other scalers it is not stored with the model.
- resample = default is False. If set to True, the training data is resampled using the SMOTE algorithm to address imbalanced data. This is done after the data is split into training and testing sets, so that resampled data points cannot bleed into the testing data and give a falsely high test accuracy.
- resample_ratio = default is 1. This is the ratio of one outcome to the other in the resampled training data. At the default of 1, 50% of the training data belongs to one category and the other 50% belongs to the other (a ratio of 1 to 1).
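The split-then-scale order described above can be illustrated with a minimal, dependency-free sketch, assuming plain Python lists of numeric rows. split_train_test, ZScoreScaler and normalize_rows are hypothetical helpers written for illustration, not this library's internals (which may rely on a package such as scikit-learn); only the "standard" and "normalize" options are shown. The scaler learns its statistics from the training rows only and is then reused unchanged on the testing rows, while the row-wise normalization has nothing to fit, which is why it would not be stored. The last lines show how batch_size translates into steps per epoch.

```python
import math
import random
from statistics import mean, pstdev


def split_train_test(rows, labels, test_size=0.2, seed=0):
    """Shuffle indices once, then split them so no row lands in both sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return pick(rows, train_idx), pick(rows, test_idx), pick(labels, train_idx), pick(labels, test_idx)


class ZScoreScaler:
    """Per-column standard scaler: mean and std are learned from training rows only."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [mean(col) for col in columns]
        # A constant column would give std 0; fall back to 1 to avoid dividing by zero.
        self.stds = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows):
        return [[(x - m) / s for x, m, s in zip(row, self.means, self.stds)] for row in rows]


def normalize_rows(rows):
    """Scale each row to unit L2 length. Stateless: nothing is fitted, so nothing is stored."""
    result = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        result.append([x / norm for x in row])
    return result


# Usage with toy data.
rows = [[float(i), float(2 * i)] for i in range(10)]
labels = [i % 2 for i in range(10)]
x_train, x_test, y_train, y_test = split_train_test(rows, labels, test_size=0.2)

scaler = ZScoreScaler().fit(x_train)       # fitted on the training data only
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)   # the same fitted scaler is reused on test data

batch_size = 100
steps_per_epoch = math.ceil(len(x_train) / batch_size)  # 8 rows -> 1 step per epoch
```

Fitting the scaler before the split would let the test rows influence the means and standard deviations, which is the leakage the scale option avoids.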
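The resample and resample_ratio options can be sketched as a simplified, two-class SMOTE in plain Python. This is for illustration only: the library may use a package such as imbalanced-learn, and smote_oversample with its k and seed parameters is hypothetical. Each synthetic row is placed at a random point on the line segment between a minority row and one of its k nearest minority neighbors, and rows are added until the minority count reaches ratio times the majority count, so ratio=1.0 gives a 50/50 split. It is meant to be called on the training split only.

```python
import math
import random


def smote_oversample(rows, labels, ratio=1.0, k=5, seed=0):
    """Simplified SMOTE for two classes: add synthetic minority rows until
    n_minority is about ratio * n_majority. Call it on training data only."""
    classes = sorted(set(labels), key=labels.count)
    if len(classes) != 2:
        raise ValueError("this sketch expects exactly two classes")
    minority, majority = classes
    minority_rows = [row for row, y in zip(rows, labels) if y == minority]
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    n_needed = max(0, round(ratio * labels.count(majority)) - len(minority_rows))

    rng = random.Random(seed)
    new_rows, new_labels = list(rows), list(labels)
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        # The k nearest minority neighbors of `base`, excluding `base` itself.
        neighbors = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row: math.dist(row, base),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        new_rows.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
        new_labels.append(minority)
    return new_rows, new_labels


# Usage: 8 majority rows (label 0) and 3 minority rows (label 1).
x_major = [[float(i), 0.0] for i in range(8)]
x_minor = [[10.0, 10.0], [11.0, 10.5], [10.5, 12.0]]
x = x_major + x_minor
y = [0] * 8 + [1] * 3
x_res, y_res = smote_oversample(x, y, ratio=1.0, k=2)
```

Because the synthetic rows are interpolated only between minority training rows, running this after the split keeps every testing row untouched.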
This function shows the learning curve. It must only be called if the plot_learning_curve function has been called.
- save = default is False. If set to True, the learning curve is saved as a file in the folder where the script is running.
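A learning curve is built from per-epoch training and validation loss. In Keras, Model.fit returns a History object whose history attribute is a dict with keys such as "loss" and "val_loss"; the sketch below assumes a dict in that shape and shows what data the plot is drawn from. learning_curve_points, best_epoch and save_learning_curve are hypothetical helpers, and writing a CSV file stands in for saving a plot image so the example needs only the standard library.

```python
import csv


def learning_curve_points(history):
    """Turn a Keras-style history dict into (epoch, loss, val_loss) rows."""
    losses = history["loss"]
    val_losses = history.get("val_loss", [None] * len(losses))
    return [(epoch, loss, val) for epoch, (loss, val) in enumerate(zip(losses, val_losses), start=1)]


def best_epoch(points):
    """Epoch with the lowest validation loss; a rising validation loss after it hints at overfitting."""
    scored = [p for p in points if p[2] is not None]
    return min(scored, key=lambda p: p[2])[0]


def save_learning_curve(points, path="learning_curve.csv"):
    """Write the curve to a CSV file; a relative path lands in the current working directory."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch", "loss", "val_loss"])
        writer.writerows(points)


# Usage with a made-up history: validation loss bottoms out at epoch 3.
history = {"loss": [0.9, 0.6, 0.4, 0.3], "val_loss": [1.0, 0.7, 0.65, 0.8]}
points = learning_curve_points(history)
```

Calling save_learning_curve(points) would write learning_curve.csv relative to the current working directory, which is usually, but not always, the folder the script was started from.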
This function displays the ROC curve, which shows the trade-off between the true positive rate and the false positive rate as the classification threshold changes.
- save = default is False. If set to True, the ROC curve is saved as a file in the folder where the script is running.
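To make the trade-off concrete, here is a minimal sketch of how ROC points are computed, assuming 0/1 labels and predicted probabilities of the positive class. roc_points and trapezoid_auc are hypothetical names; a production implementation (for example scikit-learn's roc_curve) handles more details than this. Sorting by score and lowering the threshold one distinct score at a time traces the curve from (0, 0) to (1, 1).

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs, lowering the threshold
    one distinct score at a time. labels are 0/1; scores are P(class 1)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Only emit a point once every sample tied at this score has been counted.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Usage: two negatives and two positives.
curve = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Each step to the right is a newly admitted false positive and each step up is a newly admitted true positive, so a curve hugging the top-left corner means few false positives are paid for each true positive.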
This function takes no arguments. It gets the testing data and evaluates the model on it, reporting accuracy and recall. The report is also cached so it can be packaged with the model when it is deployed.
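The two reported metrics can be computed as in the following sketch. It assumes 0/1 labels and a sigmoid output turned into a prediction with a 0.5 threshold; that threshold and the name evaluate are assumptions for illustration, not taken from this library. Accuracy is the share of all predictions that are correct, and recall is the share of actual positives the model found.

```python
def evaluate(y_true, probabilities, threshold=0.5):
    """Accuracy and recall for 0/1 labels, predicting 1 when probability >= threshold."""
    y_pred = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_positives = sum(t == 1 for t in y_true)
    return {
        "accuracy": correct / len(y_true),
        # Recall: the share of actual positives the model caught.
        "recall": true_positives / actual_positives if actual_positives else 0.0,
    }


# Usage: predictions are [1, 0, 0, 1, 1], so 3 of 5 are correct and 2 of 3 positives are found.
report = evaluate([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.7, 0.6])
```

A plain dict like report is the kind of small result that can be cached and shipped alongside a deployed model.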