SK-Learn Decision Tree / Random Forest
Decision trees can be useful for classification. With a single tree, the model is called a decision tree; with multiple trees, it is called a random forest.
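To make the distinction concrete, here is a minimal sketch in plain scikit-learn (the library Deploy-ML wraps). The dataset, sizes, and seeds are illustrative only, not part of the Deploy-ML API:

```python
# One tree vs. an ensemble of trees in plain scikit-learn.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic classification data, purely for illustration
X, y = make_classification(n_samples=200, random_state=0)

# A single decision tree
single_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# A random forest: an ensemble of many decorrelated trees whose
# votes are averaged, which usually generalises better than one tree
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(len(forest.estimators_))  # the forest holds 100 fitted trees
```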
- number_of_trees = the number of trees used in training and in calculating a probability. The more trees, the less likely the model is to overfit, but you will see diminishing returns beyond roughly 300 trees.
- max_features = default is "auto". The number of features considered when looking for the best split: "auto" gives max_features=sqrt(n_features); "sqrt" gives max_features=sqrt(n_features) (same as "auto"); "log2" gives max_features=log2(n_features); None gives max_features=n_features.
- max_depth = default is None. An int giving the maximum depth of a tree. Limiting depth reduces the chance of the model overfitting.
- min_samples_leaf = default is 1. The minimum number of samples required for a node to be a leaf node.
- model_title: set to "Random Forest" in the deployment package if the number of trees is above one, and "Decision Tree" otherwise.
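The parameters above mirror scikit-learn's own estimator options. As a hedged sketch (assuming Deploy-ML passes these through to `RandomForestClassifier`; the mapping of `number_of_trees` to `n_estimators` is an assumption, and the data here is synthetic), the equivalent raw scikit-learn configuration looks like this:

```python
# How the Deploy-ML settings above correspond to scikit-learn's
# RandomForestClassifier arguments (assumed pass-through).
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Illustrative data: 9 features, so sqrt(9) = 3 features per split
X, y = make_classification(n_samples=100, n_features=9, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,      # number_of_trees
    max_features="sqrt",   # consider sqrt(n_features) features per split
    max_depth=5,           # cap tree depth to limit overfitting
    min_samples_leaf=1,    # default: a leaf may hold a single sample
    random_state=0,
)
forest.fit(X, y)
print(len(forest.estimators_))  # 200 fitted trees in the ensemble
```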
The tree model has no unique methods yet, but it does have all the standard training and deployment methods.
```python
from deployml.sklearn.models.decision_tree import DecisionTree

# We define the model
DT = DecisionTree(number_of_trees=200)

# We define the data (pandas data frame)
DT.data = input_data

# We define the key of the column we are trying to predict
DT.outcome_pointer = 'attended'

# We now train the random forest. These things can take a lot
# of time to train, so it's advised to use the quick_train method
# with scaled data
DT.quick_train(scale=True)

# We then print out the precision, recall and F-1 score
DT.evaluate_outcome()

# And show the ROC curve
DT.show_roc_curve()
```
It's understandable that you might want a more custom model. Whilst we are always working on making Deploy-ML more versatile, you can define your own SK-Learn model and import it into the Deploy-ML learning and packaging framework!