Classification

By default, a classification node automatically generates four models, one for each of these algorithms: Decision Tree, Generalized Linear Model, Naive Bayes, and Support Vector Machine.

All four models have the same input data, the same target, and the same case ID (if a case ID is specified).

If you never want models built with a particular algorithm by default, deselect that algorithm. You can still add models that use the deselected algorithm to a classification node.

By default, the node generates these test results for tuning: Performance Metrics, Performance Matrix (also known as Confusion Matrix), ROC Curve (binary only), and Lift and Profit.
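To make the Performance Matrix concrete, here is a minimal Python sketch of a confusion matrix: a count of each (actual, predicted) pair, from which an accuracy figure can be derived. The function names `performance_matrix` and `accuracy` and the sample labels are hypothetical; this illustrates the concept only and is not how the node computes its results.

```python
from collections import Counter

def performance_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair occurs (illustrative only)."""
    return Counter(zip(actual, predicted))

def accuracy(matrix):
    """Fraction of cases where the predicted value equals the actual value."""
    total = sum(matrix.values())
    correct = sum(count for (a, p), count in matrix.items() if a == p)
    return correct / total if total else 0.0

# Hypothetical test-set labels for a binary target.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]

matrix = performance_matrix(actual, predicted)
overall_accuracy = accuracy(matrix)
```

The diagonal entries, where the actual and predicted values agree, are the correct predictions; every other entry is a misclassification.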

For Lift and Profit, the default is the top 5 target values by frequency; you can edit this.
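As an illustration of what "top 5 target values by frequency" means, the following Python sketch picks the most frequent values from a list of target values. The helper `top_target_values` and the sample data are assumptions made for this example and do not reflect the product's internal logic.

```python
from collections import Counter

def top_target_values(targets, n=5):
    """Return the n most frequent target values, most frequent first."""
    return [value for value, _ in Counter(targets).most_common(n)]

# Hypothetical target column with six distinct values.
targets = ["a", "b", "a", "c", "a", "b", "d", "e", "f", "f", "f", "f"]

top = top_target_values(targets)
```

With six distinct values and the default of 5, the least frequent values compete for the last places, so one of them is left out.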

By default, the node does not generate selected metrics for model tuning; you can select this option.

You can deselect any of the test results; for example, if you deselect Performance Matrix, a Performance Matrix is not generated by default.

By default, split data is used for testing: 40% of the data is used for testing, and the split data is created as a table. You can change the percentage used for testing, and you can create the split data as a view instead of a table. If you create a table, you can create it in parallel. Alternatively, you can use all of the build data for testing, or you can use a separate test source.
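The default 40% split can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the function `split_build_test`, its parameters, and the shuffle-based sampling are hypothetical, and the text above does not describe how the node actually selects the test rows.

```python
import random

def split_build_test(rows, test_percent=40, seed=0):
    """Split rows into (build, test), holding out test_percent of them for testing.

    Assumption: a seeded random shuffle stands in for the real split mechanism.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_percent / 100)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical case IDs 0..99.
rows = list(range(100))

build_rows, test_rows = split_build_test(rows)
```

With 100 rows and the default setting, 40 rows are held out for testing and the remaining 60 are used to build the models; changing `test_percent` mirrors changing the percentage in the node.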

For detailed information about testing classification models, see Testing Classification Models and Tuning Classification Models.