Classification trees are used with a categorical response variable. The goal of a
classification tree is to derive a model that predicts to which category a particular
subject or individual belongs, based on one or more explanatory factors. For example,
we could use a classification tree to predict the diagnosis (Benign or Malignant) of
a particular patient based upon information obtained by doctors through scanned
images. A classification tree is displayed as a decision tree with a start node that
branches into further nodes.
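
To make the start-node-and-branches structure concrete, the following is a minimal sketch of the smallest possible classification tree: a single start node with two branches. It assumes one numeric explanatory factor and chooses the split by Gini impurity, which is one common splitting criterion and not necessarily the one used in this work. The function names, the feature ("radius"), and the data values are all hypothetical and made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most frequent class label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(values, labels):
    """Fit a one-split tree: a start node that branches into two leaves.

    Candidate thresholds are midpoints between consecutive distinct values;
    the one with the lowest weighted Gini impurity is kept.
    """
    distinct = sorted(set(values))
    if len(distinct) < 2:
        # No split is possible, so both branches predict the majority class.
        label = majority(labels)
        return {"threshold": distinct[0], "left": label, "right": label}

    best_threshold, best_score = None, float("inf")
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score

    left = [y for x, y in zip(values, labels) if x <= best_threshold]
    right = [y for x, y in zip(values, labels) if x > best_threshold]
    return {
        "threshold": best_threshold,
        "left": majority(left),
        "right": majority(right),
    }


def predict(stump, value):
    """Follow the branch chosen at the start node and return its class."""
    return stump["left"] if value <= stump["threshold"] else stump["right"]


# Hypothetical measurements and diagnoses, invented for this sketch only.
radius = [11.2, 12.0, 12.8, 13.1, 16.5, 17.9, 19.4, 20.6]
diagnosis = ["Benign"] * 4 + ["Malignant"] * 4

stump = fit_stump(radius, diagnosis)
print(stump)
print(predict(stump, 12.5), predict(stump, 18.0))
```

A full classification tree would apply the same split search recursively to each branch, and over every explanatory factor, until the nodes are pure or a stopping rule is met.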
When presented with a dataset, it is beneficial to identify any relationships or trends. One way
to accomplish this is cluster analysis, a method for developing taxonomies
within a set of observations. This technique is useful in marketing, research, and any profession
requiring data analysis, but there are many algorithms for defining clusters in a dataset. As a result, we raise
the question: which clustering algorithm performs best in a given scenario?