When presented with a dataset, it is beneficial to identify any relationships or trends. One way in which
we can accomplish this is through the application of cluster analysis, a method for developing taxonomies
within a set of observations. This technique is useful in marketing, research, and any profession
requiring data analysis, but there are many algorithms for defining clusters in a dataset. As a result, we raise
the question: which clustering algorithm performs best in which scenarios?
Markov Chain Monte Carlo (MCMC) methods are powerful algorithms that enable
statisticians to explore information about probability distributions through computer
simulations when exact theoretical methods are not feasible. The Gibbs sampler,
for example, allows us to learn about the marginal and joint distributions
of multivariate densities, provided that we can sample from the conditional
distributions. Of particular interest is the use of MCMC methods in Bayesian statistics
to help estimate posterior distributions.
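As an illustration of how the Gibbs sampler recovers joint and marginal information from conditional distributions alone, the following is a minimal sketch rather than the method used in the study. It assumes a standard bivariate normal target with correlation rho, for which each full conditional is known in closed form: X given Y = y is normal with mean rho*y and variance 1 - rho^2, and symmetrically for Y given X. The function name, the value rho = 0.6, the burn-in length, and the seed are all hypothetical choices made for the example.

```python
import random
import statistics


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Draw samples from a standard bivariate normal with correlation rho
    using only its two full conditional distributions (illustrative sketch)."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5  # standard deviation of each conditional
    x, y = 0.0, 0.0                    # arbitrary starting point
    samples = []
    for i in range(n_samples + burn_in):
        # Alternate draws: X | Y = y, then Y | X = x.
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if i >= burn_in:               # discard the early, not-yet-mixed draws
            samples.append((x, y))
    return samples


# Assumed example settings: rho = 0.6 and 20,000 retained draws.
samples = gibbs_bivariate_normal(rho=0.6, n_samples=20000)
xs = [s[0] for s in samples]
ys = [s[1] for s in samples]

# Marginal summaries of X, estimated from the simulated draws.
mean_x = statistics.fmean(xs)
var_x = statistics.pvariance(xs)

# Sample correlation, a summary of the joint distribution.
mean_y = statistics.fmean(ys)
cov_xy = statistics.fmean([(a - mean_x) * (b - mean_y) for a, b in samples])
corr_xy = cov_xy / (statistics.pstdev(xs) * statistics.pstdev(ys))

print(f"mean of X: {mean_x:.3f}, variance of X: {var_x:.3f}, corr(X, Y): {corr_xy:.3f}")
```

Under these assumptions, the estimated marginal mean and variance of X should be close to 0 and 1, and the sample correlation close to the chosen rho, even though the sampler never evaluates the joint density directly.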
Baseball is the great American pastime. In this study, we examine different aspects of baseball games to determine which factors play a role in predicting the winning team for a specific game or an entire season. To predict who is likely to win individual games, we consider factors such as each team’s offensive and defensive ability, past game scores, and previous winning percentage. In particular, we examine the extent to which a team playing at home has an advantage over the visiting team.