In statistical analysis, clustering is used to partition the data into subsets (clusters) of similar observations according to a specified distance measure. However, this technique cannot easily be applied to longitudinal or time-series data. In this blog, I will discuss some of the methods used for modeling longitudinal or panel data with cluster analysis, as explained in Schmatter (2011).
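To make the distance-based idea concrete, here is a minimal sketch of classical k-means (Lloyd's algorithm) with squared Euclidean distance, where each subject's whole trajectory is treated as one vector. The function name `kmeans`, the integer coding of the responses, and the toy data are hypothetical illustrations, not the method from Schmatter (2011).

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with squared Euclidean distance.

    X has shape (n_subjects, n_time_points): one row per subject's trajectory.
    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Start from k distinct, randomly chosen trajectories.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every trajectory to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy panel: 6 subjects observed at 5 time points, responses coded
# 0 = never, 1 = not more than once a month, 2 = more than once a month.
X = np.array([
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [2, 2, 2, 2, 2],
    [2, 2, 1, 2, 2],
    [2, 2, 2, 2, 1],
])
labels, centroids = kmeans(X, k=2)
print(labels)
```

Here the first three subjects end up in one cluster and the last three in the other. Note what the sketch silently assumes: complete trajectories of equal length observed at the same time points, and category codes that can be averaged like numbers. Assumptions of this kind may not hold for real panel data.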

Longitudinal data consist of observations on a sample of subjects that are measured repeatedly over time. Longitudinal (repeated-measures) or panel data now arise in many areas of applied statistics, such as finance, psychology, economics, and the social sciences. Most studies focus on analyzing homogeneity in such time-series data (Diggle et al., 2002); however, a few researchers have turned their attention to the heterogeneity in such data and have proposed different modeling techniques for it.

Let us now discuss the applicability of model-based clustering by means of an example from Schmatter (2011). The data consist of marijuana-use responses from 237 teenagers over the years 1976–1980. Marijuana use is categorized into three levels: never, not more than once a month, and more than once a month. The response in this study is therefore a categorical variable, and each teenager contributes a short categorical time series. The following figure represents a sample of 10 observed response sequences of marijuana use among the 237 teenagers.
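One common model-based choice for categorical time series like these is a finite mixture of first-order Markov chains, in which each cluster has its own transition matrix between the three usage levels. The blog does not spell out the model, so treat the following as a minimal sketch under that assumption. It fits the mixture by EM (a swapped-in, non-Bayesian estimation method), conditions the likelihood on each subject's first observation, and also reports the BIC, one of the information criteria that can be used to choose the number of clusters. The helper names (`transition_counts`, `em_markov_mixture`, `simulate`), the integer coding, and the simulated data are all hypothetical.

```python
import numpy as np

# Hypothetical integer coding of the three usage categories in the text.
STATES = ["never", "not more than once a month", "more than once a month"]
S = len(STATES)

def transition_counts(seq, n_states=S):
    """Count the observed transitions prev -> curr in one subject's sequence."""
    counts = np.zeros((n_states, n_states))
    for prev, curr in zip(seq[:-1], seq[1:]):
        counts[prev, curr] += 1
    return counts

def em_markov_mixture(sequences, k, n_states=S, n_iter=500, n_init=5, seed=0, tol=1e-8):
    """EM for a k-component mixture of first-order Markov chains.

    The likelihood conditions on each subject's first observation. Returns
    (labels, weights, transition matrices, log-likelihood, BIC) for the best
    of `n_init` random starts.
    """
    rng = np.random.default_rng(seed)
    N = np.array([transition_counts(s, n_states) for s in sequences])  # (n, S, S)
    best = None
    for _ in range(n_init):
        weights = np.full(k, 1.0 / k)
        P = rng.dirichlet(np.ones(n_states), size=(k, n_states))  # rows sum to 1
        prev_ll = -np.inf
        for _ in range(n_iter):
            # E-step: log w_j + sum_ab N_i[a, b] * log P_j[a, b] for subject i, cluster j.
            log_joint = np.log(weights) + np.einsum("iab,jab->ij", N, np.log(P))
            log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
            resp = np.exp(log_joint - log_norm)  # posterior cluster probabilities
            ll = float(log_norm.sum())
            # M-step: mixture weights and row-normalised weighted transition counts.
            weights = np.clip(resp.mean(axis=0), 1e-12, None)
            weights /= weights.sum()
            counts = np.einsum("ij,iab->jab", resp, N) + 1e-10
            P = counts / counts.sum(axis=2, keepdims=True)
            if ll - prev_ll < tol:
                break
            prev_ll = ll
        if best is None or ll > best[3]:
            best = (resp.argmax(axis=1), weights, P, ll)
    labels, weights, P, ll = best
    # Free parameters: (k - 1) weights plus k * S * (S - 1) transition probabilities.
    n_params = (k - 1) + k * n_states * (n_states - 1)
    bic = -2.0 * ll + n_params * np.log(len(sequences))
    return labels, weights, P, ll, bic

def simulate(P, n_subjects, T, rng):
    """Draw n_subjects sequences of length T from the Markov chain P."""
    seqs = []
    for _ in range(n_subjects):
        seq = [int(rng.integers(S))]
        for _ in range(T - 1):
            seq.append(int(rng.choice(S, p=P[seq[-1]])))
        seqs.append(seq)
    return seqs

# Hypothetical data: 30 "stable" and 30 "switching" subjects, 10 time points each.
rng = np.random.default_rng(1)
stay = np.full((S, S), 0.05)
np.fill_diagonal(stay, 0.90)
switch = np.full((S, S), 0.45)
np.fill_diagonal(switch, 0.10)
seqs = simulate(stay, 30, 10, rng) + simulate(switch, 30, 10, rng)
truth = np.array([0] * 30 + [1] * 30)

labels, weights, P, ll, bic = em_markov_mixture(seqs, k=2)
for k in (1, 2, 3):
    print(k, round(em_markov_mixture(seqs, k=k)[4], 1))  # BIC: smaller is better
```

Fitting several values of `k` and keeping the smallest BIC is one simple selection rule. The simulated sequences are deliberately longer than the 1976–1980 panel in the example so that the two groups are easy to recover. Because cluster numbering is arbitrary, `labels` should be compared with the truth only up to relabeling.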

To sum up, model-based clustering with a Bayesian flavor yields better results, since it provides answers to some of the most troublesome problems in cluster analysis. In longitudinal or panel data studies, the Euclidean distance may not be a valid measure; hence, kernel-based clustering for time-series analysis is considered, and the best method is selected using different information criteria. In addition to the illustration explained in this paper, an MCMC simulation is carried out to find the optimal clustering methodology. However, this result should not be taken for granted in all applications: a more careful choice of the prior distribution and of the kernel is needed when analyzing time-series panel data.
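The Bayesian, MCMC-based side of model-based clustering can be sketched as follows. The exact sampler and priors of Schmatter (2011) are not described in this blog, so this is only a minimal sketch under stated assumptions: a Gibbs sampler for a mixture of first-order Markov chains, with Dirichlet priors on the mixture weights and on each transition-matrix row (the hyperparameters `alpha` and `beta` are arbitrary choices). Because cluster labels can switch between draws, the output is summarized as a co-clustering matrix rather than per-draw labels. The function names and simulated data are hypothetical.

```python
import numpy as np

S = 3  # number of usage categories, coded 0, 1, 2 (hypothetical coding)

def transition_counts(seq, n_states=S):
    """Count the observed transitions prev -> curr in one subject's sequence."""
    counts = np.zeros((n_states, n_states))
    for prev, curr in zip(seq[:-1], seq[1:]):
        counts[prev, curr] += 1
    return counts

def gibbs_markov_mixture(sequences, k, n_states=S, n_iter=600, burn_in=100,
                         alpha=1.0, beta=1.0, seed=0):
    """Gibbs sampler for a Bayesian k-component mixture of Markov chains.

    Assumed priors: weights ~ Dirichlet(alpha, ..., alpha) and each row of each
    transition matrix ~ Dirichlet(beta, ..., beta). Returns the posterior
    co-clustering matrix: the share of retained draws in which subjects i and
    j are allocated to the same cluster.
    """
    rng = np.random.default_rng(seed)
    N = np.array([transition_counts(s, n_states) for s in sequences])  # (n, S, S)
    n = len(sequences)
    z = rng.integers(k, size=n)  # random initial allocation
    co = np.zeros((n, n))
    for it in range(n_iter):
        # 1. Weights and transition matrices given the allocations (conjugate updates).
        weights = rng.dirichlet(alpha + np.bincount(z, minlength=k))
        P = np.empty((k, n_states, n_states))
        for j in range(k):
            cluster_counts = N[z == j].sum(axis=0)  # zeros if cluster j is empty
            for a in range(n_states):
                P[j, a] = rng.dirichlet(beta + cluster_counts[a])
        # 2. Allocations given weights and matrices.
        log_post = np.log(weights) + np.einsum("iab,jab->ij", N, np.log(P))
        prob = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        prob /= prob.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(k, p=p) for p in prob])
        # 3. Co-clustering frequencies are unaffected by label switching.
        if it >= burn_in:
            co += z[:, None] == z[None, :]
    return co / (n_iter - burn_in)

def simulate(P, n_subjects, T, rng):
    """Draw n_subjects sequences of length T from the Markov chain P."""
    seqs = []
    for _ in range(n_subjects):
        seq = [int(rng.integers(S))]
        for _ in range(T - 1):
            seq.append(int(rng.choice(S, p=P[seq[-1]])))
        seqs.append(seq)
    return seqs

# Hypothetical data: 30 "stable" and 30 "switching" subjects, 10 time points each.
rng = np.random.default_rng(1)
stay = np.full((S, S), 0.05)
np.fill_diagonal(stay, 0.90)
switch = np.full((S, S), 0.45)
np.fill_diagonal(switch, 0.10)
seqs = simulate(stay, 30, 10, rng) + simulate(switch, 30, 10, rng)

co = gibbs_markov_mixture(seqs, k=2)
print(co[:30, :30].mean(), co[:30, 30:].mean())
```

Entries of `co` near 1 indicate subjects that the posterior places together almost always. Changing `alpha`, `beta`, or the number of components and rerunning is a quick way to see how sensitive the clustering is to the prior.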
