PCA Score of WGI

Principal Component Analysis

Principal component analysis (PCA) is a method for reducing the dimensionality of a set of variables. It converts the original variables into linear combinations, called principal components, that are uncorrelated with one another. If there are p variables, at most p principal components can be extracted. The first principal component is the linear combination that explains as much of the total variance of the observed variables as possible, and the second principal component explains as much of the remaining variance as possible. The Worldwide Governance Indicators (WGI) consist of six variables. Because these variables are highly correlated, they cause multicollinearity problems when used together in a regression, and many researchers have therefore used PCA to address this problem. To provide ready-to-use data, ARIC calculated the PCA score of the WGI; the results of that analysis are presented below.
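As an illustration of the definition above, the following minimal sketch extracts a first principal component from made-up data using only the Python standard library. It uses power iteration on the correlation matrix, which is a stand-in chosen for brevity and not necessarily the procedure used for the published scores. All function names, the three synthetic variables, and the noise level are assumptions for the example.

```python
import math
import random

def standardize(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]

def correlation_matrix(columns):
    """Correlation matrix of columns that are already standardized."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in columns]
            for ci in columns]

def first_component(corr, iterations=500):
    """Power iteration: return (largest eigenvalue, unit-length weights)."""
    p = len(corr)
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        v = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    # Rayleigh quotient w'Rw gives the eigenvalue for the unit vector w.
    eigenvalue = sum(w[i] * sum(corr[i][j] * w[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, w

# Synthetic data (NOT WGI data): three noisy copies of one common factor.
rng = random.Random(0)
common = [rng.gauss(0, 1) for _ in range(200)]
columns = [standardize([c + 0.4 * rng.gauss(0, 1) for c in common])
           for _ in range(3)]

corr = correlation_matrix(columns)
eigenvalue, weights = first_component(corr)

# Score of each observation = weighted sum of its standardized values.
scores = [sum(w * col[k] for w, col in zip(weights, columns))
          for k in range(len(common))]

# Share of the total variance (= number of variables) explained.
share = eigenvalue / len(columns)
print(f"eigenvalue={eigenvalue:.3f}, share={share:.1%}")
```

The variance of `scores` equals the eigenvalue, which is why the eigenvalue can be read as the variance explained by the component.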

Result Table

Variables should normally be normalized before PCA. Because the WGIs are already normalized indicators, ranging from approximately -2.5 to 2.5, we did not transform the data. There are 214 observations for each year, and the period runs from 1996 to 2020 (until 2002, the indicators were published every two years). If PCA is run separately for each year, the resulting components differ from year to year; in other words, the principal component for one year may not be the same variable as the principal component for another. We therefore ran the PCA on the pooled WGI data from 1996 to 2020. Because the calculation is based on the whole dataset, the PCA scores will change each year as new data are added.

Descriptive Statistics

About 10 to 20 values were missing in each year. Because the WGIs are normalized indicators, their means are close to 0 and their standard deviations are close to 1. PCA requires that the variables be correlated with one another. The correlation tables show that the lowest correlation coefficient is 0.6553, between WGI 2 and WGI 4, and the highest is 0.9426, between WGI 5 and WGI 6.
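For readers who want to reproduce a correlation check of this kind, here is a small sketch of a Pearson correlation that skips rows with a missing value. The text does not say how missing values were handled, so the pairwise deletion used here is an assumption, and the two short series are invented for illustration rather than taken from the WGI.

```python
import math

def pearson(x, y):
    """Pearson correlation using only rows where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical indicator values; None marks a missing observation.
indicator_a = [1.2, -0.5, None, 0.3, -1.1, 0.8]
indicator_b = [1.0, -0.7, 0.2, None, -0.9, 0.9]

r = pearson(indicator_a, indicator_b)
print(f"r = {r:.4f}")
```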

Principal Components

The total variance of the six standardized WGIs is 6, because each variable contributes a variance of 1, and each eigenvalue is the variance of the corresponding principal component. An eigenvalue can therefore be interpreted as the amount of the total variance that its principal component explains. The eigenvalue of the first principal component is 5.08629144, so the first principal component explains 84.77% of the total variance of the six WGIs (5.08629144 / 6). Because it is recommended to retain enough principal components to explain more than 70% of the total variance (O'Rourke & Hatcher 2013: 19), we retained only the first principal component.
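The arithmetic behind the 84.77% figure can be checked in a few lines. This sketch assumes, as the total of 6 implies, that the PCA was run on standardized variables so that each of the six contributes a variance of 1; the variable names are illustrative.

```python
n_variables = 6                  # six WGI variables
total_variance = n_variables     # each standardized variable has variance 1
first_eigenvalue = 5.08629144    # value reported in the text

proportion = first_eigenvalue / total_variance
remaining = total_variance - first_eigenvalue  # left for components 2 to 6

print(f"{proportion:.2%}")  # 84.77%
```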

The factor pattern shows the correlations between the six variables and the principal components. The first principal component (Factor 1) is positively correlated with all six variables; WGI 2 is the least correlated with it and WGI 5 the most.
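To make the link between a component and its factor pattern concrete, here is a toy two-variable case that can be verified by hand. For standardized variables, the loading of a variable on a component is its weight multiplied by the square root of the eigenvalue. The correlation of 0.8 is a hypothetical value, not a WGI figure.

```python
import math

r = 0.8  # hypothetical correlation between two standardized variables

# For the correlation matrix [[1, r], [r, 1]] the largest eigenvalue is 1 + r
# and its unit-length eigenvector is (1/sqrt(2), 1/sqrt(2)).
eigenvalue = 1 + r
weights = [1 / math.sqrt(2), 1 / math.sqrt(2)]

# Loading = weight * sqrt(eigenvalue)
#         = correlation between the variable and the component.
loadings = [w * math.sqrt(eigenvalue) for w in weights]

# The squared loadings of a component add up to its eigenvalue.
sum_squared = sum(l * l for l in loadings)

print([round(l, 4) for l in loadings], round(sum_squared, 4))
```

Both loadings equal the square root of 0.9, about 0.9487, and their squares sum to the eigenvalue of 1.8.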

Sensitivity Test

Because we used all observations from 1996 to 2020 to calculate the principal components, we tested how sensitive the results are to this choice. To do so, we compared the ranking of countries by the principal component of 2020 with their ranking by the principal component of the whole period. The 'Rank' variable in the following table gives each country's rank under the 2020 and whole-period principal components. The 'Difference' variable equals 1 when the two ranks differ. The rank differs for 36 countries, but in no case by more than two positions. For example, Norway and Finland swap places, a shift of one rank each, and the other differing countries behave similarly. We therefore conclude that the ranking changes little between 2020 and the whole period.

PCAcompare.xlsx
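A rank comparison of the kind described in the sensitivity test could be sketched as follows. The country labels and scores are placeholders invented for the example, and the function and field names are assumptions; the sketch only shows how a 'Difference' flag and the largest rank shift can be derived from two sets of scores.

```python
def rank_by_score(scores):
    """Map each country to its rank, where 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {country: pos for pos, country in enumerate(ordered, start=1)}

# Hypothetical first-component scores (not taken from PCAcompare.xlsx).
scores_2020 = {"Country A": 2.10, "Country B": 2.05,
               "Country C": 1.40, "Country D": 0.30}
scores_pooled = {"Country A": 2.02, "Country B": 2.08,
                 "Country C": 1.35, "Country D": 0.25}

rank_2020 = rank_by_score(scores_2020)
rank_pooled = rank_by_score(scores_pooled)

comparison = {
    country: {
        "rank_2020": rank_2020[country],
        "rank_pooled": rank_pooled[country],
        # 1 when the two ranks differ, 0 otherwise
        "difference": int(rank_2020[country] != rank_pooled[country]),
        "shift": abs(rank_2020[country] - rank_pooled[country]),
    }
    for country in scores_2020
}

n_different = sum(row["difference"] for row in comparison.values())
max_shift = max(row["shift"] for row in comparison.values())
print(n_different, max_shift)
```

In this toy input, Country A and Country B swap places, so two countries are flagged and the largest shift is one rank.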