
Correlation Matrix for Factor Analysis

Principal component and factor analysis are a set of statistical procedures aimed at extracting, from a given set of variables, subsets of variables that are closely related (correlated) with each other. Variables included in one subset and correlated with each other, but largely independent of variables from other subsets, form factors. The goal of factor analysis is to identify factors that are not directly observable on the basis of a set of observable variables.

An additional way to check the number of extracted factors is to calculate the reproduced correlation matrix, which should be close to the original one if the factors were extracted correctly. To see how much this matrix deviates from the original correlation matrix (with which the analysis began), one can calculate the difference between them. The residual matrix can indicate a "disagreement", that is, the fact that the correlation coefficients in question cannot be reproduced with sufficient accuracy from the available factors.

In the methods of principal components and factor analysis there is no external criterion that makes it possible to judge the correctness of the solution. A second problem is that after the factors have been extracted, an infinite number of rotation options arise, based on the same initial variables but giving different solutions (the factor structures are defined slightly differently). The final choice among the mathematically equivalent solutions within this infinite set depends on the researchers' meaningful understanding of the interpretation results. And since there is no objective criterion for evaluating the various solutions, the proposed justifications for choosing a particular solution may seem unfounded and unconvincing.


It should be noted that there are no rigorous statistical criteria for the completeness of factorization. Nevertheless, low values of the completeness of factorization, for example less than 0.7, indicate that it is desirable to reduce the number of features or to increase the number of factors.

The coefficient of the relationship between a feature and a common factor, expressing the measure of the factor's influence on the feature, is called the factor loading of the given feature on the given common factor.

A matrix consisting of factor loadings and having the number of columns equal to the number of common factors and the number of rows equal to the number of original features is called a factor matrix.

The basis for calculating the factor matrix is the matrix of paired correlation coefficients of the original features.

The correlation matrix captures the degree of relationship between each pair of features. Similarly, the factor matrix fixes the degree of linear relationship of each feature with each common factor.

The magnitude of a factor loading does not exceed one in absolute value, and its sign indicates a positive or negative relationship between the feature and the factor.

The greater the absolute value of the factorial load of a feature for a certain factor, the more this factor determines this feature.

A factor loading close to zero for some factor suggests that this factor has practically no effect on the given feature.

The factor model makes it possible to calculate the contributions of the factors to the total variance of all features. Summing the squares of the factor loadings of each factor over all features, we obtain its contribution to the total variance of the system of features: the higher the share of this contribution, the more significant the factor.

At the same time, it is possible to identify the optimal number of common factors that describe the system of initial features well enough.

The value (measure of manifestation) of a factor in an individual object is called the factorial weight of the object for this factor. Factor weights allow you to rank, order objects for each factor.

The greater the factorial weight of a certain object, the more that side of the phenomenon or that pattern is manifested in it, which is reflected by this factor.

Factor weights can be either positive or negative.

Because the factors are standardized values with a mean of zero, factor weights close to zero indicate an average degree of manifestation of the factor, positive weights indicate that this degree is above average, and negative weights that it is below average.

In practice, if the number of principal components (or factors) already found is not more than m/2, the variance they explain is not less than 70%, and the next component contributes no more than 5% to the total variance, the factor model is considered quite good.
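For illustration, a minimal sketch of this rule of thumb, assuming the eigenvalues of the correlation matrix of m standardized features are already known (their sum then equals m, so each eigenvalue divided by m is a share of the total variance):

```python
# Sketch: rule-of-thumb check for a "good" factor model, given the eigenvalues
# of the correlation matrix of m standardized features (sum of eigenvalues = m).

def model_is_adequate(eigenvalues, n_factors, m):
    shares = [ev / m for ev in sorted(eigenvalues, reverse=True)]
    cumulative = sum(shares[:n_factors])          # variance explained by the retained factors
    next_share = shares[n_factors] if n_factors < len(shares) else 0.0
    return n_factors <= m / 2 and cumulative >= 0.70 and next_share <= 0.05

# Eigenvalues of the four-variable example discussed later in the text:
print(model_is_adequate([2.02, 1.94, 0.04, 0.00], n_factors=2, m=4))   # -> True
```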

If you want to find the values of the factors and save them as additional variables, turn on the Scores... (Values) switch. The factor values, as a rule, lie in the range from -3 to +3.

Factor analysis is a more powerful and complex apparatus than the method of principal components, so it is applied when the results of component analysis are not entirely satisfactory. But since the two methods solve the same problems, it is necessary to compare the results of the component and factor analyses, i.e. the loading matrices, as well as the regression equations for the principal components and the common factors, and to comment on the similarities and differences in the results.

The maximum possible number of factors m for a given number of features p is defined by the inequality

(p + m) < (p - m)².
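A small illustration of this bound (a sketch; the strict inequality is used exactly as stated in the text):

```python
# Sketch: largest number of common factors m satisfying p + m < (p - m)^2
# for p observed features (strict inequality, as stated in the text).

def max_factors(p):
    m = 0
    while (p + m + 1) < (p - m - 1) ** 2:   # stop at the first m + 1 that violates the bound
        m += 1
    return m

for p in (4, 6, 10, 20):
    print(p, "features -> at most", max_factors(p), "factors")
```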

At the end of the entire factor analysis procedure, the factors f_j are expressed, by means of mathematical transformations, through the initial features; that is, the parameters of the linear diagnostic model are obtained in explicit form.

Principal component and factor analysis methods are a set of statistical procedures aimed at extracting, from a given set of variables, subsets of variables that are closely related (correlated) with each other. Variables included in one subset and correlated with each other, but largely independent of variables from other subsets, form factors. The goal of factor analysis is to identify factors that are not directly observable on the basis of a set of observable variables.

The general expression for the i-th observed variable in terms of the factors can be written as follows:

X_i = A_i1 F_1 + A_i2 F_2 + … + A_ik F_k + U_i,

where F_j (j ranges from 1 to k) are the common factors, U_i is the characteristic (unique) factor, and A_ij are the constants used in the linear combination of the k factors. The characteristic factors are assumed not to correlate with each other or with the common factors.

Factor-analytical processing procedures applied to the obtained data differ, but the structure (algorithm) of the analysis consists of the same main stages (a compact numerical sketch of these stages is given below):

1. Preparation of the initial data matrix.
2. Calculation of the matrix of interrelationships of the features.
3. Factorization (here it is necessary to specify the number of factors to extract in the factor solution and the method of calculation). At this stage (as well as at the next one), one can also estimate how well the obtained factor solution approximates the initial data.
4. Rotation - transformation of the factors that facilitates their interpretation.
5. Calculation of the factor values of each factor for each observation.
6. Interpretation of the data.
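A compact numpy sketch of stages 1-5 under the principal-component approach (the data matrix here is hypothetical, and no rotation is performed):

```python
import numpy as np

# Hypothetical raw data: rows = observations, columns = features.
X = np.array([[7.0, 6.5, 28.0, 24.0],
              [4.0, 5.5, 29.0, 25.0],
              [9.0, 8.0, 21.0, 19.0],
              [3.0, 2.5, 30.0, 26.0],
              [6.0, 7.0, 25.0, 22.0]])

# Stages 1-2: standardize the data and compute the matrix of interrelationships.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(X, rowvar=False)

# Stage 3: factorization by principal components (eigen-decomposition of R).
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                  # sort factors by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                              # number of retained factors
A = eigvecs[:, :k] * np.sqrt(eigvals[:k])          # factor loadings before rotation

# Stage 5: factor (principal-component) values for every observation.
F = Z @ eigvecs[:, :k] / np.sqrt(eigvals[:k])

print(np.round(A, 3))
print(np.round(F, 3))
```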

The invention of factor analysis was associated precisely with the need to analyze simultaneously a large number of correlation coefficients of various scales with each other. One of the problems associated with the methods of principal components and factor analysis is that there are no criteria that would allow the correctness of the solution found to be checked. For example, in regression analysis one can compare the indicators for the dependent variables obtained empirically with the indicators calculated theoretically on the basis of the proposed model, and use the correlation between them as a criterion of the correctness of the solution, following the correlation-analysis scheme for two sets of variables. In discriminant analysis, the correctness of the decision is based on how accurately the subjects' membership of one or another class is predicted (compared with their actual membership). Unfortunately, in the methods of principal components and factor analysis there is no such external criterion that allows one to judge the correctness of the solution. The second problem is that after the extraction of the factors an infinite number of rotation options arise, based on the same initial variables but giving different solutions (the factor structures are defined slightly differently). The final choice among the mathematically equivalent solutions within this infinite set depends on the researchers' meaningful understanding of the interpretation results. And since there is no objective criterion for evaluating the various solutions, the proposed justifications for choosing a particular solution may seem unfounded and unconvincing.

The third problem is that factor analysis is often used to salvage a poorly designed study when it becomes clear that no statistical procedure gives the desired result. The power of principal component and factor analysis allows an ordered concept to be built from chaotic information, which gives these methods a somewhat dubious reputation.

The second group of terms refers to the matrices that are built and interpreted as part of the solution. Rotation of factors is the process of finding the most easily interpreted solution for a given number of factors. There are two main classes of rotations: orthogonal and oblique. In the first case, all factors are chosen a priori to be orthogonal (uncorrelated with each other), and a factor loading matrix is constructed, which is a matrix of relationships between the observed variables and the factors. The magnitude of the loadings reflects the degree of connection between each observed variable and each factor and is interpreted as the correlation coefficient between the observed variable and the factor (latent variable); it therefore varies from -1 to 1. The solution obtained after orthogonal rotation is interpreted by analyzing the matrix of factor loadings and identifying which factor is most strongly associated with each observed variable. Thus, each factor turns out to be given by the group of primary variables that have the highest factor loadings on it.

If oblique rotation is performed (i.e., the possibility of correlation between factors is a priori allowed), then several additional matrices are constructed. Factor correlation matrix contains correlations between factors. Factor loadings matrix, mentioned above, splits into two: structural matrix of relationships between factors and variables and factor mapping matrix, which expresses the linear relationship between each observed variable and each factor (without taking into account the influence of the superposition of some factors on others, expressed by the correlation of factors with each other). After oblique rotation, the interpretation of factors is based on the grouping of primary variables (similar to that described above), but using, first of all, the factor mapping matrix.

Finally, for both types of rotation, a factor value coefficient matrix is computed, which is used in special regression-type equations to calculate the factor values (factor scores, indicators by factors) for each observation from the values of its primary variables.

Comparing the methods of principal components and factor analysis, we note the following. In the course of a principal component analysis, a model is built to best explain (maximally reproduce) the total variance of the experimental data obtained for all variables; as a result, "components" are extracted. In factor analysis, it is assumed that each variable is explained (determined) by a number of hypothetical common factors (affecting all variables) and characteristic factors (different for each variable). The computational procedures are performed so as to remove both the variance resulting from measurement error and the variance explained by specific factors, and to analyze only the variance explained by the hypothetical common factors. The resulting objects are called factors. However, as already mentioned, from a content-psychological point of view this difference in mathematical models is not essential; therefore, in what follows, unless it is specially indicated which particular case is being discussed, we will use the term "factor" both in relation to components and in relation to factors.

Sample sizes and missing data. The larger the sample, the greater the reliability of the relationship indicators. Therefore, it is very important to have a large enough sample. The required sample size also depends on the degree of relationship between indicators in the population as a whole and the number of factors: with a strong and reliable relationship and a small number of well-defined factors, a small sample will suffice.

Thus, a sample of 50 subjects is assessed as very poor, 100 as poor, 200 as average, 300 as good, 500 as very good, and 1000 as excellent (Comrey, Lee, 1992). Based on these considerations, it is recommended as a general principle to study samples of at least 300 subjects. For a solution based on a sufficient number of marker variables with high factor loadings (> 0.80), a sample of about 150 subjects is sufficient (Guadagnoli, Velicer, 1988). Normality for each variable separately is checked by its skewness (how much the curve of the studied distribution is shifted to the right or to the left in comparison with the theoretically normal curve) and kurtosis (the degree to which the "bell" of the existing distribution, as visualized in the frequency diagram, is stretched upward or flattened downward compared with the "bell" of the density curve of the normal distribution). If a variable has significant skewness and kurtosis, it can be transformed by introducing a new variable (as a single-valued function of the original one) in such a way that the new variable is normally distributed (for more details, see: Tabachnik, Fidell, 1996, ch. 4).

Eigenvectors and corresponding eigenvalues for the case study in question

Eigenvector 1    Eigenvector 2
-.283            .651
.177             -.685
.658             .252
.675             .207
Eigenvalue 1     Eigenvalue 2
2.00             1.91

Since the correlation matrix is diagonalizable, the matrix algebra of eigenvectors and eigenvalues can be applied to it to obtain the results of factor analysis (see Appendix 1). If the matrix is diagonalizable, then all the essential information about the factor structure is contained in its diagonal form. In factor analysis, the eigenvalues correspond to the variance explained by the factors. The factor with the largest eigenvalue explains the largest variance, and so on, down to factors with small or negative eigenvalues, which are usually not considered in the analysis. The factor loadings matrix is a matrix of relationships (interpreted as correlation coefficients) between the factors and the variables. The first column gives the correlations between the first factor and each variable in turn: voucher cost (-.400), comfort of the complex (.251), air temperature (.932), water temperature (.956). The second column gives the correlations between the second factor and each variable: voucher cost (.900), comfort of the complex (-.947), air temperature (.348), water temperature (.286). A factor is interpreted on the basis of the variables strongly related to it (i.e. having high loadings on it). Thus, the first factor is mainly "climatic" (air and water temperature), while the second is "economic" (the cost of the voucher and the comfort of the complex).

When interpreting these factors, one should note that the variables with high loadings on the first factor (air temperature and water temperature) are positively interrelated, whereas the variables with high loadings on the second factor (voucher cost and comfort of the complex) are negatively interrelated (one cannot expect great comfort from a cheap resort). The first factor is called unipolar (all the variables are grouped at one pole), and the second bipolar (the variables split into two groups opposite in meaning - two poles). Variables with factor loadings with a plus sign form the positive pole, and those with a minus sign the negative pole. The names of the poles, "positive" and "negative", do not carry the evaluative meaning of "good" and "bad" when interpreting the factor; the choice of sign occurs at random during the calculations.

Orthogonal rotation

Rotation is usually applied after factor extraction to maximize high correlations and minimize low ones. There are numerous methods of rotation, but the most commonly used is varimax, a variance-maximization procedure: it maximizes the variance of the factor loadings, making the high loadings higher and the low ones lower for each factor. This goal is achieved by means of the transformation matrix Λ:

A before rotation · Λ = A after rotation.

The transformation matrix is the matrix of sines and cosines of the rotation angle Ψ. (Hence the name of the transformation - rotation: from a geometric point of view, the axes are rotated around the origin of the factor space.) Having performed the rotation and obtained the matrix of factor loadings after rotation, a number of other indicators can be analyzed (see Table 4). The communality of a variable is the variance computed from its factor loadings; it is the squared multiple correlation of the variable as predicted by the factor model. The communality is calculated as the sum of the squared factor loadings (SKN) of the variable over all factors. In Table 4 the communality for the voucher cost equals (-.086)² + (.981)² = .970, i.e. 97% of the variance of the voucher cost is explained by factors 1 and 2.

The proportion of the variance of all variables explained by a factor is the SKN for that factor divided by the number of variables (in the case of orthogonal rotation). For the first factor this proportion is

[(-.086)² + (-.071)² + (.994)² + (.997)²] / 4 = 1.994/4 = .50,

that is, the first factor explains 50% of the variance of the variables. The second factor explains 48% of the variance of the variables and (due to the orthogonality of rotation) the two factors together explain 98% of the variance of the variables.

Table 4. The relationship between factor loadings, communalities, SKN, variance and covariance of orthogonal factors after rotation

Variable                 Factor 1        Factor 2        Communality (h²)
Voucher cost             -.086           .981            Σa² = .970
Comfort level            -.072           -.978           Σa² = .960
Air temperature          .994            .027            Σa² = .989
Water temperature        .997            -.040           Σa² = .996
SKN                      Σa² = 1.994     Σa² = 1.919
Percentage of variance   50              48
Fraction of covariance   51              49

The proportion of the variance of the solution explained by a factor - the proportion of covariance - is the SKN for that factor divided by the sum of the communalities (the sum of the SKN over the variables). The first factor explains 51% of the variance of the solution (1.994/3.915); the second, 49% (1.919/3.915); together the two factors explain all of the covariance.
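The same arithmetic can be reproduced in numpy from the rotated loadings of this example (the values are taken from Table 4 and from the factor-score equations below):

```python
import numpy as np

# Rotated loadings: rows = voucher cost, comfort, air temperature, water temperature;
# columns = factor 1, factor 2 (values quoted in the text).
A = np.array([[-0.086,  0.981],
              [-0.072, -0.978],
              [ 0.994,  0.027],
              [ 0.997, -0.040]])

communalities = (A ** 2).sum(axis=1)           # h^2 of each variable
skn = (A ** 2).sum(axis=0)                     # sum of squared loadings (SKN) per factor
variance_share = skn / A.shape[0]              # share of the total variance of the variables
covariance_share = skn / communalities.sum()   # share of the variance explained by the solution

print(np.round(communalities, 3))              # compare with the communalities in Table 4
print(np.round(variance_share, 2), np.round(covariance_share, 2))
```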

Eigenvalues reflect the amount of variance explained by the corresponding factor. As an exercise, we recommend writing out all these formulas to obtain the calculated values for the variables. For example, for the first respondent:

-1.23 = -.086(1.12) + .981(-1.16)

1.05 = -.072(1.12) - .978(-1.16)

1.08 = .994(1.12) + .027(-1.16)

1.16 = .997(1.12) - .040(-1.16)

Or in algebraic form:

Z voucher cost = a11 F1 + a12 F2

Z comfort of the complex = a21 F1 + a22 F2

Z air temperature = a31 F1 + a32 F2

Z water temperature = a41 F1 + a42 F2

The greater the loading, the more confidently we can assume that the variable determines the factor. Comrey and Lee (1992) suggest that loadings greater than 0.71 (50% of the variance explained) are excellent, 0.63 (40%) very good, 0.55 (30%) good, 0.45 (20%) fair, and 0.32 (10% of the variance) poor.
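A hypothetical helper that applies this scale (the function name is invented; the thresholds are the standard Comrey-Lee values):

```python
def loading_quality(loading):
    """Verbal label for a factor loading, following Comrey and Lee (1992)."""
    a = abs(loading)
    if a >= 0.71:
        return "excellent"   # ~50% of the variance
    if a >= 0.63:
        return "very good"   # ~40%
    if a >= 0.55:
        return "good"        # ~30%
    if a >= 0.45:
        return "fair"        # ~20%
    if a >= 0.32:
        return "poor"        # ~10%
    return "negligible"

print([loading_quality(a) for a in (-0.947, 0.348, 0.27)])
```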

Suppose you are doing a (somewhat "stupid") study in which you measure the height of a hundred people in inches and in centimeters. Thus, you have two variables. If you want to further investigate, for example, the effects of different nutritional supplements on growth, would you continue to use both variables? Probably not, since height is a single characteristic of a person, no matter what units it is measured in.

Dependencies between variables can be discovered using scatterplots. The regression line obtained by fitting gives a graphical representation of the relationship. If you define a new variable based on the regression line shown in this diagram, then such a variable will include the most significant features of both variables. So, in effect, you have reduced the number of variables and replaced two with one. Note that the new factor (variable) is actually a linear combination of the two original variables.

Factor analysis is a branch of mathematical statistics. Its purpose, like the goal of other branches of mathematical statistics, is to develop models, concepts and methods that allow analyzing and interpreting arrays of experimental or observed data, regardless of their physical form.

One of the most typical forms of presentation of experimental data is a matrix, the columns of which correspond to various parameters, properties, tests, etc., and the rows correspond to individual objects, phenomena, modes described by a set of specific parameter values. In practice, the size of the matrix turns out to be quite large: for example, the number of rows of this matrix can vary from several tens to several hundred thousand (for example, in sociological surveys), and the number of columns - from one or two to several hundred. Direct, “visual” analysis of matrices of this size is impossible, therefore, in mathematical statistics, many approaches and methods have emerged designed to “compress” the initial information contained in the matrix to an observable size, to extract the most “essential” from the initial information, discarding "secondary", "accidental".

When analyzing data presented in the form of a matrix, two types of problems arise. The tasks of the first type are aimed at obtaining a “short description” of the distribution of objects, while the tasks of the second type are aimed at revealing the relationship between the parameters.

It should be borne in mind that the main incentive for the appearance of these problems lies not only and not so much in the desire to shortly encode a large array of numbers, but in a much more fundamental circumstance of a methodological nature: as soon as it was possible to briefly describe a large array of numbers, then one can believe that that a certain objective regularity has been revealed, which has led to the possibility of a short description; and it is the search for objective patterns that is the main goal for which, as a rule, data is collected.

The mentioned approaches and methods for processing a data matrix differ in what type of data processing problem they are intended to solve, and in what size matrices they are applicable to.

As for the problem of a short description of the relationships between parameters when the number of parameters is moderate: in this case the corresponding correlation matrix contains several tens or hundreds of numbers, and by itself it cannot yet serve as a "short description" of the existing relationships between the parameters; to this end it must undergo further processing.

Factor analysis is just a set of models and methods designed to "compress" the information contained in the correlation matrix. Various models of factor analysis are based on the following hypothesis: the observed or measured parameters are only indirect characteristics of the object or phenomenon under study, in fact, there are internal (hidden, not directly observed) parameters or properties, the number of which is small and which determine the values ​​of the observed parameters. These internal parameters are usually called factors. The task of factor analysis is to present the observed parameters in the form of linear combinations of factors and, perhaps, some additional, "insignificant" values ​​- "noise". It is remarkable that, although the factors themselves are not known, such a decomposition can be obtained and, moreover, such factors can be determined, i.e. for each object, the values ​​of each factor can be indicated.

Factor analysis, regardless of the methods used, begins with processing the table of intercorrelation obtained on a set of tests, known as a correlation matrix, and ends with obtaining a factor matrix, i.e. a table showing the weight or load of each factor for each test. Table 1 is a hypothetical factor matrix with only two factors.

The factors are listed in the top row of the table from most significant to least significant, and their weights in each of the 10 tests are given in the corresponding columns.

Table 1

Hypothetical factorial matrix

Coordinate axes. It is customary to represent factors geometrically as coordinate axes, relative to which each test can be depicted as a point. Fig. 1 explains this procedure. In this graph, each of the 10 tests shown in Table 1 is displayed as a point relative to the two factors that correspond to axes I and II. Thus, test 1 is represented by a point with coordinates 0.74 along axis I and 0.54 along axis II. The points representing the remaining 9 tests are constructed in a similar way, using the values of the weights from Table 1.

It should be noted that the position of the coordinate axes is not fixed by the data. The original table of correlations determines only the position of the tests (i.e. points in Fig. 1) relative to each other. The same points can be plotted on a plane with any position of the coordinate axes. For this reason, when performing factor analysis, it is common to rotate the axes until the most suitable and easily interpreted display is obtained.

Fig. 1. A hypothetical factor mapping showing the weights of two group factors for each of the 10 tests.

In Fig. 1, the axes I′ and II′ obtained after rotation are shown as dashed lines. The rotation is performed according to the criteria proposed by Thurstone: positive manifold and simple structure. The first involves rotating the axes to a position in which all significant negative weights are eliminated. Most psychologists consider negative factor loadings logically inconsistent with ability tests, since such a loading would mean that the higher an individual's score on the factor, the lower his score on the corresponding test. The simple structure criterion essentially means that each test should have loadings on as few factors as possible.

The fulfillment of both criteria provides factors that can be most easily and unambiguously interpreted. If a test has a high load on one factor and does not have significant loads on other factors, we can learn something about the nature of this factor by examining the content of this test. On the contrary, if a test has medium or low loads on six factors, then it will tell us little about the nature of any of them.

In Fig. 1 it is clearly seen that after the rotation of the coordinate axes all verbal tests (1-5) lie along, or very close to, axis I′, while the numerical tests (6-10) are closely grouped around axis II′. The new factor loadings, measured with respect to the rotated axes, are shown in Table 2. They have no negative values, with the exception of negligible ones that are clearly attributable to sampling errors. All verbal tests have high loadings on factor I′ and practically zero loadings on factor II′. The numerical tests, on the other hand, have high loadings on factor II′ and negligible ones on factor I′. Thus, the rotation of the coordinate axes has significantly simplified the identification and naming of both factors, as well as the description of the factor composition of each test. In practice, the number of factors often turns out to be more than two, which, of course, complicates their geometric representation and statistical analysis, but does not change the essence of the procedure.

Table 2

Factor matrix after rotation

Some researchers use a theoretical model as the guiding principle for rotating the axes. Account is also taken of the persistence, or confirmation, of the same factors in independently performed but comparable studies.

Interpretation of factors. Having obtained the factor solution (or, more simply, the factor matrix) after the rotation procedure, we can proceed to the interpretation and naming of the factors. This stage of the work requires psychological intuition rather than statistical training. To understand the nature of a particular factor, we have no choice but to study the tests that have high loadings on this factor and try to find psychological processes common to them. The more tests with high loadings on a factor there are, the easier it is to reveal its nature. From Table 2, for example, it is immediately clear that factor I′ is verbal and factor II′ is numerical. The factor loadings given in Table 2 also reflect the correlation of each test with the corresponding factor.

Basic Provisions

Factor analysis is one of the newer areas of multivariate statistical analysis. The method was originally developed to explain the correlations between input parameters. The result of correlation analysis is a matrix of correlation coefficients. With a small number of features (variables), a visual analysis of this matrix can be carried out; as the number of features grows (to 10 or more), visual analysis no longer gives positive results. It turns out that the whole variety of correlations can be explained by the action of a few generalized factors, which are functions of the studied parameters; the factors themselves may be unknown, but they can be expressed through the studied features. The foundations of the method were laid by C. Spearman, and multiple factor analysis in its modern form was developed by the American scientist L. L. Thurstone.

Modern statisticians understand factor analysis as a set of methods that, on the basis of a really existing connection between features, makes it possible to reveal latent (hidden) generalizing characteristics of the organizational structure and the mechanisms of development of the phenomena and processes under study.

Example: suppose that n cars are evaluated according to 2 criteria:

x 1 - the cost of the car,

x 2 - the duration of the working life of the motor.

If x 1 and x 2 are correlated, a directed and rather dense cluster of points appears in the coordinate system, which can formally be described by new axes F 1 and F 2 (Fig. 5).


A salient feature of F 1 and F 2 is that they pass through the dense clusters of points and, in turn, correlate with x 1 and x 2. The maximum number of new axes is equal to the number of elementary features. Further development of factor analysis showed that the method can be successfully applied to problems of grouping and classification of objects.

Presentation of information in factor analysis.

To carry out factor analysis, the information must be presented in the form of an m x n matrix, the rows of which correspond to the objects of observation and the columns to the features.

The features characterizing an object have different dimensions. To bring them to a common dimension and ensure comparability of the features, the matrix of initial data is usually normalized by introducing a single scale. The most common way of normalizing is standardization: each value x_ij is replaced by

z_ij = (x_ij - x̄_j) / s_j,

where x̄_j is the mean of the j-th feature and s_j is its standard deviation. This transformation is called standardization.
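In numpy this transformation is one line (a sketch with a hypothetical data matrix):

```python
import numpy as np

# Hypothetical data matrix: rows = objects, columns = features.
X = np.array([[3.0, 12.0, 0.9],
              [5.0, 15.0, 1.1],
              [4.0, 10.0, 1.3],
              [6.0, 18.0, 0.7]])

# z_ij = (x_ij - mean_j) / s_j; use ddof=1 instead if the sample standard deviation is wanted.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.round(Z, 3))
```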

Basic Factor Analysis Model

The basic model of factor analysis is as follows:

z_j = a_j1 F_1 + a_j2 F_2 + … + a_jp F_p + d_j u_j,

where

z_j is the j-th feature (a random variable);

F_1, F_2, …, F_p are the common factors (normally distributed random variables);

u_j is the characteristic factor;

a_j1, a_j2, …, a_jp are the factor loadings, characterizing the significance of the influence of each factor (the model parameters to be determined).

The common factors are essential for the analysis of all features. The characteristic factor relates only to the given feature: it is the specificity of the feature that cannot be expressed through the common factors. The factor loadings a_j1, a_j2, …, a_jp characterize the magnitude of the influence of each common factor on the variation of the given feature. The main task of factor analysis is to determine the factor loadings. The variance S_j² of each feature can be divided into two components:

    the first part determines the action of the common factors - the communality h_j²;

    the second part determines the action of the characteristic factor - the uniqueness d_j².

All variables are presented in standardized form, therefore the variance of each feature is S_j² = 1.

If the common and characteristic factors do not correlate with each other, then the variance of the j-th feature can be represented as

S_j² = a_j1² + a_j2² + … + a_jp² + d_j² = h_j² + d_j²,

where a_jk² is the fraction of the variance of the feature attributable to the k-th factor.

The full contribution of the k-th factor to the total variance of all features is

V_k = a_1k² + a_2k² + … + a_mk².

The contribution of all common factors to the total variance is

V = V_1 + V_2 + … + V_p.

It is convenient to present the results of factor analysis in the form of a table:

Feature                    F1     F2     …     Fp     Communality h_j²
z1                         a11    a12    …     a1p    h1²
z2                         a21    a22    …     a2p    h2²
…                          …      …      …     …      …
zm                         am1    am2    …     amp    hm²
Contribution of factors    V1     V2     …     Vp

A is the factor loadings matrix. It can be obtained in various ways; at present, the method of principal components or of principal factors is the most widespread.

Computational procedure of the method of principal factors.

The solution by the method of principal components reduces to a step-by-step transformation of the initial data matrix X:

X → Z → R → Λ, U → V → A → F,

where

X is the matrix of initial data;

Z is the matrix of standardized feature values, z_ij = (x_ij - x̄_j) / s_j;

R is the matrix of pairwise correlations, R = (1/n) Z′Z;

Λ is the diagonal matrix of eigen (characteristic) numbers; the λ_j are found by solving the characteristic equation |R - λE| = 0, where E is the identity matrix; λ_j is the measure of the variance accounted for by the j-th principal component, and for standardized initial data Σ λ_j = m;

U is the matrix of eigenvectors, which are found from the equation (R - λ_j E) u_j = 0; this amounts to solving m systems of linear equations, one for each λ_j, i.e. each eigenvalue corresponds to its own system of equations;

V is the matrix of normalized eigenvectors;

A is the factor mapping matrix, calculated by the formula A = V Λ^(1/2);

F is the matrix of values of the principal components, found from one of the equivalent formulas F = Z V Λ^(-1/2) = Z A Λ^(-1).
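A numpy sketch of this chain of transformations (the 4×3 data matrix is hypothetical, since the example's actual values are not reproduced here):

```python
import numpy as np

X = np.array([[230.0,  9.8, 1.45],   # hypothetical data: 4 enterprises x 3 features
              [185.0, 15.1, 0.95],
              [310.0, 12.5, 1.20],
              [205.0, 11.2, 1.05]])

Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardized features
n, m = Z.shape
R = (Z.T @ Z) / n                            # matrix of pairwise correlations

lam, U = np.linalg.eigh(R)                   # eigenvalues and eigenvectors of R
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]
lam = np.clip(lam, 1e-12, None)              # guard against tiny negative rounding errors

V = U / np.linalg.norm(U, axis=0)            # normalized eigenvectors (V'V = E)
A = V * np.sqrt(lam)                         # factor mapping (loadings) matrix, A = V Lambda^(1/2)
F = Z @ V / np.sqrt(lam)                     # values of the principal components

print(np.round(lam, 3), round(lam.sum(), 3)) # eigenvalues; their sum equals m for standardized data
print(np.round(A, 3))
print(np.round(F, 3))
```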

The aggregate of four industrial enterprises is assessed according to three characteristic features:

    average annual output per employee x 1;

    profitability level x 2;

    the level of return on assets x 3.

The result is presented in a standardized matrix Z:

By matrix Z the matrix of pairwise correlations is obtained R:

    Let us find the determinant of the matrix of pairwise correlations (for example, using the Faddeev method):

    Let's construct the characteristic equation:

    Solving this equation, we find:

Thus, the original elementary features x 1, x 2, x 3 can be generalized by the values ​​of three main components, and:

F 1 explains about the whole variation,

F 2 -, and F 3 -

All three main components account for 100% of the variation.

Solving this system, we find:

Systems for λ 2 and λ 3 are constructed in a similar way. For λ 2 the solution of the system is:

Eigenvector matrix U takes the form:

    We divide each element of the matrix U by the square root of the sum of the squares of the elements of the corresponding j-th column and obtain the normalized matrix V.

Note that the equality V′V = E holds.

    The factor mapping matrix A is obtained from the matrix relation

A = V Λ^(1/2).

In its meaning, each element of the matrix A is a pair correlation coefficient between the original feature x_j and the principal component F_r; therefore all the elements satisfy |a_jr| ≤ 1.

When the summation is taken over all the components, the equality Σ a_jr² = 1 holds, where r is the number of components.

The total contribution of each factor to the total variance of the features is

V_r = a_1r² + a_2r² + … + a_mr² (which equals λ_r).

The factor analysis model then takes the form

z_j = a_j1 F_1 + a_j2 F_2 + a_j3 F_3,   j = 1, 2, 3.

We find the values of the principal components (the matrix F) according to the formula F = Z V Λ^(-1/2).

The center of distribution of the values ​​of the principal components is at the point (0,0,0).

Further analytical conclusions from the results of the calculations follow once a decision has been made on the number of significant features and principal components and on the names of the principal components. The problems of recognizing the principal components and of choosing names for them are solved subjectively, on the basis of the weight coefficients in the mapping matrix A.

Consider the question of the wording of the names of the main components.

We denote by w 1 the set of insignificant weight coefficients, which includes elements close to zero;

w 2 - a set of significant weights,

w 3 - a subset of significant weights that are not involved in the formation of the name of the main component.

w 2 - w 3 - a subset of the weighting factors involved in the formation of the name.

We calculate the informativeness coefficient for each principal factor.

The set of explicable features is considered satisfactory if the values ​​of the informativeness coefficients are within the range of 0.75-0.95.

a11 = 0.776   a12 = -0.130   a13 = 0.308
a21 = 0.904   a22 = -0.210   a23 = -0.420
a31 = 0.616   a32 = 0.902    a33 = 0.236

For j = 1: w1 = ∅, w2 = {a11, a21, a31};

For j = 2: w1 = {a12, a22}, w2 = {a32};

For j = 3: w1 = {a33}, w2 = {a13, a23}.

The features x 1, x 2, x 3 determine the composition of the first principal component by 100%; the largest contribution is made by the feature x 2, whose meaning is profitability. An appropriate name for F 1 is therefore production efficiency.

F 2 is determined by the component x 3 (return on assets); let us call it efficiency of use of fixed assets.

F 3 is determined by the components x 1 and x 2; it may be excluded from the analysis because it explains only 10% of the total variation.


Comparison of two means of normal general populations, the variances of which are known

Let the general populations X and Y be normally distributed, and let their variances be known (for example, from previous experience or found theoretically). From independent samples of sizes n and m drawn from these populations, the sample means x̄ and ȳ have been found.

It is required to test, at a given significance level, the null hypothesis that the general means (mathematical expectations) of the populations under consideration are equal to each other, that is, H 0: M(X) = M(Y).

Since the sample means are unbiased estimates of the general means, i.e. M(x̄) = M(X) and M(ȳ) = M(Y), the null hypothesis can be written as H 0: M(x̄) = M(ȳ).

Thus, it is required to check that the mathematical expectations of the sample means are equal to each other. This task is posed because, as a rule, the sample means are different. The question arises: do the sample means differ significantly or insignificantly?

If it turns out that the null hypothesis is true, that is, the general means are the same, then the difference in the sample means is insignificant and can be explained by random reasons and, in particular, by a random selection of sample objects.

If the null hypothesis is rejected, that is, the general means are not the same, then the difference in the sample means is significant and cannot be explained by random reasons. And it is explained by the fact that the general average (mathematical expectations) themselves are different.

As the criterion for testing the null hypothesis we take the random variable

Z = (x̄ - ȳ) / √(D(X)/n + D(Y)/m).

The criterion Z is a normalized normal random variable. Indeed, Z is distributed normally, since it is a linear combination of the normally distributed quantities x̄ and ȳ; these quantities are themselves normally distributed, being sample means computed from samples drawn from normal general populations. Z is normalized because M(Z) = 0 when the null hypothesis is true, and D(Z) = 1, since the samples are independent.

The critical area is constructed depending on the type of the competing hypothesis.

First case. Null hypothesis H 0: M(X) = M(Y). Competing hypothesis H 1: M(X) ≠ M(Y).

In this case, a two-sided critical area is constructed based on the requirement that the probability of the criterion falling into this area, assuming the validity of the null hypothesis, is equal to the accepted level of significance.

The greatest power of the criterion (the probability of the criterion falling into the critical region with the validity of the competing hypothesis) is achieved when the "left" and "right" critical points are chosen so that the probability of the criterion falling into each interval of the critical region is equal to:

P(Z < z_left cr) = α/2,

P(Z > z_right cr) = α/2.  (1)

Since Z is a normalized normal quantity, and the distribution of such a quantity is symmetric about zero, the critical points are symmetric about zero.

Thus, if we denote the right boundary of the two-sided critical region by z_cr, then the left boundary is -z_cr.

So, to specify the two-sided critical region Z < -z_cr, Z > z_cr and the region of acceptance of the null hypothesis (-z_cr, z_cr), it is enough to find the right boundary z_cr.

Let us show how to find z_cr, the right boundary of the two-sided critical region, using the Laplace function Φ(z). The Laplace function gives the probability that a normalized normal random variable, such as Z, falls in the interval (0; z):

P(0 < Z < z) = Φ(z).  (2)

Since the distribution of Z is symmetric about zero, the probability that Z falls in the interval (0; ∞) is 1/2. Therefore, if we divide this interval by the point z_cr into the intervals (0, z_cr) and (z_cr, ∞), then by the addition theorem P(0 < Z < z_cr) + P(Z > z_cr) = 1/2.

By virtue of (1) and (2), we obtain Φ(z_cr) + α/2 = 1/2. Therefore, Φ(z_cr) = (1 - α)/2.

Hence we conclude: to find the right boundary of the two-sided critical region (z_cr), it is enough to find the value of the argument of the Laplace function at which the function equals (1 - α)/2.

Then the two-sided critical region is defined by the inequalities Z < -z_cr, Z > z_cr, or by the equivalent inequality |Z| > z_cr, and the region of acceptance of the null hypothesis by the inequality -z_cr < Z < z_cr, or by the equivalent inequality |Z| < z_cr.

Let us denote the value of the criterion calculated from the observational data by z_obs and formulate the rule for testing the null hypothesis.

Rule.

1. Calculate the observed value of the criterion z_obs.

2. From the table of the Laplace function, find the critical point z_cr from the equality Φ(z_cr) = (1 - α)/2.

3. If |z_obs| < z_cr, there is no reason to reject the null hypothesis.

If |z_obs| > z_cr, the null hypothesis is rejected.

Second case. Null hypothesis H 0: M(X) = M(Y). Competing hypothesis H 1: M(X) > M(Y).

In practice, this is the case if professional considerations suggest that the general average of one population is greater than the general average of the other. For example, if a process improvement is introduced, then it is natural to assume that it will lead to an increase in output.

In this case, a right-sided critical area is constructed based on the requirement that the probability of the criterion falling into this area, assuming the validity of the null hypothesis, is equal to the accepted level of significance:

P(Z > z_cr) = α.  (3)

Let us show how to find the critical point using the Laplace function. We use the relation

P(0 < Z < z_cr) + P(Z > z_cr) = 1/2.

By virtue of (2) and (3), we have Φ(z_cr) + α = 1/2. Therefore, Φ(z_cr) = (1 - 2α)/2.

Hence we conclude that to find the boundary of the right-sided critical region (z_cr), it is sufficient to find the value of the argument of the Laplace function at which the function equals (1 - 2α)/2. Then the right-sided critical region is defined by the inequality Z > z_cr, and the region of acceptance of the null hypothesis by the inequality Z < z_cr.

Rule.

1. Calculate the observed value of the criterion z_obs.

2. From the table of the Laplace function, find the critical point z_cr from the equality Φ(z_cr) = (1 - 2α)/2.

3. If z_obs < z_cr, there is no reason to reject the null hypothesis. If z_obs > z_cr, the null hypothesis is rejected.

Third case. Null hypothesis H 0: M(X) = M(Y). Competing hypothesis H 1: M(X) < M(Y).

In this case, a left-sided critical region is constructed from the requirement that the probability of the criterion falling into this region, assuming the null hypothesis is true, equals the accepted significance level: P(Z < z′_cr) = α, i.e. z′_cr = -z_cr. Thus, to find the point z′_cr it is enough first to find the "auxiliary point" z_cr and then take the found value with a minus sign. The left-sided critical region is then defined by the inequality Z < -z_cr, and the region of acceptance of the null hypothesis by the inequality Z > -z_cr.

Rule.

1. Calculate z_obs.

2. From the table of the Laplace function, find the "auxiliary point" z_cr from the equality Φ(z_cr) = (1 - 2α)/2, and then set z′_cr = -z_cr.

3. If z_obs > -z_cr, there is no reason to reject the null hypothesis.

If z_obs < -z_cr, the null hypothesis is rejected.
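A sketch of the whole procedure (all three cases) in Python, assuming scipy is available; since the Laplace function Φ relates to the standard normal CDF F by Φ(z) = F(z) - 1/2, the critical points are obtained from norm.ppf(1 - α/2) and norm.ppf(1 - α):

```python
from math import sqrt
from scipy.stats import norm

def z_test_two_means(xbar, ybar, var_x, var_y, n, m, alpha=0.05, alternative="two-sided"):
    """Test H0: M(X) = M(Y) for normal populations with known variances.
    alternative: 'two-sided' -> H1: M(X) != M(Y)
                 'greater'   -> H1: M(X) >  M(Y)
                 'less'      -> H1: M(X) <  M(Y)"""
    z_obs = (xbar - ybar) / sqrt(var_x / n + var_y / m)
    if alternative == "two-sided":
        z_cr = norm.ppf(1 - alpha / 2)   # Laplace Phi(z_cr) = (1 - alpha)/2
        reject = abs(z_obs) > z_cr
    elif alternative == "greater":
        z_cr = norm.ppf(1 - alpha)       # Laplace Phi(z_cr) = (1 - 2*alpha)/2
        reject = z_obs > z_cr
    else:                                 # 'less': left-sided region Z < -z_cr
        z_cr = norm.ppf(1 - alpha)
        reject = z_obs < -z_cr
    return z_obs, z_cr, reject

# Hypothetical samples: the numbers below are illustrative only.
print(z_test_two_means(xbar=10.4, ybar=9.8, var_x=4.0, var_y=3.0, n=50, m=60))
```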

Basic Equations

Previously, almost all textbooks and monographs on factor analysis provided an explanation of how to carry out basic calculations "manually" or using the simplest calculating device (adding machine or calculator). Today, due to the complexity and large amount of computations required to construct a matrix of interrelationships, isolate factors and rotate them, there is probably not a single person who would not use powerful computers and corresponding programs when conducting factor analysis.

Therefore, we will focus on what the most significant matrices (data sets) can be obtained in the course of factor analysis, how they are related to each other and how they can be used to interpret the data. All necessary calculations can be done using any computer program (for example, SPSS or STADIA).

In Table 1 a list of the most important matrices for principal component analysis and factor analysis is provided. This list contains mainly relationship matrices (between variables, between factors, between variables and factors), standardized values (for variables and for factors), regression weights (for calculating factor values from the values of the variables), and factor mapping matrices of relationships between factors and variables after oblique rotation. Table 1 also gives the matrices of eigenvalues and the corresponding eigenvectors. The eigenvalues and eigenvectors are described in view of their importance for the extraction of factors, the large number of special terms used in this connection, and the close relationship between eigenvalues and variance in statistical studies.

Table 1

Matrices most commonly used in factor analysis

Designation Name The size Description
R Relationship matrix p x p Relationships between variables
D Non-standardized data matrix N x p Primary data - non-standardized observation values ​​for primary variables
Z Standardized data matrix N x p Standardized Observation Values ​​for Primary Variables
F Factor Values ​​Matrix N x f Standardized Observation Values ​​by Factor
A Factor loading matrix (factor mapping matrix) p x f Regression coefficients for the common factors, assuming the observed variables are a linear combination of the factors. In the case of orthogonal rotation, the relationships between variables and factors
V Factor value coefficient matrix p x f Regression Coefficients for Calculating Factor Values ​​Using Variable Values
S Structural matrix p x f Relationships between variables and factors
F Factor correlation matrix f x f Correlations between factors
L Eigenvalue matrix (diagonal) f x f Eigenvalues (characteristic, latent roots); one eigenvalue per factor
V Eigenvector matrix f x f Eigen (characteristic) vectors; one eigenvector per eigenvalue

Note. When specifying the size, the number of rows x the number of columns is given: p is the number of variables, N the number of observations, f the number of factors or components. If the matrix of relationships R is not degenerate and has rank equal to p, then p eigenvalues and eigenvectors are actually extracted, not f; however, only f of them are of interest, so the remaining p - f are not shown.

The matrices S and F (the factor correlation matrix) apply only to oblique rotation; the rest apply to both orthogonal and oblique rotation.

The data set prepared for factor analysis consists of the results of measurements (polling) of a large number of subjects (respondents) on certain scales (variables). In Table 2 an array of data is given which can be conditionally considered as satisfying the requirements of factor analysis.

Five respondents who applied to a travel agency to purchase a trip to a seaside resort were asked about the significance for them of four conditions (variables) for choosing a summer vacation destination: the cost of the voucher, the comfort of the complex, the air temperature, and the water temperature. The greater the significance of a condition for a respondent, the higher the score he assigned to it. The research task consisted in studying the model of the relationship between the variables and identifying the underlying causes that determine the choice of resort. (The example, of course, is extremely simplified for illustrative and educational purposes, and should not be taken seriously in a substantive respect.)

The relationship matrix (Table 2) was calculated as a correlation matrix. Pay attention to the structure of relationships in it, highlighted by the vertical and horizontal lines. The high correlations in the upper left and lower right quadrants show that the ratings of voucher cost and comfort of the complex are interrelated, as are the ratings of air temperature and water temperature. The other two quadrants show that these two pairs are practically unrelated to each other: air and water temperature are almost uncorrelated with voucher cost and comfort of the complex.

Let us now try, using factor analysis, to find this structure of correlations, which is easily seen by the naked eye in a small correlation matrix (this is very difficult to do in a large matrix).

Table 2

Factor Analysis Data (Case Study)

Tourists Variables
Voucher cost Comfort level Air temperature Water temperature
T1
T2
T3
T4
T5

Correlation matrix

Voucher cost Comfort level Air temperature Water temperature
Voucher cost 1.000 -0.953 -0.055 -0.130
Comfort level -0.953 1.000 -0.091 -0.036
Air temperature -0.055 -0.091 1.000 0.990
Water temperature -0.130 -0.036 0.990 1.000

Factorization

An important theorem of matrix algebra states that matrices satisfying certain conditions can be diagonalized, i.e. transformed into a matrix with numbers on the main diagonal and zeros in all other positions. Relationship matrices are precisely this type of diagonalizable matrix. The transformation is carried out according to the formula

V′RV = L,  (6)

i.e. diagonalization of the matrix R is performed by multiplying it first (on the left) by the transposed matrix of eigenvectors, denoted V′, and then (on the right) by the matrix V itself.

The columns of the matrix V are called eigenvectors, and the values on the main diagonal of the matrix L are called eigenvalues. The first eigenvector corresponds to the first eigenvalue, and so on (for more details see Appendix 1).

Due to the fact that in the given example four variables are considered, we obtain four eigenvalues ​​with their corresponding eigenvectors. But since the goal of factor analysis is to generalize the relationship matrix using as few factors as possible and each eigenvalue corresponds to different potential factors, usually only factors with large eigenvalues ​​are taken into account. With a "good" factorial solution, the matrix of calculated relationships obtained using this limited set of factors practically duplicates the matrix of relationships.

In our example, when no constraints are imposed on the number of factors, the eigenvalues 2.02, 1.94, .04 and .00 are calculated for each of the four possible factors. Only for the first two factors are the eigenvalues large enough to warrant further consideration, so only the first two factors are retained. They have eigenvalues of 2.00 and 1.91, respectively, as indicated in Table 3. Using equation (6) and inserting the values from the above example, we get:

(All computer calculated values ​​are the same; manual calculations may differ due to rounding inaccuracies.)
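A quick numpy check of this step, using the correlation matrix from Table 2 (the exact eigenvalues may differ slightly because the printed correlations are rounded):

```python
import numpy as np

# Correlation matrix from Table 2 (voucher cost, comfort, air temperature, water temperature).
R = np.array([[ 1.000, -0.953, -0.055, -0.130],
              [-0.953,  1.000, -0.091, -0.036],
              [-0.055, -0.091,  1.000,  0.990],
              [-0.130, -0.036,  0.990,  1.000]])

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
print(np.round(eigvals[order], 2))            # two large eigenvalues and two negligible ones
print(np.round(eigvecs[:, order][:, :2], 3))  # eigenvectors of the two retained factors
```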

The left multiplication of the matrix of eigenvectors by the transposed matrix gives the identity matrix E (with ones on the main diagonal and other zeros). Therefore, we can say that the transformation of the matrix of relationships according to formula (6) does not change it itself, but only transforms it to a more convenient form for analysis:

For example:

Table 3

Eigenvectors and Corresponding Eigenvalues ​​for the Case Study

Eigenvector 1 Eigenvector 2
-.283 .651
.177 -.685
.658 .252
.675 .207
Eigenvalue 1 Eigenvalue 2
2.00 1.91

Since the correlation matrix is diagonalizable, the matrix algebra of eigenvectors and eigenvalues can be applied to it to obtain the results of factor analysis (see Appendix 1). If the matrix is diagonalizable, then all the essential information about the factor structure is contained in its diagonal form. In factor analysis, the eigenvalues correspond to the variance explained by the factors. The factor with the largest eigenvalue explains the largest variance, and so on, down to factors with small or negative eigenvalues, which are usually not considered in the analysis. Calculating eigenvalues and eigenvectors is very laborious, and the ability to calculate them is not an absolute necessity for a psychologist who is mastering factor analysis for his own practical purposes; however, familiarity with the procedure does not hurt, so in Appendix 1 we give an example of calculating eigenvalues and eigenvectors for a small matrix.

To find the eigenvalues ​​of a square matrix p x p, it is necessary to find the roots of a polynomial of degree p, and to find the eigenvectors, to solve p equations with p unknowns with additional side constraints, which for p> 3 is rarely done manually. Once the eigenvectors and eigenvalues ​​are found, the rest of factor analysis (or principal component analysis) becomes more or less clear (see Equations 8-11).

Equation (6) can be rewritten as

R = VLV′,  (8)

i.e. the matrix of interrelationships can be represented as the product of three matrices: the matrix of eigenvectors, the diagonal matrix of eigenvalues, and the transposed matrix of eigenvectors.

After transformation, the matrix of eigenvalues L can be represented as

L = √L √L,  (9)

and therefore

R = V√L √L V′,  (10)

or (which is the same)

R = (V√L)(√L V′).

We denote A = V√L and A′ = √L V′; then

R = AA′.  (11)

i.e. the relationship matrix can also be represented as the product of two matrices, each of which is a combination of the eigenvectors and the square roots of the eigenvalues.

Equation (11) is often referred to as the fundamental equation of factor analysis. It expresses the statement that the relationship matrix is the product of the factor loadings matrix A and its transpose.

Equations (10) and (11) also show that a significant part of the calculations in factor analysis and principal component analysis consists in determining the eigenvalues and eigenvectors. Once they are known, the factor matrix before rotation is obtained by direct matrix multiplication: A = V√L.

In our example:

A = V√L =
   -.400    .900
    .251   -.947
    .932    .348
    .956    .286
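This multiplication can be reproduced directly from Table 3 (small rounding differences from the printed loadings are to be expected):

```python
import numpy as np

V = np.array([[-0.283,  0.651],   # eigenvectors from Table 3
              [ 0.177, -0.685],
              [ 0.658,  0.252],
              [ 0.675,  0.207]])
L = np.diag([2.00, 1.91])         # eigenvalues from Table 3

A = V @ np.sqrt(L)                # unrotated factor loadings, A = V * sqrt(L)
print(np.round(A, 3))             # compare with the loadings quoted in the next paragraph
```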

The factor loadings matrix is ​​a matrix of relationships (interpreted as correlation coefficients) between factors and variables. The first column is the correlations between the first factor and each variable in turn: the cost of the ticket (-.400), the comfort of the complex (.251), the air temperature (.932), the water temperature (.956). The second column is the correlations between the second factor and each variable: the cost of the ticket (.900), the comfort of the complex (-.947), the air temperature (.348), the water temperature (.286). The factor is interpreted on the basis of variables strongly related to it (i.e. having high loads on it). So, the first factor is mainly "climatic" (air and water temperature), while the second is "economic" (the cost of the ticket and the comfort of the complex).

When interpreting these factors, one should note that the variables with high loadings on the first factor (air temperature and water temperature) are positively interrelated, while the variables with high loadings on the second factor (the cost of the ticket and the comfort of the complex) are negatively interrelated (one cannot expect great comfort from a cheap resort). The first factor is called unipolar (all the variables are grouped at one pole), and the second is called bipolar (the variables split into two groups that are opposite in meaning, two poles). Variables with factor loadings with a plus sign form the positive pole, and those with a minus sign form the negative pole. The names of the poles, "positive" and "negative," do not carry the evaluative meaning of "good" and "bad" when interpreting the factor: the choice of sign occurs at random during the calculations, and replacing all the signs with their opposites (all pluses for minuses and all minuses for pluses) does not change the solution. Analysis of the signs is needed only to identify the groups (what is opposed to what); with the same success, one pole could be called right and the other left. In our example, the variable "cost of the ticket" turned out to be at the positive (right) pole, opposed to the variable "comfort of the complex" at the negative (left) pole, so this factor can be interpreted (named) as "Economy vs. Comfort." The respondents for whom the problem of saving money is significant ended up on the right: they received factor values with a plus sign. When choosing a resort, they are guided more by its cheapness and less by comfort. The respondents who do not economize on vacation (the price of the ticket does not concern them much) and who want to relax, first of all, in comfortable conditions ended up on the left: they received factor values with a minus sign.

However, it should be borne in mind that all variables are highly correlated with both factors. Within this simple example, the interpretation is obvious, but in the case of real data, it is not so simple. Usually a factor is easier to interpret if only a small part of the variables are strongly related to it, and the rest are not.

Orthogonal rotation

Rotation is usually applied after factor extraction to maximize high correlations and minimize low ones. There are numerous rotation methods, but the most commonly used is varimax, a variance-maximization procedure. This rotation maximizes the variance of the factor loadings, making the high loadings higher and the low ones lower for each factor. This goal is achieved by means of a transformation matrix Λ:

A(before rotation) Λ = A(after rotation),

i.e., the factor loadings matrix before rotation is multiplied by the transformation matrix, and the result is the factor loadings matrix after rotation. In our example (the rotated loadings are reproduced numerically in the sketch below):

Compare the matrices before and after rotation. Note that in the matrix after rotation the low loadings are lower and the high loadings higher than in the matrix before rotation. This emphasized difference in loadings facilitates interpretation of the factor and makes it possible to unambiguously select the variables that are strongly interrelated with it.

The elements of the transformation matrix have a special geometric interpretation:

Λ = | cos ψ  -sin ψ |
    | sin ψ   cos ψ |

The transformation matrix is a matrix of sines and cosines of the rotation angle ψ. (Hence the name of the transformation, rotation: from a geometric point of view, the axes are rotated around the origin of the factor space.) In our example this angle is approximately 19 degrees: cos 19° = .946 and sin 19° = .325. Geometrically, this corresponds to rotating the factor axes by 19 degrees around the origin. (For more information on the geometric aspects of rotation, see below.)
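A minimal sketch of this rotation step, assuming the standard form of the 2 × 2 rotation matrix and the loadings and the angle of about 19° quoted above (the exact signs of the rotated loadings depend on the sign convention chosen for Λ):

```python
import numpy as np

# Unrotated loadings from the example (values taken from the text)
A = np.array([[-0.400,  0.900],
              [ 0.251, -0.947],
              [ 0.932,  0.348],
              [ 0.956,  0.286]])

psi = np.deg2rad(19)                            # rotation angle
Lam = np.array([[np.cos(psi), -np.sin(psi)],
                [np.sin(psi),  np.cos(psi)]])   # transformation matrix of sines and cosines

A_rotated = A @ Lam                             # loadings after rotation
print(np.round(A_rotated, 3))                   # each variable now loads mainly on one factor
```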

National Research Nuclear University "MEPhI"
Faculty of Business Informatics and Management of Complex Systems
Department of Economics and Management in Industry (No. 71)
Mathematical and Instrumental Methods of Processing Statistical Information
Kireev V.S., Ph.D., Associate Professor
Email:
Moscow, 2017

Normalization

Decimal scaling
Minimax normalization
Normalization using standard deviation
Normalization using element-by-element transformations

Decimal scaling

V_i' = V_i / 10^k,  where k is the smallest integer such that max_i |V_i'| < 1.

Minimax normalization

V_i' = (V_i - min_i(V_i)) / (max_i(V_i) - min_i(V_i))

Normalization using standard deviation

V_i' = (V_i - V̄) / σ_V,

where V̄ is the sample mean and σ_V is the sample standard deviation.

Normalization using element-by-element transformations

V_i' = f(V_i), for example:

V_i' = 1 / V_i
V_i' = log(V_i)
V_i' = exp(V_i)
V_i' = V_i^y,  V_i' = V_i^(1/y)
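The four normalization schemes above can be illustrated with a short sketch (the exact formulations are assumptions based on the formulas as reconstructed here):

```python
import numpy as np

v = np.array([2.0, 55.0, 78.0, -10.0, 301.0])   # an arbitrary feature vector

# Decimal scaling: divide by 10^k so that the largest absolute value is below 1
k = int(np.floor(np.log10(np.max(np.abs(v))))) + 1
v_decimal = v / 10 ** k

# Minimax normalization: rescale to the range [0, 1]
v_minmax = (v - v.min()) / (v.max() - v.min())

# Normalization using the standard deviation (z-score)
v_standard = (v - v.mean()) / v.std(ddof=1)

# Element-by-element transformation, e.g. a logarithm of positively shifted values
v_log = np.log(v - v.min() + 1)
```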

Factor analysis

Factor analysis (FA) is a collection of methods that, based on the really existing relationships between the analyzed features (or between the observed objects themselves), make it possible to identify hidden (implicit, latent) generalizing characteristics of the organizational structure and development mechanism of the phenomena and processes under study.

In research practice, factor analysis methods are applied mainly to compress information, that is, to obtain a small number of generalizing features that explain the variability (variance) of the elementary features (the R-technique of factor analysis) or the variability of the observed objects (the Q-technique of factor analysis).

Factor analysis algorithms are based on the use of a reduced matrix of pairwise correlations (covariances). A reduced matrix is a matrix whose main diagonal contains not units (estimates of complete correlation) or estimates of the total variance, but their reduced, somewhat smaller values. It is postulated that the analysis will explain not all of the variance of the studied features (objects) but only a part of it, usually the larger part. The remaining unexplained part of the variance is the specificity arising from the particular features of the observed objects, or from errors made when registering the phenomena and processes, i.e. from the unreliability of the input data.

Classification of FA methods


Principal component method

The principal component method (PCA) is used to reduce the dimensionality of the space of observed vectors without a significant loss of informativeness. A prerequisite of PCA is a normal distribution law of the multidimensional vectors. In PCA, linear combinations of the random variables are defined by the eigenvectors of the covariance matrix. The principal components form an orthogonal coordinate system in which the variances of the components characterize their statistical properties. PCA is not classified as FA, although it has a similar algorithm and solves similar analytical problems. Its main difference is that it processes not the reduced but the ordinary matrix of pairwise correlations (covariances), on the main diagonal of which units are located.

Let an initial set of vectors X of the linear space Lk be given. Applying the method of principal components allows us to pass to a basis of the space Lm (m ≤ k) such that the first component (the first basis vector) corresponds to the direction along which the variance of the vectors of the original set is maximal. The direction of the second component (the second basis vector) is chosen so that the variance of the original vectors along it is maximal under the condition of orthogonality to the first basis vector. The remaining basis vectors are defined similarly. As a result, the directions of the basis vectors are chosen so as to maximize the variance of the original set along the first components, called the principal components. The main variability of the vectors of the original set is thus represented by the first few components, and it becomes possible, by discarding the less essential components, to pass to a space of lower dimension.
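As an illustration of the passage to the new basis, here is a minimal PCA sketch via the eigendecomposition of the covariance matrix (arbitrary toy data; in practice a statistical package would be used):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # rows are observations, columns are features

Xc = X - X.mean(axis=0)                  # center the variables
C = np.cov(Xc, rowvar=False)             # covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # sort by explained variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

m = 2                                    # keep the first m principal components
P = eigvecs[:, :m]                       # directions (loadings)
T = Xc @ P                               # scores: coordinates in the new basis
explained_share = eigvals[:m].sum() / eigvals.sum()
```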

10. The method of principal components. Scheme


11. The method of principal components. Score matrix

The score matrix T gives the projections of the original samples (the J-dimensional vectors x1, ..., xI) onto the A-dimensional subspace of the principal components. The rows t1, ..., tI of the matrix T are the coordinates of the samples in the new coordinate system, and the columns t1, ..., tA of T are orthogonal and represent the projections of all the samples onto one new coordinate axis.

When examining data by the PCA method, special attention is paid to score plots, which carry information useful for understanding the structure of the data. On a score plot each sample is depicted in the coordinates (ti, tj), most often (t1, t2), denoted PC1 and PC2. The proximity of two points means their similarity, i.e. a positive correlation; points located at right angles (with respect to the origin) are uncorrelated, and points located diametrically opposite have a negative correlation.

12. The method of principal components. Loadings matrix

The loadings matrix P is the transition matrix from the original space of the variables x1, ..., xJ (J-dimensional) into the space of the principal components (A-dimensional). Each row of the matrix P consists of the coefficients connecting the variables t and x: for example, the a-th row is the projection of all the variables x1, ..., xJ onto the a-th principal component axis, and each column of P is the projection of the corresponding variable xj onto the new coordinate system.

The loading plot is used to investigate the role of the variables. On this plot each variable xj is displayed as a point in the coordinates (pi, pj), for example (p1, p2). Analyzing it in the same way as the score plot, one can understand which variables are related and which are independent. Joint examination of paired score and loading plots can also provide a lot of useful information about the data.
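The relation between the data, the score matrix T and the loadings matrix P can be sketched as follows (toy data; the variable names are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
P = eigvecs[:, np.argsort(eigvals)[::-1]][:, :2]   # loadings: variables -> 2 components

T = Xc @ P                  # scores: projections of the samples onto the components
X_hat = T @ P.T             # back-projection into the original variable space
residual = np.linalg.norm(Xc - X_hat)   # the part not captured by the two components
```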

13. Features of the method of principal components

Principal component analysis is based on the following assumptions:
the assumption that the dimensionality of the data can be effectively reduced by a linear transformation;
the assumption that most of the information is carried by those directions in which the variance of the input data is maximal.

It is easy to see that these conditions are not always met. For example, if the points of the input set lie on the surface of a hypersphere, no linear transformation can reduce the dimensionality (but a nonlinear transformation based on the distance from a point to the center of the sphere copes with this easily). This drawback is common to all linear algorithms and can be overcome by using additional dummy variables that are nonlinear functions of the elements of the input data set (the so-called kernel trick).

The second disadvantage of the principal component method is that the directions maximizing variance do not always maximize informativeness. For example, a variable with maximal variance may carry almost no information, while a variable with minimal variance may allow the classes to be separated completely. In this case, the principal component method will give preference to the first (less informative) variable. All additional information associated with a vector (for example, whether an image belongs to one of the classes) is ignored.

14. Example data for PCA

K. Esbensen. Analysis of Multidimensional Data, abridged translation from English, ed. O. Rodionova, published by IPKhF RAS, 2005

15. Example data for PCA. Designations

Height: in centimeters
Weight: in kilograms
Hair: short: -1, long: +1
Shoes: European standard size
Age: in years
Income: in thousands of euros per year
Beer: consumption in liters per year
Wine: consumption in liters per year
Sex: male: -1, female: +1
Strength: an index based on a test of physical abilities
Region: north: -1, south: +1
IQ: measured by a standard test

16. Score matrix


17. Loadings matrix


18. Sample objects in the space of the new components

Women (F) are shown as circles and men (M) as squares. The north (N) is shown in blue and the south (S) in red. The size and shade of a symbol reflect income: the larger and lighter the symbol, the higher the income. The numbers indicate age.

19. Initial variables in the space of new components


20. Scree plot


21. Principal factor method

In the paradigm of the principal factor method, the problem of reducing the dimensionality of the feature space looks as follows: n features can be explained by a smaller number m of latent features, the common factors, where m < n. The discrepancies between the initial features and the introduced common factors (their linear combinations) are accounted for by so-called characteristic factors.

The ultimate goal of a statistical study carried out with the apparatus of factor analysis, as a rule, is to identify and interpret the latent common factors while simultaneously striving to minimize both their number and the degree of dependence on their specific residual random components.

Each feature is the result of the impact of the m hypothetical common factors and one characteristic factor:

X_1 = a_11 f_1 + a_12 f_2 + ... + a_1m f_m + d_1 V_1
X_2 = a_21 f_1 + a_22 f_2 + ... + a_2m f_m + d_2 V_2
...
X_n = a_n1 f_1 + a_n2 f_2 + ... + a_nm f_m + d_n V_n

22. Rotation of factors

Rotation is a way of transforming the factors obtained in the previous step into more meaningful ones. Rotation is divided into:
graphical (drawing the axes by hand; not applicable for more than two-dimensional analysis),
analytical (a certain rotation criterion is chosen; orthogonal and oblique rotations are distinguished), and
matrix-approximation (the rotation consists in approaching a certain given target matrix).

The result of rotation is the secondary structure of the factors. The primary factor structure (consisting of the primary loadings obtained at the previous stage) is, in fact, a set of projections of points onto orthogonal coordinate axes. Obviously, if the projections are zero, the structure is simpler, and a projection is zero if the point lies on an axis. Thus, rotation can be considered a transition from one coordinate system to another, with known coordinates in the first system (the primary factors) and iteratively selected coordinates in the second system (the secondary factors). When obtaining the secondary structure, one tends to pass to a coordinate system in which as many axes as possible pass through the points (objects), so that as many projections (and therefore loadings) as possible are zero. In doing so, the restrictions of orthogonality and of decreasing significance from the first factor to the last, which are characteristic of the primary structure, may be removed.

23. Orthogonal rotation

Orthogonal rotation implies that the factors are rotated without violating their orthogonality to each other. It amounts to multiplying the original matrix of primary loadings B by an orthogonal matrix R (a matrix such that R R' = I):

V = B R

In general, the orthogonal rotation algorithm is as follows:
0. B is the matrix of primary factor loadings.
1. Find an orthogonal 2 × 2 matrix R for two columns (factors) bi and bj of the matrix B such that the rotation criterion for the matrix R is maximal.
2. Replace the columns bi and bj with the rotated columns [bi bj] R.
3. Check whether all pairs of columns have been considered. If not, go to step 1.
4. Check whether the criterion for the whole matrix has increased. If yes, go to step 1; if not, the algorithm terminates.
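A brute-force sketch of this pairwise scheme (a grid search over the rotation angle instead of the closed-form solution used in real packages; `criterion` is any rotation criterion, for example the varimax criterion defined on the next slide):

```python
import numpy as np

def pairwise_orthogonal_rotation(B, criterion, n_angles=180, max_sweeps=30):
    """Rotate pairs of columns of the loadings matrix B until the criterion stops growing."""
    V = B.copy()
    best = criterion(V)
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    for _ in range(max_sweeps):
        improved = False
        for i in range(V.shape[1] - 1):
            for j in range(i + 1, V.shape[1]):
                for phi in angles:
                    R = np.array([[np.cos(phi), -np.sin(phi)],
                                  [np.sin(phi),  np.cos(phi)]])
                    W = V.copy()
                    W[:, [i, j]] = V[:, [i, j]] @ R      # rotate only columns i and j
                    value = criterion(W)
                    if value > best + 1e-12:
                        V, best, improved = W, value, True
        if not improved:                                 # criterion no longer grows
            return V
    return V
```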

24. Varimax rotation

This criterion formalizes the complexity of a factor through the variance of the squares of the loadings of the variables on it:

s_j^2 = (1/n) Σ_i (b_ij^2)^2 − ( (1/n) Σ_i b_ij^2 )^2

The criterion in general form can then be written as the sum of these variances over all the factors:

V = Σ_j s_j^2 → max

At the same time, the factor loadings can be normalized to get rid of the influence of individual variables.
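A sketch of the criterion itself (as reconstructed above: the sum over factors of the variance of the squared loadings, with optional normalization by the row communalities):

```python
import numpy as np

def varimax_criterion(B, normalize=False):
    if normalize:
        # remove the influence of individual variables by dividing each row by its communality
        B = B / np.sqrt((B ** 2).sum(axis=1, keepdims=True))
    squared = B ** 2
    return float(squared.var(axis=0).sum())   # variance of squared loadings, summed over factors

# Usage with the rotation sketch above:
# V = pairwise_orthogonal_rotation(A, varimax_criterion)
```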

25. Quartimax rotation

Let us formalize the concept of the factorial complexity q of the i-th variable through the variance of the squares of its factor loadings:

q_i = (1/r) Σ_j (b_ij^2)^2 − ( (1/r) Σ_j b_ij^2 )^2

where r is the number of columns of the factor matrix, b_ij is the loading of the j-th factor on the i-th variable, and the subtracted term is the squared mean of the squared loadings. The quartimax criterion tries to maximize the complexity of the entire set of variables in order to achieve ease of interpretation of the factors (it seeks to simplify the description of the columns):

Q = Σ_i q_i → max

Considering that Σ_i Σ_j b_ij^2 is a constant (the sum of the eigenvalues of the covariance matrix) and expanding the mean value (and also taking into account that a power function grows in proportion to its argument), we obtain the final form of the criterion to be maximized:

Q' = Σ_i Σ_j b_ij^4 → max

26. Criteria for determining the number of factors

The main problem of factor analysis is the extraction and interpretation of the main factors. When selecting components, the researcher usually faces significant difficulties, since there is no unambiguous criterion for identifying the factors, and a degree of subjectivity in the interpretation of the results is therefore inevitable. There are several commonly used criteria for determining the number of factors. Some of them are alternatives to others, and some can be used together so that one complements another:

The Kaiser criterion, or eigenvalue criterion. This criterion was proposed by Kaiser and is probably the most widely used. Only factors with eigenvalues equal to or greater than 1 are retained. This means that if a factor does not extract variance equivalent to at least the variance of one variable, it is omitted.

The scree criterion, or screening criterion. This is a graphical method first proposed by the psychologist Cattell. The eigenvalues can be displayed as a simple graph, and Cattell suggested finding the place on the graph where the decrease of the eigenvalues from left to right slows down the most. It is assumed that to the right of this point lies only the "factorial scree" ("scree" is a geological term for the debris that accumulates at the bottom of a rocky slope).
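A small sketch of the first two criteria (Kaiser and scree) on toy data; the eigenvalues come from a correlation matrix, as is usual in factor analysis:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
R = np.corrcoef(X, rowvar=False)            # correlation matrix of the features

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

n_kaiser = int((eigvals >= 1.0).sum())      # Kaiser criterion: keep eigenvalues >= 1
print("eigenvalues:", np.round(eigvals, 2))
print("factors by the Kaiser criterion:", n_kaiser)

# For the scree criterion, plot eigvals against their index and keep the factors
# to the left of the point where the curve flattens into the "factorial scree".
```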

27. Criteria for determining the number of factors. Continuation

The significance criterion. It is especially effective when the model of the general population is known and there are no secondary factors. However, the criterion is unsuitable for searching for changes in the model and is implemented only in factor analysis by the least squares or maximum likelihood method.

The criterion of the proportion of reproduced variance. The factors are ranked by the share of variance they explain; when the percentage of variance becomes insignificant, the extraction should be stopped. It is desirable that the selected factors explain more than 80% of the spread. Disadvantages of the criterion: first, the subjectivity of the selection; second, the specifics of the data may be such that all the main factors together cannot explain the desired percentage of variance. Therefore, the main factors should together explain at least 50.1% of the variance.

The criterion of interpretability and invariance. This criterion combines statistical accuracy with subjective interests. According to it, the main factors can be extracted as long as a clear interpretation of them is possible. The interpretation, in turn, depends on the magnitude of the factor loadings: if a factor contains at least one strong loading, it can be interpreted. The opposite is also possible: if strong loadings are present but interpretation is difficult, the component is preferably discarded.

28. An example of using PCA

Suppose the following indicators of the economic activity of an enterprise are given: labor intensity (x1), the share of purchased products in output (x2), the equipment replacement ratio (x3), the proportion of workers in the enterprise (x4), bonuses and remuneration per employee (x5), and profitability (y). The linear regression model is:

y = b0 + b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5

x1     x2     x3     x4     x5     y
0.51   0.20   1.47   0.72   0.67   9.8
0.36   0.64   1.27   0.70   0.98   13.2
0.23   0.42   1.51   0.66   1.16   17.3
0.26   0.27   1.46   0.69   0.54   7.1
0.27   0.37   1.27   0.71   1.23   11.5
0.29   0.38   1.43   0.73   0.78   12.1
0.01   0.35   1.50   0.65   1.16   15.2
0.02   0.42   1.35   0.82   2.44   31.3
0.18   0.32   1.41   0.80   1.06   11.6
0.25   0.33   1.47   0.83   2.13   30.1

29. An example of using PCA

Building the regression model in a statistical package shows that the coefficient of x4 is not significant (p-value > α = 5%), so x4 can be excluded from the model. After excluding x4, the model building process is started again.

30. An example of using PCA

The Kaiser criterion for PCA shows that 2 components can be retained, explaining about 80% of the original variance.
For the selected components, the equations in the original coordinate system are:
U1 = 0.41 * x1 - 0.57 * x2 + 0.49 * x3 - 0.52 * x5
U2 = 0.61 * x1 + 0.38 * x2 - 0.53 * x3 - 0.44 * x5

31. An example of using PCA

Now a new regression model can be built in the new components:
y = 15.92 - 3.74 * U1 - 3.87 * U2
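The whole example can be reproduced approximately with a short principal-component regression sketch (the data are taken from the table above; the coefficients and signs may differ slightly from those on the slides depending on standardization and sign conventions):

```python
import numpy as np

X = np.array([
    [0.51, 0.20, 1.47, 0.72, 0.67],
    [0.36, 0.64, 1.27, 0.70, 0.98],
    [0.23, 0.42, 1.51, 0.66, 1.16],
    [0.26, 0.27, 1.46, 0.69, 0.54],
    [0.27, 0.37, 1.27, 0.71, 1.23],
    [0.29, 0.38, 1.43, 0.73, 0.78],
    [0.01, 0.35, 1.50, 0.65, 1.16],
    [0.02, 0.42, 1.35, 0.82, 2.44],
    [0.18, 0.32, 1.41, 0.80, 1.06],
    [0.25, 0.33, 1.47, 0.83, 2.13],
])                                   # columns: x1, x2, x3, x4, x5
y = np.array([9.8, 13.2, 17.3, 7.1, 11.5, 12.1, 15.2, 31.3, 11.6, 30.1])

X_red = np.delete(X, 3, axis=1)                          # drop the insignificant x4
Z = (X_red - X_red.mean(axis=0)) / X_red.std(axis=0, ddof=1)

eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

U = Z @ eigvecs[:, :2]                                   # two components kept by the Kaiser criterion
design = np.column_stack([np.ones(len(y)), U])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)        # regression of y on U1, U2

print("share of variance explained:", round(eigvals[:2].sum() / eigvals.sum(), 2))
print("b0, b_U1, b_U2:", np.round(coef, 2))
```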

32. Singular value decomposition (SVD)

Beltrami and Jordan are considered the founders of the theory of the singular value decomposition: Beltrami for being the first to publish a work on singular values, and Jordan for the elegance and completeness of his work. Beltrami's work appeared in the "Journal of Mathematics for the Use of the Students of the Italian Universities" in 1873, and its main purpose was to familiarize students with bilinear forms. The essence of the method lies in the decomposition of a matrix A of size n × m with rank d = rank(A) ≤ min(n, m) into a product of matrices of lower rank:

A = U D V^T,

where the matrices U of size n × d and V of size m × d consist of orthonormal columns, which are the eigenvectors corresponding to the nonzero eigenvalues of the matrices A A^T and A^T A, respectively, with U^T U = V^T V = I, and D of size d × d is a diagonal matrix with positive diagonal elements sorted in descending order. The columns of the matrix U form an orthonormal basis of the column space of A, and the columns of V form an orthonormal basis of the row space of A.

33. Singular value decomposition (SVD)

An important property of the SVD is the fact that if only the k largest diagonal elements are kept in D, and only the first k columns are kept in the matrices U and V, then the matrix

A_k = U_k D_k V_k^T

is the best approximation of the matrix A with respect to the Frobenius norm among all matrices of rank k.

This truncation, first, reduces the dimensionality of the vector space and lowers the storage and computational requirements of the model. Second, by discarding the small singular values, small distortions due to noise in the data are removed, leaving only the strongest effects and trends in the model.
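A minimal sketch of the truncated SVD described above (numpy; the Frobenius-norm error of the rank-k approximation equals the norm of the discarded singular values):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)    # A = U @ diag(s) @ Vt

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]         # best rank-k approximation (Eckart-Young)

err = np.linalg.norm(A - A_k)                       # Frobenius norm of the residual
print(round(err, 4), round(float(np.sqrt((s[k:] ** 2).sum())), 4))   # the two numbers coincide
```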