
Application of the χ2 test to test the hypothesis that two or more proportions are equal. Testing the hypothesis of independence of logarithmic returns with the χ2 criterion

Statistical test

The rule according to which the hypothesis H0 is rejected or accepted is called a statistical criterion. As a rule, the name of a criterion contains the letter that denotes the specially compiled characteristic (the test statistic) from step 2 of the statistical hypothesis testing algorithm (see clause 4.1), which is calculated in the criterion. Under the conditions of this algorithm, the criterion would be called the "v-criterion".

When testing statistical hypotheses, two types of errors are possible:

  • an error of the first kind (rejecting the hypothesis H0 when it is actually true);
  • an error of the second kind (accepting the hypothesis H0 when it is actually false).

The probability α of making an error of the first kind is called the significance level of the criterion.

If β denotes the probability of making an error of the second kind, then 1 − β, the probability of avoiding an error of the second kind, is called the power of the criterion.
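To make α and 1 − β concrete, here is a minimal Python simulation sketch. The coin-tossing setup, the sample size of 100, the alternative p = 0.6 and the helper names are all hypothetical, not taken from the text. It repeatedly tests H0: p = 0.5 with a two-category χ2 statistic at α = 0.05: the share of rejections when H0 is true estimates the probability of an error of the first kind, and the share of rejections when p = 0.6 estimates the power.

```python
import random

CRITICAL_DF1_ALPHA05 = 3.841  # chi-square critical value for 1 degree of freedom, alpha = 0.05


def chi2_fair_coin(heads: int, n: int) -> float:
    """Pearson statistic for H0: P(heads) = 0.5 with two categories (heads, tails)."""
    expected = n / 2
    tails = n - heads
    return ((heads - expected) ** 2 + (tails - expected) ** 2) / expected


def rejection_rate(p_true: float, n: int = 100, trials: int = 4000, seed: int = 1) -> float:
    """Share of simulated samples in which H0: p = 0.5 is rejected at alpha = 0.05."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        heads = sum(rng.random() < p_true for _ in range(n))
        if chi2_fair_coin(heads, n) > CRITICAL_DF1_ALPHA05:
            rejected += 1
    return rejected / trials


type_i_rate = rejection_rate(0.5)  # H0 is true: estimates alpha
power = rejection_rate(0.6)        # H0 is false (true p = 0.6): estimates 1 - beta
print(f"type I error rate ~ {type_i_rate:.3f}, power ~ {power:.3f}")
```

Because the statistic is computed from whole counts, the estimated type I error rate comes out close to, but not exactly, 0.05.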

Pearson's χ2 goodness-of-fit test

There are several types of statistical hypotheses:

  • about the distribution law;
  • about the homogeneity of samples;
  • about the numerical values of distribution parameters, etc.

We will consider a hypothesis about the distribution law using the example of Pearson's χ2 goodness-of-fit test.

A goodness-of-fit criterion is a statistical criterion for testing the null hypothesis about the assumed law of an unknown distribution.

Pearson's goodness-of-fit test is based on a comparison of empirical (observed) frequencies with theoretical frequencies calculated under the assumption of a certain distribution law. Here the hypothesis H0 is formulated as follows: the general population is normally distributed with respect to the attribute under study.

Algorithm for testing the statistical hypothesis H0 with Pearson's χ2 criterion:

  • 1) put forward the hypothesis H0: with respect to the studied attribute, the general population is normally distributed;
  • 2) calculate the sample mean x̄_B and the sample standard deviation σ_B;

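3) from the available sample of size n, calculate the specially compiled characteristic

$$\chi^2_{\text{obs}} = \sum_{i=1}^{k} \frac{(n_i - n'_i)^2}{n'_i},$$

where n_i are the empirical frequencies and n'_i are the theoretical frequencies,

$$n'_i = \frac{n h}{\sigma_B}\,\varphi(u_i), \qquad u_i = \frac{x_i - \bar{x}_B}{\sigma_B}, \qquad \varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2},$$

n is the sample size, h is the interval width (the difference between two adjacent variants), u_i are the normalized values of the observed attribute, and φ(u) is the tabulated function (the standard normal density). The theoretical frequencies can also be calculated with the standard MS Excel function NORMDIST by the formula n'_i = n·h·NORMDIST(x_i, x̄_B, σ_B, FALSE);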

4) from the table of critical points of the χ2 distribution, for the given significance level α and the number of degrees of freedom, determine the critical value χ2_cr;

5) if χ2_obs > χ2_cr, the hypothesis H0 is rejected; if χ2_obs < χ2_cr, the hypothesis H0 is accepted.
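Steps 2 and 3 of the algorithm can be sketched in Python as follows. This is a minimal illustration, assuming equally spaced interval midpoints and using the biased sample standard deviation σ_B; the function name and the grouped data in the usage lines are hypothetical (they are not the data of the example below). The last test line checks that n·h/σ_B·φ(u_i) gives the same theoretical frequencies as the n·h·NORMDIST(x_i, x̄_B, σ_B, FALSE) form.

```python
import math
from statistics import NormalDist


def pearson_normality(midpoints, freqs, estimated_params=2):
    """Pearson's chi-square for H0 "the population is normal", computed from a
    grouped variation series with equally spaced midpoints x_i and frequencies n_i."""
    n = sum(freqs)
    h = midpoints[1] - midpoints[0]  # interval width (difference between adjacent variants)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n  # sample mean x_B
    sigma = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n)  # sigma_B
    phi = NormalDist().pdf  # standard normal density phi(u)
    # theoretical frequencies n'_i = n*h/sigma_B * phi(u_i), with u_i = (x_i - x_B) / sigma_B
    theoretical = [n * h / sigma * phi((x - mean) / sigma) for x in midpoints]
    chi2 = sum((f - t) ** 2 / t for f, t in zip(freqs, theoretical))
    df = len(midpoints) - estimated_params - 1  # s = k - m - 1
    return chi2, df, theoretical


# Hypothetical grouped data: 5 intervals of width 5
midpoints = [10, 15, 20, 25, 30]
freqs = [6, 24, 40, 22, 8]
chi2, df, theoretical = pearson_normality(midpoints, freqs)
print(round(chi2, 3), df, [round(t, 1) for t in theoretical])
```

The observed value chi2 is then compared with the critical value for df degrees of freedom, as in steps 4 and 5.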

Example. Consider the attribute X, the test scores of convicts in one of the correctional colonies on a certain psychological characteristic, presented as a variation series:

At a significance level of 0.05, test the hypothesis of the normal distribution of the general population.

1. Based on the empirical distribution, we can put forward the hypothesis H0: with respect to the studied attribute "the test score for a given psychological characteristic", the general population of convicts is normally distributed. Alternative hypothesis H1: the general population of convicts is not normally distributed with respect to the studied attribute "the test score for a given psychological characteristic".

2. Let's calculate the numerical sample characteristics:

[Table: calculation of the sample characteristics by intervals]

3. Let's calculate the specially compiled characteristic χ2. To do this, in the penultimate column of the previous table we find the theoretical frequencies by the formula for n'_i, and in the last column we calculate the terms of the characteristic χ2. We get χ2 = 0.185.

For clarity, we will construct an empirical distribution polygon and a normal curve for theoretical frequencies (Fig. 6).

Fig. 6. Empirical distribution polygon and normal curve of the theoretical frequencies.

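4. Determine the number of degrees of freedom s: k = 5 (the number of intervals), m = 2 (the number of parameters estimated from the sample: the mean and the standard deviation), s = k − m − 1 = 5 − 2 − 1 = 2.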

Using the table or the standard MS Excel function CHIINV, for s = 2 degrees of freedom and the significance level α = 0.05 we find the critical value χ2_cr = 5.99. For the significance level α = 0.01 the critical value is χ2_cr = 9.21.

5. The observed value of the criterion, χ2_obs = 0.185, is less than both critical values found; therefore, the hypothesis H0 is accepted at both significance levels. The discrepancy between the empirical and theoretical frequencies is insignificant, so the observational data are consistent with the hypothesis that the general population is normally distributed. Thus, with respect to the studied attribute "the test score for a given psychological characteristic", the general population of convicts is normally distributed.
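The two critical values can be checked without a table. For 2 degrees of freedom the right-tail probability of the χ2 distribution is exp(−x/2), so the critical point is χ2_cr = −2 ln α. The Python sketch below relies only on that closed form (the function name is hypothetical) and repeats the decision of step 5.

```python
import math


def chi2_critical_df2(alpha: float) -> float:
    """Upper critical point of the chi-square distribution with 2 degrees of freedom.
    For df = 2 the right-tail probability is P(X >= x) = exp(-x / 2), so x = -2 ln(alpha)."""
    return -2.0 * math.log(alpha)


chi2_observed = 0.185  # observed value from the example
for alpha in (0.05, 0.01):
    critical = chi2_critical_df2(alpha)
    decision = "accept H0" if chi2_observed < critical else "reject H0"
    print(alpha, round(critical, 2), decision)
# 0.05 5.99 accept H0
# 0.01 9.21 accept H0
```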


Purpose of the criterion

The χ 2 criterion is used for two purposes:

1) to compare the empirical distribution of a feature with a theoretical one (uniform, normal or another);

2) to compare two, three or more empirical distributions of the same feature.¹²

Description of the criterion

The χ 2 criterion answers the question of whether different values of a feature occur with the same frequency in the empirical and theoretical distributions, or in two or more empirical distributions.

The advantage of the method is that it allows one to compare distributions of features measured on any scale, starting from the nominal scale (see section 1.2). Even in the simplest case of an alternative distribution, such as "yes / no", "produced a defect / did not produce a defect", "solved the problem / did not solve the problem", etc., we can already apply the χ 2 criterion.

Suppose an observer records the number of pedestrians who choose the right or the left of two symmetrical paths on the way from point A to point B (see Fig. 4.3).

Suppose that, as a result of 70 observations, 51 people chose the right path and only 19 chose the left. Using the χ 2 criterion, we can determine whether this distribution of choices differs from the uniform distribution, in which both paths would be chosen with the same frequency. This is a variant of comparing an obtained empirical distribution with a theoretical one. Such a task may arise, for example, in applied psychological research related to design in architecture, communication systems, etc.
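A minimal Python sketch of this comparison, computing the Pearson statistic Σ (O − E)²/E for the 51:19 split against a uniform distribution (35 expected choices per path). The function name is hypothetical, and the continuity correction described among the limitations of the criterion is deliberately not applied here.

```python
def chi2_goodness_of_fit(observed, expected):
    """Pearson statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [51, 19]              # right path, left path (70 observations)
n = sum(observed)
expected = [n / 2, n / 2]        # uniform distribution: 35 and 35
stat = chi2_goodness_of_fit(observed, expected)
df = len(observed) - 1           # 1 degree of freedom
print(round(stat, 2), df)        # 14.63 1
```

The value 14.63 exceeds the tabulated critical value of 3.84 for 1 degree of freedom at α = 0.05.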

But let us imagine that the observer is solving a completely different problem: he is studying bilateral regulation. Whether the obtained distribution coincides with the uniform one interests him much less than whether his data coincide with those of other researchers. He knows that people with a dominant right leg tend to circle counterclockwise, and people with a dominant left leg tend to circle clockwise, and that in a study by colleagues¹³ a dominant left leg was found in 26 out of 100 people surveyed.

Using the χ2 method, he can compare two empirical distributions: a 51:19 ratio in his own sample and a 74:26 ratio in a sample of other researchers.
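The comparison of the two empirical distributions can be sketched as a 2 × 2 table in Python. This assumes the standard rule for the theoretical frequency of a cell, row total × column total / grand total. The function name and the row and column layout are illustrative choices.

```python
def chi2_contingency(table):
    """Pearson chi-square for an r x c table of observed counts.
    Theoretical frequency of a cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# rows: own sample (70 people), colleagues' sample (100 people)
# columns: right (path / dominant leg), left (path / dominant leg)
stat, df = chi2_contingency([[51, 19], [74, 26]])
print(round(stat, 3), df)   # 0.028 1
```

The statistic is tiny compared with the critical value of 3.84 for 1 degree of freedom, which matches the visual impression that the two samples hardly differ.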

This is a variant of comparing two empirical distributions on the simplest alternative attribute (the simplest, of course, from a mathematical point of view, and by no means from a psychological one).

Similarly, we can compare the distributions of choices from three or more alternatives. For example, if in a sample of 50 people 30 chose answer (a), 15 chose answer (b) and 5 chose answer (c), we can use the χ 2 method to check whether this distribution differs from a uniform distribution or from the distribution of answers in another sample, where answer (a) was chosen by 10 people, answer (b) by 25 people and answer (c) by 15 people.
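Both checks for the three-alternative example fit in a short Python sketch. The first check compares against a uniform distribution with 50/3 expected answers per option. The second is a 2 × 3 table whose theoretical frequencies are row total × column total / grand total. Variable names are illustrative.

```python
def chi2_statistic(observed, expected):
    """Sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


answers_1 = [30, 15, 5]      # answers (a), (b), (c) in the first sample, n = 50
answers_2 = [10, 25, 15]     # the same answers in the second sample, n = 50

# 1) comparison with a uniform distribution: 50 / 3 expected answers per option
n1 = sum(answers_1)
uniform_stat = chi2_statistic(answers_1, [n1 / 3] * 3)

# 2) comparison of two empirical distributions (2 x 3 table)
table = [answers_1, answers_2]
rows = [sum(r) for r in table]
cols = [sum(c) for c in zip(*table)]
grand = sum(rows)
two_sample_stat = sum(
    chi2_statistic(row, [rows[i] * c / grand for c in cols]) for i, row in enumerate(table)
)
print(round(uniform_stat, 2), round(two_sample_stat, 2))  # 19.0 17.5
```

Both comparisons have 2 degrees of freedom (3 − 1 and (2 − 1)(3 − 1)), and both values exceed the critical value of 5.99 at α = 0.05.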

In cases where a feature is measured quantitatively, say, in points, seconds or millimeters, we may have to group the whole variety of feature values into several categories. For example, if the time for solving a problem varies from 10 to 300 seconds, we can introduce 10 or 5 categories, depending on the sample size, for instance: 0-50 seconds; 51-100 seconds; 101-150 seconds, etc. Then we use the χ 2 method to compare the frequencies of occurrence of the different categories of the feature, while the rest of the scheme does not change.
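The grouping step can be sketched in Python as follows. This assumes integer times in seconds and the category boundaries from the text (0-50, 51-100, 101-150, ...). The function name and the list of times are hypothetical.

```python
from collections import Counter


def category_of(seconds: int, width: int = 50) -> str:
    """Category label for an integer solution time: 0-50, 51-100, 101-150, ..."""
    if seconds <= width:
        return f"0-{width}"
    k = (seconds - 1) // width  # 51..100 -> 1, 101..150 -> 2, ...
    return f"{k * width + 1}-{(k + 1) * width}"


# Hypothetical solution times in seconds (range 10-300, as in the text)
times = [12, 48, 75, 99, 101, 150, 151, 210, 260, 300]
counts = Counter(category_of(t) for t in times)
for label in sorted(counts, key=lambda s: int(s.split("-")[0])):
    print(label, counts[label])
```

The resulting category counts are the empirical frequencies that enter the χ 2 statistic.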

When comparing the empirical distribution with the theoretical one, we determine the degree of discrepancy between the empirical and theoretical frequencies.

When comparing two empirical distributions, we determine the degree of discrepancy between the empirical frequencies and the theoretical frequencies that would be observed if the two empirical distributions coincided. Formulas for calculating the theoretical frequencies will be given separately for each comparison option.

The greater the discrepancy between the two compared distributions, the greater the empirical value of χ2.

Hypotheses

Several variants of hypotheses are possible, depending on the tasks we set ourselves.

First option:

H 0: The obtained empirical distribution of the trait does not differ from the theoretical (for example, uniform) distribution.

H 1: The resulting empirical distribution of the trait differs from the theoretical distribution.

Second option:

H 0: Empirical distribution 1 does not differ from empirical distribution 2.

H 1: Empirical distribution 1 is different from empirical distribution 2.

Third option:

H 0: Empirical distributions 1, 2, 3, ... do not differ from each other.

H 1: Empirical distributions 1, 2, 3, ... differ from each other.

The χ 2 criterion allows you to test all three hypotheses.

Graphical representation of the criterion

Let us illustrate this with the example of choosing the right or left path from point A to point B. In Fig. 4.4 the frequency of choosing the left path is represented by the left bar of the histogram, and the frequency of choosing the right path by the right bar.¹⁴ The ordinate axis shows the relative frequencies of choice, that is, the number of times a particular path was chosen divided by the total number of observations. For the left path the relative frequency (also called the frequency ratio) is 19/70, that is, 0.27, and for the right path 51/70, that is, 0.73.

If both paths were equally likely to be chosen, half of the subjects would choose the right path and half the left. The probability of choosing each path would be 0.50.

We see that the deviations of the empirical frequencies from this value are quite large. The differences between the empirical and theoretical distributions may well prove significant.

Fig. 4.5 actually presents two histograms, but the bars are grouped so that on the left the frequencies of preferring the left path are compared for our observer's sample (1) and for the sample of T.A. Dobrokhotova and N.N. Bragina (2), and on the right the frequencies of preferring the right path in the same two samples.

We see that the discrepancies between the samples are very small. The χ2 criterion will most likely confirm that the two distributions coincide.

Limitations of the criterion

1. The sample size should be large enough: n ≥ 30. When n < 30, the χ2 criterion gives very approximate values. The accuracy of the criterion increases with larger n.

2. The theoretical frequency for each cell of the table should not be less than 5: f ≥ 5. This means that if the number of categories is fixed in advance and cannot be changed, we cannot apply the χ2 method until a certain minimum number of observations has been accumulated. If, for example, we want to test the assumption that calls to the Trust telephone service are unevenly distributed over the 7 days of the week, we need 5 · 7 = 35 calls. Thus, if the number of categories (k) is given in advance, as in this case, the minimum number of observations (n_min) is determined by the formula n_min = k · 5.
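A minimal Python sketch of this limitation. It assumes, as in the helpline example, that all k categories are equally likely under H0, which is the case in which n_min = k · 5 guarantees every theoretical frequency reaches 5. The function names are hypothetical.

```python
def min_observations(k: int, min_expected: int = 5) -> int:
    """n_min = k * 5: the smallest n giving every one of k equiprobable categories
    a theoretical frequency of at least min_expected."""
    return k * min_expected


def expected_frequencies_ok(expected, min_expected: float = 5) -> bool:
    """Check the rule 'every theoretical frequency is at least 5'."""
    return all(e >= min_expected for e in expected)


days = 7
print(min_observations(days))                          # 35
print(expected_frequencies_ok([35 / days] * days))     # True  (5 calls expected per day)
print(expected_frequencies_ok([30 / days] * days))     # False (about 4.3 per day)
```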

3. The selected categories should "scoop out" the entire distribution, that is, cover the entire range of variability of the feature. The grouping into categories must be the same in all compared distributions.

4. A "continuity correction" is necessary when comparing distributions of features that take only 2 values. When the correction is made, the χ 2 value decreases (see Example with correction for continuity).
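A minimal Python sketch of the continuity correction for a two-valued feature, applied to the 51:19 path data against a uniform distribution. It assumes the usual Yates form, in which each |O − E| is reduced by 0.5 before squaring; the textbook's own formula may be written differently, and the function name is hypothetical.

```python
def chi2_two_categories(observed, expected, continuity_correction=True):
    """Pearson statistic for a two-valued feature; with the (Yates) continuity
    correction each |O - E| is reduced by 0.5 (not below zero) before squaring."""
    stat = 0.0
    for o, e in zip(observed, expected):
        d = abs(o - e)
        if continuity_correction:
            d = max(d - 0.5, 0.0)
        stat += d * d / e
    return stat


observed = [51, 19]   # right path, left path
expected = [35, 35]   # uniform distribution
print(round(chi2_two_categories(observed, expected, False), 2))  # 14.63
print(round(chi2_two_categories(observed, expected, True), 2))   # 13.73
```

As the text says, the corrected value is smaller than the uncorrected one.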

5. The categories must be non-overlapping: if an observation is assigned to one category, it can no longer be assigned to any other category.

The sum of observations over all categories must always equal the total number of observations.

It is legitimate to ask what should be counted as an observation: the number of choices, reactions or actions, or the number of subjects who make choices, show reactions or perform actions. If a subject shows several reactions and all of them are registered, the number of subjects will not coincide with the number of reactions. We can sum the reactions of each subject, as is done, for example, in the Heckhausen method for studying achievement motivation or in S. Rosenzweig's Frustration Tolerance Test, and compare the distributions of the individual sums of reactions in several samples.

In this case, the number of observations will be the number of subjects. If we count the frequency of reactions of a certain type in the sample as a whole, then we get the distribution of reactions of different types, and in this case, the number of observations will be the total number of registered reactions, and not the number of subjects.

From a mathematical point of view, the rule of independence of categories is observed in both cases: one observation belongs to one and only one category of the distribution.

One can imagine a study of the distribution of choices made by a single subject. In cognitive-behavioral therapy, for example, the client is asked to record the exact time at which an undesirable reaction occurs, such as attacks of fear or depression, outbursts of anger, self-deprecating thoughts, etc. The therapist then determines at what hours these reactions appear more often and helps the client build an individual program for preventing them.

Can the χ2 criterion be used to prove that some hours are more frequent in this individual distribution and others less frequent? All observations are dependent, since they refer to the same subject; at the same time, all the categories are non-overlapping, since the same attack belongs to one and only one category (in this case, one hour of the day). Applying the χ2 method in this case is apparently a certain simplification. Attacks of fear, anger or depression may occur repeatedly during the day, and it may turn out, say, that attacks in the early morning, at 6 o'clock, and in the late evening, at 12 o'clock, usually occur together on the same day, while an attack at 3 p.m. appears no earlier than a day after the previous attack and no less than two days before the next one, etc. This apparently calls for a complex mathematical model or something of the kind, which cannot simply be "verified by algebra". Nevertheless, for practical purposes it may be useful to apply the criterion in order to reveal a systematic unevenness in the onset of significant events, choices, preferences, etc., in the same person.

So, the same observation should apply to only one category. But whether to consider each subject as an observation or each investigated reaction of the subject is a question, the solution of which depends on the objectives of the study (see, for example, Ganzen V.A., Balin V.D., 1991, p. 10).

The main "limitation" of the χ 2 criterion is that most researchers find it dauntingly difficult.

Let's try to overcome the myth of the incomprehensible difficulty of the criterion χ 2 . To spice things up, consider a humorous literary example.

Pearson's χ2 test

The criteria used to decide whether the distribution law has been chosen successfully are usually called goodness-of-fit criteria. K. Pearson's χ2 criterion is the most frequently used criterion for testing a simple hypothesis about the distribution law. As a measure of the deviation of the experimental data from the hypothetical distribution, it uses the same quantity that serves to construct a confidence region for an unknown density, with the unknown true probabilities of falling into the intervals replaced by the probabilities calculated from the hypothetical distribution. Suppose that the range of possible values of a random variable is divided into r intervals (multidimensional ones, i.e., rectangles, in the case of a vector quantity). Let P*_1, ..., P*_r be the random frequencies of falling into these intervals obtained as a result of n experiments, and P_1, ..., P_r the probabilities of falling into the same intervals calculated from the hypothetical distribution.

In the general case, these probabilities are functions of estimates of the unknown parameters obtained from the same experimental data, and therefore are also random quantities. Suppose that the estimates of the unknown parameters of the hypothetical distribution are computed from the same grouped sample as the frequencies. Then the probabilities P_1, ..., P_r will be certain functions of the frequencies, and to assess the deviation of the experimental data from the hypothetical distribution we take the quantity

$$Z = n \sum_{k=1}^{r} \frac{(P^*_k - P_k)^2}{P_k}, \qquad (1)$$

where P_1, ..., P_r are certain functions of the frequencies.

Neyman and Pearson showed that if an asymptotically efficient and asymptotically normal estimate of the unknown s-dimensional parameter of the hypothetical distribution, obtained from the grouped sample, is used to calculate the probabilities P_1, ..., P_r, then the quantity Z defined by formula (1) has, in the limit as n → ∞, a χ2 distribution with r − s − 1 degrees of freedom.

Using this theorem, the discrepancy between the experimental data and the hypothetical distribution can be assessed with tables of the χ2 distribution. We choose a sufficiently small probability p, so that an event with this probability can be considered practically impossible, and determine χ2_p from the equation

$$P(Z \ge \chi^2_p) = p.$$

If the realization z of Z obtained in the experiments is greater than or equal to χ2_p, the hypothetical distribution is considered inconsistent with the experimental data, since under this distribution it is practically impossible to obtain z ≥ χ2_p in a single sample: for a large number of experiments n, the probability of such an event is approximately p, i.e., negligible. In this case one says that there is a significant deviation of the experimental data from the hypothetical distribution. If z < χ2_p, the hypothetical distribution is considered not to contradict the experimental data, that is, to agree with them.

The quantity χ2_p is called the 100p-percent significance level of the deviation of the sample from the hypothetical distribution. Typically the 5%, 1% and 0.1% levels are used, depending on the nature of the task.

To further check the consistency of the experimental data with the hypothetical distribution, it is useful to calculate the probability P(Z > z) that, under the given hypothetical distribution, Z exceeds its realization z obtained in the experiments. The greater this probability, the better the sample agrees with the hypothetical distribution and the less significant the observed discrepancy between them. Indeed, if P(Z > z) is high, then, provided the chosen hypothesis about the distribution is correct, repeating this series of experiments will often yield values of Z even larger than the value z obtained.

Note that having obtained z < χ2_p, and even a high probability P(Z > z), we do not conclude definitively that the chosen distribution hypothesis is valid; we only say that the hypothesis does not contradict the experimental results, that it agrees with them, and can therefore be accepted. To obtain sufficiently strong evidence that the random variable really obeys the hypothetical distribution law, the series of experiments must be repeated a sufficiently large number of times to make sure that the agreement of the hypothesis with the experimental results is stable.
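In place of χ2 tables, the decision rule and the probability P(Z ≥ z) can be computed directly. The Python sketch below is one way to do it under stated assumptions: an integer number of degrees of freedom, the closed forms of the regularized upper incomplete gamma function for the right tail, and bisection for χ2_p. The function names, the number of degrees of freedom and the realization z in the usage lines are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail probability P(Z >= x) of a chi-square variable with integer df,
    via closed forms of the regularized upper incomplete gamma function Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # even df = 2k: Q(k, y) = exp(-y) * sum_{i=0}^{k-1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # odd df = 2k + 1: Q(k + 1/2, y) = erfc(sqrt y) + exp(-y) * sum_{i<k} y^(i+1/2) / Gamma(i + 3/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for i in range(df // 2):
        total += math.exp(-y) * term
        term *= y / (i + 1.5)
    return total


def chi2_critical(p: float, df: int) -> float:
    """chi2_p such that P(Z >= chi2_p) = p, found by bisection (chi2_sf decreases in x)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


p = 0.05
df = 4        # hypothetical: r - s - 1 with r = 7 intervals and s = 2 estimated parameters
z = 10.5      # hypothetical realization of Z
critical = chi2_critical(p, df)
verdict = "significant deviation" if z >= critical else "agrees with the data"
print(round(critical, 3), round(chi2_sf(z, df), 4), verdict)
```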

Kolmogorov criterion

Kolmogorov criterion - auxiliary criterion

As an auxiliary criterion for checking the uniformity of the distribution of the P-value of the main criterion, in this work we use the Kolmogorov criterion.

The Kolmogorov criterion considers the maximum absolute difference between the statistical (empirical) distribution function F*(x) and the corresponding theoretical distribution function F(x), i.e., D = max |F*(x) − F(x)|.

The next step is to compute λ = D√n. Statistical tables (in the matcalc environment, the pvKolm(u) function) give the probability P(λ) that, for purely random reasons, the maximum discrepancy between F*(x) and F(x) will be no less than the one actually observed. If P(λ) is relatively large, the hypothesis should be accepted; if it is very small, the hypothesis should be rejected as implausible.
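A minimal Python sketch of the Kolmogorov criterion as described above. D is evaluated at the jumps of the empirical distribution function, and P(λ) uses the asymptotic series 2 Σ (−1)^(k−1) exp(−2k²λ²). This sketch stands in for the matcalc pvKolm function and is not a reproduction of it. It is only asymptotic, so for small samples it is approximate. The P-values in the usage lines are hypothetical.

```python
import math


def kolmogorov_d(sample, cdf):
    """D = max |F*(x) - F(x)|, evaluated at the jumps of the empirical function."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def kolmogorov_p(lam: float, terms: int = 100) -> float:
    """Asymptotic P(lambda) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lambda^2)."""
    if lam < 0.2:
        return 1.0  # the series converges slowly here, and its value is 1.0 to many digits
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))


# Hypothetical P-values of the main criterion; under H0 they are uniform on [0, 1]
p_values = [0.08, 0.15, 0.27, 0.33, 0.41, 0.52, 0.64, 0.71, 0.86, 0.93]
d = kolmogorov_d(p_values, lambda x: min(max(x, 0.0), 1.0))
lam = d * math.sqrt(len(p_values))
print(round(d, 3), round(lam, 3), round(kolmogorov_p(lam), 3))
```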

Consider the application in MS EXCEL of Pearson's chi-square test for testing simple hypotheses.

After experimental data are obtained (i.e., when there is a sample), one usually chooses the distribution law that best describes the random variable represented by the sample. How well the experimental data are described by the chosen theoretical distribution law is checked with goodness-of-fit criteria. The null hypothesis is usually that the distribution of the random variable coincides with some theoretical law.

Let's first consider the application of Pearson's X 2 (chi-square) goodness-of-fit test to simple hypotheses (the parameters of the theoretical distribution are assumed to be known). Then we consider composite hypotheses, when only the shape of the distribution is specified, and the parameters of this distribution and the value of the X 2 statistic are estimated/calculated from the same sample.

Note: In the English-language literature, the procedure for applying Pearson's X 2 goodness-of-fit test is called the chi-square goodness of fit test.

Recall the hypothesis testing procedure:

  • based on the sample, the value of the statistic corresponding to the type of hypothesis being tested is calculated. For example, to test a hypothesis about the mean, the t-statistic is used (if the standard deviation is not known);
  • if the null hypothesis is true, the distribution of this statistic is known and can be used to calculate probabilities (for the t-statistic, this is Student's t-distribution);
  • the value of the statistic calculated from the sample is compared with the critical value for the given significance level α;
  • the null hypothesis is rejected if the value of the statistic is greater than the critical value (or, equivalently, if the probability of obtaining this value of the statistic, the p-value, is less than the significance level).

We will carry out hypothesis testing for different distributions.

Discrete case

Suppose two people are playing dice, each with his own set of dice. The players take turns rolling 3 dice at once, and each round is won by the one who rolls more sixes at a time. The results are recorded. After 100 rounds, one of the players suspects that his opponent's dice are asymmetric, because the opponent wins often (often rolls sixes). He decides to analyze how likely such a number of the opponent's outcomes is.

Note: Since there are 3 dice, one can roll 0, 1, 2 or 3 sixes at a time, i.e., the random variable can take 4 values.

From probability theory we know that if the dice are symmetric, the number of sixes obeys the binomial distribution. Therefore, the expected frequencies of sixes after 100 rounds can be calculated with the formula
=BINOM.DIST(A7, 3, 1/6, FALSE)*100

The formula assumes that cell A7 contains the corresponding number of sixes rolled in one round.
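The same expected frequencies can be computed in Python as 100 · C(3, k) · (1/6)^k · (5/6)^(3−k), which is what the BINOM.DIST formula above returns for k = 0, 1, 2, 3. The function name is hypothetical. Note that the expected frequency for 3 sixes is well below 5.

```python
from math import comb


def expected_sixes(rounds: int = 100, dice: int = 3, p: float = 1 / 6):
    """Expected number of rounds with k = 0..dice sixes for fair dice:
    rounds * C(dice, k) * p^k * (1 - p)^(dice - k)."""
    return [rounds * comb(dice, k) * p ** k * (1 - p) ** (dice - k) for k in range(dice + 1)]


for k, e in enumerate(expected_sixes()):
    print(k, round(e, 2))
# 0 57.87
# 1 34.72
# 2 6.94
# 3 0.46
```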

Note: The calculations are given in the example file on the Discrete sheet.

To compare the observed (Observed) and theoretical (Expected) frequencies, it is convenient to use a histogram.

If the observed frequencies deviate significantly from the theoretical distribution, the null hypothesis that the random variable follows the theoretical law should be rejected. That is, if the opponent's dice are asymmetric, the observed frequencies will "differ significantly" from the binomial distribution.

In our case, at first glance the frequencies are quite close, and it is difficult to draw an unambiguous conclusion without calculations. We apply Pearson's X 2 goodness-of-fit test so that, instead of the subjective statement "differ significantly", which could be made by comparing the histograms, we can use a mathematically rigorous statement.

We use the fact that, by the law of large numbers, the observed relative frequency (Observed divided by n) tends, as the sample size n increases, to the probability given by the theoretical law (in our case, the binomial law). In our case, the sample size n is 100.

We introduce the test statistic, which we denote X 2:

$$X^2 = \sum_{l=1}^{L} \frac{(O_l - E_l)^2}{E_l},$$

where O_l is the observed frequency of the event that the random variable takes the l-th admissible value, E_l is the corresponding theoretical (Expected) frequency, and L is the number of values the random variable can take (in our case, 4).

As can be seen from the formula, this statistic measures the closeness of the observed frequencies to the theoretical ones, i.e., it can be used to estimate the "distances" between these frequencies. If the sum of these "distances" is "too large", the frequencies "differ significantly". Clearly, if our dice are symmetric (i.e., the binomial law applies), the probability that the sum of the "distances" will be "too large" is small. To calculate this probability we need to know the distribution of the X 2 statistic (X 2 is calculated from a random sample, so it is itself a random variable with its own probability distribution).

From the multidimensional analogue of the de Moivre-Laplace integral theorem it is known that, as n → ∞, our random variable X 2 asymptotically has a χ2 distribution with L − 1 degrees of freedom.

So, if the calculated value of the X 2 statistic (the sum of the "distances" between frequencies) is greater than a certain limit value, we have reason to reject the null hypothesis. As with testing parametric hypotheses, the limit value is set via the significance level. If the probability that the X 2 statistic takes a value greater than or equal to the calculated one (the p-value) is less than the significance level, the null hypothesis can be rejected.

In our case, the statistic equals 22.757. The probability that the X 2 statistic takes a value greater than or equal to 22.757 is very small (0.000045); it can be calculated with the formulas
=CHISQ.DIST.RT(22.757, 4-1) or
=CHISQ.TEST(Observed, Expected)

Note: The CHISQ.TEST() function is designed specifically to test the relationship between two categorical variables.

The probability 0.000045 is far below the usual significance level of 0.05. So the player has every reason to suspect his opponent of dishonesty (the null hypothesis of his honesty is rejected).
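The p-value can be reproduced in Python without Excel. For 3 degrees of freedom the right tail of the χ2 distribution has the closed form erfc(√(x/2)) + √(2x/π)·exp(−x/2). The sketch below assumes that closed form plays the role of =CHISQ.DIST.RT(x, 3); the function name is hypothetical.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """P(X^2 >= x) for a chi-square variable with 3 degrees of freedom:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(22.757)
print(f"{p_value:.6f}")  # 0.000045
print("reject H0" if p_value < 0.05 else "do not reject H0")
```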

When applying the X 2 criterion, one must make sure that the sample size n is large enough; otherwise the approximation of the distribution of the X 2 statistic is not valid. It is usually assumed that for this it is sufficient that the expected (theoretical) frequencies are greater than 5. If this is not the case, small frequencies are combined into one or joined to other frequencies, the combined category is assigned the total probability, and the number of degrees of freedom of the X 2 distribution decreases accordingly.

To improve the quality of the X 2 criterion, the partition intervals should be made smaller (increasing L and, accordingly, the number of degrees of freedom); however, this is limited by the requirement on the number of observations falling into each interval (at least 5).

Continuous case

Pearson's X 2 goodness-of-fit test can be applied in the same way in the continuous case.

Consider a sample of 200 values. The null hypothesis states that the sample is drawn from the standard normal distribution.

Note: The random values in the example file on the Continuous worksheet are generated with the formula =NORM.S.INV(RAND()), so new sample values are generated each time the sheet is recalculated.

Whether the available data set is consistent with a normal distribution can be assessed visually with a normal probability plot.

As can be seen from the diagram, the sample values lie fairly well along a straight line. However, as in the discrete case, Pearson's X 2 goodness-of-fit criterion is applied to test the hypothesis.

To do this, we divide the range of variation of the random variable into intervals with a step of 0.5 and calculate the observed and theoretical frequencies. The observed frequencies are calculated with the FREQUENCY() function, and the theoretical ones with the NORM.S.DIST() function.

Note: As in the discrete case, make sure that the sample is large enough and that more than 5 values fall into each interval.

We calculate the X 2 statistic and compare it with the critical value for the given significance level (0.05). Since we divided the range of variation of the random variable into 10 intervals, the number of degrees of freedom is 9. The critical value can be calculated with the formula
=CHISQ.INV.RT(0.05, 9) or
=CHISQ.INV(1-0.05, 9)

In the diagram above, the statistic equals 8.19, which is well below the critical value, so the null hypothesis is not rejected.
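The continuous case can be sketched in Python as follows. The sketch assumes a simple hypothesis (N(0, 1) with known parameters, so df = number of intervals − 1). It uses inner boundaries −2.0, −1.5, ..., 2.0, which give 10 intervals with open-ended outer ones, and the (a, b] counting convention of FREQUENCY(). NormalDist().samples stands in for =NORM.S.INV(RAND()). The critical value 16.919 is the rounded result of =CHISQ.INV.RT(0.05, 9). With these boundaries the two tail intervals expect only about 4.6 values each, slightly below the rule of thumb.

```python
from bisect import bisect_left
from statistics import NormalDist


def pearson_standard_normal(sample, edges):
    """Pearson's X^2 for the simple hypothesis H0: sample ~ N(0, 1).
    Sorted inner boundaries `edges` define len(edges) + 1 intervals (a, b];
    the two outer intervals are open-ended."""
    std = NormalDist()  # standard normal, the counterpart of NORM.S.DIST
    observed = [0] * (len(edges) + 1)
    for x in sample:
        observed[bisect_left(edges, x)] += 1  # same (a, b] convention as FREQUENCY()
    cdf = [0.0] + [std.cdf(e) for e in edges] + [1.0]
    expected = [len(sample) * (cdf[i + 1] - cdf[i]) for i in range(len(observed))]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # parameters are known, so no extra degrees of freedom are lost
    return stat, df, observed, expected


sample = NormalDist().samples(200, seed=42)  # stands in for =NORM.S.INV(RAND())
edges = [-2.0 + 0.5 * i for i in range(9)]   # -2.0, -1.5, ..., 2.0 -> 10 intervals
stat, df, observed, expected = pearson_standard_normal(sample, edges)
CRITICAL = 16.919  # =CHISQ.INV.RT(0.05, 9), rounded
print(round(stat, 2), df, "reject H0" if stat > CRITICAL else "do not reject H0")
```

Because the sample is random, different seeds give different statistics, and occasionally a genuinely normal sample will exceed the critical value.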

Below is a case where the sample took an unlikely value, and on the basis of Pearson's X 2 goodness-of-fit criterion the null hypothesis was rejected (although the random values were generated with the formula =NORM.S.INV(RAND()), which provides a sample from the standard normal distribution).

The null hypothesis is rejected, although visually the data lie quite close to a straight line.

As another example, take a sample from U(-3; 3). In this case it is obvious even from the graph that the null hypothesis must be rejected.

Pearson's X 2 goodness-of-fit criterion also confirms that the null hypothesis must be rejected.

In previous notes, procedures for testing hypotheses about numerical and categorical data were described. In this note, we consider methods for testing hypotheses about differences between the proportions of an attribute in general populations based on several independent samples.

To illustrate the methods, we use a scenario about the satisfaction of guests of hotels owned by TS Resort Properties. Imagine that you are the manager of a company that owns five hotels located on two resort islands. If guests are satisfied with the service, chances are they will come back next year and recommend your hotel to their friends. To assess the quality of service, guests are asked to fill out a questionnaire and indicate whether they are satisfied with the hospitality. You need to analyze the survey data, determine the overall level of guest satisfaction, assess the likelihood that guests will come again next year, and establish the reasons for the possible dissatisfaction of some customers. For example, on one of the islands the company owns the Beachcomber and Windsurfer hotels. Is the service the same in these hotels? If not, how can this information be used to improve the company's service? Moreover, if some guests said that they will not come again, which reasons do they give more often than others? Can it be argued that these reasons relate only to a specific hotel and do not apply to the company as a whole?

The following notation is used here: $X_1$ is the number of successes in the first group, $X_2$ the number of successes in the second group, $n_1 - X_1$ the number of failures in the first group, $n_2 - X_2$ the number of failures in the second group, $X = X_1 + X_2$ the total number of successes, $n - X = (n_1 - X_1) + (n_2 - X_2)$ the total number of failures, $n_1$ the size of the first sample, $n_2$ the size of the second sample, and $n = n_1 + n_2$ the total sample size. The table (Fig. 1) has two rows and two columns, so it is called a 2 × 2 contingency table. Each cell, formed by the intersection of a row and a column, contains a number of successes or failures.

Let us illustrate the use of the contingency table with the scenario described above. Suppose that, in answer to the question "Will you come back next year?", 163 of the 227 guests at the Beachcomber and 154 of the 262 guests at the Windsurfer said yes. At the 0.05 significance level, is there a statistically significant difference in guest satisfaction (measured by the likelihood that guests will return next year) between the two hotels?

Fig. 2. 2 × 2 contingency table for assessing the quality of guest service

The first row gives the number of guests at each hotel who said they would return next year (success); the second row gives the number of guests who expressed dissatisfaction (failure). The cells in the "Total" column contain the total number of guests planning to return next year and the total number of guests dissatisfied with the service. The cells in the "Total" row contain the total number of guests surveyed at each hotel. The proportion of guests planning to return is the number of guests who said so divided by the total number of guests surveyed at that hotel. The χ² test is then used to compare these proportions.

To test the null hypothesis $H_0: p_1 = p_2$ against the alternative $H_1: p_1 \neq p_2$, we use the χ² test statistic.

Chi-square test for comparing two proportions. The χ² test statistic is the sum, over all cells of the table, of the squared differences between the observed and expected frequencies, each divided by the expected frequency:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_0 - f_e)^2}{f_e} \qquad (1)$$

where $f_0$ is the observed number of successes or failures in a particular cell of the contingency table, and $f_e$ is the theoretical, or expected, number of successes or failures in that cell if the null hypothesis is true.
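Formula (1) maps directly onto code. A minimal Python sketch, assuming the observed and expected frequencies are given as tables (lists of rows) of the same shape; the function name is illustrative, not taken from any library:

```python
def chi_square_statistic(observed, expected):
    """Formula (1): sum over all cells of (f0 - fe)^2 / fe."""
    if len(observed) != len(expected):
        raise ValueError("tables must have the same shape")
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        if len(obs_row) != len(exp_row):
            raise ValueError("tables must have the same shape")
        for f0, fe in zip(obs_row, exp_row):
            total += (f0 - fe) ** 2 / fe   # contribution of one cell
    return total


# Hotel example: rows are yes / no, columns are Beachcomber / Windsurfer;
# expected counts are the two-decimal values used later in the note.
observed = [[163, 154], [64, 108]]
expected = [[147.16, 169.84], [79.84, 92.16]]
print(round(chi_square_statistic(observed, expected), 2))
```

Because the expected counts here are rounded, the result is about 9.05; with unrounded expected counts it is 9.053.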

The χ² test statistic approximately follows the χ² distribution with one degree of freedom.

To calculate the expected number of successes or failures in each cell of the contingency table, you need to understand what these quantities mean. If the null hypothesis is true, i.e. the proportions of success in the two populations are equal, the sample proportions calculated for the two groups can differ from each other only by chance, and both are estimates of the common population parameter $p$. In this situation, a statistic that combines both proportions into one overall (average) estimate of $p$ is the overall proportion of successes in the pooled groups, $\bar{p}$, equal to the total number of successes divided by the total sample size. Its complement, $1 - \bar{p}$, is the overall proportion of failures in the pooled groups. Using the notation described in the table in Fig. 1, $\bar{p}$ is given by formula (2):

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{X}{n} \qquad (2)$$

where $\bar{p}$ is the average proportion of the feature.

To calculate the expected number of successes $f_e$ (the first row of the contingency table), multiply each sample size by $\bar{p}$. To calculate the expected number of failures $f_e$ (the second row of the contingency table), multiply each sample size by $1 - \bar{p}$.
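Formula (2) and the expected-count rule can be sketched in a few lines of Python, assuming successes are placed in the first row and failures in the second; the function name is illustrative. The same code works for any number of groups, which is also the setting of the 2 × c test described in the second part of this note.

```python
def expected_counts(successes, sizes):
    """Expected 2 x c table under H0 (all groups share one proportion).

    Row 0 holds the expected successes n_j * p_bar, row 1 the expected
    failures n_j * (1 - p_bar), where p_bar is the pooled proportion X / n.
    """
    p_bar = sum(successes) / sum(sizes)      # formula (2)
    return [
        [n * p_bar for n in sizes],
        [n * (1 - p_bar) for n in sizes],
    ]


table = expected_counts([163, 154], [227, 262])   # Beachcomber, Windsurfer
for row in table:
    print([round(fe, 2) for fe in row])
```

Rounded to two decimals, this reproduces the expected counts 147.16, 169.84, 79.84 and 92.16 used in the hotel example.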

The test statistic calculated by formula (1) approximately follows the χ² distribution with one degree of freedom. At a given significance level α, the null hypothesis is rejected if the calculated χ² statistic is greater than $\chi^2_U$, the upper critical value of the χ² distribution with one degree of freedom. Thus the decision rule is: reject $H_0$ if $\chi^2 > \chi^2_U$; otherwise do not reject $H_0$ (Fig. 3).

Fig. 3. Critical region of the χ² test for comparing two proportions at significance level α

If the null hypothesis is true, the calculated χ² statistic should be close to zero, because the squared difference between the observed $f_0$ and expected $f_e$ frequencies in each cell is small. If, on the other hand, $H_0$ is false and the proportions of success in the populations differ substantially, the χ² statistic should be large, because the differences between the observed and expected numbers of successes or failures in each cell grow when squared. However, individual cells can contribute differently to the overall χ² statistic. The same absolute difference between $f_0$ and $f_e$ has a greater effect on the χ² statistic in a cell with a small expected frequency than in a cell with a large one, because the squared difference is divided by $f_e$.

To illustrate the χ² test of the hypothesis that two proportions are equal, let us return to the scenario described above, whose results are shown in Fig. 2. The null hypothesis ($H_0: p_1 = p_2$) states that the proportions of guests planning to return next year are the same at the two hotels. If the null hypothesis is true, the parameter $p$, the proportion of guests planning to return, is estimated by $\bar{p}$, calculated as

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{163 + 154}{227 + 262} = \frac{317}{489} = 0.6483$$

The proportion of guests dissatisfied with the service is $1 - \bar{p} = 1 - 0.6483 = 0.3517$. Multiplying these two proportions by the number of Beachcomber guests surveyed gives the expected number of guests planning to return next season and the expected number who will not stay at the hotel again. The expected numbers for the Windsurfer are calculated in the same way:

Yes, Beachcomber: $\bar{p} = 0.6483$, $n_1 = 227$, so $f_e = 147.16$.
Yes, Windsurfer: $\bar{p} = 0.6483$, $n_2 = 262$, so $f_e = 169.84$.
No, Beachcomber: $1 - \bar{p} = 0.3517$, $n_1 = 227$, so $f_e = 79.84$.
No, Windsurfer: $1 - \bar{p} = 0.3517$, $n_2 = 262$, so $f_e = 92.16$.

The calculations are shown in Fig. 4.

Fig. 4. χ² statistic for the hotels: (a) initial data; (b) 2 × 2 contingency table comparing the observed ($f_0$) and expected ($f_e$) numbers of guests who are and are not satisfied with the service; (c) calculation of the χ² statistic comparing the proportions of satisfied guests; (d) calculation of the critical value of the χ² test statistic

The critical value of the χ² test statistic is calculated with the Excel function =CHISQ.INV(). For significance level α = 0.05 (the probability passed to CHISQ.INV is 1 − α) and one degree of freedom for a 2 × 2 table, the critical value of the χ² statistic is 3.841. Since the calculated χ² statistic, 9.053 (Fig. 4c), exceeds 3.841, the null hypothesis is rejected (Fig. 5).
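The whole 2 × 2 test for the hotel data can be reproduced outside Excel. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and instead of CHISQ.INV it obtains $\chi^2_U$ for one degree of freedom as $z_{1-\alpha/2}^2$, relying on the relationship between Z and χ² discussed later in this note.

```python
from statistics import NormalDist

def two_proportion_chi2(x1, n1, x2, n2):
    """Chi-square statistic for H0: p1 = p2 from a 2 x 2 table (one degree of freedom)."""
    observed = [[x1, x2], [n1 - x1, n2 - x2]]          # rows: success / failure
    p_bar = (x1 + x2) / (n1 + n2)                      # formula (2)
    expected = [[n1 * p_bar, n2 * p_bar],
                [n1 * (1 - p_bar), n2 * (1 - p_bar)]]
    return sum((f0 - fe) ** 2 / fe                     # formula (1)
               for obs_row, exp_row in zip(observed, expected)
               for f0, fe in zip(obs_row, exp_row))

alpha = 0.05
chi2_stat = two_proportion_chi2(163, 227, 154, 262)    # Beachcomber vs Windsurfer
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2    # chi^2_U for 1 df
print(f"chi2 = {chi2_stat:.3f}, critical = {critical:.3f}, reject H0: {chi2_stat > critical}")
```

Running it prints `chi2 = 9.053, critical = 3.841, reject H0: True`, matching Fig. 4.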

Fig. 5. Critical value of the χ² test statistic with one degree of freedom at significance level α = 0.05

The p-value, i.e. the probability of obtaining a χ² statistic of 9.053 or larger (with one degree of freedom) if the null hypothesis is true, is calculated in Excel with the function =1-CHISQ.DIST(9.053; 1; TRUE) = 0.0026. A p-value of 0.0026 is the probability that the difference between the sample proportions of satisfied guests at the Beachcomber and the Windsurfer is 0.718 − 0.588 = 0.13 or more if the proportions in the two populations are in fact equal. Thus there is good reason to believe that there is a statistically significant difference in guest service between the two hotels. The proportion of guests planning to return is higher at the Beachcomber than at the Windsurfer.
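The p-value for one degree of freedom can be reproduced without Excel. Since χ² with one degree of freedom is the square of a standard normal variable, $P(\chi^2 \ge x) = P(|Z| \ge \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$. A minimal sketch (the function name is hypothetical):

```python
import math

def chi2_sf_df1(x):
    """P(chi^2 >= x) for 1 degree of freedom, the quantity 1 - CHISQ.DIST(x; 1; TRUE).

    Uses P(chi^2_1 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))

p_value = chi2_sf_df1(9.053)
print(f"p-value = {p_value:.4f}")
```

The printed p-value rounds to 0.0026, as in the Excel calculation.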

Checking the assumptions of the 2 × 2 table. To obtain accurate results from a 2 × 2 table, the expected frequency in each cell should be greater than 5. If this condition is not met, Fisher's exact test should be used instead.
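Fisher's exact test is mentioned here but not described. The sketch below uses the standard construction (with all table margins fixed, the top-left count follows a hypergeometric distribution) together with one common two-sided convention: sum the probabilities of all tables that are no more likely than the observed one. Treat it as an illustration; the function name is hypothetical, and statistical packages may use other two-sided conventions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k, margins fixed
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)      # feasible top-left counts
    p_obs = prob(a)
    # Small relative tolerance so that floating-point ties count as "as extreme"
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

# Small hypothetical table where expected frequencies are far below 5
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```

For the table [[3, 1], [1, 3]] the probabilities of the possible top-left counts 0..4 are 1, 16, 36, 16 and 1 out of 70, so the two-sided p-value is 34/70, about 0.4857.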

When comparing the proportions of customers satisfied with the service at the two hotels, the Z test and the χ² test lead to the same result. This is explained by the close relationship between the standardized normal distribution and the χ² distribution with one degree of freedom: in this case the χ² statistic is always the square of the Z statistic. For example, in the guest-satisfaction analysis the Z statistic was +3.01 and the χ² statistic was 9.05; apart from rounding, the second value is the square of the first ($3.01^2 \approx 9.05$). Likewise, at significance level α = 0.05, the critical value $\chi^2_U = 3.841$ is the square of the upper critical value of the Z statistic, +1.96 ($1.96^2 \approx 3.841$). Moreover, the p-values of the two tests are the same.
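The identity χ² = Z² can be checked numerically. A minimal sketch of the pooled two-proportion Z statistic (the function name is hypothetical) applied to the hotel data; it also shows the one-tailed critical value that would be used for $H_1: p_1 > p_2$:

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: p1 = p2."""
    p_bar = (x1 + x2) / (n1 + n2)                              # pooled proportion, formula (2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))    # standard error under H0
    return (x1 / n1 - x2 / n2) / se

z = two_proportion_z(163, 227, 154, 262)
z_crit = NormalDist().inv_cdf(0.975)       # two-tailed critical Z at alpha = 0.05
z_one_tail = NormalDist().inv_cdf(0.95)    # one-tailed critical Z for H1: p1 > p2
print(f"Z = {z:.2f}, Z^2 = {z * z:.3f}, critical Z^2 = {z_crit ** 2:.3f}")
```

It prints Z = 3.01 and Z² = 9.053, the same value as the χ² statistic, and the squared two-tailed critical value 3.841.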

Thus, when testing $H_0: p_1 = p_2$ against $H_1: p_1 \neq p_2$, the Z test and the χ² test are equivalent. However, if you need not only to detect a difference but also to determine which proportion is greater (for example, $p_1 > p_2$), you should use a Z test with a single critical region in one tail of the standardized normal distribution. Next, we describe the χ² test for comparing the proportions of a feature in several groups; note that the Z test cannot be used in this situation.

Application of the χ² test to test the hypothesis that several proportions are equal

The chi-square test can be extended to the more general case of testing the hypothesis that several proportions of a feature are equal. Let $c$ denote the number of independent populations being analyzed. The contingency table now has two rows and $c$ columns. To test the null hypothesis $H_0: p_1 = p_2 = \dots = p_c$ against the alternative $H_1$: not all $p_j$ are equal ($j = 1, 2, \dots, c$), the χ² test statistic is used:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_0 - f_e)^2}{f_e} \qquad (1)$$

where $f_0$ is the observed number of successes or failures in a particular cell of the 2 × c contingency table, and $f_e$ is the theoretical, or expected, number of successes or failures in that cell if the null hypothesis is true.

To calculate the expected number of successes or failures in each cell of the contingency table, keep the following in mind. If the null hypothesis is true and the proportions of success in all populations are equal, the sample proportions can differ from each other only by chance, since all of them estimate the same population proportion $p$. In this situation, a statistic that combines all the proportions into one overall (average) estimate of $p$ contains more information than any of them separately. This statistic, denoted $\bar{p}$, is the overall (average) proportion of successes in the pooled sample.

Calculation of the average proportion:

$$\bar{p} = \frac{X_1 + X_2 + \dots + X_c}{n_1 + n_2 + \dots + n_c} = \frac{X}{n}$$

To calculate the expected number of successes $f_e$ in the first row of the contingency table, multiply each sample size by $\bar{p}$. To calculate the expected number of failures $f_e$ in the second row, multiply each sample size by $1 - \bar{p}$. The test statistic calculated by formula (1) approximately follows the χ² distribution with $(r - 1)(c - 1)$ degrees of freedom, where $r$ is the number of rows and $c$ the number of columns in the contingency table. For a 2 × c table the number of degrees of freedom is $(2 - 1)(c - 1) = c - 1$. At a given significance level α, the null hypothesis is rejected if the calculated χ² statistic is greater than the upper critical value $\chi^2_U$ of the χ² distribution with $c - 1$ degrees of freedom. Thus the decision rule is: reject $H_0$ if $\chi^2 > \chi^2_U$ (Fig. 6); otherwise do not reject $H_0$.
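For general degrees of freedom a normal-distribution shortcut no longer applies. The sketch below is one possible stand-in for Excel's CHISQ.DIST and CHISQ.INV: it evaluates the χ² upper tail through the power series of the regularized lower incomplete gamma function and inverts it by bisection. This numerical approach is an assumption of the sketch, adequate for the moderate values in this note rather than a production-grade implementation, and all function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(chi^2 >= x) with df degrees of freedom, i.e. 1 - CHISQ.DIST(x; df; TRUE).

    Computes 1 - P(df/2, x/2) using the power series of the regularized
    lower incomplete gamma function P(a, t).
    """
    if x <= 0:
        return 1.0
    a, t = df / 2, x / 2
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-15:          # add series terms until they are negligible
        term *= t / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(t) - t - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_critical(alpha, df):
    """Upper critical value chi^2_U, like CHISQ.INV(1 - alpha; df), found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:       # widen the bracket until it contains the root
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def proportions_chi2_test(successes, sizes, alpha=0.05):
    """Chi-square test of H0: p_1 = ... = p_c on a 2 x c table."""
    p_bar = sum(successes) / sum(sizes)                      # pooled proportion
    stat = 0.0
    for x, n in zip(successes, sizes):
        for f0, fe in ((x, n * p_bar), (n - x, n * (1 - p_bar))):
            stat += (f0 - fe) ** 2 / fe                      # formula (1)
    df = (2 - 1) * (len(sizes) - 1)                          # (r - 1)(c - 1)
    critical = chi2_critical(alpha, df)
    return {"chi2": stat, "df": df, "critical": critical,
            "p_value": chi2_sf(stat, df), "reject_H0": stat > critical}


print(proportions_chi2_test([163, 154], [227, 262]))        # two-hotel case, c = 2
```

Applied to the two-hotel data (c = 2), it returns χ² = 9.053 with one degree of freedom, critical value 3.841 and p-value 0.0026, the same as the 2 × 2 analysis; `chi2_critical(0.05, 2)` gives 5.991.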

Fig. 6. Critical region of the χ² test for comparing c proportions at significance level α

Checking the assumptions of the 2 × c table. To obtain accurate results from a 2 × c contingency table, the expected numbers of successes and failures must be sufficiently large. Some statisticians believe that the test gives accurate results if all expected frequencies are greater than 0.5. More conservative researchers require that no more than 20% of the cells have expected frequencies below 5 and that no cell has an expected frequency below 1. In our view, the last condition, that every expected frequency be at least 1, is a reasonable compromise between these extremes. To satisfy it, categories with small expected frequencies should be combined, which makes the test more accurate. If for some reason categories cannot be combined, alternative procedures should be used.
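The rules of thumb above are easy to check mechanically. A minimal sketch (the function name is hypothetical) that reports whether every expected frequency is at least 1 and whether at most 20% of the cells fall below 5; the conservative requirement is that both hold:

```python
def expected_frequency_rules(expected):
    """Check a table of expected frequencies against the rules of thumb.

    Returns (every cell >= 1, at most 20% of cells below 5). The first value
    alone is the compromise rule; both together are the conservative rule.
    """
    cells = [fe for row in expected for fe in row]
    all_at_least_1 = all(fe >= 1 for fe in cells)
    share_below_5 = sum(fe < 5 for fe in cells) / len(cells)
    return all_at_least_1, share_below_5 <= 0.20

# Hypothetical small-sample table: one cell below 1, half of the cells below 5
print(expected_frequency_rules([[0.8, 4.0, 10.0], [2.2, 6.0, 12.0]]))
# Expected counts from the two-hotel example
print(expected_frequency_rules([[147.16, 169.84], [79.84, 92.16]]))
```

The first table fails both checks, so its categories would need to be combined; the hotel table passes both.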

To illustrate the χ² test of the hypothesis that the proportions in several groups are equal, let us return to the scenario described at the beginning of the chapter. Consider a similar survey of guests at three hotels owned by TS Resort Properties (Fig. 7a).

Fig. 7. 2 × 3 contingency table comparing the numbers of guests who are and are not satisfied with the service: (a) observed numbers of successes and failures, $f_0$; (b) expected numbers of successes and failures, $f_e$; (c) calculation of the χ² statistic comparing the proportions of satisfied guests

The null hypothesis states that the proportion of customers planning to return next year is the same at all three hotels. The parameter $p$, the proportion of guests planning to return, is estimated by $\bar{p} = X/n = 513/700 = 0.733$. The proportion of guests dissatisfied with the service is $1 - 0.733 = 0.267$. Multiplying these two proportions by the number of guests surveyed at each hotel gives the expected number of guests planning to return next season and the expected number who will not stay at that hotel again (Fig. 7b).

To test the null and alternative hypotheses, the χ² test statistic is calculated from the expected and observed frequencies using formula (1) (Fig. 7c).

The critical value of the χ² test statistic is calculated with the formula =CHISQ.INV(). Since guests of three hotels take part in the survey, the χ² statistic has (2 − 1)(3 − 1) = 2 degrees of freedom. At significance level α = 0.05, the critical value of the χ² statistic is 5.991 (Fig. 7d). Since the calculated χ² statistic, 40.236, exceeds the critical value, the null hypothesis is rejected (Fig. 8). Equivalently, the p-value, i.e. the probability of obtaining a χ² statistic of 40.236 or larger (with two degrees of freedom) if the null hypothesis is true, is calculated in Excel with =1-CHISQ.DIST() and equals 0.000 (Fig. 7d). Since this p-value is less than the significance level α = 0.05, the null hypothesis is rejected.
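The three-hotel test can be sketched in Python as well. The per-hotel counts of Fig. 7a are not given in the text, so the counts below (128 of 216, 199 of 232, 186 of 252) are an assumption, chosen because they reproduce the totals 513/700 and the proportions 0.593, 0.858 and 0.738 reported in this note. With two degrees of freedom the χ² distribution is exponential with mean 2, so no special functions are needed.

```python
import math

# ASSUMED counts (Fig. 7a is not reproduced in the text), in the order
# Golden Palm, Palm Royal, Palm Princess; they match the reported totals
# 513 / 700 and the proportions 0.593, 0.858, 0.738.
returning = [128, 199, 186]
surveyed = [216, 232, 252]

p_bar = sum(returning) / sum(surveyed)              # 513 / 700, about 0.733
chi2_stat = 0.0
for x, n in zip(returning, surveyed):
    # One column of the 2 x 3 table: (observed, expected) for "yes" and for "no"
    for f0, fe in ((x, n * p_bar), (n - x, n * (1 - p_bar))):
        chi2_stat += (f0 - fe) ** 2 / fe            # formula (1)

df = (2 - 1) * (len(surveyed) - 1)                  # 2 degrees of freedom
alpha = 0.05
# For df = 2: P(chi2 >= x) = exp(-x / 2), so the upper critical value is -2 ln(alpha)
critical = -2 * math.log(alpha)                      # about 5.991
p_value = math.exp(-chi2_stat / 2)
print(f"chi2 = {chi2_stat:.2f}, df = {df}, critical = {critical:.3f}, p = {p_value:.1e}")
```

With unrounded expected counts the statistic comes out at about 40.23; the value 40.236 in the text is consistent with expected counts rounded to two decimals. Either way it far exceeds 5.991.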

Fig. 8. Regions of acceptance and rejection of the hypothesis that three proportions are equal, at significance level 0.05 with two degrees of freedom

Rejecting the null hypothesis for the proportions in a 2 × c table only tells us that the proportions of satisfied guests at the three hotels are not all equal. To find out which proportions differ from the others, other methods are needed, for example the Marascuilo procedure.

The Marascuilo procedure compares all groups in pairs. First, the differences $p_{s_j} - p_{s_{j'}}$ (where $j \neq j'$) are calculated for all $c(c - 1)/2$ pairs of proportions. The corresponding critical ranges are calculated by the formula:

$$\text{critical range} = \sqrt{\chi^2_U}\,\sqrt{\frac{p_{s_j}(1 - p_{s_j})}{n_j} + \frac{p_{s_{j'}}(1 - p_{s_{j'}})}{n_{j'}}}$$

At an overall significance level α, $\sqrt{\chi^2_U}$ is the square root of the upper critical value of the chi-square distribution with $c - 1$ degrees of freedom. A separate critical range must be calculated for each pair of sample proportions. Finally, each of the $c(c - 1)/2$ pairs of proportions is compared with its critical range. The two proportions in a pair are considered statistically significantly different if the absolute difference of the sample proportions $|p_{s_j} - p_{s_{j'}}|$ exceeds the critical range.

Let us illustrate the Marascuilo procedure using the survey of guests at three hotels (Fig. 9a). Using the chi-square test, we established that there is a statistically significant difference between the proportions of guests at the different hotels who intend to return next year. Since the survey involves three hotels, we need to perform 3(3 − 1)/2 = 3 pairwise comparisons and calculate three critical ranges. First, we calculate the three sample proportions (Fig. 9b). At an overall significance level of 0.05, the upper critical value of the χ² test statistic for the chi-square distribution with $c - 1 = 2$ degrees of freedom is given by =CHISQ.INV(0.95; 2) = 5.991. So $\sqrt{\chi^2_U} = \sqrt{5.991} = 2.448$ (Fig. 9c). Next, we calculate the three absolute differences and the corresponding critical ranges. If an absolute difference is greater than its critical range, the corresponding proportions are considered significantly different (Fig. 9d).
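The pairwise comparisons can be sketched as follows. The function name is hypothetical, and the per-hotel counts (128 of 216, 199 of 232, 186 of 252) are an assumption: Fig. 9a is not reproduced in the text, and these counts were chosen because they reproduce the sample proportions 0.593, 0.858 and 0.738 and the totals 513/700 reported in this note.

```python
import math

def marascuilo(successes, sizes, names, chi2_upper):
    """Marascuilo procedure: compare every pair of sample proportions.

    chi2_upper is the upper critical value of chi-square with c - 1 degrees
    of freedom (5.991 for c = 3 at alpha = 0.05, as in the text).
    Returns (name_j, name_k, |p_j - p_k|, critical range, significant?).
    """
    props = [x / n for x, n in zip(successes, sizes)]
    root = math.sqrt(chi2_upper)
    results = []
    for j in range(len(props)):
        for k in range(j + 1, len(props)):               # c(c - 1) / 2 pairs
            diff = abs(props[j] - props[k])
            crit = root * math.sqrt(props[j] * (1 - props[j]) / sizes[j]
                                    + props[k] * (1 - props[k]) / sizes[k])
            results.append((names[j], names[k], diff, crit, diff > crit))
    return results

# ASSUMED counts matching the reported proportions; 5.991 = CHISQ.INV(0.95; 2)
results = marascuilo([128, 199, 186], [216, 232, 252],
                     ["Golden Palm", "Palm Royal", "Palm Princess"], 5.991)
for a, b, diff, crit, significant in results:
    print(f"{a} vs {b}: |diff| = {diff:.4f}, critical range = {crit:.4f}, significant: {significant}")
```

Under these assumed counts all three absolute differences exceed their critical ranges, in line with the conclusions drawn from Fig. 9.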

Fig. 9. Results of the Marascuilo procedure for testing the hypothesis that the proportions of satisfied guests at three hotels are equal: (a) survey data; (b) sample proportions; (c) upper critical value of the χ² test statistic; (d) three absolute differences and the corresponding critical ranges

As you can see, at significance level 0.05 the satisfaction of Palm Royal guests ($p_{s_2} = 0.858$) is higher than that of Golden Palm guests ($p_{s_1} = 0.593$) and Palm Princess guests ($p_{s_3} = 0.738$). In addition, satisfaction at the Palm Princess is higher than at the Golden Palm. These results should prompt management to analyze the reasons for the differences and, in particular, to find out why guest satisfaction at the Golden Palm is significantly lower than at the other hotels.

Based on: Levine D. M. et al. Statistics for Managers. Moscow: Williams, 2004, pp. 708-730.