SOCIAL STATISTICS

Renáta Németh, Dávid Simon

ELTE

PRE, proportional reduction of error

                                     financial status
having a mental medical condition    relatively bad    relatively good    total
yes                                  390 (97.5%)       10 (2.5%)          400 (100%)
no                                   40 (6.7%)         560 (93.3%)        600 (100%)
total                                430 (43%)         570 (57%)          1000 (100%)

Table 1

Using one of the illustrations from the previous lecture (where we treated mental health as the independent variable and financial status as the dependent variable), let’s guess the financial status of the individual respondents knowing only its overall distribution: 57% have a relatively good and 43% a relatively bad financial status.

Let’s imagine that the respondents turn up one by one and we have to guess their financial status as accurately as possible. What’s the best way to do that?

Declaring every respondent to have a relatively good financial status is the safest strategy: we are then wrong in 430 cases out of 1000.

How does the situation change if we already know Table 1 and we can ask each respondent whether or not they have a mental medical condition?

In this case we can improve our guesses by classifying everyone with a mental medical condition as having a relatively bad financial status and everyone without one as having a relatively good financial status. The number of mistakes then drops to 50 (10 + 40).

In other words, the reduction in guessing error characterizes the strength of the relationship between the two variables. Association indices that work on this principle are called ‘proportional reduction of error’ (PRE) indices.
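The two guessing strategies described above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function names `errors_without_predictor` and `errors_with_predictor` are assumptions made for this example, not part of the lecture material.

```python
# Table 1: rows = mental medical condition (yes / no),
# columns = financial status (relatively bad / relatively good).
table_1 = {
    "yes": {"relatively bad": 390, "relatively good": 10},
    "no": {"relatively bad": 40, "relatively good": 560},
}


def errors_without_predictor(table):
    """Guess the overall most frequent category for everyone; count the misses."""
    totals = {}
    for row in table.values():
        for category, count in row.items():
            totals[category] = totals.get(category, 0) + count
    return sum(totals.values()) - max(totals.values())


def errors_with_predictor(table):
    """Guess the most frequent category within each row; count the misses."""
    return sum(sum(row.values()) - max(row.values()) for row in table.values())


e1 = errors_without_predictor(table_1)
e2 = errors_with_predictor(table_1)
print(e1, e2)  # 430 50
```

Both functions count the respondents who fall outside the guessed category, which is exactly the number of wrong guesses in the thought experiment.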

Calculating lambda (λ) to measure the association between two nominal variables:

Equation 8.1:

$$\lambda = \frac{E_1 - E_2}{E_1}$$


Where:

E1 is the number of categorising mistakes made without considering the independent variable

E2 is the number of categorising mistakes made considering the independent variable

In this specific case:

Equation 8.2:

$$\lambda = \frac{430 - 50}{430} \approx 0.88$$


Lambda’s characteristics

Let’s assume that mental health is the dependent variable and financial status is the independent one (also assuming that being rich drives you crazy).

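In this case E1 = 400 (we guess "no condition" for everyone) and E2 = 40 + 10 = 50, so lambda is calculated as follows:

Equation 8.3:

$$\lambda = \frac{400 - 50}{400} = 0.875$$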


That is, the value of lambda depends on which variable is treated as dependent and which as independent. Association indices of this kind are called asymmetric indices.
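The asymmetry can be checked numerically. The sketch below is a minimal illustration, assuming the table is stored as a list of rows; the function name `pre_lambda` and its `dependent` parameter are hypothetical names chosen for this example.

```python
def pre_lambda(table, dependent="columns"):
    """Return lambda = (E1 - E2) / E1 for a table given as a list of rows.

    dependent="columns": the column variable is guessed from the row variable.
    dependent="rows": the row variable is guessed from the column variable.
    """
    if dependent == "rows":
        # Transpose so that the dependent variable always runs across columns.
        table = [list(column) for column in zip(*table)]
    elif dependent != "columns":
        raise ValueError("dependent must be 'columns' or 'rows'")
    n = sum(sum(row) for row in table)
    column_totals = [sum(column) for column in zip(*table)]
    e1 = n - max(column_totals)  # misses when guessing the overall mode
    e2 = sum(sum(row) - max(row) for row in table)  # misses with row-wise modes
    return (e1 - e2) / e1


# Table 1: rows = mental medical condition (yes, no),
# columns = financial status (relatively bad, relatively good).
table_1 = [[390, 10], [40, 560]]

lambda_financial = pre_lambda(table_1, dependent="columns")
lambda_mental = pre_lambda(table_1, dependent="rows")
print(round(lambda_financial, 3), round(lambda_mental, 3))  # 0.884 0.875
```

Swapping the roles of the two variables changes E1 from 430 to 400 while E2 stays at 50, which is why the two values differ.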

Two versions of the above table:

                                     financial status
having a mental medical condition    relatively bad    relatively good    total
yes                                  200 (45.5%)       240 (54.5%)        440 (100%)
no                                   230 (41.1%)       330 (58.9%)        560 (100%)
total                                430 (43%)         570 (57%)          1000 (100%)

Table 2

                                     financial status
having a mental medical condition    relatively bad    relatively good    total
yes                                  189 (43%)         251 (57%)          440 (100%)
no                                   241 (43%)         319 (57%)          560 (100%)
total                                430 (43%)         570 (57%)          1000 (100%)

Table 3

While the first table shows (see the previous lecture) that there is a connection between the two variables, the second table shows that the two are completely independent.

Let’s calculate lambda for both.

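Without knowing the independent variable, the number of categorization mistakes is again 430. Taking the independent variable into account does not help us make fewer mistakes in either case: in both tables "relatively good" is the most frequent category among those with and those without a mental medical condition, so we still guess "relatively good" for everyone.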

E1 = E2 = 430

Equation 8.4:

$$\lambda = \frac{430 - 430}{430} = 0$$


It can be seen that if the variables are independent, lambda is always 0; yet lambda = 0 does not automatically mean that the two variables are independent.
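The point can be illustrated with the two versions of the table above. The following is a minimal sketch; the helper names `pre_lambda` and `row_percentages` and the variable names are assumptions made for this example.

```python
def pre_lambda(table):
    """Lambda with the column variable as dependent; table is a list of rows."""
    n = sum(sum(row) for row in table)
    e1 = n - max(sum(column) for column in zip(*table))
    e2 = sum(sum(row) - max(row) for row in table)
    return (e1 - e2) / e1


def row_percentages(table):
    """Distribution of the column variable within each row, in percent."""
    return [[round(100 * cell / sum(row), 1) for cell in row] for row in table]


# Rows = mental medical condition (yes, no),
# columns = financial status (relatively bad, relatively good).
first_version = [[200, 240], [230, 330]]   # row distributions differ
second_version = [[189, 251], [241, 319]]  # row distributions coincide

for name, table in (("first", first_version), ("second", second_version)):
    print(name, row_percentages(table), pre_lambda(table))
# first [[45.5, 54.5], [41.1, 58.9]] 0.0
# second [[43.0, 57.0], [43.0, 57.0]] 0.0
```

In the first version the row percentages differ, so the variables are not independent, yet lambda is still 0 because the most frequent category is the same in every row.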

Note: this method should not be used if the distributions of the dependent variable belonging to the different values of the independent variable differ by less than 5%.

Summary:

λ’s characteristics:

  • asymmetric

  • its value lies between 0 and 1

  • if the variables are independent, it’s always 0 (but it can be 0 in other cases as well)