SOCIAL STATISTICS

Renáta Németh, Dávid Simon

ELTE

Linear regression

We want to minimise the distance between the points representing the data and the straight line we want to define.

One way to do that is the Least Squares Method: we minimise the sum of the squared distances between the points and the line, measured along the dependent variable.

Other methods exist, but this one is the most widespread.

The procedure of finding the straight line with the smallest sum of squared differences from the data is called linear regression.
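The quantity being minimised can be sketched in a few lines of Python. The data below are hypothetical toy values (not taken from the text), and the function name `sum_of_squares` is chosen here only for illustration. The sketch compares two candidate lines and shows that the least squares criterion prefers the one with the smaller sum.

```python
def sum_of_squares(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, used only to illustrate the criterion.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

loss_line_1 = sum_of_squares(0.0, 2.0, xs, ys)  # candidate line y = 2x
loss_line_2 = sum_of_squares(1.0, 1.5, xs, ys)  # candidate line y = 1 + 1.5x

# The candidate with the smaller sum fits better in the least squares sense.
print(loss_line_1, loss_line_2)
```

Linear regression looks for the pair (a, b) that makes this sum as small as possible over all possible lines, not just over a handful of candidates.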

Illustration:

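| Year | Rate of unemployment (% of the economically active population) | Crime rate (per 100 thousand) |
|------|------|------|
| 1999 | 7.0 | 5009 |
| 2000 | 6.4 | 4496 |
| 2001 | 5.7 | 4571 |
| 2002 | 5.8 | 4135 |
| 2003 | 5.9 | 4076 |
| 2004 | 6.1 | 4140 |
| 2005 | 7.2 | 4323 |
| 2006 | 7.5 | 4227 |
| 2007 | 7.4 | 4241 |
| 2008 | 7.8 | 4066 |
| 2009 | 10.0 | 3928 |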

Source: KSH and Belügyminisztérium

The red circles are the data; the black line is the regression line, obtained by minimising the sum of the squared distances between the circles and the line, measured along the y axis.

The procedure can be described by the regression equation as follows:

$$\hat{y} = a + bx$$

where

a, b are the regression coefficients

$\hat{y}$ is the regression estimate for the dependent variable

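When choosing a and b, we want to minimise the following:

Equation 9.1:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

which happens if

Equation 9.2:

$$b = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}$$

Equation 9.3:

$$a = \bar{y} - b\,\bar{x}$$

where

$\mathrm{cov}(x, y)$ is the covariance of the two variables (more on this later)

$\mathrm{var}(x)$ is the variance of the independent variable

$\bar{x}$, $\bar{y}$ are the means of the independent and the dependent variable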
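A brief sketch of where these conditions come from may help. The symbol $S(a, b)$ for the sum of squared differences and the bars for the sample means are notation assumed here, not taken from the original text. Setting both partial derivatives of $S$ to zero, and substituting the expression for a into the second condition, gives:

```latex
\begin{aligned}
S(a, b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
  \;\Longrightarrow\; a = \bar{y} - b\,\bar{x} \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
  \;\Longrightarrow\; b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```

Dividing the numerator and the denominator of the last fraction by the same constant (n or n - 1) turns it into the covariance of the two variables over the variance of the independent variable.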

For unemployment and crime, we get the following:

a = 4848

b = -79.62
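These coefficients can be reproduced from the table with a short Python sketch. It assumes the standard closed-form least squares solution (slope as covariance over variance, intercept from the means); the function name `fit_line` is introduced here for illustration only.

```python
# Data from the table above (1999-2009).
unemployment = [7.0, 6.4, 5.7, 5.8, 5.9, 6.1, 7.2, 7.5, 7.4, 7.8, 10.0]
crime = [5009, 4496, 4571, 4135, 4076, 4140, 4323, 4227, 4241, 4066, 3928]


def fit_line(xs, ys):
    """Return (a, b) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance use the same divisor, so it cancels in the ratio.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(unemployment, crime)
print(f"a = {a:.0f}, b = {b:.2f}")  # prints: a = 4848, b = -79.62
```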

Interpretation:

  • b means that a 1 percentage point increase in unemployment is associated with a drop of 79.62 (per 100,000) in the predicted crime rate

  • a means that if the unemployment rate were 0, the predicted crime rate would be 4848 per 100,000
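As a worked illustration of how the two coefficients combine, the fitted line can be evaluated at the 1999 unemployment rate of 7% from the table. The subscript is introduced here only for readability.

```latex
\hat{y}_{1999} = 4848 - 79.62 \times 7 = 4290.66
```

The observed crime rate in 1999 was 5009, so the line underestimates that year by about 718 per 100,000.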

Note: the coefficients of linear regression are asymmetric measures, that is, they change when the dependent and independent variables are swapped.
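The asymmetry can be checked on the table's data with a small Python sketch. The helper name `slope` is an assumption made for this illustration. If the coefficients were symmetric, the slope of crime on unemployment would be the reciprocal of the slope of unemployment on crime; the sketch shows that it is not.

```python
# Data from the table above (1999-2009).
unemployment = [7.0, 6.4, 5.7, 5.8, 5.9, 6.1, 7.2, 7.5, 7.4, 7.8, 10.0]
crime = [5009, 4496, 4571, 4135, 4076, 4140, 4323, 4227, 4241, 4066, 3928]


def slope(xs, ys):
    """Least squares slope for predicting ys from xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx


b_crime_on_unemployment = slope(unemployment, crime)  # crime is the dependent variable
b_unemployment_on_crime = slope(crime, unemployment)  # roles swapped

# Simply inverting one regression line does not give the other one.
print(b_crime_on_unemployment, 1 / b_unemployment_on_crime)
```

The product of the two slopes equals the squared correlation coefficient, so the two lines coincide only when the points lie exactly on a straight line.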