
## SOCIAL STATISTICS

Renáta Németh, Dávid Simon

ELTE


## Linear regression

We have to minimise the distance between the points representing the data and the straight line we want to define.

One way to do that is the least squares method: we minimise the sum of the squared distances between the points and the line, measured along the axis of the dependent variable.

We could use other methods, but this one is the most widespread.

The procedure of finding the straight line with the smallest sum of squared differences from the data is called linear regression.

Illustration:

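| Year | Unemployment rate (% of the economically active) | Crime rate (per 100,000) |
|------|------|------|
| 1999 | 7.0 | 5009 |
| 2000 | 6.4 | 4496 |
| 2001 | 5.7 | 4571 |
| 2002 | 5.8 | 4135 |
| 2003 | 5.9 | 4076 |
| 2004 | 6.1 | 4140 |
| 2005 | 7.2 | 4323 |
| 2006 | 7.5 | 4227 |
| 2007 | 7.4 | 4241 |
| 2008 | 7.8 | 4066 |
| 2009 | 10.0 | 3928 |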

Source: KSH (Hungarian Central Statistical Office) and Belügyminisztérium (Ministry of Interior)

Figure: the red circles are the data and the black line is the regression line, which was generated by minimising the sum of the squared distances between the circles and the line, measured along the y-axis.
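The least squares criterion can be made concrete with a small sketch. The function below computes the sum of squared vertical distances for any candidate line and compares the line reported in this section with two arbitrary alternatives. The function name `sum_of_squares` and the two alternative lines are illustrative assumptions, not part of the original material.

```python
# Unemployment rate (%) and crime rate (per 100,000), 1999-2009, from the table above.
unemployment = [7.0, 6.4, 5.7, 5.8, 5.9, 6.1, 7.2, 7.5, 7.4, 7.8, 10.0]
crime = [5009, 4496, 4571, 4135, 4076, 4140, 4323, 4227, 4241, 4066, 3928]


def sum_of_squares(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Candidate lines as (intercept, slope). The first uses the coefficients reported
# in this section; the other two are arbitrary alternatives chosen for comparison.
candidates = {
    "reported regression line": (4848, -79.62),
    "horizontal line at the mean crime rate": (4292, 0.0),
    "flatter line (hypothetical)": (4848, -60.0),
}

for name, (a, b) in candidates.items():
    print(f"{name}: {sum_of_squares(a, b, unemployment, crime):.0f}")
```

Of the three candidates, the reported regression line should give the smallest sum, which is what "minimising the square of the distance" means in practice.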

The procedure can be described by the regression equation as follows:

$$\hat{y} = a + bx$$

where

• $a$, $b$ are the regression coefficients

• $\hat{y}$ is the regression estimate for the dependent variable

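When choosing $a$ and $b$, we want to minimise the following:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2 \qquad (9.1)$$

which is achieved if

$$a = \bar{y} - b\bar{x} \qquad (9.2)$$

$$b = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)} \qquad (9.3)$$

where

• $\mathrm{cov}(x, y)$ is the covariance of the two variables (more on this later)

• $\mathrm{var}(x)$ is the variance of the independent variable

• $\bar{x}$, $\bar{y}$ are the means of the independent and the dependent variable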

For unemployment and crime, we get the following:

a = 4848

b = -79.62
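These values can be checked with a few lines of Python. The sketch below is a minimal illustration that assumes the slope is the covariance divided by the variance of the independent variable and that the intercept follows from the two means. The variable names are illustrative.

```python
# Unemployment rate (%) and crime rate (per 100,000), 1999-2009, from the illustration.
unemployment = [7.0, 6.4, 5.7, 5.8, 5.9, 6.1, 7.2, 7.5, 7.4, 7.8, 10.0]
crime = [5009, 4496, 4571, 4135, 4076, 4140, 4323, 4227, 4241, 4066, 3928]

n = len(unemployment)
mean_x = sum(unemployment) / n
mean_y = sum(crime) / n

# Covariance of the two variables and variance of the independent variable.
# Both use the same divisor n, so the divisor cancels in the ratio below.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(unemployment, crime)) / n
var_x = sum((x - mean_x) ** 2 for x in unemployment) / n

b = cov_xy / var_x        # slope
a = mean_y - b * mean_x   # intercept

print(f"a = {a:.0f}, b = {b:.2f}")
```

After rounding, the result agrees with the coefficients given above.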

Interpretation:

• b means that a 1 percentage point increase in the unemployment rate is associated with a drop of 79.62 (per 100,000) in the estimated crime rate

• a means that if the unemployment rate were 0, the estimated crime rate would be 4848 per 100,000

Note: the coefficients of linear regression are asymmetrical indices: they change if the roles of the dependent and the independent variable are swapped.
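The asymmetry can be illustrated with the same data. The sketch below fits the slope twice, once with the crime rate as the dependent variable and once with the unemployment rate as the dependent variable. The helper `slope` is a hypothetical name introduced for this illustration.

```python
# Unemployment rate (%) and crime rate (per 100,000), 1999-2009, from the illustration.
unemployment = [7.0, 6.4, 5.7, 5.8, 5.9, 6.1, 7.2, 7.5, 7.4, 7.8, 10.0]
crime = [5009, 4496, 4571, 4135, 4076, 4140, 4323, 4227, 4241, 4066, 3928]


def slope(xs, ys):
    """Least squares slope when ys is regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares; dividing both by n would give the
    # covariance and the variance, and the divisor cancels in the ratio.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    return sum_xy / sum_xx


b_crime_on_unemployment = slope(unemployment, crime)
b_unemployment_on_crime = slope(crime, unemployment)

print("crime regressed on unemployment:", round(b_crime_on_unemployment, 2))
print("unemployment regressed on crime:", round(b_unemployment_on_crime, 5))
# If the coefficient were symmetrical, this would equal the first slope.
print("reciprocal of the second slope:", round(1 / b_unemployment_on_crime, 2))
```

The two slopes describe two different lines: swapping the variables does not simply invert the coefficient, because each fit minimises distances along a different axis.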