Renáta Németh, Dávid Simon

ELTE

We want to find the straight line that lies as close as possible to the points representing the data.

One way to do that is the Least Squares Method: we minimise the sum of the squared distances between the points and the line, measured along the dependent variable.

Other criteria could be used, but this one is the most widespread.

The procedure of finding the straight line with the smallest sum of squared differences from the data is called linear regression.
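The least squares criterion can be sketched in a few lines of Python. The three points and the two candidate lines below are made-up values used only for illustration, and `sum_of_squares` is a hypothetical helper name.

```python
def sum_of_squares(points, a, b):
    """Sum of squared vertical distances between the points and the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)


# Made-up points, for illustration only.
points = [(1.0, 2.0), (2.0, 2.5), (3.0, 4.0)]

# Two candidate lines: the one with the smaller value fits the points
# better in the least squares sense.
flat_line = sum_of_squares(points, a=3.0, b=0.0)
rising_line = sum_of_squares(points, a=1.0, b=1.0)
print(flat_line, rising_line)
```

Linear regression looks for the pair of a and b that makes this quantity as small as possible among all straight lines, not just among a few candidates.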

Illustration:

| Year | Unemployment rate (% of the economically active population) | Crime rate (per 100 thousand inhabitants) |
|---|---|---|
| 1999 | 7.0 | 5009 |
| 2000 | 6.4 | 4496 |
| 2001 | 5.7 | 4571 |
| 2002 | 5.8 | 4135 |
| 2003 | 5.9 | 4076 |
| 2004 | 6.1 | 4140 |
| 2005 | 7.2 | 4323 |
| 2006 | 7.5 | 4227 |
| 2007 | 7.4 | 4241 |
| 2008 | 7.8 | 4066 |
| 2009 | 10.0 | 3928 |

Source: KSH and Belügyminisztérium

The red circles are the data; the black line is the regression line, obtained by minimising the sum of the squared distances between the circles and the line, measured along the y axis.

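The procedure can be described by the regression equation as follows:

$$\hat{y} = a + bx$$

where

$a$, $b$ are the regression coefficients

$\hat{y}$ is the regression estimate for the dependent variable

When choosing *a* and *b*, we want to minimise the following:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

which happens if

$$b = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}, \qquad a = \bar{y} - b\,\bar{x}$$

where

$\operatorname{cov}(x, y)$ is the covariance of the two variables (more on this later)

$\operatorname{var}(x)$ is the variance of the independent variable

$\bar{x}$, $\bar{y}$ are the means of the independent and the dependent variable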

For unemployment and crime, we get the following:

a = 4848

b = -79.62
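These two values can be reproduced from the table with a short Python sketch. It assumes the usual closed-form least squares solution (slope = covariance divided by the variance of the independent variable, intercept from the two means); `mean` and `regression_coefficients` are hypothetical helper names, not part of the original material.

```python
# Data from the table above: unemployment rate (%) and
# crime rate (per 100 thousand), 1999-2009.
unemployment = [7.0, 6.4, 5.7, 5.8, 5.9, 6.1, 7.2, 7.5, 7.4, 7.8, 10.0]
crime = [5009, 4496, 4571, 4135, 4076, 4140, 4323, 4227, 4241, 4066, 3928]


def mean(values):
    return sum(values) / len(values)


def regression_coefficients(x, y):
    """Return (a, b) of the least squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # The 1/n factors of the covariance and of the variance cancel in the
    # ratio, so plain sums over the deviations from the means are enough.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = regression_coefficients(unemployment, crime)
print(f"a = {a:.0f}, b = {b:.2f}")
```

After rounding, the printed values should agree with the a and b given above.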

Interpretation:

b means that a 1 percentage point increase in the unemployment rate goes together with an estimated drop of 79.62 (per 100 thousand) in the crime rate

a means that if the unemployment rate were 0, the estimated crime rate would be 4848 per 100 thousand
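Both interpretations can be read off the regression equation directly. The sketch below uses the rounded coefficients reported above; `estimated_crime_rate` is a hypothetical helper name, and the 6% and 7% inputs are example values chosen only for illustration.

```python
A = 4848.0   # intercept a, as reported above
B = -79.62   # slope b, as reported above


def estimated_crime_rate(unemployment_rate):
    """Regression estimate (per 100 thousand) for an unemployment rate given in %."""
    return A + B * unemployment_rate


at_zero = estimated_crime_rate(0.0)   # equals the intercept a
at_six = estimated_crime_rate(6.0)
# One percentage point more unemployment changes the estimate by the slope b
# (up to floating-point rounding).
one_point_change = estimated_crime_rate(7.0) - estimated_crime_rate(6.0)
print(at_zero, at_six, one_point_change)
```

Note that an unemployment rate of 0 lies well outside the observed range (5.7 to 10.0), so the intercept is an extrapolation rather than something seen in the data.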

**Note**: the coefficients of linear regression are asymmetrical indices: they change if the roles of the dependent and the independent variable are swapped.
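The asymmetry can be checked on the unemployment and crime data. A minimal sketch, assuming the closed-form least squares slope; `slope` is a hypothetical helper name. If the coefficients were symmetrical, regressing unemployment on crime would give the reciprocal of the slope obtained when regressing crime on unemployment.

```python
# Data from the table above.
unemployment = [7.0, 6.4, 5.7, 5.8, 5.9, 6.1, 7.2, 7.5, 7.4, 7.8, 10.0]
crime = [5009, 4496, 4571, 4135, 4076, 4140, 4323, 4227, 4241, 4066, 3928]


def slope(x, y):
    """Least squares slope of the regression of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Crime as the dependent variable (the regression discussed above).
b_crime_on_unemployment = slope(unemployment, crime)
# Roles swapped: unemployment as the dependent variable.
b_unemployment_on_crime = slope(crime, unemployment)

# The two numbers printed here would coincide if the index were symmetrical.
print(b_crime_on_unemployment, 1 / b_unemployment_on_crime)
```

The two printed numbers differ considerably, so the choice of dependent variable matters when a regression line is fitted.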