-->

# Simple Linear Regression Analysis - Definition, How to Create a Regression Formula, Explanation, Example

## Definition of Regression Analysis

For a data set containing a numerical variable, descriptive statistics are used to describe the variable, while inferential (inductive) statistical methods are used to make estimates and draw conclusions about it.
When two or more numerical variables are studied together, including the relationship between them, two techniques are used: regression and correlation. In regression analysis, the estimating equation is a mathematical formula that gives the value of the dependent variable from a known value of the independent variable.

## The Purpose of Regression and Correlation Analysis

Regression analysis is used primarily for forecasting. The model contains a dependent (influenced) variable and one or more independent (influencing) variables. For example, take two variables, income and net income. In practice, the relationship between them is studied by looking at the effect of income on net income, so net income is the dependent variable and income is the independent variable.
The correlation method measures the closeness of the relationship between the independent variable and the dependent variable. The regression method, in contrast, deals with prediction (forecasting): in this case, whether future net income can be predicted when income is known.
Regression is commonly divided into simple regression and multiple regression. It is called simple regression when there is only one independent variable, and multiple regression when there is more than one independent variable.
If the relationship between the two variables can be expressed as a mathematical formula, we can use it for forecasting. Mathematical equations that allow us to predict the values of a dependent variable from the values of one or more independent variables are called regression equations. The term was first introduced by Sir Francis Galton (1822-1911).
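
As an illustration of measuring the closeness of a relationship, the sketch below computes the Pearson correlation coefficient, one common choice for this purpose. The function name `pearson_r` and the income figures are hypothetical, chosen only to mirror the income and net income example; they are not data from this article.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two paired sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical figures: income (X) and net income (Y) for five periods
income = [10, 20, 30, 40, 50]
net_income = [2, 4, 5, 4, 5]

r = pearson_r(income, net_income)
print(f"r = {r:.4f}")  # close to +1 or -1 means a strong linear relationship
```

A value of r near 0 suggests little linear relationship, in which case a regression line fitted to the same data would be of limited use for forecasting.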

## Regression Equation

In this article we discuss the problem of predicting or forecasting the value of the dependent variable (Y) based on one independent variable (X). A random sample of size n from a population is denoted by {(xi, yi); i = 1, 2, …, n}. The data are then plotted to produce what is called a scatter diagram. If the points follow a straight line, the two variables are linearly related. If a linear relationship exists and is expressed mathematically as a straight line, the linear regression equation can be written as:

$$\hat{y} = a + bx$$

Note:

a = the intercept, that is, the point where the line crosses the vertical axis (constant).

b = the slope (gradient) of the regression line.

$e_i = y_i - \hat{y}_i$ = the difference between the actual observed y value and the estimated value produced by the regression line for a given x value.
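
To make the notation concrete, here is a minimal sketch that evaluates a straight line of the form ŷ = a + bx and the deviation of each observation from it. The helper names `predict` and `residuals`, the sample data, and the coefficient values are all illustrative assumptions, not values from this article.

```python
def predict(a, b, x):
    """Estimated value on the regression line for a given x."""
    return a + b * x


def residuals(a, b, xs, ys):
    """Observed y minus estimated y, for each paired observation."""
    return [y - predict(a, b, x) for x, y in zip(xs, ys)]


# Hypothetical sample of n = 5 paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Illustrative coefficients (assumed here, not derived in this snippet)
a, b = 2.2, 0.6

fitted = [predict(a, b, x) for x in xs]
errors = residuals(a, b, xs, ys)

for x, y, y_hat, e in zip(xs, ys, fitted, errors):
    print(f"x={x}  y={y}  y_hat={y_hat:.1f}  e={e:+.1f}")
```

Positive deviations belong to points above the line and negative deviations to points below it.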

## Deriving the Intercept (a) and Slope (b) Formulas from the Regression Equation

Next, we face the problem of obtaining formulas to estimate a and b from the sample. For this, the least squares procedure is used. Among all possible straight lines that can be drawn through a scatter diagram, the least squares method chooses the regression line that makes the sum of the squared vertical deviations from the observation points to the line as small as possible.
If $e_i$ denotes the vertical deviation from the i-th point to the regression line, the least squares method produces formulas for a and b that minimize the sum of the squares of these deviations. This sum is called the sum of squared errors around the regression line and is denoted SSE. So, given a set of paired data {(xi, yi); i = 1, 2, …, n}, we must determine a and b so as to minimize:

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
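
The least squares criterion can be sketched numerically: for each candidate line, square the vertical deviation of every point and add the squares up. The function name `sse`, the data, and the three candidate lines below are hypothetical and serve only to show that different lines give different sums.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical deviations from the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Three candidate lines, each given as (a, b); chosen only for illustration
candidates = [(2.2, 0.6), (2.0, 0.6), (2.2, 0.7)]

for a, b in candidates:
    print(f"a={a}, b={b}: SSE = {sse(a, b, xs, ys):.2f}")

# Least squares prefers the candidate with the smallest SSE
best = min(candidates, key=lambda ab: sse(ab[0], ab[1], xs, ys))
print("smallest SSE among the candidates:", best)
```

Trying candidates by hand only compares a few lines; the derivation that follows finds the minimizing a and b directly.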

### The requirements that must be fulfilled when using simple linear regression are:

1. The number of observations must be the same for both variables (the data are paired).
2. The number of independent variables (X) is 1 (one).
3. Residual values must be normally distributed.
4. There is a linear relationship between the independent variable (X) and the dependent variable (Y).
5. There are no symptoms of heteroscedasticity.
6. There are no symptoms of autocorrelation (for time series data).
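
As one possible way to screen for the autocorrelation requirement in the list above, the sketch below computes the Durbin-Watson statistic from a list of residuals ordered in time. This diagnostic is not prescribed by this article; it is a commonly used check, and the function name `durbin_watson` and the residual values are assumptions for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    return numerator / denominator


# Hypothetical residuals from a fitted line, in time order
e = [-0.8, 0.6, 1.0, -0.6, -0.2]

dw = durbin_watson(e)
print(f"Durbin-Watson = {dw:.3f}")
```

The statistic ranges from 0 to 4. Values near 2 are usually read as no first-order autocorrelation, while values well below or above 2 point to positive or negative autocorrelation respectively.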
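### Partial Derivation

Before taking the next step, we need the following properties of sigma notation, where c is a constant:

$$\sum_{i=1}^{n} c = nc, \qquad \sum_{i=1}^{n} c\,x_i = c\sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$$

To make the sum of squared errors $SSE = \sum (y_i - a - b x_i)^2$ as small as possible, the coefficients a and b must be chosen so that the partial derivatives of SSE with respect to a and b are zero:

$$\frac{\partial SSE}{\partial a} = -2\sum_{i=1}^{n} (y_i - a - b x_i) = 0$$

$$\frac{\partial SSE}{\partial b} = -2\sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0$$

Dividing by -2 and applying the sigma properties, the equations become:

$$\sum y_i - na - b\sum x_i = 0$$

$$\sum x_i y_i - a\sum x_i - b\sum x_i^2 = 0$$

This gives the two equations needed to find a and b:

$$na + b\sum x_i = \sum y_i \tag{1}$$

$$a\sum x_i + b\sum x_i^2 = \sum x_i y_i \tag{2}$$

Solving these two equations for a and b by substitution and elimination gives the following results.

The best slope b is

$$b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$

The best intercept a is

$$a = \frac{\sum y_i - b\sum x_i}{n} = \bar{y} - b\bar{x} \tag{3}$$

Substituting b into equation (3) gives

$$a = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$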
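
The least squares solution can be sketched in a few lines of Python. The snippet assumes the standard closed-form results for a straight line ŷ = a + bx, namely b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = (Σy − bΣx) / n. The function name `fit_simple_linear_regression` and the data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a, b), the least squares intercept and slope for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    # The four sums that appear in the closed-form formulas
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = (sum_y - b * sum_x) / n                     # intercept
    return a, b


# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_simple_linear_regression(xs, ys)
print(f"y_hat = {a:.2f} + {b:.2f} x")

# Forecast for a new x value using the fitted line
x_new = 6
print(f"forecast at x = {x_new}: {a + b * x_new:.2f}")
```

Computing the intercept from the slope, rather than from its own fully expanded formula, keeps the code short and gives the same result.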