Simple Linear Regression Analysis - Definition, How to Create a regression formula, Explanation, Example
Definition of Regression Analysis
A data set often contains numerical variables. Descriptive statistics are used to describe such variables, while inferential (inductive) statistical methods are used to make estimates and draw conclusions about them.
When two or more numerical variables are studied together, including the relationship between them, two techniques are used: Regression and Correlation. In regression analysis, the estimating equation is a mathematical formula that gives the value of the dependent variable from a known value of the independent variable.
The Purpose of Regression and Correlation Analysis
Regression analysis is used primarily for forecasting. The model contains a dependent variable (the variable being influenced) and one or more independent variables (the free, or influencing, variables).
For example, take two variables, income and net income. In practice, the relationship between them is studied by looking at the effect of income on net income. Net income is therefore the dependent variable and income is the independent variable.
The Correlation Method measures the closeness of the relationship between the independent variable and the dependent variable. The regression method, by contrast, deals with prediction (forecasting): in this case, whether future net income can be predicted when income is known.
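As a rough illustration of measuring the closeness of a relationship, the sketch below computes a correlation coefficient for paired data. It assumes the Pearson coefficient is the measure meant here, which the article does not state. The function name `pearson_r` and the income figures are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of the paired samples xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must be paired and hold at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical figures in arbitrary units, for illustration only.
income = [10, 20, 30, 40, 50]
net_income = [2, 4, 5, 4, 5]

r = pearson_r(income, net_income)
print(round(r, 3))
```

A value near +1 or -1 indicates a close linear relationship, while a value near 0 indicates a weak one.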
Regression is usually divided into simple regression and multiple regression. It is called Simple Regression if there is only one independent variable, and Multiple Regression if there is more than one independent variable.
If the relationship between the two variables can be expressed as a mathematical formula, then we can use it for forecasting. Mathematical equations that allow us to predict the values of a dependent variable from the values of one or more independent variables are called regression equations. The term was first introduced by Sir Francis Galton (1822-1911).
Regression Equation
In this article we discuss the problem of predicting or forecasting the value of the dependent variable (Y) based on one independent variable (X). A random sample of size n from a population is denoted by {(xi, yi); i = 1, 2, …, n}. The data is then plotted to produce what is called a scatter diagram. If the points follow a straight line, the two variables are linearly related. If a linear relationship exists, it can be expressed mathematically by a straight-line equation, the linear regression equation:

$$\hat{y} = a + bx$$
Note:
a = the intercept, the point where the line crosses the vertical axis (the constant).
b = the slope or gradient.
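To make the roles of a and b concrete, here is a minimal Python sketch that evaluates a straight line with intercept a and slope b. The function name `predict` and the coefficient values are hypothetical and chosen only for illustration.

```python
def predict(a, b, x):
    """Value of the straight line with intercept a and slope b at x."""
    return a + b * x


# Hypothetical coefficients, for illustration only.
a, b = 2.0, 0.5

print(predict(a, b, 0))                       # at x = 0 the line returns the intercept a
print(predict(a, b, 11) - predict(a, b, 10))  # a one-unit step in x changes y by the slope b
```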
How to Derive the Intercept (a) and Slope (b) Formulas of the Regression Equation
Next, we face the problem of how to obtain formulas for estimating a and b from the sample. For this, the Least Squares procedure is used. Among all possible straight lines that can be drawn on the scatter diagram, the least squares method chooses the regression line that makes the sum of the squared vertical deviations from the observation points to the line as small as possible.
If ei denotes the vertical deviation from the i-th point to the regression line, then the least squares method produces formulas for a and b that minimize the sum of the squares of all these deviations. This sum is called the sum of squared errors around the regression line, denoted SSE.
So, given a set of paired data {(xi, yi); i = 1, 2, …, n}, we must determine a and b so as to minimize:

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
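The quantity being minimized can be computed directly for any candidate line. The sketch below assumes SSE is the sum of squared vertical deviations between each observed y and the line a + b·x. The function name `sse` and the data are hypothetical.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical deviations of the points from the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Compare two candidate lines: the one with the smaller SSE fits the points better.
print(sse(2.2, 0.6, xs, ys))
print(sse(1.0, 1.0, xs, ys))
```

The least squares method looks for the pair (a, b) that gives the smallest such value among all possible lines.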
The assumptions that must be satisfied when using simple linear regression are:
- X and Y must have the same number of observations, so that every x value is paired with a y value.
- The number of independent variables (X) is 1 (one).
- Residual values must be normally distributed.
- There is a linear relationship between the independent variable (X) and the dependent variable (Y).
- There are no symptoms of heteroscedasticity.
- There are no symptoms of autocorrelation (for time series data).
Partial Derivation:
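Before taking the next step, we need the following properties of sigma notation, where c is a constant:

$$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i, \qquad \sum_{i=1}^{n} c\,x_i = c \sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} c = nc$$

To make the sum of squared errors as small as possible, a and b must be chosen so that the partial derivatives of the sum of squares with respect to a and b are zero:

$$\frac{\partial}{\partial a} \sum_{i=1}^{n} (y_i - a - b x_i)^2 = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0$$

$$\frac{\partial}{\partial b} \sum_{i=1}^{n} (y_i - a - b x_i)^2 = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0$$

Then the equations become,

$$\sum_{i=1}^{n} y_i - na - b \sum_{i=1}^{n} x_i = 0$$

$$\sum_{i=1}^{n} x_i y_i - a \sum_{i=1}^{n} x_i - b \sum_{i=1}^{n} x_i^2 = 0$$

Two equations are obtained to find the a and b values we need, namely:

$$na + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \qquad (1)$$

$$a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \qquad (2)$$

Solving the two equations above for a and b by substitution and elimination gives the following.

The best slope b is

$$b = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}$$

The best intercept a is

$$a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n} = \bar{y} - b\bar{x} \qquad (3)$$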
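Substituting b into equation (3), the result is

$$a = \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}$$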
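The outcome of this derivation can be checked numerically. The following is a minimal Python sketch, assuming the standard least squares closed forms for the slope and intercept (written out in the comments). The function name `fit_line` and the data are hypothetical and serve only as a worked example.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for the paired data (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must be paired and hold at least two points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    # Slope: b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    b = (n * sum_xy - sum_x * sum_y) / denom
    # Intercept: a = (Sy - b*Sx) / n, i.e. mean(y) - b * mean(x)
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(round(a, 4), round(b, 4))

# The intercept written without b, (Sy*Sxx - Sx*Sxy) / (n*Sxx - Sx^2), should agree.
n = len(xs)
a_direct = (sum(ys) * sum(x * x for x in xs)
            - sum(xs) * sum(x * y for x, y in zip(xs, ys))) / (n * sum(x * x for x in xs) - sum(xs) ** 2)
print(round(a_direct, 4))
```

With these coefficients, a forecast for a new x value is obtained as a + b·x.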