Learning to Solve Least Squares Curve Method by Using Spreadsheets

Sami M. Khayat, PhD, and Craig N. Refugio, PhD
Negros Oriental State University, Dumaguete City, Philippines

Abstract

Least squares curve (or line) fitting is a mathematical procedure for finding the best-fitting curve for a given set of points by minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve or line. The sum of the squares of the offsets is used instead of the offsets’ absolute values because this allows the residuals to be treated as a continuous differentiable quantity. However, because squares of the offsets are used, outlying points can have a disproportionate effect on the fit, a property that may or may not be desirable depending on the problem at hand. In this paper, all of the aforementioned quantities are calculated using spreadsheets, and the procedures are presented step by step.
INTRODUCTION

This study was conducted with a group of 25 Bachelor of Secondary Education students majoring in Mathematics during school year 2011-2012 at the College of Education, Main Campus 1, Negros Oriental State University, Philippines, using a one-group pretest-posttest design. The study aimed to use spreadsheets to solve least squares problems and to determine whether students would gain significant knowledge of least squares by using spreadsheets. We conceptualized and developed the teaching of least squares through spreadsheets in five parts: linear, exponential, power law, logarithmic, and applications. In discussing each part, the manual computations using the different rules were emphasized first, before the spreadsheets were used, so that students would appreciate the “software” in doing the laborious and often repetitive tasks.
LEAST SQUARES FITTING

Least squares fitting is a mathematical procedure for finding the best-fitting curve for a given set of points by minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve. The sum of the squares of the offsets is used instead of the offsets’ absolute values because this allows the residuals to be treated as a continuous differentiable quantity. However, because squares of the offsets are used, outlying points can have a disproportionate effect on the fit, a property that may or may not be desirable depending on the problem at hand. The following are the different types of least squares fitting:
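I. Least Squares Fitting - Linear

[Figure 1: plot with x-axis and y-axis]

The model is $y = a + bx$. In matrix form, the best-fit values satisfy

$$\begin{bmatrix} N & \sum x \\ \sum x & \sum x^2 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum y \\ \sum xy \end{bmatrix} \qquad (1)$$

or

$$a N + b \sum x = \sum y \qquad (2)$$

$$a \sum x + b \sum x^2 = \sum xy \qquad (3)$$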
Here N is the total number of points. Solve equations (2) and (3) to obtain the values of the constants a and b. Note: the values of x and y are given.

Example 1. Given the following points on the plane, find the best line using least squares fitting.

x:   10   20   30   40   50   60
y:    5   10   15   20   25   30

Using MS Excel, we obtain an intercept (a) equal to zero and a slope (b) equal to 0.5. The tables and figures below show the functions.

Figure 2
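As a cross-check outside the spreadsheet, equations (2) and (3) can also be solved directly. The following is a minimal Python sketch, not the authors' Excel procedure; the function name `fit_line` and the variable names are introduced here for illustration only. It builds the sums N, ∑x, ∑y, ∑x² and ∑xy and solves the 2x2 system with Cramer's rule.

```python
def fit_line(xs, ys):
    """Solve normal equations (2) and (3) for intercept a and slope b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of the coefficient matrix in equation (1).
    det = n * sum_x2 - sum_x * sum_x
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Data of Example 1.
xs = [10, 20, 30, 40, 50, 60]
ys = [5, 10, 15, 20, 25, 30]

a, b = fit_line(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```

For the data of Example 1 this reproduces the intercept of zero and the slope of 0.5 obtained with the spreadsheet.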
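II. Least Squares Fitting - Exponential

Figure 3

To fit a functional form

$$y = a e^{bx} \qquad (4)$$

take the logarithm of both sides:

$$\ln y = \ln a + b x \qquad (5)$$

The best-fit values are then

$$\begin{bmatrix} N & \sum x \\ \sum x & \sum x^2 \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} \sum \ln y \\ \sum x \ln y \end{bmatrix}$$

Solving for a and b,

$$A N + B \sum x = \sum \ln y \qquad (6)$$

$$A \sum x + B \sum x^2 = \sum x \ln y \qquad (7)$$

where $b \equiv B$ and $a \equiv \exp(A)$.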
Example 2. Least Squares Fitting - Exponential

Given the following data, fit $y = a e^{bx}$.

x:   10      20       30       40        50        60
y:   5.437   14.778   40.171   109.196   296.826   806.858

n    x     y          ln(y)     x^2    x.ln(y)   ycal = a.e^(b.x)
1    10    5.437      1.6932    100    16.93     5.43656366
2    20    14.778     2.6931    400    53.86     14.7781122
3    30    40.171     3.6931    900    110.79    40.1710738
4    40    109.196    4.6931    1600   187.73    109.1963
5    50    296.826    5.6931    2500   284.66    296.826318
6    60    806.858    6.6931    3600   401.59    806.857587
∑    210   1273.266   25.1589   9100   1055.56

Note: the values of x and y are given.

Answer: ln(a) = 0.693147, a = 2, b = 0.1
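The columns of the table above map directly onto a short program. The sketch below is an illustrative Python version of the same steps (the variable names are assumptions made here, not taken from the spreadsheet): it takes ln(y), forms the sums that appear in equations (6) and (7), solves for A and B, and converts back with a = exp(A).

```python
import math

# Data of Example 2.
xs = [10, 20, 30, 40, 50, 60]
ys = [5.437, 14.778, 40.171, 109.196, 296.826, 806.858]

n = len(xs)
ln_ys = [math.log(y) for y in ys]          # column ln(y)

sum_x = sum(xs)                            # sum of x
sum_ln_y = sum(ln_ys)                      # sum of ln(y)
sum_x2 = sum(x * x for x in xs)            # sum of x^2
sum_x_ln_y = sum(x * ly for x, ly in zip(xs, ln_ys))  # sum of x.ln(y)

# Solve equations (6) and (7) for A = ln(a) and B = b.
det = n * sum_x2 - sum_x * sum_x
A = (sum_ln_y * sum_x2 - sum_x * sum_x_ln_y) / det
B = (n * sum_x_ln_y - sum_x * sum_ln_y) / det

a = math.exp(A)
b = B
print(f"ln(a) = {A:.6f}, a = {a:.4f}, b = {b:.4f}")

# Column ycal: fitted values from the recovered constants.
y_cal = [a * math.exp(b * x) for x in xs]
```

Because the y values are rounded to three decimals, the recovered constants agree with a = 2 and b = 0.1 only to within rounding.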
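III. Least Squares Fitting - Power Law

Given a function of the form

$$y = a x^{b} \qquad (8)$$

taking the logarithm of both sides gives

$$\ln y = \ln a + b \ln x \qquad (9)$$

Least squares fitting gives the coefficients as

$$\begin{bmatrix} N & \sum \ln x \\ \sum \ln x & \sum (\ln x)^2 \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} \sum \ln y \\ \sum \ln x \, \ln y \end{bmatrix}$$

Solving for a and b,

$$A N + B \sum \ln x = \sum \ln y \qquad (10)$$

$$A \sum \ln x + B \sum (\ln x)^2 = \sum \ln x \, \ln y \qquad (11)$$

where $b \equiv B$ and $a \equiv \exp(A)$.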
Example 3. Least Squares Fitting - Power Law

x:   1   2    3    4    5    6
y:   3   12   27   48   75   108

n    x    y        ln(y)     ln(x)    ln(x).ln(y)   ln(x).ln(x)   ycal = a.x^b
1    1    3.00     1.0986    0.0000   0.0000        0.0000        3.00
2    2    12.00    2.4849    0.6931   1.7224        0.4805        12.00
3    3    27.00    3.2958    1.0986   3.6208        1.2069        27.00
4    4    48.00    3.8712    1.3863   5.3666        1.9218        48.00
5    5    75.00    4.3175    1.6094   6.9487        2.5903        75.00
6    6    108.00   4.6821    1.7918   8.3893        3.2104        108.00
∑    21   273.00   19.7502   6.5793   26.0479       9.4099

Note: the values of x and y are given.

Solving using MS Excel:

Answer: ln(a) = 1.098612, a = 3, b = 2
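The power-law case differs from the exponential one only in that both x and y are log-transformed before the linear fit. A minimal Python sketch of equations (10) and (11) follows; the helper name `fit_power_law` is hypothetical and chosen here for illustration.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y); returns (a, b)."""
    n = len(xs)
    ln_xs = [math.log(x) for x in xs]
    ln_ys = [math.log(y) for y in ys]

    sum_lx = sum(ln_xs)                                   # sum of ln(x)
    sum_ly = sum(ln_ys)                                   # sum of ln(y)
    sum_lx2 = sum(lx * lx for lx in ln_xs)                # sum of ln(x).ln(x)
    sum_lxly = sum(lx * ly for lx, ly in zip(ln_xs, ln_ys))  # sum of ln(x).ln(y)

    # Equations (10) and (11), solved for A = ln(a) and B = b.
    det = n * sum_lx2 - sum_lx * sum_lx
    A = (sum_ly * sum_lx2 - sum_lx * sum_lxly) / det
    B = (n * sum_lxly - sum_lx * sum_ly) / det
    return math.exp(A), B


# Data of Example 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 12, 27, 48, 75, 108]

a, b = fit_power_law(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```

Since the data of Example 3 lie exactly on y = 3x², the fit returns a = 3 and b = 2 up to floating-point error. Note that this transform requires every x and y to be positive.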
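IV. Least Squares Fitting - Logarithmic

Given a function of the form

$$y = a + b \ln x \qquad (12)$$

the coefficients can be found from least squares fitting as

$$\begin{bmatrix} N & \sum \ln x \\ \sum \ln x & \sum (\ln x)^2 \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} \sum y \\ \sum y \ln x \end{bmatrix}$$

Solving for a and b,

$$A N + B \sum \ln x = \sum y \qquad (13)$$

$$A \sum \ln x + B \sum (\ln x)^2 = \sum y \ln x \qquad (14)$$

where $b \equiv B$ and $a \equiv A$.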
Example 4. Least Squares Fitting - Logarithmic, $y = a + b \ln x$

n    x     y         ln(x)     y.ln(x)    ln(x).ln(x)   ycal = a + b.ln(x)
1    10    9.6052    2.3026    22.1168    5.3019        9.6052
2    20    10.9915   2.9957    32.9276    8.9744        10.9915
3    30    11.8024   3.4012    40.1423    11.5681       11.8024
4    40    12.3778   3.6889    45.6602    13.6078       12.3778
5    50    12.8240   3.9120    50.1678    15.3039       12.8240
6    60    13.1887   4.0943    53.9991    16.7637       13.1887
∑    210   70.7896   20.3948   245.0138   71.5199

Note: the values of x and y were given.

Solving using MS Excel:

Answer: a = 5 and b = 2
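For the logarithmic model only x is transformed, so no back-transformation of the intercept is needed. The following Python sketch of equations (13) and (14) is illustrative only; the variable names are assumptions made here.

```python
import math

# Data of Example 4.
xs = [10, 20, 30, 40, 50, 60]
ys = [9.6052, 10.9915, 11.8024, 12.3778, 12.8240, 13.1887]

n = len(xs)
ln_xs = [math.log(x) for x in xs]                      # column ln(x)

sum_lx = sum(ln_xs)                                    # sum of ln(x)
sum_y = sum(ys)                                        # sum of y
sum_lx2 = sum(lx * lx for lx in ln_xs)                 # sum of ln(x).ln(x)
sum_y_lx = sum(y * lx for y, lx in zip(ys, ln_xs))     # sum of y.ln(x)

# Equations (13) and (14): here a = A and b = B directly.
det = n * sum_lx2 - sum_lx * sum_lx
a = (sum_y * sum_lx2 - sum_lx * sum_y_lx) / det
b = (n * sum_y_lx - sum_lx * sum_y) / det
print(f"a = {a:.4f}, b = {b:.4f}")

# Column ycal: fitted values a + b.ln(x).
y_cal = [a + b * lx for lx in ln_xs]
```

With y given to four decimals, the result agrees with a = 5 and b = 2 to within rounding.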
V. Application

Example from Chemistry: First-Order Reaction

The following data were collected for a first-order chemical reaction at constant temperature.

n     Time (min)   [A]t
1     0            2.719
2     1            2.612
3     2            2.586
4     3            2.509
5     4            2.459
6     10           2.138
7     15           1.855
8     20           1.664
9     25           1.448
10    30           1.276

For a first-order reaction,

$$[A]_t = [A]_0 \, e^{-kt} \qquad (15)$$

This shows that the amount at any time t follows a negative exponential function of time. The initial amount, [A]0, is constant for a given experiment. A negative exponential function has its maximum value at t = 0 and declines monotonically and asymptotically toward zero. The figure below shows the general trend for [A]t over time. The speed with which the amount [A]t approaches zero is dictated by the rate constant k. Another way to write equation (15) is:

$$\ln [A]_t = \ln [A]_0 - kt \qquad (16)$$

[Figure: ln[A]t plotted against Time (Min.), with a linear trendline]

The plot above shows the first-order reaction, whose slope is -k. Using MS Excel, we obtain slope = -0.02496, that is, k = 0.02496 per minute.
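Equation (16) is a straight line in t, so the linear fit of Part I applies to the pairs (t, ln[A]t). The sketch below is an illustrative Python version, not the spreadsheet itself; the names `times`, `conc`, `k` and `a0` are assumptions introduced here. The slope it computes should come out close to the value reported above.

```python
import math

# Data of the first-order reaction example.
times = [0, 1, 2, 3, 4, 10, 15, 20, 25, 30]                  # minutes
conc = [2.719, 2.612, 2.586, 2.509, 2.459,
        2.138, 1.855, 1.664, 1.448, 1.276]                   # [A]t

n = len(times)
ln_conc = [math.log(c) for c in conc]

sum_t = sum(times)
sum_ln = sum(ln_conc)
sum_t2 = sum(t * t for t in times)
sum_t_ln = sum(t * lc for t, lc in zip(times, ln_conc))

# Straight-line fit of ln[A]t against t, as in equations (2) and (3).
det = n * sum_t2 - sum_t * sum_t
intercept = (sum_ln * sum_t2 - sum_t * sum_t_ln) / det
slope = (n * sum_t_ln - sum_t * sum_ln) / det

k = -slope                  # rate constant, per minute (slope = -k)
a0 = math.exp(intercept)    # fitted initial amount [A]0

print(f"slope = {slope:.5f}, k = {k:.5f} per min, [A]0 = {a0:.3f}")
```

The fitted [A]0 is an estimate from the regression line and need not equal the first measured value exactly.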
RESULTS

Before formal instruction began, a 10-item Likert-type pretest (0 - Has No Knowledge, 1 - Has Basic Knowledge, and 2 - Has Advanced Knowledge) was administered to find out whether the students had prior knowledge of the five aforementioned parts of the course. At the end of the course, a posttest (the same items as the pretest, randomly reordered) was administered. For both tests, students gave their answers in written form, and these answers were cross-checked during hands-on activities. We matched the students' written ratings against the ratings that we gave during the pretest and posttest hands-on activities. The results indicated perfect matching between the two types of ratings. Table 1.0 shows the knowledge levels of the 25 subjects of this study.
Table 1.0 Pretest and Posttest Knowledge Levels in Least Squares Curve through Spreadsheets (n=25)

Type of Test   Mean   Standard Deviation   Description
Pretest        0.72   0.46                 Has Basic Knowledge
Posttest       1.92   0.28                 Has Advanced Knowledge

Legend:
0.00-0.66   Has No Knowledge
0.67-1.33   Has Basic Knowledge
1.34-2.00   Has Advanced Knowledge
As reflected in Table 1.0, the pretest mean score (0.72) indicated that the 25 subjects of the study had basic knowledge of the five parts of the course, from linear fitting to applications. This was substantiated when we let the students browse and operate least squares using spreadsheets in accordance with the five parts of the course: seventeen out of 25 demonstrated basic knowledge and the remaining 8 had no knowledge at all. The posttest mean score (1.92), however, revealed that by the end of the course the students had gained advanced knowledge. This was further substantiated when we let the students perform tasks according to the five parts of the course: twenty-three out of 25 demonstrated advanced knowledge, while the remaining two showed basic knowledge. The standard deviations showed that the pretest scores were more variable than the posttest scores. To determine the significance of the difference between the pretest and posttest mean scores, a dependent t-test was performed; the results are shown in Table 2.0.
Table 2.0 Test of Difference Between the Pretest and Posttest Knowledge Level in Least Squares Curve through Spreadsheets (n=25)

Type of Test   n    Mean   Standard Deviation
Pretest        25   0.72   0.46
Posttest       25   1.92   0.28

Mean Difference:      -1.20
Computed t:           -14.70
Degrees of Freedom:   24
p-value at α=0.05:    0.000
Interpretation:       Significant
A dependent t-test comparing the mean scores of the pretest and posttest found a significant difference between the two means (t(24) = -14.70, p < 0.05). The mean of the pretest (M = 0.72, SD = 0.46) was significantly lower than the mean of the posttest (M = 1.92, SD = 0.28). This means that the 25 subjects gained significant knowledge of least squares curve fitting through spreadsheets, moving from the basic to the advanced knowledge level, at the 5% level of significance.
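The per-student score differences are not listed, so the t statistic cannot be recomputed from raw data here. Assuming the usual dependent (paired) t formula, t = d̄ / (s_d / √n) with n - 1 degrees of freedom, the reported values can at least be checked for internal consistency by backing out the implied standard deviation of the differences. The Python sketch below does only that arithmetic; `sd_diff` is a derived quantity under this assumption, not a figure reported in the study.

```python
import math

# Reported summary values.
n = 25
mean_pre = 0.72
mean_post = 1.92
t_reported = -14.70

mean_diff = mean_pre - mean_post          # pretest minus posttest, about -1.20
df = n - 1                                # degrees of freedom

# Paired t formula rearranged: s_d = mean_diff * sqrt(n) / t.
se_diff = mean_diff / t_reported          # implied standard error of the mean difference
sd_diff = se_diff * math.sqrt(n)          # implied standard deviation of the differences

print(f"mean difference = {mean_diff:.2f}, df = {df}")
print(f"implied SE = {se_diff:.4f}, implied SD of differences = {sd_diff:.4f}")
```

An implied standard deviation of roughly 0.41 is plausible for differences between scores on a 0 to 2 scale, so the reported mean difference, t value and degrees of freedom are mutually consistent under this formula.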