# Discrete Least Squares

Suppose you are given the data points \(x = \{x_0, x_1, x_2, \dots, x_n\}^T\) and the function values \(f = \{f_0, f_1, f_2, \dots, f_n\}^T\), where \(x_i > 0\) for all \(i = 0, 1, 2, \dots, n\).

a) For some reason, you think that \(h(x) = a + bx + c\,e^{\arccos(x)} + d\sin(\cos(T_{23}(x)))\), where \(T_{23}(x)\) is the Chebyshev polynomial of degree 23, is a great model for the data set. Find the normal equations whose solution defines the best fit, in the least squares sense, for this model.

b) It turns out (in this fictional setting) that the basis functions \(\{\phi_0(x), \phi_1(x), \phi_2(x), \phi_3(x)\} = \{1, x, e^{\arccos(x)}, \sin(\cos(T_{23}(x)))\}\) are orthogonal with respect to summation over the nodes \(x\) against the weight function \(w(x) = 1\). This should help you express the coefficients \(\{a, b, c, d\}\) in a simpler way than in part (a).

#### Solution Preview


a) The equations we would like to solve are

\[h(x_i)=f_i \quad \text{for all } i=0,\dots,n\]

or

\[a + bx_i + ce^{\arccos(x_i)} + d\sin(\cos(T_{23}(x_i)))=f_i,\]

but a solution may not exist because the system is overdetermined.

Let \(A\) be the matrix whose \(i\)th row is \((1, x_i, e^{\arccos(x_i)}, \sin(\cos(T_{23}(x_i))))\) (where the rows are counted from \(i=0\) to \(i=n\)). Note that the \(j\)th column of \(A\) is the vector \((\phi_{j-1}(x_0),\dots,\phi_{j-1}(x_n))^T\), \(j=1,2,3,4\).

If we call \(\mathbf{x}=(a,b,c,d)^T\) and \(\mathbf{b}=(f_0,\dots,f_n)^T\) (also called \(f\) above), then we want to solve

\[A\mathbf{x}=\mathbf{b}.\]

The above linear system may not have a solution, as it is overdetermined when \(n \geq 4\).

The least squares method finds \(\hat{\mathbf{x}}\) such that the residual \(\mathbf{r}:=\mathbf{b}-A\hat{\mathbf{x}}\) has the minimum norm \(\|\mathbf{r}\|=\sqrt{r_0^2+\dots+r_n^2}\). The normal equations in this general setting are

\[(A^TA)\hat{\mathbf{x}}=A^T\mathbf{b}.\]

(See below for a derivation of the normal equations in the general setting.)
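As a minimal sketch of the matrix form above, the following Python code builds \(A\) for this model and solves \((A^TA)\hat{\mathbf{x}}=A^T\mathbf{b}\) with NumPy. The function names `design_matrix` and `solve_normal_equations`, the sample nodes, and the coefficients used to generate the data are assumptions made for illustration. The sketch also assumes \(0 < x_i \leq 1\), so that \(\arccos(x_i)\) is real and the identity \(T_{23}(x)=\cos(23\arccos(x))\) can be used.

```python
import numpy as np

def design_matrix(x):
    """Build A with rows (1, x_i, exp(arccos(x_i)), sin(cos(T_23(x_i))))."""
    x = np.asarray(x, dtype=float)
    # For |x| <= 1 the Chebyshev polynomial satisfies T_n(x) = cos(n * arccos(x)).
    t23 = np.cos(23.0 * np.arccos(x))
    return np.column_stack([
        np.ones_like(x),
        x,
        np.exp(np.arccos(x)),
        np.sin(np.cos(t23)),
    ])

def solve_normal_equations(x, f):
    """Solve (A^T A) x_hat = A^T b for x_hat = (a, b, c, d)."""
    A = design_matrix(x)
    b = np.asarray(f, dtype=float)
    return np.linalg.solve(A.T @ A, A.T @ b)

# Hypothetical data: nodes in (0, 1) and values generated from known coefficients.
nodes = np.linspace(0.05, 0.95, 40)
true_coeffs = np.array([1.0, -2.0, 0.5, 3.0])
values = design_matrix(nodes) @ true_coeffs

coeffs = solve_normal_equations(nodes, values)
# Cross-check against NumPy's least squares solver, which avoids forming A^T A.
coeffs_lstsq = np.linalg.lstsq(design_matrix(nodes), values, rcond=None)[0]
print(coeffs, coeffs_lstsq)
```

Forming \(A^TA\) explicitly squares the condition number of \(A\), so for badly conditioned data a solver such as `np.linalg.lstsq` is usually the safer choice; the normal equations are used here only because they are what the problem asks for.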

In components:

\[\sum_{i=0}^{n}\sum_{k=1}^{4} A_{ij}A_{ik}\hat{\mathbf{x}}_k=\sum_{i=0}^{n} A_{ij}b_i \quad (j=1,\dots,4).\]

In this particular case

\[A_{i,1}=1,\quad A_{i,2}=x_i,\quad A_{i,3}=e^{\arccos(x_i)},\quad A_{i,4}=\sin(\cos(T_{23}(x_i))),\]

and \(b_i=f_i\),

so the normal equations are

\[\sum_{i=0}^{n}\sum_{k=1}^{4} A_{ij}A_{ik}\hat{\mathbf{x}}_k=\sum_{i=0}^{n} A_{ij}f_i \quad (j=1,\dots,4),\]

where the unknown is the vector

\[\hat{\mathbf{x}}=(\hat{\mathbf{x}}_1,\hat{\mathbf{x}}_2,\hat{\mathbf{x}}_3,\hat{\mathbf{x}}_4)^T.\]
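The component form can be checked numerically by accumulating the double sum entry by entry and comparing it with the matrix products \(A^TA\) and \(A^T\mathbf{b}\). The sketch below does this in Python; the sample nodes and values are hypothetical, the nodes are assumed to lie in \((0,1]\) so that \(\arccos\) is defined, and the names `basis`, `G`, and `rhs` are chosen here for illustration. Python indices run from 0, so `j` in the code corresponds to \(j+1\) in the equations.

```python
import numpy as np

# Hypothetical sample data; nodes lie in (0, 1] so arccos is defined.
x = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
f = np.array([2.0, 1.0, 0.5, 1.5, 3.0])

# Basis functions phi_0, ..., phi_3; T_23(t) = cos(23 * arccos(t)) for |t| <= 1.
basis = [
    lambda t: 1.0,
    lambda t: t,
    lambda t: np.exp(np.arccos(t)),
    lambda t: np.sin(np.cos(np.cos(23.0 * np.arccos(t)))),
]

m = len(basis)
G = np.zeros((m, m))   # G[j, k] = sum_i A_ij * A_ik
rhs = np.zeros(m)      # rhs[j] = sum_i A_ij * f_i
for j in range(m):
    for i in range(len(x)):
        rhs[j] += basis[j](x[i]) * f[i]
        for k in range(m):
            G[j, k] += basis[j](x[i]) * basis[k](x[i])

# The same quantities in matrix form.
A = np.array([[phi(t) for phi in basis] for t in x])
print(np.allclose(G, A.T @ A), np.allclose(rhs, A.T @ f))
```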

Let us look at the explicit form of the previous equations, starting with the right-hand side (RHS).

For \(j=1\) the RHS is: \(\sum_{i=0}^{n} A_{i1}f_i=f_0+f_1+\dots+f_n\)

For \(j=2\) the RHS is: \(\sum_{i=0}^{n} A_{i2}f_i=x_0f_0+x_1f_1+\dots+x_nf_n\)

For \(j=3\) the RHS is: \(\sum_{i=0}^{n}\) ...

#### Solution Summary

Given a data set and a model for the data depending on four unknown coefficients, we find expressions for the unknown coefficients that give the best fit, in the least squares sense, for the model.

Then, using the assumption of orthogonality with respect to summation over the nodes of the data, we find simpler forms for the expressions obtained before.
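As a sketch of why the orthogonality assumption helps, using the matrix \(A\) from part (a): orthogonality over the nodes with weight \(w(x)=1\) means

\[\sum_{i=0}^{n}\phi_j(x_i)\phi_k(x_i)=0 \quad (j\neq k),\]

so \(A^TA\) is diagonal and each normal equation contains a single unknown. Under that assumption the coefficients decouple into ratios of sums:

\[a=\frac{\sum_{i=0}^{n} f_i}{n+1},\qquad b=\frac{\sum_{i=0}^{n} x_i f_i}{\sum_{i=0}^{n} x_i^2},\]

\[c=\frac{\sum_{i=0}^{n} e^{\arccos(x_i)} f_i}{\sum_{i=0}^{n} e^{2\arccos(x_i)}},\qquad d=\frac{\sum_{i=0}^{n} \sin(\cos(T_{23}(x_i)))\, f_i}{\sum_{i=0}^{n} \sin^2(\cos(T_{23}(x_i)))}.\]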