I missed where the base value comes from. How is that determined?
I need to make a video on how MARS (Multivariate Adaptive Regression Splines) works. A picture or graph would be useful.
But let me try to explain here in words. The method I describe is not a perfectly faithful reflection of what MARS does, but it is a good way to begin understanding how it works, to be refined by you later on:
I will assume you roughly know what the Sum of Squares is: it is the sum of the squared differences between an estimated value (e.g. the estimated contribution of GLA, gross living area, to sale price) and the actual value (the sale price or equivalent). The differences are squared so that negative and positive differences are treated equally. In this case, the squares are added up over all GLA values in the data. This is a common measure of how well a graph fits a series of data values: the lower the Sum of Squares, the better the fit.
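A minimal Python sketch of the Sum of Squares described above, assuming the estimates and the actual prices are given as two equal-length lists. The function name `sum_of_squares` and the three sale prices are made up for illustration.

```python
def sum_of_squares(estimates, actuals):
    """Add up (estimate - actual)^2 over every data point."""
    if len(estimates) != len(actuals):
        raise ValueError("estimates and actuals must have the same length")
    return sum((est - act) ** 2 for est, act in zip(estimates, actuals))


# Hypothetical example: three sales with made-up estimated and actual prices.
estimated = [250_000, 310_000, 405_000]
actual = [260_000, 300_000, 400_000]
print(sum_of_squares(estimated, actual))  # 225000000
```

The first two misses are -$10,000 and +$10,000, and each contributes the same 100,000,000 to the total, which is exactly why the differences are squared.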
Steps:
1. Through correlation (more than this is involved), MARS decides which variables are significant contributors to the sale price and ranks them from most to least significant. From the correlation it can also estimate what percentage of the sale price each variable contributes, so it can use that percentage as a target in building a model equation (or estimator) for each variable.
2. For each of the feature variables, starting with the most important, it will:
a) Create a linear equation running from a "good" start point on the left to a "good" endpoint on the right. This is a straight-line graph.
b) Then it will incrementally search for the best midpoint, or more precisely a "breakpoint" or "knot", that divides that line into separate but connected line segments that best describe the sales values. It does this by starting on the left and moving the candidate point to the right in small increments until it reaches the end of the interval (say, lowest GLA to highest GLA for the GLA variable). In practice, the candidate points are the GLA values that actually occur in the data. For each candidate X value, it finds the Y value (the height of the lines at that point) that gives the lowest Sum of Squares, and records that Sum of Squares. The candidate X value with the lowest Sum of Squares overall is chosen as the new breakpoint, or in MARS terms a "knot". The result is a graph of two connected line segments. It then repeats the same process on each line segment, adding knots until a new knot no longer meaningfully reduces the Sum of Squares or a preset maximum number of knots is reached. This is called the "forward pass". We end up with a number of connected line segments, usually more than the final model needs.
c) Then it reverses the process in what is known as the "backward pass": it removes the knots it just created, one at a time, starting with the ones whose removal has the least impact on the Sum of Squares. That is to say, if removing a knot makes only a very small change to the Sum of Squares, why keep it and complicate things? (Real MARS scores each of these simpler candidate models with a measure called Generalized Cross-Validation, or GCV, which penalizes the Sum of Squares for model complexity, and keeps the best-scoring one.) But note that we want simplicity not only for its own sake, but also so that our model is more likely to achieve the same accuracy on new data. We call this "robustness": we want a model that works well not only on our current set of data but also on different sets of data, such as new sales. For reasons I won't get into here, this means we want a general model that is not fitted too closely to our current data. We generalize to create that desired "robust" model.
d) We do the above for all variables. We will likely repeat this hundreds, thousands, or even hundreds of thousands of times, trying different starting configurations and parameters. In the end, we choose the best model.
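3. OK. Each feature's linear equation has the form Y = a + b * (x - c), e.g. Sale_Price_GLA_Contribution_Value = $200K + (GLA - 500) * $150, where a = $200K, b = $150 per unit of GLA, and c = 500. That $200K is the constant. We do this for all significant variables that end up in the model, and then add all of their constants together to create the "Base Value." In other words, the Base Value is the estimated price before any feature's slope term is added; each sale's estimate is the Base Value plus b * (x - c) for every feature in the model.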
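To make step 3 concrete, here is a small Python sketch of the arithmetic. Only the GLA equation comes from the example above; the second feature (`LotSize`), its numbers, and the function names are hypothetical. In a fitted MARS model, the Base Value roughly corresponds to the model's single intercept (constant) term.

```python
# Hypothetical per-feature equations, each of the form Y = a + b * (x - c).
# Only the GLA equation comes from the example; "LotSize" is made up.
FEATURES = {
    "GLA": {"a": 200_000, "b": 150, "c": 500},
    "LotSize": {"a": 20_000, "b": 2, "c": 5_000},
}


def base_value(features):
    """Base Value: the sum of every feature equation's constant term a."""
    return sum(eq["a"] for eq in features.values())


def estimate_price(features, sale):
    """Base Value plus each feature's slope term b * (x - c) for one sale."""
    total = base_value(features)
    for name, eq in features.items():
        total += eq["b"] * (sale[name] - eq["c"])
    return total


print(base_value(FEATURES))  # 220000
print(estimate_price(FEATURES, {"GLA": 1_500, "LotSize": 8_000}))  # 376000
```

A sale sitting exactly at the reference values (GLA of 500, LotSize of 5,000) would be estimated at the Base Value of $220,000, since every slope term is zero.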
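The knot search in step 2(b) can be sketched in Python with NumPy for a single variable and a single knot. This is a simplified illustration under stated assumptions, not how any particular MARS implementation is written: the helper names (`hinge`, `fit_with_knot`, `best_knot`) and the synthetic GLA/price data are hypothetical. Instead of trying many Y values, it fits the heights of the two connected segments directly by least squares, which gives the lowest Sum of Squares for each candidate knot. The bend is written as a MARS-style hinge term, max(0, x - knot), which is zero left of the knot and changes the slope to the right of it.

```python
import numpy as np


def hinge(x, knot):
    """MARS-style hinge function: max(0, x - knot)."""
    return np.maximum(0.0, x - knot)


def fit_with_knot(x, y, knot):
    """Least-squares fit of y = b0 + b1*x + b2*max(0, x - knot); returns (coefs, SSE)."""
    design = np.column_stack([np.ones_like(x), x, hinge(x, knot)])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    return coefs, float(residuals @ residuals)


def best_knot(x, y):
    """Try each interior data value as the knot and keep the lowest Sum of Squares."""
    # Skip the endpoints: there the hinge is either collinear with the
    # straight line (lowest x) or all zeros (highest x), so it adds nothing.
    candidates = np.unique(x)[1:-1]
    best = None
    for knot in candidates:
        coefs, sse = fit_with_knot(x, y, knot)
        if best is None or sse < best[2]:
            best = (float(knot), coefs, sse)
    return best


# Synthetic data: price rises $150 per unit of GLA up to 1,500, then $60 after.
gla = np.arange(800.0, 2600.0, 100.0)
price = np.where(gla < 1500, 100_000 + 150 * gla,
                 100_000 + 150 * 1500 + 60 * (gla - 1500))

knot, coefs, sse = best_knot(gla, price)
print(knot)  # 1500.0
```

Because the synthetic prices bend exactly at 1,500, the search recovers that knot with coefficients close to 100,000, 150, and -90 (the slope change from $150 to $60). A full forward pass would repeat this search across every variable and every existing segment.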