
Appraisal Statistics

Status
Not open for further replies.

RCA

Elite Member
Gold Supporting Member
Joined
Jun 27, 2017
Professional Status
Certified General Appraiser
State
California
Isn't it about time we have a separate forum group for "Appraisal Statistics"?

1. And, it is time for me to say something about Appraisal Statistics as taught by the Appraisal Institute and others. The discipline has done a lot of harm to the profession! The typical book on appraisal statistics is not written by a statistician or mathematician. It is most likely written by someone with a Ph.D. in Real Estate from some lower-level state university. They know what they know, but don't really understand it in any depth. What they don't know is appalling.

2. The problem is that house values should not be treated like typical "parametric" statistical distributions. Parametric distributions are those that we can assume follow some kind of regular pattern, like a normal distribution; population IQs are an example. Statisticians discovered a long time ago that certain things, like intelligence and semi-random events such as the distribution of shots around a target by a given shooter, fit certain kinds of distributions. They would study these distributions and discover certain parameters that could be used for making powerful predictions. Typically in such distributions we do not understand the why and wherefore of what causes the distribution; it is just an observation that we know follows a certain pattern.

3. But house prices are different. We know the structure of houses. We know the assessor data and we can study the internal structure in great depth. We also know that certain characteristics of homes have an effect on value. The kind of "statistics" we thus employ are quite different. For example, a neighborhood may be a census tract that is 30% type A, 10% type B, 40% type C, 20% custom homes. Or it may be some other arbitrary combination. And each of these segments may be divisible into groups of different GLAs. And these may be further divided into those with 1, 2, 3 or more bathrooms, bedrooms or other types of homes. That is to say, we know all kinds of detailed facts about these "objects".

4. The traditional "matched pair" analysis was a kind of non-parametric method. And quite frankly, surprise-surprise, it is in many ways superior to the use of parametric statistical methods!!

5. Oh the EVIL that has been done by the AI and others. Unbelievable.

6. Anyway, to get on with this. There are statistical methods that do a very good job of creating models that, in many cases, explain differences in home prices fairly accurately based on the homes' features. They are called "non-parametric". They are essentially a combination of "matched-pair" analysis and statistical methods.

7. The best among these is MARS from Salford Systems. There are other software variations out there with different names, because Salford Systems put a trademark on the name MARS and aggressively protects it. The R language has "earth". However, Salford Systems made their version extremely efficient and accurate many years ago, and as far as I know, no one else can touch it.

8. MARS costs me about $320/year, as I recall. So, it is affordable.

9. Sorry I don't have more time. But here is a link to a presentation I did back in 2006 at the Salford Systems Data Mining Conference: http://docs.salford-systems.com/BertCraytor.pdf

10. And I would beg the AI and others to stop pushing parametric methods to do a job they are largely not suited for (yes, in certain tract developments, under certain conditions, they can be somewhat useful, but typically not very).

11. I should also mention that MARS, aside from doing regression, automatically finds the "categories" that houses group into on different features. These are indicated by the breaks in the segmented linear regression. It can handle missing values and categorical variables. Got weird housing groups, custom homes, anomalies? No problem. Have small sample sizes? No problem.

12. MARS, like nearly all non-parametric methods, does not require large sample sizes, because such methods make no assumptions about underlying distributions. E.g., there is no "p" value. They simply juggle things around to create a model that best describes the differences in values based on features. It is like "matched-pairs" on steroids.
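
To make the "breaks in the segmented linear regression" from point 11 concrete, here is a minimal sketch of the hinge-function idea that MARS-style models are built on. It is not Salford's implementation: the synthetic sales, the single-knot search, and the helper names (hinge, fit_one_knot) are all assumptions made up for illustration.

```python
import numpy as np

def hinge(x, knot):
    """Hinge basis function: 0 below the knot, (x - knot) above it."""
    return np.maximum(0.0, x - knot)

def fit_one_knot(x, y, candidate_knots):
    """Try each candidate knot; keep the one with the lowest squared error.

    Returns (sse, knot, coef) where coef = [intercept, slope, slope_change].
    """
    best = None
    for knot in candidate_knots:
        # Design matrix: intercept, plain linear term, hinge term.
        X = np.column_stack([np.ones_like(x), x, hinge(x, knot)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, knot, coef)
    return best

# Hypothetical data: $200/sf up to 2000 sf of GLA, then $80/sf beyond that.
rng = np.random.default_rng(0)
gla = rng.uniform(1000, 3500, size=60)
price = 150_000 + 200 * gla - 120 * hinge(gla, 2000) + rng.normal(0, 5000, size=60)

sse, knot, coef = fit_one_knot(gla, price, np.arange(1200, 3400, 100))
print(f"estimated break near {knot:.0f} sf")
print(f"slope below break: {coef[1]:.0f} $/sf, above: {coef[1] + coef[2]:.0f} $/sf")
```

The knot the search settles on is the kind of "category" boundary described above: the per-square-foot rate is one number below it and a different number above it.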

I have to go.
 
I use statistics for mass appraisal regularly...because it makes sense to do so. If I have to value an individual property then statistics takes on a test-of-reasonableness-type role and for market condition support. Many statistical techniques are overkill or inappropriately applied...I think that is more a symptom of the user not fully understanding their relevance (or lack thereof) to a singular property valuation problem. I mathturbated for many years before coming to the conclusion that sometimes the simplest methods are the most appropriate...but I am still fascinated by what stats can do.
 
I was in an AI class today and someone asked about collinearity:

Once you use MARS for a while, you probably won’t be too concerned about collinearity. There is always some of that, of course. Lot size and GLA are collinear. We expect that and understand it. We understand most relationships and variables that pop up. The important thing we are looking for is a predictive model. That model may say Price = $100,000 + $100 x LotSize + $300 x GLA. If it has an Adjusted R2 of 90%, that is, if it does a very good job of explaining the variance in home prices for the subject neighborhood and property type, we don’t much care about collinearity. We aren’t trying to prove anything about the relationship of the variables so much as predict the sale price of a property at some point in time. Or alternatively, and more probably, we pull out the separate parts of the equation for each feature and create adjustments. And because the model in that form has only a few variables, we aren’t likely to be very interested in other variables. Whatever adjustments we make for lot size and GLA, collinearity, if it exists, still exists. So, I wouldn’t be too concerned with it; it is just common sense that some features have certain dependencies. You can’t very easily build a 5000 sf one-story on a 4000 sf lot.
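
A small sketch of that point, using plain least squares rather than MARS: two strongly collinear features can still give a model with a high adjusted R2 and a sensible prediction for a subject. The simulated sales, the noise levels, and the helper names (fit_ols, predict, adjusted_r2) are assumptions for illustration only; the coefficients simply reuse the example equation above.

```python
import numpy as np

def fit_ols(features, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(y)), features])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(features, coef):
    """Apply the fitted intercept and slopes to a feature matrix."""
    return coef[0] + features @ coef[1:]

def adjusted_r2(features, y, coef):
    """Adjusted R2 = 1 - (SSE / (n - p - 1)) / (SST / (n - 1))."""
    n, p = features.shape
    sse = np.sum((y - predict(features, coef)) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))

# Hypothetical sales: lot size is deliberately tied to GLA (collinear).
rng = np.random.default_rng(1)
n = 80
gla = rng.uniform(1200, 3000, size=n)
lot = 3.0 * gla + rng.normal(0, 500, size=n)
price = 100_000 + 100 * lot + 300 * gla + rng.normal(0, 20_000, size=n)

features = np.column_stack([lot, gla])
coef = fit_ols(features, price)
corr = np.corrcoef(lot, gla)[0, 1]
adj = adjusted_r2(features, price, coef)

# Subject property: 6000 sf lot, 2000 sf GLA.
subject = np.array([[6000.0, 2000.0]])
pred = predict(subject, coef)[0]

print(f"correlation between lot size and GLA: {corr:.2f}")
print(f"adjusted R2: {adj:.3f}")
print(f"predicted price for subject: ${pred:,.0f}")
```

The individual coefficients are less stable when features move together, which matters if you pull them out as separate adjustments; the prediction for a typical subject is much less affected.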

However, we can let the software go wild and throw in a lot of variables to get minor and rather useless improvements in accuracy, and such variables may be highly collinear with a more important variable. If we don’t limit what the software does, it can create a messy model that gives little improvement in accuracy.

And on the other hand, MARS handles missing values by searching for collinearity in other variables and using some function of those other variables to substitute for the missing value. So, if you have a sale that is missing lot size, MARS may decide that it can use a function of GLA and something else to create a substitute value. Of course, only if you allow it! (Missing-value treatment can be specified on a variable-by-variable basis, and we will see the substitute in the final model.)
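
As a rough illustration of substituting a function of a related variable for a missing value, here is a sketch that fills a missing lot size from GLA with a straight-line fit. This is only an analogy and should not be read as how MARS does it internally; the function name fill_missing_with_surrogate and the five sample sales are hypothetical.

```python
import numpy as np

def fill_missing_with_surrogate(target, predictor):
    """Fill NaNs in `target` using a straight-line fit on `predictor`.

    The line is fitted only on rows where `target` is known.
    Returns the filled array and the (intercept, slope) that was used.
    """
    target = np.asarray(target, dtype=float)
    predictor = np.asarray(predictor, dtype=float)
    known = ~np.isnan(target)
    # np.polyfit returns the highest-degree coefficient first.
    slope, intercept = np.polyfit(predictor[known], target[known], deg=1)
    filled = target.copy()
    filled[~known] = intercept + slope * predictor[~known]
    return filled, (intercept, slope)

# Hypothetical sales; the third one is missing its lot size.
gla = np.array([1400, 1800, 2200, 2600, 3000], dtype=float)
lot = np.array([5000, 6000, np.nan, 8000, 9000], dtype=float)

filled_lot, (intercept, slope) = fill_missing_with_surrogate(lot, gla)
print(f"substitute rule: lot = {intercept:.0f} + {slope:.2f} x GLA")
print(f"substituted lot size for the third sale: {filled_lot[2]:.0f} sf")
```

Printing the rule alongside the filled value mirrors the point above that the substitute should be visible in the final model, not hidden.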

MARS allows you to adjust certain things so that it avoids creating new variables with only very minor improvements in accuracy. In fact, it allows a number of settings, and I usually start with a set and then tweak it somewhat for the data. (You should always publish your settings in the report so someone else could replicate the same model with the same data. They will usually go out and get their own data, because you can’t distribute yours due to MLS restrictions. But the state board could get your data and would expect to be able to create the same model with your settings.)

More on this (search for collinearity):

http://media.salford-systems.com/pdf/spm8/IntroMARS_v_8_2.pdf


There are many things you can do with MARS that are more advanced subjects. You could, for example, put in average daily loan rates for the last couple of years and ask it to find the impact on value, using time lags. One of the things you absolutely need to learn is controlling overfitting, either by simply limiting the number of basis functions or by using another technique. If your variable graphs are too complex, then you need to restrict the max number of basis functions. MARS is non-parametric and is geared towards creating a model that fits the data, so of course you can have it perfectly fit every data point, but what good is that? You can also control overfitting by specifying a minimum number of observations (home sales) that each regression line segment (or category) needs to span. That means, for example: don’t give me a new knot (line segment) if it is only going to improve the fit for fewer than 3 sales.
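
The two overfitting controls described above can be sketched with a simple greedy knot search on one variable: a cap on the number of knots, and a minimum number of sales each line segment must span. This is a simplified stand-in, not MARS itself (which, as published, also prunes terms in a backward pass); the names max_knots and min_span, the helper functions, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def hinge(x, knot):
    """Hinge basis function: 0 below the knot, (x - knot) above it."""
    return np.maximum(0.0, x - knot)

def sse_for(x, y, knots):
    """Squared error of a piecewise-linear fit with the given knots."""
    cols = [np.ones_like(x), x] + [hinge(x, k) for k in knots]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))

def segment_counts(x, knots):
    """Number of sales falling in each segment between sorted knots."""
    idx = np.searchsorted(np.sort(np.asarray(knots, dtype=float)), x)
    return np.bincount(idx, minlength=len(knots) + 1)

def greedy_knots(x, y, candidates, max_knots=3, min_span=3):
    """Add knots one at a time, best first, subject to both limits."""
    knots = []
    best_sse = sse_for(x, y, knots)
    while len(knots) < max_knots:
        trial_best = None
        for c in candidates:
            if c in knots:
                continue
            trial = knots + [c]
            if segment_counts(x, trial).min() < min_span:
                continue  # some segment would span too few sales
            s = sse_for(x, y, trial)
            if trial_best is None or s < trial_best[0]:
                trial_best = (s, c)
        if trial_best is None or trial_best[0] >= best_sse:
            break  # no allowed knot improves the fit
        best_sse = trial_best[0]
        knots.append(trial_best[1])
    return sorted(knots), best_sse

# Hypothetical sales with one real break at 2000 sf.
rng = np.random.default_rng(2)
gla = rng.uniform(1000, 3500, size=40)
price = 150_000 + 200 * gla - 120 * hinge(gla, 2000) + rng.normal(0, 8000, size=40)
candidates = [float(k) for k in range(1100, 3500, 100)]

loose_knots, loose_sse = greedy_knots(gla, price, candidates, max_knots=10, min_span=1)
tight_knots, tight_sse = greedy_knots(gla, price, candidates, max_knots=2, min_span=5)
print("loose settings:", len(loose_knots), "knots, SSE", round(loose_sse))
print("tight settings:", len(tight_knots), "knots, SSE", round(tight_sse))
```

The loose settings will generally chase noise with extra knots for a slightly lower error; the tight settings give a simpler graph that is easier to defend as an adjustment.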

MARS in a sense is more like matched pair analysis. It is going to create adjustments by creating a model that best fits whatever data you throw at it. It is not at all concerned about the general population it is drawing the data from. (However, there are techniques employed by MARS in doing what it does that may use ANOVA or some other more traditional technique for intermediate tasks.)
 
Yeah, but what about motivations and highest and best use analysis? I understand some are inherent, but that depends on the sample pool.

In a homogeneous market like those that exist in several states, appraisers could use it well. In very heterogeneous markets, forget it. It could set large parameters at best. Complexity of real estate is unlike any other market. No other market compares.

Then too you have intended use and user aside from market value and motivations. That’s in part why I get appraisals handed to me by parties to the transaction more and more. Appraisers are shifting gears.
 
If I follow correctly, I think this has been referred to in older posts as a step-wise regression method. Sounds like MARS automates that somewhat.
 
Stats has its place in the sun. As a means to derive a value, no. As a means to segregate adjustments and eliminate things that don't correlate, yes. I use it for many things, but it is most useful to prove something like there being no correlation between value and a fireplace not already addressed as a quality issue.
 
The use of statistical modeling in real estate appraisal will be appropriate when buyers and sellers start behaving with scientific consistency. Statistical sampling's last accomplishment was getting the 2016 Presidential Election absolutely wrong when there were only two choices; their R2 was 0.

Watch a couple of Flipping Las Vegas shows on cable and try to figure out why buyers will pay 25% more per square foot for gold glitter.
 

Copyright © 2000-, AppraisersForum.com, All Rights Reserved