- Joined
- Jun 27, 2017
- Professional Status
- Certified General Appraiser
- State
- California
Isn't it about time we had a separate forum group for "Appraisal Statistics"?
1. And it is time for me to say something about appraisal statistics as taught by the Appraisal Institute and others. The discipline has done a lot of harm to the profession! The typical book on appraisal statistics is not written by a statistician or mathematician. It is most likely written by someone with a Ph.D. in Real Estate from some lower-level state university. They know what they know, but don't really understand it in any depth. What they don't know is appalling.
2. The problem is that house values should not be treated like typical "parametric" statistical distributions. Parametric distributions are those that we can assume follow some kind of regular pattern, like a normal distribution; population IQs are one example. Statisticians discovered a long time ago that certain things, like intelligence, and semi-random events, like the distribution of bullet holes around a target for a given shooter, fit certain kinds of distributions. They would study these distributions and discover certain parameters that could be used for making powerful predictions. Typically, with such distributions, we do not understand the why and wherefore of what causes the distribution; it is just an observation that we know follows a certain pattern.
3. But house prices are different. We know the structure of houses. We know the assessor data and we can study the internal structure in great depth. We also know that certain characteristics of homes have an effect on value. The kind of "statistics" we thus employ is quite different. For example, a neighborhood may be a census tract that is 30% type A, 10% type B, 40% type C, and 20% custom homes. Or it may be some other arbitrary combination. And each of these segments may be divisible into groups of different GLAs (gross living areas). And these may be further divided into those with 1, 2, 3 or more bathrooms or bedrooms, or into other types of homes. That is to say, we know all kinds of detailed facts about these "objects".
4. The traditional "matched pair" analysis was a kind of non-parametric method. And quite frankly, surprise, surprise, it is in many ways superior to parametric statistical methods!
5. Oh the EVIL that has been done by the AI and others. Unbelievable.
6. Anyway, to get on with this: there are statistical methods that do a very good job of creating models that, in many cases, explain fairly accurately the reasons for differences in home prices based on their features. They are called "non-parametric". They are essentially a combination of "matched-pair" analysis and statistical methods.
7. The best among these is MARS (multivariate adaptive regression splines) from Salford Systems. There are other software implementations out there under different names, because Salford Systems put a trademark on the name MARS and aggressively protects it. The R language has "earth". However, Salford Systems made their version extremely efficient and accurate many years ago, and as far as I know, no one else can touch it.
8. MARS costs me about $320/year, as I recall. So, it is affordable.
9. Sorry I don't have more time. But here is a link to a presentation I did back in 2006 at the Salford Systems Data Mining Conference: http://docs.salford-systems.com/BertCraytor.pdf
10. And I would beg the AI and others to stop pushing parametric methods to do a job they are largely not suited for (yes, in certain tract developments, under certain conditions, they can be somewhat useful, but typically not very).
11. I should also mention that MARS, aside from doing regression, automatically finds the "categories" that houses group into on different features. These are indicated by the breaks in the segmented linear regression. It can handle missing values and categorical variables. Got weird housing groups, custom homes, anomalies? No problem. Have small sample sizes? No problem.
12. MARS, like nearly all non-parametric methods, does not require large sample sizes, because these methods make no assumptions about underlying distributions; e.g., there is no "p" value. They simply juggle things around to create a model that best describes the differences in values based on features. It is like "matched pairs" on steroids.
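To give a rough feel for the "breaks in the segmented linear regression" mentioned above, here is a minimal Python sketch of a single knot search on one feature. It is not Salford's MARS and not R's "earth"; it only illustrates the basic idea of a pair of hinge terms, max(0, x - knot) and max(0, knot - x), fitted by least squares. The GLA and price figures are made up for illustration, and the names hinge, solve, and fit_one_knot are hypothetical helpers, not part of any package. A full MARS run adds many such terms across many features and then prunes them back.

```python
def hinge(x, knot, sign):
    """Hinge basis: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return max(0.0, sign * (x - knot))


def solve(a, b):
    """Solve the small linear system a * coef = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def fit_one_knot(xs, ys):
    """Try each interior data value as a knot; keep the least-squares best fit.

    Model: y = b0 + b1 * max(0, x - knot) + b2 * max(0, knot - x)
    Returns (sse, knot, [b0, b1, b2]).
    """
    best = None
    for knot in sorted(set(xs))[1:-1]:  # interior values only
        rows = [[1.0, hinge(x, knot, +1), hinge(x, knot, -1)] for x in xs]
        # Normal equations (A^T A) coef = A^T y for the three basis columns.
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
        coef = solve(ata, aty)
        sse = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
                  for r, y in zip(rows, ys))
        if best is None or sse < best[0]:
            best = (sse, knot, coef)
    return best


# Made-up sales: $150 per sq ft up to 2,000 sq ft of GLA, $80 per sq ft above that.
GLA = list(range(1000, 3001, 200))
PRICE = [200000 + 150 * (g - 1000) if g <= 2000 else 350000 + 80 * (g - 2000)
         for g in GLA]

sse, best_knot, coef = fit_one_knot(GLA, PRICE)
print("break at GLA:", best_knot)
print("slope above break:", round(coef[1], 2))
print("slope below break:", round(-coef[2], 2))
```

On this made-up data the search puts the break at 2,000 sq ft, which is where the price per square foot was set to change. With real sales there would be noise, more features, and usually more than one break.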
I have to go.