Appraisal Statistics: Regression - Do You Really Know How Much Work It Is?

RCA · Jun 12, 2018

Michael P Jacobs MAI said:
It's not my intent to belittle or demean anyone here, but for those of us objecting to Bert's method: as far as I can tell from a superficial reading of this and Bert's other comments as well as my experience with many (too many) MLS multi-variable regressions and grad-level econometrics courses, it is close to - if not - textbook. The appraisal INDUSTRY has to realize that what Bert calls "appraisal statistics" and what I consider "real estate valuation econometrics" is very, very deep and not within the grasp of even the most experienced appraisers who haven't had the training at graduate level (or equivalent). He hasn't even addressed the confidence testing math he alludes to because it would be way over the vast majority of our heads. Some of you know this, but others of you "do not know what you do not know." Even those who routinely run regressions without understanding the math behind them - as well as all the pitfalls (autocorrelation, heteroskedasticity, multicollinearity, etc.) - are making the big mistake of thinking that it's just a matter of running the software.

I think it's great that Bert's taking the initiative to bring this topic to the forefront, but unless you already understand the concepts and math involved, take this opportunity to learn some of them. I urge you to ask questions and let him explain - he may eventually refer you to some of the links to articles explaining the formulas and concepts necessary to obtain "good" results. Hopefully, many of you will go on to learn more about the issues so that you can successfully argue AVM in the AVM language, not from a perspective of ignorance and misunderstanding (sorry, we are all ignorant about some things).

Bert, you are going to face some difficulty obtaining a satisfying and productive discussion of this here. I hope you find enough educated minds, or at least bring some of the concepts to appraisers faced with AVM-obsolescence! I don't do residential anymore and have no profit in the subject, but it would be great if appraisers could present their objection to AVMs from an educated AND experienced perspective (not just educated OR experienced).

I need to come up with some really good appraisal problems and show how I would do them. I don't think appraisers need to know all that much theory - but those who design the statistics courses for appraisers, who choose the methods and protocols to be employed in appraisal, need to have a fairly broad knowledge of statistics and also consult with a good mathematician in a very communicative way. I see these statistics books written and sold from the AI store that are, for example, written by someone with a Ph.D. in Real Estate from a second- or third-tier university. They may have consulted with some mathematicians - but quite possibly didn't communicate the characteristics of real estate data very accurately. I say this because I don't see much, if any, mention of non-parametric statistics in their books.

Real estate data consists primarily of non-uniform distributions, and we should expect the majority of the statistical methods we use to be non-parametric. I actually mentioned this to someone recently, who countered that [all] distributions could be mapped to normal distributions. (Well, in some cases you can map certain uniform distributions to normal, but it doesn't really apply to most of the distributions we get in real estate.) It appears that while these authors have a very solid understanding of a certain area of statistics (primarily traditional parametric statistics), they don't have a broad enough understanding to enable them to sit back, study the characteristics of the data appraisers deal with, and then choose, from a wide range of available statistical methods and software applications, those that best do the job.

Aside from tract homes, real estate data is not a normal distribution like IQs. With normal distributions, parameters like variance or standard deviation can be powerful tools in making predictions. But in real estate they fall flat. They don't work. They are nonsense. But it's so easy to take these things and make it look like you have done some real work and come to a meaningful conclusion!

I'm convinced that Salford Systems MARS is the only tool that appraisers need to extract adjustments from market data; there isn't anything else that comes close. I could teach appraisers how to use MARS, and you don't need much of a theoretical background. MARS is non-parametric: it doesn't make false assumptions about the distribution of the data - it just fits a segmented (piecewise) linear model to the data to get the tightest fit, in such a manner that it can be used to estimate value for new properties that are not in the "training" data set. There are many other tools as well, such as MDS (multi-dimensional scaling) and a vast array of R scripts for visualization and analysis.

Lastly, I hate economists. Probably because they throw everything into overly simplistic models and make heavy use of parametric methods. They are the reason our politicians in the past have been so gullible in believing a lot of BS about trade balances and tariffs. Thank God for Trump.

Joe Flacco · Jun 12, 2018

You should have a discussion with Leasedfee. I bet you guys can talk about regression and statistics for days.

DCFjay13 · Jun 12, 2018

It's a lot of work, but I feel the juice is worth the squeeze.

Michael P Jacobs MAI · Jun 12, 2018

Bert Craytor said:
I don't think appraisers need to know all that much theory ... I hate economists. Probably because they throw everything into overly simplistic models and make heavy use of parametric methods...

Well, your first statement and your last clash and that's an issue. Not knowing the theory is going to result in two problems - first, that an appraiser is going to eventually get skewered over the theory if they are relying on regression, especially when the tools are there to uncover and often correct the problems in a model I've named twice. I really like the poster that noted sometimes his best comparables were statistical outliers - that's important. More to my main objective, however, is to "discourage" reliance on AVMs by detailing the flaws from a practical perspective - the perspective of a day-to-day appraiser that also has the theory down pat who can demonstrate why AVM reliance is just bad underwriting. It's going to be difficult to find someone like that who can argue head-to-head with an academic because they truly love their math, but we as a profession are unprepared to defend ourselves from wonks that can talk-the-talk but can't even walk-the-walk. My preference would be that appraisers run the AVMs, like you and a few others here do, and not the other way around. God bless you, though! Keep at it!

RCA · Jun 13, 2018

Well, the point is that understanding how non-parametric methods like CART and MARS work does not take a graduate level course in mathematical statistics. In fact, I have in my hands lecture notes on CART by Prof. Clifton D. Sutton of the statistics department at George Mason University. He got his Ph.D. at Stanford in 1987 - and I imagine associated with Friedman and Breiman at the time. You won't see any delving into the calculus of characteristic functions; it's really all about the strategy of placing objects into classification and regression trees: iteratively growing the trees and pruning them to improve the ability to predict the desired outcome for a given training set of observations (in our parlance, "real estate property sales values"). It's about creating what might be a relatively simple classification structure on top of complex data.

MARS operates as an extension of CART. For example, I have some home sales and associated GLA and other values. MARS will try to find an optimal (not too large, but large enough to provide a good prediction) set of GLA ranges, on each of which we can throw a linear function for predicting price. It does a lot of computation and determines that 500-1500 sf is a good first range, 1501-2500 sf is a good second range and 2501-3500 sf is a good last range, there being no larger homes in the training (input) data set. It then calculates a linear function that runs from 500 to 1500 to 2500 to 3500 in such a way that all the lines connect (spline continuity) and the SSE (Sum of Squared Errors) between the predicted and actual sale prices is minimized.

Now, you need to know about the concept of conditional probability and be comfortable with tree structures to understand it. But I would think a good 15% of appraisers would feel fairly comfortable with the explanations. And, more importantly, they could use MARS software to do the job. CART and MARS both excel at handling data that is non-homogeneous, has outliers and so on.
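
The GLA example in this post can be sketched in a few lines of Python. This is only an illustration, not the Salford Systems implementation: the knots at 1,500 and 2,500 sf are taken as given (real MARS searches for them), the sales are synthetic, and the function names are made up for the sketch. Each segment is built from a "hinge" term max(0, GLA - knot), which is what keeps the connected lines continuous, and ordinary least squares picks the coefficients that minimize the SSE.

```python
import numpy as np

def hinge_design(gla, knots):
    """Design matrix: intercept, GLA, and one hinge max(0, GLA - knot) per knot."""
    gla = np.asarray(gla, dtype=float)
    columns = [np.ones_like(gla), gla]
    columns += [np.maximum(0.0, gla - k) for k in knots]
    return np.column_stack(columns)

def fit(gla, price, knots):
    """Least-squares coefficients, i.e. the ones that minimize the SSE."""
    coef, *_ = np.linalg.lstsq(hinge_design(gla, knots), price, rcond=None)
    return coef

def predict(gla, coef, knots):
    return hinge_design(gla, knots) @ coef

def sse(gla, price, coef, knots):
    return float(np.sum((price - predict(gla, coef, knots)) ** 2))

# Synthetic sales, NOT real market data: price per sf tapers off in larger homes.
rng = np.random.default_rng(0)
gla = rng.uniform(500, 3500, size=200)
price = (100_000 + 150 * gla
         - 60 * np.maximum(0, gla - 1500)
         - 50 * np.maximum(0, gla - 2500)
         + rng.normal(0, 10_000, size=200))

knots = [1500.0, 2500.0]               # assumed here; MARS would search for these
coef_segmented = fit(gla, price, knots)
coef_straight = fit(gla, price, [])    # a single straight line, for comparison

print("SSE, one straight line:", sse(gla, price, coef_straight, []))
print("SSE, three segments   :", sse(gla, price, coef_segmented, knots))
```

Because the straight line is a special case of the segmented model, the segmented fit can never have a larger SSE on the training data; whether the extra segments are worth keeping is a separate question.
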
There are a number of things to learn about using MARS, such as "cross-validation", but they do not really take a lot of background in math. Cross-validation is when you run CART or MARS on a data set of, say, 500 properties. MARS will take a certain percentage you specify, say 80%, and use them to create a model. Then it will test it on the other 20%. It will do this over and over, creating a number of different models based on different random selections of the properties to create the 80/20% partitions. It will then rank them and suggest using a model with a good balance of accuracy and simplicity - however, you can choose to go with other models. An appraiser should look at the associated plots and see if they make sense. You can also go in and remove basis functions from the model (manually) to smooth it out. - No harm. The whole point is that you are creating a model to predict home prices. You want it to be simple, robust and accurate. The MARS software is just a tool to get you to that point. The quality of the final model is how well it can predict prices based on important features. .... WHO CARES HOW YOU GOT THERE? ---- Well, there is at least a certain amount of truth in that. That's non-parametric statistics. Trump math.
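
The repeated 80/20 partitioning described in this post can be sketched as follows. This is a rough illustration only: a plain least-squares line on GLA stands in for the MARS model, the 500 "properties" are synthetic, and `repeated_holdout_r2` is a made-up helper name. Each round fits on a random 80% and scores R2 on the 20% the model never saw.

```python
import numpy as np

def repeated_holdout_r2(X, y, n_repeats=20, train_frac=0.8, seed=0):
    """Fit on a random 80% and score R^2 on the held-out 20%, n_repeats times."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    scores = []
    for _ in range(n_repeats):
        order = rng.permutation(n)                    # a fresh random partition
        train, test = order[:n_train], order[n_train:]
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ coef              # errors on unseen sales
        ss_res = float(resid @ resid)
        ss_tot = float(np.sum((y[test] - y[test].mean()) ** 2))
        scores.append(1.0 - ss_res / ss_tot)
    return np.array(scores)

# Synthetic data set of 500 properties, NOT real market data.
rng = np.random.default_rng(1)
gla = rng.uniform(800, 3500, size=500)
price = 50_000 + 200 * gla + rng.normal(0, 20_000, size=500)
X = np.column_stack([np.ones_like(gla), gla])

scores = repeated_holdout_r2(X, price)
print("held-out R^2: mean %.3f, min %.3f, max %.3f"
      % (scores.mean(), scores.min(), scores.max()))
```

A wide spread between the minimum and maximum held-out scores would suggest the model depends heavily on which sales happened to land in the training partition.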

Michael P Jacobs MAI · Jun 13, 2018

Bert Craytor said:
Well, the point is that understanding how non-parametric methods like CART and MARS work does not take a graduate level course in mathematical statistics....

I truly hope so, and your point that it is within the grasp of the "average" is encouraging! But just so that I know a little more about where you are coming from, do these apps measure and correct for autocorrelation, heteroskedasticity and multicollinearity? Or is it up to the user? Do you make adjustments for them in your models?

Thomas Holding · Jun 13, 2018

Bert Craytor said:
I need to come up with some really good appraisal problems and show how I would do them. ... I'm convinced that Salford Systems MARS is the only tool that appraisers need to extract adjustments from market data; there isn't anything else that comes close. I could teach appraisers how to use MARS, and you don't need much of a theoretical background.

Before you mentioned it, I had never known that this software existed. After reading up on its capabilities, and trying to grasp the concepts behind each tool, I'm beginning to see how this can be very useful. The big question is, how could an appraiser justify the $15,000 per year price tag? I could see where this would be a great tool for mass appraisal, and county governments might pony up the cash to pay for it.

RCA · Jun 13, 2018

Well, these are all certainly issues. Let me try to make it a bit simpler: The way CART and MARS work is to FIRST split all significant variables into chunks based on, let's call it, "common behavior" through creating classifiers. [Understand this is not really exact, as the initial classification can be quite arbitrary - but it is a good way to think of it if you are starting out.] This is an attempt to create, from a set of non-homogeneous data, subsets which are more homogeneous. They start with a group of data and then start dividing it into subgroups to create a tree. The first group is, say, A, which is all of the data; then they create two child nodes based upon some split in the data, say B and C; then each of those is split, and so on and so forth. The splits are engineered taking a number of things into consideration - until splitting is no longer possible (depending on a parameter the user sets, e.g. if you only have 3 objects left, stop splitting).
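
The A-then-B-and-C splitting can be sketched as a tiny regression tree. This is a simplified illustration under stated assumptions, not the Salford Systems algorithm: it splits on a single variable (GLA) only, picks each split by minimizing the total SSE of the two child nodes, does no pruning, and uses a made-up toy data set with prices in thousands. The `min_size=3` stopping rule mirrors the "3 objects left" example in the post.

```python
import numpy as np

def sse(y):
    """Sum of squared errors around the node mean (0 for an empty node)."""
    return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

def best_split(x, y):
    """Threshold t for the split x <= t that most reduces total SSE, or None."""
    best_t, best_sse = None, sse(y)
    values = np.unique(x)                      # sorted distinct values
    for lo, hi in zip(values[:-1], values[1:]):
        t = (lo + hi) / 2.0                    # candidate midway between neighbours
        total = sse(y[x <= t]) + sse(y[x > t])
        if total < best_sse:
            best_t, best_sse = t, total
    return best_t, best_sse

def grow(x, y, min_size=3):
    """Split recursively; stop at min_size or fewer objects, or when no split helps."""
    leaf = {"mean": float(y.mean()), "n": int(len(y))}
    if len(y) <= min_size:
        return leaf
    t, _ = best_split(x, y)
    if t is None:
        return leaf
    left = x <= t
    return {"threshold": t,
            "left": grow(x[left], y[left], min_size),
            "right": grow(x[~left], y[~left], min_size)}

# Toy data, NOT real sales: GLA in sf, price in $1,000s.
gla = np.array([1000, 1200, 1400, 1600, 2600, 2800, 3000, 3200], dtype=float)
price = np.array([200, 210, 205, 215, 400, 410, 405, 415], dtype=float)

tree = grow(gla, price)
print("root split at GLA <=", tree["threshold"])
```

On this toy data the first split separates the four smaller homes from the four larger ones, which is the "more homogeneous subsets" idea in miniature.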

Now, classification trees are actually interesting in themselves, in that they model the way people think. People tend to work in layers. You start with a basic understanding of something that, while not very correct, gives you a basic intuitive feel for some problem. It produces results by allowing you to say: These objects go on the left because they all seem to have this or that characteristic or behaviour, and the other objects go on the right. Then if you have more time, you focus on the objects on the left or right, give them some more thought, see new more detailed patterns, each maybe along completely different criteria, and then classify them into smaller subgroups. And so on. That is what CART and MARS do, kind of.

The statisticians who developed CART and MARS are concerned with the issues you mentioned, because they affect the outcome and they have developed fairly reasonable techniques to overcome them. In fact, refer to the link below on CART and you will find the term “heteroskedasticity” mentioned eleven times, with a section on it.

With respect to covariance, the impact is that it tends to create more variables. MARS tries to simplify its models by cutting out parameters that are not significant. So, for example, TotalBedrooms might be highly covariant with GLA, and MARS will try to eliminate TotalBedrooms and then GLA and see which produced a better R2. Likely GLA produced a better R2. GLA will be selected over TotalBedrooms when it comes to creating the basis functions; TotalBedrooms will just pick up the slack. It will therefore use at one stage both variables, letting TotalBedrooms have a small partial weight to improve the R2. But then it will turn around and try eliminating TotalBedrooms from the model to see what damage that causes. If it doesn’t cause much damage, then it will be eliminated for the sake of simplicity.
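
The "eliminate it and see what damage that causes" step can be illustrated with a small sketch. Everything here is an assumption made for illustration: ordinary least squares stands in for the MARS basis functions, plain R2 stands in for whatever criterion the software actually uses, and the data are synthetic, with TotalBedrooms generated as a noisy proxy for GLA so that the two are highly covariant.

```python
import numpy as np

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

# Synthetic data, NOT real sales: bedrooms track GLA, price depends on GLA.
rng = np.random.default_rng(2)
n = 300
gla = rng.uniform(1000, 3500, size=n)
bedrooms = np.clip(np.round(gla / 700 + rng.normal(0, 0.5, size=n)), 1, 6)
price = 50_000 + 180 * gla + rng.normal(0, 25_000, size=n)

features = {"GLA": gla, "TotalBedrooms": bedrooms}
r2_full = r_squared(features.values(), price)

# Drop each variable in turn and record how much R^2 is lost ("damage").
damage = {}
for name in features:
    kept = [col for other, col in features.items() if other != name]
    damage[name] = r2_full - r_squared(kept, price)

print("R^2 with both variables: %.3f" % r2_full)
for name, lost in damage.items():
    print("R^2 lost by dropping %-13s: %.3f" % (name, lost))
```

In this setup, dropping TotalBedrooms costs almost nothing while dropping GLA costs a lot, so a simplicity-seeking procedure would keep GLA and discard TotalBedrooms.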

You have to understand it works on creating the regression tree in forward and backward steps, adding child nodes, testing, pruning, testing, running cross-variance checks until it can’t improve things much more.

It does a hell of a lot of computation.

There really aren’t tests of significance other than R2 and its variants like GCV-R2. (I won’t get into that here.)

http://docs.salford-systems.com/CART_Main.pdf

But I should also add that there are supporting statistics for various side issues, such as the importance of variables.

Thomas Holding · Jun 13, 2018

Bert Craytor said:
You have to understand it works on creating the regression tree in forward and backward steps, adding child nodes, testing, pruning, testing, running cross-variance checks until it can’t improve things much more.

It does a hell of a lot of computation.

How long does it take to perform its calculations? How powerful of a computer do you need and does it offload some of its calculations to the graphics card?

hastalavista · Jun 13, 2018

Bert-

Let me ask you a question from a boots-on-the-ground residential appraiser....
Assume I'm valuing a home in a relatively conforming neighborhood. Typical suburban tract development. In my initial search parameters, assume I'm gathering data to these parameters: Size range 1,800sf to 2,350sf; age range 30-45 years; lot size 8,000sf to 13,000sf. Assume over the last year, there are 40 data points and the price range is $700,000 to $825,000; assume there may have been an outlier or two at $600k and one at $1,000,000 and I throw those out of the mix. Assume the market has been increasing (overall) at 6% and based on the price distribution by date, we see homes at the beginning of the period selling for less than homes at the tail (consistent with the expectation of an overall rising market).

From the ground level, it would appear the differences in value are based on (a) condition, (b) size of the home, (c) possibly the specific location, (d) to a degree another amenity (say, a pool), and (e) the motivations of the individual buyer/seller.

Question: It seems to me that I've already filtered the data to a relatively high degree. I don't have a random distribution; I have neighborhood-specific sales, all which more or less compete with one another, share the same/very similar buyer pool, and for the most part, equally share external/economic factors/forces.
At this point, how much more discrete must I get in my filtering or analysis before a simple linear regression no longer provides a reasonable starting point for a GLA adjustment within that data set, one that I can then refine in the grid using simple sensitivity analysis after adjusting for other factors (condition, let's say), which I can also support using old-fashioned pairing?
