Welcome to AppraisersForum.com, the premier online community for the discussion of real estate appraisal.

Regression Analysis

LOL, and you know far more about pools than I do. You illustrate a weakness of statistical models: qualitative attributes. The typical data table would simply dummy-code No Pool = 0 and Pool = 1. That averages the nice pools with the bad pools and the fabulous pools, unless you sit down and try to quantify the qualities: big pool vs. small pool (and what is a functional pool size relative to the type of house?), or Simple Pool = 1, Good Pool = 2, Great Pool = 3, Olympic = 4 -- and then what to do with the hot tub and the pool house? As location improves, the homes change too. Homes framed with 2"x4"s and roofed with asphalt shingles in one neighborhood compare poorly to 2"x6"s and slate roofs; cross the street, and a new home in a swanky neighborhood uses cheap materials like EIFS and synthetic stone facade.
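For what it's worth, the coding choices described above can be sketched in a few lines of Python. The categories and the ordinal scale here are assumptions for illustration, not market facts; the point is that each encoding bakes in a different assumption about how pool quality relates to value.

```python
# Three ways to encode a qualitative "pool" attribute for a regression
# data table (hypothetical categories and scale).

# Naive binary dummy: every pool is treated the same.
def encode_binary(pool_type):
    return 0 if pool_type == "None" else 1

# Ordinal coding: an appraiser-assigned quality scale. The scale itself
# is a judgment call, not something the data provides.
ORDINAL = {"None": 0, "Simple": 1, "Good": 2, "Great": 3, "Olympic": 4}

def encode_ordinal(pool_type):
    return ORDINAL[pool_type]

# One-hot coding: lets the model fit a separate coefficient per category,
# at the cost of more parameters per data point.
def encode_one_hot(pool_type):
    cats = ["Simple", "Good", "Great", "Olympic"]
    return [1 if pool_type == c else 0 for c in cats]

sales = ["None", "Simple", "Olympic"]
print([encode_binary(p) for p in sales])   # binary collapses all pools to 1
print([encode_ordinal(p) for p in sales])
print([encode_one_hot(p) for p in sales])
```

The binary dummy is exactly the "average the fabulous pools with the bad ones" problem; the other two encodings trade that averaging for more hand-built assumptions.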

:rof:

I think homogeneity is more important than statistical robustness when developing adjustments. Obviously it's great if you can get both.
 
[Attachment: upload_2017-5-19_22-16-3.png]

This is the kind of stuff being marketed to residential appraisers as regression: "Handle over 3,000 imported comps for regression analysis." It even does your cost approach, effective age, HBU, and reconciliation in minutes.

I wonder if the AI and Columbia Institute know that they have been using their adjustments.
 
I wouldn't hire a quant to appraise my house so why would I hire an appraiser to be a quant?

Let big data run the numbers and make them available to appraisers.
Let appraisers use their local market judgment, experience, and expertise to add context to the quant-run report.
 
Human beings are, and always will be, the best interpreters of complex statistical analysis. Tools are great, as long as you understand how the data is being applied. Computers will always be literal and will lack the common sense, experience, and judgment of a good old-fashioned seasoned appraiser. :)
 
MS Excel has a free add-in called the Analysis ToolPak (the menu item it adds is called "Data Analysis"). It is already loaded on your machine, but you have to turn it on through the Options menu. When you have your columns of data, it gives you a pop-up wizard and will do a very nice multiple regression analysis. It more than works for my needs, and is friendlier than semi-professional statistical software.

Be very careful with this. One of the critical, major assumptions of regression and ANOVA generally -- often overlooked by real estate practitioners -- is a homogeneous population: an apple is an apple. In appraisal, as we expand n, the sample size, we expand into less homogeneous properties.

I found the add-in in the most recent Excel:
File -> Options -> Add-ins -> Analysis ToolPak
Select Go, and then you will find it hiding under the Data tab when you are ready to use it.

I ran a set of data that illustrates your point. I did a single regression; I haven't figured out multiple regression with it yet.

I analyzed 250 sales in the City of Marysville, built from 2004-2009, 2000-2800 SF, sold in past 2.5 years.
X-axis - Square footage
Y-axis - Price

The trend line (the "X Variable 1" coefficient) showed an adjustment of $105/SF.
This seems great on the surface until you think more deeply and realize that larger homes aren't just larger: they usually have larger sites, larger garages, and other features I'm not thinking of, all of which bloat the $105/SF figure.
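The bloat described above is classic omitted-variable bias, and it can be demonstrated with a toy sketch (all numbers fabricated). Prices below are built as exactly $100/SF plus $5 per SF of site; because the bigger homes sit on bigger sites, a one-variable regression overstates the $/SF coefficient, while a two-variable regression recovers both true values.

```python
# Toy illustration of omitted-variable bias in a price-vs-SF regression.

def simple_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

def solve(a, b):
    """Solve a small linear system a.x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def multiple_regression(rows, ys):
    """Fit y = b0 + b1*x1 + b2*x2 via the normal equations (X'X)b = X'y."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * ys[i] for i in range(len(X))) for a in range(k)]
    return solve(xtx, xty)

# Fabricated sales: price is exactly $100/SF of house + $5/SF of site.
sf   = [1000, 1200, 1400, 1600, 1800, 2000]
site = [4000, 5000, 4500, 6000, 5500, 7000]   # bigger homes, bigger sites
price = [100 * s + 5 * t for s, t in zip(sf, site)]

print(round(simple_slope(sf, price), 2))   # ~112.86/SF -- bloated
b0, b_sf, b_site = multiple_regression(list(zip(sf, site)), price)
print(round(b_sf, 2), round(b_site, 2))    # 100.0 and 5.0 recovered
```

The single regression loads part of the site value onto the $/SF coefficient because site size rises with house size; adding the site variable separates the two effects.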
 
I did a single regression, haven't figured out multiple regression with it yet.
Select two or more adjacent columns as the independent (X) variables. (Put labels on those columns and check the Labels box; it will make Excel's output easier to read.)

I analyzed 250 sales in the City of Marysville, built from 2004-2009, 2000-2800 SF, sold in past 2.5 years.
May I recommend that you tighten your time frame to something like 6 months or 1 year, and narrow the data set to a part of town, a neighborhood, or two or three similar subdivisions.
 
Here is an MRA of a simple 1950s ranch-style, working-class single-family residence.

Two variables explain nearly everything: square footage and garage spaces. The dependent variable was value/SF. I looked at raw value, but the correlations, while good, weren't as strong. (Garage square footage revealed nearly identical results.) I used 11 sales, all from the same neighborhood. I had about 10-12 additional sales with basements but decided to drop them from the dataset and examine only SFRs on slabs; that extra variable, "basement", reduced the statistical measures for whatever benefit a dozen more data points added. It took a lot of work to research and throw out by hand a couple of non-arm's-length sales and some sales fronting a heavy thoroughfare, which sold at 20% less. To do this, I charted price versus square footage and investigated the sales that fell far below the rest of the trend line. There are more mathematical techniques, like computing a Z-score or a "leverage" (hat) score for each observation.

This model is extremely strong: 98% R^2. In other words, the sales all fall on or near the same line, and 98% of the variation is explained by square footage and garage spaces; 2% is other stuff. Correlation isn't causation -- but here it is. I added the COV and COD in Excel, and both were quite good. The F statistic says the model is excellent, and nothing suggests multicollinearity, e.g., bedroom counts mathematically confounding with square footage.

The coefficients say value is $177.69/SF as a constant, minus $0.07/SF per square foot of the comp/subject -- a marginally diminishing return for bigger homes. A garage space added $2.27/SF, so for a 1,200 SF home a garage adds about $2,700. (A different model, computing the number of garage spaces directly, found $3,451 per space.) The p-values are very strong (a little high on the garage variable). For one comp I couldn't figure out whether the garage was 1-car or 1.5-car, and with a small dataset that may add some error.

More advanced software has scripts to run numerous models and find the best fit. MS Excel doesn't, so I manually ran 20+ different models to find the marketplace's strongest relationship.

[Attachment: upload_2017-5-21_11-11-10.png]

Continued.
 

It'd average out the nice pools with the bad pools with the fabulous pools....It took a lot of work to research and throw out by hand a couple of non-arms-length sales
Normally I am looking for adjustments to the house only, so I subtract the land value, outbuildings, pools, etc., so that the core values are parsed down to the fundamental structure. That applies to both sensitivity analysis and MLR. The drivers of value are then age, condition, SF, and garage. Minimizing variables lets you play with it: add a dummy variable, say fireplace, and see if it makes sense, etc.
 
One of the least interesting things is looking at the residuals (how the observed data vary from your regression model), but it is a necessary chore: the square-footage residuals fall randomly positive and negative of the regression line, so there's no apparent second variable or non-linear pattern.
[Attachment: upload_2017-5-21_11-43-22.png]

Same for garage spaces:
[Attachment: upload_2017-5-21_11-44-58.png]
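The residual check described above can be sketched without a chart (fabricated data): residuals that alternate sign randomly around the trend line suggest no missed variable or curvature, while a long run of same-sign residuals suggests the model missed something.

```python
# Crude residual diagnostic: fit a line, compute residuals, and look at
# the longest run of same-sign residuals (fabricated data).

def residuals(xs, ys):
    """Residuals from a least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def longest_sign_run(resid):
    """Length of the longest run of consecutive same-sign residuals."""
    best = run = 1
    for a, b in zip(resid, resid[1:]):
        run = run + 1 if (a >= 0) == (b >= 0) else 1
        best = max(best, run)
    return best

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys_linear = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]   # roughly linear
ys_curved = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]  # quadratic: curvature leaks in

print(longest_sign_run(residuals(xs, ys_linear)))  # signs alternate: run of 1
print(longest_sign_run(residuals(xs, ys_curved)))  # long same-sign run
```

The quadratic example shows the telltale pattern: a linear fit to curved data leaves residuals that are positive at both ends and negative in the middle, exactly the "non-linear pattern" the eyeball check on the chart is hunting for.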

I then used the model to predict my subject. Rather than a specific price, I wanted to know how reliable the model was -- what is called a "prediction interval". (If a politician is ahead in the polls 51% to 49% but the margin of error is +/- 3%, they're not really ahead at all, and the race should be reported as Candidate A at 48% to 54% versus Candidate B at 46% to 52%; and that assumes the polls are truly random samples of likely voters.) MS Excel doesn't compute the PI for you, so here goes the math:
[Attachment: upload_2017-5-21_11-40-18.png]
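Since Excel won't do it, here is a sketch of the one-variable version of that prediction-interval formula. The data is fabricated, not the poster's: with n = 10 observations and 2 estimated parameters there are 8 degrees of freedom, so the hardcoded t = 2.306 matches the value cited in the post.

```python
# 95% prediction interval for a new observation from a one-variable
# least-squares model (fabricated $/SF-vs-SF data; t = 2.306 for 8 df).

def prediction_interval(xs, ys, x0, t=2.306):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = (sum(r * r for r in resid) / (n - 2)) ** 0.5  # residual std. error
    y0 = intercept + slope * x0
    # PI half-width: t * s * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx)
    half = t * s * (1 + 1 / n + (x0 - mx) ** 2 / sxx) ** 0.5
    return y0 - half, y0, y0 + half

sf   = [1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500]
ppsf = [104.1, 103.2, 102.8, 101.5, 100.9, 99.8, 99.1, 98.4, 97.2, 96.7]
lo, mid, hi = prediction_interval(sf, ppsf, 1215)
print(round(lo, 2), round(mid, 2), round(hi, 2))
```

Note the leading 1 under the square root: a prediction interval covers a single new sale, not the mean of sales, which is why it is wider than a confidence interval and why more comps shrink it only through the t multiplier and the 1/n term.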

The PI indicates that my subject at 1,215 SF would be $100.29/SF, with a 95% chance of statistically being between $95.67/SF and $104.91/SF.

Increasing the sample size to, say, 103 data points (comps) -- which, after the 3 degrees of freedom used by the two variables (elements of comparison) and the constant, leaves 100 degrees of freedom -- doesn't help a huge amount in the PI. The t multiplier would only drop about 14%, to 1.984 from the 2.306 that goes with 8 degrees of freedom (http://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf). My quality control was using good comps to begin with, rather than an everything-in-the-blender mix of non-comps, sorta-comps, and erroneous comps. Conclusion: this was a huge amount of work and nowhere close to being presentable to a client. A traditional appraiser (my AppraisersForum friends) would take 4-6 hours to complete it in its entirety, and you would have quickly selected 3-4 comps out of my n = 11 sample. We might vary on which of the 11 should be tossed, but all 11 would be stashed in your file. All in all, it is an excellent model, and I would gladly stand by it in a court of law. Lastly, here is the graph most people would want to see: the sales and the model (1) overlap and (2) follow a linear relationship.
[Attachment: upload_2017-5-21_11-57-41.png]
 

In my opinion, the benefit of the information age to residential real estate valuation is not analytics. The explosion of data, and the accessibility of that data, allows the appraiser to better identify the differences that impact value.

For example, a 2,000 SF house built in 1940 is not the same thing as a 1940 house that was originally 1,200 SF but expanded to 2,000 SF in 1990. And a 1940 house expanded to 2,000 SF in 2010 is different from the one expanded in 1990. We are now able to break the market data down by whether a home was expanded and when the expansion occurred. We can even see whether more recent updates have been made since a 1990 expansion.

We can take it another step further: we can even see the floor plan created by the addition. Sometimes an addition tacks a family room onto the main level. Sometimes an addition removes the entire rear wall and adds a new kitchen/family room area, which is a more functional floor plan and costs more to do. We now know details like this and can reconcile the opinion of value depending on which kind of addition the subject property has and when it was done.
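The breakdown described above can be sketched as a simple grouping over hypothetical sales records (the field names, IDs, and the pre/post-2000 cut are made up for illustration):

```python
# Group comparable sales by expansion status and expansion era
# (hypothetical records).

sales = [
    {"id": 1, "sf": 2000, "built": 1940, "expanded": None},
    {"id": 2, "sf": 2000, "built": 1940, "expanded": 1990},
    {"id": 3, "sf": 2000, "built": 1940, "expanded": 2010},
    {"id": 4, "sf": 2000, "built": 1940, "expanded": 1990},
]

def bucket(rec):
    """Label a sale as original, or by the era of its expansion."""
    if rec["expanded"] is None:
        return "original"
    return "expanded pre-2000" if rec["expanded"] < 2000 else "expanded 2000+"

groups = {}
for rec in sales:
    groups.setdefault(bucket(rec), []).append(rec["id"])
print(groups)
```

Once sales are bucketed this way, each bucket can be compared on its own terms instead of treating every 2,000 SF 1940-built house as interchangeable.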

Analytics with real estate data basically assumes that all is equal except for a short list of differences (elements of comparison). Analytics is not the best way to take advantage of all the information available today.
 