
Inaugural Issue of "Valuation Engineer Journal"

RCA

Elite Member
Gold Supporting Member
Joined
Jun 27, 2017
Professional Status
Certified General Appraiser
State
California
I have just published the first issue of "Valuation Engineer Journal". It is for Summer 2026, with an effective publication date of July 1, 2026, but I have already added the first article:

The Doctrine–Practice Gap in Real Estate Appraisal: A Structured Account of Functions, Boundaries, and Tensions


So in the next several weeks I will add another 3-4 articles that are already written, and shift my DOI from Zenodo to CrossRef + Zenodo.

I will just keep tweaking it to do more.

What makes it different from other appraisal journals:

1. It deals with higher-level statistics that work: MARS, GLR, GAM, ...
2. It deals with multiple computer languages, but particularly Prolog for handling protocols (you can see this quite extensively in the above article), R for statistics, Jupyter, Quarto, Python, and C++.
3. It will deal with GIS, particularly QGIS and map making, demographic, and neighborhood mapping methods.
4. Protocols at all levels, from high-level IVS and USPAP, to banking, GSE, FHA, and state and local regulations and guidelines, down to protocols for doing the low-level technical work needed for high-quality proposals.
 
Doesn't Claude Opus 4.8 with code execution and web search already run MARS, GAM, and glmnet on pulled sales data with nothing more than a subject address and a prompt, making the talent pipeline problem mostly a non-issue?
 
That was actually pretty cool. In 4.6 the code execution and statistical work is the same, so I gave it 238 similar sales and asked it to derive adjustments for an appraisal I'm working on. It was a little off and needs tweaking, and I only took a couple of minutes to do the entire exercise, but with some light work and buying 4.8 this could easily replace Synapse, which I think is $50 a month vs. 4.8 at $20 a month.

1780946296115.jpeg
 
Doesn't Claude Opus 4.8 with code execution and web search already run MARS, GAM, and glmnet on pulled sales data with nothing more than a subject address and a prompt, making the talent pipeline problem mostly a non-issue?
Of course not. Claude, as I have already stated, as well as other AI Chat services will just scan the web for estimates from the likes of Zillow and Redfin, and then do some kind of weighted averaging based on how accurate it thinks they are. If you want more than that, give it details about the subject, provide plenty of comps with data, and tell it you are an appraiser; it might do more. You can also tell it to use MARS, glmnet, and mgcv (GAM), but in that case, you may be asked for parameters, which can get complicated. If you have Claude Code and give it references to my programs, you might get more cooperation. But that is hazardous, especially if you don't know what you are doing. I don't have time to play around with this sort of thing. I don't need to.

But, yes, appraisers can use these tools. However, they have to glue the results together. In the end, like guys on this forum always say, they just want to push a button. A well-trained AI agent can manage very complex workflows and do it right every time, while you are sleeping.

So, the other perspective, which may pan out, is that appraisers who have some money will build companies with the robots, the EVs, drones, cameras, and so on, and be the ones who take orders and hand them off to their computers, which send out robot inspectors, get the data, and then feed it to the computer AI systems, which produce polished reports. That might turn out to be a thriving business. But they will still have to grapple with the ups and downs of the housing/lending industry.
 
That was actually pretty cool. In 4.6 the code execution and statistical work is the same, so I gave it 238 similar sales and asked it to derive adjustments for an appraisal I'm working on. It was a little off and needs tweaking, and I only took a couple of minutes to do the entire exercise, but with some light work and buying 4.8 this could easily replace Synapse, which I think is $50 a month vs. 4.8 at $20 a month.


I asked ChatGPT to analyze your results: "What do you think about this OLS analysis and value indication? For example, what do they really mean by 2.5% variance?"

ChatGPT Response:
Bert, my first reaction is that this is a useful exploratory regression summary, but I would be cautious about treating the displayed $570,215 model indication as a strong final value conclusion without more diagnostics, validation, and appraisal reconciliation.

The model is not useless. It has 238 sales, which is a respectable sample size, and the displayed RMSE of $50,730 / 9.8% is not terrible for rural/acreage residential property, where heterogeneity is often high. But the reported R² = 0.581 means the model explains only about 58.1% of the variation in sale prices within the analyzed price band. Roughly 41.9% remains unexplained by the included variables. That is a large residual component, especially if the output is being used to support a specific value opinion.

What “2.5% variance” probably means

In the lower panel, the numbers shown are:

Model indication: $570,215
Contract price: $585,000
Variance: 2.5%

That “variance” almost certainly does not mean statistical variance in the formal sense.

It appears to mean:

\[ \frac{585{,}000 - 570{,}215}{585{,}000} \approx 2.53\% \]
or possibly:

\[ \frac{585{,}000 - 570{,}215}{570{,}215} \approx 2.59\% \]
Either way, the displayed 2.5% is simply saying that the contract price is about 2.5% above the model’s point estimate.

So, in plain English:

The model predicts about $570,000, while the contract price is $585,000; the contract price is only about 2.5% higher than the model indication.
That is better described as percent difference, percent deviation, price-to-model variance, or contract-to-model spread. Calling it “variance” is potentially misleading because statisticians and technically trained reviewers may read “variance” as a squared-dispersion measure.

The important comparison is not the 2.5%; it is the RMSE and prediction interval

The contract price being only 2.5% above the model indication sounds very reassuring, but it needs to be interpreted against the model’s error.

The RMSE is $50,730, while the difference between contract price and model indication is:

\[ 585{,}000 - 570{,}215 = 14{,}785 \]
That is only:

\[ \frac{14{,}785}{50{,}730} \approx 0.29 \]
So the contract price is only about 0.29 RMSE units above the model prediction. That is small relative to the model’s ordinary prediction error.

In appraisal language, I would say:

The contract price is well within the model’s typical error range and does not appear inconsistent with the regression-based indication.
That is a much stronger and more defensible interpretation than focusing on “2.5% variance.”
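The arithmetic in this section is easy to reproduce. A minimal Python sketch, using only the three figures read from the exhibit (the variable names are mine, not anything from the screenshot):

```python
# Figures as read from the OLS exhibit discussed above.
contract_price = 585_000
model_indication = 570_215
rmse = 50_730

gap = contract_price - model_indication  # dollars by which contract exceeds the model

# The same gap expressed three ways.
pct_of_contract = gap / contract_price   # gap as a share of the contract price
pct_of_model = gap / model_indication    # gap as a share of the model indication
rmse_units = gap / rmse                  # gap measured in RMSE units

print(f"gap: ${gap:,}")
print(f"percent of contract price: {pct_of_contract:.2%}")
print(f"percent of model indication: {pct_of_model:.2%}")
print(f"RMSE units: {rmse_units:.2f}")
```

Whichever denominator the exhibit used, the gap rounds to about 2.5%, and it is under a third of one RMSE.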

The 95% range is very wide

The displayed 95% range of $471k–$670k is also important. That is roughly:

\[ 670{,}000 - 471{,}000 = 199{,}000 \]
So the 95% range is almost $200,000 wide. The contract price of $585,000 falls comfortably inside that interval, but so would a very broad range of possible values.

This means the model supports the idea that $585,000 is plausible, but it does not narrowly prove that $585,000 is the correct market value.

In practical appraisal terms:

The model gives weak-to-moderate support for the contract price, not precise confirmation.
That distinction matters.

Several individual adjustments look suspicious

Some coefficients look plausible, but some deserve scrutiny.

The above-grade GLA adjustment of +$79/SF and basement finished area at +$35/SF are directionally reasonable. The age adjustment of -$716/year is also plausible, depending on the market and how age interacts with condition, renovation, and effective age.

But several items raise flags:

Half bath: -$1,510.
The note correctly says this is probably a multicollinearity artifact. A negative half-bath coefficient is not impossible in a statistical model, but as an appraisal adjustment it is hard to explain unless half baths are acting as a proxy for some other negative feature or are highly correlated with other variables. I would not use that as a literal market-derived adjustment.

Deck/patio/porch: +$289.
The model is essentially saying this feature has no meaningful contributory value in this dataset. That may be true, or it may mean the variable is too crude. A simple presence/absence variable for “deck/patio/porch” may fail to distinguish a small concrete slab from a large high-quality outdoor living area.

Pool: +$15,715 with correlation +0.11.
This is directionally possible, but the low correlation suggests the effect may be weak or context-dependent. In rural/acreage property, pool value can vary sharply by condition, quality, climate, buyer preference, maintenance burden, and market segment.

HOA: +$21,036.
The note is sensible: this probably does not mean “HOA equals +$21,036.” It may be capturing better-maintained roads, superior subdivisions, gated communities, similar-quality homes, or other unmodeled locational/neighborhood effects.

The correlation column is only marginally helpful

The “Corr w/ price” column shows simple bivariate relationships between each feature and price. That can be useful as a sanity check, but it can also be misleading because OLS coefficients are partial effects conditional on other variables.

For example, if full baths are highly correlated with GLA, quality, bedroom count, and age, then the simple correlation with price does not tell you the independent value of a full bath. The coefficient depends on the whole design matrix.

So I would not put too much weight on the correlation column except to identify weak or suspicious features.

The price-band restriction is both useful and risky

The model is apparently restricted to the $350k–$750k band. That may improve relevance by excluding very low and very high outliers. But it also introduces a technical issue: filtering by sale price can create sample-selection effects.

In other words, if the dependent variable is sale price, and the dataset is selected partly based on sale price, the coefficients may be affected by the truncation. That does not make the model invalid, but it should be disclosed and tested.

For appraisal use, I would want to know:

  1. Was the subject reasonably centered within the model’s data range?
  2. How many sales were near the subject’s predicted value?
  3. Were there enough truly comparable rural/acreage sales?
  4. Were outliers reviewed individually?
  5. Were the coefficients stable if the price band changed slightly?
  6. Was there any holdout validation or cross-validation?
The displayed model does not answer those questions.

The R² is acceptable for exploration, but not enough by itself

An R² of 0.581 in residential valuation is not shocking, especially for rural/acreage property. Rural properties are messy. Lot utility, views, outbuildings, road quality, privacy, condition, renovation level, school district, proximity to services, flood risk, topography, and buyer-specific motivations may all matter.
 
I uploaded the same CSV format I use for Synapse to derive adjustments for more complex homes around here, but I made my search parameters unusually high and sucked in too many sales that were too dissimilar, though I did apply some general filters. R² was 0.581 and CV R² was negative, which is why I said it needs tweaking and why the appraiser's judgment on the coefficients still matters. The point wasn't a publication-ready model; it was deriving directionally sound adjustments in 2 minutes as a starting framework. I am going to play around with this more. I wouldn't mind scrapping Synapse and saving a few bucks.
 
Further, I asked ChatGPT: How would you suggest a better appraisal that provides more support for the value indication?

ChatGPT Response:


"Bert, I would strengthen this appraisal by changing the role of the OLS model. Instead of using it as the primary value engine, I would use it as one component in a broader, well-supported valuation reconciliation. The current exhibit says, in effect, “The contract price is close to the regression estimate.” That is helpful, but it is not enough by itself. A better appraisal would show why the model is credible, why the selected comparables are relevant, and why the final value conclusion is more likely than competing indications.

1. Start with a clearer valuation question

The appraisal should first define the subject’s competitive market more precisely than “Stafford County rural/acreage.” Rural/acreage property can include very different submarkets.
For example, the report should identify whether the subject competes mainly with:
commuter-oriented rural homes,
equestrian or hobby-farm properties,
large-lot subdivision homes,
older rural homes on acreage,
custom homes with privacy/amenity appeal, or
properties influenced by proximity to Fredericksburg, Quantico, I-95, or Warrenton/Fauquier markets.
The OLS model uses a broad Stafford County rural/acreage dataset. That may be acceptable as background support, but the final value should be tied to the subject’s actual competitive market segment.
A stronger report would explicitly answer:

Who is the probable buyer, what alternatives would that buyer consider, and why are these sales the best evidence?
That question matters more than the regression table.

2. Use the OLS model as support, not as the final answer

The model indication of $570,215 and contract price of $585,000 are close. But because the model has an RMSE of about $50,730 and a 95% range of about $471,000 to $670,000, the model is not precise enough to carry the value conclusion alone.
I would write something like:
The regression model is used as a market calibration tool to estimate typical marginal relationships among GLA, basement area, lot size, age, baths, garage, fireplace, pool, and related features. The model is not used as a stand-alone valuation method. It is used to test the reasonableness of paired-sale and grouped-data adjustments and to evaluate whether the final value conclusion is consistent with broader market behavior.
That turns the model into a credible support tool rather than an overclaimed black box.

3. Add a traditional sales comparison grid with fewer, better comparables

The strongest improvement would be a well-reconciled comparable sales grid using perhaps 4 to 8 highly competitive sales, not just 238 statistically analyzed sales.
For each comparable, the report should explain:
Why this sale competes with the subject.
Not merely because it is in Stafford County, but because it has similar location appeal, acreage utility, improvement type, condition, price range, and buyer profile.
What differences require adjustment.
GLA, basement area, site size, condition, quality, age/effective age, garage, outbuildings, road access, privacy, view, pool, topography, and location.
Which adjustments are strongly supported and which are judgmental.
For example, GLA and age may be statistically supported. Condition, quality, privacy, and functional utility may need paired sales, broker interviews, listing analysis, or appraiser judgment.
The final value should emerge from a narrowed adjusted range, not merely from the regression point estimate.


4. Use the regression to calibrate adjustments, then temper them

The OLS table gives candidate adjustments:

| Feature | OLS adjustment |
| --- | --- |
| Above-grade GLA | +$79/SF |
| Basement finished SF | +$35/SF |
| Lot size | +$11,462/acre, capped at 6 acres |
| Age | -$716/year |
| Full bath | +$15,495 |
| Garage space | +$16,235 |
| Pool | +$15,715 |
| Fireplace | +$6,077 |
Some of these could be useful in the sales grid, but I would not automatically adopt them. I would convert them into market-supported adjustment ranges.
For example:
GLA: The model says about $79/SF. I might test a range such as $65–$90/SF against paired sales and residual behavior.
Basement finished area: The model says $35/SF. That may be plausible if basement finish is secondary living area. But it should be checked against quality, walkout utility, ceiling height, bedroom/bath presence, and market norms.
Lot size: The model says $11,462/acre up to 6 acres. That is plausible only if the acres are similar in utility. Usable cleared acreage, wooded acreage, steep land, wetland, floodplain, and excess land should not receive the same unit rate.
Age: The model says -$716/year, but actual buyer reaction is usually more about effective age, modernization, condition, and remaining economic life than chronological age. I would not apply age mechanically.
Half bath: I would probably set this at zero or nominal, as the note suggests, unless separate market evidence supports a real effect.
Deck/patio/porch: I would not use the +$289 adjustment. That says “no reliable adjustment.” A superior outdoor living area might still need a qualitative adjustment, but the binary variable is not strong enough.
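For anyone who wants to see how the candidate coefficients would flow into a grid before they are tempered, here is a minimal Python sketch. The rates are the ones in the OLS table; the subject, the comparable, and its $560,000 sale price are hypothetical, and the function names are mine. Each line item is the rate times (subject minus comparable), with acreage capped at 6 on both sides.

```python
# Candidate rates taken from the OLS table above.
ADJUSTMENTS = {
    "gla_sf": 79,
    "basement_finished_sf": 35,
    "acres": 11_462,
    "age_years": -716,
    "full_baths": 15_495,
    "garage_spaces": 16_235,
    "pool": 15_715,
    "fireplaces": 6_077,
}
ACRE_CAP = 6  # the model gives no credit for acreage beyond 6 acres


def capped(features):
    """Copy the feature dict with acreage limited to the model's cap."""
    out = dict(features)
    out["acres"] = min(out["acres"], ACRE_CAP)
    return out


def adjust_comp(sale_price, subject, comp):
    """Adjust a comparable toward the subject; returns (adjusted price, line items)."""
    s, c = capped(subject), capped(comp)
    line_items = {k: rate * (s[k] - c[k]) for k, rate in ADJUSTMENTS.items()}
    return sale_price + sum(line_items.values()), line_items


# Hypothetical subject and comparable, for illustration only.
subject = {"gla_sf": 2400, "basement_finished_sf": 800, "acres": 8, "age_years": 20,
           "full_baths": 3, "garage_spaces": 2, "pool": 0, "fireplaces": 1}
comp = {"gla_sf": 2200, "basement_finished_sf": 600, "acres": 5, "age_years": 25,
        "full_baths": 2, "garage_spaces": 2, "pool": 1, "fireplaces": 1}

adjusted_price, line_items = adjust_comp(560_000, subject, comp)
for feature, amount in line_items.items():
    print(f"{feature:>22}: {amount:+,}")
print(f"{'adjusted price':>22}: {adjusted_price:,}")
```

This is the mechanical version only. The point of this section is that each rate should first be replaced by a market-supported figure or range, and half bath and deck/patio/porch are deliberately left out.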


5. Add coefficient uncertainty and statistical reliability

The OLS exhibit would be much more persuasive if it showed:

| Feature | Coefficient | Std. error | t-stat / p-value | Confidence interval |
| --- | --- | --- | --- | --- |
Right now the table gives a false sense of precision. For example, garage space +$16,235 may look precise, but without a standard error we do not know whether a reasonable confidence interval is, say, +$5,000 to +$27,000 or -$2,000 to +$34,000.
For appraisal support, I would especially want confidence intervals around:
GLA,
basement finish,
lot size,
age/effective age,
garage,
pool, and
HOA/subdivision effect.
Then the appraiser could say:

The adjustment applied in the sales grid falls within the statistically supported range and is consistent with paired-sale and grouped-sale evidence.
That would be much stronger than simply reporting a coefficient.

6. Add residual diagnostics

A credible regression-supported appraisal should show that the model is not systematically biased.
I would include at least these diagnostics:
Predicted vs. actual sale price plot.
This shows whether the model tracks the market reasonably.
Residual vs. predicted price plot.
This shows whether errors get larger at higher prices or whether the model systematically misses certain ranges.
Residual map.
For rural acreage, this is very important. If residuals cluster geographically, the model is missing location effects.
Residuals by subdivision / school district / rural area.
This would expose whether the model undervalues one area and overvalues another.
Outlier and influence table.
A few unusual acreage properties can distort coefficients.
This is especially important because the displayed R² of 0.581 means a large part of price variation is not explained by the model. The appraiser needs to show that the unexplained part does not invalidate the subject indication.


7. Add out-of-sample validation

The displayed RMSE appears to be an in-sample error measure. That is useful, but not enough. A better report would include out-of-sample testing.
For example:
Repeated holdout validation:
Randomly hold out 20% of the sales, fit the model on the remaining 80%, and predict the held-out sales. Repeat this many times.
Report:

| Metric | In-sample | Cross-validated / holdout |
| --- | --- | --- |
| R² | 0.581 | maybe lower |
| RMSE | $50,730 | likely higher |
| Median absolute error | | useful |
| Mean absolute percentage error | | useful |
If the out-of-sample RMSE remains near 10%, that supports model stability. If it jumps to 14% or 18%, the model is less reliable than the exhibit implies.
For appraisal support, I would place more weight on holdout error than in-sample R².

(to be continued)
 

(continued)

8. Separate location from physical characteristics

This is probably one of the largest weaknesses in the shown OLS table. The model includes physical characteristics, but the screenshot does not show strong location controls.
For rural/acreage property, location is not just “Stafford County.” Important location factors may include:
commute distance to I-95, Quantico, Fredericksburg, or D.C.-oriented employment,
school district,
road type and access,
subdivision versus non-subdivision setting,
waterfront or water influence,
topography,
privacy,
neighborhood price level,
proximity to commercial services,
Fauquier/Spotsylvania/Culpeper market influence,
utility availability,
broadband availability,
road maintenance quality, and
zoning/land-use constraints.
The HOA coefficient is a clue that location/subdivision quality may be leaking into a crude HOA variable. That is dangerous. A better model would include specific location controls, such as market area, school zone, subdivision class, distance variables, or geospatial smoothing.
In your own terminology, Bert, this is where a residual component may be carrying unmodeled condition, quality, appeal, location, and utility.

9. Consider a semi-parametric or non-parametric model as a check

OLS is understandable and transparent, which is valuable in appraisal. But the relationship between price and rural features is often nonlinear.
I would consider adding one or more of these as secondary checks:
GAM:
Useful for smooth nonlinear effects of GLA, age, acreage, and distance.
MARS:
Very useful for thresholds, hinges, and interactions, such as acreage value flattening after a certain point.
Random forest or gradient boosting:
Useful as prediction checks, but less transparent. I would use them cautiously and not as the primary explanation.
Quantile regression:
Useful if the model behaves differently at lower versus upper price points.
A strong report might say:
OLS was used for transparency and adjustment support. A MARS/GAM check produced a similar subject indication, suggesting the OLS conclusion is not merely an artifact of linear functional form.
That would give the value indication much more credibility.

10. Improve the prediction interval explanation

The 95% range of $471k–$670k is useful, but the report should explain exactly what it is.
There are at least two different possibilities:
Confidence interval for the mean predicted value
This answers: “Where is the average expected price for properties like the subject?”
Prediction interval for an individual sale
This answers: “Where might an actual sale price fall for one individual property?”
For appraisal, the second is often more relevant, but it is wider. The screenshot says “95% range,” but not whether it is a confidence interval, prediction interval, bootstrapped interval, or rule-of-thumb RMSE interval.
I would explicitly label it:

95% individual prediction interval: $471,000–$670,000
or
95% confidence interval for model-indicated mean value: $–$
Those are very different. A reviewer would notice.
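One quick check a reviewer could run on the "95% range": compute the model indication plus or minus 1.96 times the RMSE. That this is how the exhibit built its range is my assumption, not something the screenshot states.

```python
from statistics import NormalDist

model_indication = 570_215
rmse = 50_730

z = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile, about 1.96
low = model_indication - z * rmse
high = model_indication + z * rmse

print(f"${low:,.0f} to ${high:,.0f}")
print(f"rounded: ${round(low / 1000)}k to ${round(high / 1000)}k")
```

The result rounds to $471k–$670k, the same as the displayed range. That is consistent with a rule-of-thumb RMSE interval, though it does not prove the exhibit was built that way. A formal prediction interval would also account for coefficient uncertainty and for how far the subject sits from the center of the data.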

11. Reconcile the contract price carefully

The current exhibit says:
Model indication: $570,215
Contract price: $585,000
Variance: 2.5%
A better appraisal would not stop there. It would ask:
Was the contract arm’s-length?
Was it exposed to the open market?
Were there concessions?
How many days on market?
Were there multiple offers?
Was the property listed on MLS?
Did the contract include personal property?
Were seller-paid costs included?
Were inspection concessions or repair credits negotiated?

If the sale was exposed to the market and the contract was arm’s-length, the contract price itself is very meaningful evidence. The regression then becomes a reasonableness test.
A strong statement would be:

The contract price of $585,000 is supported by the adjusted comparable sales range and is also within the regression model’s expected range. The model indication of $570,215 is 2.5% below the contract price, a difference substantially smaller than the model’s typical prediction error. Therefore, the regression does not suggest that the contract price is unsupported.
That is defensible.

12. Use bracketing more explicitly

For a residential appraisal, especially one going through lender review, bracketing still matters.
The comparable set should bracket the subject on major features where possible:

| Feature | Subject should be bracketed by comps on |
| --- | --- |
| Sale price | lower and higher sales |
| GLA | smaller and larger homes |
| Site size | smaller and larger lots, or explain why not |
| Age/effective age | older and newer |
| Condition | inferior and superior |
| Quality | inferior and superior |
| Basement finish | less and more, if relevant |
| Garage | fewer and more spaces |
| Location | inferior and superior, where possible |
The OLS model can support adjustments, but good comparable selection still persuades the reader.

(to be continued)
 

(continued)

13. Add a “model applicability to subject” section

This is a simple but powerful addition.
The report should compare the subject to the model sample:
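| Variable | Subject | Model median | Model range | Subject position |
| --- | --- | --- | --- | --- |
| GLA | | | | within / high / low |
| Basement SF | | | | within / high / low |
| Acres | | | 0–6 cap | within / high / low |
| Age | | | | within / high / low |
| Garage spaces | | | | within / high / low |
| Baths | | | | within / high / low |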
The appraiser needs to show the subject is not an extrapolation case. If the subject has 12 acres but the model caps acreage at 6 acres, the report should explain whether the extra acreage is excess land, surplus land, non-contributory privacy acreage, or separately marketable land.
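Filling in that applicability table is mechanical once the sample is loaded. A minimal sketch, with hypothetical sample values and a hypothetical helper name; only the 6-acre cap and the 12-acre example come from the text above:

```python
from statistics import median


def subject_position(value, sample, cap=None):
    """Summarize where the subject sits relative to the model sample."""
    low, high = min(sample), max(sample)
    if cap is not None:
        high = min(high, cap)  # the model treats everything above the cap alike
    if value < low:
        position = "low"
    elif value > high:
        position = "high"
    else:
        position = "within"
    return {"median": median(sample), "range": (low, high), "position": position}


# Hypothetical sample values, for illustration only.
sample_acres = [1.0, 2.5, 3.0, 4.2, 5.0, 7.5, 10.0]
sample_gla = [1500, 1850, 2100, 2400, 2750, 3200]

acres_check = subject_position(12, sample_acres, cap=6)
gla_check = subject_position(2300, sample_gla)

print("acres:", acres_check)
print("GLA:  ", gla_check)
```

A "high" or "low" flag does not settle anything by itself. It marks the variables where the report owes the reader an explanation, such as how acreage beyond the cap was treated.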

14. Add competing indications and reconciliation

A better appraisal would include several indications:
| Method | Indication |
| --- | --- |
| Adjusted comparable sales | perhaps $575k–$600k |
| OLS model indication | $570k |
| Alternative model check | perhaps $565k–$590k |
| Contract price | $585k |
| Final reconciled value | perhaps $585k, if supported |
The final value should be explained as a reconciliation, not just a model output.
For example:

The adjusted comparable sales provide the strongest direct evidence and indicate a range of $575,000 to $600,000, with greatest weight to Sales 2, 4, and 5. The OLS model indicates $570,215, slightly below but reasonably consistent with the comparable sales and contract price. A sensitivity analysis using alternate GLA and acreage adjustments produced indications from $565,000 to $590,000. Given the subject’s superior condition and market exposure, the final opinion is reconciled at $585,000.
That kind of reconciliation is much more convincing.

15. My suggested appraisal structure

I would structure the improved appraisal this way:
  1. Define the subject’s competitive market segment.
    Not just county, but buyer profile, geography, and property type.
  2. Analyze the contract.
    Exposure, concessions, DOM, arm’s-length nature, list-to-sale behavior.
  3. Select 4–8 highly competitive comparables.
    Explain why each was chosen.
  4. Develop a traditional sales comparison grid.
    Use market-supported adjustments.
  5. Use OLS to support adjustment ranges.
    Include coefficient uncertainty, diagnostics, and model limitations.
  6. Validate the model.
    Show holdout RMSE, residual plots, and subject applicability.
  7. Add nonlinear/statistical cross-checks if available.
    GAM/MARS would be particularly useful.
  8. Reconcile all indications.
    Give greatest weight to the best comparable sales, with regression as support.

Bottom-line opinion

The current OLS exhibit is a good start, but it mainly says:
“The contract price is plausible relative to a broad regression model.”
A stronger appraisal should say:
“The contract price is supported by the best comparable sales, by market-derived adjustment evidence, by a validated regression model, and by a reconciliation that explains why this point in the indicated range is most probable.”
That is the difference between a statistical reasonableness check and a well-supported appraisal value opinion."
 