RCA
Elite Member
- Joined: Jun 27, 2017
- Professional Status: Certified General Appraiser
- State: California
This may be of interest to a few of you:
MARS is the absolute best tool for SCA. It is not just regression: it bears striking resemblances both to the Convolutional Neural Networks (CNNs) used in popular AI image-processing systems like ResNet and to the Transformers used in most LLMs, such as ChatGPT, Grok-3, and Claude.
MARS is better for appraisal because the models it creates can be explained, discussed, and understood by humans. CNNs and Transformers simulate neurons and neural pathways in the brain, in particular the growth of new neural pathways through training. You hear exclamations, even from the people designing these systems, that they don't really understand why they work as well as they do:

From a Medium article (medium.com):
We don’t understand *exactly* how LLMs work. But there is progress.
It’s fascinating that LLMs are being studied as if they’re a complex pre-existing natural phenomenon that calls for investigation.
Anyway:
From Grok-3: [response not reproduced here]
From ChatGPT-4o:
Question: Is there any similarity between MARS regression and Transformer algorithms?
Interesting question! On the surface, MARS (Multivariate Adaptive Regression Splines) and Transformers might seem worlds apart—they come from different eras and are used for different purposes—but they do share a few high-level ideas. Here's a breakdown:
What They Are:
- MARS is a non-parametric regression technique that models relationships by automatically creating piecewise linear splines (hinge functions) to fit data flexibly (see the sketch after this list).
- Transformers are deep learning architectures designed primarily for sequence modeling, especially in natural language processing. They use self-attention mechanisms to capture dependencies between inputs.
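To make the MARS half of this concrete, here is a minimal sketch in plain NumPy of the hinge-function idea. It is not a real MARS implementation (a full one, such as the py-earth package, searches over variables and knot locations automatically and then prunes); the square-footage data and the knot at 1,500 sq ft are made up purely for illustration:

```python
import numpy as np

def hinge(x, knot):
    """MARS-style hinge function: max(0, x - knot)."""
    return np.maximum(0.0, x - knot)

# Made-up toy data: price responds nonlinearly to square footage.
rng = np.random.default_rng(0)
sqft = rng.uniform(800, 3000, 200)
price = 100_000 + 150 * np.maximum(0, sqft - 1500) + rng.normal(0, 5_000, 200)

# Design matrix built from a mirrored pair of hinges at an assumed knot.
# Real MARS searches over variables and knot locations, then prunes.
knot = 1500.0
X = np.column_stack([
    np.ones_like(sqft),   # intercept
    hinge(sqft, knot),    # max(0, sqft - knot): slope above the knot
    hinge(-sqft, -knot),  # max(0, knot - sqft): slope below the knot
])

# Ordinary least squares on the hinge basis gives a piecewise linear fit.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
print("intercept, upper-hinge, lower-hinge coefficients:", coef.round(2))
```

The fit is just ordinary least squares on readable hinge terms, which is why a MARS model can be written out and discussed line by line.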
Conceptual Similarities:
| Concept | MARS | Transformers |
|---|---|---|
| Adaptive modeling | Chooses basis functions (splines) adaptively based on the data | Learns weights/representations adaptively via attention |
| Nonlinearity | Models nonlinear relationships using hinge functions | Models complex nonlinear interactions using deep layers and attention |
| Feature interaction | Interactions can be added between features | Attention layers model pairwise interactions between tokens/features |
| Data-driven structure | The model structure (number and placement of splines) is learned from the data | The attention mechanism learns which parts of the input to focus on, based on the data |
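To make the attention rows in the table above concrete, here is a single-head self-attention pass sketched in NumPy. The weights are random and untrained, and the tiny dimensions are arbitrary; the point is only that every token computes a score against every other token:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (untrained, no masking)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise token-to-token scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8
X = rng.normal(size=(n_tokens, d_model))             # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, attn = self_attention(X, Wq, Wk, Wv)
print("attention weights (each row sums to 1):")
print(attn.round(3))
```

In a real Transformer these weights are trained, stacked across many heads and layers, and wrapped in feed-forward blocks, which is where the black-box character comes from.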
Key Differences:
| Aspect | MARS | Transformers |
|---|---|---|
| Purpose | Regression, predictive modeling | Sequence modeling, language understanding |
| Model type | Classical statistical model | Deep learning neural network |
| Interpretability | Highly interpretable | Often a black box (though explainable AI tools exist) |
| Data types | Tabular, structured data | Sequential: text, audio, image patches, etc. |
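The interpretability row is the crux for appraisal work. A fitted MARS model can be handed to a reviewer as one plain equation; the knot and coefficients below are made-up placeholders, not output from an actual fit:

```python
# A MARS model is a readable sum of hinge terms. All values are made up.
knot = 1500.0
intercept, up_coef, down_coef = 100_000.0, 148.7, -2.1

print(f"price = {intercept:,.0f}"
      f" + {up_coef:.1f} * max(0, sqft - {knot:.0f})"
      f" + {down_coef:.1f} * max(0, {knot:.0f} - sqft)")
```

No comparable one-line readout exists for a trained Transformer.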
Similar Vibe:
Both MARS and Transformers adapt to the data rather than sticking with a fixed model structure. You could say they both "learn how to learn the shape of the data"—but the mechanisms and domains are very different.

Want me to illustrate this with a concrete toy example or code snippet?