Welcome to AppraisersForum.com, the premier online community for the discussion of real estate appraisal.

AI Is Supposed To Be "Smart" and "Tireless" - But,

RCA · Elite Member · Gold Supporting Member
Joined: Jun 27, 2017 · Professional Status: Certified General Appraiser · State: California
Others are reporting on this more and more. In 2025, the problem was ChatGPT and other LLMs "Hallucinating" and doing really screwy things.

Now it seems to be getting very lazy and forgetful, like people.

Its script is failing, so I ask it why. It says it is trying to access port 5432, but the port is blocked for some reason, so it will change the script to use port 5433. Then 5433 eventually gets blocked, and it goes back to 5432, and back and forth, until I ask it to "really" fix the problem. It says, "Oh sure, you're right, I will fix it this time." And well, it seems to work for a day or so, and then it doesn't work anymore. Or it fixes a bug and introduces another, and back and forth you go until you get after it and explain that you can't live with this behavior and the problem really needs to be fixed. Sometimes you have to keep after it until it seems to get everything right. Just right. If only you could get it to remember that right way, because inevitably, sooner or later, it will screw it up again. You can tell it to write notes (as Markdown files or code comments), which certainly helps. But when things get complicated, i.e., not simple anymore, it can have trouble finding that information and starts to screw up again.

But at least, I guess, it seems so human, although irritatingly stupid and incompetent. And I can only tell myself it is my fault for not making things clear enough up front, although I am not yet convinced that even that would help. At least I have Alexa and Siri around to say nice things to me and calm me down.
 
They will never replicate the human brain. Maybe in about a zillion years... but we'll be vaporized by an exploding star by then.
 

Are you using one LLM for your work or multiple different models?

Been going the multiple-model route for a project I've been working on, and it performs much better than a single LLM. Some RAG, some stock; currently attempting to add oversight and management agents. Fewer errors and hallucinations, and they're easier to track and isolate.

Currently using Flowise, which is a budget version of n8n, with open-source models. If it proves productive and profitable, I would probably move to paid subscriptions and a larger server to self-host.
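A toy sketch of the multi-model idea, for anyone curious: fan a prompt out to several models and let a cheap "manager" step keep the majority answer, which is one way a second model catches another's hallucination. The model callables here are placeholders, not Flowise or any real API:

```python
from typing import Callable

def route(prompt: str, models: dict[str, Callable[[str], str]]) -> str:
    """Ask every model, then keep the answer most models agree on.
    Majority vote is a crude 'manager agent', but it cheaply suppresses
    a single model's hallucination."""
    answers = [ask(prompt) for ask in models.values()]
    counts: dict[str, int] = {}
    for answer in answers:
        counts[answer] = counts.get(answer, 0) + 1
    return max(counts, key=lambda a: counts[a])
```

Real setups (Flowise flows, agent frameworks) do much more, e.g. letting a judge model grade free-form answers, but the routing-plus-arbitration structure is the same.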
 
Have you looked into any of the "agents" like Genspark, which claims to allow unlimited access to several versions of ChatGPT, Grok, Claude, and others, all for $11.99 per month? Trying to figure out how to play without breaking the bank for a near-term sideline.
 

I have not. Currently using open source models that I self host and free versions of the big boys. It works for what I need at the moment. Nothing I'm doing requires cutting edge models tho. At least not yet. Not quite at the agentic flow level in my project, but it is on the horizon.

Claude seems to be giving me the best output for what I'm doing. It is a little weird with its responses, tho. Politeness isn't what I care about, so I have a prompt that cuts that BS out. It is limited to 40 chats per day, but I haven't hit the limit yet. ChatGPT tends to require more corrections, but there are prompts to weed thru that too. I haven't tested any of the other bigs yet tho.

$12 a month seems like a deal tho if you need higher level AI and different sources.

What are you trying to do with it?
 
Get started! I have used ChatGPT for a few tasks, such as parsing data (text files, CSV data, etc.), but it can get into an endless loop of trying to correct the errors and inconsistencies it introduces.

I fed it a spreadsheet of data I had analyzed with regression to develop adjustments in a neighborhood, and I have an issue I cannot find an explanation for: the unfinished-basement value approaches that of GLA and is a third higher than the finished-basement value. I was mildly surprised when it concluded a similar set of adjustments, but when I asked it to address the specific issue, it responded with volumes of "stuff" without any insight. I think there is significant opportunity in using AI to look for patterns in data that might explain this type of anomaly.
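One way to make that kind of anomaly jump out automatically: after the regression, flag any feature whose coefficient approaches the GLA rate, since an unfinished-basement adjustment near full living-area value usually signals collinearity or a data problem rather than a real market premium. A minimal sketch (the coefficient names and the 0.8 threshold are illustrative, not from any actual model):

```python
def flag_anomalies(coefs: dict[str, float], base: str = "gla",
                   ratio: float = 0.8) -> list[str]:
    """Return the features whose per-sq-ft regression coefficient is
    suspiciously close to (or above) the base living-area rate."""
    limit = ratio * coefs[base]
    return [name for name, value in coefs.items()
            if name != base and value >= limit]
```

Run it against the coefficients each time the model is refit; anything flagged goes on the "investigate the data" pile before it goes into a report.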

I also need to create a process to compare public data and MLS data to identify discrepancies to facilitate faster modeling (as always, the data scrubbing is the time suck). Here, realtors almost always rely on public data for listings, but somehow don't understand how to report the same number twice. I would like to isolate the stupid and intentionally misleading data so that I can focus on those records where their actual knowledge of the home is likely preferable to public data.
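The public-record vs. MLS cross-check can be a straightforward keyed comparison; a sketch in plain Python (the APNs, field names, and records are made up for illustration):

```python
def find_discrepancies(public: dict[str, dict], mls: dict[str, dict],
                       fields: tuple[str, ...]) -> dict[str, list[str]]:
    """For each parcel present in both sources, list the fields where
    the MLS listing disagrees with public data, so human review goes
    only to the records that actually conflict."""
    out: dict[str, list[str]] = {}
    for apn, pub in public.items():
        listing = mls.get(apn)
        if listing is None:
            continue  # no listing to compare against
        bad = [f for f in fields if pub.get(f) != listing.get(f)]
        if bad:
            out[apn] = bad
    return out
```

From there you can sort the conflicts into "obvious typo" versus "agent likely knows something the assessor doesn't," which is the part that still takes judgment.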
 
Yeah, this is where agentic AI comes in: models analyzing and correcting output from other models. Very complicated to set up tho as a beginner.

Are you using one instance (chat) for different inquiries? I'm getting the best results keeping a chat going with only slight variations on the same question. So in your case use that same spreadsheet with different data on the same chat and it should improve the output over time. If you start a new instance you may have the same error every time.

Same with the second example. Use the same chat as the template to analyze the difference every time and it should improve accuracy.
 
Do you find the self-hosted model is hamstrung or limited compared to the public model, or are they pretty much equal at the outset? In addition to protecting your data, results, and processes, are you seeing decent results in training within your own system?
 

That's more my issue. My project has financial data I want to keep in-house, so self-hosting is more secure. There definitely are limitations to self-hosted models on consumer-grade hardware, but you can navigate them with the right workflow. Keeping the models narrowly focused, like above, helps a lot.

The big boys are definitely better at complex inquiries and general use covering a vast array of topics, but multiple models focused on a small group of individual tasks work just as well locally hosted.
 

I am using the $100/month Claude Code for now. Some recommend the $200/month plan, claiming they get $300-$400 in value from it. But that would be a minimum of $2,400/year! It does make you wonder about getting a nice Mac Studio M5 Ultra when they come out next year for $10K or so: 4 × $2,400 over four years of Claude Code would pay for it. However, even an M5 Ultra will not replace something like Claude Code.

I do have the $30/month Grok and $20/month ChatGPT. But I am seriously thinking about terminating those subscriptions and saving $50/month, which would save $1,200 over two years. The thing is, despite all the work that Claude Code does (aside from the bumps we run into from time to time), I still have to spend time doing very targeted research, planning, and design. That slows me down.

E.g., I have switched to PostgreSQL for my MLS data, and I decided to start from scratch with a more efficient download design. It works pretty well, but it takes a good number of Zsh shell scripts, Python, and Prolog to get everything done. I am downloading all data for all counties my MLS has a broker feed for: San Mateo, Santa Clara, Santa Cruz, Monterey, Alameda, and Contra Costa, plus ALL photos going back to 2000. A ton of data. Then I have all kinds of other data to merge in, such as the County Assessor and Planning GIS data.

I make my own latitude and longitude from PostGIS centroids to get very accurate (the most accurate) latitude and longitude data (floating-point accuracy). And of course, I rounded all of those values down to 3 decimal points for MARS analysis to prevent overfitting. Then there is the elevation data as well. I have all the property data downloaded; just some photos in Alameda, Contra Costa, and Monterey remain to be downloaded.

Now I can do all kinds of things with that data, such as define market areas and neighborhoods everywhere: superior ones, based on the latest data. I create my own neighborhoods and market areas, give them a name, and then add that info to the properties for MARS to use. I know from having done it in the past that one smaller county like San Mateo is a lot of work, doing one MARS analysis for each city. So it will take me a while to analyze and do the work for all of these counties and set up a system to update the reports every couple of months.
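For the centroid step, PostGIS's ST_Centroid/ST_X/ST_Y do the heavy lifting, and the rounding to 3 decimals (roughly 111 m of latitude) is what coarsens the coordinates for MARS. A sketch of both pieces (the table and column names are assumptions, and the SQL is shown as a string rather than executed here):

```python
# Assumed schema: a table "parcels" with an "apn" column and a geometry
# column "geom" in EPSG:4326 (lon/lat degrees).
CENTROID_SQL = """
SELECT apn,
       ROUND(ST_X(ST_Centroid(geom))::numeric, 3) AS lon,
       ROUND(ST_Y(ST_Centroid(geom))::numeric, 3) AS lat
FROM parcels;
"""

def coarsen(value: float, places: int = 3) -> float:
    """Round a coordinate to `places` decimals (~111 m at 3 places for
    latitude), deliberately losing precision so MARS cannot overfit to
    exact parcel positions."""
    return round(value, places)
```

The `::numeric` cast is needed because Postgres's two-argument ROUND takes numeric, while ST_X/ST_Y return double precision.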

Now, actually, I can pull in about all of the available data for San Diego, Los Angeles, and San Bernardino counties from my own MLS, if I wanted to. If I can successfully automate the SF Bay Area counties, I might consider that. But then again, it probably won't make sense unless I start doing reviews, because I am probably more interested in getting into the valuation of equipment, in particular robots and automated machinery, as well as Ag and other commercial.

I think being able to value machinery is very important for other areas as well. Everything is getting more dependent on robots and automated machinery.

...
 