ChatGPT can get worse over time, Stanford study finds | Fortune

cyu@sh.itjust.works · 1 year ago

Lvxferre@lemmy.ml · 1 year ago

I totally agree that LLMs don’t currently have a ‘concept’ of what they’re being asked. But

And yet that’s essential. More on that later.

But it’s also pretty easy to ask questions of young kids that go beyond what they have a mental map of and get nonsense.

Look at which type of nonsense they generate. It shows higher (albeit still imperfect) awareness that utterances are supposed to be internally and externally consistent - i.e. that an utterance shouldn’t contradict itself or the child’s available world knowledge. It’s still leagues above the LLM “hallucinations”.

Also compare the relatively small amount of input that a child receives throughout her life with the large amount of info crammed into those large language models. Even children are doing something important with the data that those language models don’t do yet. That’s why I say that LLMs don’t use the data well.

Here’s what ChatGPT has to say in response to your question:

I was able to get ChatGPT to output the same sort of nonsense that Bart gave me. But focusing on your answer: before your second prompt, why was ChatGPT listing things that are clearly unrelated to behaviour? And even in my answer, the same problem persists:

[to ChatGPT, in English] What is the difference in behaviour between a potato and an orange?

Potatoes and oranges are both edible plant products, but they are quite different in terms of their behavior due to their distinct characteristics and properties.

Botanical classification:
·Potato: Potatoes belong to the Solanaceae family and are classified as tubers. The edible part of the potato is an underground stem or tuber that stores nutrients for the plant.
·Orange: Oranges belong to the Rutaceae family and are classified as citrus fruits. The edible part of the orange is the fleshy fruit that surrounds the seeds.

Growth and cultivation:
·Potato: Potatoes are usually grown in the ground and harvested after the plant’s foliage has died down. They prefer cool climates and well-drained, loose soil.
·Orange: Orange trees are typically grown in tropical or subtropical regions. They are large, evergreen trees that produce sweet, juicy fruits.

Appearance:
·Potato: Potatoes have a rough, brownish skin and are usually oval or oblong in shape.
·Orange: Oranges have a bright orange, smooth, and slightly dimpled skin, and they are round or oval-shaped.

Taste and texture:
·Potato: Potatoes have a starchy taste and a dense, solid texture when cooked. They can be boiled, baked, mashed, or fried.
·Orange: Oranges have a sweet and tangy taste with juicy, segmented flesh. They are typically eaten fresh or used to make juices.

Nutritional content:
·Potato: Potatoes are a good source of carbohydrates, particularly starch, and they also contain some vitamins and minerals like vitamin C and potassium.
·Orange: Oranges are rich in vitamin C, a powerful antioxidant, as well as dietary fiber, potassium, and various vitamins and minerals.

Culinary use:
·Potato: Potatoes are versatile and used in various cuisines worldwide, often as a side dish or a main ingredient in dishes like fries, mashed potatoes, and stews.
·Orange: Oranges are commonly consumed as fresh fruit or used in various beverages, desserts, and savory dishes to add a citrusy flavor.

Storage:
·Potato: Potatoes have a relatively long shelf life and can be stored in a cool, dark place for several weeks to several months.
·Orange: Oranges have a shorter shelf life and are best stored in the refrigerator for a few weeks.

In summary, the behavior of a potato and an orange differs significantly due to their botanical classification, growth conditions, appearance, taste, nutritional content, culinary use, and storage requirements.

Bart’s “oranges are passive, potatoes are active” output was hilarious, but at least Bart listed things that could be creatively interpreted as behaviour. Meanwhile, ChatGPT simply ignored the word in your first prompt, until you emphasised it with a second prompt; and for mine, it answered with a big, contextually irrelevant info dump about inherent attributes of both entities that cannot be interpreted as behaviour.

And we might say “it might get it right in some situations, depending on the prompt”, but how it reaches those conclusions (right or wrong) matters too. Learning a language is also about the internal process yielding that output. And that is not just theoretical babble; if we can’t model the process in a somewhat decent way, we get inconsistent and unreliable output (as it is now), which is really bad for a tool. Garbage input → garbage output; but also decent input + garbage algorithm → garbage output.

That’s why I said that concepts are essential. Learning how to handle concepts is an integral part of learning both language “as a faculty” and any instance of language (e.g. Mandarin, English, etc.)

There are more issues than just that, mind you, but I already wrote a big wall of text.

So, a smarter parrot?

Nope - a dumber parrot. Way dumber; I know that I’m the one who brought this comparison up, but in hindsight it underestimates parrots by a mile. Parrots show signs of primitively associating things with words, and even handling abstractions like colour.

How far until it’s as good as any young kid?

If “it” = LLM, I do not think that it’ll be as good as a young kid, ever. Brute forcing it with more data won’t do the trick.

If “it” = machine learning, regardless of model: I think that it’s possible that it reaches the level of a young kid in some decades. (Source: I’m guessing it.)

And you still didn’t say anything about not using the data ‘well’. What would you like to see them doing?

I explained it across this comment, but by “using the data well” I mean that a good model should require less data to yield meaningful outputs. GPT3.5 for example had 45TB of data, and it was still not enough.

dave@feddit.uk · 1 year ago

Ok, I’m not going to go point by point, as this is getting too long. All I’d say is remember where the model for ML came from (McCulloch & Pitts), and that this is the worst AI will ever be.

If this is truly a jump across S-curves in utility, it’s bound to be slightly worse than other methods to begin with. Many of the arguments against the current approach sound like the owners of a hot air balloon business arguing with the Wright brothers.

Lvxferre@lemmy.ml · 1 year ago

The whole idea of artificial neurons (from McCulloch and Pitts) sounds to me like modelling a wing-flapping mechanism for airplanes. You can get something fun out of it, but I think that further progress will focus on reverse engineering the software (language as a faculty) instead of trying to mimic the underlying machine (human brains).

that this is the worst AI will ever be.

Probably? I think so, at least. I’m not too eager to make a “hard” statement about future tech, though.

Note that my criticism is not towards the development of language models and natural language processing, but specifically against the current state-of-the-art technology (LLMs).

Many of the arguments against the current approach sound like the owners of a hot air balloon business arguing with the Wright brothers.

That doesn’t say much about the validity of the arguments. And I bet that a lot of people voicing arguments against Dumont or the Wright brothers were actually correct.

dave@feddit.uk · 1 year ago

LLMs have definitely been overpromised and/or misrepresented in mainstream media, but even in the last few months their utility has been increasing. I’m a big advocate of finding ways to use them to enhance people (a thinking partner, not a replacement for thinking). They are most certainly a tool, and you need to know their limitations and how to use them.

From experience working with naive end users, they are anthropomorphising based on how the models have been reported and that’s definitely not helpful.

As the models get more and more capable (and I’m pretty happy to make that prediction), will they reach a point where they are indistinguishable from the output of a real person? That will give us some challenges. But the interesting thing for me is that when that happens, and the AI can write that report you were paying someone to write, what was the point of the report? You could argue they were some kind of terrible UBI and we’ll end up with just the pointless output without the marginal benefit of someone’s livelihood. That needs a bigger rethink.

dave@feddit.uk · 1 year ago

In fact, see this for some similar hyperbole and sentiment.