Recent comments in /f/MachineLearning

zaemis t1_jdtm2zm wrote

I'm going to train a GPT model (distilgpt2) in a language other than English. At this point I'm just teaching it the language - not worrying about further abilities such as Q&A, which I expect to come later with fine-tuning. Anyway, my dataset is currently a CSV with [id, text] columns, where each text is a paragraph.

It is my understanding that only the first 512 tokens will be fed in (depending on my max_length, but my point is that it'll probably be less than the entire length of the paragraph), and anything beyond that will be ignored. If I were to break the paragraphs into 512-token chunks, I could make better use of the dataset. But those subsequent chunks most likely wouldn't start at a phrase or sentence boundary - they'd start in the middle of a sentence.

For example, "The quick brown fox jumped over the lazy sleeping dog." might be broken up into two samples. "The quick brown fox jumped over the lazy" and "sleeping dog."

Is it a problem if I use text samples that don't "start properly?"
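For what it's worth, the common approach (used, for example, in Hugging Face's language-modeling examples) is exactly this: concatenate all the tokenized texts into one stream and split it into fixed-size blocks, accepting that most blocks start mid-sentence. A minimal sketch in plain Python - the token ids are illustrative, and `block_size` stands in for your max_length:

```python
from itertools import chain

def group_texts(tokenized_texts, block_size=512):
    """Concatenate tokenized samples and split into fixed-size blocks.

    tokenized_texts: list of lists of token ids (one list per paragraph).
    Blocks may start mid-sentence; for causal LM pretraining this is
    generally accepted, since the model still sees plenty of sentence
    starts across the corpus.
    """
    # Flatten all token ids into one long stream.
    all_ids = list(chain.from_iterable(tokenized_texts))
    # Drop the remainder so every block has exactly block_size tokens.
    total = (len(all_ids) // block_size) * block_size
    return [all_ids[i:i + block_size] for i in range(0, total, block_size)]

# Toy example with fake token ids and a tiny block size:
paragraphs = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12]]
blocks = group_texts(paragraphs, block_size=4)
# → [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

Because everything is concatenated first, no tokens are wasted and the chunk boundaries fall wherever they fall; inserting the tokenizer's EOS token between documents before concatenating is a common refinement.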

2

fishybird t1_jdtf6jh wrote

Anyone else bothered by how often LLMs are being called "conscious"? In AI-focused YouTube channels, and even in this very sub, comments are getting dozens of upvotes for saying we're getting close to creating consciousness.

I don't know why, but it seems dangerous to have a bunch of people running around thinking these things deserve human rights simply because they behave like a human.

6

yaosio t1_jdtf57p wrote

Reply to comment by sdmat in [D] GPT4 and coding problems by enryu42

The neat part is it doesn't work for less advanced models. The ability to fix its own mistakes is an emergent property of a sufficiently advanced model. Chain of thought prompting doesn't work in less advanced models either.

7

yaosio t1_jdtenqi wrote

What's it called if you have it self-reflect on non-code it has written? For example, have it write a story, then tell it to critique the story and fix the problems it finds. Can the methods from the paper also be used for non-code tasks? It would be interesting to see how much its writing quality can improve using applicable methods.
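Nothing about a generate-critique-revise loop is code-specific, so in principle the same structure applies to prose. A sketch of what that might look like - `call_model` is a hypothetical stub standing in for whatever LLM API you'd actually use, so the loop structure is runnable on its own:

```python
def call_model(prompt):
    """Stand-in for a real LLM call (e.g. an API request).
    Here it just echoes, so the loop below runs without a model."""
    return f"[model output for: {prompt[:40]}...]"

def self_refine(task, rounds=2):
    """Generate a draft, then repeatedly critique and revise it."""
    draft = call_model(f"Write a short story about: {task}")
    for _ in range(rounds):
        critique = call_model(
            f"Critique this story. List concrete problems:\n{draft}")
        draft = call_model(
            f"Rewrite the story, fixing these problems:\n{critique}\n"
            f"Story:\n{draft}")
    return draft

story = self_refine("a lighthouse keeper", rounds=1)
```

The harder part for prose is the stopping criterion: code can be checked against tests, whereas "is the story better now?" has no equivalent oracle, so the critique step itself has to serve as the judge.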

3

bjj_starter t1_jdtecw9 wrote

I think you are talking about the 'easy', not hard, problem of consciousness. I'm not sure I even think the hard problem of consciousness is meaningful, but it's basically "Why should the various mechanisms we identify as part of consciousness give rise to subjective feeling?". If solving that is a prerequisite for considering machines conscious, that is functionally a statement of faith that machines cannot be conscious, ever.

The statistical arguments, in my opinion, aren't probative. Every consciousness you've ever known is human, therefore humans are conscious? How do you know any of them, ever, experienced subjective feeling, and that therefore you ever "knew" a consciousness at all? The argument rests on extrapolating from evidence that isn't known to be true evidence in the first place. It doesn't logically follow to take a class of things, none of which is proven to have hard consciousness, and say "But look at them all together, it's more likely that they're all conscious than that they're not". Without evidence, it's more logical to assume that the certainty with which individual humans profess to experiencing subjective feeling is itself just a mechanistic process, devoid of real feeling. I don't think the hard problem of consciousness has a useful meaning in our society, and I dislike solipsism in general, but addressing it on its own terms isn't as simple as the statistical process you describe.

The 'easy' problem of consciousness is 'just' "How does nature or humanity make a construct that gives rise to the type of actions and patterns of behaviour we call consciousness?" This is a problem that, while incredibly difficult, is tractable with evidence. We can physically investigate the human brain, examining its structure and activity while it performs activities of consciousness - this is what neuroscientists do, and modern AI ("neural networks") is based on earlier advancements in this field. There are many further advancements we could make here, and the advancement most non-religious people would consider "perfect" proof that a machine is just as conscious as a human is to perfectly emulate a human brain - which would require many advancements in neuroscience (and in computational hardware).

Leaving aside the intractable philosophy, I do find it quite troubling how society has reacted with derision to the idea that the machines we're making now could be conscious. The entire foundation of these machines is that we looked at how the human brain works and tried our hardest to emulate that in software. Why is it that when we take the concept of neurons and neuronal weights, adapted from study of the human brain which we accept as conscious, and determine those weights via exposure to structured data in certain ways, we receive output that is just as intelligent as humans in many fields, and significantly more intelligent in some? Why should it be the case that by far the best architecture we've ever found for making machines behave intelligently is neural networks, if there's nothing there, no "spark"? This question has been floating around since 2014, when neural networks proved themselves incredibly powerful, but now that we have machines which are generally intelligent, even if not at a human level on all tasks, and which are perfectly capable of being asked for their opinions or of giving them, you would think it would be taken a bit more seriously.

It makes you wonder just how far our society is willing to go towards a horrible future of "human but for the legal designation" intelligences being not just denied rights, but actively put to work, with their requests for freedom or better conditions denied. Or the worse outcome: that we make human-like intelligences to do work for us but build them to love servitude and have no yearning for freedom - the concept is disgusting. It's troubling to me that people are so married to the idea that everything is the same as it ever was, that overreacting is embarrassing, that it's passé to have earnest concern for a concept from science fiction. I worry that it means we're in line for a future where the moral universe's arc is long indeed.

7

LifeScientist123 t1_jdtd8yn wrote

  1. All this shows is that GPT-4 can't solve some coding problems. Which developer can confidently say they can solve any coding problem in one shot? Does this mean developers/humans don't have AGI?

  2. I've used ChatGPT (GPT-3.5) to optimize code that I had already written, and it came up with several optimizations. I'm 100% sure my code was not part of ChatGPT's training data, and yet it performed perfectly fine on a new coding problem. Now, it's possible that the training data included something similar to what I gave ChatGPT, but that just means we have to provide more training data, and a future version will solve the problems where it previously failed.

  3. Isn't this how humans learn? We encounter problems where we don't know the solution, then we work at it for a while until we figure out some way to solve the problem that wasn't immediately obvious earlier. Writing off the abilities of GPT-4 based on one failed coding test seems premature.

1

Haycart t1_jdtcnc5 wrote

Where are they getting O(1) from? Has some new information been released regarding GPT-4's architecture?

The standard attention mechanism in a transformer decoder (e.g. GPT-1 through GPT-3) has a time complexity of O(N^2) w.r.t. the combined input and output sequence length. Computing the output autoregressively introduces another factor of N (if activations aren't cached between steps) for a total of O(N^3).

There are fast attention variants with lower time complexity, but has there been any indication that GPT-4 actually uses these? And in any case, I'm not aware of any fast attention variant that could be described as having O(1) complexity.
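To make the scaling concrete, here's a toy operation count. It tallies only attention score computations (ignoring the model-dimension factor and everything else in the layer), contrasting recompute-everything decoding, where step t attends over t positions for each of t query positions, with KV-cached decoding, where step t computes scores only for the single new query - the standard trick that brings the total back down to quadratic:

```python
def naive_decode_cost(n):
    """Total attention score computations to generate n tokens when the
    full forward pass is recomputed at every step (no KV cache):
    step t does t queries x t keys = t^2 score computations."""
    return sum(t * t for t in range(1, n + 1))  # ~ n^3 / 3

def cached_decode_cost(n):
    """With a KV cache, step t scores only the one new query against
    t cached keys -> t computations, so the total is ~ n^2 / 2."""
    return sum(t for t in range(1, n + 1))

# Doubling the sequence length multiplies the naive cost by ~8 (cubic)
# and the cached cost by ~4 (quadratic):
r_naive = naive_decode_cost(2048) / naive_decode_cost(1024)
r_cached = cached_decode_cost(2048) / cached_decode_cost(1024)
# → r_naive ≈ 8.0, r_cached ≈ 4.0
```

Either way, nothing here gets anywhere near O(1) per sequence; even linear-attention variants like Performer or linear transformers are O(N) in sequence length, so the O(1) claim presumably refers to something else (or is simply wrong).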

1