Recent comments in /f/MachineLearning
Yardanico t1_jdls342 wrote
Reply to comment by wojtek15 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Yeah, I think there's a lot of overhyping going on around "running ChatGPT-grade language models on consumer hardware". They can "follow" instructions the same way as ChatGPT, but obviously those models know far, far less than the ClosedAI models do, and of course they'll hallucinate much more.
Although it's not an entirely bad thing: at least the community will innovate more, so we might get something interesting in the future from this "push" :)
nekize t1_jdlrqnt wrote
Reply to comment by Normal_Antelope_2556 in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Of course you can. Depending on which group you end up in, there is a lot of cool stuff being done outside of NLP and computer vision (if you consider those two “solved”).
light24bulbs t1_jdlrnll wrote
Reply to comment by elbiot in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
That's cool, that's exactly what I want to do. I'm hunting around for a ready-made pipeline to do that on top of a good open source model.
theotherquantumjim t1_jdlre84 wrote
thePaddyMK t1_jdlr6bp wrote
Reply to comment by plocco-tocco in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
There is a paper that operates a website to generate traces of data to sidestep tools like Selenium: https://mediatum.ub.tum.de/doc/1701445/1701445.pdf
It's only a simple NN, though, no LLM behind it.
thePaddyMK t1_jdlqyng wrote
Reply to comment by dankaiv in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
I think so, too. IMO this will open new ways for software development. There has already been work using RL to find bugs in games, like climbing walls you shouldn't be able to climb. With a multimodal model there might be interesting new ways to debug and develop UIs.
__Maximum__ t1_jdlqolz wrote
Reply to comment by plottwist1 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
It's community-driven, so they are genuinely open.
Normal_Antelope_2556 t1_jdlqc42 wrote
Reply to comment by nekize in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
As a person who aspires to go into research in this field, how bad is it? Can people even do their own research?
gopher9 t1_jdlq1jy wrote
Reply to comment by michaelthwan_ai in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Add RWKV.
Crystal-Ammunition t1_jdlpzsw wrote
Reply to comment by Short_Change in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
At that point, the training data would have to be almost completely synthetic, right?
gmork_13 t1_jdlpq90 wrote
Reply to comment by learn-deeply in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Sometimes I feel like a toddler for doing it, but I always scroll to the images first and for most papers that’s the TLDR.
visarga t1_jdlpf0h wrote
Reply to comment by mxby7e in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
What about data generated from Alpaca? Is that unrestricted?
wojtek15 t1_jdlpai0 wrote
Reply to comment by ttkciar in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Exactly, I have seen many inaccurate claims, e.g. that LLaMa-7B with Alpaca is as capable as ChatGPT. From my testing, even the much bigger LLaMa-30B with Alpaca is far worse than ChatGPT: it can't even get the simplest programming and common-knowledge tasks right, while GPT-3 ChatGPT gets them right every time without any problems. I have not tried LLaMa-65B with Alpaca yet, because it has not been trained yet AFAIK, but I doubt it will be very different. GPT-3 ChatGPT is 175B; maybe some 100B model can match it, but not a 6B or 7B model. If someone claims this, they clearly don't know what they are talking about.
visarga t1_jdlpae7 wrote
Reply to comment by WarAndGeese in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
OpenAI has first-hand RLHF data. Alpaca has second-hand. Wondering if third-hand is good enough and free of any restrictions.
Spud_M314 t1_jdlp71e wrote
Reply to comment by Nyanraltotlapun in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Genetically alter the human brain to make more neocortical neurons and glia... That makes the brain more brainy: more gray matter, more smart stuff... A biological (human) superintelligence is more likely...
visarga t1_jdlp21i wrote
Reply to comment by master3243 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
The combined effect of knowing what is possible and the pressure to develop an alternative means the replication effort will be huge.
Short_Change t1_jdlp0cw wrote
Reply to comment by Vegetable-Skill-9700 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Actually, if his analogy holds, we will have 20-trillion-parameter models for mainstream consumption in the future.
visarga t1_jdloqee wrote
Reply to comment by ZetaReticullan in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Most of our pre-2020 NLP skills are worthless now; what used to require bespoke models and datasets is just another emergent LLM ability. It's like a new starting line, and we don't know which human skills will be valuable in the future.
visarga t1_jdlonpq wrote
Reply to comment by kromem in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
One way to speed this up is to make an extension for voluntary contributions of LLM interactions to open source. A user decides when a chat deserves to be donated to open source and pushes a button to share. I don't think OpenAI can object to users donating their data.
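A purely hypothetical sketch of what such a "donate this chat" button might send; the endpoint URL, payload fields, and license tag are made up for illustration and not an existing API:

```python
# Hypothetical "donate this chat" action: endpoint, fields and license are placeholders.
import requests

def donate_chat(messages, model_name):
    payload = {
        "model": model_name,       # which assistant produced the replies
        "messages": messages,      # e.g. [{"role": "user", "content": "..."}, ...]
        "license": "CC0-1.0",      # donor releases the text for open datasets
        "consent": True,           # explicit per-chat opt-in
    }
    # Placeholder URL for an open data collection service.
    resp = requests.post("https://example.org/api/donations", json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()
```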
dont_tread_on_me_ t1_jdlonff wrote
Reply to comment by fishybird in [N] ChatGPT plugins by Singularian2501
Actually they cited it directly in their announcement post. Click on the ‘ideas’ link
visarga t1_jdloh24 wrote
Reply to comment by light24bulbs in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Since RLHF fine-tuning is short, you can continue training your original model and then run RLHF again.
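A minimal sketch of that two-phase loop, assuming a Hugging Face causal LM and the `trl` library for the PPO/RLHF step; the `gpt2` checkpoint, the single prompt, the constant `reward_fn`, and the hyperparameters are placeholders for the real model, data, and reward model:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

base = "gpt2"  # placeholder for whatever base model you kept training
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# Phase 1: continue ordinary supervised training of the base model
# (e.g. with transformers.Trainer) and save the updated checkpoint -- omitted here.

# Phase 2: re-run the comparatively short RLHF pass on top of the updated weights.
model = AutoModelForCausalLMWithValueHead.from_pretrained(base)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(base)
ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1), model, ref_model, tokenizer)

def reward_fn(text: str) -> torch.Tensor:
    return torch.tensor(1.0)  # stand-in for a trained reward model's score

for prompt in ["Explain RLHF in one sentence."]:
    query = tokenizer(prompt, return_tensors="pt").input_ids[0]
    output = ppo_trainer.generate(query, max_new_tokens=48, pad_token_id=tokenizer.eos_token_id)
    response = output[0][query.shape[0]:]             # keep only the generated tokens
    reward = reward_fn(tokenizer.decode(response))
    ppo_trainer.step([query], [response], [reward])   # one PPO update
```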
loly0ss t1_jdloewa wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hello everyone,
I have a very ignorant question that I'm trying to find an answer to, but I still couldn't find one.
It's about the deep learning model in supervised vs. semi-supervised segmentation.
Is the model itself the same in both cases, for example using Unet++ for both? And does the only difference come during training, where we use pseudo-labels for the semi-supervised case?
Or is the model itself different between supervised and semi-supervised segmentation?
Thank you!
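A rough sketch of the pseudo-label idea the question describes: the segmentation network (e.g. Unet++) stays the same, and only the training step changes. The `model`, the data batches, the confidence threshold, and the loss weighting below are placeholders, not taken from any specific paper:

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, labeled_batch, unlabeled_batch, optimizer,
                         threshold=0.9, unlabeled_weight=0.5):
    x_l, y_l = labeled_batch   # images with ground-truth masks
    x_u = unlabeled_batch      # images without masks

    # Supervised part: identical to ordinary fully-supervised training.
    loss_sup = F.cross_entropy(model(x_l), y_l)

    # Semi-supervised part: predict on unlabeled images, keep confident pixels
    # as pseudo-labels, and train on those as if they were ground truth.
    with torch.no_grad():
        probs_u = model(x_u).softmax(dim=1)
        conf, pseudo = probs_u.max(dim=1)      # per-pixel confidence and label
    keep = conf > threshold                    # ignore low-confidence pixels
    loss_unsup = F.cross_entropy(model(x_u), pseudo, reduction="none")
    loss_unsup = (loss_unsup * keep).sum() / keep.sum().clamp(min=1)

    loss = loss_sup + unlabeled_weight * loss_unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

So the architecture is typically unchanged; the semi-supervised part lives in how the unlabeled images contribute to the loss.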
visarga t1_jdlo8hl wrote
Reply to comment by master3243 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Closed source on the generation end, but even more open than open source on the usage end. LLMs lift the open source idea to the next level.
Historical-Tree9132 t1_jdlo76y wrote
Missing the dataset flow arrow to the China-related model..
LeN3rd t1_jdls5jy wrote
Reply to [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
How big do models need to be until certain capabilities emerge? That is the actual question here, isn't it? Do smaller models perform as well in all tasks, or just the one they are trained for?