Recent comments in /f/MachineLearning
Yardanico t1_jdls342 wrote
Reply to comment by wojtek15 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Yeah, I think there's a lot of overhyping going on around "running ChatGPT-grade language models on consumer hardware". They can "follow" instructions the same way as ChatGPT, but obviously those models know far, far less than the ClosedAI models do, and of course they'll hallucinate much more.
Although it's not an entirely bad thing: at least the community will innovate more, so we might get something interesting in the future from this "push" :)
nekize t1_jdlrqnt wrote
Reply to comment by Normal_Antelope_2556 in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Of course you can. Depending on which group you end up in, there is a lot of cool stuff being done outside of NLP and computer vision (if you consider those two “solved”).
light24bulbs t1_jdlrnll wrote
Reply to comment by elbiot in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
That's cool, that's exactly what I want to do. I'm hunting around for a ready-made pipeline to do that on top of a good open source model.
theotherquantumjim t1_jdlre84 wrote
thePaddyMK t1_jdlr6bp wrote
Reply to comment by plocco-tocco in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
There is a paper that operates a website to generate traces of data to sidestep tools like Selenium: https://mediatum.ub.tum.de/doc/1701445/1701445.pdf
It's only a simple NN, though, no LLM behind it.
thePaddyMK t1_jdlqyng wrote
Reply to comment by dankaiv in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
I think so, too. IMO this will open new ways for software development. There has already been work using RL to find bugs in games, like climbing walls you shouldn't be able to climb. With a multimodal model there might be interesting new ways to debug and develop UIs.
__Maximum__ t1_jdlqolz wrote
Reply to comment by plottwist1 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
It's community-driven, so they are genuinely open.
Normal_Antelope_2556 t1_jdlqc42 wrote
Reply to comment by nekize in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
As a person who aspires to go into research in this field, how bad is it? Can people even do their own research?
gopher9 t1_jdlq1jy wrote
Reply to comment by michaelthwan_ai in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Add RWKV.
Crystal-Ammunition t1_jdlpzsw wrote
Reply to comment by Short_Change in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
At that point, the training data would have to be almost completely synthetic, right?
gmork_13 t1_jdlpq90 wrote
Reply to comment by learn-deeply in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Sometimes I feel like a toddler for doing it, but I always scroll to the images first and for most papers that’s the TLDR.
visarga t1_jdlpf0h wrote
Reply to comment by mxby7e in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
What about data generated from Alpaca? Is that unrestricted?
wojtek15 t1_jdlpai0 wrote
Reply to comment by ttkciar in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Exactly, I have seen many inaccurate claims, e.g. that LLaMa-7B with Alpaca is as capable as ChatGPT. From my testing, even the much bigger LLaMa-30B with Alpaca is far worse than ChatGPT: it can't even get the simplest programming and common-knowledge tasks right, while GPT-3 ChatGPT gets them right every time without any problems. I have not tried LLaMa-65B with Alpaca yet, because it has not been trained yet AFAIK, but I doubt it will be very different. GPT-3 ChatGPT is 175B; maybe some 100B model can match it, but not a 6B or 7B model. If someone claims this, they clearly don't know what they are talking about.
visarga t1_jdlpae7 wrote
Reply to comment by WarAndGeese in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
OpenAI has first-hand RLHF data. Alpaca has second-hand. Wondering if third-hand is good enough and free of any restrictions.
Spud_M314 t1_jdlp71e wrote
Reply to comment by Nyanraltotlapun in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Genetically alter the human brain to make more neocortical neurons and glia... That makes the brain more brainy: more gray matter, more smart stuff... A biological (human) superintelligence is more likely...
visarga t1_jdlp21i wrote
Reply to comment by master3243 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
The combined effect of knowing what is possible and the pressure to develop an alternative means the replication effort will be huge.
Short_Change t1_jdlp0cw wrote
Reply to comment by Vegetable-Skill-9700 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Actually, if his analogy holds, we will have 20-trillion-parameter models for mainstream consumption in the future.
visarga t1_jdloqee wrote
Reply to comment by ZetaReticullan in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Most of our pre-2020 NLP skills are worthless now; what used to require bespoke models and datasets is just another emergent LLM ability. It's like a new starting line, and we don't know which human skills will be valuable in the future.
visarga t1_jdlonpq wrote
Reply to comment by kromem in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
One way to speed this up is to make an extension for voluntary contributions of LLM interactions to open source. A user decides when a chat deserves to be donated to open source and pushes a button to share. I don't think OpenAI can object to users donating their data.
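A purely hypothetical sketch of what such a "donate this chat" button might send; the endpoint URL, payload fields, and license tag are made up for illustration and not an existing API:

```python
# Hypothetical "donate this chat" action: endpoint, fields and license are placeholders.
import requests

def donate_chat(messages, model_name):
    payload = {
        "model": model_name,       # which assistant produced the replies
        "messages": messages,      # e.g. [{"role": "user", "content": "..."}, ...]
        "license": "CC0-1.0",      # donor releases the text for open datasets
        "consent": True,           # explicit per-chat opt-in
    }
    # Placeholder URL for an open data collection service.
    resp = requests.post("https://example.org/api/donations", json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()
```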
dont_tread_on_me_ t1_jdlonff wrote
Reply to comment by fishybird in [N] ChatGPT plugins by Singularian2501
Actually they cited it directly in their announcement post. Click on the ‘ideas’ link
visarga t1_jdloh24 wrote
Reply to comment by light24bulbs in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Since RLHF fine-tuning is short, you can continue training your original model and then run RLHF again.
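A minimal sketch of that two-phase loop, assuming a Hugging Face causal LM and the `trl` library for the PPO/RLHF step; the `gpt2` checkpoint, the single prompt, the constant `reward_fn`, and the hyperparameters are placeholders for the real model, data, and reward model:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

base = "gpt2"  # placeholder for whatever base model you kept training
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# Phase 1: continue ordinary supervised training of the base model
# (e.g. with transformers.Trainer) and save the updated checkpoint -- omitted here.

# Phase 2: re-run the comparatively short RLHF pass on top of the updated weights.
model = AutoModelForCausalLMWithValueHead.from_pretrained(base)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(base)
ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1), model, ref_model, tokenizer)

def reward_fn(text: str) -> torch.Tensor:
    return torch.tensor(1.0)  # stand-in for a trained reward model's score

for prompt in ["Explain RLHF in one sentence."]:
    query = tokenizer(prompt, return_tensors="pt").input_ids[0]
    output = ppo_trainer.generate(query, max_new_tokens=48, pad_token_id=tokenizer.eos_token_id)
    response = output[0][query.shape[0]:]             # keep only the generated tokens
    reward = reward_fn(tokenizer.decode(response))
    ppo_trainer.step([query], [response], [reward])   # one PPO update
```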
loly0ss t1_jdloewa wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hello everyone,
I have a very ignorant question that I'm trying to find an answer to, but I still couldn't find one.
It's about the deep learning model in supervised vs. semi-supervised segmentation.
Is the model itself the same in both cases, for example using Unet++ for both? And does the only difference come during training, where we use pseudo-labels for the semi-supervised case?
Or is the model itself different between supervised and semi-supervised segmentation?
Thank you!
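A rough sketch of the pseudo-label idea the question describes: the segmentation network (e.g. Unet++) stays the same, and only the training step changes. The `model`, the data batches, the confidence threshold, and the loss weighting below are placeholders, not taken from any specific paper:

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, labeled_batch, unlabeled_batch, optimizer,
                         threshold=0.9, unlabeled_weight=0.5):
    x_l, y_l = labeled_batch   # images with ground-truth masks
    x_u = unlabeled_batch      # images without masks

    # Supervised part: identical to ordinary fully-supervised training.
    loss_sup = F.cross_entropy(model(x_l), y_l)

    # Semi-supervised part: predict on unlabeled images, keep confident pixels
    # as pseudo-labels, and train on those as if they were ground truth.
    with torch.no_grad():
        probs_u = model(x_u).softmax(dim=1)
        conf, pseudo = probs_u.max(dim=1)      # per-pixel confidence and label
    keep = conf > threshold                    # ignore low-confidence pixels
    loss_unsup = F.cross_entropy(model(x_u), pseudo, reduction="none")
    loss_unsup = (loss_unsup * keep).sum() / keep.sum().clamp(min=1)

    loss = loss_sup + unlabeled_weight * loss_unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

So the architecture is typically unchanged; the semi-supervised part lives in how the unlabeled images contribute to the loss.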
visarga t1_jdlo8hl wrote
Reply to comment by master3243 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Closed source on the generation end, but even more open than open source on the usage end. LLMs lift the open source idea to the next level.
Historical-Tree9132 t1_jdlo76y wrote
Missing the dataset flow arrow to the China-related model..
LeN3rd t1_jdls5jy wrote
Reply to [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
How big do models need to be until certain capabilities emerge? That is the actual question here, isn't it? Do smaller models perform as well in all tasks, or just the one they are trained for?