Recent comments in /f/MachineLearning

currentscurrents t1_jdn7spo wrote

Bigger models are more sample-efficient: they squeeze more out of a given amount of data.

Scale is a triangle of three factors: model size, data size, and compute. If you want to make more efficient use of your data, you need to increase the other two.

In practice, LLMs are not data-limited right now; they're limited by compute and model size. That's why you see models like LLaMA that throw huge amounts of data at smaller models.
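
For a rough feel of how the triangle trades off, here's a back-of-envelope sketch using Chinchilla-style heuristics (training compute C ≈ 6·N·D FLOPs, compute-optimal split at roughly 20 training tokens per parameter). The constants are illustrative assumptions, not fitted values:

```python
# Rough sketch of the model/data/compute triangle, using Chinchilla-style
# heuristics: training compute C ~ 6 * N * D FLOPs, and the compute-optimal
# split puts ~20 training tokens per parameter. Constants are illustrative.

def optimal_allocation(compute_flops, tokens_per_param=20.0):
    """Return a rough compute-optimal (params, tokens) for a compute budget."""
    # C = 6 * N * D with D = tokens_per_param * N
    # => N = sqrt(C / (6 * tokens_per_param))
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

for c in (1e21, 1e23, 1e25):
    n, d = optimal_allocation(c)
    print(f"compute {c:.0e} FLOPs -> ~{n / 1e9:.0f}B params, ~{d / 1e12:.2f}T tokens")
```

The point is that as the compute budget grows, the optimal model and data sizes grow together; you can't profitably scale one without the other.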

4

pornthrowaway42069l t1_jdn6noe wrote

Not going to deny that GPT-4 looks impressive, but they could set up 10 bajillion-quadrillion parameters; the question is, do they have the data to effectively utilize all of them? Maybe it's time to start looking into decreasing the number of parameters and making more efficient use of the data.
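
For scale, a quick check using the ~20-tokens-per-parameter rule of thumb (an approximation; the parameter counts below are hypothetical) shows how fast the data requirement blows up:

```python
# Back-of-envelope: training data needed to make good use of a given parameter
# count, using the ~20 tokens/parameter rule of thumb. Model sizes are hypothetical.
TOKENS_PER_PARAM = 20

for params in (175e9, 1e12, 10e12):
    tokens_needed = TOKENS_PER_PARAM * params
    print(f"{params / 1e9:,.0f}B params -> ~{tokens_needed / 1e12:.1f}T training tokens")
```

Tens of trillions of curated tokens are a lot harder to come by than more parameters.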

4

ajingnk t1_jdn5uwr wrote

What is the minimum hardware requirement to fine-tune something like Stanford Alpaca? I am thinking of building a workstation to do some DL exploration and fine-tuning work. For fine-tuning, I have around 10k samples.
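
My rough back-of-envelope for GPU memory so far (the bytes-per-parameter counts and the ~0.1% trainable fraction for a LoRA-style run are assumptions, and this ignores activations and framework overhead):

```python
# Very rough GPU memory estimate for fine-tuning. Ignores activations, KV cache,
# and framework overhead; bytes-per-parameter counts are common approximations.

def full_finetune_gb(n_params):
    # fp16 weights (2) + fp16 grads (2) + fp32 Adam moments (4 + 4)
    # + fp32 master weights (4) = 16 bytes per parameter
    return n_params * 16 / 1e9

def lora_finetune_gb(n_params, trainable_fraction=0.001):
    # frozen fp16 base weights (2 bytes/param) plus full optimizer state
    # (16 bytes/param) on the small trainable adapter slice
    return (n_params * 2 + n_params * trainable_fraction * 16) / 1e9

for size in (7e9, 13e9):
    print(f"{size / 1e9:.0f}B model: full ~{full_finetune_gb(size):.0f} GB, "
          f"LoRA ~{lora_finetune_gb(size):.0f} GB (before activations)")
```

If that's in the right ballpark, full fine-tuning a 7B model needs multiple GPUs or heavy offloading, while a LoRA-style run should fit on a single 24 GB card. Does that match people's experience?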

1