this post was submitted on 02 Jun 2024
76 points (90.4% liked)

Technology


Also See: Worldwide Federated Training Of Language Models

Claude's Summary:

The two papers, "Worldwide Federated Training of Language Models" by Iacob et al. and "The Future of Large Language Model Pre-training is Federated" by Sani et al., both propose using federated learning (FL) as a new paradigm for pre-training large language models (LLMs). The main ideas are:

  1. FL allows leveraging more data and compute resources from multiple organizations around the world, while keeping the data decentralized and private. This can enable training larger LLMs on more diverse data compared to centralized training.

  2. FL relaxes synchronization requirements and reduces communication overheads compared to data-parallel distributed training, making it feasible for geographically distributed participants with varying hardware and connectivity.

  3. The papers present systems and algorithms for enabling efficient federated pre-training of LLMs at billion-parameter scales. Key techniques include allowing participants to modulate their amount of local training based on resource constraints, and partially personalizing models to clusters of participants with related data (see the sketch after this summary).

  4. Experimental results show federated LLM pre-training can match or exceed centralized training performance, with the performance gap narrowing as model size increases to billions of parameters. Larger federated models also converge faster and are more robust.

  5. Challenges include data and hardware heterogeneity across participants. The papers propose techniques like adaptive aggregation and load balancing to mitigate these issues.

In summary, the papers argue federated learning is a promising new direction for democratizing LLM pre-training by allowing many more organizations to collaboratively train large models on their combined data and compute resources.
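To make item 3 concrete, here is a minimal sketch of a FedAvg-style training round in the spirit of what the papers describe: each participant runs as many local steps as its own compute budget allows, and the server aggregates the resulting models weighted by data volume. The toy linear model, the client data, and the step budgets below are illustrative placeholders, not anything taken from the papers.

```python
# Minimal sketch of one federated training round (FedAvg-style).
# Everything here is a toy stand-in for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def local_train(weights, data, targets, steps, lr=0.01):
    """Plain SGD on a toy linear model; `steps` is this client's local compute budget."""
    w = weights.copy()
    for _ in range(steps):
        i = rng.integers(0, len(data))
        x, y = data[i], targets[i]
        grad = 2 * (x @ w - y) * x          # gradient of squared error
        w -= lr * grad
    return w

def aggregate(client_weights, client_sizes):
    """Server-side FedAvg: average client models, weighted by data volume."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three hypothetical participants with different data volumes and different
# compute budgets (more capable hardware -> more local steps per round).
dim = 8
true_w = rng.normal(size=dim)
clients = []
for n_examples, step_budget in [(200, 50), (1000, 200), (50, 10)]:
    X = rng.normal(size=(n_examples, dim))
    y = X @ true_w + 0.1 * rng.normal(size=n_examples)
    clients.append((X, y, step_budget))

global_w = np.zeros(dim)
for round_id in range(20):       # communication happens once per round,
    updates, sizes = [], []      # not once per gradient step
    for X, y, steps in clients:
        updates.append(local_train(global_w, X, y, steps))
        sizes.append(len(X))
    global_w = aggregate(updates, sizes)

print("distance to true weights:", np.linalg.norm(global_w - true_w))
```

Note that communication happens once per round rather than once per gradient step, which is the relaxed synchronization described in item 2. The papers build far more elaborate versions of this loop (adaptive aggregation, clustering participants with related data), but the basic round structure is the same.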

top 8 comments
Martineski@lemmy.dbzer0.com 12 points 5 months ago

Is this how you make a sentient planet?

Petter1@lemm.ee 3 points 5 months ago

I like that 🤩

Martineski@lemmy.dbzer0.com 10 points 5 months ago

I wonder if this will become a big thing in the FOSS AI space. It's hard to compete with corpos when it comes to computing power.

muntedcrocodile@lemm.ee 3 points 5 months ago

Still doesn't solve the whole question of what data can be used for a FOSS model, but distributing the compute requirements is good. Idk if this still requires that each node can compute the whole model though; that might be a limitation on model size, since most people won't be able to run huge models etc.

Audrey0nne@leminal.space 7 points 5 months ago

A lot of words just to say that once the advertisers move in on a centralized platform, its value is shot. That's a huge part of the reason I abandoned the last platform I was using and sought a federated alternative.

Hackworth@lemmy.world 14 points 5 months ago

The papers have a ton of practical info about feasibility, implementation, etc.

General_Effort@lemmy.world 3 points 5 months ago

As far as I know, federated learning is pretty much dead. The point would be that it allows organizations to create a joint model without sharing data. But it doesn't look like anyone who doesn't want to share data wants to share a model.

Hackworth@lemmy.world 2 points 5 months ago

Until they can distribute the training load of large models to consumer graphics cards (and do something like SETI@Home), it does seem like the benefit of distributed training isn't enough to overcome the friction.