MI Történik?

Artificial intelligence news in Hungarian - updated daily


Covenant-72B: Distributed Training Challenges Centralized AI Development

A bunch of people have used the blockchain to coordinate the distributed training run of a 72B parameter model which matches the performance of LLaMA 2, a model trained and released by Meta in 2023. The model, Covenant 72B, is a dense decoder-only Transformer built in the LLaMA-3 style. “Our model, pre-trained on approximately 1.1T tokens, performs competitively with fully centralized models pre-trained on similar or higher compute budgets, demonstrating that fully democratized, non-whitelisted participation is not only feasible, but can be achieved at unprecedented scale for a globally distributed pre-training run,” writes Covenant AI, an organization dedicated to doing AI development on top of the blockchain.
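
To make "a dense decoder-only Transformer built in the LLaMA-3 style" at 72B parameters more concrete, here is a rough back-of-the-envelope parameter count in Python. The dimensions are illustrative assumptions borrowed from typical ~70B-class LLaMA-3-style configurations, not Covenant 72B's published hyperparameters.

    # Hypothetical parameter count for a LLaMA-3-style dense decoder-only
    # Transformer. These dimensions are illustrative assumptions and are
    # NOT the published Covenant 72B configuration.
    d_model  = 8192       # hidden size
    n_layers = 80         # Transformer blocks
    n_heads  = 64         # query heads
    n_kv     = 8          # key/value heads (grouped-query attention)
    d_head   = d_model // n_heads
    d_ffn    = 28672      # SwiGLU feed-forward width
    vocab    = 128_256    # tokenizer vocabulary size

    embed   = vocab * d_model                                      # input embeddings
    attn    = 2 * d_model * d_model + 2 * d_model * n_kv * d_head  # q,o + k,v projections
    mlp     = 3 * d_model * d_ffn                                  # gate, up, down projections
    norms   = 2 * d_model                                          # two RMSNorms per block
    block   = attn + mlp + norms
    lm_head = d_model * vocab                                      # untied output projection

    total = embed + n_layers * block + d_model + lm_head
    print(f"~{total / 1e9:.1f}B parameters")                       # ~70.6B with these dims

With these assumed dimensions the count lands near 70B; modest changes to the feed-forward width or vocabulary size would close the gap to 72B.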
Why is this important?

Why this matters - who owns the future?: Distributed training is a technique that can change the political economy of AI by shifting the people at the frontier from monolithic ‘compute singletons’ (labs such as Anthropic and OpenAI, and clouds like Google) to a larger federated collective. But for that to be true, distributed training needs to catch up to the frontier (more discussion in Import AI 439). As impressive as Covenant is, it is mostly a demonstration that distributed training can build non-trivial models of uncertain utility, which is still a long way from the frontier: modern frontier models are trained on tens to hundreds of thousands of chips, whereas this one was trained on roughly 160 (20 peers * 8 chips apiece), as sketched below. Nonetheless, it is an important technology to track, and I could imagine a world where on-device AI features many models developed via distributed training techniques, while cloud AI mostly runs on proprietary models trained on huge amounts of compute.
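
As a rough sanity check on that scale comparison, the common ~6 * N * D FLOPs approximation for dense Transformer pre-training gives a feel for how much compute 1.1T tokens through a 72B model represents, and what that implies for roughly 160 accelerators. The per-chip throughput and wall-clock figures below are illustrative assumptions, not numbers reported by Covenant AI.

    # Rough compute estimate using the common ~6 * N * D FLOPs approximation
    # for dense Transformer pre-training. Chip throughput and wall-clock time
    # are illustrative assumptions, not figures reported by Covenant AI.
    params = 72e9          # N: model parameters
    tokens = 1.1e12        # D: pre-training tokens (as reported)
    flops  = 6 * params * tokens
    print(f"total training compute ~ {flops:.2e} FLOPs")                     # ~4.75e+23

    peers, chips_per_peer = 20, 8
    n_chips = peers * chips_per_peer                                          # 160 accelerators
    sustained_flops_per_chip = 4e14   # assumed ~400 TFLOP/s sustained per chip
    days = flops / (n_chips * sustained_flops_per_chip) / 86_400
    print(f"{n_chips} chips -> roughly {days:.0f} days at that throughput")   # ~86 days

Under those assumptions the run sits around 5e23 FLOPs, which is why the gap to frontier runs on tens of thousands of chips is one of scale rather than of feasibility.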

View the original source (in English) →