AI INFRASTRUCTURE
HuggingFace launches the Boom project for large-scale distributed model training
HuggingFace has started the 'Boom' project, whose stated goal is to 'train a decoder-only Transformer language model at the 70-100 billion parameter scale for +20T tokens'. They estimate the compute requirement at ~5 million H100-hours, equivalent to month-long allocations of 512 H100s each from ~10 different datacenters. HuggingFace is apparently validating the project now: it is in discussion with 12 data center operators, has already confirmed compute from ~6 of them, and plans to start a pilot in March/April. If HuggingFace succeeds, AI policy could end up looking quite different.
- Goal is to train a model at the 70-100 billion parameter scale.
- Training set involves over 20 trillion tokens.
- Requires approximately 5 million H100-hours of compute (see the back-of-envelope sketch after this list).
- Collaborating with 12 different data center operators for distributed resources.
- Pilot program is expected to begin in March or April of 2025.
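As a rough sanity check on these figures, the sketch below combines the standard 6·N·D training-FLOPs approximation with an assumed H100 BF16 peak throughput and utilization. The parameter count (mid-range of 70-100B), the ~989 TFLOP/s peak, and the 45% MFU are illustrative assumptions, not numbers from the announcement.

```python
# Back-of-envelope check of the Boom compute estimate.
# Assumptions (not from the announcement): ~85B parameters, 20T training
# tokens, H100 SXM BF16 peak ~989 TFLOP/s, ~45% model FLOPs utilization.

params = 85e9             # mid-range of the 70-100B target
tokens = 20e12            # the project targets 20T+ tokens
train_flops = 6 * params * tokens          # standard 6*N*D approximation

h100_peak_flops = 989e12  # assumed H100 BF16 peak, FLOP/s
mfu = 0.45                # assumed model FLOPs utilization
effective_flops = h100_peak_flops * mfu

gpu_hours = train_flops / effective_flops / 3600
print(f"Estimated compute: {gpu_hours / 1e6:.1f}M H100-hours")  # ~6.4M

# How many month-long 512-GPU allocations does ~5M H100-hours correspond to?
hours_per_allocation = 512 * 24 * 30       # one datacenter, one month
allocations = 5e6 / hours_per_allocation
print(f"Month-long 512-H100 allocations: {allocations:.1f}")    # ~13.6
```

With these assumptions the estimate lands in the same ballpark as the quoted ~5 million H100-hours; the exact figure depends heavily on the parameter count chosen and on the utilization actually achieved across heterogeneous datacenters.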
Why does it matter?
This project serves as a real-world test of distributed training at scale. Success would demonstrate that state-of-the-art models no longer require a single massive, co-located supercomputer, decentralizing the power of AI development.