MI Történik?

Artificial intelligence news in Hungarian - updated daily


Huawei developed the trillion-parameter PANGU-Σ model on homegrown infrastructure

Huawei has trained PANGU-Σ, a trillion-parameter Chinese language model. It is a scaled-up successor to Huawei's 'PanGu', which was the first publicly disclosed attempt at replicating OpenAI's GPT-3. PANGU-Σ is very much a statement of intent - "the main motivation for this work is to design a scalable model architecture and an efficient distributed training system", Huawei writes. In other words: this is a technical report about building repeatable infrastructure so the company can crank out an ever larger set of models.

One weird thing that makes you go 'uh oh': they train the model on 329 billion tokens for over 100 days. That's… not a lot of tokens? The Chinchilla paper from DeepMind showed that models like GPT-3 (~300bn tokens) were undertrained by 4X-5X. That sort of napkin math suggests PANGU-Σ would need to be trained on multiple trillions of tokens to effectively utilize its parameter count - but there's a chance I'm being dumb here and missing something.
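To make the napkin math concrete, here is a minimal sketch of the token-budget comparison. The ~20 tokens-per-parameter ratio is the common rule of thumb read out of DeepMind's Chinchilla paper, and the flat one-trillion parameter count is taken from the headline rather than the report; both are assumptions, and dense-model scaling heuristics may not transfer cleanly to a model like PANGU-Σ.

```python
# Hedged napkin math: how far short of a Chinchilla-style token budget
# does PANGU-Σ's reported training run fall? Assumes ~20 tokens per
# parameter (rule-of-thumb reading of the Chinchilla paper) and a flat
# 1-trillion parameter count (headline figure, not the exact number).

PANGU_SIGMA_PARAMS = 1.0e12           # ~1 trillion parameters (assumed round figure)
TOKENS_TRAINED = 329e9                # 329 billion tokens, per the report
CHINCHILLA_TOKENS_PER_PARAM = 20      # rough compute-optimal ratio for dense models

optimal_tokens = PANGU_SIGMA_PARAMS * CHINCHILLA_TOKENS_PER_PARAM
shortfall = optimal_tokens / TOKENS_TRAINED

print(f"Chinchilla-style optimal budget: ~{optimal_tokens / 1e12:.0f} trillion tokens")
print(f"Actual training run:             {TOKENS_TRAINED / 1e9:.0f} billion tokens")
print(f"Implied shortfall:               ~{shortfall:.0f}x, if dense scaling laws applied")
```

Under these assumptions the sketch implies a shortfall of roughly 60x, which is why 329 billion tokens reads as surprisingly small; a sparse or otherwise non-standard architecture could change this picture entirely.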
Why it matters

This paper is a symptom of how Chinese AI is industrializing in much the same way as in the West: a small number of labs linked to large tech companies are building the infrastructure needed to train large models, and are starting to stamp out increasingly large systems as they chase the scale hypothesis. These large-scale model factories will also be proving grounds for the rest of the AI supply chain - here, homegrown software and homegrown semiconductors.

View the original source (English) →