Reka founder shares the difficulties of training LLMs outside Big Tech infrastructure
Yi Tay, one of the founders of Reka, has written a blog post about what it's like to build a startup that trains AI systems. Coming from Google, which has notoriously excellent internal infrastructure, Tay found the external landscape much more difficult. His reflections highlight the instability of compute providers and the lack of mature software tools compared with the internal environments of tech giants.
- Reported extreme instability in compute providers, with some clusters failing every few hours.
- Noted that GPU multinode training feels like an 'afterthought' compared to TPU pods.
- Critiqued external codebases for significantly lagging behind Google's internal standards.
- Expressed surprise that model parallelism changes are not automated in many external frameworks.
Why does it matter?
This post is valuable because it sheds light on what the frontier of AI looks like for startups: messy, ever-evolving, and dependent on resources that you might expect to work like utilities but that in practice work more like artisanal businesses. AI is progressing rapidly, but that progress sometimes comes despite the challenges of building systems at the frontier.