Vision Transformers match the learning efficiency of newborn chicks in object recognition
Researchers with Indiana University Bloomington have done a neat study comparing how well a transformer-based computer vision system and newborn chicks learn basic object recognition skills. The results show a surprising convergence between the biological system (the chick) and the digital one (the vision transformer), suggesting that transformers are more efficient at learning visual representations than is commonly assumed.
Specifically, the “chicks were hatched in darkness, then raised singly in automated controlled-rearing chambers that measured each chick’s behavior continuously (24/7) during the first two weeks of life.” In the first week, each chick was shown a variety of different views of a single object. The researchers then replicated this experience for the vision transformer by building a replica of the chick chamber in a game engine and gathering data from a first-person viewpoint. The agent collecting the data received visual input (64×64 pixel resolution images) through a forward-facing camera attached to its head.
- Vision Transformers (ViTs) performed on par with or better than chicks in object recognition tests
- Larger ViT architectures were not more 'data hungry' than smaller versions
- ViTs were trained on 80,000 images sampled at 10 frames per second
- The study suggests generic learning systems don't need hardcoded knowledge of objects to learn representations
- Biological visual systems process roughly 430,000 'images' in their first day, assuming 10 Hz processing (a figure that corresponds to about 12 hours of input at that rate)
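The frame counts in the list above can be checked with a little arithmetic. The sketch below is only a back-of-envelope calculation: the 10 Hz rate and the 80,000-image figure come from the text, while the helper names `frames_seen` and `hours_covered` are hypothetical and introduced purely for illustration.

```python
# Back-of-envelope check of the frame counts quoted above.
SAMPLE_RATE_HZ = 10  # sampling rate stated in the text


def frames_seen(hours: float, rate_hz: float = SAMPLE_RATE_HZ) -> int:
    """Number of frames observed over `hours` at `rate_hz`."""
    return round(hours * 3600 * rate_hz)


def hours_covered(frames: int, rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Hours of visual experience that `frames` frames span at `rate_hz`."""
    return frames / rate_hz / 3600


full_day = frames_seen(24)         # 864000 frames in a full 24 hours
half_day = frames_seen(12)         # 432000 frames, close to the quoted ~430,000
vit_hours = hours_covered(80_000)  # hours spanned by the ViT training set

print(full_day, half_day, round(vit_hours, 2))  # prints: 864000 432000 2.22
```

Under these assumptions, the 80,000 training images amount to a little over two hours of continuous 10 Hz input, a small fraction of what a chick's visual system would take in over a single day.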
Why is this important?
Research like this suggests that digital systems like transformers can learn certain things about as efficiently as biological intelligence does. “Our results provide computationally explicit evidence that a generic learning mechanism (ViT), paired with a biologically inspired learning objective (contrastive learning through time), is sufficient to reproduce animal-like object recognition when the system is trained on the embodied data streams available to newborn animals,” the authors write.
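To make "contrastive learning through time" more concrete, here is a minimal sketch of one common way to set up such an objective: an InfoNCE-style loss in which each frame's positive example is the next frame in the stream, and every other frame acts as a negative. This is an illustration under stated assumptions, not the authors' implementation. The function name `time_contrastive_loss`, the temperature value, the one-frame window, and the toy 2-D "embeddings" (which stand in for the output of an encoder such as a ViT) are all hypothetical.

```python
import numpy as np


def time_contrastive_loss(embeddings: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE-style loss where the positive for frame t is frame t+1.

    embeddings: array of shape (T, D), one row per consecutive frame.
    Every other frame in the sequence serves as a negative (an assumption
    made for this sketch, not a detail taken from the paper).
    """
    # L2-normalize so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    anchors = z[:-1]                        # frames 0 .. T-2
    logits = anchors @ z.T / temperature    # shape (T-1, T)

    # An anchor must never be scored against itself.
    idx = np.arange(len(anchors))
    logits[idx, idx] = -np.inf

    # Numerically stable log-softmax over all candidate frames.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for anchor t is candidate t+1.
    return float(-log_probs[idx, idx + 1].mean())


# Toy stand-in for encoder outputs: an object rotating smoothly through
# half a turn, so neighboring frames have similar embeddings.
T = 20
angles = np.linspace(0.0, np.pi, T)
smooth = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# The same frames in a deterministic scrambled order (stride-7 permutation),
# which destroys the temporal continuity.
scrambled = smooth[(np.arange(T) * 7) % T]

smooth_loss = time_contrastive_loss(smooth)
scrambled_loss = time_contrastive_loss(scrambled)
print(smooth_loss < scrambled_loss)  # prints: True
```

The loss is lower when temporally adjacent frames sit close together in embedding space, which is the pressure that pushes an encoder to map successive views of the same object to similar representations.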