Chameleon represents all modalities—images, text, and code, as discrete tokens and uses a uniform transformer-based architecture that is trained from scratch in an end-to-end fashion on ?10T tokens of interleaved mixed-modal data. As a result, Chameleon can both reason over, as well as generate, arbitrary mixed-modal documents. Text tokens are represented in green and image tokens are represented in blue. Credit: arXiv (2024). DOI: 10.48550/arxiv.2405.09818 AI researchers at Meta, the company that owns Facebook, Instagram, WhatsApp, and many other products, have designed and built a multimodal model to compete with Read More
No comments:
Post a Comment