MI Történik?

Artificial intelligence news in Hungarian, updated daily


Google has launched Gemini Embedding 2 for multimodal vector-space mapping

Google has released Gemini Embedding 2, a multimodal embedding model that maps text, images, video, audio, and PDFs into a single unified vector space. It succeeds the text-only gemini-embedding-001 and targets the architectural complexity of production RAG systems: instead of maintaining a separate pipeline per modality, developers can combine modalities in a single request. Per-request limits include 8,192 text tokens, six images, 120 seconds of video, 80 seconds of audio, and six PDF pages. On the Massive Text Embedding Benchmark, Gemini Embedding 2 shows improvements in retrieval accuracy and in robustness to domain shift, the common problem where performance drops when a model moves from generic training data to specialized domains such as proprietary code or medical datasets. The model is available in public preview through the Gemini API and Vertex AI. Optional task-type parameters (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, CLASSIFICATION) optimize the vectors for specific operations and improve semantic search hit rates.
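To make the quoted limits concrete, here is a minimal sketch of a client-side check that could run before a mixed-modality request is sent. It makes no API calls. The `EmbedRequest` class, the `limit_violations` function, and the constant names are hypothetical and not part of any Google SDK; the sketch also assumes the limits apply per request, as the summary suggests.

```python
from dataclasses import dataclass
from typing import List

# Per-request limits as quoted in the summary above.
# The constant names are illustrative, not SDK identifiers.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_AUDIO_SECONDS = 80
MAX_PDF_PAGES = 6


@dataclass
class EmbedRequest:
    """Hypothetical description of one mixed-modality embedding request."""
    text_tokens: int = 0
    images: int = 0
    video_seconds: float = 0
    audio_seconds: float = 0
    pdf_pages: int = 0


def limit_violations(req: EmbedRequest) -> List[str]:
    """Return one message per quoted limit that the request exceeds."""
    checks = [
        ("text tokens", req.text_tokens, MAX_TEXT_TOKENS),
        ("images", req.images, MAX_IMAGES),
        ("video seconds", req.video_seconds, MAX_VIDEO_SECONDS),
        ("audio seconds", req.audio_seconds, MAX_AUDIO_SECONDS),
        ("PDF pages", req.pdf_pages, MAX_PDF_PAGES),
    ]
    return [
        f"{name}: {value} exceeds {limit}"
        for name, value, limit in checks
        if value > limit
    ]


if __name__ == "__main__":
    # Seven images and 81 seconds of audio both exceed the quoted limits.
    request = EmbedRequest(text_tokens=4000, images=7, audio_seconds=81)
    for problem in limit_violations(request):
        print(problem)
```

A request that passes this check could then be sent as one call; one that fails would need to be split, for example by chunking a long PDF into groups of at most six pages.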