AI MODELS
Google launches Gemini Embedding 2 for multimodal vector-space mapping
Google released Gemini Embedding 2, a multimodal embedding model that maps text, images, video, audio, and PDFs into a unified vector space. The model succeeds the text-only gemini-embedding-001 and reduces the architectural complexity of production RAG systems by eliminating the need for separate per-modality pipelines: developers can now combine different modalities in a single request, within limits of 8,192 text tokens, six images, 120 seconds of video, 80 seconds of audio, and six PDF pages. On the Massive Text Embedding Benchmark, Gemini Embedding 2 shows improved retrieval accuracy and robustness to domain shift, the common problem of performance dropping when moving from generic training data to specialized domains such as proprietary code or medical datasets. The model is available in public preview through the Gemini API and Vertex AI, with optional task-type parameters (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, CLASSIFICATION) that optimize vector properties for specific operations and improve semantic-search hit rates.
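The task-type parameters could be passed along these lines; this is a minimal sketch of a request payload, assuming the preview API mirrors the field names of the existing gemini-embedding-001 embedContent endpoint. The model id `models/gemini-embedding-2` is an assumption, not confirmed by the source.

```python
# Hypothetical sketch of an embedContent request payload with a task type.
# Field names mirror the existing gemini-embedding-001 REST API; the exact
# preview API for Gemini Embedding 2 may differ.

def build_embed_request(text: str, task_type: str = "RETRIEVAL_QUERY") -> dict:
    """Build a JSON-serializable embedContent payload carrying a task type."""
    allowed = {"RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "CLASSIFICATION"}
    if task_type not in allowed:
        raise ValueError(f"unsupported task type: {task_type}")
    return {
        "model": "models/gemini-embedding-2",  # assumed preview model id
        "content": {"parts": [{"text": text}]},
        "taskType": task_type,
    }
```

In practice the payload would be POSTed to the embedContent endpoint with an API key; using RETRIEVAL_DOCUMENT when indexing and RETRIEVAL_QUERY at query time is what lets the model shape the two vector distributions for better search hit rates.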
- Maps text, images, video, audio, and PDFs into a single unified vector space
- Supports 8,192 text tokens, 120s of video, and 80s of audio per request
- Reduces RAG complexity by eliminating the need for modality-specific pipelines
- Shows improved robustness to domain shift on the Massive Text Embedding Benchmark
- Includes task-type parameters to optimize vector properties for specific use cases
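The practical payoff of a single unified vector space is that retrieval reduces to one nearest-neighbour search over cosine similarity, regardless of which modality produced each vector. A minimal sketch, with stub vectors standing in for real model embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], index: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the ids of the k index entries closest to the query vector."""
    ranked = sorted(index, key=lambda d: cosine_similarity(query, index[d]),
                    reverse=True)
    return ranked[:k]

# Stub 3-d vectors; in practice each would come from the embedding model,
# whether the source was text, an image, audio, video, or a PDF page.
index = {
    "pdf_page_3": [0.9, 0.1, 0.0],
    "image_42":   [0.1, 0.9, 0.1],
    "audio_clip": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # e.g. an embedded text query
print(top_k(query, index))  # nearest item, whatever its modality
```

With modality-specific pipelines, the same search would need one index per modality plus a score-fusion step; a shared space removes both.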