Developer Tools

Amazon Nova Multimodal Embeddings unlocks visual data in manufacturing docs

Text search can't find answers in diagrams, but multimodal embeddings can.

Deep Dive

Amazon Nova Multimodal Embeddings on Amazon Bedrock maps text, images, and document pages into a shared vector space, so engineers can retrieve diagrams or plots using natural language queries. The model supports configurable embedding dimensions (256, 384, 1024, or 3072), a DOCUMENT_IMAGE detail level for mixed-content pages, and a purpose parameter that is set to GENERIC_INDEX when embedding documents and GENERIC_RETRIEVAL when embedding queries. In an evaluation on 26 manufacturing queries, the multimodal pipeline was compared against a text-only baseline on tasks such as retrieving torque specs from CAD drawings or peak temperatures from thermal contour plots.
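
To make the index/query asymmetry concrete, here is a minimal Python sketch that builds one request body for a document page and one for a query. The helper name build_embedding_request is hypothetical, and the JSON field names (taskType, singleEmbeddingParams, embeddingPurpose, embeddingDimension, detailLevel, truncationMode) are assumptions about the request schema rather than something stated above, so check them against the Bedrock documentation before use. The sketch only constructs payloads and makes no network call.

```python
import base64
import json

# Dimension sizes listed in the text above.
ALLOWED_DIMENSIONS = {256, 384, 1024, 3072}
# Purpose values named in the text above.
ALLOWED_PURPOSES = {"GENERIC_INDEX", "GENERIC_RETRIEVAL"}


def build_embedding_request(purpose, dimension=1024, text=None,
                            image_bytes=None, image_format="png"):
    """Build a JSON-serializable request body.

    NOTE: the field names below are assumed, not confirmed by the article.
    """
    if purpose not in ALLOWED_PURPOSES:
        raise ValueError(f"unsupported purpose: {purpose}")
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {dimension}")
    if (text is None) == (image_bytes is None):
        raise ValueError("provide exactly one of text or image_bytes")

    params = {"embeddingPurpose": purpose, "embeddingDimension": dimension}
    if text is not None:
        params["text"] = {"truncationMode": "END", "value": text}
    else:
        params["image"] = {
            "format": image_format,
            # Detail level intended for mixed-content pages.
            "detailLevel": "DOCUMENT_IMAGE",
            "source": {"bytes": base64.b64encode(image_bytes).decode("ascii")},
        }
    return {"taskType": "SINGLE_EMBEDDING", "singleEmbeddingParams": params}


# Index side: a document page (placeholder bytes, not a real image).
page_request = build_embedding_request(
    "GENERIC_INDEX", image_bytes=b"placeholder page bytes"
)

# Query side: a natural language question.
query_request = build_embedding_request(
    "GENERIC_RETRIEVAL", text="What is the torque spec for the flange bolts?"
)

print(json.dumps(query_request, indent=2))
```

Both requests use the same dimension (1024 by default here) so that the resulting vectors are comparable; only the purpose differs between the index side and the query side.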

Key Points
  • Maps text, images, and document pages into a shared vector space for cross-modal similarity search (e.g., text query matching an image).
  • Configurable embedding dimensions: 256, 384, 1024, or 3072; 1024 used as a practical trade-off in evaluation.
  • Includes DOCUMENT_IMAGE detail mode for mixed-content pages and an asymmetric purpose parameter (GENERIC_INDEX vs. GENERIC_RETRIEVAL) to optimize retrieval.
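
The cross-modal search in the first point can be sketched as a cosine-similarity ranking over stored vectors. The example below is illustrative only: the file names and the 4-dimensional vectors are made up for readability (real embeddings would have 256 to 3072 dimensions and come from the model), and the article does not say which similarity metric or vector store the evaluation used, so cosine similarity over an in-memory dict is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, index, top_k=2):
    """Return the top_k (doc_id, score) pairs, highest score first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


# Toy index: in practice these would be GENERIC_INDEX embeddings of
# images, pages, and text chunks, all in the same vector space.
index = {
    "cad_drawing_p3.png": [0.9, 0.1, 0.0, 0.1],
    "thermal_plot_p7.png": [0.1, 0.9, 0.2, 0.0],
    "maintenance_notes.txt": [0.2, 0.2, 0.9, 0.1],
}

# Toy query vector: stands in for a GENERIC_RETRIEVAL embedding of a
# text question about a torque spec.
query_vec = [0.8, 0.2, 0.1, 0.0]

results = rank(query_vec, index)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

Because text and images share one vector space, the same ranking function serves a text query against image entries without any modality-specific logic.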

Why It Matters

Engineers can now find answers locked in diagrams and plots, closing the blind spots that OCR and text-only indexing leave in manufacturing document search.