[R] Where to look for resources on developing evaluation metrics for generative models for science?
A developer building a generative model for science struggles to find evaluation metrics beyond basic MSE.
An AI researcher is developing a new generative foundation model for a scientific field that handles multimodal data such as videos and time series. They currently lack robust evaluation methods, relying only on Mean Squared Error (MSE) and subjective human feedback. The core challenge is finding or developing suitable metrics for each modality, since established field-specific standards such as FID for images or PESQ for audio do not directly carry over to this novel model's scientific data.
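One concrete starting point: the Fréchet distance that underlies FID is modality-agnostic once you have a feature embedding for your data, so it can serve as a distribution-level complement to per-sample MSE. Below is a minimal NumPy sketch (the toy Gaussian data stands in for embedded real and generated samples; in practice you would swap in a domain-appropriate encoder):

```python
import numpy as np

def frechet_distance(x, y):
    """Fréchet distance between Gaussian fits of two sample sets (rows = samples).

    This is the statistic behind FID, applied directly to feature vectors.
    Unlike MSE, it compares whole distributions, not paired samples."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # tr((cov_x @ cov_y)^{1/2}) via eigenvalues; they are real and
    # non-negative for a product of PSD matrices (clip guards roundoff)
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(4000, 8))     # stand-in for embedded real data
match = rng.normal(0.0, 1.0, size=(4000, 8))    # generator matching the data
shifted = rng.normal(0.5, 1.0, size=(4000, 8))  # generator with a mean shift

print(frechet_distance(real, match))    # small: distributions agree
print(frechet_distance(real, shifted))  # clearly larger: shift is detected
```

A per-sample metric like MSE cannot even be computed here (the two sets are unpaired), which is exactly the gap distributional metrics fill; the open question for each scientific modality is what embedding makes this distance meaningful.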
Why It Matters
Without proper metrics, evaluating and advancing complex, real-world AI models for science remains a significant bottleneck.