DeepMind's BYOL and Meta's JEPA face hyperparameter selection challenge with non-monotonic loss

r/MachineLearning May 25, 2026

⚡ Evaluating self-supervised learning when the loss doesn't decrease is a researcher's nightmare.

Deep Dive

A Reddit discussion highlights a fundamental pain point in self-supervised representation learning: how to pick hyperparameters and architectures when the training loss does not steadily decrease. Non-contrastive methods like BYOL (DeepMind), JEPA (Meta), and data2vec are popular yet notoriously opaque. Their training loss is non-monotonic and tracks representation quality poorly, so loss-based early stopping and plateau-triggered learning rate schedules are unreliable. Researchers often turn to linear probing or KNN accuracy on downstream tasks, but repeatedly checking labeled evaluation data during training risks a form of 'p-hacking', a classic researcher-degrees-of-freedom problem.
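One common guard against this kind of leakage (an assumption here, not something the thread prescribes) is to run the probe on a validation split that is kept separate from the final test set. The NumPy sketch below shows a cosine-similarity kNN probe under that assumption. The name `knn_accuracy` and its arguments are hypothetical, and the embeddings are assumed to be precomputed by a frozen encoder.

```python
import numpy as np

def knn_accuracy(train_x, train_y, val_x, val_y, k=5):
    """Cosine-similarity kNN probe (illustrative sketch, not a reference implementation).

    train_x, val_x: (n, d) float arrays of frozen-encoder embeddings.
    train_y, val_y: (n,) integer class labels starting at 0.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    train_x = train_x / np.linalg.norm(train_x, axis=1, keepdims=True)
    val_x = val_x / np.linalg.norm(val_x, axis=1, keepdims=True)

    # Similarity of every validation point to every training point.
    sims = val_x @ train_x.T

    # Indices of the k most similar training points per validation point.
    nn_idx = np.argsort(-sims, axis=1)[:, :k]
    nn_labels = train_y[nn_idx]  # shape (n_val, k)

    # Majority vote: count neighbours per class, then take the argmax.
    n_classes = int(train_y.max()) + 1
    votes = (nn_labels[:, :, None] == np.arange(n_classes)).sum(axis=1)
    preds = votes.argmax(axis=1)
    return float((preds == val_y).mean())
```

In this setup, hyperparameters would be compared on the validation score only, and the test split would be evaluated once, after the configuration is frozen.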

RankMe, a metric that computes the effective rank of the embedding matrix from its singular values, was proposed as a label-free proxy for representation quality. However, many methods, including recent JEPA variants, already incorporate anti-collapse regularizers (e.g., of the kind used in Barlow Twins, VICReg, and SIGReg) that directly penalize rank collapse. RankMe is then effectively absorbed into the loss and stops being an independent signal: increasing the penalty weight can artificially inflate rank without improving real transfer. The community is now asking which metrics truly generalize. Options like contrastive accuracy on held-out views or invariance to augmentations are being debated, but no silver bullet exists yet.
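For readers unfamiliar with the metric, here is a minimal NumPy sketch of a RankMe-style effective rank, computed as the exponential of the Shannon entropy of the normalized singular values. The function name `rankme` and the `eps` default are illustrative choices, not taken from the thread.

```python
import numpy as np

def rankme(embeddings: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank of an (n_samples, dim) embedding matrix (illustrative sketch)."""
    # Only the singular values are needed, so skip computing U and V.
    singular_values = np.linalg.svd(embeddings, compute_uv=False)

    # Normalize singular values into a distribution; eps avoids log(0).
    p = singular_values / (singular_values.sum() + eps) + eps

    # Shannon entropy of that distribution, exponentiated to get a "rank".
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))
```

A fully collapsed embedding matrix scores close to 1, while the score approaches the embedding dimension when the singular values are nearly uniform. Because the score depends only on the singular value spectrum, any regularizer that flattens the spectrum will raise it, which is the "absorption" concern described above.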

Key Points

Non-contrastive SSL methods (BYOL, JEPA, data2vec) lack monotonic loss, making hyperparameter selection a guessing game.
Linear probing/KNN during training risks abuse of researcher degrees of freedom.
The RankMe metric gets 'absorbed' by existing anti-collapse regularization, nullifying its value as an independent criterion.

Why It Matters

Without robust evaluation criteria, progress in self-supervised learning stalls, with consequences for every downstream task from image understanding to robotics.
