LessWrong's 'Unmeasured' concept clarifies data gaps in research
A simple term for data you never thought to collect, and why it matters.
In a LessWrong post titled 'Data you could have observed but didn't,' author Gretta Duleba examines a common research predicament: you realize you should have collected a variable (like hair color) but didn't think of it in advance. This type of data is neither latent (unobservable in principle) nor well described as merely 'unobserved' (a term that is easily conflated with latent variables). Duleba searches for precise terminology and finds that different fields use distinct labels. In econometrics, it's often called 'omitted' (as in omitted variable bias). Epidemiology and biostatistics use 'unmeasured variable.' Statistics as a whole has a subfield called 'Missing Data.'
Duleba settles on 'unmeasured' as a clean, intuitive term, noting that most people she works with mix terminology from mathematics, physics, engineering, and other fields, which creates confusion. A commenter, DaemonicSigil, suggests 'unrecorded' as another candidate. The post underscores the importance of shared language in research and shows how even a simple gap in data collection benefits from a standardized label. Duleba also notes that she verified her findings by asking two different LLMs and cross-checking their answers.
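To make the stakes concrete, the econometric idea of omitted variable bias can be sketched with a small simulation. This is an illustration under stated assumptions, not something taken from Duleba's post: the coefficients (1.0, 2.0, 0.8), the sample size, and the function names `ols_slope` and `simulate` are all hypothetical choices. The variable `z` stands in for something like hair color that was never recorded.

```python
import random


def ols_slope(xs, ys):
    """Slope of a one-variable least-squares fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


def simulate(n=20000, seed=0):
    """Return (true effect of x, slope estimated without z)."""
    rng = random.Random(seed)
    true_effect = 1.0  # assumed effect of x on y (hypothetical)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)             # the unmeasured variable
        x = 0.8 * z + rng.gauss(0, 1)   # measured predictor, correlated with z
        y = true_effect * x + 2.0 * z + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    # z was never recorded, so the analyst can only regress y on x.
    return true_effect, ols_slope(xs, ys)


if __name__ == "__main__":
    true_effect, estimate = simulate()
    print(f"true effect: {true_effect}, estimate without z: {estimate:.2f}")
```

With these assumed coefficients, the slope estimated without `z` lands well above the true effect of 1.0, because `x` absorbs part of the influence of `z`. Having a name for `z` ("unmeasured," "omitted," or "unrecorded") is what lets a team flag this problem in the first place.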
- Gretta Duleba identifies a vocabulary gap for data that could have been observed but wasn't collected.
- Different fields use different terms: 'omitted' (econometrics), 'unmeasured variable' (epidemiology and biostatistics), 'missing data' (statistics).
- Duleba proposes 'unmeasured' as the best catch-all; commenter DaemonicSigil suggests 'unrecorded' as an alternative.
Why It Matters
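A shared, precise term lets researchers from different fields distinguish a variable that was simply never collected from one that cannot be observed at all, which reduces confusion when designing studies and planning data collection.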