Research & Papers

Tokenization Tradeoffs in Structured EHR Foundation Models

New research reveals how tokenization choices in medical AI models can improve performance on 73 of 74 prediction tasks while using 39.5% less compute.

Deep Dive

A Stanford research team has published a comprehensive study examining how tokenization—the process of converting medical records into discrete inputs for AI models—affects both performance and efficiency in healthcare AI. The researchers pretrained transformer models on pediatric EHR data using different tokenization strategies, varying how medical events, timestamps, and workflow annotations were encoded. Their factorial design revealed that specific encoding choices dramatically affect downstream results, with the best-performing method improving performance on nearly all of the 74 clinical prediction tasks tested.
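To make the setup concrete, here is a minimal sketch of what tokenizing an EHR event stream can look like. This is illustrative only, not the paper's implementation: the event codes, the time-gap buckets, and the helper names are all hypothetical.

```python
# Illustrative sketch (not the study's actual tokenizer): converting a toy
# EHR event stream into a token sequence, with discretized time-gap tokens
# interleaved between event codes. Bucket boundaries are arbitrary choices.

def time_gap_token(hours):
    """Map the gap since the previous event to a coarse bucket token."""
    if hours == 0:
        return "<GAP_0h>"
    if hours < 24:
        return "<GAP_<1d>"
    return "<GAP_1d+>"

def tokenize_events(events):
    """events: time-ordered list of (hours_since_admission, code) pairs."""
    tokens = []
    prev_t = 0
    for t, code in events:
        tokens.append(time_gap_token(t - prev_t))  # encode elapsed time
        tokens.append(code)                        # encode the event itself
        prev_t = t
    return tokens

stream = [(0, "ICD10:J45.9"), (2, "RX:albuterol"), (30, "LAB:CBC")]
print(tokenize_events(stream))
# ['<GAP_0h>', 'ICD10:J45.9', '<GAP_<1d>', 'RX:albuterol', '<GAP_1d+>', 'LAB:CBC']
```

Each axis the study varies (event encoding, timestamp encoding, workflow annotations) corresponds to a design decision like the ones above.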

The study identified 'joint event encoding' as the most effective approach, where medical codes and their attributes (like dosage or severity) are combined into single tokens rather than split across multiple tokens. This method outperformed alternatives on 73 out of 74 tasks while requiring 39.5% fewer floating-point operations during pretraining. The researchers traced this advantage to 'local binding efficiency'—by pre-combining related information, the model doesn't have to learn these associations from scratch, making training both faster and more effective.

External validation on adult intensive care unit data showed the tokenization advantage generalizes across different patient populations despite vocabulary mismatches, though temporal and workflow effects remained institution-specific. The findings establish tokenization as a practical, tractable lever for improving EHR foundation models, offering a path to more accurate clinical predictions with significantly reduced computational costs. This research provides concrete engineering guidance for teams building medical AI systems that need to balance performance with resource constraints.

Key Points
  • Joint event encoding (combining codes+attributes) improved performance on 73/74 clinical prediction tasks while using 39.5% fewer FLOPs
  • The method's advantage comes from 'local binding efficiency'—pre-combining related information rather than forcing the model to learn associations
  • External validation showed the approach generalizes across patient populations despite vocabulary differences between institutions

Why It Matters

Provides concrete engineering guidance for building more accurate, efficient medical AI that can scale across healthcare systems with lower computational costs.