Developer Tools

Amazon SageMaker Feature Store adds Iceberg, Lake Formation, and SDK v3.8.0

New features slash metadata costs and automate fine-grained access for ML feature pipelines.

Deep Dive

Amazon SageMaker Feature Store, a fully managed repository for ML features, has added support for the Apache Iceberg table format, streaming ingestion, scalable batch ingestion, and fine-grained access control through AWS Lake Formation. To address two common production problems, runaway metadata costs and manual access setup, AWS released SageMaker Python SDK v3.8.0 on April 16, 2026, with three key capabilities. First, native Lake Formation integration automatically enforces column-, row-, and cell-level access controls when a feature group is created, and it can also be enabled on existing groups. Second, additional Iceberg table properties set metadata retention and snapshot lifecycle policies, preventing the kind of 50 TB metadata accumulation seen by a retail analytics team. Third, the SDK itself is modernized, with a modular architecture, faster installation, and no legacy dependencies such as PyTorch. Together, these features let teams secure sensitive feature data without manual overhead and keep storage costs predictable even under high-frequency streaming workloads.
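To make the metadata-retention idea concrete, here is a minimal sketch that builds a set of standard Apache Iceberg table properties for bounding snapshot and metadata-file growth. The property keys are regular Iceberg table settings; the helper name `iceberg_retention_properties`, its defaults, and the assumption that the SDK's IcebergProperties accepts these exact keys are illustrative only and are not confirmed by the release.

```python
from datetime import timedelta


def iceberg_retention_properties(
    max_snapshot_age: timedelta,
    min_snapshots_to_keep: int = 1,
    max_previous_metadata_files: int = 10,
) -> dict[str, str]:
    """Build Iceberg table properties that bound snapshot and metadata growth.

    Hypothetical helper: the keys are standard Iceberg table properties, but
    how (or whether) the SageMaker SDK forwards them is an assumption.
    """
    if max_snapshot_age <= timedelta(0):
        raise ValueError("max_snapshot_age must be positive")
    if min_snapshots_to_keep < 1 or max_previous_metadata_files < 1:
        raise ValueError("retention counts must be at least 1")

    age_ms = int(max_snapshot_age.total_seconds() * 1000)
    return {
        # Defaults used when snapshot expiration runs on the table.
        "history.expire.max-snapshot-age-ms": str(age_ms),
        "history.expire.min-snapshots-to-keep": str(min_snapshots_to_keep),
        # Delete old metadata.json files after each commit, keeping a bounded number.
        "write.metadata.delete-after-commit.enabled": "true",
        "write.metadata.previous-versions-max": str(max_previous_metadata_files),
    }


# Example: keep at most one day of snapshots, but never fewer than five.
props = iceberg_retention_properties(timedelta(days=1), min_snapshots_to_keep=5)
```

Iceberg property values are strings, which is why the helper stringifies every number. The `history.expire.*` settings only take effect when a snapshot-expiration job actually runs, so a high-frequency streaming table still needs that maintenance scheduled somewhere.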

Developers enable both features through new parameters on the FeatureGroupManager.create() and update() calls: LakeFormationConfig turns on automatic access control, and IcebergProperties configures the metadata lifecycle. SDK v3 also streamlines record operations (PutRecord, GetRecord, BatchGetRecord), point-in-time training dataset extraction, and DataFrame ingestion from pandas or Spark. Existing SDK v2 code works with minimal changes. For production ML platforms that grow from experimentation to deployment, these updates remove two persistent operational headaches, access governance and cost control, making it easier to build secure, cost-effective feature stores at scale.
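Point-in-time extraction is the least self-explanatory item in that list, so here is a minimal pure-Python sketch of the idea: each training label is paired with the latest feature values recorded at or before the label's timestamp, which avoids leaking future data into training. This is not the SDK's implementation or API; `point_in_time_join`, the tuple layouts, and the sample data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_records):
    """Attach to each label the latest feature values known at label time.

    labels:          iterable of (entity_id, label_time, label)
    feature_records: iterable of (entity_id, event_time, values)
    Returns a list of (entity_id, label_time, values_or_None, label).
    """
    # Group feature history per entity and sort it by event time.
    history = defaultdict(list)
    for entity_id, event_time, values in feature_records:
        history[entity_id].append((event_time, values))
    times = {}
    for entity_id, rows in history.items():
        rows.sort(key=lambda row: row[0])
        times[entity_id] = [t for t, _ in rows]

    joined = []
    for entity_id, label_time, label in labels:
        # Count of records with event_time <= label_time for this entity.
        idx = bisect_right(times.get(entity_id, []), label_time)
        values = history[entity_id][idx - 1][1] if idx else None
        joined.append((entity_id, label_time, values, label))
    return joined


# Hypothetical data: timestamps are plain integers (for example, epoch seconds).
features = [
    ("c1", 100, {"spend_7d": 20.0}),
    ("c1", 200, {"spend_7d": 35.0}),
    ("c2", 150, {"spend_7d": 5.0}),
]
labels = [("c1", 150, 1), ("c1", 250, 0), ("c2", 100, 1)]
training_rows = point_in_time_join(labels, features)
```

The label for `c2` at time 100 gets no features because its only record arrives later, at time 150. A production offline store would run the same logic as a query over the Iceberg table rather than in memory.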

Key Points
  • Native AWS Lake Formation integration applies column-, row-, and cell-level access control automatically at feature group creation, with no manual setup.
  • Iceberg table properties let teams set metadata retention and snapshot lifecycle policies, preventing metadata accumulation on the order of 50 TB.
  • SageMaker Python SDK v3.8.0 (April 2026) is modular and lighter-weight, and covers all Feature Store operations without legacy dependencies such as PyTorch.
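For context on the first point, a cell-level rule in Lake Formation combines a row filter expression with a column list in a data cells filter. The sketch below only builds the request payload for Lake Formation's CreateDataCellsFilter operation; the article does not say whether the SDK creates filters this way, and the helper name, account ID, database, table, and filter values are all placeholders.

```python
def data_cells_filter(catalog_id, database, table, name, row_filter, columns):
    """Build the TableData payload for Lake Formation's CreateDataCellsFilter.

    Hypothetical helper: it assembles the request only and calls no AWS API.
    """
    if not columns:
        raise ValueError("list at least one column to expose")
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
        # Row-level restriction: only rows matching the expression are visible.
        "RowFilter": {"FilterExpression": row_filter},
        # Column-level restriction: only these columns are visible.
        "ColumnNames": list(columns),
    }


# Placeholder identifiers, chosen for illustration only.
payload = data_cells_filter(
    catalog_id="111122223333",
    database="sagemaker_featurestore",
    table="customer_features",
    name="eu_analysts",
    row_filter="region = 'EU'",
    columns=["customer_id", "spend_7d", "event_time"],
)

# With boto3 installed and suitable credentials, the payload would be sent as:
#   boto3.client("lakeformation").create_data_cells_filter(TableData=payload)
```

Row filter plus column list is what makes the rule cell-level: a principal granted this filter sees only the listed columns, and only for rows where the expression holds.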

Why It Matters

The release addresses two of the most common production ML pain points: securing feature data without manual overhead and controlling Iceberg storage costs.