How Amazon Web Services Improved Enterprise Data Discovery with Amazon SageMaker

News | 25.05.2026

As organizations scale their data ecosystems, one challenge becomes increasingly difficult to manage: fragmented data discovery. Different teams often create local datasets, dashboards, metrics, and business assets outside centralized enterprise catalogs, making it difficult for users to find, trust, and reuse information efficiently.

To solve this challenge, Amazon’s Business Data Technologies (BDT) team expanded its enterprise data catalog strategy by integrating internal governance systems with Amazon SageMaker. The goal was clear: create a unified discovery and governance experience for both structured datasets and broader business data assets.

For organizations building modern data and AI platforms on Amazon Web Services, this approach shows how centralized cataloging and governance can improve collaboration, analytics, and AI readiness at scale.

The challenge: fragmented catalogs and disconnected governance

Amazon already operated a centralized enterprise data catalog called Andes, designed for secure dataset sharing under strict governance policies. However, many teams also maintained separate catalogs for:

  • Local datasets
  • Dashboards
  • Metrics
  • Business documents
  • ML assets
  • Non-tabular resources

As a result, users had to search across multiple systems depending on the asset type. This created operational overhead, slowed analytics workflows, and reduced visibility into available data resources.

The BDT team identified four key requirements for modernization:

1. Multimodal catalog support

Teams needed a unified platform capable of cataloging:

  • Enterprise datasets
  • Local business datasets
  • Dashboards and KPIs
  • Files and reports
  • Analytical assets

2. Unified governance and access enforcement

The catalog had to provide centralized governance with:

  • A single approval process
  • Consistent access policies
  • Identity-aware authorization
  • Enterprise-wide auditing

3. Multi-approval workflows

Different asset types often required different approval models. The solution needed to support multiple governance workflows while maintaining centralized visibility.
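One way to picture this requirement is a router that sends each access request to the workflow registered for its asset type, while recording every decision in a single shared log. The sketch below is illustrative only and is not Amazon's implementation: the names `AccessRequest`, `ApprovalRouter`, and `DATASET_APPROVED`, and the two example policies, are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class AccessRequest:
    """A request to use one cataloged asset (hypothetical shape)."""
    asset_id: str
    asset_type: str   # e.g. "dataset", "dashboard"
    requester: str


# A workflow is any callable that returns True (approve) or False (deny).
Workflow = Callable[[AccessRequest], bool]


@dataclass
class ApprovalRouter:
    """Routes each request to the workflow registered for its asset type
    and records every decision in one shared log."""
    workflows: Dict[str, Workflow] = field(default_factory=dict)
    decisions: List[Tuple[str, str, bool]] = field(default_factory=list)

    def register(self, asset_type: str, workflow: Workflow) -> None:
        self.workflows[asset_type] = workflow

    def submit(self, request: AccessRequest) -> bool:
        if request.asset_type not in self.workflows:
            raise KeyError(f"no approval workflow for {request.asset_type!r}")
        approved = self.workflows[request.asset_type](request)
        # Central visibility: one log, regardless of which workflow ran.
        self.decisions.append((request.asset_id, request.requester, approved))
        return approved


# Hypothetical policies: datasets need an allow-listed requester,
# dashboards are open to everyone.
DATASET_APPROVED = {"alice"}
router = ApprovalRouter()
router.register("dataset", lambda r: r.requester in DATASET_APPROVED)
router.register("dashboard", lambda r: True)

router.submit(AccessRequest("orders", "dataset", "alice"))
router.submit(AccessRequest("orders", "dataset", "bob"))
router.submit(AccessRequest("sales-kpi", "dashboard", "bob"))
```

Keeping the decision log on the router rather than inside each workflow is what preserves centralized visibility: new asset types can bring their own approval logic without creating a separate audit trail.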

4. Delegated ownership

Business units needed flexibility to enrich metadata, manage tags, and maintain domain-specific governance without losing enterprise-level oversight.

Extending enterprise catalog capabilities with Amazon SageMaker

To address these requirements, Amazon extended its enterprise catalog environment using Amazon SageMaker’s catalog and governance capabilities.

Rather than operating multiple disconnected catalogs, Amazon created a single enterprise-wide domain integrating datasets and data assets into one discovery experience.

The architecture integrates:

  • Amazon SageMaker
  • AWS IAM Identity Center
  • Enterprise identity systems
  • Existing governance frameworks
  • Internal approval tooling

This enabled a centralized catalog while preserving existing security and governance standards.

Key benefits of the integrated architecture

Single-pane data discovery

Users can now search datasets, dashboards, metrics, and analytical assets from one interface instead of navigating multiple systems, which reduces the time spent locating trusted data resources.

Expanded governance across asset types

Governance policies now extend beyond traditional datasets to broader business assets, enabling consistent enforcement across environments.

Improved observability and auditing

Using Trusted Identity Propagation (TIP) with AWS IAM Identity Center, organizations gain detailed visibility into:

  • Who accessed specific assets
  • When assets were used
  • Which systems were involved

This strengthens compliance and enterprise auditing capabilities.
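The three audit questions above (who, when, which system) map naturally onto a simple event record. The sketch below is a hypothetical data model for illustration, not the log format of AWS IAM Identity Center or of any AWS service; `AccessEvent`, `accesses_for_asset`, and the sample events are all invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class AccessEvent:
    """One access to a cataloged asset (hypothetical record shape)."""
    user_id: str    # who accessed the asset
    asset_id: str   # which asset was accessed
    system: str     # which system performed the access
    at: datetime    # when the access happened


def accesses_for_asset(events: Iterable[AccessEvent], asset_id: str) -> List[AccessEvent]:
    """Return the events for one asset, oldest first."""
    return sorted((e for e in events if e.asset_id == asset_id), key=lambda e: e.at)


# Made-up sample events, deliberately out of chronological order.
events = [
    AccessEvent("bob", "orders", "sql-editor",
                datetime(2026, 5, 2, 9, 0, tzinfo=timezone.utc)),
    AccessEvent("alice", "orders", "notebook",
                datetime(2026, 5, 1, 14, 30, tzinfo=timezone.utc)),
    AccessEvent("alice", "sales-kpi", "dashboard",
                datetime(2026, 5, 3, 8, 15, tzinfo=timezone.utc)),
]

for event in accesses_for_asset(events, "orders"):
    print(event.at.isoformat(), event.user_id, event.system)
```

The important property is that `user_id` names the end user rather than a shared service role; carrying that identity through each downstream system is what identity propagation makes possible.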

Integration with existing enterprise workflows

The platform integrates with Git repositories, approval systems, and internal tooling to automate permissions, onboarding, and operational workflows.

Core components of the implementation

Catalog connectors and ingestion pipelines

Amazon built connectors to synchronize assets from multiple sources into SageMaker while preserving governance models and metadata.

This included:

  • Integration with Andes datasets
  • AWS account onboarding automation
  • Identity-aware access mapping
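A connector of this kind can be thought of as an adapter that emits normalized catalog entries, plus a sync step that upserts them into the central catalog. The following is a minimal sketch under stated assumptions, not the connectors Amazon built and not a SageMaker API: `CatalogEntry`, `Connector`, `StaticConnector`, `sync`, and the policy names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Protocol, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    """A normalized asset record (hypothetical shape)."""
    source: str       # originating catalog, e.g. "andes"
    asset_id: str
    asset_type: str
    policy_id: str    # governance model carried over from the source
    metadata: Tuple[Tuple[str, str], ...] = ()


class Connector(Protocol):
    """Anything that can list the assets of one source catalog."""
    def fetch(self) -> Iterable[CatalogEntry]: ...


class StaticConnector:
    """Stand-in connector that serves a fixed list of entries."""
    def __init__(self, entries: Iterable[CatalogEntry]) -> None:
        self._entries = list(entries)

    def fetch(self) -> Iterable[CatalogEntry]:
        return iter(self._entries)


def sync(catalog: Dict[Tuple[str, str], CatalogEntry],
         connectors: Iterable[Connector]) -> int:
    """Upsert entries from every connector; return how many were added or changed."""
    changed = 0
    for connector in connectors:
        for entry in connector.fetch():
            key = (entry.source, entry.asset_id)
            if catalog.get(key) != entry:
                catalog[key] = entry
                changed += 1
    return changed


catalog: Dict[Tuple[str, str], CatalogEntry] = {}
andes = StaticConnector([
    CatalogEntry("andes", "orders", "dataset", "strict", (("owner", "bdt"),)),
])
local = StaticConnector([
    CatalogEntry("team-a", "sales-kpi", "dashboard", "team-a-default"),
])

first = sync(catalog, [andes, local])    # both entries are new
second = sync(catalog, [andes, local])   # nothing changed on the second pass
```

Keying entries by `(source, asset_id)` makes the sync idempotent, and copying `policy_id` along with the metadata is one simple way to keep each source's governance model attached to its assets.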

Delegated ownership and business glossaries

Business teams can now define and maintain:

  • Business glossaries
  • Domain vocabularies
  • Metadata definitions
  • Classification tags

This improves data discoverability and standardization across the organization.
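To illustrate why a shared glossary helps discovery, the sketch below tags assets with defined business terms and looks assets up by term. It is a toy model with hypothetical names (`Glossary`, `define`, `tag`, `find`), not the SageMaker glossary API.

```python
from collections import defaultdict
from typing import Dict, List, Set


class Glossary:
    """Maps business terms to definitions and to the assets tagged with them."""

    def __init__(self) -> None:
        self._definitions: Dict[str, str] = {}
        self._assets_by_term: Dict[str, Set[str]] = defaultdict(set)

    def define(self, term: str, definition: str) -> None:
        # Terms are stored case-insensitively so lookups are forgiving.
        self._definitions[term.lower()] = definition

    def tag(self, asset_id: str, term: str) -> None:
        key = term.lower()
        if key not in self._definitions:
            # Only defined terms may be used, which keeps vocabulary standardized.
            raise KeyError(f"undefined glossary term: {term!r}")
        self._assets_by_term[key].add(asset_id)

    def find(self, term: str) -> List[str]:
        """Return the IDs of assets tagged with the term, sorted alphabetically."""
        return sorted(self._assets_by_term.get(term.lower(), set()))


glossary = Glossary()
glossary.define("Net Revenue", "Revenue after refunds and discounts.")
glossary.tag("sales-kpi", "Net Revenue")
glossary.tag("orders", "net revenue")

matches = glossary.find("NET REVENUE")
```

Rejecting undefined terms in `tag` is the design choice that matters here: it forces teams to agree on a term before using it, so a search on that term finds every related asset regardless of which team owns it.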

Integrated analytics and development tooling

Users can consume cataloged assets directly through:

  • SageMaker Unified Studio
  • SQL query editors
  • ML development environments
  • Git-integrated workflows
  • AWS analytics services

The environment integrates natively with:

  • Amazon Athena
  • AWS Glue
  • Amazon EMR
  • Amazon Redshift

This enables teams to discover, analyze, and operationalize data within a unified workflow.

Results: faster discovery and stronger collaboration

The integrated SageMaker catalog now supports a broad range of enterprise assets, including:

  • Datasets
  • Dashboards
  • Metrics
  • ML models
  • Business documents
  • Analytical outputs

According to Amazon, the initiative delivered improvements in three areas:

Faster access to trusted data

Teams spend less time searching for data and more time generating insights.

Reduced data silos

Shared governance and centralized discovery encourage reuse of authoritative datasets rather than creating redundant copies.

Improved cross-team collaboration

Standardized metadata and unified visibility make it easier for teams to collaborate across business domains.

Why this matters for modern AI and analytics strategies

As organizations invest in AI, analytics, and data-driven decision-making, fragmented catalogs become a major operational bottleneck.

Unified cataloging with Amazon SageMaker helps organizations:

  • Build AI-ready data foundations
  • Improve governance and compliance
  • Simplify analytics workflows
  • Increase trust in enterprise data
  • Accelerate collaboration between teams

How Softprom helps organizations modernize data discovery on AWS

As an official AWS Partner, Softprom helps organizations design and implement scalable data and AI platforms using AWS technologies, including:

  • Amazon SageMaker
  • AWS analytics services
  • Data governance frameworks
  • AI and ML environments
  • Identity and access integration
  • Enterprise data modernization strategies

By combining AWS services with governance best practices, organizations can create unified data ecosystems that support analytics, AI, and enterprise collaboration at scale.