How Amazon Web Services Improved Enterprise Data Discovery with Amazon SageMaker
News | 25.05.2026
As organizations scale their data ecosystems, fragmented data discovery becomes increasingly hard to manage. Teams often create local datasets, dashboards, metrics, and business assets outside centralized enterprise catalogs, so users struggle to find, trust, and reuse information.
To solve this challenge, Amazon’s Business Data Technologies (BDT) team expanded its enterprise data catalog strategy by integrating internal governance systems with Amazon SageMaker. The goal was clear: create a unified discovery and governance experience for both structured datasets and broader business data assets.
For organizations building modern data and AI platforms on Amazon Web Services, this approach shows how centralized cataloging and governance can improve collaboration, analytics, and AI readiness at scale.
The challenge: fragmented catalogs and disconnected governance
Amazon already operated a centralized enterprise data catalog called Andes, designed for secure dataset sharing under strict governance policies. However, many teams also maintained separate catalogs for:
- Local datasets
- Dashboards
- Metrics
- Business documents
- ML assets
- Non-tabular resources
As a result, users had to search across multiple systems depending on the asset type. This created operational overhead, slowed analytics workflows, and reduced visibility into available data resources.
The BDT team identified four key requirements for modernization:
1. Multimodal catalog support
Teams needed a unified platform capable of cataloging:
- Enterprise datasets
- Local business datasets
- Dashboards and KPIs
- Files and reports
- Analytical assets
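To make the multimodal requirement concrete, here is a minimal sketch of one catalog record shape shared by every asset type, with a single search function over all of them. The names `AssetType`, `CatalogEntry`, and `search` are hypothetical and chosen for illustration; this is not the Andes or SageMaker data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class AssetType(Enum):
    # Hypothetical asset kinds, mirroring the list above.
    DATASET = "dataset"
    DASHBOARD = "dashboard"
    METRIC = "metric"
    DOCUMENT = "document"


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    asset_type: AssetType
    owner_team: str
    tags: frozenset = field(default_factory=frozenset)


def search(entries, term, asset_type=None):
    """Return entries whose name or tags contain `term`, optionally filtered by type."""
    needle = term.lower()
    hits = []
    for entry in entries:
        if asset_type is not None and entry.asset_type is not asset_type:
            continue
        haystack = [entry.name.lower(), *(tag.lower() for tag in entry.tags)]
        if any(needle in text for text in haystack):
            hits.append(entry)
    return hits


# Illustrative data only.
catalog = [
    CatalogEntry("orders_daily", AssetType.DATASET, "retail", frozenset({"orders"})),
    CatalogEntry("Orders overview", AssetType.DASHBOARD, "retail", frozenset({"kpi"})),
    CatalogEntry("return_rate", AssetType.METRIC, "finance", frozenset({"orders", "kpi"})),
]
```

Because every asset shares one record shape, a single query such as `search(catalog, "orders")` can return datasets, dashboards, and metrics together, which is the behavior a unified catalog is meant to provide.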
2. Unified governance and access enforcement
The platform needed centralized governance with:
- Single approval processes
- Consistent access policies
- Identity-aware authorization
- Enterprise-wide auditing
3. Multi-approval workflows
Different asset types often required different approval models. The solution needed to support multiple governance workflows while maintaining centralized visibility.
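One way to picture this requirement is a table that maps each asset type to its own ordered list of approvers, with a shared function that reports what is still pending for any request. The sketch below assumes hypothetical role names and a hypothetical `WORKFLOWS` mapping; the article does not describe Amazon's actual approval tooling.

```python
from dataclasses import dataclass

# Hypothetical mapping of asset type -> approver roles, in the order they must sign off.
WORKFLOWS = {
    "dataset": ["data_owner", "security_review"],
    "dashboard": ["dashboard_owner"],
    "ml_model": ["model_owner", "security_review"],
}


@dataclass
class AccessRequest:
    requester: str
    asset_name: str
    asset_type: str
    approvals: list  # roles that have approved so far


def pending_approvers(request):
    """Return the roles that still have to approve, in workflow order."""
    if request.asset_type not in WORKFLOWS:
        raise ValueError(f"no approval workflow for {request.asset_type!r}")
    required = WORKFLOWS[request.asset_type]
    return [role for role in required if role not in request.approvals]


def is_granted(request):
    """A request is granted once no approver is left pending."""
    return not pending_approvers(request)
```

Each asset type keeps its own approval chain, while `pending_approvers` gives one central place to inspect the state of every request.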
4. Delegated ownership
Business units needed flexibility to enrich metadata, manage tags, and maintain domain-specific governance without losing enterprise-level oversight.
Extending enterprise catalog capabilities with Amazon SageMaker
To address these requirements, Amazon extended its enterprise catalog environment using Amazon SageMaker’s catalog and governance capabilities.
Rather than operating multiple disconnected catalogs, Amazon created a single enterprise-wide domain integrating datasets and data assets into one discovery experience.
The architecture integrates:
- Amazon SageMaker
- AWS IAM Identity Center
- Enterprise identity systems
- Existing governance frameworks
- Internal approval tooling
This enabled a centralized catalog while preserving existing security and governance standards.
Key benefits of the integrated architecture
Single-pane data discovery
Users can now search datasets, dashboards, metrics, and analytical assets from one interface instead of navigating multiple systems, which cuts the time spent locating trusted data resources.
Expanded governance across asset types
Governance policies now extend beyond traditional datasets to broader business assets, enabling consistent enforcement across environments.
Improved observability and auditing
Trusted Identity Propagation (TIP) with AWS IAM Identity Center carries the signed-in user's identity through to downstream services, giving detailed visibility into:
- Who accessed specific assets
- When assets were used
- Which systems were involved
This strengthens compliance and enterprise auditing capabilities.
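As an illustration of the who, when, and which-system questions above, the following sketch records one structured event per access and filters the log by user. The `AccessEvent` fields and both helper functions are assumptions made for this example; they do not reproduce any AWS audit log format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    user_id: str      # who: the individual user, not a shared role
    asset_name: str   # what was accessed
    system: str       # which system performed the access
    accessed_at: str  # when, as an ISO 8601 UTC timestamp


def record_access(log, user_id, asset_name, system, now=None):
    """Append one JSON-encoded access event to `log` and return the event."""
    now = now or datetime.now(timezone.utc)
    event = AccessEvent(user_id, asset_name, system, now.isoformat())
    log.append(json.dumps(asdict(event), sort_keys=True))
    return event


def accesses_by_user(log, user_id):
    """Return all decoded events for one user."""
    return [event for event in map(json.loads, log) if event["user_id"] == user_id]
```

Keying each event on the individual user rather than a shared service role is what makes questions like "who read this dataset last week" answerable from the log alone.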
Integration with existing enterprise workflows
The platform integrates with Git repositories, approval systems, and internal tooling to automate permissions, onboarding, and operational workflows.
Core components of the implementation
Catalog connectors and ingestion pipelines
Amazon built connectors to synchronize assets from multiple sources into SageMaker while preserving governance models and metadata.
This included:
- Integration with Andes datasets
- AWS account onboarding automation
- Identity-aware access mapping
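A connector of this kind is essentially a repeatable upsert: read assets from a source, write them to the target catalog, and carry metadata over unchanged. The sketch below shows that pattern with an in-memory dictionary standing in for the target; `SourceAsset` and `sync` are hypothetical names, and the real connectors are not described in enough detail to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceAsset:
    asset_id: str
    name: str
    metadata: tuple  # (key, value) pairs copied over unchanged


def sync(source_assets, target):
    """Upsert source assets into `target`, a dict keyed by asset_id.

    Returns (created, updated, unchanged) counts so repeated runs are easy to verify.
    """
    created = updated = unchanged = 0
    for asset in source_assets:
        existing = target.get(asset.asset_id)
        if existing == asset:
            unchanged += 1
            continue
        if existing is None:
            created += 1
        else:
            updated += 1
        target[asset.asset_id] = asset
    return created, updated, unchanged
```

Running `sync` twice with the same input changes nothing the second time, so the connector can be scheduled repeatedly without duplicating catalog entries.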
Delegated ownership and business glossaries
Business teams can now define and maintain:
- Business glossaries
- Domain vocabularies
- Metadata definitions
- Classification tags
This improves data discoverability and standardization across the organization.
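Delegated ownership can be sketched as a glossary where only a domain's stewards may edit that domain's terms, while every edit still lands in one enterprise-wide change log. The `Glossary` class and the steward mapping below are illustrative assumptions, not SageMaker's glossary API.

```python
class Glossary:
    def __init__(self, stewards):
        # stewards: dict mapping domain name -> set of user ids allowed to edit it
        self._stewards = stewards
        self._terms = {}      # (domain, term) -> definition
        self.change_log = []  # enterprise-level record of every accepted edit

    def define(self, user, domain, term, definition):
        """Create or update a term; only stewards of `domain` may do so."""
        if user not in self._stewards.get(domain, set()):
            raise PermissionError(f"{user} is not a steward of {domain}")
        self._terms[(domain, term)] = definition
        self.change_log.append((user, domain, term))

    def lookup(self, domain, term):
        """Return the definition, or None if the term is not defined."""
        return self._terms.get((domain, term))
```

Business units keep control of their own vocabulary, and the shared `change_log` preserves the central oversight that the delegated model is meant to retain.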
Integrated analytics and development tooling
Users can consume cataloged assets directly through:
- SageMaker Unified Studio
- SQL query editors
- ML development environments
- Git-integrated workflows
- AWS analytics services
The environment integrates natively with:
- Amazon Athena
- AWS Glue
- Amazon EMR
- Amazon Redshift
This enables teams to discover, analyze, and operationalize data within a unified workflow.
Results: faster discovery and stronger collaboration
The integrated SageMaker catalog now supports a broad range of enterprise assets, including:
- Datasets
- Dashboards
- Metrics
- ML models
- Business documents
- Analytical outputs
According to Amazon, the initiative delivered improvements in three areas:
Faster access to trusted data
Teams spend less time searching for data and more time generating insights.
Reduced data silos
Shared governance and centralized discovery encourage reuse of authoritative datasets rather than creating redundant copies.
Improved cross-team collaboration
Standardized metadata and unified visibility make it easier for teams to collaborate across business domains.
Why this matters for modern AI and analytics strategies
As organizations invest in AI, analytics, and data-driven decision-making, fragmented catalogs become a major operational bottleneck.
Unified cataloging with Amazon SageMaker helps organizations:
- Build AI-ready data foundations
- Improve governance and compliance
- Simplify analytics workflows
- Increase trust in enterprise data
- Accelerate collaboration between teams
How Softprom helps organizations modernize data discovery on AWS
As an official AWS Partner, Softprom helps organizations design and implement scalable data and AI platforms using AWS technologies, including:
- Amazon SageMaker
- AWS analytics services
- Data governance frameworks
- AI and ML environments
- Identity and access integration
- Enterprise data modernization strategies
By combining AWS services with governance best practices, organizations can create unified data ecosystems that support analytics, AI, and enterprise collaboration at scale.