Build and operationalize pipelines covering data acquisition, staging, integration of new data sources, cataloging, cleansing, batch and stream processing, transformation, and consumption
Lay out and architect a strategic data vision for SAMS
Work with our cloud engineering team to set up and provision a team sandbox with approved managed Google Cloud Platform (GCP) data services, in which to build the data pipeline using the provided sample data sets
Provide an architecture that integrates metadata management and connects to the on-premises data quality engine
Provide an architecture for automated source-to-destination data comparison
Design and develop a data preparation and quality control mechanism
Classify incoming data and implement access controls based on data domains
Architect and implement a data exploration platform on the staged data sets for data architects and data scientists
Build and refine job automation and orchestration for the pipeline, covering exception handling, job reruns, fault tolerance, retrospectives, logging, alerts, and notifications
Develop transformation processes for batch and streaming data
Build a compatible target data state, including schema design, cataloging, and data modeling suitable for consumption by APIs, clients, and analytics
Define data usage patterns for analytics, APIs, and other consumers of the target data store
Build target-state data publishing, visualization, and integration
Implement Identity and Access Management (IAM) roles across the data pipeline, and define and assign delegate roles as needed
Implement IAM and dataset-level access controls in the target data store
Build a CI/CD automation pipeline that enables automated deployment and automated testing
Deliver comprehensive end-to-end documentation, along with code samples
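The automated source/destination data comparison listed above could start from a check like the minimal Python sketch below. It assumes both sides can be read as lists of dictionaries sharing a unique business key; the function names (`row_fingerprint`, `index_by_key`, `compare_tables`) and the sample rows are hypothetical. A production version would more likely push counting and hashing down into the source and target stores instead of loading rows into memory.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def row_fingerprint(row: Row) -> str:
    """Hash a row's contents deterministically (keys sorted) so equal rows match."""
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def index_by_key(rows: Iterable[Row], key: str) -> Dict[Any, str]:
    """Map each row's business key (assumed unique) to its content fingerprint."""
    return {row[key]: row_fingerprint(row) for row in rows}


def compare_tables(source: Iterable[Row], destination: Iterable[Row], key: str) -> Dict[str, List[Any]]:
    """Report keys missing from, unexpected in, or changed in the destination."""
    src = index_by_key(source, key)
    dst = index_by_key(destination, key)
    return {
        "missing_in_destination": sorted(k for k in src if k not in dst),
        "unexpected_in_destination": sorted(k for k in dst if k not in src),
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


if __name__ == "__main__":
    # Hypothetical sample data: id 2 changed, id 3 dropped, id 4 appeared.
    source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
    destination = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]
    print(compare_tables(source, destination, key="id"))
    # {'missing_in_destination': [3], 'unexpected_in_destination': [4], 'mismatched': [2]}
```

Hashing whole rows keeps the comparison cheap to transmit between systems, at the cost of not saying which column differs; a follow-up column-level diff on the mismatched keys would fill that gap.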
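For the item on classifying incoming data and applying access controls by data domain, one possible starting point is rule-based tagging of column names followed by a role-to-domain policy check. Everything in this sketch is hypothetical: the regex rules, the domain names (`pii`, `finance`, `general`), and the roles stand in for whatever managed classification and policy services the platform actually adopts.

```python
import re
from typing import Dict, List, Set

# Hypothetical classification rules: a regex on the column name selects a data domain.
CLASSIFICATION_RULES = [
    (re.compile(r"(ssn|email|phone|birth)", re.IGNORECASE), "pii"),
    (re.compile(r"(salary|amount|price|revenue)", re.IGNORECASE), "finance"),
]
DEFAULT_DOMAIN = "general"

# Hypothetical access policy: role -> set of domains that role may read.
ROLE_DOMAINS: Dict[str, Set[str]] = {
    "data_scientist": {"general", "finance"},
    "privacy_officer": {"general", "pii"},
}


def classify_column(name: str) -> str:
    """Return the domain of the first rule matching the column name, else the default."""
    for pattern, domain in CLASSIFICATION_RULES:
        if pattern.search(name):
            return domain
    return DEFAULT_DOMAIN


def classify_schema(columns: List[str]) -> Dict[str, str]:
    """Tag every column in an incoming schema with a data domain."""
    return {col: classify_column(col) for col in columns}


def readable_columns(role: str, columns: List[str]) -> List[str]:
    """Columns whose domain is granted to the role; unknown roles see nothing (deny by default)."""
    allowed = ROLE_DOMAINS.get(role, set())
    return [col for col, domain in classify_schema(columns).items() if domain in allowed]


if __name__ == "__main__":
    cols = ["customer_id", "email", "order_amount"]
    print(classify_schema(cols))
    # {'customer_id': 'general', 'email': 'pii', 'order_amount': 'finance'}
    print(readable_columns("data_scientist", cols))
    # ['customer_id', 'order_amount']
```

Denying unknown roles by default keeps a misconfigured or missing policy entry from silently exposing sensitive domains.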
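The exception handling, rerun, logging, and alerting responsibilities for job orchestration can be illustrated with a retry wrapper that uses exponential backoff. This is a sketch under assumptions: `run_with_retries` is a hypothetical helper, and its default `notify` only logs an error where a real deployment would route the message to an alerting or notification channel. In practice, the chosen orchestrator's own retry settings would usually cover much of this.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("pipeline")


def run_with_retries(
    task: Callable[[], T],
    name: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    # Hypothetical alert hook: by default it only logs; swap in a real notifier.
    notify: Callable[[str], None] = lambda msg: logger.error("ALERT: %s", msg),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run a job, retrying with exponential backoff; alert and re-raise after the last failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            result = task()
            logger.info("%s succeeded on attempt %d", name, attempt)
            return result
        except Exception as exc:
            logger.warning("%s failed on attempt %d/%d: %s", name, attempt, max_attempts, exc)
            if attempt == max_attempts:
                notify(f"{name} failed after {max_attempts} attempts: {exc}")
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    calls = {"n": 0}

    def flaky_load() -> str:
        """Hypothetical load step that fails twice with a transient error, then succeeds."""
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient source error")
        return "loaded"

    print(run_with_retries(flaky_load, "load_orders", base_delay=0.1))  # prints "loaded"
```

Injecting `sleep` and `notify` keeps the helper testable without real delays or alert channels.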