The Data Engineer will work collaboratively in multi-disciplinary teams to help drive innovation in the areas of data storage, data ingestion, data quality, performance tuning, optimization, and system troubleshooting. The role involves the design and implementation of algorithms, models, and workflows that lead researchers to discover valuable information within large volumes of data from various sources. The Data Engineer may organize, harmonize, and analyze data sets, use various technologies to enable data visualization, and create data-centric applications of enduring value to the business.

RESPONSIBILITIES:
The Data Engineer will engage in system setup, ETL pipeline development, production support, executing test plans, and writing data preparation and reporting programs. This role works as part of a multi-faceted product team under the guidance of project and functional managers, who will provide detailed specifications, schedules, and priorities and thoroughly review the engineer's work. The Data Engineer will be responsible for maintaining, testing, and supporting existing data systems, which may include coding and testing ETL pipelines, manipulating data while preserving its integrity, and training others in the end use of that data. The Data Engineer must be proficient across many platforms, including SQL and NoSQL databases, Hadoop/Spark, and data warehousing implementations, and must be committed to staying up to date with new technologies and to recognizing when different or emerging technologies are most appropriate for each task.
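The ETL pipeline work described above can be illustrated with a minimal sketch. All file contents, table names, and the `payments` schema below are assumptions chosen for illustration only, not part of any actual system:

```python
# Minimal extract-transform-load sketch using only the standard library.
# The CSV content and the "payments" table are hypothetical examples.
import csv
import io
import sqlite3

# Stand-in for a source file; one row ("bob,oops") is deliberately invalid.
raw = io.StringIO("name,amount\nalice,10\nbob,oops\ncarol,25\n")

def extract(fh):
    """Read raw records from a CSV file handle."""
    return list(csv.DictReader(fh))

def transform(rows):
    """Validate and normalize rows, dropping any that fail type checks."""
    clean = []
    for row in rows:
        try:
            clean.append((row["name"].title(), int(row["amount"])))
        except ValueError:
            continue  # preserve integrity by excluding malformed rows
    return clean

def load(rows, conn):
    """Write transformed rows into the target database."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(raw)), conn)
print(conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0])  # 2
```

In a production pipeline the same extract/transform/load structure would typically target a warehouse rather than an in-memory SQLite database, with the rejected rows logged for review rather than silently dropped.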
The successful candidate should have at least a Bachelor's degree in one of the following fields: math, statistics, computer science, data science, or a social science or public policy related field.
The candidate must have at least five years' experience in positions of increasing responsibility, preferably working with large datasets and conducting statistical and quantitative modeling, melding analytics with programming, data mining, clustering, and segmentation.
They should have a strong foundation in statistics, mathematics, and computer programming.
Fluency in Python programming is required, along with familiarity with both Linux and Windows operating environments and with parallelization technologies for high-performance computing.
Familiarity with utility scripting languages such as Bash and PowerShell, code repository tools such as Git and Subversion (SVN), and Atlassian products such as JIRA and Bitbucket is also expected.
Experience with a broad array of data storage platforms is required, including, but not limited to, SQL (Postgres, MS SQL, MySQL), NoSQL (MongoDB), and data warehousing solutions (Vertica/Greenplum), with an emphasis on performance tuning, query optimization, and big-data workloads.
Extensive hands-on experience in leading large-scale, full-cycle massively parallel processing (MPP) enterprise data warehousing (EDW) projects.
Extensive hands-on experience in data warehousing design, tuning, and ETL/ELT process development.
In addition, the successful candidate should:
Recommend solutions that follow accepted standards regarding database physical structure, functional capabilities, and security.
Maintain database performance by calculating optimal parameter values.
Perform query and database design optimization, and provide assistance, reviews, and feedback to stakeholders writing complex queries, stored procedures/functions, views, and DDL/DML scripts.
Troubleshoot complex database issues in an accurate and timely manner, ensuring compliance with SLAs.
Have strong problem-solving and quantitative/qualitative analysis skills.
Be able to organize and prioritize work assignments to meet project goals.
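The query and database design optimization work listed above can be sketched concretely. The example below is a minimal, hypothetical illustration using SQLite's query planner; the `orders` table, its columns, and the index name are assumptions made for the example:

```python
# Hypothetical sketch: verify that adding an index changes the query plan
# from a full table scan to an index search, using SQLite's planner.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
cur.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

query = "SELECT total FROM orders WHERE customer_id = ?"

# Before indexing: the planner reports a full scan of the table.
plan_before = cur.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()

cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# After indexing: the planner searches via the index instead.
plan_after = cur.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()

print(plan_before[0][-1])  # e.g. a SCAN of orders
print(plan_after[0][-1])   # e.g. a SEARCH using idx_orders_customer
```

In production databases such as Postgres the equivalent workflow uses `EXPLAIN ANALYZE`, and the same before/after comparison is how index and schema changes are typically justified to stakeholders.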