I. General Summary
Under limited direction, responsible for the design and implementation of core technologies associated with the organization's data analysis and analytics technical infrastructure. As a core member of a high-performance team, ensures data pipelines are consistently and reliably maintained and analytics capabilities are delivered at an optimal level, helping the organization identify insights from a large number of diverse datasets.
This position reports to the Director, Enterprise Data Integration and Solution Architecture.
II. Principal Responsibilities and Tasks
The following statements are intended to describe the general nature and level of work being performed by people assigned to this classification. These are not to be construed as an exhaustive list of all job duties performed by personnel so classified.
1. Ensures analytics infrastructure and associated systems meet business requirements and industry best practices.
2. Gathers and processes raw data from multiple disparate sources (including writing scripts, calling APIs, and writing SQL queries) into a form suitable for analysis.
3. Gathers, analyzes, documents and translates application requirements into data models.
4. Builds data models.
5. Enables big data, batch and real-time analytical processing solutions leveraging emerging technologies.
6. Researches and proposes opportunities for data acquisition and new uses for existing data.
7. Codes, tests, and documents new or modified data systems to create robust and scalable applications for analytics.
8. Translates complex functional and technical requirements into detailed architecture, design, and high-performance software.
9. Builds and architects a next-generation Big Data analytics framework.
10. Expands and grows data platform capabilities to solve new data problems and challenges.
11. Creates data flow diagrams for business systems.
12. Builds automation tools; ensures all automated processes preserve data by managing the alignment of data availability and integration processes.
13. Performs technology and product research to better define requirements, resolve important issues and improve the overall capability of the technology stack.
14. Contributes to the design and direction of enterprise-wide data architecture as well as design documentation deliverables.
15. Supports standardization of documentation and the adoption of standards and practices related to data and applications.
16. Develops Relational Data Models, Dimensional Data Models, Data Dictionary and Metadata.
17. Develops data set processes for data mining and production.
18. Working both independently and in collaboration with data integration developers and data scientists, designs and builds high-performance algorithms, prototypes, predictive models, and proofs of concept.
19. Works closely with the developer team to integrate innovative algorithms into production systems.
20. Supports business decisions with ad hoc analysis as needed.
21. Assesses and provides recommendations on business relevance, appropriate timing and deployment.
III. Education and Experience
1. Bachelor's degree in Computer Science, Mathematics, Information Systems, Engineering, Physical Sciences, Life Sciences, or a closely related field, or equivalent related professional experience, is required. Additional certifications are preferred.
2. Seven or more (7+) years' experience designing, implementing, and supporting systems in a large-scale analytics or data engineering environment containing many disparate application systems and multiple data sources.
3. Strong knowledge of programming or scripting languages (e.g., C/C++, Python, Ruby).
4. Experience with Agile or other rapid application development methods.
5. Experience with object-oriented design, coding, and testing patterns, as well as experience engineering software platforms and large-scale data infrastructures.
IV. Knowledge, Skills and Abilities
1. Knowledge of data analysis, end user requirements analysis, and business requirements analysis to develop a clear understanding of the business needs and to incorporate these needs into technical solutions.
2. Strong knowledge of and experience with statistics or advanced mathematics.
3. Significant knowledge of data modeling and understanding of different data structures and their benefits and limitations under particular use cases.
4. Working knowledge of relational, document-oriented, or object-oriented databases, such as PostgreSQL, Oracle, Caché, SQL Server, or MongoDB.
5. Deep knowledge of data mining, machine learning, or natural language processing.
6. Strong programming experience cleaning and scrubbing noisy datasets; experience building algorithms.
7. Experience with Hortonworks or the Hadoop ecosystem in general, including HDFS and tools such as Spark, MapReduce, Pig, and Hive. Experience with Ranger, Atlas, and/or Falcon.
8. Experience with various messaging systems, such as ActiveMQ or RabbitMQ.
9. Experience with Big Data machine learning toolkits, such as Mahout, Spark MLlib, or TensorFlow.
10. Must work well in a high-performance team environment.
11. Ability to architect highly scalable distributed systems using non-proprietary tools.