Job Responsibilities
1. Building and implementing data ingestion and curation processes using big data tools such as Spark (Scala/Python), Databricks, Delta Lake, Hive, Pig, HDFS, Oozie, Sqoop, Flume, ZooKeeper, Kerberos, Sentry, Impala, etc.
2. Ingesting large volumes of data from various platforms for analytics needs and writing high-performance, reliable, and maintainable ETL code.
3. Monitoring performance and advising on any necessary infrastructure changes.
4. Defining data security principles and policies using Ranger and Kerberos.
5. Assisting application developers and advising on efficient big data application development using cutting-edge technologies.

Knowledge, Skills and Abilities

Education
· Bachelor's degree in Computer Science, Engineering, or a related discipline

Experience
4+ years of solutions development experience
Proficiency and extensive experience with Spark, Scala, and Python, including performance tuning, is a MUST
Hive database management and performance tuning (partitioning/bucketing) is a MUST
Strong SQL knowledge and data analysis skills for data anomaly detection and data quality assurance.
Strong analytical skills for working with unstructured datasets.
Experience building stream-processing systems using solutions such as Storm or Spark Streaming
Experience with model management methodologies.
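Since Hive partitioning and bucketing are called out as a MUST above, here is a minimal illustrative sketch of what that looks like in HiveQL (the table and column names are hypothetical, not from this posting):

```sql
-- Hypothetical table: partitioned by ingest date so queries can prune
-- whole partitions, and bucketed by customer_id so joins and sampling
-- on that key can exploit the bucket layout.
CREATE TABLE transactions (
  txn_id      BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(12,2)
)
PARTITIONED BY (txn_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Filtering on the partition column scans only the matching partitions:
SELECT SUM(amount)
FROM transactions
WHERE txn_date = '2024-01-15';
```

The partition column lives in the directory layout rather than the data files, which is why filtering on it avoids a full table scan; bucket counts such as 32 are a tuning choice, not a fixed rule.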
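The SQL-driven data quality assurance mentioned above can be sketched with a few typical checks against a staging table (table and column names are hypothetical):

```sql
-- Null and duplicate detection on the business key:
SELECT
  SUM(CASE WHEN order_id IS NULL THEN 1 ELSE 0 END) AS null_keys,
  COUNT(*) - COUNT(DISTINCT order_id)               AS duplicate_keys
FROM stg_orders;

-- Simple range check to flag anomalous values in a measure column:
SELECT COUNT(*) AS out_of_range
FROM stg_orders
WHERE order_amount < 0
   OR order_amount > 1000000;
```

Checks like these are usually scheduled after each load and compared against expected thresholds before data is promoted downstream.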
Knowledge and Skills Required:
Proficiency and extensive experience with HDFS, Hive, Spark, Scala, Python, Databricks/Delta Lake, Flume, Kafka, etc.
Analytical skills to analyze situations and arrive at an optimal and efficient solution based on requirements
Performance tuning and problem-solving skills are a must
Hive database management and performance tuning (partitioning/bucketing) is a MUST
Hands-on development experience and high proficiency in Java, Python, Scala, and SQL
Experience designing multi-tenant, containerized Hadoop architectures for memory/CPU management and sharing across different LOBs

Preferred
Knowledge of data science is a plus
Experience with Informatica PC/BDM 10 and implementing pushdown processing into the Hadoop platform is a huge plus.
Proficiency in using Git, Bamboo, and other continuous integration and deployment tools
Exposure to data governance principles such as metadata and lineage (Collibra/Atlas)