Designing, building, testing & maintaining complete data management & processing systems.
Working closely with stakeholders & the solution architect.
Ensuring the architecture meets business requirements.
Building highly scalable, robust & fault-tolerant systems.
Owning the end-to-end ETL process (extract, transform, load).
Knowledge of the Hadoop ecosystem and the frameworks within it: HDFS, YARN, MapReduce, Apache Pig, Hive, Flume, Sqoop, ZooKeeper, Oozie, Impala & Kafka.
Must have knowledge of and hands-on experience with real-time processing frameworks (Apache Spark), PySpark & AWS Redshift (a minimal illustrative sketch follows this list).
Must have experience with SQL-based technologies (e.g. MySQL, Oracle DB) & NoSQL technologies (e.g. Cassandra, MongoDB).
Should have programming skills in Python, Scala or Java.
Discovering data acquisition opportunities.
Finding ways & methods to extract value from existing data.
Improving the data quality, reliability & efficiency of individual components & of the complete system.
Creating complete solutions by integrating a variety of programming languages & tools.
Creating data models that reduce system complexity, increasing efficiency & reducing cost.
Introducing new data management tools & technologies into the existing system to make it more efficient.
Setting & achieving individual as well as team goals.
A problem-solving mindset and the ability to work in an agile environment.
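For illustration only, below is a minimal, hypothetical PySpark sketch of the kind of pipeline these requirements describe: reading raw events from a Kafka topic, cleaning them with Spark, and loading the result into AWS Redshift over JDBC. All names (broker, topic, cluster URL, table, credentials, event schema) are placeholders, and the Spark Kafka connector plus the Redshift JDBC driver are assumed to be on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("clickstream-etl").getOrCreate()

# Expected shape of each Kafka message (hypothetical event schema).
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event", StringType()),
    StructField("ts", LongType()),
])

# Batch-read raw events from a Kafka topic (placeholder broker & topic).
raw = (spark.read.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "clickstream")
       .load())

# Parse the JSON payload and drop malformed rows.
events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
             .select("e.*")
             .filter(col("event").isNotNull()))

# Load the cleaned events into Redshift through the JDBC writer
# (placeholder cluster URL, table & credentials).
(events.write.format("jdbc")
    .option("url", "jdbc:redshift://example-cluster:5439/dev")
    .option("dbtable", "analytics.clickstream")
    .option("user", "etl_user")
    .option("password", "change-me")
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .mode("append")
    .save())

spark.stop()

The same pipeline could equally be written as a Structured Streaming job (readStream/writeStream) for lower latency; the batch form is shown only to keep the sketch short.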