Location: McLean, VA (3 days/week onsite from the start of the assignment)
Duration: 6-month contract (possible extension)
Job Description/Responsibilities:
· Cleanse, manipulate, and analyze large datasets (semi-structured and unstructured data: XML, JSON, CSV, and PDF files) using Python and the Snowflake database.
· Develop Python scripts to filter, cleanse, map, and aggregate data.
· Manage and implement data processes (data quality reports).
· Develop data profiling, deduplication, and matching logic for analysis.
· Use Python, PySpark, and SQL for data ingestion.
· Present ideas and recommendations on data handling and data parsing technologies to management.
Experience:
· 5+ years of experience processing large volumes and varieties of data (structured and semi-structured data, writing code for parallel processing, shredding XML and JSON files, and reading PDFs) – Mandatory
· 3+ years of development experience in Python for data processing and analysis – Mandatory
· Strong SQL experience – Mandatory
· Detail-oriented, with excellent verbal and written communication skills
· Able to manage multiple priorities and meet deadlines
· 3+ years of experience using the Hadoop platform for analysis, including familiarity with Hadoop cluster environments and resource-management configurations for analysis work – Mandatory
Optional:
· 2+ years of experience with Snowflake, preferably parsing JSON and XML files using SnowSQL or Snowpark
· 2+ years of programming experience in PySpark for data processing and analysis
· Degree in Computer Science, Statistics, Mathematics, or a related field
About US Tech Solutions:
US Tech Solutions is a global staff augmentation firm providing a wide range of talent on-demand and total workforce solutions. To learn more about US Tech Solutions, please visit our website.