Microsoft is looking for a Senior Site Reliability Engineer (SRE) to support and expand Viva Engage. Viva Engage (formerly Yammer) is the industry-defining social network for the enterprise. We provide a platform for millions of employees, including those from 85% of Fortune 500 companies, to build community and culture, share knowledge, and connect with their leaders and each other.

The user base for Viva Engage is growing quickly. The Site Reliability team is responsible for keeping the services reliable as we scale and modernize our tech stack. We need an SRE who knows how to manage the conflicting priorities of keeping things running today while making sure we have the architecture we need for the future.

Acquired by Microsoft in 2012, Viva Engage combines the benefits of a startup - rapid innovation, cutting-edge technology, outsized individual impact - with the advantages of working for one of the most successful software companies in the world. We believe in mission-driven work, and in this post-COVID world our platform has become more indispensable than ever as it fosters connection and a sense of belonging among remote teams.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities
Participate in on-call rotations and incident response throughout product development and operations cycles. On-call duty requires responding to support requests outside normal business hours, including weekends and/or holidays, in a designated Microsoft office.
Monitor system performance and proactively identify and resolve issues to ensure high availability and performance.
Develop and maintain automation tools and processes for deployment, monitoring, and configuration management.
Apply troubleshooting skills and debugging tools, and examine logs, telemetry, and other data sources, to verify assumptions and customer impact. Proactively and reactively address findings with customers and/or service engineering efficiently through written and verbal communication.
Lead blameless postmortems to identify root causes and improve production resiliency.
Consult with developers to design services that scale in Azure.
Mentor team members and contribute to the overall growth and development of the SRE team.
Stay current with industry trends, emerging technologies, and best practices in site reliability engineering and cloud computing.
Embody our Culture (https://careers.microsoft.com/v2/global/en/culture) & Values (https://www.microsoft.com/en-us/about/corporate-values)

Qualifications
Required Qualifications:
6+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.
Other Requirements:
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local (or applicable country) government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, citizenship will be verified via a valid passport.
Preferred Qualifications:
Experience applying SRE principles in a large production environment.
Proficiency in cloud computing platforms (e.g., AWS, Azure, GCP) and related services (e.g., EC2, S3, VPC, IAM, Lambda).
Expertise in automation tools and frameworks (e.g., Terraform, Ansible, Chef, Puppet) and scripting languages (e.g., Python, Bash).
Deep understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes).
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) and incident response processes.
Problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
Effective communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.
Site Reliability Engineering IC4 - The typical base pay range for this role across the U.S. is USD $117,200 - $229,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $153,600 - $250,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until September 2, 2024.

#vivaengage

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).