HPC Monitoring Tools and Operations Specialist

19 Oct 2024
Livermore, California, USA

Vacancy expired!

Location: Livermore, CA

Category: Technicians/IT

Organization: Computing

Posting Requirement: External DOE Q or Top Secret

Job ID: 106259

Job Code: Systems & Network Associate (393.1)

Date Posted: October 18, 2019

Join us and make YOUR mark on the World! Come join Lawrence Livermore National Laboratory (LLNL), where we apply science and technology to make the world a safer place, now named one of Glassdoor's 2019 Best Places to Work!

We have an opening for a High Performance Computing (HPC) Monitoring Tools and Operations Specialist to create and support automated real-time monitoring and incident creation for HPC systems while contributing to the computing operations team. You will use monitoring tools to provide advanced technical support for high-performance computer systems, as well as massive storage systems, networks, and facilities. This position is in the Livermore Computing (LC) Division in the Computing Directorate.

Essential Duties

Independently resolve problems and suggest original solutions to streamline HPC operational processes.

Develop tools and/or utilities to enhance and maintain facility-wide monitoring solutions.

Collaborate with management, ServiceNow developers, vendors, & system administrators to gather requirements and design recommended solutions for operational automation.

Provide advanced technical support and monitoring for all HPC systems in LC, including large clusters, disk storage systems, Ethernet and InfiniBand networks, and archival storage systems.

Create a variety of diagnostic tools to monitor systems, troubleshoot software, hardware, and network problems, and document issues.

Direct complex troubleshooting by receiving, documenting, and accommodating all customer calls, including monitoring and resolving issues.

Perform computer security, degaussing, system installations, decommissioning of older HPC systems, and other physical security related duties, as required by established policies and procedures.

Perform other duties as assigned.

Qualifications

Bachelor’s degree in a computer- or engineering-related field, or an equivalent combination of technical training and experience.

Broad and in-depth knowledge of Splunk and ServiceNow, including developing Splunk dashboards, as well as experience developing tools with Splunk, Skummee, Nagios, and/or other monitoring software used in HPC data centers.

Advanced knowledge of hardware and software diagnostic tools for high performance systems, file systems and networks.

Experience troubleshooting problems in a heterogeneous platform environment.

Advanced knowledge and training in Linux system administration.

Proficient interpersonal and communication (verbal and written) skills, with the ability to work independently and interact with a multi-disciplinary staff in a team environment.

Experience in a customer support and vendor-facing role, including strong listening, rapport-building, negotiating, and influencing skills, and a friendly, approachable, courteous, and patient manner.

Ability to work all shifts, including weekends and holidays.

Desired Qualifications

Experience writing intermediate and advanced UNIX scripts (shell, Perl, Python, etc.).

Knowledge and understanding of high-performance systems, parallel file systems, distributed systems, local area networks, and network protocols including InfiniBand.

Experience as a ServiceNow developer.

Pre-Employment Drug Test: External applicant(s) selected for this position will be required to pass a post-offer, pre-employment drug test. This includes testing for use of marijuana, as federal law applies to us as a federal contractor.

Security Clearance: This position requires an active Department of Energy (DOE) Q-level clearance or an active Top Secret clearance issued by another U.S. government agency at time of hire.

Note: This is a Career Indefinite position. Lab employees and external candidates may be considered for this position.

About Us

Lawrence Livermore National Laboratory (LLNL), located in the San Francisco Bay Area (East Bay), is a premier applied science laboratory that is part of the National Nuclear Security Administration (NNSA) within the Department of Energy (DOE). LLNL's mission is strengthening national security by developing and applying cutting-edge science, technology, and engineering that respond with vision, quality, integrity, and technical excellence to scientific issues of national importance. The Laboratory has a current annual budget of about $2.1 billion and employs approximately 6,800 people.

LLNL is an affirmative action/equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, marital status, national origin, ancestry, sex, sexual orientation, gender identity, disability, medical condition, protected veteran status, age, citizenship, or any other characteristic protected by law.

Job Details

  • ID: JC2787415
  • Full-time
  • Salary: N/A
  • Hiring Company: Lawrence Livermore National Laboratory
  • Date: 2019-10-19
  • Deadline: 2019-12-17

Jocancy Online Job Portal by jobSearchi.