The Operations Engineer will support the production environment's hardware, software, and network technologies by proactively monitoring and quickly responding to incidents across multiple technologies within the technical area of expertise. Collaborate with vendor/contractor partners to develop and implement detailed design, configuration, and engineering strategies/solutions to resolve issues/incidents while remaining focused on security, uptime, and performance. Provide troubleshooting and resolution for complex problems/issues.
Responsibilities
Major Areas of Accountability:
- Exposure to container orchestration technologies like Kubernetes/Mesosphere
- Build and operationalize EKS clusters and related infrastructure in AWS. Create and maintain provisioning and configuration automation for the EKS clusters, such as CloudFormation templates and Ansible playbooks. Create automated jobs for operational and maintenance activities (e.g., removing a member from a cluster, patching, container stop/start). Deploy application containers onto the clusters.
- Application server administration and support, including but not limited to IIS, Tomcat, WebSphere, and Apache. Application deployment to the cluster environments. Server script development used to administer and deploy content to the environment. Assist in troubleshooting applications using JVM and performance diagnostic tools.
- Troubleshooting & Incident Management; Proactive Monitoring & Preventative Maintenance; Analysis; Leadership & Partnerships; Processes, Standards & Best Practices; Documentation; Continuous Learning
- Perform complicated, difficult, and independent assignments in the troubleshooting, problem diagnosis, problem resolution, and ongoing production support for multiple technologies within the technical area of expertise. Responsible for designing, reviewing, approving, and deploying robust, stable, and manageable solutions while minimizing hardware/software/network downtime. Frequently assist in the procurement, configuration, and integration of new technologies.
- Ensure the uptime and response time SLAs/OLAs for services are met or exceeded. Proactively monitor the stability and performance of various technologies within the area of expertise and take appropriate corrective action before an incident or problem occurs. Ensure patching and regular maintenance are performed as required. Actively collaborate with fellow team members and contractors/vendors on bridge calls to prevent or resolve incidents/problems expeditiously.
- Recommend, document, design, and deploy strategies and solutions for complex software/hardware/network engineering problems, based upon comprehensive analysis of business goals, objectives, requirements, and existing technologies. Independently identify key issues, patterns, and deviations during the analysis. Recommend robust solutions that comprehensively meet the needs of the business, using pragmatic judgment, creativity, and in-depth technical knowledge and evaluation.
- Manage effective relationships and work in partnership with leadership, team members, vendors, and contractors to deliver robust technical solutions, ensuring that service level commitments and project timelines are maintained. Provide technical expertise, leadership, direction, and prioritization of work to team members, ensuring outstanding service delivery. Frequently mentor, coach, and contribute to the development of peers and other team members.
- Provide thought leadership and influence the continual refinement of processes, policies, and best practices to ensure the highest possible performance and availability of technologies. Promote re-use and develop consistent technical build, implementation, and support processes. Responsible for validating adherence to defined standards. Ensure ongoing improvements align with IT Infrastructure Library (ITIL) principles and technology Systems Development Life Cycle (SDLC) methods.
- Create, maintain and update documentation of detailed design documents, diagrams, engineering specifications, build changes, models, troubleshooting and support guides, systems metrics and Standard Operating Procedures as required to ensure operational excellence.
- Bachelor's degree in Computer Science, Engineering, or related field; or equivalent work experience.
- 5-7 years of relevant experience required.
- 5-7 years of proven engineering expertise within the subject matter domain.
- Ability to work outside normal business hours to provide after-hours or on-call support when necessary to resolve high-profile incidents/problems.
- Highly innovative problem solver with strong analytical and customer service abilities.
- Ability to communicate and articulate technical information across various organizational levels.
- High reasoning aptitude and ability to quickly understand complex operating environments.
- Certifications Preferred: ITIL Foundations, AWS Certified Solutions Architect Associate
- Implementation/architectural experience in AWS Cloud, Kubernetes/Mesosphere, Docker, Red Hat Linux, Windows Server, Ansible, Python, storage networking (SAN/NAS), Clustering, Volume Management, Filesystems, Disaster Recovery, Application Recovery, Cloud Computing concepts, Web/Database Farm concepts.
- Strong communication skills, both verbal and written.
- Domains: Middleware, Infrastructure, Cloud Computing