Published almost 3 years ago

site reliability engineering manager aws distributed systems

Company Description

Shopify is now permanently remote and working towards a future that is digital by default. Learn more about what this can mean for you.

Over 1.7 million businesses have bet their success on the stability and performance of the Shopify platform. To support these growing businesses, as well as the next million, our systems need to be fast, reliable, scalable and secure. Accomplishing that will require people like you: talented, curious, growth-minded and empathetic engineering managers who are excited to build, support and lead our infrastructure teams.

Job Description

Production Engineering, which is part of our core engineering organization, builds, operates and improves the heart of Shopify’s technical platform. We are a fast-growing team focused on building and maintaining tools and services to unlock the power of planet-scale infrastructure for all of Shopify’s merchants, buyers and developers.

Shopify has grown rapidly over the last several years. As an experienced infrastructure engineering manager, you can help us both start new teams and expand the missions of our existing teams. There are multiple positions available on a variety of teams, and we will work with you during the interview process to identify which team best fits your interests, needs and experience.

Here is a sampling of some of the teams, systems and projects to which you could contribute:

  • Expand the reach of our search systems to standardize the way we index documents in different languages and in various locations around the world

  • Scale a team solving issues with shopping cart access, configuration plane information and package tracking data using a globally accessible, high-write key/value store

  • Grow the capacity of our worldwide distributed site reliability engineering teams, consulting with other engineering groups on how to build low-latency, highly resilient systems

  • Take our observability systems to the next level, expanding and evangelizing the usage of tracing, metrics and structured logging across the company

  • Work on expanding our highly scalable and configurable job system to support all of the applications on the platform

  • Keep our databases operating optimally using proxies, load shedding, custom routing layers and application-transparent sharding

  • Build manipulation primitives such as combination and filtering into our streaming infrastructure to allow teams to translate existing data streams into specific business problems

Qualifications

While we don’t need you to have specific experience with our technology stack, these are leadership positions that do require that you have:

  • Proven management and leadership skills, allowing you to develop and mentor others as well as build credibility with your team while executing broader engineering strategies

  • Demonstrated proficiency designing and improving the development, delivery and automation of software infrastructure within a cloud environment

  • Experience developing and designing solutions in a modern, high-level/systems programming language (Go, Ruby, Python, Java, C++, C, etc.)

  • Familiarity working with senior stakeholders across the organization, both technical and non-technical, to develop roadmaps, integrate with larger company initiatives and deliver business and engineering value

If you have experience in any of the following areas, that will certainly be put to good use. But if you don’t, that’s ok – the faster you apply, the quicker we can get to teaching you about:

  • Building services and deploying them on top of Kubernetes and/or Google Cloud Platform

  • Designing, building, understanding and maintaining distributed systems

  • Working with Terraform and/or other infrastructure orchestration tooling

  • Participating in an on-call rotation and/or site reliability engineering (SRE) experience

  • Automating infrastructure operations

Additional information

We know that applying to a new role takes a lot of work and we truly value your time. We’re looking forward to reading your application.

At Shopify, we are committed to building and fostering an environment where our employees feel included, valued, and heard. Our belief is that a strong commitment to diversity and inclusion enables us to truly make commerce better for everyone. We strongly encourage applications from Indigenous peoples, racialized people, people with disabilities, people from gender and sexually diverse communities and/or people with intersectional identities.

Location

United States, Canada