Published 10 months ago


The Observability team owns the observability platform that monitors the internal services of ClickHouse Cloud as well as customers’ ClickHouse instances. As part of the team, you will be responsible for designing, building, operating and maintaining components of a petabyte-scale platform that stores trillions of events and processes tens of millions of new events every second, and you will own its reliability, performance and availability. Our stack is built with open source technologies such as OpenTelemetry and Grafana and is, of course, based on ClickHouse.

In addition to the core responsibilities, the Observability team also manages the Product Metrics component of the platform. This function is essential for gathering and processing data for the internal billing and accounting system as well as customer-facing dashboards that provide our customers with immediate insights and analytics. The Product Metrics system is specifically designed to prioritize delivery guarantees, precision, and accuracy in handling extensive data volumes. Its primary goal is to provide data that is both prompt and accurate, supporting reliable operational decisions and enhancing the overall customer experience.

We are looking for highly skilled software and site reliability engineers to join our team.

What will you do?

  • Take an active part in determining the roadmap for the Product Metrics component

  • Work closely with the team to deliver new features, then iterate on and improve them

  • Design, build, operate, and maintain business-critical petabyte-scale systems

  • Be responsible for the performance, reliability, availability and cost-efficiency of the Product Metrics component

  • Mentor and support other team members, participate in design discussions and collaborate with the team

  • Be part of the on-call rotation and take ownership of the services you’re running

  • Educate and lead efforts to improve observability among all engineering teams

What you bring along:

  • You demonstrate strong initiative and a preference for action, along with a high level of responsibility, ownership and accountability

  • You prioritize customer needs, ensuring that our products are designed with the user in mind

  • You are able to take on complex challenges and break them down to achieve short feedback loops: to analyze, design, and build modular solutions, deliver MVPs, gather data and feedback and then progress iteratively

  • You have a strong problem-solving mindset and solid production debugging skills

  • You have excellent communication skills and the ability to work well within a team and across engineering teams in a fully remote environment

  • You thrive in a fast-paced environment, and see yourself as a partner with the business with the shared goal of moving the business forward

Requirements:

  • 5+ years of relevant software development industry experience building and operating scalable, fault-tolerant, distributed systems

  • Solid experience with at least one programming language. We use Go, but familiarity with Python, C, C++, Rust or a similar language translates well

  • Experience with at least one of the major Cloud Service Providers such as AWS, GCP or Azure

  • Experience with data streaming/message brokering systems such as Kafka, Redpanda or similar

  • Experience with technologies such as Kubernetes, Helm, ArgoCD, Temporal as well as infrastructure-as-code tools such as Terraform

Bonus Points:

  • Experience with ClickHouse

  • Familiarity with open source or enterprise observability technologies; familiarity with solutions in metrics, logs, and tracing domains


Salary and compensation

The company did not publish salary data, so this figure is an estimate based on similar jobs in the Design, DevOps, Cloud, Senior and Engineer categories:

$60,000 – $100,000/year

Benefits

💰 401(k)

🌎 Distributed team

⏰ Async

🤓 Vision insurance

🦷 Dental insurance

🚑 Medical insurance

🏖 Unlimited vacation

🏖 Paid time off

📆 4 day workweek

💰 401k matching

🏔 Company retreats

🏬 Coworking budget

📚 Learning budget

💪 Free gym membership

🧘 Mental wellness budget

🖥 Home office budget

🥧 Pay in crypto

🥸 Pseudonymous

💰 Profit sharing

💰 Equity compensation

⬜️ No whiteboard interview

👀 No monitoring system

🚫 No politics at work

🎅 We hire old (and young)

Location

Singapore