Chorus One is one of the leading operators of infrastructure for Proof-of-Stake networks and decentralized protocols. Tens of thousands of retail customers and institutions are staking billions in assets through our infrastructure, helping to secure protocols and earn rewards. Our mission is to increase freedom and speed of innovation through decentralized technologies. We are a diverse team of 70+ people distributed globally. We value radical transparency, striving for excellence and improvement while treating each other with kindness and generosity. If this resonates with you, we’d love to hear from you.
As a senior software engineer, you will join one of our engineering teams to assist in building and maintaining tools and automation to support our validator operations. We take the upstream node software from projects like Ethereum, Solana, Cosmos, or Avalanche; compile it; run it on one of our servers; and then make sure it is reliable and secure, monitor it, and keep it up to date. We do this for more than 60 blockchain networks, which means that it is not feasible to do all of this by hand. Instead, we build automation.
Some of the things we do:
- Contribute to upstream software to improve observability, and build monitoring tools from scratch where none exist. The teams that build the node software are not the teams that operate this software at scale, and as such, observability is often not a first priority. We develop our own tools for on-chain and off-chain monitoring, both for short-term metrics (to alert on) and long-term metrics to measure our performance, and to support optimization decisions.
- Build tools to track and manage our fleet of servers. We work mostly with bare-metal servers across multiple providers. This means that no vendor-specific portal is going to give us a complete overview of our infrastructure; instead, we have an in-house tool that integrates with vendor APIs and gives us a central overview.
- Automate machine provisioning. Instead of working with a different flavor of Ubuntu installation for each of our 10+ cloud and bare-metal providers, we build our own installer that is uniform across our infrastructure.
- Track and automate builds. Each of the 60+ networks we operate regularly releases updates. It would be tedious to manually git pull && make for every release; instead, we have automation watching for new releases that automatically builds them and registers them in our package registry.
- Automate updates and failover. When we have a new package, we still need to roll it out to our fleet and restart any nodes, in a controlled manner and without downtime. For validating nodes, we also need to fail over before we restart them and confirm the new node is healthy. To automate this, we need to have 100% confidence in our tooling, because a mistake here can lead to double-signing, which incurs a financial penalty.
- Automate snapshot creation and storage. Blockchain node software is stateful in nature: the chains are often terabytes in size. While it is possible for new nodes to sync from the P2P network, this can take days to weeks, which means it is not a suitable method when we move workloads between machines. We automate taking snapshots of this data so we can be more flexible about what runs where, without compromising on security.
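To make the update-and-failover item concrete, here is a minimal sketch of the ordering constraint it describes: upgrade the standby, confirm it is healthy, stop signing on the old node before starting on the new one, and only then upgrade the old node. This is an illustration, not our actual tooling; the `Node` class, the `upgrade_with_failover` function, and the in-memory `signing` and `healthy` flags are all hypothetical stand-ins for real node control and health checks.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a validator node."""
    name: str
    version: str
    signing: bool = False
    healthy: bool = True


def upgrade_with_failover(active, standby, new_version):
    """Upgrade both nodes while keeping at most one signer at any time.

    Returns a list of human-readable steps that were performed.
    """
    log = []

    def step(message):
        # Invariant checked after every step: two signers at once would
        # risk double-signing, so never allow more than one.
        signers = [n for n in (active, standby) if n.signing]
        assert len(signers) <= 1, "double-signing risk"
        log.append(message)

    # 1. Upgrade the standby first; it is not signing, so a restart is safe.
    standby.version = new_version
    step(f"upgraded {standby.name}")

    # 2. Confirm the upgraded standby is healthy before it takes over.
    if not standby.healthy:
        step("aborted: standby unhealthy")
        return log

    # 3. Stop signing on the old node BEFORE enabling the new one.
    active.signing = False
    step(f"stopped signing on {active.name}")
    standby.signing = True
    step(f"started signing on {standby.name}")

    # 4. The old active node is now idle and can be upgraded.
    active.version = new_version
    step(f"upgraded {active.name}")
    return log


if __name__ == "__main__":
    a = Node("node-a", "1.0", signing=True)
    b = Node("node-b", "1.0")
    for line in upgrade_with_failover(a, b, "1.1"):
        print(line)
```

The sketch accepts a brief window in which no node is signing rather than one in which two are. Whether that trade-off is acceptable, and how health is actually checked, depends on the network.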
Our tooling is written in a mix of Rust, Python, Go, and a bit of TypeScript. We use Postgres as our database of choice. We deploy our code either directly onto Ubuntu hosts, running under systemd, or in Docker containers, and we also have a Kubernetes cluster running various stateless applications. Because the software we run is so diverse, we also occasionally have to dive into codebases written in C, C++, OCaml, or TypeScript.
You can learn more about our approach to operating nodes in our Network Handbook.
If you're excited about this opportunity and meet the requirements, we'd love to hear from you!
Last updated: September 26, 2024