Tl;dr: This blog post describes how we developed an efficient, reliable Python ecosystem using Pants, an open source build system, and solved the challenge of managing Python applications at a large scale at Coinbase.
By The Coinbase Compute Platform Team
Python is one of the most frequently used programming languages for data scientists, machine learning practitioners, and blockchain researchers at Coinbase. Over the past few years, we have witnessed a growth of Python applications that aim to solve many challenging problems in the cryptocurrency world like Airflow data pipelines, blockchain analytics tools, machine learning applications, and many others. Based on our internal data, the number of Python applications has almost doubled since Q3, 2022. According to our internal data, today there are approximately 1,500 data processing pipelines and services developed with Python. The total number of builds is around 500 per week at the time of writing. We foresee an even wider application as more Python centric frameworks (such as Ray, Modin, DASK, etc.) are adopted into our data ecosystem.
Engineering success comes largely from choosing the right tools. Building a large-scale Python ecosystem to support our growing engineering requirements could raise some challenges, including using a reliable build system, flexible dependency management, fast software release, and consistent code quality check. However, these challenges can be combated by integrating Pants, a build system developed by Toolchain labs, into the Coinbase build infrastructure. We chose this as the Python build system for the following reasons:
- Pants is ergonomic and user-friendly,
- Pants understands many build-related commands, such as “test”, “lint”, “fmt”, “typecheck”, and “package”
- Pants was designed with real-world Python use as a first-class use-case, including handling third party dependencies. In fact, parts of Pants itself is written in Python (with the rest written in Rust).
- Pants requires less metadata and BUILD file boilerplate than other tools, thanks to the dependency inference, sensible defaults and auto-generation of BUILD files. Bazel requires a huge amount of handwritten BUILD boilerplate.
- Pants is easy to extend, with a powerful plugin API that uses idiomatic Python 3 async code, so that users can have a natural control flow in their plugins.
- Pants has true OSS governance, where any org can play an equal role.
- Pants has a gentle learning curve. It has much less…
Click Here to Read the Full Original Article at The Coinbase Blog – Medium…