Team: Engineering
Location: Remote
What You Will Be Doing:
- Design, develop, and optimize data pipelines and architectures to support our composable CDP product.
- Work with large datasets and develop scalable data models on Google BigQuery, Databricks, and Snowflake.
- Utilize Databricks for big data processing and machine learning workflows.
- Manage and optimize cloud infrastructure on AWS and GCP to ensure high performance and availability.
- Collaborate with cross-functional teams to understand data requirements and deliver solutions.
- Ensure data quality, governance, and compliance across all data pipelines.
- Implement best practices for data management, including data security and privacy.
- Lead and mentor junior data engineers, fostering a culture of continuous learning and improvement.
- Actively participate in the product development process as part of the core team.
- Publish product updates to the Databricks and Snowflake marketplaces.
Experience & Requirements:
- Database and Data Warehousing: Advanced experience with Snowflake, Databricks, and other data warehousing solutions.
- Data Transformation: Proficiency in tools such as dbt, AWS Glue, and Google Dataflow.
- Cloud Platforms: Extensive experience with AWS and GCP.
- Programming Languages:
  - Strong proficiency in Python.
  - Experience designing and implementing scalable, efficient data pipelines in Python with relevant libraries (such as Pandas, NumPy, and SQLAlchemy); a brief illustrative sketch follows this list.
  - Solid understanding of ETL and Reverse ETL principles, data modeling, and schema design.
- Big Data Technologies: Experience with Apache Spark, Apache Kafka, and similar large-scale data processing technologies.
- Data Integration and Automation: Familiarity with data integration and automation tools (such as Airbyte, Apache Airflow, and Luigi).
- DevOps and CI/CD: Familiarity with DevOps practices and tools such as Jenkins, Docker, and Kubernetes.
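For context, here is a minimal, purely illustrative sketch of the kind of Python pipeline work referenced above: extract with Pandas, apply a simple transformation, and load to a warehouse table via SQLAlchemy. The source file, connection string, and table names are hypothetical placeholders, not a description of our actual stack.

```python
# Illustrative only: a minimal batch pipeline sketch.
# The source file, connection string, and table names are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

SOURCE_CSV = "events.csv"  # hypothetical raw extract
WAREHOUSE_URI = "postgresql://user:pass@host:5432/analytics"  # hypothetical target

def run_pipeline() -> None:
    # Extract: read raw event data into a DataFrame.
    events = pd.read_csv(SOURCE_CSV, parse_dates=["event_time"])

    # Transform: drop rows without a user and build a daily aggregate.
    events = events.dropna(subset=["user_id"])
    daily_counts = (
        events.assign(event_date=events["event_time"].dt.date)
              .groupby(["event_date", "event_type"], as_index=False)
              .size()
              .rename(columns={"size": "event_count"})
    )

    # Load: append the aggregate to a warehouse table via SQLAlchemy.
    engine = create_engine(WAREHOUSE_URI)
    with engine.begin() as conn:
        daily_counts.to_sql("daily_event_counts", conn, if_exists="append", index=False)

if __name__ == "__main__":
    run_pipeline()
```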