Job description
Tasks
Key Responsibilities
Data Engineering & Platform Architecture
- Architect, design, and govern scalable batch and streaming data platforms using Python (strong OOP design), Apache Spark (PySpark, Spark SQL), and Azure Databricks.
- Define end-to-end reference architectures for real-time and near-real-time streaming solutions using Apache Kafka with Spark Structured Streaming, or Apache Flink, ensuring:
  - Event-time correctness
  - Exactly-once processing guarantees
  - Stateful stream processing and checkpointing
  - High availability and fault tolerance
- Own the architectural design of the Medallion architecture (Bronze, Silver, Gold) on ADLS Gen2 with Delta Lake, including data lifecycle, retention, and cost optimization strategies.
- Lead data governance architecture, defining standards for security, access control, lineage, and metadata using Databricks Unity Catalog.
- Design domain-oriented, scalable data products aligned with Data Mesh and cloud-native architecture principles.
- Define integration patterns for data services, streaming producers/consumers, and microservices-based architectures.
- Architect and oversee deployment of data and streaming workloads on AKS, including container strategy, scaling, resiliency, and networking.
- Define and enforce performance, scalability, and reliability standards for Spark and streaming workloads (partitioning, Z-ordering, state tuning, caching).
- Establish data quality, validation, and schema evolution standards for both batch and streaming pipelines.
- Design secure-by-default architectures using Azure IAM, RBAC, Key Vault, private endpoints, VNet integration, and network isolation.
- Lead CI/CD and DevOps architecture using GitLab, enabling automated testing, deployment, rollback, and environment promotion.
- Define Infrastructure as Code architecture using Terraform and Pulumi for repeatable, auditable deployments.
- Establish observability architecture (monitoring, logging, alerting) across Databricks, AKS, streaming platforms, and Azure services.
- Review and approve solution designs, ADRs, and technical proposals, ensuring alignment with enterprise standards.
Product Owner (PO) Responsibilities
- Act as Technical Product Owner for data platforms and streaming solutions, owning one or more data domains or products.
- Partner with business stakeholders, analysts, and architects to translate business objectives into architecture-aligned epics, user stories, and acceptance criteria.
- Own and prioritize the product and technical backlog, balancing feature delivery, technical debt, scalability, security, and cost efficiency.
- Define and track product KPIs (data latency, data quality, availability, adoption, platform cost).
- Drive roadmap definition for data and streaming platforms, aligned with enterprise strategy and architectural vision.
- Manage dependencies, risks, and cross-team coordination across engineering, analytics, DevOps, and security teams.
- Support release planning, stakeholder communication, and architectural decision-making.
- Act as a subject-matter expert and decision authority for the data platform.
Leadership & Engineering Excellence
- Provide architectural and technical leadership to data engineers and cross-functional teams.
- Conduct design reviews, code reviews, and architecture walkthroughs.
- Mentor engineers on distributed systems, streaming design, cloud-native patterns, and performance optimization.
- Establish and enforce coding standards, design patterns, and best practices.
- Champion continuous improvement, innovation, and engineering excellence.
Required Skills & Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 8–12 years of experience in Data Engineering, with demonstrated architecture ownership on Azure-based platforms.
- Strong proficiency in Python, with solid OOP, design patterns, and system design principles.
- Deep expertise in Apache Spark (PySpark, Spark SQL) and Azure Databricks.
- Strong hands-on and architectural experience with streaming platforms:
  - Apache Kafka or Apache Flink
  - Spark Structured Streaming
- Proven experience designing microservices and event-driven architectures.
- Strong experience deploying and operating workloads on Azure Kubernetes Service (AKS).
- Deep understanding of Delta Lake and large-scale lakehouse architectures.
- Advanced SQL skills for analytics and optimization.
- Strong experience with Azure ADLS Gen2, Databricks, Azure Functions, Service Bus, Key Vault.
- Strong experience with Git, GitLab CI/CD, and release management.
- Experience with Terraform and Pulumi for enterprise-grade IaC.
- Strong knowledge of data modeling, distributed systems, and fault tolerance.
Nice to Have
- Experience with both Kafka and Flink in large-scale production systems.
- Exposure to Kafka Schema Registry, CDC, and event versioning.
- Experience implementing Data Mesh or domain-driven data platforms.
- Exposure to DevSecOps and Zero Trust architectures.
- Experience with Generative AI / LLM-enabled data platforms (RAG, embeddings, vector databases).
- Background in regulated or large enterprise environments (banking, automotive, telecom).