We are seeking a highly skilled and experienced Senior Data Engineer to join our growing team in Bangalore, India. We operate a large-scale private cloud infrastructure spanning thousands of servers across multiple data centers, built on OpenStack, Kubernetes, Ceph, and VMware. In this role, you will design, build, and maintain scalable data pipelines that collect, process, and deliver data from across this infrastructure to power analytics, capacity planning, cost optimization, and AI/ML initiatives. You will collaborate closely with data scientists, platform engineers, SRE, and product teams to deliver robust, real-time, and batch data solutions.
Responsibilities:
Design, develop, and maintain scalable data pipelines for ingestion, transformation, and delivery of structured and unstructured data
Build and optimize real-time streaming architectures using Apache Kafka and related ecosystem tools
Develop and manage ETL/ELT workflows using dbt (dbt Labs) to support analytics, reporting, and AI/ML model training
Implement data collection strategies from diverse infrastructure sources including OpenStack, Kubernetes, Ceph, VMware, and ServiceNow (Snow), as well as APIs, databases, and log files
Collaborate with AI/ML teams to build feature stores and prepare training datasets at scale
Ensure data quality, integrity, and governance through monitoring, validation, automated testing frameworks, and metadata management using DataHub
Implement and maintain data quality validation across pipelines (e.g. Great Expectations) to ensure correctness, completeness, consistency, and freshness of data at every stage
Optimize data storage and processing solutions within a private cloud environment (OpenStack, Ceph, Kubernetes)
Build and manage observability and monitoring solutions with strong emphasis on the ELK stack (Elasticsearch, Logstash, Kibana) and Prometheus as core platforms, complemented by OpenTelemetry for distributed tracing and telemetry collection
Mentor junior engineers and contribute to engineering best practices and technical documentation
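The data-quality responsibilities above (validating correctness, completeness, and freshness at every pipeline stage) can be sketched in plain Python. This is a minimal illustration only; in practice a framework such as Great Expectations would supply these checks, and the record fields and thresholds here are hypothetical, not taken from the posting:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical batch of infrastructure metrics; field names are illustrative.
records = [
    {"host": "node-01", "cpu_util": 0.42, "ts": datetime.now(timezone.utc)},
    {"host": "node-02", "cpu_util": 0.87, "ts": datetime.now(timezone.utc)},
    {"host": None, "cpu_util": 1.50, "ts": datetime.now(timezone.utc) - timedelta(hours=2)},
]

def check_completeness(rows, field):
    """Flag rows missing a value for `field`."""
    return [i for i, r in enumerate(rows) if r.get(field) is None]

def check_range(rows, field, lo, hi):
    """Flag rows whose `field` value falls outside [lo, hi]."""
    return [i for i, r in enumerate(rows)
            if r.get(field) is not None and not (lo <= r[field] <= hi)]

def check_freshness(rows, field, max_age):
    """Flag rows whose timestamp is older than `max_age`."""
    cutoff = datetime.now(timezone.utc) - max_age
    return [i for i, r in enumerate(rows) if r[field] < cutoff]

failures = {
    "completeness": check_completeness(records, "host"),
    "range": check_range(records, "cpu_util", 0.0, 1.0),
    "freshness": check_freshness(records, "ts", timedelta(hours=1)),
}
print(failures)  # each check flags the third row (index 2)
```

A production pipeline would attach checks like these as automated gates between ingestion and delivery, failing or quarantining batches instead of printing results.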
You have:
Bachelor’s or master’s degree in Computer Science, Data Engineering, or a related field, with 12+ years of professional experience and 6+ years in data engineering or a closely related discipline. Strong expertise in data pipeline design, data modelling, and data manipulation at scale.
Strong hands-on experience with the ELK stack (Elasticsearch, Logstash, Kibana) and Prometheus — these are essential to the role.
Deep experience with SQL and NoSQL databases (PostgreSQL, MongoDB, Cassandra, etc.)
Hands-on experience with Apache Kafka (or equivalent streaming platforms such as Apache Pulsar)
Experience with dbt (dbt Labs) for data transformation, modelling, and testing
Experience with data quality frameworks (e.g. Great Expectations) and pipeline validation practices such as data contracts, automated testing, and anomaly detection
Solid knowledge of big data technologies such as Apache Spark, Hadoop, or Flink
Experience with open table formats, particularly Apache Iceberg, for large-scale data lakehouse architectures
Familiarity with private cloud platforms (OpenStack, VMware) and containerization (Docker, Kubernetes)
Experience with OpenTelemetry for instrumentation, distributed tracing, and telemetry data collection
Nice to have:
Proficiency in Python, Scala, or Java for data processing and automation
Experience building data infrastructure to support AI/ML workflows and model serving
Familiarity with LLM tooling, vector databases (e.g. Milvus), and AI data pipelines
Knowledge of data governance frameworks, compliance standards, and metadata platforms such as DataHub
Experience with orchestration tools such as Apache Airflow or Prefect
Experience collecting and processing data from Ceph storage clusters, OpenStack APIs, or VMware vCenter
Familiarity with ServiceNow (Snow) for CMDB, ITSM data extraction, and asset management reporting
Contributions to open-source data engineering projects