Data Lake Implementation Specialist

Full Time (posted 2 weeks ago)

Employment Information

We are seeking an experienced Data Lake Implementation Specialist to guide the setup and integration of on-premises and cloud data lakes that enable real-time analytics and AI for medium-to-large digital businesses. Experience with Apache Doris is an advantage.

Core Skills & Expertise

Data Lake Architecture (Hybrid & Multi-Cloud)


Designing modern data lakehouses with raw and curated layers and unified batch and streaming ingestion
Integration with enterprise systems and support for schema-on-read
Familiarity with lakehouse table formats: Delta Lake, Apache Iceberg, Apache Hudi


Real-Time Data Processing


Expertise with streaming architectures: Apache Kafka, Flink, Spark Streaming
Experience with event-driven design, CDC, and real-time ETL tools
Has delivered at least one large-scale Apache Doris-based (or comparable) OLAP system in production
Tools: Debezium, StreamSets, Apache NiFi


Cloud & On-Prem Data Services


Cloud: AWS (S3, Glue, EMR, Kinesis), Azure (ADLS Gen2, Synapse), GCP (BigLake, Dataflow)
On-prem: Hadoop, Cloudera, MapR, private cloud environments


 

AI/ML Enablement

Data Preparation for AI/ML


Building pipelines for feature extraction and dataset versioning
Integration with feature stores and data quality enforcement



MLOps Readiness



Integration with ML pipelines (Kubeflow, MLflow, SageMaker)
Model deployment, tuning, and monitoring at scale


Analytics & BI Integration


Support for BI tools (Power BI, Tableau) and fast querying layers (Presto, Trino)
Enabling near-real-time dashboards


 

Governance, Observability, and Security

Enterprise Data Governance


Implementing data ownership, lineage, and access policies
Use of catalogs: Collibra, Apache Atlas, AWS Glue Catalog


Observability & Monitoring


End-to-end pipeline visibility, logs, and metrics
Tools: Prometheus, Grafana, OpenTelemetry, Monte Carlo


Security & Compliance


Encryption, tokenization, and data masking
Adherence to regulations and compliance frameworks: GDPR, HIPAA, SOC 2


 

Execution Experience

Large-Scale Implementations


Hands-on delivery of hybrid data lake architectures
Experience synchronizing on-prem and cloud data systems


Cross-Functional Leadership


Working with data scientists, product teams, and security teams
Leading data platform teams or Centers of Excellence


Agility at Scale


Agile delivery models for data initiatives
Delivering data products and ML capabilities incrementally


 

Ideal Candidate Profile Summary

A hands-on, strategic data lake architect or engineer with deep knowledge of hybrid and multi-cloud systems, proven experience with streaming data and ML enablement, and the leadership to align teams around real-time analytics and decision intelligence at digital-enterprise scale.

 

Bonus: Certifications & Tools

Certifications


AWS/GCP/Azure Data Engineer or ML Engineer
Databricks Lakehouse Accreditation
CDMP or DAMA certification


Tools Stack


Airflow, dbt, Spark, Flink, Kafka
Terraform, GitOps, CI/CD
MLflow, Feature Store, SageMaker, Vertex AI
Apache Ranger, Atlas, Lake Formation