Senior Data Engineer based in the USA, focused on architecting scalable data platforms, building real-time pipelines, and transforming raw data into meaningful insights that drive business impact.
I believe in creating data systems that are not only reliable and secure but also efficient and future-ready. My approach is rooted in simplicity, scalability, and performance — ensuring that every pipeline, model, or architecture I design empowers teams to make smarter, data-driven decisions.
In my spare time, I enjoy exploring advancements in AI/ML, mentoring upcoming engineers, experimenting with cloud technologies, and occasionally unwinding with music, long walks, and photography.
Skills
Python
SQL
Apache Spark
Apache Kafka
Apache Airflow
Apache Flink
dbt
Snowflake
Amazon Redshift
Google BigQuery
Azure Synapse
Delta Lake
Apache Iceberg
Docker
Kubernetes
Terraform
Jenkins
Git/GitHub
AWS
Azure
Google Cloud
Tableau
Power BI
Looker
Prometheus
Grafana
Datadog
Dec 2022 - Present
Senior Data Engineer - Labelbox
Designed and implemented ETL/ELT pipelines with Apache Airflow, AWS Glue, and dbt, processing 5TB of data daily from Kafka, S3, and RDS into Redshift and Snowflake for AI analytics (a representative pipeline is sketched below).
Built real-time streaming applications with Apache Kafka, Kinesis, and Spark, providing high-quality data for AI and reducing incident response time by 60%.
Modeled the data warehouse using Star and Snowflake schemas in Redshift, enabling scalable reporting in BI tools such as Tableau, Power BI, and QuickSight for AI model evaluation.
Led adoption of Great Expectations, Deequ, and Python-based validations, ensuring data quality and compliance with GDPR, SOC 2, and HIPAA standards.
Integrated MLOps pipelines with Vertex AI, AWS Lambda, and Pinecone, enhancing AI feature engineering and retrieval-augmented generation (RAG).
Managed infrastructure with Terraform, Docker, Kubernetes (EKS), and GitHub Actions, achieving CI/CD automation and cost-optimized deployments for AI models.
Developed an observability stack with Datadog, AWS CloudWatch, ELK, and Prometheus, ensuring 99.99% uptime for production data systems in AI data factories.
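A minimal sketch of the Airflow-plus-dbt ELT pattern described in this role, assuming Airflow 2.x. The DAG id, the S3 extract step, and the dbt project paths are illustrative placeholders, not the production pipelines.

```python
# Minimal, illustrative Airflow 2.x DAG: land raw data from S3, then run dbt models.
# The dag_id, callable, and dbt paths are placeholders, not production code.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def extract_from_s3(**context):
    """Placeholder extract step; in practice this might use S3Hook or an AWS Glue job."""
    print(f"extracting raw events for {context['ds']}")


with DAG(
    dag_id="example_elt_pipeline",   # hypothetical name
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
    tags=["example"],
) as dag:
    extract = PythonOperator(
        task_id="extract_raw_events",
        python_callable=extract_from_s3,
    )

    # Run dbt transformations; the Redshift/Snowflake target lives in profiles.yml (not shown).
    transform = BashOperator(
        task_id="run_dbt_models",
        bash_command="dbt run --project-dir /opt/dbt/analytics --profiles-dir /opt/dbt",
    )

    extract >> transform
```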
Oct 2019 - Nov 2022
Cloud Data Engineer - BlueLight
Architected hybrid cloud data platforms using Azure Synapse, Data Factory, and GCP BigQuery/Dataflow, enabling unified access to operational and patient data for AI-driven insights.
Built high-throughput pipelines with Apache Flink, Kafka Streams, and Pub/Sub, reducing batch latency and supporting near-real-time data processing for patient alerting and operational efficiency.
Developed data models (Star, Snowflake, Data Vault) and implemented dbt transformations, streamlining patient cohort segmentation and BI reporting in Looker and Power BI.
Implemented data quality frameworks using dbt tests, custom Python scripts, and Great Expectations, ensuring compliance with HIPAA and SOC 2 while maintaining data integrity.
Containerized microservices and DAGs with Docker, deployed via Kubernetes (AKS, GKE), and automated using GitLab CI/CD and Terraform for Infrastructure as Code.
Collaborated with data scientists to productionize ML models, leveraging MLOps best practices and FAISS for vector search, and securing access through IAM, KMS, and HashiCorp Vault (a minimal vector-search sketch follows this role).
Enabled end-to-end monitoring and auditing via Grafana, Splunk, and Monte Carlo, reducing incident detection time and supporting data lineage reporting for operational transparency.
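A minimal sketch of FAISS-based similarity search of the kind used when productionizing ML models in this role. The embedding dimensionality, corpus, and query are synthetic placeholders, and an exact (flat) index is assumed.

```python
# Minimal FAISS similarity-search sketch: index synthetic embedding vectors
# and retrieve the nearest neighbours of a query. Requires the faiss-cpu package.
import faiss
import numpy as np

d = 128                                    # embedding dimensionality (assumed)
rng = np.random.default_rng(42)

embeddings = rng.random((1000, d), dtype=np.float32)  # stand-in for model embeddings
query = rng.random((1, d), dtype=np.float32)

index = faiss.IndexFlatL2(d)               # exact L2 search; IVF/HNSW suit larger corpora
index.add(embeddings)                      # add vectors to the index

distances, ids = index.search(query, 5)    # top-5 nearest neighbours
print(ids[0], distances[0])
```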
Jul 2016 - Aug 2019
Data Infrastructure Engineer - Teza Technologies
Built and maintained CI/CD pipelines using TFS, Git, and Jenkins to automate builds, tests, and deployments, accelerating delivery of financial data infrastructure.
Deployed and managed trading data pipelines on AWS and Azure, ensuring high availability, low latency, and consistent performance across hybrid environments.
Automated infrastructure provisioning with Terraform, PowerShell, and Ansible, streamlining deployments and reducing manual overhead for quant research teams.
Monitored system health and pipeline performance with CloudWatch and Azure Monitor, implementing RBAC and encryption to meet financial data security requirements.
Enhanced data quality and reliability by adding automated validation, error handling, and recovery mechanisms, supporting accurate and timely trading strategies (a simplified validation-and-retry sketch follows).
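A simplified sketch of the automated validation, error handling, and recovery pattern described above. The record checks, the load_fn hook, and the retry policy are illustrative assumptions rather than the actual trading-pipeline code.

```python
# Simplified validation-and-retry sketch: reject malformed records, then load
# the remainder with exponential backoff on transient failures.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def validate_record(record: dict) -> bool:
    """Basic structural checks: required fields present and price is positive."""
    return bool(
        record.get("symbol")
        and record.get("timestamp") is not None
        and isinstance(record.get("price"), (int, float))
        and record["price"] > 0
    )


def load_with_retry(records: list[dict], load_fn, max_attempts: int = 3) -> None:
    """Validate records, then load them, retrying with exponential backoff."""
    valid = [r for r in records if validate_record(r)]
    rejected = len(records) - len(valid)
    if rejected:
        log.warning("rejected %d invalid records", rejected)

    for attempt in range(1, max_attempts + 1):
        try:
            load_fn(valid)                 # e.g. a warehouse COPY/INSERT wrapper
            return
        except Exception:
            log.exception("load failed (attempt %d/%d)", attempt, max_attempts)
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)       # simple exponential backoff
```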