Allen M.

Senior Data Engineer based in the USA, focused on architecting scalable data platforms, building real-time pipelines, and transforming raw data into meaningful insights that drive business impact.


I believe in creating data systems that are not only reliable and secure but also efficient and future-ready. My approach is rooted in simplicity, scalability, and performance — ensuring that every pipeline, model, or architecture I design empowers teams to make smarter, data-driven decisions.


In my spare time, I enjoy exploring advancements in AI/ML, mentoring junior engineers, experimenting with cloud technologies, and occasionally unwinding with music, long walks, and photography.

Skills

Python
SQL
Apache Spark
Apache Kafka
Apache Airflow
Apache Flink
dbt
Snowflake
Amazon Redshift
Google BigQuery
Azure Synapse
Delta Lake
Apache Iceberg
Docker
Kubernetes
Terraform
Jenkins
Git/GitHub
AWS
Azure
Google Cloud
Tableau
Power BI
Looker
Prometheus
Grafana
Datadog
Dec 2022 - Present

Senior Data Engineer - Labelbox

  • Designed and implemented ETL/ELT pipelines with Apache Airflow, AWS Glue, and dbt, processing 5TB of data daily from Kafka, S3, and RDS into Redshift and Snowflake for AI analytics.
  • Built real-time streaming applications with Apache Kafka, Kinesis, and Spark, providing high-quality data for AI and reducing incident response time by 60%.
  • Modeled the data warehouse using star and snowflake schemas in Redshift, enabling scalable reporting in BI tools such as Tableau, Power BI, and QuickSight for AI model evaluation.
  • Led adoption of Great Expectations, Deequ, and Python-based validations, ensuring data quality and compliance with GDPR, SOC 2, and HIPAA standards.
  • Integrated MLOps pipelines with Vertex AI, AWS Lambda, and Pinecone, enhancing AI feature engineering and retrieval-augmented generation (RAG).
  • Managed infrastructure with Terraform, Docker, Kubernetes (EKS), and GitHub Actions, achieving CI/CD automation and cost-optimized deployments for AI models.
  • Developed observability stack with Datadog, AWS CloudWatch, ELK, and Prometheus, ensuring 99.99% uptime for production data systems in AI data factories.
Oct 2019 - Nov 2022

Cloud Data Engineer - BlueLight

  • Architected hybrid cloud data platforms using Azure Synapse, Data Factory, and GCP BigQuery/Dataflow, enabling unified access to operational and patient data for AI-driven insights.
  • Built high-throughput pipelines with Apache Flink, Kafka Streams, and Pub/Sub, reducing batch latency and supporting near-real-time data processing for patient alerting and operational efficiency.
  • Developed data models (Star, Snowflake, Data Vault) and implemented dbt transformations, streamlining patient cohort segmentation and BI reporting in Looker and Power BI.
  • Implemented data quality frameworks using dbt tests, custom Python scripts, and Great Expectations, ensuring compliance with HIPAA and SOC 2 while maintaining data integrity.
  • Containerized microservices and DAGs with Docker, deployed via Kubernetes (AKS, GKE), and automated using GitLab CI/CD and Terraform for Infrastructure as Code.
  • Collaborated with data scientists to productionize ML models, leveraging MLOps best practices, FAISS for vector search, and secured access through IAM, KMS, and HashiCorp Vault.
  • Enabled end-to-end monitoring and auditing via Grafana, Splunk, and Monte Carlo, reducing incident detection time and supporting data lineage reporting for operational transparency.
Jul 2016 - Aug 2019

Data Infrastructure Engineer - Teza Technologies

  • Built and maintained CI/CD pipelines using TFS, Git, and Jenkins to automate builds, tests, and deployments, accelerating delivery of financial data infrastructure.
  • Deployed and managed trading data pipelines on AWS and Azure, ensuring high availability, low latency, and consistent performance across hybrid environments.
  • Automated infrastructure provisioning with Terraform, PowerShell, and Ansible, streamlining deployments and reducing manual overhead for quant research teams.
  • Monitored system health and pipeline performance with CloudWatch and Azure Monitor, implementing RBAC and encryption to meet financial data security requirements.
  • Enhanced data quality and reliability by adding automated validation, error handling, and recovery mechanisms, supporting accurate and timely trading strategies.

Bachelor of Science - Computer Science