Naman Kanwar

Data Engineer | Building AI-Driven Cloud Data Pipelines


Experience

Data Engineer | FanDuel

Nov 2024 - Present

Data Engineer specializing in single-source-of-truth migrations, architecting scalable batch and streaming infrastructure on AWS and Databricks, and modernizing legacy pipelines into highly optimized enterprise frameworks.

Apache Spark
Databricks
AWS
dbt
Apache Flink
Apache Kafka
Apache Airflow

Data Engineer II | M3

Aug 2024 - Nov 2024

Data Engineer developing scalable data pipeline architectures and CI/CD frameworks, driving modernization and best practices across the Azure ecosystem.

  • Formulated and demonstrated a production-ready proof of concept (POC) for cloud data pipelines, establishing the core infrastructure across Azure DevOps, Apache Airflow, and virtual machines.
  • Conceptualized a comprehensive data lake architecture, authoring the foundational technical requirements and system documentation for enterprise implementation.
  • Standardized development workflows by introducing configuration-driven pipelines, secure key vaults, strict Git branching methodologies, and automated CI/CD deployment processes to elevate organizational data maturity.
  • Established code coverage, branching, and data maturity standards to ensure engineering consistency.
  • Formulated the strategic migration roadmap for transitioning legacy SQL Server Integration Services (SSIS) packages to modern cloud alternatives.
Azure
Apache Airflow
Azure DevOps
SQL Server

Data Engineer III | Walmart

Aug 2022 - Aug 2024

Cloud Data Engineer specializing in architecting multi-cloud streaming data lakes, automating CI/CD workflows, and leading Agile teams to deliver high-performance, cost-optimized data infrastructure.

  • Guided a team of 5 engineers to boost sprint velocity by 15%, leveraging strict Agile/Scrum methodologies to eliminate dependencies and guarantee consistent, on-time project delivery (Jira, Miro).
  • Architected a multi-cloud streaming data lake utilizing medallion principles and SCD tables, scaling infrastructure across both GCP and Azure ecosystems (Airflow, Dataproc, Dataflow, PySpark, BigQuery, GCS, Databricks).
  • Enforced enterprise coding and testing standards across major repositories, raising code coverage to 95% via SonarQube while contributing reusable modules to an internal Artifactory with 70+ divisional downloads (JFrog).
  • Scripted automated CI/CD flows to dynamically drain and provision new Dataflow jobs, eliminating 1/2 sprints of manual effort typically required per configuration and pom.xml update (Jenkins, Docker, Maven).
  • Slashed cloud infrastructure spend by over $100K annually through aggressive cluster and worker node fine-tuning, while engineering an 80% compute-time reduction for a high-volume REST API DAG.
  • Managed the end-to-end migration of legacy Automic pipelines to Airflow with 100% code coverage, securing all data credentials via Google Secret Manager to guarantee zero data loss during the transition.
  • Automated change tickets by integrating ServiceNow requests directly into the existing CI/CD pipeline, eliminating manual overhead to save 2+ engineering hours per deployment.
  • Cultivated a resilient on-call support model that resolved pipeline failures and delivered client data within the first hour of an outage, while actively onboarding and pair-programming with 3 new developers to accelerate productivity.
Apache Airflow
Microsoft Azure
Apache Spark
Google Cloud Platform
Databricks

Software Engineer II | LexisNexis

Feb 2021 - Aug 2022

Data Engineer specializing in architecting high-performance search engines, optimizing database retrieval speeds, and automating testing frameworks to process more than 30M records under strict enterprise SLAs.

  • Designed a high-performance search interface and underlying matching engine, enabling clients to query and retrieve database-wide matches within a sub-2-second response window.
  • Optimized data retrieval speeds by 5ms through the implementation of phonetic and fuzzy matching logic within core ETL pipelines.
  • Audited database utilization and storage footprints to minimize overhead costs, effectively optimizing the data tier ahead of a major cloud migration.
  • Evaluated and deployed crosswalk interfaces, using strict precision and recall data metrics to maximize record-linking accuracy.
  • Maintained multi-environment release stability, coordinating complex code merges and database changes through Git and GitLab CI/CD pipelines (SQL Server).
  • Developed an automated regression testing framework triggered by custom CRON jobs, eliminating technical debt and saving 120 manual engineering hours.
  • Aggregated more than 30M records to generate high-volume B2C and C2B data reports, consistently meeting strict SLA timelines for enterprise clients.
Microsoft Azure
ECL
MySQL
Power BI

Projects

Database Automation

  • Automated database server shutdown and creation, saving 12% annual costs.
  • Worked on 35,000+ lines of C++ code with no existing documentation. Wrote documentation for existing and newly added code.

Analytics Dashboard

  • Built analytics dashboards from Hive tables using Java to depict user driving behavior.
  • Increased the user base by 20% by applying data mining techniques to usage analysis.

Web Scraping & Geo-Visualization

  • Built web scrapers with 90% CAPTCHA-bypass accuracy to gather data from high-risk sites and automate weekly analytics reports.

Skills & Certifications

Databricks Certified Data Engineer Associate
AWS Cloud Solutions Specialization (Coursera)

Languages

Python Java SQL

Data Engineering & Cloud

PySpark Airflow Databricks AWS GCP Azure ETL Data Modeling

DevOps & Infrastructure

Docker Jenkins Git Linux CI/CD

Methodologies

Agile

Let's Connect

Email: nmnknwr@gmail.com

LinkedIn GitHub