Data Engineer · Data Analyst · AI/ML Engineer
I build cloud-native data pipelines that turn raw telemetry into decisions. 5+ years across Capital One, FinTech and Cybersecurity — from NLP vulnerability prediction engines and ML forecasting models to interactive Tableau dashboards published on Tableau Public.
→ View my visualisations on Tableau Public ↗
I am a Data Engineer with a strong analytics background who specialises in building reliable, automated data systems at enterprise scale. My work sits at the intersection of engineering and insight: I don't just move data, I make it useful.
At Capital One I engineered Databricks and PySpark pipelines for cybersecurity controls monitoring, cutting execution time by 25% and replacing manual audit workflows with real-time dashboards. At ThreatModeler I built a first-of-its-kind NLP engine achieving 85% precision in vulnerability prediction. I also completed IBM's Apache Spark ML certification, building end-to-end distributed ML pipelines using MLlib, Prophet time-series forecasting and Random Forest models.
Outside work I build data visualisations published on Tableau Public, solve SQL challenges on LeetCode, and ship open-source data projects on GitHub. I'm currently open to full-time Data Analyst, Data Engineer and AI/ML Engineer roles — remote, hybrid or onsite anywhere in the US.
Interactive Tableau dashboard visualising 5 years of dark web market pricing data: stolen credentials, identity documents, financial accounts and cybercrime services. Built with a custom Python scraper and a 2024 interpolation pipeline. "Your SSN costs $1. A corporate server key costs $200,000. Same market."
End-to-end machine learning pipeline built on Apache Spark as part of IBM's professional certification. Covers distributed data processing, feature engineering, model training and evaluation at scale using PySpark MLlib.
Virtual experience program from J.P. Morgan's Software Engineering track via Forage. Implemented financial data feeds, fixed broken visualisation output and used JPMorgan's Perspective library to create live trading dashboards.
Collection of Python problem solutions, algorithms, data structures and scripting exercises. Covers everything from string manipulation and list comprehensions to file I/O and API integrations — a living reference of daily Python practice.
Statistical interpolation model for forecasting natural gas prices using time-series techniques. Applied Prophet forecasting, linear regression and seasonal decomposition to identify price trends and generate forward-looking estimates from historical commodity data.
Credit risk model predicting the probability that a borrower defaults on a loan. Built using logistic regression, decision trees and random forest classifiers on financial features — evaluating model performance via AUC-ROC, precision-recall and confusion matrices.
Personal collection of LeetCode problem solutions across SQL, Python and algorithms — covering arrays, strings, dynamic programming, joins, window functions and more. Built as a living reference of problem-solving practice for coding interviews.
Writing about data engineering, analytics, ML and the messy realities of building real-world pipelines. Watch this space.
Open to Data Analyst, Data Engineer and AI/ML Engineer roles — remote, hybrid or onsite anywhere across the US.
buchdevashree17@gmail.com