Data Engineer · Researcher · Chicago

Building data infrastructure that scales and research that matters.

I'm Chaitanya, a data engineer at Egen and a researcher in clinical AI and graph neural networks. I design ETL pipelines and cloud-based bioinformatics systems, publish research in medical AI, present at IEEE conferences, and maintain open-source Python packages, including cloudfit.

See my work · Get in touch

5+ yrs in data engineering · 7+ peer-reviewed papers · 4+ OSS Python packages · GCP Professional Data Engineer

Currently building

Open source

All work →

2026

cloudfit

Cloud-agnostic machine type recommender for batch and bioinformatics workloads. Multi-package OSS ecosystem (scoring engine, GCP provider, FastAPI service, Gradio UI) with a multi-region bundled snapshot and a live demo on Hugging Face Spaces.

PyPI · FastAPI · Multi-cloud · Try it ↗ · API docs ↗

2026

samplesheet-parser

Format-agnostic parser for Illumina SampleSheet.csv files. Auto-detects IEM v1 vs. BCLConvert v2, validates index integrity with Hamming distance checks, and converts or merges sheets across mixed sequencing fleets.

PyPI · Bioconda · Bioinformatics · Apache 2.0

Selected

Recent research

All publications →

2026

AI-Driven Early-Warning System to Predict Multi-Organ Deterioration in Critical Care Patients

Temporal deep learning with cross-organ attention that detects ICU deterioration across five organ systems a median of 6.2 hours before conventional clinical detection. Presented at IEEE ICHI 2026, Session 16.

IEEE ICHI 2026 · Clinical AI · ICU · Temporal Deep Learning · IEEE Xplore forthcoming

2025

AI-driven drug repurposing: a graph neural network and self-supervised learning approach

Computational drug discovery using GNNs and self-supervised pretraining over biomedical knowledge graphs.

IEEE · GNN · Drug Discovery

Selected

Writing

All articles →

2026

Why I built cloudfit

On the gap between free cloud sizing tools (AWS Compute Optimizer, GCP Recommender, Azure Advisor) and what I actually needed for new batch and bioinformatics workloads, and why I built an open-source recommender that doesn't need historical metrics from a running workload.

Launch · Open Source · Cloud

2025

Understanding recommender systems: the engine behind personalized experiences

A primer on collaborative filtering, content-based, and hybrid approaches to recommendation. Why personalization engines work the way they do, and where they break.

Data Science Collective · Recsys · ML

2021

An introduction to explainable AI and explainable boosting machines

A primer on XAI fundamentals and how EBMs combine accuracy with interpretability.

KDnuggets · XAI · EBM

Let's build or research something together.

Open to conversations about data engineering, clinical AI, and cloud architecture. Also available for research collaboration, conference talks, and speaking invitations.

Get in touch · Connect on LinkedIn
