ML Researcher

Shamanthak Hegde

I build multimodal and agentic AI: systems that see, read, and act by reasoning across text, images, and video.

I recently finished my Master's in Computer Science at Arizona State University, where I was advised by Yezhou Yang and still collaborate with the lab. My work spans vision-language models, preference optimization for diffusion models, and agentic systems that turn perception into action. Before ASU, I earned a B.E. from KLE Technological University (2023), where I was advised by Shankar Gangisetty and worked on visual question answering.

Email·Résumé·Scholar·X·GitHub

Research

WACV 2026

ChartQA-X: Generating Explanations for Visual Chart Reasoning

Shamanthak Hegde, Pooyan Fazli, Hasti Seifi

Paper·Project·Dataset

@article{hegde2025chartqa,
  title={ChartQA-X: Generating Explanations for Visual Chart Reasoning},
  author={Hegde, Shamanthak and Fazli, Pooyan and Seifi, Hasti},
  journal={arXiv preprint arXiv:2504.13275}, year={2025}}

TMLR 2025

Dual Caption Preference Optimization for Diffusion Models

Amir Saeidi*, Yiran Luo*, Agneet Chatterjee, Shamanthak Hegde, Bimsara Pathiraja, Yezhou Yang, Chitta Baral

Paper·Project·Code

@article{saeidi2025dual,
  title={Dual Caption Preference Optimization for Diffusion Models},
  author={Amir Saeidi and Yiran Lawrence Luo and Agneet Chatterjee and Shamanthak Hegde and Bimsara Pathiraja and Yezhou Yang and Chitta Baral},
  journal={Transactions on Machine Learning Research}, year={2025}}

CVPR EvGenFM Workshop 2024

Evaluating Multimodal LLMs Across Distribution Shifts and Augmentations

Aayush Atul Verma*, Amir Saeidi*, Shamanthak Hegde*, Ajay Therala*

Paper

@InProceedings{Verma_2024_CVPR,
  author    = {Verma, Aayush Atul and Saeidi, Amir and Hegde, Shamanthak and Therala, Ajay and Bardoliya, Fenil Denish and Machavarapu, Nagaraju and Ravindhiran, Shri Ajay Kumar and Malyala, Srija and Chatterjee, Agneet and Yang, Yezhou and Baral, Chitta},
  title     = {Evaluating Multimodal Large Language Models Across Distribution Shifts and Augmentations},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June}, year = {2024}, pages = {5314--5324}}

CVPR O-DRUM Workshop 2023

Making the V in Text-VQA Matter

Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty

Paper

@inproceedings{hegde2023making,
  title={Making the V in Text-VQA Matter},
  author={Hegde, Shamanthak and Jahagirdar, Soumya and Gangisetty, Shankar},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5580--5588}, year={2023}}

CVPR O-DRUM Workshop 2023

Weakly Supervised Visual Question Answer Generation

Charani Alampalle, Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty

Paper

@inproceedings{alampalle2023weakly,
  title={Weakly Supervised Visual Question Answer Generation},
  author={Alampalle, Charani and Hegde, Shamanthak and Jahagirdar, Soumya and Gangisetty, Shankar},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5589--5597}, year={2023}}

Projects

TreeHacks · 1st place, Healthcare

ShadowGuard: Real-Time PHI Detection

A reverse proxy that intercepts live HTTPS traffic and runs a local LLM to flag and redact healthcare identifiers (MRNs, SSNs, ICD-10 codes, medication names) before any request reaches an external API. Ships with a compliance dashboard and automated voice alerts.
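
As a rough illustration of the flag-and-redact step, here is a minimal sketch. It is not the ShadowGuard implementation: it swaps the local LLM for regular expressions, and the pattern table, the `redact_phi` name, and the identifier formats (for example, an MRN written as `MRN:` followed by 6 to 10 digits) are all assumptions made for the example.

```python
import re

# Hypothetical, simplified patterns; real identifier formats vary by institution,
# and the actual project uses a local LLM rather than regexes.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    "ICD10": re.compile(r"\b[A-Z]\d{2}\.\d{1,4}\b"),  # only dotted codes such as E11.9
}


def redact_phi(text):
    """Replace each match with a [LABEL] placeholder and list the labels that fired."""
    found = []
    for label, pattern in PHI_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


if __name__ == "__main__":
    body = "Patient MRN: 00123456, SSN 123-45-6789, diagnosis E11.9."
    clean, flags = redact_phi(body)
    print(clean)  # Patient [MRN], SSN [SSN], diagnosis [ICD10].
    print(flags)  # ['SSN', 'MRN', 'ICD10']
```

In a proxy setting, a function like this would run on the outgoing request body, and the list of labels could feed the dashboard or an alert.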

Code

Personal project

AutoScout: Web Monitoring Agents

An agentic system (Gemini + LangChain + Playwright) that pulls structured fields out of dynamic pages and watches them for changes, running on a serverless FastAPI + AWS Lambda backend.
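
The change-watching part can be pictured with a small sketch. It assumes the fields have already been extracted into plain dictionaries (the Gemini, LangChain, and Playwright steps are left out), and `diff_fields` is a hypothetical helper name, not a function from the AutoScout codebase.

```python
from typing import Any, Dict, Tuple


def diff_fields(previous: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (old, new)} for every field that was added, removed, or changed.

    A missing field is reported as None, so a field that is absent and a field
    explicitly set to None are treated as equal in this simplified version.
    """
    changes = {}
    for key in sorted(previous.keys() | current.keys()):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


if __name__ == "__main__":
    last_snapshot = {"price": "$19.99", "in_stock": True}
    new_snapshot = {"price": "$17.99", "in_stock": True, "rating": 4.5}
    print(diff_fields(last_snapshot, new_snapshot))
    # {'price': ('$19.99', '$17.99'), 'rating': (None, 4.5)}
```

A scheduled job could store the latest snapshot per page and send a notification only when this function returns a non-empty result.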

Code

Experience

Active Perception Group · Arizona State University

Graduate Research Assistant

Advised by Yezhou Yang

Training and evaluating multimodal LLMs and VLMs (SFT, DPO/GRPO) for robustness benchmarking and diffusion-model alignment. Work published at TMLR and CVPR workshops.

DCPO·MLLM Eval