Shamanthak Hegde

I am a final-year Computer Science Master's student at Arizona State University, where I am advised by Yezhou Yang. My current research interests lie in combining information from different sources, such as text, images, and video, to help machines develop more robust, reliable, and safe commonsense reasoning.

I received my Bachelor's in Computer Science Engineering from KLE Technological University in 2023, where I was advised by Shankar Gangisetty on various projects in the VQA domain.

Email  /  Resume  /  Google Scholar  /  X  /  GitHub


News

Research

I'm interested in computer vision, machine learning, generative AI, and natural language processing. Representative papers are highlighted.

ChartQA-X: Generating Explanations for Visual Chart Reasoning
Shamanthak Hegde, Pooyan Fazli, Hasti Seifi
Winter Conference on Applications of Computer Vision (WACV), 2026

Paper | Dataset | bibtex
@article{hegde2025chartqa,
  title={ChartQA-X: Generating Explanations for Charts},
  author={Hegde, Shamanthak and Fazli, Pooyan and Seifi, Hasti},
  journal={arXiv preprint arXiv:2504.13275},
  year={2025}
}

Dual Caption Preference Optimization for Diffusion Models
Amir Saeidi*, Yiran Luo*, Agneet Chatterjee, Shamanthak Hegde, Bimsara Pathiraja, Yezhou Yang, Chitta Baral
Transactions on Machine Learning Research (TMLR), 2025

Paper | Project | Code | bibtex
@article{saeidi2025dual,
  title={Dual Caption Preference Optimization for Diffusion Models},
  author={Amir Saeidi and Yiran Lawrence Luo and Agneet Chatterjee and Shamanthak Hegde and Bimsara Pathiraja and Yezhou Yang and Chitta Baral},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=ruZksIJBBd},
  note={}
}

Evaluating Multimodal Large Language Models Across Distribution Shifts and Augmentations
Aayush Atul Verma*, Amir Saeidi*, Shamanthak Hegde*, Ajay Therala*, Fenil Denish Bardoliya*, Nagaraju Machavarapu*, Shri Ajay Kumar Ravindhiran*, Srija Malyala*, Agneet Chatterjee*, Yezhou Yang, Chitta Baral
CVPR EvGenFM Workshop, 2024

Paper | bibtex
@InProceedings{Verma_2024_CVPR,
  author = {Verma, Aayush Atul and Saeidi, Amir and Hegde, Shamanthak and Therala, Ajay and Bardoliya, Fenil Denish and Machavarapu, Nagaraju and Ravindhiran, Shri Ajay Kumar and Malyala, Srija and Chatterjee, Agneet and Yang, Yezhou and Baral, Chitta},
  title = {Evaluating Multimodal Large Language Models Across Distribution Shifts and Augmentations},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month = {June},
  year = {2024},
  pages = {5314-5324}
}

Making the V in Text-VQA Matter
Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty
CVPR O-DRUM Workshop, 2023

Paper | bibtex
@inproceedings{hegde2023making,
  title={Making the v in text-VQA matter},
  author={Hegde, Shamanthak and Jahagirdar, Soumya and Gangisetty, Shankar},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5580--5588},
  year={2023}
}

Weakly Supervised Visual Question Answer Generation
Charani Alampalle, Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty
CVPR O-DRUM Workshop, 2023

Paper | bibtex
@inproceedings{alampalle2023weakly,
  title={Weakly Supervised Visual Question Answer Generation},
  author={Alampalle, Charani and Hegde, Shamanthak and Jahagirdar, Soumya and Gangisetty, Shankar},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5589--5597},
  year={2023}
}

Thank you to Jon Barron for the source code for the website!