Huichi Zhou
Huichi Zhou is currently a graduate student at Imperial College London, working in YangLab under the supervision of Professor Guang Yang.
My research interests lie broadly in Adversarial Machine Learning and Large Language Models.
Email: h.zhou24 [at] imperial.ac.uk
GitHub / Google Scholar
Selected Publications      (* denotes equal contribution)
Revisiting Medical Image Retrieval via Knowledge Consolidation
Yang Nan,
Huichi Zhou,
Xiaodan Xing,
Giorgos Papanastasiou,
Lei Zhu,
Zhifan Gao,
Alejandro F Frangi,
Guang Yang
Medical Image Analysis (Impact Factor: 10.7)
paper
ACIR (Anomaly-aware Content-based Image Recommendation) is a framework designed to improve medical image retrieval through knowledge consolidation. It combines Depth-aware Representation Fusion (DaRF), which integrates multi-level features, with Structure-aware Contrastive Hashing (SCH), which enhances positive/negative pair assignments using image fingerprints. ACIR also introduces a self-supervised out-of-distribution (OOD) detection module and a content-guided ranking mechanism to improve retrieval accuracy and resilience against OOD data. It significantly outperforms existing methods, achieving up to a 38.9% improvement in mean Average Precision (mAP) on anatomical datasets.
TrustRAG: Enhancing Robustness and Trustworthiness in RAG
Huichi Zhou*,
Kin-Hei Lee*,
Zhonghao Zhan*,
Yue Chen,
Zhenhao Li,
Zhaoyang Wang,
Hamed Haddadi,
Emine Yilmaz
Under Review
project page / paper / code
We introduce TrustRAG, a robust Retrieval-Augmented Generation (RAG) framework. It defends against corpus poisoning attacks through a two-stage mechanism: identifying potential attack patterns with K-means clustering and detecting malicious documents via self-assessment.
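As a rough illustration of the first stage, here is a minimal, hypothetical sketch (not the authors' implementation) that clusters retrieved-passage embeddings with K-means and flags an unusually tight cluster as potentially poisoned; the tightness threshold and the assumption of L2-normalized embeddings are illustrative choices.

```python
# Hypothetical sketch of a K-means-based filter for retrieved passages.
# Assumption: passages injected by a corpus-poisoning attack tend to have very
# similar embeddings, so they form an unusually tight cluster.
import numpy as np
from sklearn.cluster import KMeans

def flag_suspicious_passages(embeddings: np.ndarray, tightness_threshold: float = 0.9):
    """Return indices of passages that fall in a suspiciously tight cluster.

    embeddings: (n_passages, dim) array of L2-normalized passage embeddings.
    tightness_threshold: assumed cutoff on mean intra-cluster cosine similarity.
    """
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
    suspicious = []
    for c in range(2):
        idx = np.where(labels == c)[0]
        if len(idx) < 2:
            continue
        sims = embeddings[idx] @ embeddings[idx].T  # cosine similarities (normalized inputs)
        mean_sim = (sims.sum() - len(idx)) / (len(idx) * (len(idx) - 1))
        if mean_sim > tightness_threshold:
            suspicious.extend(idx.tolist())
    return suspicious
```

In the framework described above, such clustering evidence is only the first signal; the self-assessment stage then decides which documents to discard.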
Verifiable Format Control for Large Language Model Generations
Zhaoyang Wang*,
Jinqi Jiang*,
Huichi Zhou*,
Wenhao Zheng,
Xuchao Zhang,
Chetan Bansal,
Huaxiu Yao
NAACL 2025 Findings
paper / code
We introduce VFF, a dataset and training framework that improves the format-following ability of 7B-level LLMs using Python-based verification and progressive training. It enhances LLMs’ ability to follow specific formats (e.g., JSON) through self-generated data and direct preference optimization (DPO).
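As a toy illustration of Python-based format verification (a sketch under assumed requirements, not the actual checkers released with VFF), the function below tests whether a model response is a JSON object containing a set of required keys; the key names are made up for the example.

```python
# Hypothetical format verifier in the spirit of VFF's Python-based checks.
import json

def verify_json_format(response: str, required_keys=("answer", "reasoning")) -> bool:
    """Return True if the response parses as a JSON object with the required keys."""
    try:
        obj = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(k in obj for k in required_keys)

# Verifiable feedback on two candidate generations.
print(verify_json_format('{"answer": "42", "reasoning": "direct lookup"}'))  # True
print(verify_json_format("The answer is 42."))                               # False
```

Because the check is executable, responses can be labeled as format-compliant or not automatically, which is the kind of signal that DPO preference pairs could be built from.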
Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks?
Zhongjian Zhang*,
Xiao Wang*,
Huichi Zhou,
Mengmei Zhang,
Cheng Yang,
Chuan Shi
KDD 2025 Research Track
paper / code
LLM4RGNN is a framework that enhances the adversarial robustness of Graph Neural Networks (GNNs) using Large Language Models (LLMs). It distills the inference capability of GPT-4 into a local LLM to identify malicious edges and trains an LM-based edge predictor to recover missing important edges. By removing malicious edges and adding important ones, LLM4RGNN significantly improves GNN performance under adversarial attacks, outperforming existing robust GNN frameworks across various datasets and attack scenarios.
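As a loose sketch of the purification idea (hypothetical code, not the released implementation), the function below drops edges whose maliciousness score exceeds a threshold and adds confidently predicted missing edges; the score inputs and thresholds are placeholders.

```python
# Hypothetical graph-purification step in the spirit of LLM4RGNN:
# remove edges judged malicious, then add confidently predicted missing edges.

def purify_graph(edges, malicious_scores, candidate_edges, edge_probs,
                 drop_threshold=0.5, add_threshold=0.9):
    """edges: list of (u, v) tuples in the attacked graph.
    malicious_scores: dict mapping an edge to a maliciousness score in [0, 1]
    (e.g., from an LLM-based judge). candidate_edges / edge_probs: candidate
    missing edges with probabilities from an LM-based edge predictor.
    Both thresholds are illustrative assumptions."""
    kept = [e for e in edges if malicious_scores.get(e, 0.0) < drop_threshold]
    kept_set = set(kept)
    added = [e for e, p in zip(candidate_edges, edge_probs)
             if p >= add_threshold and e not in kept_set]
    return kept + added
```

The purified edge list can then be handed to a standard GNN for training.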
Evaluating the Validity of Word-level Adversarial Attacks with Large Language Models
Huichi Zhou*,
Zhaoyang Wang*,
Hongtao Wang,
Dongping Chen,
Wenhan Mu,
Fangyuan Zhang
ACL 2024 Findings
paper / code
AVLLM is a framework that leverages large language models to evaluate and improve the validity of word-level adversarial attacks. It fine-tunes a lightweight LLM to provide a validity score and explanation, helping to generate semantically consistent adversarial examples and improving attack quality and consistency through enhanced semantic understanding.
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Huichi Zhou, Qihui Zhang, Zhigang He, Yilin Bai, Chujie Gao, Liuyi Chen, Yiqiang Li, Chenlong Wang, Yue Yu, Tianshuo Zhou, Zhen Li, Yi Gui, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun
ICLR 2025
paper / code
GUI-WORLD is a comprehensive video-based benchmark and dataset designed for GUI-oriented multimodal understanding. It includes 12,379 GUI videos covering six scenarios (e.g., software, websites, mobile, XR) and eight question types. The dataset captures dynamic and sequential GUI content, addressing challenges faced by existing models. The paper introduces GUI-VID, a fine-tuned Video LLM that improves dynamic GUI understanding but highlights that current models still struggle with complex GUI tasks.
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Dongping Chen*, Ruoxi Chen*, Shilin Zhang*, Yaochen Wang*, Yinuo Liu*, Huichi Zhou*, Qihui Zhang*, Yao Wan, Pan Zhou, Lichao Sun
ICML 2025 Oral
paper / code
MLLM-as-a-Judge is a benchmark designed to evaluate the judging capabilities of Multimodal Large Language Models (MLLMs) in vision-language tasks. It assesses MLLMs on Scoring Evaluation, Pair Comparison, and Batch Ranking across 14 datasets. Results show that while MLLMs align well with human judgments in pair comparison, they struggle with scoring and batch ranking due to biases and hallucinations. GPT-4V performs best, but the study highlights the need for improvements in consistency and reasoning.
Academic Service
Reviewer: ICLR 2025, ACL ARR 2024-2025

Honors & Awards
Chinese National Scholarship (Top 1%), 2023
Excellent Graduation Thesis Award (Top 1%), 2024
Website template adapted from Jon Barron.
Last update: March 2025