Meihao Fan
Ph.D. student at Renmin University of China 
Published:
A cost-effective in-context learning framework for entity resolution.
Published:
A multi-agent data preparation framework with planner and programmer agents.
Published:
An LLM-powered data agent system for autonomous data preparation, featuring tree-based reasoning and post-training.
Published in Wireless Communications and Mobile Computing, 2021
This paper addresses few-shot multi-hop question answering over knowledge bases.
Recommended citation: Fan, Meihao, Lei Zhang, Siyao Xiao, and Yuru Liang. "Few-shot multi-hop question answering over knowledge base." arXiv preprint arXiv:2112.11909 (2021). https://onlinelibrary.wiley.com/doi/abs/10.1155/2022/8045535
Published in IEEE 40th International Conference on Data Engineering (ICDE), 2024
This paper explores the design space of cost-effective in-context learning with LLMs for entity resolution.
Recommended citation: Fan, Meihao, Xiaoyue Han, Ju Fan, Chengliang Chai, Nan Tang, Guoliang Li, and Xiaoyong Du. "Cost-effective in-context learning for entity resolution: A design space exploration." In 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 3696-3709. IEEE, 2024. https://ieeexplore.ieee.org/abstract/document/10597751
Published in VLDB 2025 (Accepted), 2025
This paper proposes AutoPrep, a multi-agent framework for data preparation.
Recommended citation: Meihao Fan, Ju Fan, Nan Tang, Lei Cao, Guoliang Li, Xiaoyong Du. "AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework." VLDB 2025 (Accepted). https://arxiv.org/abs/2412.10422
Published in SIGMOD 2026 (Accepted), 2026
This paper proposes Reward-SQL, a framework for improving Text-to-SQL reasoning using process-supervised reward models.
Recommended citation: Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Jian Tan, Guoliang Li. "Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards." SIGMOD 2026 (Accepted).
Submitted to VLDB 2026 (Under Review), 2026
This paper proposes DeepPrep, an LLM-powered agentic system for autonomous data preparation.
Recommended citation: Meihao Fan, Ju Fan, Yuxin Zhang, Shaolei Zhang, Xiaoyong Du, Jie Song, Peng Li, Fuxin Jiang, Tieying Zhang, Jianjun Chen. "DeepPrep: An LLM-Powered Agentic System for Autonomous Data Preparation." VLDB 2026 (Under Review).
Published in VLDB 2026 (Accepted), 2026
This paper proposes TACO, a benchmark for Open-Domain Text-to-SQL.
Recommended citation: Chao Deng, Ju Fan, Yuyu Luo, Qinliang Xue, Meihao Fan, Yuxin Zhang, Min Zhang, Xiaofeng Jia, Jing Zhang, Xiaoyong Du. "TACO: A Benchmark for Open-Domain Text-to-SQL with Ambiguous and Cross-Database Queries." VLDB 2026 (Accepted).
Submitted to ICML 2026 (Under Review), 2026
This paper explores agentic large language models for autonomous data science.
Recommended citation: Anonymous Authors (incl. **Meihao Fan**). "DeepAnalyze: Agentic Large Language Models for Autonomous Data Science." ICML 2026 (Under Review).
Submitted to ICML 2026 (Under Review), 2026
This paper introduces CODA-BENCH to evaluate code agents on data-intensive tasks.
Recommended citation: Anonymous Authors (incl. **Meihao Fan**). "CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?" ICML 2026 (Under Review).
The open-source R package NPRED identifies meaningful predictors of a response from a large set of potential predictors.
The open-source software WASP is used for system modeling and prediction.
The open-source software WQM is used for post-processing numerical weather prediction.
Generate synthetic time series from commonly used statistical models, including linear, nonlinear and chaotic systems.