Summary
Ph.D. student in Computer Science at Renmin University of China, focusing on LLM-powered data agents, reinforcement learning, and post-training for open-source LLMs. My research centers on building cost-effective and reliable agentic systems for data-intensive tasks such as data preparation, entity resolution, and autonomous data science. More broadly, I am interested in developing lightweight agents with stronger data awareness to optimize deep research workflows, reduce expensive trial-and-error, and improve the practical efficiency of LLM-based systems.
Education
- Renmin University of China, Beijing, China
- Ph.D. in Computer Application Technology (Sept 2023 - Expected June 2028)
- Honors: National Scholarship (The only 2nd-year Ph.D. recipient in the college)
- Advisors: Prof. Ju Fan and Prof. Xiaoyong Du
- Chongqing Jiaotong University, Chongqing, China
- B.S. in Computer Science and Technology (Sept 2019 - June 2023)
- GPA: 4.31/5.00
- Honors: National Scholarship (The only recipient in the college), Mingde Scholarship Nomination Award (Top 20 in the university)
Experience
- ByteDance, Beijing, China
- Research Intern (LLM Data Agent Systems) (March 2025 - March 2026)
- Led the research and development of DeepPrep, an LLM-powered data agent system for autonomous data preparation.
- Proposed a tree-based agentic reasoning mechanism to improve reliability and steerability.
- Designed a progressive post-training framework combining SFT and multi-turn RL.
- Built large-scale benchmarks and achieved strong performance at up to 15x lower inference cost.
- Renmin University of China, Beijing, China
- Research Assistant (Sept 2023 - Present)
- Led research on AutoPrep, a multi-agent data preparation framework.
- Studied cost-effective LLM inference for entity resolution and proposed the BATCHER framework.
- Co-authored research on autonomous data agents, text-to-SQL, and data-centric LLM systems.
Projects
- DeepPrep: Data Agent System for Autonomous Data Preparation (March 2025 - March 2026)
- Developed an execution-grounded environment supporting 31 data preparation operators.
- Implemented a full post-training pipeline for open-source LLMs (0.5B to 14B).
- GitHub
- AutoPrep: Multi-Agent Data Preparation Framework (2024 - 2025)
- Designed a planner agent with Chain-of-Clauses reasoning.
- Developed programmer agents with tool-augmented code generation.
- GitHub
- BATCHER: Cost-Effective LLM Inference for Entity Resolution (2023 - 2024)
- Introduced a batch prompting framework with demonstration selection.
- Achieved 4x-7x cost savings over standard prompting.
- GitHub
Honors and Awards
- National Scholarship (Ph.D.), Ministry of Education, China (2025)
- National Scholarship (B.S.), Ministry of Education, China (2023)
- Mingde Scholarship Nomination Award, Chongqing Jiaotong University (2023)
Skills & Research Interests
- Research Interests: Data Agents, Reinforcement Learning for LLM Agents, Post-Training, Cost-Efficient LLM Systems, Autonomous Data Science, Text-to-SQL
- Programming & Tools: Python, LaTeX
- LLM & Agent Systems: Agentic Reasoning, Multi-Agent Systems, Supervised Fine-Tuning (SFT), Multi-turn RL, Prompt Engineering, Execution-Grounded Feedback
- Data Systems: Data Preparation, Entity Resolution, Data Integration, Analytical Workflows
Publications
Meihao Fan, Lei Zhang, Siyao Xiao, Yuru Liang. "Few-Shot Multi-Hop Question Answering over Knowledge Base." arXiv preprint arXiv:2112.11909 (2021).
Meihao Fan, Xiaoyue Han, Ju Fan, Chengliang Chai, Nan Tang, Guoliang Li, Xiaoyong Du. "Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration." ICDE 2024, pp. 3696-3709.
Meihao Fan, Ju Fan, Nan Tang, Lei Cao, Guoliang Li, Xiaoyong Du. "AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework." VLDB 2025 (Accepted).
Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Jian Tan, Guoliang Li. "Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards." SIGMOD 2026 (Accepted).
Meihao Fan, Ju Fan, Yuxin Zhang, Shaolei Zhang, Xiaoyong Du, Jie Song, Peng Li, Fuxin Jiang, Tieying Zhang, Jianjun Chen. "DeepPrep: An LLM-Powered Agentic System for Autonomous Data Preparation." VLDB 2026 (Under Review).
Chao Deng, Ju Fan, Yuyu Luo, Qinliang Xue, Meihao Fan, Yuxin Zhang, Min Zhang, Xiaofeng Jia, Jing Zhang, Xiaoyong Du. "TACO: A Benchmark for Open-Domain Text-to-SQL with Ambiguous and Cross-Database Queries." VLDB 2026 (Accepted).
Anonymous Authors (incl. **Meihao Fan**). "DeepAnalyze: Agentic Large Language Models for Autonomous Data Science." ICML 2026 (Under Review).
Anonymous Authors (incl. **Meihao Fan**). "CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?" ICML 2026 (Under Review).
Talks
Teaching
Languages
Mandarin (Native), English (Fluent), German (Basic), French (Basic)
Service & Leadership