Posts by Collection

Recommended citation: Fan, Meihao, Lei Zhang, Siyao Xiao, and Yuru Liang. "Few-shot multi-hop question answering over knowledge base." arXiv preprint arXiv:2112.11909 (2021). https://onlinelibrary.wiley.com/doi/abs/10.1155/2022/8045535

Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration

Published in IEEE 40th International Conference on Data Engineering (ICDE), 2024

This paper studies cost-effective in-context learning with large language models (LLMs) for entity resolution.

Recommended citation: Fan, Meihao, Xiaoyue Han, Ju Fan, Chengliang Chai, Nan Tang, Guoliang Li, and Xiaoyong Du. "Cost-effective in-context learning for entity resolution: A design space exploration." In 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 3696-3709. IEEE, 2024. https://ieeexplore.ieee.org/abstract/document/10597751

AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework

Published in VLDB 2025 (Accepted), 2025

This paper proposes AutoPrep, a multi-agent framework for natural language question-aware data preparation.

Recommended citation: Meihao Fan, Ju Fan, Nan Tang, Lei Cao, Guoliang Li, Xiaoyong Du. "AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework." VLDB 2025 (Accepted). https://arxiv.org/abs/2412.10422

Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards

Published in SIGMOD 2026 (Accepted), 2026

This paper proposes Reward-SQL, a framework for improving Text-to-SQL reasoning using process-supervised reward models.

Recommended citation: Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Jian Tan, Guoliang Li. "Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards." SIGMOD 2026 (Accepted).

DeepPrep: An LLM-Powered Agentic System for Autonomous Data Preparation

Submitted to VLDB 2026 (Under Review), 2026

This paper proposes DeepPrep, an LLM-powered agentic system for autonomous data preparation.

Recommended citation: Meihao Fan, Ju Fan, Yuxin Zhang, Shaolei Zhang, Xiaoyong Du, Jie Song, Peng Li, Fuxin Jiang, Tieying Zhang, Jianjun Chen. "DeepPrep: An LLM-Powered Agentic System for Autonomous Data Preparation." VLDB 2026 (Under Review).

TACO: A Benchmark for Open-Domain Text-to-SQL with Ambiguous and Cross-Database Queries

Published in VLDB 2026 (Accepted), 2026

This paper proposes TACO, a benchmark for open-domain Text-to-SQL with ambiguous and cross-database queries.

Recommended citation: Chao Deng, Ju Fan, Yuyu Luo, Qinliang Xue, Meihao Fan, Yuxin Zhang, Min Zhang, Xiaofeng Jia, Jing Zhang, Xiaoyong Du. "TACO: A Benchmark for Open-Domain Text-to-SQL with Ambiguous and Cross-Database Queries." VLDB 2026 (Accepted).

DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Submitted to ICML 2026 (Under Review), 2026

This paper explores agentic large language models for autonomous data science.

Recommended citation: Anonymous Authors (incl. Meihao Fan). "DeepAnalyze: Agentic Large Language Models for Autonomous Data Science." ICML 2026 (Under Review).

CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?

Submitted to ICML 2026 (Under Review), 2026

This paper introduces CODA-BENCH to evaluate code agents on data-intensive tasks.

Recommended citation: Anonymous Authors (incl. Meihao Fan). "CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?" ICML 2026 (Under Review).

Meihao Fan

people

Meihao Fan

projects

BATCHER: Cost-Effective LLM Inference for Entity Resolution

AutoPrep: Multi-Agent Data Preparation Framework

DeepPrep: Data Agent System for Autonomous Data Preparation

publications

Few‐Shot Multihop Question Answering over Knowledge Base

Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration

AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework

Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards

DeepPrep: An LLM-Powered Agentic System for Autonomous Data Preparation

TACO: A Benchmark for Open-Domain Text-to-SQL with Ambiguous and Cross-Database Queries

DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?

software

Predictor identifier: Nonparametric PREDiction (NPRED)

WASP: Wavelet System Prediction

WQM: Wavelet-based Quantile Mapping

synthesis: Generate Synthetic Data from Statistical Models

talks

teaching