DeepPrep: Data Agent System for Autonomous Data Preparation
Published: DeepPrep: Data Agent System for Autonomous Data Preparation
March 2025 - March 2026
Tech Stack: Data Agents, Tree-based Reasoning, Post-Training, RL, Qwen
- Developed an execution-grounded environment supporting 31 data preparation operators, enabling agents to materialize intermediate table states and use runtime feedback for decision-making.
- Implemented a full post-training pipeline for open-source LLMs, covering operator syntax learning, supervised fine-tuning, and multi-turn reinforcement learning; released trained model weights at sizes from 0.5B to 14B parameters.
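
Illustrative sketch of the execution-grounded loop described in the first bullet: an agent applies one operator at a time, each successful step materializes a new intermediate table, and each failed step returns an error message the agent can react to. This is a minimal toy, not DeepPrep's actual code. The names `PrepEnv`, `OPERATORS`, `drop_nulls`, and `rename_column` are hypothetical, and the two operators stand in for the 31 in the real system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Table = List[Dict[str, Any]]  # toy table: a list of row dictionaries


def drop_nulls(table: Table, column: str) -> Table:
    """Hypothetical operator: keep rows whose `column` value is not None."""
    return [row for row in table if row.get(column) is not None]


def rename_column(table: Table, old: str, new: str) -> Table:
    """Hypothetical operator: rename a column, failing if it is absent."""
    if table and old not in table[0]:
        raise ValueError(f"column {old!r} not found")
    return [{(new if k == old else k): v for k, v in row.items()} for row in table]


# Assumed registry; stands in for the full operator set.
OPERATORS: Dict[str, Callable[..., Table]] = {
    "drop_nulls": drop_nulls,
    "rename_column": rename_column,
}


@dataclass
class PrepEnv:
    """Keeps every intermediate table so the agent can inspect or backtrack."""
    states: List[Table]

    def step(self, op_name: str, **kwargs: Any) -> Dict[str, Any]:
        op = OPERATORS.get(op_name)
        if op is None:
            return {"ok": False, "feedback": f"unknown operator {op_name!r}"}
        try:
            new_table = op(self.states[-1], **kwargs)
        except Exception as exc:
            # Runtime feedback: the error is reported and no state is added.
            return {"ok": False, "feedback": f"{type(exc).__name__}: {exc}"}
        self.states.append(new_table)  # materialize the intermediate state
        columns = sorted(new_table[0]) if new_table else []
        return {"ok": True, "feedback": f"{len(new_table)} rows, columns={columns}"}


if __name__ == "__main__":
    env = PrepEnv([[{"id": 1, "nm": "a"}, {"id": None, "nm": "b"}]])
    print(env.step("drop_nulls", column="id"))
    print(env.step("rename_column", old="missing", new="x"))  # fails, state unchanged
    print(env.step("rename_column", old="nm", new="name"))
```

Keeping the full list of states, rather than only the latest table, is one simple way to support tree-style exploration: a failed or unpromising branch can resume from any earlier state.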
