DeepPrep: Data Agent System for Autonomous Data Preparation

March 2025 - March 2026

Tech Stack: Data Agents, Tree-based Reasoning, Post-Training, RL, Qwen

  • Developed an execution-grounded environment supporting 31 data preparation operators, enabling agents to materialize intermediate table states and use runtime feedback for decision-making.
  • Implemented a full post-training pipeline for open-source LLMs, covering operator syntax learning, supervised fine-tuning, and multi-turn reinforcement learning, and released trained model weights ranging from 0.5B to 14B parameters.
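
To illustrate what "execution-grounded" means in the first bullet, here is a minimal sketch of an environment loop. It is not the DeepPrep implementation: the `PrepEnv` class, the two operators, and the feedback fields are hypothetical names chosen for illustration, and tables are plain lists of dicts rather than whatever representation the real system uses.

```python
from dataclasses import dataclass, field
from typing import Callable

Table = list[dict]  # hypothetical table representation: one dict per row


def drop_nulls(table: Table, column: str) -> Table:
    """Keep only rows whose value in `column` is not None."""
    return [row for row in table if row[column] is not None]


def rename_column(table: Table, old: str, new: str) -> Table:
    """Rename column `old` to `new` in every row."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in table]


# Hypothetical operator registry (the real system supports 31 operators).
OPERATORS: dict[str, Callable[..., Table]] = {
    "drop_nulls": drop_nulls,
    "rename_column": rename_column,
}


@dataclass
class PrepEnv:
    table: Table
    history: list[Table] = field(default_factory=list)  # earlier table states

    def step(self, op: str, **kwargs) -> dict:
        """Execute one operator and return runtime feedback for the agent."""
        try:
            new_table = OPERATORS[op](self.table, **kwargs)
        except Exception as exc:
            # A failed call leaves the table untouched and reports the error.
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}",
                    "rows": len(self.table)}
        self.history.append(self.table)  # materialize the intermediate state
        self.table = new_table
        columns = sorted(new_table[0]) if new_table else []
        return {"ok": True, "rows": len(new_table), "columns": columns}


env = PrepEnv([{"name": "a", "age": 1}, {"name": "b", "age": None}])
print(env.step("drop_nulls", column="age"))
print(env.step("drop_nulls", column="missing"))  # error is reported, not raised
```

In this sketch the agent never has to guess the effect of an action: each step either returns the new table's shape or an error message, and every earlier table state stays available in `history`.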

GitHub Repository