Peter Baile Chen

Hi! I am Peter (陳百樂), a PhD student at MIT CSAIL. My research lies at the intersection of data systems and natural language processing. I am fortunate to work with Mike Cafarella, Mike Stonebraker, Sam Madden, Dan Roth, and Jacob Andreas. I graduated from the University of Pennsylvania with a BSE degree. At Penn, I had the chance to work with Zack Ives, Sebastian Angel, and Vincent Liu. I am supported by the Croucher Scholarship and a Google PhD Fellowship in collaboration with MIT.

As a researcher working at the crossroads of data management and NLP, my goal is to contribute fresh insights by connecting data management methodologies with recent developments in artificial intelligence.

Conventional data management systems are designed to support querying over large-scale databases and are optimized for efficiency and precision by carefully managing all phases of the query lifecycle, from offline data storage and ingestion to online query execution. These systems, however, often sacrifice versatility and expressiveness for performance.
LLMs, on the other hand, have demonstrated the ability to query diverse and complex data sources with far greater flexibility. Yet this flexibility comes at a significant computational cost. Moreover, LLMs lack the systematic problem-solving strategies typical of traditional data systems and, as a result, can fail in surprising ways.

This contrast presents an intriguing challenge:

Is it possible to blend the strengths of data management and LLM paradigms?

Data systems for LLMs: Can we embed core efficiency and systematic processing principles of data management to enhance LLMs?

Reasoning inefficiency

Embedding view maintenance into LLMs (LAG)

Retrieval inefficiency

Embedding formal data management operators into LLMs (ARM and JAR)
Embedding offline data processing pipelines into LLMs (EnrichIndex)

LLMs for data systems: Can the expressiveness of LLMs be harnessed to enhance data management?

Leveraging LLMs to translate natural language questions into SQL in enterprise contexts (Beaver)

News

(May 2025) Our paper on alignment-oriented retrieval (ARM) was accepted to ACL 2025 (main).
(May 2025) I am honored to be a Schwarzman College of Computing Future Research Cohort fellow, funded by Google.
(September 2024) Our paper on multi-document conditional reasoning (MDCR) was accepted to EMNLP 2024 (findings).
(August 2024) JAR received the outstanding paper award at the Towards Knowledgeable Language Models workshop @ ACL 2024.
(May 2024) Our paper on join-aware multi-table retrieval (JAR) was accepted to ACL 2024 (main).