As a researcher working at the crossroads of data management and NLP, my goal is to contribute fresh insights by connecting data management methodologies with recent developments in artificial intelligence.
Conventional data management systems are designed to support querying over large-scale databases and are optimized for efficiency and precision by carefully managing all phases of the query lifecycle—from offline data storage and ingestion to online query execution. These systems, however, often sacrifice versatility and expressiveness for performance.
In contrast, large language models (LLMs) can query diverse and complex data sources with far greater flexibility, but at a significant computational cost. They also lack the systematic problem-solving strategies typical of traditional data systems and, as a result, can fail in surprising ways.
This contrast presents an intriguing challenge:
Is it possible to combine the strengths of the data management and LLM paradigms?
Data systems for LLMs: Can we embed core efficiency and systematic processing principles of data management to enhance LLMs?
Embedding formal data management operators into LLMs (ARM and JAR)
Embedding offline data processing pipelines into LLMs (EnrichIndex)
LLMs for data systems: Can the expressiveness of LLMs be harnessed to enhance data management?
Leveraging LLMs to translate natural language questions into SQL in enterprise contexts (Beaver)