Data-Driven Programming By Examples for Data Wrangling

Data wrangling is the tedious task of cleaning, transforming, reshaping, and integrating data so that it becomes usable for analysis. Program synthesis techniques can play a revolutionary role in making these tasks simpler and more accessible to data scientists and end users. The systems below let data scientists and end users perform data wrangling tasks from input-output examples, without writing complex programs or scripts.

Unlike traditional program synthesis techniques, which learn only from specifications (examples), a key theme of these systems is that they also learn from the input data. This makes the learning process significantly more efficient, reduces the number of examples required, and enables learning a richer class of data transformations. Moreover, these systems use probabilistic learning techniques to handle noisy and inconsistent data.
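To make the idea of learning from the input data concrete, here is a minimal toy sketch in Python. It is not the algorithm used by any of the systems listed on this page; the program space (split on a delimiter and pick one token), the coverage-based ranking, and all names such as `make_program`, `coverage`, and `synthesize` are assumptions made purely for illustration. A single example leaves two candidate programs, and the unlabeled inputs are what break the tie.

```python
from itertools import product

def make_program(delim, index):
    """Toy program: split the input on `delim` and return the token at `index`."""
    def run(s):
        return s.split(delim)[index]  # raises IndexError if the token is missing
    run.description = f"split({delim!r})[{index}]"
    return run

def candidates():
    # Hypothetical, tiny program space: 3 delimiters x 4 token positions.
    for delim, index in product([" ", ",", "-"], [0, 1, -1, -2]):
        yield make_program(delim, index)

def consistent(program, examples):
    """True if the program reproduces every input-output example."""
    try:
        return all(program(inp) == out for inp, out in examples)
    except IndexError:
        return False

def coverage(program, inputs):
    """Count unlabeled inputs on which the program runs and returns non-empty text."""
    ok = 0
    for s in inputs:
        try:
            if program(s):
                ok += 1
        except IndexError:
            pass
    return ok

def synthesize(examples, inputs):
    """Keep programs consistent with the examples, then rank them by input coverage."""
    viable = [p for p in candidates() if consistent(p, examples)]
    if not viable:
        raise ValueError("no program in the toy space matches the examples")
    return max(viable, key=lambda p: coverage(p, inputs))

# One example is ambiguous: "second token" and "last token" both fit it.
examples = [("John Smith", "Smith")]
# The remaining column values (no outputs given) disambiguate the intent.
inputs = ["John Smith", "Mary Ann Lee", "Cher"]

best = synthesize(examples, inputs)
print(best.description, [best(s) for s in inputs])
```

In this sketch the "second token" program fails on the single-token input, so the "last token" program is preferred without the user supplying a second example. A real system would use a much richer program space and a more principled ranking.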

BlinkFill

Publications	[VLDB16]
Demo	Live Interactive Version

SemFill

Publications	[POPL16]
Demo	Live Interactive Version

FlashFill

Publications	[CAV12], [VLDB12], [CACM12], [CAV15]
Press	[MIT News], [CNET], [CNN Money], [Wired], [More]
Demo	Video1, Video2, Video3, Try it out in Excel 2013!