Data-Driven Programming By Examples for Data Wrangling

Data wrangling is the tedious task of cleaning, transforming, reshaping, and integrating data so that it becomes usable for analysis. Program synthesis techniques can make these tasks simpler and more accessible to data scientists and end-users. The systems below let data scientists and end-users perform data wrangling tasks from input-output examples, without writing complex programs or scripts.

Unlike traditional program synthesis techniques that learn only from specifications (examples), a key theme of these systems is that they also learn from the input data itself. This makes learning significantly more efficient, reduces the number of examples required, and enables learning a richer class of data transformations. Moreover, these systems use probabilistic learning techniques to handle noisy and inconsistent data.
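To make the idea of learning from the input data concrete, here is a minimal toy sketch. It is not the algorithm used by BlinkFill, SemFill, or FlashFill; the tiny "split on a delimiter and pick a token" language, the names `DELIMITERS`, `INDICES`, `run`, and `synthesize`, and the sample column are all assumptions made up for illustration. It shows one way unlabeled inputs can remove ambiguity that a single example leaves open.

```python
from itertools import product

# Hypothetical toy language: a program is (delimiter, index), meaning
# "split the input on the delimiter and return the token at that index".
DELIMITERS = [" ", ",", "-"]
INDICES = [-3, -2, -1, 0, 1, 2]


def run(program, text):
    """Apply a program to a string; return None if the index does not exist."""
    delimiter, index = program
    parts = text.split(delimiter)
    try:
        return parts[index]
    except IndexError:
        return None


def synthesize(examples, column):
    """Return every program that matches all input-output examples and is
    also defined on every (unlabeled) input in the column."""
    candidates = []
    for program in product(DELIMITERS, INDICES):
        matches_examples = all(run(program, inp) == out for inp, out in examples)
        defined_on_data = all(run(program, inp) is not None for inp in column)
        if matches_examples and defined_on_data:
            candidates.append(program)
    return candidates


column = ["Alice Mary Jones", "Bob Lee", "Carol Ann Diaz"]
examples = [("Alice Mary Jones", "Jones")]

# Learning from the example alone leaves two candidate programs.
examples_only = synthesize(examples, column=[])
# Also consulting the input column rules out the one that fails on "Bob Lee".
with_data = synthesize(examples, column)

print(examples_only)
print(with_data)
```

With only the example, "third token" and "last token" are both consistent. The two-token row "Bob Lee" has no third token, so the input data alone settles on "last token" without asking the user for a second example.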

BlinkFill
Semi-supervised learning of data transformations from both input-output examples and the input data. 1000x faster than FlashFill and learns richer transformations!
Publications [VLDB16]
Demo Live Interactive Version
SemFill
Semantic data type (date, name, phone number, address, etc.) transformations in Excel. Probabilistic learning to handle noisy and inconsistent data.
Publications [POPL16]
Demo Live Interactive Version
FlashFill
Helps end-users perform string manipulation tasks in Microsoft Excel using input-output examples.
Publications [CAV12], [VLDB12], [CACM12], [CAV15]
Press [MIT News], [CNET], [CNN Money], [Wired], [More]
Demo Video1, Video2, Video3, Try it out in Excel 2013!