Data wrangling is the tedious process of cleaning, transforming, reshaping, and integrating data so that it becomes usable for analysis. Program synthesis techniques can play a revolutionary role in making
these data wrangling tasks simpler and more accessible to data scientists and end-users. The systems below aim to help them perform data wrangling tasks easily from input-output examples,
without having to write complex programs or scripts.
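To make "programming by example" concrete, here is a toy synthesizer in Python. It is a hypothetical sketch, not FlashFill's algorithm or DSL: a program is a concatenation of at most three "atoms" (the i-th alphanumeric token of the input, or a separator string that appears in the example outputs), and the search simply enumerates programs by length and returns the first one consistent with every example. The names `tokens`, `run`, `candidate_atoms`, and `synthesize` are illustrative assumptions.

```python
import itertools
import re
from typing import List, Optional, Sequence, Tuple

# An atom is either ("tok", i): the i-th alphanumeric token of the input,
# or ("const", text): a literal separator. This toy DSL is hypothetical.
Atom = Tuple[str, object]
Example = Tuple[str, str]


def tokens(s: str) -> List[str]:
    """Split the input into runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", s)


def run(program: Sequence[Atom], s: str) -> Optional[str]:
    """Evaluate a program on one input; None means it is undefined there."""
    parts = []
    toks = tokens(s)
    for kind, val in program:
        if kind == "const":
            parts.append(val)
        elif -len(toks) <= val < len(toks):
            parts.append(toks[val])
        else:
            return None  # token index out of range for this input
    return "".join(parts)


def candidate_atoms(examples: Sequence[Example]) -> List[Atom]:
    """Token positions plus separators that appear in the example outputs."""
    atoms: List[Atom] = [("tok", i) for i in (0, 1, 2, -1)]
    separators = set()
    for _, out in examples:
        separators.update(re.findall(r"[^A-Za-z0-9]+", out))
    atoms += [("const", c) for c in sorted(separators)]
    return atoms


def synthesize(examples: Sequence[Example], max_len: int = 3) -> Optional[List[Atom]]:
    """Return the first (shortest) program consistent with every example."""
    atoms = candidate_atoms(examples)
    for n in range(1, max_len + 1):
        for program in itertools.product(atoms, repeat=n):
            if all(run(program, inp) == out for inp, out in examples):
                return list(program)
    return None


if __name__ == "__main__":
    prog = synthesize([("John Smith", "Smith, John")])
    print(prog)                       # [('tok', 1), ('const', ', '), ('tok', 0)]
    print(run(prog, "Ada Lovelace"))  # Lovelace, Ada
```

A single example is enough here because the toy DSL is tiny; real systems search far larger program spaces and must rank many consistent candidates.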
Unlike traditional program synthesis techniques that learn only from the specification (the examples), a key theme of these systems is that they also learn from the input data.
This makes learning significantly more efficient, so fewer examples are needed, and it also enables
learning a richer class of data transformations. Moreover, these systems use probabilistic learning techniques to handle noisy and inconsistent data.
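Why the input data helps can be illustrated with a deliberately tiny example. This is a hypothetical heuristic, not BlinkFill's actual method: a "program" is just "take the token at position i", and among the programs consistent with the examples we prefer the one that is defined on the most unlabeled input rows. From the single example `Dr. John Smith` → `Smith`, both "third token" and "last token" fit; the unlabeled rows break the tie.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Hypothetical heuristic for illustration only, not BlinkFill's algorithm.


def tokens(s: str) -> List[str]:
    """Split the input into runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", s)


def extract(index: int, s: str) -> Optional[str]:
    """Toy program 'take token `index`'; None if the input has no such token."""
    toks = tokens(s)
    if -len(toks) <= index < len(toks):
        return toks[index]
    return None


def consistent_programs(
    examples: Sequence[Tuple[str, str]],
    indices: Sequence[int] = (0, 1, 2, -1, -2, -3),
) -> List[int]:
    """All token positions that reproduce every example output."""
    return [i for i in indices if all(extract(i, inp) == out for inp, out in examples)]


def rank_with_inputs(programs: Sequence[int], unlabeled: Sequence[str]) -> List[int]:
    """Prefer programs defined on more unlabeled input rows (stable on ties)."""
    def coverage(i: int) -> int:
        return sum(extract(i, s) is not None for s in unlabeled)
    return sorted(programs, key=coverage, reverse=True)


examples = [("Dr. John Smith", "Smith")]
unlabeled = ["Dr. Ada Lovelace", "Alan Turing", "Grace Hopper"]

candidates = consistent_programs(examples)         # [2, -1]: "third token", "last token"
best = rank_with_inputs(candidates, unlabeled)[0]  # -1: defined on all three rows
print([extract(best, s) for s in unlabeled])       # ['Lovelace', 'Turing', 'Hopper']
```

Without the unlabeled rows, a learner would need a second example (such as one without a title) to rule out the "third token" program.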
BlinkFill
Semi-supervised learning of data transformations from both input-output examples and the input data. 1000x faster than FlashFill and learns richer transformations!
SemFill
Transformation of semantic data types (dates, names, phone numbers, addresses, etc.) in Excel. Probabilistic learning to handle noisy and inconsistent data.
FlashFill
Helps end-users perform string manipulation tasks in Microsoft Excel using input-output examples.
Publications: [CAV12], [VLDB12], [CACM12], [CAV15]
Press: [MIT News], [CNET], [CNN Money], [Wired], [More]
Demo: Video1, Video2, Video3, Try it out in Excel 2013!
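The kind of noisy, inconsistent semantic data that SemFill targets can be loosely illustrated for dates. The sketch below is hypothetical and much simpler than SemFill's probabilistic learning: every cell "votes" for each candidate format it parses under, formats are tried in order of column-wide support (so an unambiguous row such as 25/12/2015 settles how 03/04/2015 is read), and cells that parse under no format are flagged as `None` instead of guessed. The format list and the name `normalize_dates` are assumptions.

```python
from collections import Counter
from datetime import datetime
from typing import List, Optional

# Hypothetical candidate formats; a real system would consider far more.
CANDIDATE_FORMATS = ["%m/%d/%Y", "%d/%m/%Y", "%Y-%m-%d", "%d %b %Y"]


def parses(value: str, fmt: str) -> bool:
    """True if `value` is a valid date under `fmt`."""
    try:
        datetime.strptime(value.strip(), fmt)
        return True
    except ValueError:
        return False


def normalize_dates(column: List[str], out_fmt: str = "%Y-%m-%d") -> List[Optional[str]]:
    """Rewrite a column of mixed-format dates into one format (toy voting scheme)."""
    votes: Counter = Counter()
    for value in column:
        for fmt in CANDIDATE_FORMATS:
            if parses(value, fmt):
                votes[fmt] += 1
    # Try formats in decreasing order of column-wide support; ties keep list order.
    ranked = sorted(CANDIDATE_FORMATS, key=lambda f: votes[f], reverse=True)
    result: List[Optional[str]] = []
    for value in column:
        for fmt in ranked:
            if parses(value, fmt):
                result.append(datetime.strptime(value.strip(), fmt).strftime(out_fmt))
                break
        else:
            result.append(None)  # noisy or unparseable cell: flag it, don't guess
    return result


print(normalize_dates(["03/04/2015", "25/12/2015", "2015-07-01", "14 Feb 2016", "N/A"]))
# ['2015-04-03', '2015-12-25', '2015-07-01', '2016-02-14', None]
```

Counting votes over the whole column is one simple way to let the input data, not just a single example, resolve day/month ambiguity.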