AlphaClean: Automatic Generation of Data Cleaning Pipelines

26 Apr 2019 · Sanjay Krishnan · Eugene Wu

Analyst effort in data cleaning is gradually shifting away from designing hand-written scripts and toward building and tuning complex pipelines of automated data cleaning libraries. Hyper-parameter tuning for data cleaning differs substantially from hyper-parameter tuning for machine learning, since the pipeline components and objective functions have structure that tuning algorithms can exploit...
