Adani President's Distinguished Professor and Department Chair, Computer and Information Science Department
University of Pennsylvania
How do we promote large-scale data science and data sharing, e.g., in the sciences or across organizations? Many modern data science applications leverage data lakes: schema-agnostic repositories of data files and data products that offer limited organization and management capabilities. We need a new generation of data science environments that build on data lakes so that scientists and analysts can find the tables, schemas, workflows, and datasets useful for the task at hand. Juneau incorporates search and management solutions into the Jupyter Notebook data science platform, enabling scientists to augment training data, find potential features to extract, clean data, and find joinable or linkable tables. Our core methods also generalize to other settings where computational tasks involve the execution of programs or scripts.
Email Dena Peacock for the Zoom link.