IBM launches open-source tool to help with COVID-19 data analysis
IBM's Center for Open-Source Data and AI Technologies (CODAIT) is releasing a new toolkit that helps developers and data scientists answer questions about the pandemic.
The toolkit, called COVID notebooks, is designed to help with four tasks: obtaining authoritative data on the current status of the outbreak, cleaning up the most serious data-quality problems, collating the data into a format amenable to easy analysis with tools like Pandas and Scikit-Learn, and building an initial set of example reports and graphs.
Taking care of these tasks frees developers and data scientists to focus on advanced analysis and modeling tasks instead of worrying about things like data formats and data cleaning. The repository uses developer-friendly Jupyter notebooks to cover each of the initial data analysis steps.
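As a rough illustration of the kind of cleaning and collating step described above, here is a minimal Pandas sketch. It is not taken from the COVID notebooks repository: the column names, the `clean_cumulative` helper, and the numbers are all hypothetical, and the cleaning rules (forward-filling gaps and forcing cumulative counts to be non-decreasing) are one plausible choice rather than the toolkit's actual method.

```python
import pandas as pd

# Synthetic, made-up values for illustration only (not real outbreak data).
# Each row is a cumulative confirmed-case count for a region on a date.
raw = pd.DataFrame({
    "region": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": ["2020-04-01", "2020-04-02", "2020-04-03", "2020-04-04"] * 2,
    "confirmed": [10, 15, 12, 20, 5, None, 9, 9],
})


def clean_cumulative(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: repair common problems in cumulative counts."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out.sort_values(["region", "date"]).reset_index(drop=True)

    # Fill missing reports with the last known value for the same region.
    out["confirmed"] = out.groupby("region")["confirmed"].ffill()

    # A cumulative count should never go down; replace dips with the
    # running maximum within each region.
    out["confirmed"] = out.groupby("region")["confirmed"].cummax()

    # Derive daily new cases from the repaired cumulative series.
    out["new_cases"] = out.groupby("region")["confirmed"].diff().fillna(0)
    return out


cleaned = clean_cumulative(raw)

# Collate into a wide table (one row per date, one column per region),
# a layout that is convenient for plotting or feeding into a model.
wide = cleaned.pivot(index="date", columns="region", values="confirmed")
print(wide)
```

Whether a dip in a cumulative series should be smoothed over like this or kept as a genuine correction depends on the data source, so a real pipeline would likely log such repairs rather than apply them silently.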
The repository also includes data processing pipelines built with the Elyra Notebook Pipelines Visual Editor and Kubeflow Pipelines.
"For data scientists and policy makers who are analyzing the effects of COVID-19 and trying to come up with actionable plans based on data, the information landscape is overwhelming," says Frederick Reiss, chief architect at IBM's Center for Open-Source Data and AI Technologies. "A near-constant flow of data from research studies, news outlets, social media, and health organizations make the task of analyzing data into useful action nearly impossible. Developers and data scientists need answers to their questions about data sources, tools, and how to draw meaningful and statistically valid conclusions from the ever-changing data."