Policy Forum Data Cleaning: Detecting, Diagnosing, and Editing Data Abnormalities
Contributor(s): The Pennsylvania State University CiteSeerX Archives
Abstract: In clinical epidemiological research, errors occur in spite of careful study design, conduct, and implementation of error-prevention strategies. Data cleaning intends to identify and correct these errors or at least to minimize their impact on study results. Little guidance is currently available in the peer-reviewed literature on how to set up and carry out cleaning efforts in an efficient and ethical way. With the growing importance of Good Clinical Practice guidelines and regulations, data cleaning and other aspects of data handling will emerge from being mainly gray-literature subjects to being the focus of comparative methodological studies and process evaluations. We present a brief summary of the scattered information, integrated into a conceptual framework aimed at assisting investigators with planning and implementation. We recommend that scientific reports describe data-cleaning methods, error types and rates, error deletion and correction rates, and differences in outcome with and without remaining outliers.