Data dredging and masking

12/27/2023

Data dredging is also referred to as fishing, p-hacking, significance chasing or data snooping. If you run many repeated statistical tests (multiple comparisons) on a data set, some will come back statistically significant by chance alone. Such results may not reflect a true relationship: they are spurious, and any correlation found is due to chance. It is now common practice to register clinical trials and specify in advance what the primary endpoints and hypotheses are, to avoid the bias of data dredging. Another solution to the problem of data dredging is to use the Bonferroni correction, which divides the significance threshold by the number of tests performed.
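The multiple-comparisons arithmetic and the Bonferroni correction can be sketched in Python as follows. This is a minimal illustration rather than any particular library's method: the function names and the p-values are hypothetical, and the family-wise error formula assumes the tests are independent and that every null hypothesis is true.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests
    when every null hypothesis is true (assumption: independence)."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that passes the Bonferroni threshold alpha / m,
    where m is the number of tests performed."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values from five tests on the same data set.
p_values = [0.003, 0.04, 0.20, 0.012, 0.65]

# Naive reading: compare every p-value to 0.05 on its own.
naive = [p <= 0.05 for p in p_values]

# Bonferroni reading: compare every p-value to 0.05 / 5 = 0.01.
corrected = bonferroni_significant(p_values)

print("naive:    ", naive)
print("corrected:", corrected)
print("FWER for 20 tests:", round(familywise_error_rate(20), 3))
```

With these made-up numbers, three tests pass the naive 0.05 threshold but only one survives the correction. The Bonferroni correction is conservative, so it lowers false positives at the cost of missing some real effects.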

Data dredging is the cherry-picking of multiple statistical tests on a data set to demonstrate a promising or attractive finding. This typically happens when a data set is examined too many times with many statistical tests, and only those results that come back with statistical significance are reported or given attention. This leads to a spurious excess of false-positive, statistically significant results.

Data masking, also known as data de-identification, replaces sensitive values with algorithmically determined values. By replacing key parts of the sensitive data, masked data can be separated from identifiable persons or rendered useless for malicious purposes.
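As an illustration of data masking by replacing sensitive values with algorithmically determined ones, here is a minimal Python sketch. It uses keyed hashing (HMAC-SHA256 from the standard library) as one possible technique; the post does not name a specific algorithm, and the field names, the `ID-` prefix, the 12-character truncation and the key are all hypothetical choices made for the example.

```python
import hashlib
import hmac


def mask_value(value, key):
    """Replace a sensitive value with a deterministic token derived from a
    secret key. The same input and key always yield the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "ID-" + digest[:12]  # truncated for readability (illustrative)


def mask_record(record, sensitive_fields, key):
    """Return a copy of the record with the listed fields masked and all
    other fields left unchanged."""
    return {
        field: mask_value(str(value), key) if field in sensitive_fields else value
        for field, value in record.items()
    }


# Hypothetical record and key, for illustration only.
secret_key = b"example-key-keep-this-out-of-source-control"
patient = {"name": "Jane Doe", "email": "jane@example.com", "age": 42}

masked = mask_record(patient, {"name", "email"}, secret_key)
print(masked)
```

Because the token is deterministic, records belonging to the same person can still be linked across tables after masking, while the original value cannot be read back without the key. Fields left unmasked, such as age here, may still help re-identify someone in combination, so a real deployment would need to consider them as well.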
