We give confidence intervals without correction for multiple testing. We used the χ² test to test for differences in proportions. To assess the impact of non-response we used standard methods of multiple imputation, which assume that the data are missing at random.¹² We also investigated how much lower the (unobserved) mean for non-responders would have to be than the (observed) mean for responders to remove any intervention effect.¹³

Agreement was good between pairs of raters assessing the quality of reviews (intraclass correlation coefficient for total review quality instrument score 0.65) and the number of deliberate major errors identified (intraclass correlation coefficient 0.91).
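Two of the calculations described above can be made concrete with a short sketch. This is not the authors' analysis code; the function names, the 2×2 table layout and the example numbers are hypothetical. `chi2_two_proportions` computes the Pearson χ² statistic (without continuity correction) for comparing two proportions. `tipping_point_shift` illustrates the sensitivity analysis under a deliberately simple assumption: only the intervention arm has missing outcomes, and the control arm mean is treated as fixed. It returns how far below the responders' mean the non-responders' mean must lie for the intervention arm mean to fall to the control mean.

```python
import math


def chi2_two_proportions(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Hypothetical layout: rows are groups, columns are outcome yes/no.
        group 1: a yes, b no
        group 2: c yes, d no
    Returns (chi-squared statistic, p value on 1 degree of freedom).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def tipping_point_shift(mean_resp, n_resp, n_nonresp, mean_control):
    """Shift below the responders' mean at which the intervention effect vanishes.

    Simplifying assumption (hypothetical): only the intervention arm has
    non-responders, and the control arm mean is held fixed. The arm mean is
    (n_resp * mean_resp + n_nonresp * (mean_resp - delta)) / n_total; setting
    it equal to mean_control and solving for delta gives the formula below.
    """
    if n_nonresp <= 0:
        raise ValueError("need at least one non-responder")
    n_total = n_resp + n_nonresp
    return (mean_resp - mean_control) * n_total / n_nonresp


if __name__ == "__main__":
    chi2, p = chi2_two_proportions(30, 70, 50, 50)
    print(round(chi2, 3))  # 8.333
    print(tipping_point_shift(3.0, 80, 20, 2.5))  # 2.5
```

In the made-up example, 80 responders have a mean of 3.0 and the control mean is 2.5. The 20 non-responders would need a mean 2.5 points lower (0.5) before the intervention arm's mean matched the control arm's. A shift that is implausibly large relative to the scale suggests the finding is robust to non-response. A full analysis would vary the shift in both arms and combine it with the imputation model.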

