Scientific evidence must be evaluated by forensic experts through a peer review process, which the courts often rely on to determine the validity of scientific methods. Similarly, you must carefully evaluate the validity of the material supporting your work. For this assignment, you must use at least three Scholarly, Peer-Reviewed, and Other Credible Sources in addition to the course text. You may also want to review the recommended resources, which may further support your work on this written assignment.
In your paper, address the following:
- Evaluate the evolution of forensic science.
- Identify examples of scientific methods that have been disproven.
- Explain the peer review process.
- Compare and contrast common perceptions to the realities of forensic science.
- Explain the CSI effect.
- Evaluate what impact the CSI effect has or does not have on the forensic field and the criminal justice system.
- Evaluate the impact of junk science, real or perceived, on the forensic field and criminal justice.
The “Is All Good and True?” paper
- Must be 750 words in length (not including title and references pages) and formatted according to APA style as outlined in the Ashford Writing Center’s APA Style resource.
- Must include a separate title page with the following:
  - Title of paper
  - Student’s name
  - Course name and number
  - Instructor’s name
  - Date submitted
- For further assistance with the formatting and the title page, refer to APA Formatting for Word 2013.
- Must utilize academic voice. See the Academic Voice resource for additional guidance.
- Must include an introduction and conclusion paragraph. Your introduction paragraph needs to end with a clear thesis statement that indicates the purpose of your paper.
- For assistance on writing Introductions & Conclusions as well as Writing a Thesis Statement, refer to the Ashford Writing Center resources.
- Must use at least three scholarly, peer-reviewed, and/or credible sources in addition to the course text.
Review Article
Peer review in forensic science
Kaye N. Ballantyne a,b,*, Gary Edmond c, Bryan Found a,c
a Office of the Chief Forensic Scientist, Victoria Police Forensic Services Department, Macleod, Victoria, Australia
b School of Psychology and Public Health, La Trobe University, Bundoora, Victoria, Australia
c Program in Expertise, Evidence and Law, Faculty of Law, University of New South Wales, Kensington 2052, Australia
Article info
Article history: Received 24 November 2016; received in revised form 3 April 2017; accepted 17 May 2017; available online 25 May 2017
Keywords: Peer review; Verification; Cognitive factors; Expert evidence; Report
Abstract
Peer review features prominently in the forensic sciences. Drawing on recent research and studies, this article examines different types of peer review, specifically: editorial peer review; peer review by the scientific community; technical and administrative review; and verification (and replication). The article reviews the different meanings of these quite disparate activities and their utility in relation to enhancing performance and reducing error. It explains how forensic practitioners should approach and use peer review, as well as how it should be described in expert reports and oral testimony. While peer review has considerable potential, and is a key component of modern quality management systems, its actual value in most forensic science settings has yet to be determined. In consequence, forensic practitioners should reflect on why they use specific review procedures and endeavour to make their actual practices and their potential value transparent to consumers, whether investigators, lawyers, jurors or judges. Claims that review increases the validity of a scientific technique or the accuracy of opinions within a particular case should be avoided until empirical evidence is available to support such assertions.
© 2017 Elsevier B.V. All rights reserved.
Contents
1. Introduction
1.1. What is peer review?
1.1.1. Editorial peer review
1.1.2. Peer review by the scientific community
1.1.3. Technical and administrative peer review
1.1.4. Verification – checking
1.1.5. Verification – replication
1.2. Standards and guidelines around peer review in forensic science
1.3. Effectiveness of peer review & verification
1.3.1. Editorial peer review
1.3.2. Technical and administrative review
1.3.3. Verification
1.4. Designing fit for purpose peer review systems
1.4.1. Defining the purpose of the review
1.4.2. Detailing sources of error
1.4.3. Blinded peer review and cognitive factors
1.4.4. Selection of reviewers
1.4.5. Maximising efficiency and effectiveness
1.4.6. Reporting peer review outcomes
1.5. Peer review and the courts
1.6. Peer review taxonomy – a guide to effectiveness and accuracy claims
1.7. Conclusions and recommendations
Acknowledgements
References

* Corresponding author at: Office of the Chief Forensic Scientist, Victoria Police Forensic Services Department, Macleod, Victoria, Australia.

Forensic Science International 277 (2017) 66–76. http://dx.doi.org/10.1016/j.forsciint.2017.05.020

1. Introduction
Peer review is one of the central components of the scientific framework underpinning the publication process in journals, the awarding of grants and honours, and the promotion of academics. It has long been held up as the premier approach to ensure the validity of methods and conclusions, to detect errors and fraud, and to improve the quality of learned papers. Courts have used peer review as an indicator of ‘good science’ and general acceptance within the relevant communities of experts, with landmark rulings such as Daubert and Kumho deeming peer review an important factor in determining whether a scientific method can be accepted as valid [1,2]. The forensic sciences have universally adopted peer review, most conspicuously verification, as an essential part of quality management and error mitigation systems.
Accrediting bodies have mandated case file review as part of standard quality control procedures, and professional societies have recommended the use of verification or review to ensure the soundness of conclusions drawn, and as a way of reducing error rates inherent in subjective methods.
Notwithstanding its long and widespread use, the value of peer review is frequently exaggerated, an outcome that may be the result of the variety of meanings attributed to the term. There is little evidence of the effectiveness of either peer review or verification. Among lawyers and forensic scientists there appears to be limited awareness of concerns about the ability of peer review, in any of its guises, to ensure methodological soundness or detect error and fraud. Indeed, it is seldom appreciated that in many of the high-profile cases of known erroneous identifications or miscarriages of justice, peer review and verification failed to detect the error (e.g. [3–5]). Likewise, independent reviews of problematic laboratories and units within the United States have indicated that technical review procedures were inadequate, non-existent or completely undocumented, or performed long after the report was issued, or that case file contents were so incomplete as to make a thorough review impossible [6–8].
There has been concern among many forensic scientists that the error rates cited in the PCAST report are inaccurate and unrepresentative of true casework error rates, because the black box studies lacked verification and review procedures. For example, the OSAC Friction Ridge Subcommittee Response to PCAST indicated that, as the quoted black box studies do not contain any verification, the error rate “is expected to be lower, perhaps to a substantial degree, than those values reflected by the PCAST”. Likewise, the Association of Firearm and Tool Mark Examiners (AFTE) regards the recommendation that court testimony refer to error rates from a single study performed on firearm examination as “irresponsible and inaccurate”, in part due to the lack of technical and quality review processes in this study. However, the claim that verification or review will lower error cannot be substantiated with empirical data in most disciplines, where error rates and distributions are unknown.
The risk of exaggerating the effectiveness of the various forms of peer review encountered across the forensic sciences is serious, and over-reliance on the practice to prevent errors may not be achieving the desired aims. We introduce a taxonomy of review and verification processes, applicable to both scientific publications and forensic opinions. Our aim is to encourage transparency in order to facilitate more reliable estimation of the ability of peer review to contribute to the accuracy of evidence produced by forensic practitioners.
1.1. What is peer review?
The term ‘peer review’ is used to describe a range of different practices, used for a variety of purposes. Scientific articles are commonly subjected to editorial (pre-publication) review, where works are scrutinised by knowledgeable peers from a relevant field. Forensic reports and statements are checked through a process of technical and administrative review, ostensibly to ensure the accuracy and completeness of the opinion and associated documentation. Verification, within the forensic sciences, might involve replication (i.e. independent analysis or re-analysis) or just a review of the original examiner’s analysis and opinion(s) to confirm the result and prevent erroneous opinions being reported. While all are collectively referred to as peer review, they have very different aims and involve different methods of review, and the evidence of their effectiveness varies. Below we describe the broad range of peer review applications in the context of both mainstream academic science and the forensic sciences, followed by an examination of the evidence for the effectiveness of the forensically relevant review types in relation to the aims of the process.
1.1.1. Editorial peer review
Within academic (and some commercial) scientific domains, peer review is primarily a checking process, where two or three individuals, knowledgeable in the field, scrutinise papers to determine if the methodology is sound and applied in an appropriate manner, if the data produced have been correctly analysed with suitable statistical tests, and if the conclusions and recommendations drawn are appropriate to the breadth and depth of the study [12,13]. In most disciplines, reviewers do not, and cannot, replicate experimental methods or data; they must use their professional and scientific expertise to determine if the documented experimental design, methods, results and conclusions appear valid [12,13].
In the majority of cases where review is for publication (or the award of research grants), reviewers spend less than 10 h reviewing submissions, with a median of 6 h across all disciplines. Whilst reviewers scrutinise technical attributes of the research, as well as scientific quality, clarity of presentation and ethical validity, the review process does not conclusively authenticate or endorse the validity of the particular methods and conclusions. Instead, editorial peer review, beginning with the Royal Society of Edinburgh in 1731, was intended to assist editors in the selection of manuscripts for publication, by distributing material to “those members who are most versed in these matters”. From the start, the ultimate responsibility for the integrity of the article lay with the author:
“Responsibility concerning the truth of facts, the soundness of reasoning, in the accuracy of calculations is wholly disclaimed: and must rest alone, on the knowledge, judgement, or ability of the authors who have respectfully furnished such communications”.
Despite the early start to editorial peer review, the practice was not formalised until the mid-20th century, with Science and The Journal of the American Medical Association (JAMA) not using independent reviewers until after 1940. From that time, however, peer review was formalised and institutionalised, and it is now standard practice for most scientific and medical journals. The majority of journals continue to rely on single-blind methods of review, where the reviewer is aware of the author’s identity.
However, the use of double-blind reviewing is growing, due to concerns of bias regarding author and institutional prestige. In recent years, post-publication review, where readers can comment on papers after publication, and open review, where reviewers’ names and comments are available to the author and journal readers, have been proposed and trialled, but have not been widely adopted [14,16].
Reviewers are generally unpaid volunteers, with editors soliciting reviews from individuals viewed as knowledgeable and capable in the subject matter. In general, the only prerequisite to becoming a reviewer is to have published in the area.
Little training on reviewing is provided to scientists, and many reviewers indicate that they would appreciate more guidance and support from editors and journals [14,16]. Notwithstanding these limitations, when surveyed, the vast majority of scientists believe that without peer review there would be less control in scientific communication, that peer review improves scientific communication, and that peer review has specifically improved their own manuscripts [14,16].
1.1.2. Peer review by the scientific community
Post-publication assessment of articles by the wider scientific community, although historically rare, is rapidly increasing in visibility. Traditional post-publication review by the scientific community was generally slow and fragmented. Assessments of the strengths and weaknesses of particular studies are most commonly embedded within subsequent articles and reviews, rather than in readily accessible commentaries. Citation rates, commonly used to evaluate the impact an article has within the community, are highly stochastic and influenced by numerous factors (including disagreement). Papers published in higher-status journals attract greater numbers of citations regardless of scientific merit, self-citations may inflate the perceived impact, and geographic origin and language can also influence citation decisions.
In recent years a new form of post-publication interaction has emerged, with commenting facilities appearing in multiple forums. Some journals allow readers to post comments on articles, although these facilities are rarely used. More commonly, commentary and robust discussion occur on online repositories and platforms such as F1000, PubMed Commons, ResearchGate and PubPeer [17,19,20]. Such forums allow open discussion of findings, methodological weaknesses and limitations of the research, from anonymous commentators (PubPeer), registered users (F1000, ResearchGate) or selected experts/peers (PubMed Commons). Discussions on such sites between experts and authors have uncovered serious methodological flaws in high-impact research, leading to retractions and the debunking of flawed science. However, most published articles do not receive critical analysis, and a good deal of engagement manifests as non-constructive criticism and ad hominem commentary. Further, discussions in online repositories and scientific social media are not readily searchable or linked to publications, thereby reducing their visibility to non-specialists.
1.1.3. Technical and administrative peer review
Unlike editorial peer review, technical and administrative reviews are used to check not the validity of new methodologies or theories, but the application of existing methods to forensic casework. Such reviews are intended to ensure that the examinations performed are apposite, that results and conclusions are accurate, and that the documentation is complete (e.g. Refs. [21–25]). For some types of report or analysis, there is a division between technical and administrative review, with different individuals conducting each type of review according to expertise and requirements. Individuals knowledgeable and competent in the technical aspects of procedures review case notes, charts, data records, calculations and photographs to ensure that appropriate investigations have been conducted, that the results are scientifically accurate and complete, and that any opinions tendered are sound and within the constraints of validated scientific knowledge. In contrast, the administrative review appraises the manner and form in which the results and opinions have been communicated, including sense, consistency and grammar.
1.1.4. Verification – checking
Within the context of forensic science, there is a further form of peer review: verification, or the replication of analysis by a second examiner. This most commonly occurs within the pattern identification disciplines, formalised within the ACE-V (Analysis, Comparison, Evaluation and Verification) process used by latent fingerprint, shoeprint and firearm/toolmark examiners.
However, there are no clear guidelines or standards on how verification is to be performed, even within the ACE-V process, and there appears to be considerable divergence across forensic science communities on the nature and purpose of verification.
Within the latent print community, verification has three objectives: “to examine the scientific validity of a reported conclusion, to examine the scientific validity of the methodology employed (ACE) to draw a conclusion, and to examine the ability of a conclusion to withstand scientific scrutiny”. In some laboratories or units verifiers may have complete access to the case file, or simply to the identity and decision of the initial analyst, and thus are not blinded to the opinion reached by the examiner.
Although blind verification has been proposed on numerous occasions, many latent print examiners believe that verification cannot be performed appropriately without assessment of all the information and features utilised by the original examiner [4,28–30]. Thus, within the ACE-V framework verification is not seen as a true replication, whereby an examiner independently evaluates and interprets the evidence. Instead, verification operates in a manner analogous to the more conventional peer review in academic science: a check on the analysis and conclusions drawn by the original examiner.
1.1.5. Verification – replication
Formal replication of analysis and interpretation within cases seems to occur on a relatively ad hoc basis, and differs between cases, disciplines and laboratories. Some laboratories have formalised programs for replication in all cases, or in cases judged to be complex or difficult. This may be through the application of what is effectively an ACE-ACE procedure, where two examiners independently perform a complete analysis, comparison and evaluation, followed by revelation and, if necessary, harmonisation of opinions (e.g. Refs. [4,31]). However, in the authors’ experience, most laboratories or examiners have a more piecemeal approach to replication. In some instances, replications are conducted as part of the technical review stage, where the reviewer is not blinded to the outcome achieved by the original examiner, and are thus analogous to a verification check. In others, replication may manifest as an informal check, where the examiner presents material to another authorised examiner for a second opinion, without providing their conclusions. In such instances the second examiner effectively replicates the key stages of analysis (such as determining the number of contributors for a mixed DNA profile, observing the correspondence between paint layers, or examining handwriting samples for indications of forgery or disguise), without having to replicate all preceding preparation and handling steps.
1.2. Standards and guidelines around peer review in forensic science
The practice of reviewing work products is mandated within forensic science by numerous standards, codes of conduct and advisory bodies. The ISO/IEC 17025 Standard “General requirements for the competence of testing and calibration laboratories”, to which many forensic laboratories are accredited, does not formally require any form of review of reports or opinions, but does specify the need for periodic audits of all elements of the management system, including testing and/or calibration activities. These audits must confirm the effectiveness of operations, the validity of a laboratory’s tests and the quality control activities undertaken.
Because ISO/IEC 17025 is necessarily broad, accreditation providers have developed Forensic Application documents, translating the Standard’s requirements to the particular needs of forensic casework. As these are interpretative criteria and recommendations, approaches vary among the different accrediting bodies, as do their more specific standards regarding peer review and verification requirements. The ILAC-G19:08/2014 Modules in a Forensic Science Process recommends “critical findings checks”. If this check is the only means of quality control on the results, such as for blood pattern analysis, footwear comparison or damage interpretation, then the check must be performed in a blinded manner without knowledge of the original findings. In addition, any interpretative opinions or observations must be reviewed. If deemed necessary, reports should also be technically reviewed by a competent authorised person. The National Association of Testing Authorities (NATA, Australia) requires 100% of case files to be both technically and administratively reviewed unless risk assessments have been completed for reducing this percentage. ASCLD/LAB (USA) suggests that the sampling rate for technical review may vary depending on requirements; for example, new analysts may have all of their cases reviewed, whereas senior analysts may have only a few cases per month reviewed. Administrative reviews are required for every report. Neither NATA nor ASCLD/LAB specifies requirements for verification or replication of results for critical findings, nor does either mention blinding of the reviewer. Thus, depending on the accrediting body, forensic laboratories may have very different standards and requirements for peer review and verification of results, opinions and reports.
1.3. Effectiveness of peer review & verification
1.3.1. Editorial peer review
Given the ubiquity of peer review across scientific fields, it is perhaps surprising that very little data exist regarding the effectiveness of the various processes. It is only in recent years that empirical testing of editorial peer review has been conducted.
The results suggest that the practice encounters real problems in achieving its aims of enhancing the validity and accuracy of methods and results. With over a million articles reviewed and published per year across all disciplines, along with hundreds of thousands of grant applications and proposals, peer review represents a significant investment of researchers’ time and effort, but for what return? Empirical studies suggest not only that there is bias amongst reviewers, but also that the detection of errors, methodological flaws and fraud is low within current review systems [36–38].
As the vast majority (estimates of around 85%) of reviews are single-blind, where the reviewer knows the identity of the author, peer reviewers are subject to a range of social and cognitive biases, which have been shown to affect the acceptance and subsequent publication of scientific articles. Authors from prestigious institutions are more likely to have papers accepted than those from less prestigious organisations. Reviewers also tend to exhibit high levels of conservatism, with prejudice against work that is ground-breaking or innovative [40–42], or where data contradict established theories and views held by the reviewer [40,43]. Reviewers nominated by authors give significantly more favourable recommendations than editor-nominated reviewers [44–49], suggesting a positive bias towards acceptance.
The primary issue with peer review is simply that there is little evidence that it actually works, i.e. does what is claimed or intended. The available evidence suggests that the widespread perception of review validating (or warranting) new scientific developments does not match what current review systems are achieving. A Cochrane Collaboration review of editorial peer review in biomedical journals concluded that there is little systematic empirical evidence to support its use. Although there was evidence to suggest that peer review makes papers more readable and improves general quality, study validity was generally only improved if a statistical reviewer was added, or if reviewer attention was specifically directed to methodology and statistical issues through the use of checklists [50–52]. Analysis of post-publication reviewer reports for the CrimeSolutions.gov evidence-based policing information repository revealed that five factors of scientific validity (design quality, null findings, negative results, program fidelity and conceptual framework) accounted for only 5% of the variation in reviewer scores, even though the reviewers are carefully selected to be experts in evidence-based policing, are provided with substantial training prior to becoming reviewers, and evaluate studies using an “elaborate set of rubrics”. This study suggests that important indices of scientific quality are not being heavily used by trained and motivated reviewers, who are instead using more intangible or inaccurate proxies to judge an article’s quality.
Analysis of published articles has found serious errors in statistical methodologies, over-claiming and inappropriate analyses in published, peer-reviewed manuscripts, suggesting the need for improvement in statistical review. Further analysis of errata published in the top science journals Nature, Science and PNAS reveals a surprising error rate, with 3.6–6.4% of papers requiring post-publication correction. These errors and oversights were not detected by authors, peer reviewers or editors. Of these, 18% were considered severe enough to require substantial modification to figures or re-interpretation of results. These figures reflect only errors that were publicly corrected, and most likely underestimate the true error rate amongst published papers.
The prevalence of statistical errors and the lack of focus on methodology are consistent with the few available empirical studies on the accuracy and reliability of review. Studies using fictitious manuscripts with deliberately inserted errors have found that reviewers tend to miss major issues. For a manuscript containing 10 major and 13 minor errors, 68% of reviewers did not identify that the conclusions were not supported by the data, and 95.5% did not detect incorrect calculations. Unsurprisingly, 41% of the reviewers recommended publication, or publication following revision, having detected only 1.7–3.0 of the 10 major errors. Similar studies have found comparable failures to detect errors in methodology, with one study finding that only 10% of reviewers were able to find at least half of the major errors introduced, and another finding that reviewers detected on average 2.58 of nine introduced major errors and 0.91 of five introduced minor errors. The non-detection of statistical errors, inappropriate analyses and flawed presentation of results seems to be the most common failure in reviewing.
It is thus perhaps unsurprising that the inter-rater reliability of peer reviewers has been found to be low, with single-rater reliabilities between 0.19 and 0.54 for journal articles and 0.15 for the quality of grant proposals, well below the more acceptable levels of 0.8–0.9. Using reviewers specifically trained in statistics or epidemiology increases the detection of errors when reviewing medical trial literature [38,50,56], but general training in reviewing articles has not generated significant improvements in quality or error detection in the long term [59–61], nor has mentoring or specific feedback from editors.
Specialist forensic science journals present particular challenges for peer review. Many discipline-specific publications, such as the Association of Firearm and Toolmark Examiners (AFTE) Journal, the Journal of the American Society of Questioned Document Examiners (ASQDE) and the Journal of the International Association of Blood Pattern Analysis (IABPA), restrict reviewers to editorial board members or association members, who in most instances must be experienced members of the discipline. External reviewers, such as academic scientists, statisticians or psychologists versed in human factors, tend not to be utilised. As such, concerns have been raised regarding the independence of the review process for such journals, and the extent to which articles are evaluated by individuals qualified and versed in experimental design, research methodology and statistical evaluation.
1.3.2. Technical and administrative review
No published, empirically derived reports exist regarding the ability of technical and administrative reviews to detect errors, enhance accuracy or improve the communication of opinions and results, despite their mandated use. Anecdotally, examiners regard technical review as valuable in confirming that standard operating procedures were followed, that case file contents are complete, and that documentation of examinations, methods and findings is sufficient. The review process might plausibly detect errors or omissions for analytical procedures such as DNA or illicit drug analysis, where the methods used are fully documented and the features used to form opinions can be accurately described and compared to well-characterised reference material and databases.
However, there are few reasons to be confident that technical review can increase the accuracy of the cognitive forensic sciences, i.e. feature comparison procedures. In particular, analyses in which the features relied upon, and the weight assigned to them, are not explicitly defined and explained by the primary analyst are difficult to review for accuracy and validity, as the decision-making process is unclear to the reviewer. In such instances, the reviewer must re-analyse the evidence (thus performing a verification), finding their own features and drawing their own conclusions, to determine whether the primary examiner has reached an appropriate conclusion. If such re-analysis does not occur, and the rationale behind the opinion is not explicit, the reviewer can only determine whether procedures have been followed and whether the case file documentation is complete. In these circumstances they cannot assess the accuracy of the opinion.
There are also few standards and guidelines regulating documentation of analyses and case file contents, and no standards or training on how to conduct technical reviews, what reviewers should check, and to what level any disputes or disagreements should be documented. Many of the accreditation and guideline documents (e.g. [25,33,34,65,66]) require that organisations develop written procedures detailing the types of cases which require review, the personnel authorised to conduct reviews, the percentage of cases requiring review, and the processes to be followed in the event of a disagreement between examiner and reviewer. However, no detailed guidance exists on how to develop these procedures, or how to measure empirically whether the reviews fulfil the stated aim of enhanced accuracy.
1.3.3. Verification
Few studies have been performed on the ability of the various forms of verification to detect errors, and to our knowledge they exist only within the fingerprint comparison domain. The first study on the effectiveness of verification for latent print comparison, performed in a training environment, found that verifiers were largely able to detect erroneous identifications (1 miss amongst 50 judgements, by a trainee examiner), but misjudged some correct identifications as non-matches (9 false negatives amongst 200 judgements). Within a casework environment with six qualified examiners, a controlled study of the ACE and ACE-V processes demonstrated consensus rates of 94% between examiners’ decisions, with non-blinded verifiers correctly detecting all nine ACE false positives, but none of the six false negative opinions. Langenburg et al., while demonstrating the potential for contextual information to influence fingerprint examiners, showed that inconclusive responses increased when examiners were provided with the opinion of another examiner, for both correct and incorrect original judgements, but errors did not.
The largest study to date on the accuracy and reliability of fingerprint comparisons obtained a false positive rate of 0.1%, with all errors occurring on different comparisons. False negative decisions were non-randomly distributed, such that verification may not detect all such errors; an estimated 0.85% of false negatives would not be detected during verification.
Based on studies conducted to date, verification appears to be a useful tool for reducing erroneous opinions. However, empirical estimates of the effectiveness of verification, compared with single-examiner error rates, are available only for fingerprint examination. Other disciplines, such as footwear examination, bloodstain pattern analysis, firearms and toolmark examination or even DNA profile interpretation, do not have empirical evidence of error rates from large, well-designed and controlled studies, and have yet to examine the potential (or otherwise) of either blind or non-blind verification. As such, claims of increased accuracy or validity through the use of verification (or indeed technical and administrative review) cannot be evaluated.
1.4. Designing fit-for-purpose peer review systems
Given the lack of empirical evidence on the effectiveness of peer review in its various forms, it is instructive to consider how new systems may be designed to help fulfil the aims of review: ensuring the validity and accuracy of scientific analyses conducted and reported within forensic science domains. As extensive discussions and preliminary trials exist on the reform of editorial peer review (e.g. [36,38,40,56,58,60–62,71]), our discussion will be limited to technical and administrative review and verification.
1.4.1. Defining the purpose of the review
It is imperative, when using or referring to peer review and verification, to consider the purpose of the proposed review, its capabilities, and the claims that are made in relation to it. This is explored further below in the taxonomy of forensic science reviews, but the importance of matching the claims to the effectiveness of review cannot be overstated. If, for example, a laboratory desires an administrative review system that ensures the completeness and correctness of case file contents, then this can be achieved via comprehensive checklists for reviewers to follow. In such circumstances, there is no need for re-analysis of the evidence, and claims about the process enhancing reliability or accuracy should be avoided. However, if claims of increased accuracy of opinion or decreased risk of error are made, procedures must be in place to confirm appropriate evidence handling, development and use of case-specific and appropriate propositions, evidence analysis in line with validated protocols, and interpretation and weighting of the opinion consistent with logical norms and the limits of validated procedures. Simply confirming that the case file contents are complete does not support increased accuracy, especially where the procedure has not been demonstrated to be foundationally valid. The factors that must be considered in order to meet the aims of peer review, as currently embodied in standards and guidelines, are discussed below.
1.4.2. Detailing sources of error
A critical factor in designing fit-for-purpose review systems is knowledge of error rates and their distribution. Developing systems that mitigate or detect errors requires detailed knowledge of the types of error that can occur, the risk factors associated with increased error, and how each type of error arises. If, for example, errors in the comparison and matching of shoeprints are infrequent and randomly distributed, with no particular comparison or examiner more likely than any other to produce an error, a second examination by a competent examiner should, theoretically at least, ensure that errors will be detected. However, if errors are more likely to occur on particular types of comparison, such as those with fewer features or of higher complexity, then verification by a single verifier may not significantly reduce the risk of error. In such cases, additional verifiers and fully independent replication may be required to maximise the probability of error detection.
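The contrast between randomly distributed and concentrated errors can be illustrated with a toy model. The sketch below assumes, as a simplification not taken from any cited study, that an error survives a verification only if the verifier independently makes the same error, with the same error rate as the primary examiner on that type of comparison. Both hypothetical scenarios start with the same 2% overall error rate; the stratum shares and rates are illustrative, not empirical.

```python
def residual_error(strata, n_verifiers):
    """Expected rate of errors surviving n independent verifications.

    strata: list of (share_of_comparisons, per_examiner_error_rate) pairs.
    Simplifying assumption: an error survives a verification only if the
    verifier independently repeats it, at the same per-item error rate.
    """
    return sum(share * err ** (n_verifiers + 1) for share, err in strata)


# Both scenarios have the same 2% overall error rate before verification.
uniform = [(1.0, 0.02)]                    # errors spread evenly over all comparisons
concentrated = [(0.96, 0.0), (0.04, 0.5)]  # all errors fall on 4% of (hard) comparisons

for k in range(3):
    print(k, residual_error(uniform, k), residual_error(concentrated, k))
```

Under this model, one verification reduces the uniform error rate from 0.02 to 0.0004, but the concentrated error rate only from 0.02 to 0.01, and a second verifier still leaves 0.005. This is the pattern the text describes: when errors cluster on difficult comparisons, additional independent verifiers matter more than a single check.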
Sufficient knowledge of the causes of error can also help maximise the efficiency of the review process and prevent unnecessary checks. It has been demonstrated that the accuracy of fingerprint examiners is extremely high on high-quality latent prints, and that a single verification is likely to be sufficient to detect the occasional false positive error that may be made.
Likewise, a single-source DNA profile may need only a single examiner’s judgement on the number of contributors and the presence or absence of artefact peaks. In contrast, a low-quality latent print, or a complex DNA mixture from three or four people, may require more elaborate checking and verification to prevent errors in the detection and interpretation of relevant features.
Notably, some forensic laboratories, such as the Netherlands Forensic Institute, have begun to tailor the verification process to evidence quality, with complex or low-quality fingerprints receiving additional (blind) verification. Tailored verification processes require an initial judgement on quality or complexity, so as to direct the case to the appropriate processing stream. This enables the most efficient use of limited resources while providing an anticipated increase in accuracy for high-risk cases.
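A tailored process of this kind amounts to a triage step followed by routing. The sketch below assumes a hypothetical 1–5 quality score and a hypothetical distortion flag assigned at intake, and a hypothetical threshold of 3. The two streams mirror the arrangement described later in this section for the Netherlands Forensic Institute (ordinary marks verified by one examiner; complex marks examined independently by three), but this is not that institute's actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    mark_id: str
    quality_score: int  # hypothetical triage score: 1 (poor) to 5 (excellent)
    distorted: bool     # hypothetical flag for distorted or fragmentary detail


def route(mark, min_quality=3):
    """Return (stream, number of independent examinations) for a mark.

    Standard stream: primary examination plus one verification.
    Complex stream: three independent, blind examinations.
    """
    if mark.distorted or mark.quality_score < min_quality:
        return ("complex", 3)
    return ("standard", 2)


print(route(Mark("M-1", 5, False)))
print(route(Mark("M-2", 2, False)))
```

The design choice is that the triage judgement, not the examiner's eventual conclusion, decides how much checking a mark receives, so the extra effort is spent where errors are expected to concentrate.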
1.4.3. Blinded peer review and cognitive factors
Human factors and potential sources of bias must be considered when designing effective peer review systems. The threat posed by contextual information is well documented within the forensic sciences, and has been shown to affect the verification process as well. Several studies have noted differences in the nature of opinions provided by examiners and verifiers in latent fingerprint comparison, and between examiners who are aware that their work will (or may) be reviewed and those who are unaware [68,70,67].
Examiners who knew their work would be checked by a second examiner displayed greater certainty in their opinions, and had more false negative judgements. In contrast, verifiers were more cautious and conservative, rating their confidence in the opinion lower. The verifiers missed false negative opinions more frequently, as a verifier may not review exclusions to the same level as identifications [68,70,67]. Thus, to ensure that all verifications are performed appropriately, with equivalent search strategies and aims, there are tangible benefits to the verifier not knowing the conclusion when re-examining the evidence. Ideally, examiners should not know whether they are conducting the primary examination or a verification, to avoid the potential differences in motivation and reasoning described above. This system has also been incorporated into the Netherlands Forensic Institute’s complex fingerprint examination process, whereby ‘ordinary’ fingerprints are verified by one examiner, while for complex marks three examiners independently examine the prints, effectively providing two verifications.
The value of blind verification has been hotly debated within the fingerprint community. If verification is viewed as a form of peer review, then there is a belief that blind verification undermines the process, as the verifier cannot determine how the practitioner arrived at the conclusion [29,30]. Indeed, commentators have espoused the view that the use of blind verification “seriously compromises” or negates the scientific method, by testing only the consistency of a conclusion but not its validity. Moreover, there is a view that exposure to the additional information does not compromise the independence of the comparison, as the verifier explicitly seeks to falsify the conclusion of the first examiner, and can view and document their own conclusions separately from the first examiner [28,29,30]. The latter view, however, is contrary to the wealth of evidence showing that confirmation biases are automatic and operate without conscious awareness [72–74].
The former view, that blinding negates the scientific method and the purpose of peer review, conflates the aims of verification with the aims of review. Peer review (as editorial peer review) does not aim or claim to examine the “scientific validity” of opinions or methods. Rather, scrutiny is intended to determine whether practices are consistent with scientific norms and presented logically. Verification and technical review, by contrast, do claim to be able to ensure accuracy and validity. In the same way that primary examinations should be free of potentially biasing, domain-irrelevant information, verification should take place, wherever possible, free of such information. Doing so does not invalidate the verification, but provides additional empirical evidence for the validity of the original conclusion. The methods and protocols that are utilised should be empirically validated prior to use on casework (rather than for each case, as the ACE-V method purports to do), while a technical review should be used to ensure that the case file documentation is complete and appropriate, the reporting is scientifically supportable and all examinations deemed appropriate have been conducted.
Beyond blind verification, there is a range of additional cognitive elements which should be considered when designing peer review protocols, particularly around decision base rates and choice of reviewer. First, verification and review rates are often highly skewed towards positive opinions or identification decisions, with few inconclusive or exclusion opinions reviewed. For verification, this may be for efficiency reasons: verifying every opinion produced requires a substantial commitment of resources.
For technical and administrative reviews, this is due to skewed reporting: exclusions or inconclusive judgements may be conveyed via email or a short laboratory report, as they are less likely to result in charges or be required at trial. In contrast, identification evidence or opinions represented as highly probative are more likely to be incorporated in an expert report or oral testimony, and so require additional scrutiny and checking.
However, this can create a system whereby verifiers are primed to expect identifications or positive opinions, even when operating in a purportedly blinded system. A US-based survey of 56 fingerprint laboratories showed large differences between laboratories in their choice of verification strategies. A number of laboratories verify only identifications (18% of laboratories), check exclusions or inconclusive prints in only a very small percentage of cases (9% of laboratories), or check them only in particular types of cases such as homicide or sexual assault (11%). Thus, based on this study at least, fingerprint examiners in 38% (18% + 9% + 11%) of federal, state or local laboratories may be at risk of base rate biases, where an examiner can correctly assume that any given verification they are asked to perform will follow an identification by another examiner.
Introducing a random selection of inconclusive and exclusion decisions into verification, even if only a small proportion of all opinions, is likely to lessen the risk of biasing reviewers towards identification decisions.
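One way to break the base-rate cue is to build the verification queue from every identification plus a random sample of exclusions and inconclusives, shuffled together so the verifier receives only case identifiers. The sketch below is hypothetical: the `Opinion` record, the decision labels and the 10% sampling fraction are assumptions chosen for illustration, not a published laboratory protocol.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    case_id: str
    decision: str  # "identification", "exclusion" or "inconclusive"


def build_verification_queue(opinions, other_fraction=0.1, rng=None):
    """Return case IDs to verify: every identification plus a random sample
    of exclusions and inconclusives, shuffled together.

    Only case IDs are returned, so the verifier sees neither the primary
    decision nor the order in which opinions were produced.
    """
    rng = rng or random.Random()
    identifications = [o for o in opinions if o.decision == "identification"]
    others = [o for o in opinions if o.decision != "identification"]
    # Sample at least one non-identification whenever any exist, never more than exist.
    n_sample = min(len(others), max(1, round(other_fraction * len(others)))) if others else 0
    sampled = rng.sample(others, n_sample)
    queue = [o.case_id for o in identifications + sampled]
    rng.shuffle(queue)
    return queue


opinions = ([Opinion(f"I{i}", "identification") for i in range(10)]
            + [Opinion(f"E{i}", "exclusion") for i in range(30)]
            + [Opinion(f"N{i}", "inconclusive") for i in range(10)])
queue = build_verification_queue(opinions, other_fraction=0.1, rng=random.Random(42))
print(len(queue))
```

In this example 4 of the 14 verification tasks follow a non-identification, so a verifier can no longer correctly assume that every task follows an identification; the fraction would need to be set against each laboratory's resources.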
1.4.4. Selection of reviewers
It is common practice in many laboratories for the examiner to choose the verifier. While this may be the first available person, or an individual specifically assigned to conduct verifications, it may also provide an opportunity to select a reviewer known to hold similar viewpoints and ways of reporting or expressing conclusions. Reviewer ‘shopping’ may create groups with common interpretation and reporting practices within disciplines, where some groups provide conservative opinions and others more liberal interpretations of the weight of evidence. These may produce undesirable relations of trust and deference. Alternatively, in some larger departments and laboratories, senior staff may perform all of the technical and administrative reviews. While this may be appropriate given their expertise, it may be difficult for a junior examiner to question the senior verifier’s opinion. It also assumes that the most senior or experienced practitioners are necessarily the most competent. Ideally, a system should be developed whereby primary examiners cannot select individual reviewers, and where the task of reviewing is matched with the level of expertise and competence required for the task. A verification by replication will require a reviewer fully authorised in the technique; an administrative review may instead require only an individual conversant with the technique and its limitations. Regardless of reviewer selection or any power imbalance, any discrepancy or disagreement between the examiner and verifier must be documented and mediated to reach a supported conclusion. If agreement cannot be reached, the matter must be escalated, with additional examiners and/or senior staff adjudicating. In the risk-averse forensic community, committing an ‘error’ is generally viewed extremely negatively, and may at times result in corrective action against the examiner judged to be incorrect.
Review and verification systems with non-blinded verifiers, highly skewed base rates, power imbalances and penalties for errors can lessen the value of forensic science review, reducing the probability that errors or miscommunication of results will be detected and corrected. There may also be a lack of multidisciplinary review when required, as most reviews are conducted only within work units or teams. Consultative, multidisciplinary review systems have been proposed and successfully trialled within forensic medicine, where pathologists, toxicologists, radiologists and psychiatrists may collaborate to review cases involving drug overdoses, testamentary capacity and causes of particular wounds. Within the forensic sciences, some cases may benefit from the expertise provided by imaging experts, statisticians, or experts in substrate material and trace transfer.
1.4.5. Maximising efficiency and effectiveness
A fit-for-purpose and supportable peer review system within the forensic sciences would be tailored to the type of analysis or case. Simple cases, such as those with evidentiary material of a quality and quantity identified in validation studies as sufficient, may need to undergo only a single blinded verification check.
Complex cases, where material may be distorted, fragmentary or below optimal levels, may require additional verification checks. In all cases, the verifier should be blinded to the original examiner’s decision and identity. Verification tasks should be assigned in a random manner, without the ability to pre-select verifiers, and should be fully documented prior to any consultation with the original examiner. There should also be a documented procedure for resolving differences of opinion, and for seeking assistance to review opinions potentially involving issues beyond the practitioner’s area of expertise.
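The three requirements in this paragraph (random assignment, blinding to decision and identity, and comparison only after both conclusions are documented) can be sketched as three small steps. All function names and the dictionary layout are hypothetical; the sketch shows the ordering of the steps, not any laboratory's case management system.

```python
import random


def assign_verifier(primary_examiner, authorised_examiners, rng=None):
    """Randomly choose a verifier other than the primary examiner.

    The primary examiner plays no part in the choice, which removes the
    opportunity for reviewer 'shopping'.
    """
    pool = [e for e in authorised_examiners if e != primary_examiner]
    if not pool:
        raise ValueError("no eligible verifier; escalate under the documented procedure")
    rng = rng or random.Random()
    return rng.choice(pool)


def blinded_task(case_id, evidence_items):
    """Package sent to the verifier: evidence only, with no primary
    decision and no primary examiner identity."""
    return {"case_id": case_id, "evidence": list(evidence_items)}


def reconcile(primary_decision, verifier_decision):
    """Compare conclusions only after both have been documented independently."""
    if primary_decision == verifier_decision:
        return "agreed"
    return "disagreement: resolve under documented procedure"


verifier = assign_verifier("examiner_1", ["examiner_1", "examiner_2", "examiner_3"])
task = blinded_task("case-001", ["latent_mark_A", "reference_print_B"])
```

Keeping `reconcile` separate from the task package reflects the requirement that the verifier's conclusion be recorded before any consultation with the original examiner.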
In many disciplines, harnessing the principle of the wisdom of crowds may bring both increased efficiency and effectiveness to review processes. This well-known psychological phenomenon describes the increase in accuracy obtained by averaging independent decisions across a group, compared with any single decision by a member of the group. Research has shown that aggregating independent facial comparison decisions across experienced facial examiners resulted in substantial increases in identification accuracy [78,79], and that aggregating markup of fingerprints prior to AFIS searches improves hit rates. Although objections based on resourcing and time limitations may be raised against using multiple examiners performing blinded, independent replications on every case, such checks do not necessarily need to fully replicate a detailed analysis. Both fingerprint and document examiners are capable of accurately performing comparisons within seconds [81,82], and it is likely that many other pattern comparison specialists are able to do likewise. Obtaining rapid match/non-match judgements from five examiners may result in much greater accuracy than a single detailed comparison by one examiner followed by detailed checking by a non-blinded verifier. Although such group verification systems may be difficult to achieve for some forms of evidence or methods that require greater intervention or analysis, in many instances it is the preparation and handling of the evidence that takes the majority of time, and forming the final opinion may be relatively rapid.
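The potential gain from aggregating five rapid judgements can be illustrated with a simple majority-vote calculation. The sketch assumes that each examiner independently reaches the correct match/non-match decision with the same probability p; the 0.8 figure is hypothetical, not a measured accuracy, and real examiners share difficult comparisons, so their errors are likely to be correlated and the gain smaller.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a strict majority of n independent examiners is correct,
    assuming each is correct with probability p on a binary decision (n odd)."""
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


print(round(majority_accuracy(0.8, 1), 5))
print(round(majority_accuracy(0.8, 5), 5))
```

Under these assumptions, five examiners who are each 80% accurate would reach a correct majority about 94% of the time (0.94208). The calculation also cuts the other way: if individual accuracy were below 0.5, majority voting would make matters worse.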
Checklists and detailed guidance should be provided for the technical and administrative review of reports and statements, to ensure that the reviewer’s attention is directed to the most appropriate areas. Ensuring that limitations and caveats are detailed, assumptions and propositions are explicitly stated, and the exact methods and procedures used are listed should be a standard part of technical review. For analyses which do not have a separate verification check, the review should ensure that the methods used are appropriate given the case and propositions, and that the reported conclusions are scientifically supported.
Finally, systems should be in place for monitoring the validity and efficacy of verification and review. High rates of disagreement between examiners and verifiers may indicate emerging issues with the application of the method or the interpretation of evidence; a complete lack of disagreement may indicate that verifiers or reviewers are not detecting the inevitable errors of omission, transcription or reporting which may occur within any process.
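Such monitoring could be as simple as tracking the examiner–verifier disagreement rate for each review stream and flagging both extremes. In the sketch below the 10% upper threshold and the 50-review minimum are hypothetical placeholders; appropriate values would have to be set for each discipline from its own data.

```python
def review_health(n_reviews, n_disagreements, high_rate=0.10, min_reviews=50):
    """Flag a review stream by its examiner-verifier disagreement rate.

    high_rate and min_reviews are hypothetical thresholds for illustration.
    """
    if n_reviews < min_reviews:
        return "insufficient data"
    rate = n_disagreements / n_reviews
    if rate > high_rate:
        return "high disagreement: check method application or interpretation"
    if n_disagreements == 0:
        return "zero disagreement: check whether reviewers are detecting errors"
    return "within expected range"


print(review_health(200, 0))
print(review_health(120, 4))
```

Both flags prompt investigation rather than conclusions: a high rate may reflect a genuinely difficult batch of cases, and zero disagreements over a small number of reviews may be chance.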
1.4.6. Reporting peer review outcomes
The discussions between peer reviewers, and the changes made to opinions and statements as a result of peer review, are, to our knowledge, rarely disclosed in reports and oral testimony, although they may be included in the case file. Alterations to assumptions, statistical weighting or the overall opinion are highly relevant to the trier of fact, and potentially speak to the reliability of the opinion provided, yet may not be disclosed to the court. Anecdotal evidence suggests that complete reversals of opinion (from match to non-match; from strong support for one proposition to strong support for the opposing proposition) are rare, and usually associated with detectable and correctable errors in reasoning or data recording.
However, shifts in the level of support from strong to weak, or from inconclusive to matching, appear to occur more frequently. Changes in feature selection, such as in the identification of relevant peaks in a DNA profile or in whether striae are sufficiently clear for a comparison on a fired bullet, may also be frequent, particularly when evidence is of high complexity or low quantity. In such instances, the differences in opinion between examiner and reviewer indicate uncertainty in the analysis and evaluation of the evidence, which should be communicated to the court.
Including all peer review information in reports and statements is impractical, and in many cases unnecessary. However, each report should contain a statement regarding the type and nature of review or verification performed. If disagreements occurred during these processes, their nature and extent should be noted.
1.5. Peer review and the courts
Historically, courts have not been particularly attentive to issues of validity, reliability and performance when considering the admissibility (and even the probative value) of evidence from the forensic sciences. Conventional legal admissibility decision-making focused on criteria such as a legally recognisable field, formal training or experience, acceptance, previous admission, use in other jurisdictions, apparent plausibility and even consideration of bias. Wholesale indifference to reliability was modified by the rapid uptake of DNA profiling and the United States Supreme Court’s influential Daubert v Merrell Dow Pharmaceuticals, Inc. decision. Daubert also provided one of the most prominent references to peer review by a superior court.
Alongside references to testing, error and error rates, the use of standards and general acceptance, the Supreme Court listed peer review as an admissibility criterion for scientific evidence.
Ordinarily, a key question to be answered in determining whether a theory or technique is scientific knowledge that will assist the trier of fact will be whether it can be (and has been) tested. . . . Another pertinent consideration is whether the theory or technique has been subjected to peer review and publication.
(Daubert v Merrell Dow Pharmaceuticals Inc., italics added)
Subsequently, in Kumho, the Supreme Court explained that the Daubert criteria might be cautiously applied to ‘technical and other specialised knowledge’.
Peer review and publication have not featured prominently in post-Daubert jurisprudence and practice, and legal understanding remains incipient. Review by ‘peers’ and the fact of publication are rarely decisive in admissibility decision-making in criminal proceedings. Long use and apparent acceptance by isolated communities of forensic practitioners are occasionally substituted for legal interest in formal processes of testing, peer review and publication.
Where peer review and publication are mentioned in judgements, they tend to be treated superficially, often with little consideration of what kind of review transpired and whether it actually enhanced the value of the expert opinion evidence.
Few jurisdictions beyond the United States have focused much attention on peer review in criminal proceedings. Engagement with peer review in English, Australian and Canadian appellate courts has tended to be thin on the ground (see e.g. Refs. [85–96]).
Common law courts are more likely to excuse the lack of peer review, assume that some kind of peer review has taken place, or equate the adversarial process with a form of (peer) review than use the absence of peer review and publication to exclude evidence adduced by the state.
Part of the problem for lawyers and courts is the opacity with which forensic scientists refer to peer review and verification in their written reports and testimony. With few exceptions, the nature of peer review and any limitations are rarely identified, let alone explained, in reports or explored during proceedings. In general, different types of peer review are not distinguished in criminal proceedings and judicial decisions. Rather, peer review tends to be used naively to support the accuracy of opinions that have been peer-reviewed, as though all peer review procedures provide a rigorous check or guarantee. Where a report or opinion has not been peer reviewed, that omission tends to be excused, or perhaps factored into the assessment of the probative value of the opinion in ways that are impressionistic. Lack of disclosure and explanation by forensic practitioners, along with the limited resources of defendants and the technical deficiencies of lawyers, have presumably contributed to courts trivialising peer review (and publication) or conflating the potential of stronger incarnations (e.g. replication) with weaker manifestations (such as administrative review).
Some courts have, perhaps unwittingly exposing their naivety, gone as far as suggesting that routine or longstanding use of a procedure confirms its validity (e.g. [97–99]).
Others have suggested that courts themselves provide a kind of rigorous review. This last contention has been embodied in the work of several sociologists of science. Brian Wynne described adversarial courtrooms as sites of ‘pure institutionalized mistrust’. Following in these footsteps, Sheila Jasanoff advanced the idea that:
The adversarial structure of litigation is particularly conducive to the deconstruction of scientific facts, since it provides both the incentive (winning the lawsuit) and the formal means (cross-examination) for challenging the contingencies in their opponents’ scientific arguments.
While some courtrooms may occasionally provide opportunities for critical engagement and the assessment of (the contingency of) scientific knowledge, it is significant that the studies underpinning the analyses by Wynne and Jasanoff were based on public inquiries (e.g. Windscale), mass torts (e.g. breast implant litigation) and high-profile criminal cases (e.g. OJ Simpson). These examples, particularly in the resources available to the parties and the quality of legal representation, are not representative of quotidian trials and plea bargains. Indeed, most of the sociological commentary focuses on non-adversarial and non-criminal proceedings. Most criminal investigations and prosecutions, in contrast, do not seriously engage with, let alone provide any kind of credible check or review on, scientific, medical or technical evidence [102,103]. Most criminal proceedings in the relatively well-resourced courts of advanced social democracies (including non-adversarial jurisdictions) do not provide expert assistance to those accused of serious crimes, the main exception being forensic pathology in serious criminal prosecutions.
1.6. Peer review taxonomy: a guide to effectiveness and accuracy claims
Given the wide range of approaches to peer review currently deployed across the forensic sciences, and the difficulty in developing truly fit-for-purpose and empirically proven review methods, we offer a taxonomy of peer review systems for forensic science opinions, from which end-users may infer the potential impact that the stated verification and review may have on the accuracy, validity and completeness of the opinion provided (Table 1). The taxonomy has been developed based on known peer review practices used by independent practitioners, small forensic groups and large accredited laboratories. The effectiveness of peer review associated with each level of the taxonomy has been extrapolated from the verification studies described and referenced above, cognitive theory, and knowledge of quality assurance and error distributions for a variety of disciplines.
1.7. Conclusions and recommendations
Although peer review in all its guises is widespread, and is an integral part of modern quality management and error avoidance strategies, its effectiveness is largely untested, and relies on anecdotal accounts and the conflation of improved communication with improved accuracy. However, sufficient knowledge exists about error causation, human decision-making and strategies for error minimisation that we have been able to develop both a taxonomy of peer review within the forensic sciences and suggestions for designing and testing optimal peer review systems. Any verification and review system which claims increased accuracy should be conducted via blind, independent re-analysis of the evidentiary material, performed by a sufficiently qualified and trained examiner who is explicitly aware of the potential for errors and their causes within the discipline.
Potential cognitive biases caused by reviewer shopping, power imbalances or base rates should be mitigated by procedures to ensure random selection of reviewers and checking of exclusions or inconclusive decisions, while methods to resolve disagreements between examiner and reviewer should be developed and documented. Training programs or checklists should be developed to ensure reviewer attention is directed to the appropriate areas. Ultimately, forensic laboratories will need to develop procedures which maximise effectiveness of error detection with the greatest possible efficiency, and with maximal transparency to the courts.
In one of the earliest iterations of peer review, the Journal Des Scavans (Journal of the Learned) introduced a system in which Academie members would ensure the content was both logical and reasonable; if they could not take responsibility for the validity of the content, the paper was published with the note “sit penes authorum fides” (let the author take responsibility for it). Perhaps, within the forensic sciences, we need to return to this principle. The author, and primary examiner, should ensure that the interpretations, opinions and conclusions are valid and generated using reliable methods. Peer review should be viewed as a means of enhancing the logical flow and presentation of the data and methods, but not necessarily as a means of guaranteeing the validity of the science or the accuracy of an opinion.
Acknowledgements
We thank three anonymous reviewers for their comments on the manuscript. Each provided a different perspective on peer review, leading to clarification and improved communication of our perspective on this topic.
This paper arose from discussions amongst the Evidence-Based Forensic Initiative, a multidisciplinary group of scholars and practitioners with an interest in forensic science and the law. For more information on the group, see http://evidencebasedforensics.
Edmond’s research was supported by the Australian Research Council (LP16010000).
References
Daubert v Merrell Dow Pharmaceuticals Inc 509 U.S. 579, 593 (1993).
Kumho Tire Co., Ltd. v. Carmichael, 526 U.S. 137 (1999).
R.B. Stacey, A report on the erroneous fingerprint identification in the Madrid train bombing case, J. Forensic Identif. 54 (2005) 706–718.
A. Campbell, The Fingerprint Inquiry Report, APS Group Scotland, Edinburgh, Scotland, 2011.
C.T. Oien, Forensic hair comparison: background information for interpretation, Forensic Sci. Commun. 11 (2009) 2.
Michigan State Police Forensic Science Division, Audit of the Detroit Police Department Forensic Services Laboratory Firearms Unit, (2008) Last accessed 27/3/17 from http://www.sado.org/content/pub/10559_MSP-DCL-Audit.pdf.
M.R. Bromwich, Final Report of the Independent Investigator for the Houston Police Department Crime Laboratory and Property Room, (2007) Last accessed 27/3/17 from http://www.hpdlabinvestigation.org/reports/070613report.pdf.
B.E. Turvey, Forensic Fraud: Evaluating Law Enforcement and Forensic Science Cultures in the Context of Examiner Misconduct (PhD Thesis), Bond University, 2012.
PCAST, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods, Executive Office of the President of the United States, (2016).
Organisation of Scientific Area Committees Friction Ridge Subcommittee, Response to call for additional references regarding: President’s Council of Advisors on Science and Technology Report to the President Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-comparison Methods, (2016) Last accessed 27/3/17 https://www.theiai.org/president/20161214_PSAC-FR_PCAST_response.pdf.

Table 1
Taxonomy of peer review types.

Type of verification/review: No verification, no review.
Supportable claims: None.

Type of verification/review: No verification; review by non-expert.
Supportable claims: Improvement in grammar, consistency and coherence possible. No anticipated increase in accuracy or validity of opinion.

Type of verification/review: No verification or technical review; administrative review by authorised individual only.
Supportable claims: Improvement in grammar, consistency and coherence possible. Errors in documentation and case file contents may be detected. No anticipated increase in accuracy or validity of opinion.

Type of verification/review: No verification; technical and administrative review conducted by authorised, competent individuals.
Supportable claims (analytical or cognitive procedures with complete documentation of features and comparison): Errors in reasoning, detection of critical features and strength of support provided by the evidence may be detected. Incorrect application or omission of relevant procedures may be identified. Documentation and case file contents may be more complete; grammar, consistency and coherence improved.
Supportable claims (cognitive procedures without documentation of features and comparison): No increase in validity or accuracy of opinion is predicted. Incorrect application or omission of relevant procedures may be identified. Documentation and case file contents may be more complete; grammar, consistency and coherence improved.

Type of verification/review: Verification performed in a non-blind manner; technical and administrative review conducted by authorised, competent individuals.
Supportable claims (analytical or cognitive procedures with complete documentation of features and comparison): Errors in reasoning, detection of critical features and strength of support provided by the evidence may be detected. Incorrect application or omission of relevant procedures may be identified. Documentation and case file contents may be more complete; grammar, consistency and coherence improved.
Supportable claims (cognitive procedures without documentation of features and comparison): Obvious errors in reasoning, detection of critical features and strength of support provided by the evidence may be detected. Subtle errors, or those occurring on difficult/complex comparisons, may not be detected. Incorrect application or omission of relevant procedures may be identified. Documentation and case file contents may be more complete; grammar, consistency and coherence improved.

Type of verification/review: Blind verification or dual (blinded) examination; technical and administrative review conducted by authorised, competent individuals.
Supportable claims: Errors in reasoning, detection of critical features and strength of support provided by the evidence may be detected. Incorrect application or omission of relevant procedures may be identified. Documentation and case file contents may be more complete; grammar, consistency and coherence improved.

Association of Firearm and Tool Mark Examiners, Response to PCAST Report on Forensic Science, (2016) Last accessed 27/3/17 https://afte.org/uploads/documents/AFTE-PCAST-Response.pdf.
D.J. Benos, E. Bashari, J.M. Chaves, A. Gaggar, N. Kapoor, M. LaFrance, R. Mans, et al., The ups and downs of peer review, Adv. Physiol. Educ. 31 (2007) 145–152.
D. Rennie, Editorial peer review: its development and rationale, in: F. Godlee, T. Jefferson (Eds.), Peer Review in Health Sciences, BMJ Books, London, 1999, pp. 3–13.
Sense About Science, Peer Review Survey 2009: The Full Report (London), (2009).
R. Spier, The history of the peer-review process, Trends Biotechnol. 20 (2002) 357–358.
M. Ware, M. Monkman, Peer Review in Scholarly Journals: Perspective of the Scholarly Community – an International Study, Publishing Research Consortium, Bristol, 2008.
H. Bastian, A stronger post-publication culture is needed for better science, PLoS Med. 11 (2014) e1001772.
A. Eyre-Walker, N. Stoletzki, The assessment of science: the relative merits of post-publication review, the impact factor, and the number of citations, PLoS Biol. 11 (2013) e1001675.
J. Hunter, Post-publication peer review: opening up scientific conversation, Front. Comput. Neurosci. 6 (2012) 63.
P. Knoepfler, Reviewing post-publication peer review, Trends Genet. TIG 31 (2015) 221–223.
SWGFAST, Standard for the Technical Review of Friction Ridge Examinations (Latent/Tenprint), (2012).
J.M. Butler, Advanced Topics in Forensic DNA Typing: Methodology, Academic Press, Cambridge, Massachusetts, 2011.
International Forensic Strategic Alliance, Minimum Requirements for Crime Scene Investigation. IFSA MRD 1 October 2014, (2014).
Federal Bureau of Investigation, Quality assurance standards for forensic DNA testing laboratories, Forensic Sci. Commun. 2 (2000) 3.
National Association of Testing Authorities, Forensic Science ISO/IEC 17025 Application Document, (2014).
D.R. Ashbaugh, Ridgeology, J. Forensic Identif. 41 (1991) 16–64.
L. Tierney, Analysis, Comparison, Evaluation and Verification (ACE-V), in: M.M. Houck (Ed.), Firearm and Toolmark Examination and Identification, Elsevier, 2015, pp. 25–30.
J.P. Black, Is there a need for 100% verification (review) of latent print examination conclusions? J. Forensic Identif. 62 (2012) 80–100.
A. Mankevich, Blind verification: Does it compromise the conformance of ACE-V methodology to the scientific method, Chesap. Exam. 45 (2007) 22–29.
M. Triplett, L. Cooney, The etiology of ACE-V and its proper use: an exploration of the relationship between ACE-V and the scientific method of hypothesis testing, J. Forensic Identif. 56 (2006) 345–355.
P.E. Peterson, C.B. Dreyfuss, M.R. Gische, M. Hollars, M.A. Roberts, R.M. Ruth, H.M. Webster, G.L. Soltis, Latent prints: a perspective on the state of the science, Forensic Sci. Commun. 11 (2009) 4.
International Organization for Standardization, ISO/IEC 17025 General Requirements for the Competence of Testing and Calibration Laboratories, (2015).
ILAC, ILAC-G19:08/2014 Modules in a Forensic Science Process, (2014).
ASCLD/LAB, Supplemental Requirements for the Accreditation of Forensic Science Testing Laboratories, (2011).
B.-C. Bjork, A. Roos, M. Lauri, Scientific journal publishing: yearly volume and open access availability, Inf. Res. 14 (2009) 391.
T. Jefferson, M. Rudin, S. Brodney Folse, F. Davidoff, Editorial peer review for improving the quality of reports of biomedical studies, Cochrane Database Syst. Rev. (2007) no. 2: MR000016, 1–39.
J.L. Worrall, Validating peer review in criminal justice evaluation research: evidence from CrimeSolutions.gov, J. Crim. Justice Educ. 26 (2015) 507–529.
E. Cobo, A. Selva-O’Callaghan, J.-M. Ribera, F. Cardellach, R. Dominguez, M. Vilardell, Statistical reviewers improve reporting in biomedical articles: a randomized trial, PLoS One 2 (2007) e332.
D.P. Peters, S.J. Ceci, Peer-review practices of psychological journals: the fate of published articles, submitted again, Behav. Brain Sci. 5 (1982) 187–195.
C.J. Lee, C.R. Sugimoto, G. Zhang, B. Cronin, Bias in peer review, J. Am. Soc. Inf. Sci. Tech. 64 (2013) 2–17.
K.I. Resch, E. Ernst, J. Garrow, A randomized controlled study of reviewer bias against an unconventional therapy, J. R. Soc. Med. 93 (2000) 164–167.
S. Wessely, Peer review of grant applications: what do we know? Lancet 352 (1998) 301–305.
M.J. Mahoney, Publication prejudices: an experimental study of confirmatory bias in the peer review system, Cognit. Ther. Res. 1 (1977) 161–175.
L. Bornmann, H.-D. Daniel, Reviewer and editor biases in journal peer review: an investigation of manuscript refereeing at Angewandte Chemie International Edition, Res. Eval. 18 (2009) 262–272.
L. Bornmann, H.-D. Daniel, Do author-suggested reviewers rate submissions more favorably than editor-suggested reviewers? A study on atmospheric chemistry and physics, PLoS One 5 (2010) e13345.
J.J. Earnshaw, J.R. Farndon, P.J. Guillou, C.D. Johnson, J.A. Murie, G.D. Murray, A comparison of reports from referees chosen by authors or journal editors in the peer review process, Ann. R. Coll. Surg. Engl. 82 (Suppl) (2000) 133–135.
F.P. Rivara, P. Cummings, S. Ringold, A.B. Bergman, A. Joffe, D.A. Christakis, A comparison of reviewers selected by editors and reviewers suggested by authors, J. Pediatr. 151 (2007) 202–205.
S. Schroter, L. Tite, A. Hutchings, N. Black, Differences in review quality and recommendations for publication between peer reviewers suggested by authors or by editors, JAMA 295 (2006) 314–317.
E. Wager, E.C. Parkin, P.S. Tamber, Are reviewers suggested by authors as good as those chosen by editors? Results of a rater-blinded, retrospective study, BMC Med. 4 (2006) 1.
F.C. Day, D.L. Schriger, C. Todd, R.L. Wears, The use of dedicated methodology and statistical reviewers for peer review: a content analysis of comments to authors made by methodology and regular reviewers, Ann. Emerg. Med. 40 (2002) 329–333.
M.J. Gardner, J. Bond, An exploratory study of statistical assessment of papers published in the British Medical Journal, JAMA 263 (1990) 1355–1357.
J. Strayhorn Jr., J.F. McDermott Jr., P. Tanguay, An intervention to improve the reliability of manuscript reviews for the Journal of the American Academy of Child and Adolescent Psychiatry, Am. J. Psychiatr. 150 (1993) 947–952.
N.R. Parsons, C.L. Price, R. Hiskens, J. Achten, M.L. Costa, An evaluation of the quality of statistical design and analysis of published medical research: results from a systematic survey of general orthopaedic journals, BMC Med. Res. Methodol. 12 (2012) 60.
A. Margalida, M.À. Colomer, Improving the peer-review process and editorial quality: key errors escaping the review and editorial process in top scientific journals, PeerJ 4 (2016) e1670.
W.G. Baxt, J.F. Waeckerle, J.A. Berlin, M.L. Callaham, Who reviews the reviewers? Feasibility of using a fictitious manuscript to evaluate peer reviewer performance, Ann. Emerg. Med. 32 (1998) 310–317.
F. Godlee, C.R. Gale, C.N. Martyn, Effect on the quality of peer review of blinding reviewers and asking them to sign their reports: a randomized controlled trial, JAMA 280 (1998) 237–240.
D.V. Cicchetti, The reliability of peer review for manuscript and grant submissions: a cross-disciplinary investigation, Behav. Brain Sci. 14 (1991) 119–135.
H.W. Marsh, U.W. Jayasinghe, N.W. Bond, Improving the peer-review process for grant applications: reliability, validity, bias, and generalizability, Am. Psychol. 63 (2008) 160–168.
M.L. Callaham, J. Tercier, The relationship of previous training and experience of journal peer reviewers to subsequent review quality, PLoS Med. 4 (2007) e40.
M.L. Callaham, R.L. Wears, J.F. Waeckerle, Effect of attendance at a training session on peer reviewer quality and performance, Ann. Emerg. Med. 32 (1998) 318–322.
S. Schroter, N. Black, S. Evans, J. Carpenter, F. Godlee, R. Smith, Effects of training on quality of peer review: randomised controlled trial, BMJ 328 (2004) 673.
D. Houry, S. Green, M. Callaham, Does mentoring new peer reviewers improve review quality? A randomized trial, BMC Med. Educ. 12 (2012) 83.
M.L. Callaham, R.K. Knopp, E. Gallagher, Effect of written feedback by editors on quality of reviews: two randomized trials, JAMA 287 (2002) 2781–2783.
J. Mnookin, S.A. Cole, I. Dror, B.A.J. Fisher, M. Houck, K. Inman, D.H. Kaye, J.J. Koehler, G. Langenburg, D.M. Risinger, et al., The Need for a Research Culture in the Forensic Sciences, Social Science Research Network, Rochester, NY, 2011.
Forensic Science Regulator, Codes of Practice and Conduct for Forensic Science Providers and Practitioners in the Criminal Justice System, (2014).
SWGFAST, Quality Assurance Guidelines for Latent Print Examiners, (2006).
K. Wertheim, G. Langenburg, A. Moenssens, Report of latent print examiner accuracy during comparison training exercises, J. Forensic Identif. 56 (2006) 55–127.
G. Langenburg, A performance study of the ACE-V process: a pilot study to measure the accuracy, precision, reproducibility, repeatability, and biasability of conclusions resulting from the ACE-V process, J. Forensic Identif. 59 (2009) 219–257.
G. Langenburg, C. Champod, P. Wertheim, Testing for potential contextual bias effects during the verification stage of the ACE-V methodology when conducting fingerprint comparisons, J. Forensic Sci. 54 (2009) 571–582.
B.T. Ulery, R.A. Hicklin, J. Buscaglia, M.A. Roberts, Accuracy and reliability of forensic latent fingerprint decisions, Proc. Natl. Acad. Sci. 108 (2011) 7733–7738.
B.P. Larson, K.C. Chung, A systematic review of peer review for scientific manuscripts, Hand 7 (2012) 37–44.
I.E. Dror, D. Charlton, A.E. Péron, Contextual information renders experts vulnerable to making erroneous identifications, Forensic Sci. Int. 156 (2006) 74–78.
S.M. Kassin, I.E. Dror, J. Kukucka, The forensic confirmation bias: problems, perspectives, and proposed solutions, J. Appl. Res. Mem. Cogn. 2 (2013) 42–52.
R.S. Nickerson, Confirmation bias: a ubiquitous phenomenon in many guises, Rev. Gen. Psychol. 2 (1998) 175.
I.E. Dror, Cognitive neuroscience in forensic science: understanding and utilizing the human element, Phil. Trans. R. Soc. B 370 (2015) 20140255.
M. Welner, E.E. Davey, A. Bernstein, Peer-reviewed forensic consultation in practice: multidisciplinary oversight in common expertise, J. Forensic Sci. 59 (2014) 1254–1259.
J. Surowiecki, The Wisdom of Crowds, Anchor, New York, 2005.
A. Towler, D. White, R.I. Kemp, Evaluating the feature comparison strategy for forensic face identification, J. Exp. Psychol. Appl. 23 (2017) 47–58.
D. White, P.J. Phillips, C.A. Hahn, M. Hill, A.J. O’Toole, Perceptual expertise in forensic facial image comparison, Proc. R. Soc. B 282 (2015) 20151292.
S.S. Arora, K. Cao, A.K. Jain, G. Michaud, Crowd powered latent fingerprint identification: fusing AFIS with examiner markups, Proceedings of the 8th International Conference on Biometrics (ICB) (2015).
A.G. Dyer, B. Found, D. Rogers, An insight into forensic document examiner expertise for discriminating between forged and disguised signatures, J. Forensic Sci. 53 (2008) 1154–1159.
M.B. Thompson, J.M. Tangen, The nature of expertise in fingerprint matching: experts can do a lot with a little, PLoS One 9 (2014) e114759.
National Commission on Forensic Science, View of the Commission on Report and Case Record Contents, (2015).
G. Edmond, Legal and non-legal approaches to forensic science evidence, Int. J. Evid. Proof 20 (2016) 3–28.
Osland v R, 1998, HCA 75.
R v Karger, 2001, SASC 64.
Dallagher, 2002, EWCA Crim 1903.
Mallard v The Queen, 2003, WASCA 296.
R v Parenzee, 2007, SASC 143.
Otway v R, 2011, EWCA Crim 3.
IR & TR v R, 2012, EWCA Crim 1288.
Williams v R, 2012, EWCA Crim 2516.
R v Opuku-Mensah, 2012, ONSC 7146.
Xie v The Crown, 2014, EWCA Crim 715.
R v Natsis, 2014, ONSC 532.
Tuite v The Queen, 2015, VSCA 148.
United States v. Havvard, 260 F.3d 597 (7th Cir. 2001).
State of Washington v. Piggott, 2014 WL 1286564 (Wash. App. Div. 1), 2 (2014).
United States v. Stone, 848 F. Supp.2d 714, 717-18 (2012).
B. Wynne, Establishing the rules of laws: constructing expert authority, in: R. Smith, B. Wynne (Eds.), Expert Evidence: Interpreting Science in the Law, Routledge, London, 1989, pp. 23–55.
S. Jasanoff, What judges should know about the sociology of science, Jurimetrics 32 (1992) 345–359.
G. Edmond, Science in court: negotiating the meaning of a scientific ‘experiment’ during a murder trial and some limits to legal deconstruction for the public understanding of law and science, Syd. Law Rev. 20 (1998) 361–401.
G. Edmond, The building blocks of forensic science and law: recent work on DNA profiling (and photo comparison), Soc. Stud. Sci. 41 (2011) 127–152.
S.S. Siegelman, The genesis of modern science: contributions of scientific societies and scientific journals, Radiology 208 (1998) 9–16.
L. Bornmann, What is societal impact of research and how can it be assessed? A literature survey, J. Am. Soc. Inf. Sci. Technol. 64 (2) (2013) 217–233.