Mariana Lourenço Freire, Maria Clara de Oliveira Gonçalves, Allana Carolina Marques da Silva, Gláucia Cota, Ana Rabello, Tália Santana Machado de Assis
Timely and accurate diagnosis is one of the strategies for managing visceral leishmaniasis (VL). Given the specificities of this infection, which affects different vulnerable populations, local assessment of the accuracy of the available diagnostic tests is a requirement for the good use of resources. In Brazil, performance data are required for test registration with the National Regulatory Agency (ANVISA), but no minimum requirements are established for performance evaluation. Here, we compared the accuracy reported in the manufacturer’s instructions of VL diagnostic tests commercially available in Brazil with the accuracies reported in the scientific literature, which were obtained after test commercialization. The tests were identified via the electronic database of ANVISA, and their accuracy was obtained from the manufacturer’s instructions. A literature search for test accuracy was performed using two databases. A total of 28 VL diagnostic tests were identified through the ANVISA database. However, only 13 presented performance data in the manufacturer’s instructions: five immunoenzymatic tests, three indirect immunofluorescence tests, one chemiluminescence test, and four rapid tests. For most tests, the manufacturers did not provide relevant information such as sample size, reference standards, and study site. The literature review identified accuracy data for only 61.5% of these diagnostic tests registered in Brazil. These observations confirmed that there are significant flaws in the process of registering health technologies and highlighted one of the reasons for the insufficiency of control policies, namely, the use of diagnostic tools that are potentially inaccurate and inappropriate for a given scenario.
Keywords:
Visceral leishmaniasis; Diagnostic tests; Performance; Accuracy; Validation
Visceral leishmaniasis (VL) is a neglected tropical disease caused by Leishmania, a protozoan parasite. The disease is considered a worldwide public health problem, with 12 to 65 thousand new cases reported each year between 1998 and 2021 in 80 endemic countries1. In Brazil, 38,634 cases of VL were reported in the last 10 years, with an annual average of 277 deaths2. The infection has a high fatality rate (estimated at approximately 6% annually)2 and is often related to several socioeconomic indicators. Moreover, restricted access to early and accurate diagnosis and difficulties in raising clinical suspicion have added to the problem3. To manage VL and increase the efficiency of disease control, the availability of diagnostic tests that are simple to perform, accessible, inexpensive, sensitive, and specific is crucial4.
Parasitological confirmation remains the reference standard test for VL. However, the invasive nature of the procedure for obtaining a clinical specimen, the need for a specialized health professional, and the only moderate sensitivity of the test are limiting factors. Serological tests, with their advantages of high accuracy, ease of use, and low cost, are now recognized as a cost-effective strategy for VL diagnosis, at least among patients who are not immunosuppressed. However, significant variations in test accuracy depending on the endemic region, the antigen used, and the age and immune status of the patient must be considered5,6. Therefore, studies that evaluate the accuracy of these tests in different clinical scenarios are essential before a new health technology can be implemented in clinical practice7,8.
Ideally, the development and routine use of a new diagnostic test should be supported by a sequence of properly planned studies. A proof-of-concept study should be followed by analytical and precision parameter assessments, and then by clinical performance validation in real-life scenarios. The latter must include patients suspected of having the target condition, who are submitted to the index and the reference standard tests in parallel in a blinded design (Leeflang et al., 2019).
In Brazil, the registration of diagnostic tests is regulated by the National Health Surveillance Agency (Agência Nacional de Vigilância Sanitária, ANVISA), which requires an accuracy evaluation study before the commercialization of a test. However, the minimum parameters for validation studies are not established, which leads to the commercialization of tests without a proper accuracy assessment9. Indeed, for VL, a potentially fatal disease if untreated, the use of a rapid diagnosis strategy is crucial. As such, a guideline setting out the requirements for validation studies, to be adopted by manufacturers, must be developed. Given concerns about the inaccuracy of the information provided by the manufacturers and, ultimately, of the commercially available tests themselves, we compared the accuracy of commercial VL tests reported by the manufacturers with the accuracy reported in the scientific literature.
VL diagnostic tests registered in Brazil were identified using the electronic platform of the Brazilian Health Regulatory Agency, ANVISA (https://www.gov.br/anvisa/pt-br), which provides free access to the registered product database. The search ended in June 2021 and was oriented towards registered diagnostic products for VL (identified by the following keywords: Leish, Leishmania, Leishmaniasis, Kala-azar, Kalazar, and visceral leishmaniasis). Once the registered tests were identified, information about accuracy was obtained from the manufacturer’s instructions available on the ANVISA website or requested directly from the manufacturer/distributor.
The accuracies of the registered tests were also recovered from the scientific literature on the American Health Library, Medline database (accessed via PubMed, https://www.ncbi.nlm.nih.gov/), and Google Scholar indexer (accessed via https://scholar.google.com.br/). The search strategy was based on the commercial names of the registered tests. Initially, the titles and abstracts of all recovered articles, up to July 2021, were independently read by two researchers. Studies reporting the sensitivity and/or specificity values of any of the registered tests were included. Studies using only non-human samples and/or those published in languages other than English, Portuguese, or Spanish were excluded. Papers were selected for full-text reading based on the inclusion and exclusion criteria. All discrepancies were resolved by consensus after discussion between the two researchers or, if necessary, by consulting a third researcher. In this step, duplicates were removed manually. The selected studies were read in full by the same two researchers to confirm their eligibility and extract data, or to exclude them if exclusion criteria were identified at that time. All references cited in the included articles were assessed to identify other potential articles. In addition to the sensitivity and specificity values, information such as the type of biological sample used to perform the test (blood or serum), the sample size, the reference standard test, and the country where the study was conducted was extracted from the scientific articles and from the manufacturer’s instructions.
The data were compiled in Microsoft Excel spreadsheets, and statistical analyses were performed using MedCalc Statistical Software (MedCalc Software Ltd., Ostend, Belgium)10. Accuracy, expressed as the sensitivity and specificity presented in the instruction manual of each test, was compared to the performance reported in the literature using a comparison of proportions (chi-squared test). Statistical significance was set at p < 0.0511,12.
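As an illustration of the comparison of proportions described above, the following minimal Python sketch computes a Pearson chi-squared statistic (1 degree of freedom, no continuity correction) for two sensitivities. The text does not state which variant of the test the software applies, so the plain Pearson form is an assumption, and the function name `compare_proportions` and the counts in the usage lines are hypothetical, not data from this review.

```python
import math


def compare_proportions(x1, n1, x2, n2):
    """Pearson chi-squared test (1 df, no continuity correction)
    comparing two proportions x1/n1 and x2/n2.

    Returns (chi2, p_value).
    """
    # 2x2 table: rows = data source, columns = positive / negative results
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    total = n1 + n2
    row_totals = [n1, n2]
    col_totals = [x1 + x2, total - (x1 + x2)]

    # If one column is empty, both proportions are identical (all 0% or all 100%)
    if 0 in col_totals:
        return 0.0, 1.0

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-squared variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical example: sensitivity stated in a leaflet (99 of 100 cases detected)
# versus sensitivity in an independent study (84 of 100 cases detected)
chi2, p = compare_proportions(99, 100, 84, 100)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

Because the test needs the numerators and denominators rather than the percentages alone, it can only be applied when the manufacturer reports the sample size of the validation study.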
A total of 28 records referring to 26 tests registered for VL were identified in the ANVISA database: nine rapid diagnostic tests (RDT), five indirect immunofluorescent reactions (IIF), nine immunoenzymatic tests (ELISA), one chemiluminescence test (CH) and two tests with unidentified methodologies (Table 1). Ten out of the total (38.5%) were manufactured in Brazil, and the other 16 were produced in Germany (5), Spain (4), the USA (3), Australia (2), China (1), and France (1).
An instruction manual was obtained for 14 tests: only one was available on the ANVISA website, and the others were recovered through direct contact with the manufacturer/distributor. After individual analysis of the obtained instruction manuals, the IFI Human Leishmaniasis test (Fundação Oswaldo Cruz – Biomanguinhos) was excluded because its instruction manual contained no data on the performance study. Therefore, a total of 13 VL diagnostic tests fulfilled the criteria and were included: five ELISAs, three IIF, one chemiluminescence test, and four RDT (Figure 1).
Of the 13 diagnostic tests included in this review, only eight (61.5%) had validation analyses available in the scientific literature. RDTs, especially IT LEISH and Kalazar Detect (Table 2), were the most frequently evaluated in validation studies worldwide. For IT LEISH, the validation study reported in the manufacturer’s instruction manual was performed in India, and a sensitivity of 99% and a specificity of 100% were reported. Scientific studies carried out in this same region also reported sensitivities of 96.2 to 100%. On the other hand, in African countries, such as Sudan, Ethiopia, and Kenya, lower sensitivity rates have been reported, between 83.8 and 96.8%. In Brazil, the reported sensitivity rates ranged from 93.3 to 100%. The accuracy of IT LEISH seems to be independent of the biological sample used, whether serum or blood. For Kalazar Detect, a sensitivity rate of 89.8% was observed in a validation study performed in Brazil. No statistical differences were detected in relation to other validation studies carried out in American regions, with sensitivities ranging from 85.5 to 90%, except for the study conducted by Moura et al. (2013)13 in Brazil, in which the sensitivity was 72.4%. However, it is important to note that in this specific study, several reference tests were used, such as direct test or culture and/or IFA and/or test therapy (presumption of diagnosis based on the response after the institution of specific therapy). Regarding Onsite Leishmania IgG/IgM Combo, only four studies performed in different countries were retrieved. For this test, the manufacturer’s instructions report two performance estimates based on two different reference test criteria: IgG or IgM positivity in another serology.
Among the four IIF tests included in this study, only one study assessed the accuracy of Leishmania IFA IgG (Table 3). Overall, the manufacturer’s instructions lacked relevant information regarding how validation studies were conducted. In some cases, as observed for Leishmania IFA IgG and Leishmania VIRCLIA IgG+IgM MONOTEST, data about the population, such as the sample size and the country from which the samples come, are missing. For other tests, such as IF Leishmania donovani IgG and IgM, there is no information about the reference standard tests. Similarly, some ELISAs, such as NovaLisa Leishmania infantum IgG and RIDASCREEN Leishmania Ab, lack information about the reference standard and population included (Table 4). For the other ELISAs, Leishmania ELISA IgG + IgM, SERION ELISA classic Leishmania IgG, and Biolisa Leishmaniose Visceral, there was no information about the country where the manufacturer’s study was conducted. These limitations hamper a critical comparison between the accuracy reported by the manufacturer and that reported in the literature.
The accuracy of serological tests for VL is determined by factors related to the patients, such as their immune status and age, disease severity, and other factors, such as the Leishmania species involved, the test technique, and antigens used as targets. In addition, the adopted reference standard test5,8,37 and other methodological aspects of the validation study may also influence the accuracy estimation. There are many requirements for producing reliable estimates of test accuracy. Indeed, the process of validation and registration with regulatory agencies must be carefully evaluated. Comparisons between the accuracy reported by the manufacturer and those observed in clinical studies are essential to confirm the diagnostic accuracy under real conditions in the field, identify technologies with accuracies lower than expected prior to incorporation in clinical practice, and reduce diagnostic inaccuracy and public health risks.
In Brazil, manufacturers must follow a specific resolution before submitting a registration request to ANVISA. Among the requirements is the presentation of analytical and clinical accuracy data, included in a technical dossier and in the test leaflet9. These studies should provide accurate information such as sensitivity, specificity, accuracy, and diagnostic precision. However, the minimum criteria defining methodological requirements, such as sample size, sample characterization, and reference test, have not been established, allowing the registration of poorly evaluated tests. In addition, several pieces of methodological information about the validation studies, such as the reference standard and the number of included and excluded cases, were missing from the manufacturer’s instructions, hampering the correct interpretation of the results. Overall, the sensitivity and specificity rates reported by the manufacturers were obtained from analytical validation studies based on samples without a formal sample size calculation, composed of selected cases and controls that do not represent the clinical diversity (clinical spectrum) of real scenarios, which tends to overestimate performance.
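To make the notion of a calculated sample concrete, the sketch below applies one common approach for diagnostic accuracy studies: the number of cases needed to estimate sensitivity within a chosen confidence-interval half-width, scaled by the expected prevalence among enrolled patients, and likewise for specificity. This approach is an assumption chosen for illustration, since the text does not prescribe a method, and the function name and all input values are hypothetical.

```python
import math


def sample_size_for_accuracy(expected_se, expected_sp, prevalence,
                             half_width, z=1.96):
    """Approximate sample size for a diagnostic accuracy study.

    expected_se / expected_sp: anticipated sensitivity / specificity
    prevalence: expected proportion of true cases among enrolled patients
    half_width: desired half-width of the confidence interval (e.g. 0.05)
    z: normal quantile (1.96 for a 95% confidence interval)

    Returns (cases_needed, non_cases_needed, total_to_enroll).
    """
    # Cases needed so that sensitivity is estimated within +/- half_width
    n_cases = z ** 2 * expected_se * (1 - expected_se) / half_width ** 2
    # Non-cases needed so that specificity is estimated within +/- half_width
    n_non_cases = z ** 2 * expected_sp * (1 - expected_sp) / half_width ** 2

    # Total enrollment required to reach each target under the given prevalence
    total_for_se = n_cases / prevalence
    total_for_sp = n_non_cases / (1 - prevalence)

    return (math.ceil(n_cases),
            math.ceil(n_non_cases),
            math.ceil(max(total_for_se, total_for_sp)))


# Hypothetical scenario: anticipated Se 90%, Sp 95%, 30% of suspected
# patients truly have VL, and a +/- 5 percentage point precision is wanted
cases, non_cases, total = sample_size_for_accuracy(0.90, 0.95, 0.30, 0.05)
print(cases, non_cases, total)
```

In such a calculation, the total enrollment is driven by whichever group (cases or non-cases) is harder to reach at the expected prevalence, which is one reason why convenience samples of selected cases and controls say little about precision in the intended population.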
The validation of a test should qualify it for use in clinical decision-making. After analytical validation, true characterization of the performance of the test regarding its intended use (clinical validation) should be carried out following the Standards for Reporting Diagnostic Accuracy Studies (STARD)56. Analytical validity is the test’s ability to measure the status of a sample accurately and reliably in the laboratory, and it includes three different phases of test development: the pre-analytical, analytical, and post-analytical phases57. Clinical validation should demonstrate how robustly and reliably the test results correlate with the clinical outcomes of interest. In addition to clinical validity, which implies the appropriate distinction between cases and non-cases, a new perspective has been raised as equally important in evaluating the usefulness of a test: the concept of fit-for-purpose. This concept ensures that the test performs robustly according to predefined epidemiological and clinical parameters and facilitates the establishment of definitive acceptance criteria for clinical use (validation of clinical utility)58.
The difference in accuracy among regions has been widely verified for VL and is generally associated with the diversity of parasite species and/or antibody titers, which have been related to different genetic factors, age patterns, immune response, and nutritional status of patients5,32. Mainly for IT LEISH and Kalazar Detect, the highest rates of sensitivity and specificity were observed in studies conducted in India when compared to other endemic regions, like Brazil and East Africa. This finding confirms the importance and necessity of local validation studies before commercially available VL tests are adopted, so that poorly performing tests are not used in clinical decision-making.
It is important to highlight the limited number of studies evaluating the IIF tests registered in Brazil, especially considering that this technique has long been available and recommended for VL diagnosis by the Brazilian Visceral Leishmaniasis Surveillance and Control Program of the Ministry of Health (MS)59. Although some studies describing the accuracy of this IIF test are available8,22, a comparison with the sensitivity and specificity rates described by the manufacturer was not possible because these parameters were unavailable in the manufacturer’s instructions. ELISAs are generally used in private laboratories in Brazil, with few local validation studies corroborating their use.
Regardless of region, estimates of sensitivity and specificity may often vary between studies due to differences in the study population as a result of demographic or other covariate factors, such as disease stage and the presence of comorbidities. Thus, there are two main sources of bias related to the population evaluated: selection and confounding bias61. More importantly, the diagnostic test performance may vary with the prevalence of the disease in the evaluated population. Based on mathematical definitions, sensitivity and specificity do not depend on disease prevalence; however, this is an outdated paradigm60. The influence of prevalence can occur through intervening features, such as patient spectrum, referral filter, and reader expectation, and through artifactual mechanisms, which include distorted inclusion of participants, verification bias, and reference standard misclassification or misuse.
In fact, the selection of reference standards is a crucial but challenging element that influences test performance. Generally, a perfect gold standard test does not exist, and consequently, the sensitivity and specificity rates can be over- or underestimated according to the frequency of misclassifications made by the reference standard and the degree of correlation of errors between the index test and reference standard61. For VL, a parasitological test is generally used because of its high specificity. However, its variable and usually lower sensitivity can affect the estimated accuracy of the index test. When the index and reference tests share the same methodology, such as immunological methods, they tend to make concordant errors, which may lead to overestimation of the accuracy of the evaluated test. To minimize the impact of this error when a gold standard is not available, it is possible to consider the results of multiple imperfect tests using latent class analysis, as reported by Boelaert et al. (2008)37 and Machado de Assis et al. (2012)23.
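The effect of an imperfect reference standard can be illustrated with a short calculation. The Python sketch below derives the apparent sensitivity and specificity of an index test when it is judged against a reference that itself misclassifies some patients. It assumes that the errors of the two tests are conditionally independent given true disease status, so it does not model the correlated errors discussed above, and it is not the latent class analysis used in the cited studies. The function name and all numerical values are hypothetical.

```python
def apparent_accuracy(se_index, sp_index, se_ref, sp_ref, prevalence):
    """Apparent sensitivity and specificity of an index test measured
    against an imperfect reference standard, assuming the two tests err
    independently given the true disease status.
    """
    p = prevalence

    # Probability that the reference is positive / negative
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    ref_neg = p * (1 - se_ref) + (1 - p) * sp_ref

    # Joint probabilities of concordant results
    both_pos = p * se_index * se_ref + (1 - p) * (1 - sp_index) * (1 - sp_ref)
    both_neg = p * (1 - se_index) * (1 - se_ref) + (1 - p) * sp_index * sp_ref

    apparent_se = both_pos / ref_pos
    apparent_sp = both_neg / ref_neg
    return apparent_se, apparent_sp


# Hypothetical scenario: an index test with true Se = Sp = 95% is evaluated
# against a reference with perfect specificity but only 70% sensitivity,
# in a population where half of the patients truly have the disease
se_app, sp_app = apparent_accuracy(0.95, 0.95, 0.70, 1.00, 0.50)
print(f"apparent sensitivity = {se_app:.3f}, apparent specificity = {sp_app:.3f}")
```

Under these assumptions, true cases missed by the reference are counted as non-cases, so correct positive results of the index test are scored as false positives and its apparent specificity falls below the true value, while the apparent sensitivity is unaffected.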
In general, the commercialization of VL diagnostic tests supported by less rigorous validation studies may lead to the availability of poorly performing tests, with serious implications for the diagnosis and prognosis of patients. For VL, this fact causes concerns because false-negative results may delay the treatment of the disease, which may lead to fatality if left untreated. Conversely, false-positive results are also of great concern because of the high toxicity of the available treatments. Importantly, economic losses to the public health systems and to patients may result due to a lack of accuracy60,61.
The limited information provided by manufacturers regarding the accuracy studies conducted prior to commercialization of the tests in Brazil was the major gap observed in this review. It is important to highlight that it was not one of our goals to summarize the “correct” sensitivity and specificity of the tests, but rather to verify how different these measures can be. Therefore, we did not perform a systematic review. Instead, we conducted an extensive and careful search using various scientific databases and the reference list of each included article. The results of our data analyses revealed how the accuracy reported by the manufacturers differed from that of local studies, and how necessary it is to perform a validation study before the use of a VL test in clinical practice. Given the importance of a diagnosis for correct treatment, the establishment of a guideline with minimum criteria for test registration by all regulatory agencies is encouraged. This practice can also be useful for test developers. Indeed, requiring local studies in which the number of participants is supported by a sample size calculation and a robust reference standard test is selected may be the preferred way of selecting VL tests with higher accuracy in each endemic area.