Since 1997, several meta-analyses (MAs) of placebo-controlled randomised efficacy trials of homoeopathy for any indication (PRETHAIs) have been published with different methods, results and conclusions. To date, a formal assessment of these MAs has not been performed. The main objective of this systematic review of MAs of PRETHAIs was to evaluate the efficacy of homoeopathic treatment.
The inclusion criteria were as follows: MAs of PRETHAIs in humans; all ages, countries, settings, publication languages; and MAs published from 1 Jan. 1990 to 30 Apr. 2023. The exclusion criteria were as follows: systematic reviews without MAs; MAs restricted to age or gender groups, specific indications, or specific homoeopathic treatments; and MAs that did not assess efficacy. We searched 8 electronic databases up to 14 Dec. 2020, with an update search in 6 databases up to 30 April 2023.
The primary outcome was the effect estimate for all included trials in each MA and after restricting the sample to trials with high methodological quality, according to predefined criteria. The risk of bias for each MA was assessed by the ROBIS (Risk Of Bias In Systematic reviews) tool. The quality of evidence was assessed by the GRADE framework. Statistical analyses were performed to determine the proportion of MAs showing a significant positive effect of homoeopathy vs. no significant difference.
Six MAs were included, covering individualised homoeopathy (I-HOM, n = 2), nonindividualised homoeopathy (NI-HOM, n = 1) and all homoeopathy types (ALL-HOM = I-HOM + NI-HOM, n = 3). The MAs comprised between 16 and 110 trials, and the included trials were published between 1943 and 2014. The median trial sample size ranged from 45 to 97 patients. The risk of bias (low/unclear/high) was rated as low for three MAs and high for three MAs.
Effect estimates for all trials in each MA showed a significant positive effect of homoeopathy compared to placebo (5 of 5 MAs, no data in 1 MA). Sensitivity analyses with sample restriction to high-quality trials were available from 4 MAs; the effect remained significant in 3 of the MAs (2 MAs assessed ALL-HOM, 1 MA assessed I-HOM) and was no longer significant in 1 MA (which assessed NI-HOM).
The quality of evidence for positive effects of homoeopathy beyond placebo (high/moderate/low/very low) was high for I-HOM and moderate for ALL-HOM and NI-HOM. There was no support for the alternative hypothesis of no outcome difference between homoeopathy and placebo.
The available MAs of PRETHAIs reveal significant positive effects of homoeopathy beyond placebo. This is in accordance with laboratory experiments showing partially replicable effects of homoeopathically potentised preparations in physico-chemical, in vitro, plant-based and animal-based test systems.
PROSPERO CRD42020209661. The protocol for this SR was finalised and submitted on 25 Nov. 2020 and registered on 26 Dec. 2020.
Homoeopathy is a therapy system widely used in Europe, India and other countries [1]. Core features of homoeopathy include drug provings (observation of symptoms occurring in healthy persons exposed to substances of mineral, botanical or zoological origin), simile principle (similarity between symptom patterns in drug provings and the symptoms to be treated with the same substance) and potentization (successive dilution of the homoeopathic substance, with each dilution step involving repeated shaking of liquids or grinding of solids into lactose) [2].
The clinical effects of homoeopathic treatment have been investigated in several hundred randomised controlled trials [3] and in systematic reviews (SRs). Among the SRs, two contrasting approaches can be discerned.
One approach is to focus on a specific indication (e.g., depression [4], acute respiratory tract infections in children [5]) while often including open-label trials and observational studies. In this approach, data synthesis is grouped by design, thus yielding information about homoeopathy in patient care.
The opposite approach is to include all indications while restricting study designs to placebo-controlled trials and aggregating results in an MA, thus yielding information about the specific effects of homoeopathy beyond those of placebo. A major reason for using this approach has been the claim that ‘homoeopathy violates natural laws and thus any effect must be a placebo effect’ [6].
Since 1997, at least six MAs of placebo-controlled homoeopathy trials for any condition have been published [6,7,8,9,10,11]. These MAs have differed in their methods for trial inclusion, data synthesis and assessment of risk of bias; furthermore, their results and conclusions have been inconsistent. During this period, there have been substantial advancements in methodology and quality standards for MAs and other SRs [12,13,14,15], including SRs of SRs (also called overviews or umbrella reviews) [16,17,18]. To our knowledge, a formal SR of MAs of randomised placebo-controlled homoeopathy trials for any condition has not been performed. Herein, we report such an SR.
The eligibility criteria are presented in Table 1.
By 30 April 2023, a period of 30 months had passed after the end of the report time frame according to the original eligibility criteria (reports published up to 31 Oct. 2020). We therefore conducted an updated search of reports published in the period from 01 Nov. 2020 to 30 April 2023. We searched databases A–C, E, G–H (Table 2; D was no longer available, and F was omitted for budget reasons, having yielded no nonduplicate records in the primary search) and the database of reviewer HJH. The updated search yielded 13 records, of which 11 were excluded and 2 were assessed for eligibility. Of these, 1 report had already been included on 04 July 2022 (Gartlehner 2022 cf. Section 'Additional data: Gartlehner 2022'), and 1 was excluded (PRISMA 2020 flow diagram for the update in Additional file 4).
A list of the 14 excluded publications (original search: n = 13, update n = 1) with reasons for exclusions is presented in Suppl. Table 2.
The 16 reports consisted of 6 primary publications of one [6,7,8, 10, 11] or two [9] MAs, 2 published MA protocols [28, 29], 7 publications of additional analyses [3, 30,31,32,33,34] and 1 error correction [35] (Table 3).
Table 3 Overview of included meta-analyses and publications
The six MAs were published in the period 1997–2017. The two first (Linde 1997 [6] and 1998 [7]) and the two most recent (Mathie 2014 [10] and 2017 [11]) MAs were MA ‘pairs’, i.e. they were conducted and published by the same first author with overlapping co-authorships. The other two MAs (Cucherat 2000 [8], Shang 2005 [9]) were published by different author groups.
The MA conducted by Linde (1997) [6] was the first MA of placebo-controlled homoeopathy trials for any condition worldwide. The primary publication was followed by a detailed assessment of the relation between study quality (risk of bias) and effect estimates (Linde 1999) [30]. The MA conducted by Linde (1998) [7] was an updated subgroup analysis of Linde (1997) [6], restricted to I-HOM.
The MA conducted by Cucherat (2000) [8] originated from a homoeopathy report prepared for the European Parliament by the Homoeopathic Medicine Research Group (Boissel 1996) [31]. Compared to the Boissel report, the MA conducted by Cucherat [8] had modifications in some analyses. We considered this MA the definitive work, but we also consulted the Boissel report as an additional source of details on the methods and conduct of the MA.
The MA conducted by Shang [9] was designed as a prospective comparison of two MAs of placebo-controlled trials: one MA of any type of homoeopathic treatment for any disorder and one MA with matched trials on conventional treatment. According to the protocol for the present SR [37], the results of the latter MA were beyond the scope of this SR. However, the authors of the MA conducted by Shang [9] used the results of the MA on conventional treatment to draw inferences about the homoeopathy MA results. We therefore included comparative data on the two MAs (presented in Additional file 2).
The MAs conducted by Mathie (2014, 2017) [10, 11] were part of a comprehensive MA program (Mathie 2013) [3], covering placebo-controlled trials of individualised [10] and nonindividualised [11] homoeopathy, respectively.
The main research objective concerned the efficacy of homoeopathic products vs. placebo in all six MAs: generally stated [7, 8] or in terms of outcome difference between homoeopathy and placebo [6, 10, 11] (full text excerpts in Suppl. Table 3). In the MA conducted by Shang [9], the research hypothesis was further specified: ‘We assumed that the effects observed in placebo-controlled trials of homoeopathy could be explained by a combination of methodological deficiencies and biased reporting’ (Discussion, p.730).
In all six MAs, parallel group randomised trials were included, while crossover trials were excluded from four MAs [6, 9,10,11], included in the MA conducted by Linde (1998) [7] and not mentioned in the MA conducted by Cucherat [8]. Four MAs had no restrictions regarding publication format, while two (Mathie 2014 and 2017) [10, 11] were restricted to peer-reviewed journal articles of at least 500 words (Suppl. Table 4).
Restriction to disease groups as such was not applied in any MA (Suppl. Table 5). Notably, in the MA conducted by Shang [9], the homoeopathy trials were compared to placebo-controlled trials of interventions used in conventional medicine, matched for indication. For 94.0% (n = 110/117) of otherwise eligible homoeopathy trials, a trial of conventional medicine for the respective indication could be found, while 7 unmatchable homoeopathy trials were excluded.
In the MAs conducted by Mathie (2014 and 2017) [10, 11], the homoeopathic intervention types were restricted as follows: radionically prepared medicines, anthroposophic medicine, homotoxicology, and homoeopathy combined with other (complementary or conventional) treatments were excluded (Suppl. Table 6).
In the meta-analysis conducted by Cucherat [8], ‘only trials with a clearly defined primary outcome’ were included (Suppl. Table 7).
For all six MAs, previously published MAs or SRs [38] were consulted. Between 4 [6] and 19 [9] online databases were searched. For all MAs, experts in the field were contacted for information on additional trials; manual searches of reference lists were used in five MAs but not in the MA conducted by Linde (1998) [7], which was largely an update on their previous MA from 1997 (Suppl. Table 8). Screening of titles and abstracts was performed independently by two reviewers in the MA conducted by Linde (1997) [6] and by one reviewer in the MA conducted by Cucherat [8]. The screening approach was not reported in the four other MAs. Full-text assessments were performed independently by two persons in the MA conducted by Linde (1997) [6]; by one person and checked in part by another person in the MA conducted by Cucherat [8]; and by one person in the MA conducted by Linde (1998) [7]. The full-text assessment approach was not reported in three MAs.
Data extraction was performed independently by two persons in five MAs and by one person in the MA conducted by Linde (1998) [7]. Risk of bias assessments were performed independently by two persons in three MAs [6, 10, 11] and by one person in the MA conducted by Linde (1998) [7]. The number of persons performing risk of bias assessment was not reported in two MAs. Lists of excluded trials were available in three MAs [9,10,11]. The reasons for exclusion of trials were provided in all MAs except the one conducted by Linde (1998) [7] (Table 4).
Table 4 Quality of trial data handling
All six MAs used one main clinical outcome for each trial or trial comparison. For the MA conducted by Cucherat [8], this was the primary outcome as reported in the trials (cf. Section 'Eligibility criteria', above); for the other MAs, a predefined hierarchical list of criteria for extraction of the main outcome was used (Suppl. Table 9).
For two MAs (Mathie 2014 and 2017) [10, 11], a prepublished protocol was available; for two MAs (Linde 1997, Cucherat [6, 8]), a protocol was referred to in the publication; and for two MAs (Linde 1998, Shang 2005 [7, 9]), a protocol was not mentioned in the publication, while one single design criterion (outcome extraction in both cases) was explicitly stated as predefined.
High-quality trials according to our criteria (cf. Section 'Data items' / 'Primary outcome', above) were performed in four MAs [6, 9,10,11]. The criteria for high-quality trials were described as predefined (Linde 1997) [6] or fully (Mathie 2017) [11] or partially (Mathie 2014) [10] defined in a prepublished protocol. One MA did not mention this aspect (Shang [9]). The criteria for high-quality trials were as follows:
The MA conducted by Linde (1997) [6] used a combination of two score-based instruments:
The instruments used in the following MAs consisted of sets of mandatory criteria, all of which were to be fulfilled.
The MAs conducted by Mathie (2014 and 2017) [10, 11] used the Cochrane risk-of-bias tool (RoB, version 2011) [40]: low risk of bias for items 1–2 and 4–5 in Table 5, low risk for two of the three items 8 and 12–13 and low or uncertain risk for one of the latter four items.
In the MA conducted by Shang [9], the number of quality components used was variously described as 3 or 4, corresponding to fulfilment of items (1–3) or (1–3 + 10) in Table 5. Lüdtke [32] interpreted Shang [9] as having used 3 components (Suppl. Table 29). Details in support of either 3 or 4 components are presented in Suppl. Table 11.
The high-quality criteria were based on 8 [6], 7 [10, 11] and either 3 or 4 quality components [9] (Table 5).
The total number of methodological quality components assessed in each MA (including components of high-quality criteria as well as other components) ranged from 3 [8] to 10 [6, 7], details in Suppl. Table 12.
Associations between quality components and outcome were analysed with hypothesis testing in four MAs (not in the MAs conducted by Linde (1998) [7] and Cucherat [8]).
Cumulative MA with stepwise removal of trials according to increasing quality categories was performed in four MAs using interval-scaled [6, 10, 11] or rank-ordered [8] categories. Of the two other MAs, one [7] had outcome analysis in 4 ranked quality subgroups instead of cumulative MA.
Statistical heterogeneity testing was performed in four MAs (not in the MAs conducted by Linde (1998) [7] and Cucherat [8]); all but one MA [7] included an assessment of publication bias/small study bias (Suppl. Table 14).
Potential conflicts of interest were stated and explained for at least one author in two MAs (Mathie 2014 and 2017) [10, 11]; a statement of no conflicts of interest for any author was included in one MA (Shang) [9], while this issue was not addressed in the three other MAs.
For each MA, between 150 and 359 full-text records were assessed for eligibility (data available for four MAs) and between 16 and 119 trials were eligible for SR, including 16–110 trials with extractable data for MA. Altogether, 182 different trials (or in some cases, trial comparisons) reported in 165 different publications or other trial reports were included in the 6 MAs. Of these, n = 88 trials were included in 1 MA, 65 trials in 2 MAs, 24 trials in 3 MAs and 5 trials in 4 MAs, with a total of 310 trials or trial comparisons (Suppl. Table 15). All following descriptions refer to these 310 trials.
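The relation between the 182 distinct trials and the 310 trial inclusions can be checked with simple arithmetic. The short sketch below uses only the counts stated in the paragraph above; the variable names are ours and purely illustrative.

```python
# Number of distinct trials, keyed by how many MAs included each trial
# (counts taken from the text above).
trials_by_ma_count = {1: 88, 2: 65, 3: 24, 4: 5}

# Each distinct trial is counted once.
distinct_trials = sum(trials_by_ma_count.values())

# Each trial is counted once per MA that included it.
trial_inclusions = sum(k * n for k, n in trials_by_ma_count.items())

print(distinct_trials)   # 182
print(trial_inclusions)  # 310
```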
Summary descriptive data on 12 different trial properties (excluding design, trial quality and results) were presented, ranging from 3 [8] to 9 [7] items per MA (Suppl. Table 16).
All six MAs had at least one table with characteristics of individual trials. A total of 38 different items were presented (or summarily stated as present/absent in all trials), ranging from 8 (Shang [9]) to 33 items (Mathie 2017 [11]) per MA (Suppl. Table 17). The most frequently reported items were as follows:
The trials were published in the period 1943–2014 (Table 6). The median trial sample size was in the range of 45–97 patients, with a minimum sample size of 5–28 and a maximum size of 175–1573 patients. The trials of each MA had been performed in 11–15 countries (data available for four MAs). The country in which each trial was performed was reported in three MAs [7, 10, 11]; the most common countries were the UK (n = 18 trials among the three MAs, multiple responses possible), Germany (n = 17), USA (n = 9) and France and India (both with n = 6 trials) (Suppl. Table 18). The most common languages of trial publications were English (range 39–95% of trials), German (5–29%) and French (0–28%) (Table 6).
Table 6 Literature searches, characteristics of trials with extractable data for meta-analysis
Data on age groups and gender were available in three MAs [7, 10, 11] with a total of 94 trials (multiple responses possible). A total of 14.9% (n = 14/94) of all trials included children only, 55.3% (n = 52) included adults only and 29.8% (n = 28) included both adults and children or unknown. A total of 14.9% (n = 14/94) of trials included only females; 2.1% (n = 2) of trials included only males; and 83.0% (n = 78) of trials included both genders or did not report these data (data on individual MAs in Suppl. Table 19).
Indications for all 310 trials (multiple responses possible) were coded according to ICD-10:
The intervention was I-HOM in all trials for 2 MAs [7, 10] and in 0–18% of trials of the four other MAs. In these four MAs, the NI-HOM intervention was clinical homoeopathy in 44–71% of trials, complex homoeopathy in 6–44% (Mathie 2017 [11]: including ‘combination products’) and isopathy in 6–13% (Table 7). The homoeopathic products used were high potencies only (≥ C12 or ≥ D24) in 29–39% of trials.
Table 7 Interventions, metric of main outcome, trial results
The main outcome was binary in 43–89% of trials. The main outcome analysis showed a significant positive effect of homoeopathy compared to placebo in 14–65% of trials (weighted mean 36.5%, n = 113 of 310 trials), a nonsignificant superiority of homoeopathy in 18–55% (weighted mean 44.2%), a nonsignificant superiority of placebo in 16–32% (mean 19.0%) and a significant positive effect of placebo compared to homoeopathy in 0–1% (0.3%, n = 1 trial) (Table 7).
Risk of bias (methodological quality) of trials
For 10 different methodological quality components, the number of trials fulfilling the respective criterion was assessed in at least two MAs, with a total of 43 analyses (Table 8, components 1–10). Fulfilment rates ranged from 17% (allocation concealment adequate in the MA conducted by Mathie (2017) [11]) to 100% (8 cases); 44% (n = 19/43) of analyses showed a fulfilment rate of ≥ 50%. Weighted mean fulfilment rates for each of the 10 components (multiple responses possible, as trials could be included in more than one MA) ranged from 20% (no funding-related vested interests in the MAs conducted by Mathie (2014) [10] and (2017) [11]) to 89% (publication format = journal article in all six MAs). Three components (journal article, double blinding adequate, no selective outcome reporting) had weighted mean fulfilment rates above 75%.
Table 8 Risk of bias (methodological quality) of trials: criteria used in ≥ 2 meta-analyses
In the MA conducted by Linde (1997) [6], 23.6% (n = 21/89) of trials had a predefined primary outcome (effect estimate after sample restriction to these trials reported in Suppl. Table 28). In the MA conducted by Cucherat [8], only trials with one single ‘clearly defined’ primary outcome were eligible.
In the MAs conducted by Mathie (2014 and 2017) [10, 11], the risk of outcome reporting bias was assessed in Domain V of the Cochrane RoB tool by comparison of the results section with the protocol or, if no protocol was available, with the methods section of publications. In the MA conducted by Mathie (2014) [10], freedom from risk of outcome reporting bias was rated as ‘yes’ in 86.4% (n = 19/22) of trials in the MA, ‘uncertain’ in 4.5% (n = 1) and ‘no’ in 9.1% (n = 2). In the MA conducted by Mathie (2017) [11], the corresponding ratings were ‘yes’ in 74.1% (n = 40/54) of the trials in the MA, ‘uncertain’ in 9.3% (n = 5) and ‘no’ in 16.7% (n = 9) (Table 8, component no. 5). Effect estimates for the 19 and 40 ‘yes’-rated trials, respectively, were not published.
The proportion of high-quality trials ranged from 6% (n = 3/54) of trials analysed by Mathie (2017) [11] to 29% (n = 26/89) of trials analysed by Linde (1997) [6] (Table 8). Notably, the criteria for ‘high quality’ differed widely among the MAs:
For the three MAs using a set of mandatory criteria for ‘high-quality’ (Shang [9] with 3 or 4 criteria; Mathie (2014) [10] and (2017) [11] with 7 criteria each), methodological quality was compared with the quality of other trials, assessed according to identical criteria:
Table 9 Risk of bias of trials of systematic reviews, evaluated with the Cochrane RoB tool (2011), domains I, II, IIIa, IIIb, IV and V
Significant statistical heterogeneity across trials was found in 3 MAs [6, 9, 11, 30] and was not found in 1 MA (Mathie 2014) [10], while heterogeneity was not assessed in 2 MAs [7, 8] (Suppl. Table 23). Notably, in the MA conducted by Cucherat [8], the likelihood of statistical heterogeneity because of clinical heterogeneity was stated as a major reason for choosing p value combination instead of meta-analytic effect estimation.
In the MA conducted by Linde (1997/1999) [6, 30], heterogeneity was τ-squared 0.43 in the full sample (n = 89 trials). After sample restriction to trials with higher methodological quality, heterogeneity was reduced in 6 of 7 univariate analyses, with τ-squared ranging from 0.31 for double-blind trials (n = 81) to 0.41 for explicitly randomised trials (n = 64). In one multivariate analysis, heterogeneity was reduced to τ-squared = 0.28 for explicitly randomised trials (Suppl. Table 23).
In the MA conducted by Mathie (2017) [11], heterogeneity (I-squared 65%) was not reduced after the ‘trim-and-fill’ procedure for funnel plot asymmetry (FPA, I-squared 79%).
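For readers less familiar with the heterogeneity measures quoted above, the following minimal sketch shows how Cochran's Q, I-squared and a τ-squared estimate can be derived from per-trial effect estimates and their variances. It assumes the DerSimonian-Laird moment estimator for τ-squared; the included MAs may have used other estimators. The function name and the input data are hypothetical and are not taken from any of the included MAs.

```python
def heterogeneity(effects, variances):
    """Return (Q, I2 in percent, DerSimonian-Laird tau2) for per-trial effects."""
    weights = [1.0 / v for v in variances]           # inverse-variance weights
    w_sum = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / w_sum
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation attributed to between-trial heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird moment estimator, truncated at zero
    c = w_sum - sum(w * w for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


# Hypothetical log odds ratios and variances for five trials (illustration only).
log_or = [0.10, 0.80, -0.20, 1.20, 0.40]
var = [0.04, 0.09, 0.05, 0.16, 0.06]
q, i2, tau2 = heterogeneity(log_or, var)
print(f"Q = {q:.2f}, I2 = {i2:.0f}%, tau2 = {tau2:.3f}")
```

A lower τ-squared after restricting the sample, as reported for Linde (1997/1999), indicates that the remaining trials scatter less around their common estimate than the full sample does.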
Extensive searches for potentially eligible trials were performed for five MAs (not Linde 1998) [7], and unpublished trials were eligible for three MAs [6, 8, 9] but not for the two MAs conducted by Mathie [10, 11].
Data on unavailable trials were reported for three MAs:
Mathie (2013) [3] identified the following:
Funnel plot inspection was performed in four MAs. Funnel plots were constructed by plotting the effect estimate for each trial—expressed as the log odds ratio [6, 9, 10] or standardised mean difference (Mathie 2017 [11])—against the standard error. In three MAs [6, 9, 11], FPA was found, with trials with higher standard error having larger effects. In one MA (Mathie 2014 [10]), the funnel plot was symmetric. Egger’s test was significant in the first three MAs but not in the MA conducted by Mathie (2014) [10] (Suppl. Table 25).
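The regression behind Egger's test can be sketched as follows. In its classical form, the standardised effect (effect divided by its standard error) is regressed on precision (the reciprocal of the standard error); an intercept far from zero indicates funnel plot asymmetry. The sketch computes only the intercept and slope by ordinary least squares and omits the significance test. The function name and the data are hypothetical, and the included MAs may have used weighted or otherwise modified variants.

```python
def egger_regression(effects, std_errors):
    """Regress effect/SE on 1/SE; return (intercept, slope).

    The intercept is the asymmetry measure; the slope estimates the
    underlying effect in the absence of small-study effects.
    """
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardised effect
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical trials in which larger standard errors go with larger effects.
std_errors = [0.1, 0.2, 0.3, 0.4, 0.5]
effects = [0.15, 0.30, 0.35, 0.60, 0.70]
intercept, slope = egger_regression(effects, std_errors)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```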
Trim-and-fill tests were performed in three MAs [6, 8, 11]. Random effects and nonparametric selection models to assess possible missing trials were used in the MA conducted by Linde (1997) [6]. Under different conditions, the number of fictive additional trials with zero effect required to change results from a significant to a nonsignificant superiority of homoeopathy ranged from 11 (Mathie (2017) [11]) to 4511 (Linde (1997) [6], fixed effects model) (Suppl. Table 26).
Sterne (2001) [36] constructed a funnel plot of n = 34 trials with ‘adequate concealment’ + ‘double-blinding’ from the MA conducted by Linde (1997) [6] (not the n = 26 high-quality trials according to Linde (1997) [6]). On inspection, FPA was found, and the corresponding tests were significant (rank correlation: p = 0.014; regression: p < 0.001).
Lüdtke (2008) [32] constructed a funnel plot of the 21 high-quality trials analysed by Shang [9] by plotting the log odds ratio against the standard error. The plot showed a cluster of 18 largely symmetric trials and 3 extreme outliers, with 2 strongly favouring homoeopathy and 1 strongly favouring placebo. Egger’s test showed a large but not significant FPA (asymmetry coefficient 0.40, p = 0.17); this was also the case for the 8 largest high-quality trials (1.15, p = 0.94, funnel plot not shown) [32] (Suppl. Table 25).
Associations between methodological quality or other subgroups and effect estimates were analysed in 4 MAs (Linde 1997 [6], Shang [9], Mathie 2014 [10] and 2017 [11], Suppl. Table 27).
Linde (1997 [6] and 1999 [30]): The authors analysed uni- and multivariate associations between four single quality components and the effect estimate and found significant associations for ‘double blinding’ (uni- and multivariate) and ‘explicitly randomised’ (multivariate) but not for ‘adequate concealment of random allocation’ nor ‘complete follow-up’ (neither uni- nor multivariate). Univariate analyses showed significant associations between three composite quality measures (A: Jadad scale > 2; B: Internal validity score > 4.5; C: A and B) and effect estimate. On the other hand, scatter plots of the Jadad scale and internal validity score against odds ratios showed no clear linear relationships (Suppl. Table 27).
Linde (1997) [6] / Sterne [36]: The authors analysed uni- and multivariate associations between ‘English language publication’ and ‘Medline-indexed publication’, respectively, and effect estimates: two of four analyses showed significant associations (‘English language’, univariate + ‘Medline-indexed’, multivariate; Suppl. Table 27).
Shang [9] analysed univariate associations between six single quality components and effect estimates, and significant associations were found for three (‘Medline-indexed’, ‘double-blinding’, ‘adequate generation of allocation sequence’). Likewise, a significant association was found for high-quality trials (Suppl. Table 27). In multivariate analyses, as summarised by the authors ‘the standard error of the log odds ratio (asymmetry coefficient) was the dominant variable. Coefficients of other variables, including study quality, were attenuated and became non-significant’ (Shang [9], pp.929-930).
The MAs conducted by Mathie (2014 [10] and 2017 [11]) revealed no significant associations between ‘publication free of vested interest’ and effect estimates (both MAs, Suppl. Table 27).
According to our ROBIS [13] assessments, the risk of bias was low in three MAs (Linde 1997, Mathie 2014 & 2017 [6, 10, 11]) and high in three MAs (Linde 1998, Cucherat, Shang [7,8,9]) (Table 10). ROBIS assessments of each MA with our comments on individual items are presented in Additional file 1.
Table 10 Risk of bias of meta-analyses: ROBIS assessments of individual items, domains and overall risk
AMSTAR [14] items 7 (list of excluded studies), 10 (funding sources for included studies) and 16 (conflict of interest of review authors) received the poorest ratings possible (0) for the first three MAs (Linde 1997 & 1998, Cucherat [6,7,8]) and the best ratings possible (1 or 2) in the most recent MAs (Mathie 2014 [10] and 2017 [11]). The MA conducted by Shang [9] had two ‘0’ ratings and one ‘1’ (0–2 possible) (Table 11).
Table 11 Risk of bias of meta-analyses: AMSTAR items 7, 10, 16
Effect estimates (or, for the MA conducted by Cucherat [8], combined p values) for all trials with extractable data were reported in five MAs (not from Shang [9]). All analyses showed a significant positive effect of homoeopathy compared to placebo (Table 12).
Effect estimates for high-quality trials (cf. Section 'Data items' / 'Primary outcome') were available for four MAs (not for the MAs conducted by Linde (1998) [7] and Cucherat [8]). Three MAs (Linde 1997, Shang/Lüdtke, Mathie 2014 [6, 9, 10, 32]) showed a significant positive effect of homoeopathy compared to placebo, and one MA (Mathie 2017) [11] showed no significant difference between homoeopathy and placebo (Table 12).
Table 12 Primary outcomes of systematic review: effect estimates for all trials and for high-quality trials
Sample restriction to trials fulfilling 1 quality criterion
Sensitivity analyses with sample restriction to trials fulfilling 1 quality criterion were reported in four MAs [6, 7, 10, 11], with a total of 12 analyses based on 7 different single quality components (‘explicitly randomised’, ‘adequate concealment of random allocation’, ‘double-blinding stated’, ‘follow-up adequate/complete’, ‘main outcome predefined’, ‘Medline-listed’, ‘free of [funding-related] vested interest’). Of the 12 analyses, 11 showed a significant positive effect of homoeopathy compared to placebo (Suppl. Table 28).
Sample restriction regarding 2–4 quality components
Sensitivity analyses with sample restriction regarding 2–4 quality components were reported in 3 MAs. In the MA conducted by Linde (1997) [6], trials with a Jadad score > 2 had a significant positive effect of homoeopathy. In the MA conducted by Linde (1998) [7], the effect estimate for trials fulfilling 3 criteria (Medline-indexed + double-blind + “no other obvious relevant flaws”) did not differ significantly from placebo. In the MA conducted by Shang [9] and analysed by Lüdtke [32], the effect estimates for high-quality trials (interpreted as based on 3 components) fulfilling one additional criterion (Medline-listed, English language, Intention-to-treat principle, respectively) analysed with random-effects or meta-regression did not differ significantly from placebo (Suppl. Table 29).
Sample restriction regarding ≥ 5 quality components
Sensitivity analyses with sample restriction regarding 5 or more quality components were reported in 3 MAs with one analysis each. In the MA conducted by Linde (1997) [6], trials with an internal validity score > 4.5 (n = 7 components) had a significant positive effect of homoeopathy. In the MAs conducted by Mathie (2014 and 2017) [10, 11], high-quality trials and A- and B-rated trials (trials rated as having low or uncertain risk of bias in all seven domains of Cochrane RoB), respectively, both sets in addition rated as free from publication-rated vested interests (n = 8 components each) showed no significant effect differences between homoeopathy and placebo (Suppl. Table 29).
Cumulative MA with stepwise removal of trials by risk-of-bias ratings
Cumulative MA with stepwise removal of trials by risk-of-bias ratings was performed in four MAs, including three (Linde 1997/1999, Mathie 2014 and 2017 [6, 10, 11, 30]) using incremental removal according to interval-scaled instruments and one (Cucherat [8]) using a rank-ordered scale. The scales used by Linde (1997/1999) [6, 30] were additive (sum of score points), while the remaining scales were in part [10, 11] or fully [8] hierarchically constructed.
In the MA conducted by Linde (1997/1999) [6, 30], two cumulative MAs were performed: (1) For the Jadad score (range 0–5, 5 points indicating highest possible quality), a significant positive effect of homoeopathy was retained with a score of 5 points (n = 10 trials). (2) For the internal validity score (range 1–7, 7.0 points indicating highest possible quality), significant positive effects of homoeopathy were retained up to 6.5 points (n = 7 trials), while no significant difference was observed for 7.0 points (n = 5 trials) (Suppl. Table 31).
In the MA conducted by Cucherat [8], a cumulative MA was performed using a rank-ordered scale, with step 4 indicating the highest possible quality assessed by the authors. Significant positive effects of homoeopathy were retained up to step 3 (double-blind + dropout rate < 10%, n = 9 trials), while no significant difference was observed at step 4 (double-blind + dropout rate < 5%, n = 5 trials) (Suppl. Table 33).
In the MAs conducted by Mathie (2013/2014) [10, 28] and Mathie (2017) [11], one cumulative MA was performed based on the Cochrane RoB tool (2011 version), with 7 items for which the risk of bias was rated as low (A), uncertain (B) or high (C). Trials with 7 × A were rated as A, trials with 7 × (A or B) were rated as B and trials with ≥ 1 × C were rated as C. In addition to this hierarchical classification, Mathie counted the number of A- and B-rated items for each trial, allowing for a more differentiated assessment.
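The hierarchical A/B/C classification described above can be expressed as a small rule. The sketch below follows the wording of this paragraph only; the function names are hypothetical and do not reproduce any code used by Mathie.

```python
def classify_trial(item_ratings):
    """Overall rating from seven item ratings: 'A' low, 'B' uncertain, 'C' high risk."""
    if len(item_ratings) != 7:
        raise ValueError("expected ratings for exactly 7 items")
    if any(r not in ("A", "B", "C") for r in item_ratings):
        raise ValueError("ratings must be 'A', 'B' or 'C'")
    if "C" in item_ratings:
        return "C"                      # at least one item at high risk
    if all(r == "A" for r in item_ratings):
        return "A"                      # all seven items at low risk
    return "B"                          # only A and B, with at least one B


def count_ratings(item_ratings):
    """Number of A- and B-rated items, for the more differentiated assessment."""
    return item_ratings.count("A"), item_ratings.count("B")


print(classify_trial(["A"] * 7))                  # A
print(classify_trial(["A"] * 5 + ["B"] * 2))      # B
print(classify_trial(["A"] * 6 + ["C"]))          # C
print(count_ratings(["A"] * 5 + ["B"] * 2))       # (5, 2)
```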
Statistical adjustment for possible publication bias or other small trial effects
Statistical adjustment for possible publication bias or small trial bias—without any additional sensitivity analysis—was performed for two MAs (Linde 1997, Mathie 2017 [6, 11]). In both cases, a significant positive effect of homoeopathy was retained after adjustment (Suppl. Table 34).
Sensitivity analyses with sample restriction to trials with a higher sample size
Sample restriction to trials with a higher sample size—without any additional sensitivity analysis—was performed for two MAs (Mathie 2014 and 2017) [10, 11]. In both cases, the sample was restricted to trials with a sample size above the median, and in both cases, a significant positive effect of homoeopathy was retained (Suppl. Table 30).
Sample restriction regarding methodological quality + restriction to trials with a higher sample size was performed in two MAs (Shang [9]: high-quality trials + “large” trials; Mathie (2017) [11]: A- and B-rated trials + sample size above the median for all trials). In both cases, no significant difference between homoeopathy and placebo was observed (Suppl. Table 35).
Lüdtke [32] performed a cumulative analysis, varying the cut-off point for ‘large trials’ among the 21 high-quality trials included in the MA conducted by Shang [9]: a significant effect of homoeopathy compared to placebo was observed with a sample restriction to the 20, 19, 18, 16, 15 and 14 largest trials, respectively, while no significant difference was found with a sample restriction to the 17, 13 and 1–12 largest trials, respectively [32].
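The logic of such a cut-off analysis can be sketched as follows. The sketch makes several assumptions that are not taken from Lüdtke [32] or Shang [9]: it pools log odds ratios with a simple inverse-variance fixed-effect model (the published analyses may have used a different model), the names `pool_fixed` and `cumulative_by_size` are hypothetical, and the trial values are invented purely for illustration.

```python
import math

def pool_fixed(effects, ses):
    """Inverse-variance fixed-effect pooled estimate with a 95% CI."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    est = sum(w * e for w, e in zip(weights, effects)) / total
    half = 1.96 * math.sqrt(1.0 / total)
    return est, est - half, est + half

def cumulative_by_size(trials):
    """Pool the k largest trials for every k from 1 to len(trials)."""
    ordered = sorted(trials, key=lambda t: t["n"], reverse=True)
    rows = []
    for k in range(1, len(ordered) + 1):
        subset = ordered[:k]
        est, lo, hi = pool_fixed([t["log_or"] for t in subset],
                                 [t["se"] for t in subset])
        # "Significant" here means the 95% CI for the log OR excludes 0.
        rows.append({"k": k, "log_or": est, "ci": (lo, hi),
                     "significant": lo > 0 or hi < 0})
    return rows

# Made-up trials for illustration only; NOT data from Shang or Lüdtke.
toy_trials = [
    {"n": 400, "log_or": -0.05, "se": 0.10},
    {"n": 250, "log_or": -0.30, "se": 0.14},
    {"n": 120, "log_or": -0.45, "se": 0.20},
    {"n": 60,  "log_or": -0.10, "se": 0.30},
]

for row in cumulative_by_size(toy_trials):
    print(row["k"], round(row["log_or"], 3), row["significant"])
```

Because each step adds one trial to the pooled set, the significance flag need not change monotonically with k, which corresponds to the alternating pattern reported for the 13 to 20 largest trials.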
In the MA conducted by Shang [9], meta-regression analyses of ‘predicted effect in trials as large as the largest trials included in the study’ (without further specification; we assume the authors meant the intercept from the regression of odds ratios on the standard error) showed no significant difference between homoeopathy and placebo (Additional file 2).
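Under the reading assumed above, the 'predicted effect' is the intercept of a weighted regression of the trial effect on its standard error, i.e. the effect extrapolated to a standard error of zero. The following is a minimal sketch of that reading, not a reconstruction of the model used by Shang [9]: the use of log odds ratios, the inverse-variance weights and the function name `wls_intercept` are all assumptions.

```python
def wls_intercept(log_ors, ses):
    """Weighted least-squares fit of log OR on standard error.

    Returns (intercept, slope). The intercept is the fitted effect at
    SE = 0, i.e. for a hypothetical trial of unlimited size.
    Weights are 1 / SE^2 (an assumption); at least two distinct SEs are needed.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, ses)) / total
    y_bar = sum(w * y for w, y in zip(weights, log_ors)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, ses))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, ses, log_ors))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

A slope far from zero indicates that smaller trials (larger standard errors) report systematically different effects, in which case the intercept can differ markedly from the pooled estimate.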
Subgroup interactions were analysed in 3 MAs (Shang, Mathie 2014 and 2017 [9,10,11]). No significant associations were found for duration of follow-up, indication type (acute/chronic/prophylaxis) or type of homoeopathy (4 groups) (Suppl. Table 36).
Effect estimates were analysed in a total of 23 subgroups, pertaining to indication (acute or chronic), type of homoeopathy (n = 10 subgroups), homoeopathic potency (n = 6) and outcome metric in trials (n = 5) (Suppl. Table 37). Of these 23 analyses, 21 showed a significant positive effect of homoeopathy, while two showed no significant difference from placebo: potencies < 12C in the MA conducted by Mathie (2014) [10], which was restricted to I-HOM; homoeopathic combination products in the MA conducted by Mathie (2017) [11] (a category only described and evaluated in this MA, cf. Suppl. Table 10). No subgroup analyses were performed on patient age groups.
Neither statistical homogeneity/heterogeneity nor funnel plot inspection with related statistical tests was reported for any subgroup as defined in Section 'Methods / Subgroup analyses'. However, given that Mathie (2014) [10] and Mathie (2017) [11] were part of one MA programme, these two MAs can be considered subgroup analyses pertaining to the type of homoeopathy. For I-HOM (Mathie 2014 [10], n = 22 trials), neither heterogeneity nor FPA was found. For NI-HOM (Mathie 2017 [11], n = 54 trials), significant heterogeneity as well as FPA were found (cf. Section 'Assessments of bias and heterogeneity', above).
Of the 23 subgroup analyses, 15 were specified in a prepublished protocol (Mathie 2014 and 2017 [10, 11]), while 8 analyses—albeit from MAs based on predefined protocols—were not explicitly stated to be prespecified (Linde 1997 [6], Cucherat 2000 [8]). Of the 15 former analyses, 14 showed a significant positive effect of homoeopathy, while 1 did not (Mathie 2014 [10], see above).
Data for the comparison of MAs of placebo-controlled trials of homoeopathic and conventional treatment in Shang [9] are presented in Additional file 2.
After literature searches and data collection for this SR had been completed, an additional subgroup analysis of the MA conducted by Mathie (2017) [11] was published, which we decided to include, as it concerned an item that had not been analysed for any of the MAs: trial registration (Gartlehner 2022 [34]).
The 54 trials included in the MA conducted by Mathie (2017) [11] were published in the period from 1976 to 2014, and 20 of those trials were published from 2002 to 2014. Of this group, Gartlehner et al. analysed 19 trials, stratified according to clinical trial registration, which had been available at ClinicalTrials.gov since 2000. A random effects MA showed a significant positive effect of homoeopathy compared to placebo in n = 6 registered trials (SMD 0.53, 95% CI 0.20–0.87) and no significant difference from placebo in n = 13 unregistered trials (SMD 0.14, 95% CI − 0.07 to + 0.35). However, the between-group difference in effect estimates was not significant (meta-regression: SMD 0.39, 95% CI − 0.09 to + 0.87) [34]. It is not clear why trial #A93 of the MA conducted by Mathie (2017) [11] (Lewith 2002, listed in Gartlehner [34], Supplement Table 3 as ‘not registered’) was not included in these analyses.
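That one subgroup is significant and the other is not does not by itself imply a significant difference between them. A rough plausibility check can be made from the reported subgroup estimates alone. The sketch below is an approximation under stated assumptions, not the meta-regression used by Gartlehner [34]: it treats the two subgroup estimates as independent and normally distributed and recovers each standard error from the width of the reported 95% CI. The function names are hypothetical.

```python
import math

Z = 1.96  # normal quantile for a two-sided 95% interval

def se_from_ci(lower, upper):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z)

def difference_with_ci(est_a, ci_a, est_b, ci_b):
    """Difference of two independent estimates with an approximate 95% CI."""
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    diff = est_a - est_b
    return diff, diff - Z * se_diff, diff + Z * se_diff

# Reported values: registered vs. unregistered trials (SMD, 95% CI).
diff, lo, hi = difference_with_ci(0.53, (0.20, 0.87), 0.14, (-0.07, 0.35))
print(f"difference {diff:.2f}, approx. 95% CI {lo:.2f} to {hi:.2f}")
```

Under these assumptions the difference is 0.39 with an interval of roughly −0.01 to 0.79. This is narrower than the reported meta-regression interval, as expected for a simpler model, but it likewise includes zero.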
The proportion of registered trials was 100% (n = 3/3) among high-quality trials and 19% (n = 3/16) among the other trials (Suppl. Table 38).
The assessment of confidence in cumulative evidence for research questions 1 and 2 (cf. Section 'Research questions', above) according to the GRADE framework (cf. Section 'Confidence in cumulative evidence/Certainty assessment') is presented in Additional file 3. Conclusions are summarised in the following Sections:
The quality of evidence (high/moderate/low/very low) for significant positive effects of homoeopathy beyond placebo is moderate for ALL-HOM and NI-HOM and high for I-HOM.
If the data sources were restricted to MAs with a low risk of bias [6, 10, 11], the quality of evidence would be changed to high for ALL-HOM and remain high for I-HOM and moderate for NI-HOM.
The available data yield no support for the alternative hypothesis of no outcome difference between homoeopathy and placebo.
The notion of a common positive effect is
As the MA of NI-HOM (Mathie 2017 [11]) comprised different indications treated with different homoeopathic products, the latter finding suggests that the effects of NI-HOM may differ across different indications and/or different homoeopathic products used. Such effect differences may include significant positive effects of NI-HOM as well as no significant difference between NI-HOM and placebo in different subgroups.
The limited data available support the notion of a common positive effect of homoeopathy for acute as well as chronic indications. The issue of effect differences among different diagnoses or diagnosis groups is outside the scope of this SR.
In this first SR of MAs of placebo-controlled randomised trials of homoeopathy for any disorder in humans, homoeopathy had a significant positive effect compared to placebo for all eligible trials in 5 of 5 evaluable MAs and for high-quality trials in 3 of 4 MAs. Assessed by the GRADE system, the quality of evidence for positive effects (high/moderate/low/very low) was high for I-HOM and moderate for ALL-HOM as well as for NI-HOM. There was no support for the alternative hypothesis of no outcome difference between homoeopathy and placebo.
The strengths of this SR include a detailed, prepublished PRISMA-P [12] -compliant protocol with two focused research questions, comprehensive presentation of findings, the use of well-established assessment instruments (ROBIS [13], GRADE [20]) and adherence to standard reporting guidelines (PRISMA 2020 [27]).
The scope of this review had two clear limitations: it was restricted to efficacy in placebo-controlled trials and did not address results for specific indications or indication groups.
We used the GRADE system to assess confidence in the cumulative evidence and found it very helpful. Nonetheless, there are three relevant differences between the GRADE approach and this SR: (1) The GRADE approach is indication- and outcome-specific, while we studied MAs with effect estimates for trials with different indications and outcomes. (2) The GRADE framework is tailored to comparative effectiveness, while we assessed MAs of placebo-controlled trials. (3) The GRADE assessment of confidence in cumulative evidence refers to the magnitude of effects, while our research question concerned the existence of significant effects of homoeopathy beyond placebo (yes/no). Accordingly, our conclusions on confidence in the cumulative evidence may not be directly comparable to those of other SRs in the same research field.
According to the ROBIS framework, the risk of bias of the six included MAs was rated as low for Linde (1997) [6], Mathie (2014 [10]) and Mathie (2017 [11]) and high for Linde (1998) [7], Cucherat [8] and Shang [9].
The evidence generated in this SR is based on 6 MAs, of which the risk of bias was rated as low for 3 and high for 3. If the data were restricted to the 3 MAs with a low risk of bias, the quality of evidence would be rated high for ALL-HOM and I-HOM and moderate for NI-HOM (Additional file 3).
Compared with trials of nonhomoeopathic interventions, which were assessed with identical rating instruments, the methodological quality of the homoeopathy trials in the MAs of this SR was similar for the MAs conducted by Mathie (2014 and 2017 [10, 11]) and higher for the MA conducted by Shang [9]. Significant associations between methodological quality and effect estimates were found in 12 of 24 analyses. After restricting the sample to high-quality trials according to predefined criteria, effect estimates were reduced [6, 11] or increased [10], with 3 of 4 MAs showing significant effects of homoeopathy compared to placebo. When adding a fifth MA (Cucherat [8]) to the assessment and applying the same high-quality criteria as in the 3-component model of Shang [9], 4 of 5 MAs showed significant benefit of homoeopathy.
As assessed by the GRADE system, the quality of evidence for positive effects (high/moderate/low/very low) was high for I-HOM and moderate for NI-HOM and ALL-HOM. In comparison, among 608 Cochrane reviews published from January 2013 to June 2014, the GRADE-assessed quality of evidence for the primary outcome was high in only 13% of reviews, moderate in 31%, low in 32% and very low in 24% [44]. In a randomised sample of Cochrane reviews up until 2021, 90% of 1567 GRADE-assessed interventions were not supported by evidence of high quality [45].
This SR had two limitations. (1) As this was an SR of MAs rather than of individual trials, the trials examined herein were limited to those included in the MAs. Thus, eligible trials published after 2011 and 2014 for I-HOM and NI-HOM, respectively, could not be included. (2) Differential effects of homoeopathy on different indications and patient groups were only assessed for acute and chronic indications and for adults and children, with very limited data available.
According to this SR, homoeopathy can have positive effects beyond placebo on disease in humans. This is in accordance with laboratory experiments showing partially replicable effects of homoeopathically potentised preparations in physico-chemical [46], in vitro [47], plant-based [48, 49] and animal-based [50,51,52] test systems.
In contrast to frequent claims, the available MAs of homoeopathy in placebo-controlled randomised trials for any indication show significant positive effects beyond placebo. Compared to other medical interventions, the quality of evidence for efficacy of homoeopathy was similar to or higher than that for 90% of interventions across medicine [45]. Accordingly, the efficacy evidence from placebo-controlled randomised trials provides no justification for regulatory or political actions against homoeopathy in health-care systems.
For I-HOM, an update of the MA conducted by Mathie (2014 [10]) would be warranted to reassess efficacy evidence after inclusion of trials published after 2011. For NI-HOM, the results of the MA conducted by Mathie (2017 [11]) with 54 trials were heterogeneous. Accordingly, future research on the efficacy of NI-HOM should focus on specific nonindividualised forms of homoeopathic therapy or specific interventions therein for specific indications. Recommendations for comparative effectiveness research on homoeopathy are beyond the scope of this review.
The complete protocol is permanently available on the website of the institution of the corresponding author: https://www.ifaemm.de/Abstract/PDFs/SMAP-HOM_Protocol_2020_11_25.pdf. All data extracted from the MA publications as well as analyses performed by the authors of this SR are presented in Tables 1–12 and Additional files 1–5.
Amendments, additional analyses and data
Amendments to the protocol from 25 Nov. 2020 are listed and explained in Suppl. Table 39. Additional analyses and data, not described in the protocol, are listed and explained in Suppl. Table 40.
The content of the manuscript has not been published or submitted for publication elsewhere.
We thank Gunver S. Kienle (GSK) for the assistance with data extraction and assessment of risk of bias of the MAs.
Open Access funding enabled and organized by Projekt DEAL. Funding specifically for this SR was provided by Christophorus-Stiftung (No. 393 CST), Stiftung Marion Meyenburg (Date 24.09.2020), Dr. Hauschka Stiftung (Date 16.11.2020) and Gesellschaft für Pluralität im Gesundheitswesen (Dates 11.06.2021, 22.06.2021). General funding for IFAEMM was provided by the Software-AG Stiftung (SE-P 13544). The funders had no influence on the writing of the protocol or on the planning, conduct and publication of this SR.