Missing data mechanisms and terminology
Analysis of studies with missing outcome data involves untestable assumptions about the reasons participants dropped out of the study early (the mechanism of dropout). Wrong assumptions may lead to biased effect sizes. To implement an appropriate analytical approach in studies with missing outcome data, we need to understand the missing outcome data mechanism. Little and Rubin1 classified the missing outcome data mechanism into three categories:
a. Missing at random (MAR)
Under the MAR assumption, the probability of dropout depends on the fully observed covariates (e.g. intervention, baseline characteristics), but not on the unobserved ones. If we identify fully observed covariates that are associated with a high risk of dropout (as well as with the outcome), then, conditional on these covariates, we assume that the distribution of the response is the same between those who dropped out and those who remained in the study. For instance, older patients seem to have a lower response to a particular intervention than younger patients and hence they are at higher risk of leaving the study early.
b. Missing completely at random (MCAR)
The probability of dropout (also known as missingness) depends neither on the observed measurements (e.g. baseline covariates, observed responses) nor on the unobserved measurements (those that would have been observed if the patient had stayed in the study). This means that the reason for dropout is entirely unrelated to the study. The effect of an intervention will thus be the same on average among those who remained and those who dropped out. For instance, if the outcome of a participant is missing due to a car accident or a relocation without informing their doctor, then the mechanism is MCAR.
c. Missing not at random (MNAR)
Under this assumption, the probability of dropout depends on some unobserved covariates or on the outcome itself. For instance, patients without health improvement are at higher risk of dropout (missingness depends on the outcome). Similarly, patients with side-effects tend to have low responses and a higher risk of leaving the study early (missingness depends on the side-effects).
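The practical difference between the three mechanisms can be seen in a small simulation. The following is a minimal sketch under invented assumptions: the effect of age on the outcome, the dropout probabilities and the function name `completer_bias` are all hypothetical and serve only as illustration. It compares the mean outcome among completers with the mean among all randomized participants, both overall and within age groups.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def completer_bias(mechanism, n=40000, seed=1):
    """Bias of the completers' mean outcome: (overall, largest within age group)."""
    rng = random.Random(seed)
    rows = []  # (is_old, outcome, is_observed)
    for _ in range(n):
        age = rng.gauss(0.0, 1.0)                  # standardised, fully observed covariate
        y = 1.0 - 0.5 * age + rng.gauss(0.0, 1.0)  # older patients respond less (assumed)
        if mechanism == "MCAR":
            p_drop = 0.3                           # unrelated to age and outcome
        elif mechanism == "MAR":
            p_drop = 0.5 if age > 0 else 0.1       # depends only on the observed age group
        elif mechanism == "MNAR":
            p_drop = 0.5 if y < 1.0 else 0.1       # depends on the outcome itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((age > 0, y, rng.random() >= p_drop))

    def bias(subset):
        everyone = [y for _, y, _ in subset]
        completers = [y for _, y, seen in subset if seen]
        return _mean(completers) - _mean(everyone)

    overall = bias(rows)
    within = max(abs(bias([r for r in rows if r[0] == old])) for old in (True, False))
    return overall, within

for mechanism in ("MCAR", "MAR", "MNAR"):
    overall, within = completer_bias(mechanism)
    print(f"{mechanism}: overall bias {overall:+.3f}, within-age-group bias {within:.3f}")
```

In this toy setting the completers' mean is approximately unbiased under MCAR, biased overall but approximately unbiased within age groups under MAR, and biased even within age groups under MNAR.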
1. Little RJA, Rubin DB. Statistical Analysis with Missing Data. Wiley; 1987.
Methods to handle missing outcome data
Within a study, data are commonly available at the patient level, allowing the missing outcome data mechanism to be explored. A great variety of statistical techniques to handle missing outcome data, such as last observation carried forward (LOCF), multiple imputation and maximum likelihood methods, have been suggested in the literature. It is important, however, that the researcher is familiar with the advantages and disadvantages, as well as the assumptions and complexity, underlying the suggested techniques so as to select the most appropriate one.
Within the meta-analysis framework, however, data are usually available at the study level, in particular in the form of summary statistics (e.g. mean responses along with their standard deviations, or the number of successes and failures for each intervention). Therefore, the missing outcome data mechanism cannot be explored. Although individual patient data (IPD) can be requested, these are not always available to meta-analysts. As a result, untestable assumptions about the missing data mechanism need to be made in order to synthesize the available information and perform the meta-analysis. However, the available techniques to address missingness are very limited, which makes the handling of missing outcome data at the meta-analysis level a real challenge.
The majority of the strategies proposed for addressing missingness within a meta-analysis have been developed for binary outcomes. Below we briefly introduce the suggested methods along with their advantages and disadvantages.
a. Available Case Analysis (ACA) or Complete Case analysis (CC)
This is the simplest and most common approach in meta-analysis with missing outcome data. Under CC analysis, participants with missing outcome data are viewed as carrying no information, and only the data from patients who have completed the study are included in the analysis. This approach requires that the missing outcome data are ignorable (MCAR or MAR); otherwise it can lead to biased estimates. Moreover, methods like CC analysis that totally ignore missing data yield imprecise results with low statistical power, because part of the randomized sample is discarded. This approach also runs against the basic principles of the intention-to-treat (ITT) analysis, which is the preferred analysis in many clinical trials.
b. Last Observation Carried Forward (LOCF)
LOCF is a widely used approach for handling missing outcome data in longitudinal studies. This approach can be implemented for participants who left the study before the final measurement but provided one or more intermediate measurements. The LOCF imputation scheme assumes that, for patients dropping out of the study prematurely, the last observed measurement is representative and can be used instead of an actual observation at the end of the study.
LOCF is an easy and extensively used approach; in a recent meta-analysis of 212 trials comparing antipsychotics, the vast majority reported results according to the LOCF.1 However, it has been widely critiqued2 and can be restrictive, as it assumes that the outcome remains constant after its last measurement until the end of the study, introducing systematic error and leading to incorrect estimates of the treatment effects.
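The imputation rule itself is simple to state in code. This is a minimal sketch, assuming that each participant's measurements are stored in visit order with `None` marking a missed visit; the function name `locf` and the example scores are hypothetical.

```python
def locf(measurements):
    """Replace each missing value (None) with the most recent observed value."""
    filled, last = [], None
    for value in measurements:
        if value is not None:
            last = value
        filled.append(last)  # stays None until a first value has been observed
    return filled

# A participant whose symptom score was falling before dropout after visit 2:
print(locf([24, 20, None, None]))  # [24, 20, 20, 20]
```

The example makes the criticism above concrete: the score is frozen at 20 for the remaining visits, whatever trajectory the participant would actually have followed.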
c. Imputed case analysis (ICA)
Under this analysis, the missing responses in each intervention are imputed using a specific assumption on the outcome that the missing participants could have provided if they had never left the study. The most frequently applied assumptions are the following:
1. All missing outcome data are non-events (ICA-0) assumes that all missing participants in both interventions have not experienced the event;
2. All missing outcome data are events (ICA-1) assumes that all missing participants in both interventions have experienced the event;
3. Best case scenario for the experimental intervention (ICA-b) assumes that all missing participants in the experimental intervention have experienced the event, whereas all missing participants in the control intervention have not experienced the event;
4. Worst case scenario for the experimental intervention (ICA-w) assumes that all missing participants in the experimental intervention have not experienced the event, whereas all missing participants in the control intervention have experienced the event;
5. Same risk as in the control intervention (ICA-pC) assumes that missing participants in both interventions have the same risk of event as calculated in the control intervention;
6. Same risk as in the experimental intervention (ICA-pE) assumes that missing participants in both interventions have the same risk of event as calculated in the experimental intervention;
7. Intervention-specific risk (ICA-p) where the missing outcome in the experimental intervention is imputed by using the estimated risk of event in the experimental intervention, whereas the missing outcome in the control intervention is imputed by using the estimated risk of event in the control intervention. This approach corresponds to the MAR assumption.
For each assumption a meta-analysis is implemented. The range of the meta-analysis effect sizes shows the robustness of the conclusions and the relevance of the MAR assumption. Inconsistent results indicate that the conclusions are sensitive to departures from the MAR assumption.
The main advantage of this method is that it preserves the original number of randomized patients, and if the assumptions are reasonable, ICA tends to yield unbiased estimates. However, the ICA method tends to provide spuriously increased precision, since it treats the imputed values as observed and hence ignores the uncertainty about these imputed values. Finally, another caveat of this method is that only a limited number of assumptions are usually applied as sensitivity analysis.
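The seven imputation scenarios can be sketched for a single two-arm study as follows. This is an illustration, not a reference implementation: the `Arm` class, the function names and the example counts are hypothetical, the event is taken to be the beneficial outcome (as in the best and worst case definitions above), and zero cells are not handled.

```python
from dataclasses import dataclass

@dataclass
class Arm:
    events: int    # events among participants with an observed outcome
    observed: int  # participants with an observed outcome
    missing: int   # participants with a missing outcome

    @property
    def risk(self):
        return self.events / self.observed

def imputed_events(exp, ctl, scenario):
    """Expected number of events per arm (experimental, control) after imputation."""
    # Risk assigned to the missing participants of (experimental, control).
    risk_exp, risk_ctl = {
        "ICA-0": (0.0, 0.0),
        "ICA-1": (1.0, 1.0),
        "ICA-b": (1.0, 0.0),
        "ICA-w": (0.0, 1.0),
        "ICA-pC": (ctl.risk, ctl.risk),
        "ICA-pE": (exp.risk, exp.risk),
        "ICA-p": (exp.risk, ctl.risk),
    }[scenario]
    return exp.events + risk_exp * exp.missing, ctl.events + risk_ctl * ctl.missing

def odds_ratio(exp, ctl, scenario):
    """Odds ratio computed on all randomized participants after imputation."""
    e_events, c_events = imputed_events(exp, ctl, scenario)
    e_total = exp.observed + exp.missing
    c_total = ctl.observed + ctl.missing
    return (e_events / (e_total - e_events)) / (c_events / (c_total - c_events))

exp, ctl = Arm(events=30, observed=80, missing=20), Arm(events=20, observed=80, missing=20)
for name in ("ICA-0", "ICA-1", "ICA-b", "ICA-w", "ICA-pC", "ICA-pE", "ICA-p"):
    print(name, round(odds_ratio(exp, ctl, name), 3))
```

With these invented counts, ICA-p reproduces the available case odds ratio of 1.8, while ICA-b and ICA-w give 4.0 and roughly 0.64, which bracket the other scenarios. Because the imputed counts enter the calculation as if they had been observed, any standard error computed from them would understate the uncertainty.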
d. Uncertainty intervals
Gamble and Hollis3 proposed an approach that incorporates both ACA and the two most extreme imputation assumptions: in a meta-analysis, ICA-b and ICA-w are implemented in each study, and the most extreme lower and upper confidence interval limits from these imputations are used to form a so-called uncertainty interval for each study. As a result, the standard errors that are extracted from the uncertainty intervals are inflated, leading to reduced weights. These weights reflect the added uncertainty one might expect due to missing outcome data; the higher the missing rate, the smaller the weights.
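A rough sketch of how such an interval could be computed for one study is given below. It works on the log odds ratio scale with Wald confidence limits; taking the point estimate from the available cases and recovering the standard error as the interval width divided by 2 x 1.96 are assumptions of this sketch rather than details stated above, and the function names are hypothetical. Zero cells are not handled.

```python
import math

def log_or_ci(e_events, e_total, c_events, c_total, z=1.96):
    """Log odds ratio and Wald confidence limits from a 2x2 table."""
    e_non, c_non = e_total - e_events, c_total - c_events
    log_or = math.log((e_events / e_non) / (c_events / c_non))
    se = math.sqrt(1 / e_events + 1 / e_non + 1 / c_events + 1 / c_non)
    return log_or, log_or - z * se, log_or + z * se

def uncertainty_interval(e_events, e_obs, e_miss, c_events, c_obs, c_miss, z=1.96):
    """Return (estimate, lower, upper, inflated_se) on the log odds ratio scale."""
    # Point estimate from the available cases (an assumption of this sketch).
    estimate, _, _ = log_or_ci(e_events, e_obs, c_events, c_obs, z)
    e_total, c_total = e_obs + e_miss, c_obs + c_miss
    # ICA-b: missing experimental participants are events, missing controls are not.
    best = log_or_ci(e_events + e_miss, e_total, c_events, c_total, z)
    # ICA-w: missing experimental participants are non-events, missing controls are events.
    worst = log_or_ci(e_events, e_total, c_events + c_miss, c_total, z)
    lower = min(best[1], worst[1])
    upper = max(best[2], worst[2])
    inflated_se = (upper - lower) / (2 * z)  # SE implied by the interval width
    return estimate, lower, upper, inflated_se

few_missing = uncertainty_interval(30, 80, 5, 20, 80, 5)
many_missing = uncertainty_interval(30, 80, 20, 20, 80, 20)
print(few_missing[3] < many_missing[3])  # more missing data, larger standard error
```

Since inverse-variance weights are proportional to one over the squared standard error, the study with more missing participants receives the smaller weight.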
e. Informative missingness odds ratio (IMOR)
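IMOR4,5 is a parameter that is calculated for each intervention included in the analysis and reflects how informative the missing outcome data are; $\mathrm{IMOR}_E$ and $\mathrm{IMOR}_C$ correspond to the IMOR in the experimental and control intervention, respectively. Letting $\pi_E^{obs}$ and $\pi_E^{mis}$ be the risk of event among the observed and the missing participants in the experimental intervention, and $\pi_C^{obs}$ and $\pi_C^{mis}$ the corresponding risks in the control intervention, the IMOR in each intervention is defined as:
$$\mathrm{IMOR}_E = \frac{\pi_E^{mis}/(1-\pi_E^{mis})}{\pi_E^{obs}/(1-\pi_E^{obs})}, \qquad \mathrm{IMOR}_C = \frac{\pi_C^{mis}/(1-\pi_C^{mis})}{\pi_C^{obs}/(1-\pi_C^{obs})}$$
$\mathrm{IMOR}_E = 1$ means that the odds of an event being missing are equal to the odds of a non-event being missing in the experimental intervention and hence it implies the MAR assumption. IMOR values larger than 1 suggest that missing participants are more likely to experience the event, whereas the opposite occurs for IMOR values lower than 1. Under the ICA-0 and ICA-1 assumptions it holds that $\mathrm{IMOR}_E = \mathrm{IMOR}_C = 0$ and $\mathrm{IMOR}_E = \mathrm{IMOR}_C = \infty$, respectively, while under the assumptions ICA-b and ICA-w: $\mathrm{IMOR}_E = \infty$, $\mathrm{IMOR}_C = 0$ and $\mathrm{IMOR}_E = 0$, $\mathrm{IMOR}_C = \infty$, respectively. The imputation methods described above are therefore special cases of the IMOR method. The correspondence between the two methods is presented in the following table.
|ICA assumption|IMOR values|Description|
|ICA-0|$\mathrm{IMOR}_E = \mathrm{IMOR}_C = 0$|All missing outcome data are non-events|
|ICA-1|$\mathrm{IMOR}_E = \mathrm{IMOR}_C = \infty$|All missing outcome data are events|
|ICA-b|$\mathrm{IMOR}_E = \infty$, $\mathrm{IMOR}_C = 0$|Best case scenario for the experimental intervention|
|ICA-w|$\mathrm{IMOR}_E = 0$, $\mathrm{IMOR}_C = \infty$|Worst case scenario for the experimental intervention|
|ICA-pC|$\mathrm{IMOR}_C = 1$, $\mathrm{IMOR}_E = \frac{\pi_C^{obs}/(1-\pi_C^{obs})}{\pi_E^{obs}/(1-\pi_E^{obs})}$|Same risk as in the control intervention|
|ICA-pE|$\mathrm{IMOR}_E = 1$, $\mathrm{IMOR}_C = \frac{\pi_E^{obs}/(1-\pi_E^{obs})}{\pi_C^{obs}/(1-\pi_C^{obs})}$|Same risk as in the experimental intervention|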
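To illustrate how an IMOR value translates into an assumption about the missing participants, the sketch below treats the IMOR as the odds of an event among missing participants divided by the odds among observed participants, and converts a chosen value into the implied risk among the missing participants and in the whole randomized arm. The function names and the counts are hypothetical, and an observed risk of exactly 1 is not handled.

```python
import math

def risk_among_missing(risk_observed, imor):
    """Risk of event among missing participants implied by an IMOR value."""
    if math.isinf(imor):
        return 1.0  # every missing participant is an event
    odds_missing = imor * risk_observed / (1.0 - risk_observed)
    return odds_missing / (1.0 + odds_missing)

def adjusted_risk(events, observed, missing, imor):
    """Risk of event in the whole randomized arm for a given IMOR."""
    risk_obs = events / observed
    share_missing = missing / (observed + missing)
    risk_mis = risk_among_missing(risk_obs, imor)
    return (1.0 - share_missing) * risk_obs + share_missing * risk_mis

# 30 events among 80 observed participants, 20 participants missing
for imor in (0.0, 0.5, 1.0, 2.0, math.inf):
    print(imor, round(adjusted_risk(30, 80, 20, imor), 3))
```

An IMOR of 1 returns the observed risk (0.375), while 0 and infinity reproduce the risks obtained by imputing all missing outcomes as non-events (0.30) and as events (0.50).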
A sophisticated extension of the meta-analysis model combines the IMOR with the intervention effects derived from the observed individuals so as to obtain a ‘missingness-adjusted’ meta-analysis result for the entire randomized population. A prior assumption can be made for the IMOR parameter in each study and each intervention. For instance, one may assume that IMORs differ between the interventions if there is evidence that participants allocated to a more intensive intervention tend to provide worse outcomes and leave the study early. Alternatively, it can be assumed that IMORs differ among the studies if there is evidence that studies with a longer follow-up duration tend to have a higher dropout rate.
Contrary to ICA methods, the IMOR approach accounts for the uncertainty due to missing outcome data, and it provides estimates based on assumptions about the degree of informative missingness. The IMOR approach could be extended to assess missingness when the outcome is continuous. Such an extension is not straightforward for the ICA methods. Despite its advantages, the IMOR approach is implemented in a Bayesian framework, which requires that the researcher is familiar with Bayesian analysis.
As already discussed, the choice of a sensible assumption regarding the underlying missingness is of crucial importance when performing an analysis with missing outcome data. However, assumptions about the missingness mechanism are typically untestable and they are subject to a researcher’s knowledge or opinion. This can lead to assumptions that are not strongly supported by the data, which in turn result in wrong inferences regarding the object under study and consequently affect the meta-analytical results. Therefore a sensitivity analysis is usually suggested.
The main idea of performing a sensitivity analysis is to consider what alternative assumptions might be true and to examine whether they lead to different conclusions. By exploring various scenarios, inference can be made regarding the robustness of the conclusions drawn.
Sensitivity analysis must consider a range of plausible alternative assumptions about the missing data that depart from the main assumption. The simplest form of a sensitivity analysis is an ad hoc analysis, according to which one can perform different analyses for missing outcome data (LOCF, CC, etc.) and compare their results. From a different perspective, one can examine a set of different values for the parameter(s) “causing” the sensitivity, i.e. the parameter that controls the departure from the main assumption (principled sensitivity analysis). The possible values for the “sensitivity parameter” can be elicited either from experts’ opinion or subjective knowledge, or via a collection of data. Finally, prior distributions can be used to embed the assumption regarding the nature of the missing data (IMOR).
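A principled sensitivity analysis of this kind can be sketched for a single study by letting the IMOR in each intervention play the role of the sensitivity parameter and recomputing the odds ratio over a grid of values. The grid, the counts and the function names below are hypothetical, and the sketch deliberately ignores the uncertainty about the IMOR that a full Bayesian model would propagate.

```python
def adjusted_odds(events, observed, missing, imor):
    """Odds of event in the whole randomized arm for a given finite IMOR."""
    odds_obs = events / (observed - events)
    odds_mis = imor * odds_obs           # IMOR = odds among missing / odds among observed
    risk_obs = odds_obs / (1.0 + odds_obs)
    risk_mis = odds_mis / (1.0 + odds_mis)
    share_missing = missing / (observed + missing)
    risk = (1.0 - share_missing) * risk_obs + share_missing * risk_mis
    return risk / (1.0 - risk)

def sensitivity_grid(exp, ctl, imor_values):
    """Odds ratio for every (IMOR_E, IMOR_C) pair.

    exp and ctl are (events, observed, missing) tuples.
    """
    return {
        (imor_e, imor_c): adjusted_odds(*exp, imor_e) / adjusted_odds(*ctl, imor_c)
        for imor_e in imor_values
        for imor_c in imor_values
    }

grid = sensitivity_grid(exp=(30, 80, 20), ctl=(20, 80, 20), imor_values=[0.5, 1, 2])
for (imor_e, imor_c), value in sorted(grid.items()):
    print(f"IMOR_E={imor_e}, IMOR_C={imor_c}: OR={value:.2f}")
print("range:", round(min(grid.values()), 2), "to", round(max(grid.values()), 2))
```

The pair (1, 1) corresponds to MAR and returns the available case odds ratio; how far the other entries move away from it, and whether they cross 1, is what the sensitivity analysis is meant to reveal.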
By helping to address attrition bias in meta-analysis and, by extension, supporting the reliability of the meta-analytical inferences, sensitivity analysis can prove very advantageous within the meta-analysis framework. Also, while it is not mathematically difficult to perform, sensitivity analysis provides a concrete and coherent picture of the object under study. However, one of the most common pitfalls of sensitivity analysis is that one can choose to examine two analyses that make different assumptions but are both equally wrong. Finally, it can be difficult to agree on a set of values for the sensitivity parameter(s).
Challenges in handling missing outcome data
The most challenging aspects of meta-analysis with missing outcome data relate to the limited sample size, the design of RCTs, the assumption underlying the meta-analysis and the method chosen as the most appropriate one. In particular:
- Assumptions regarding the missing mechanism cannot be verified by the data.
- Ambiguity of results cannot be overcome by increasing the sample size, since this would subsequently increase the number of missing observations.
- There is no single optimum strategy or methodology one could follow to handle missing data.
- Trials, and in particular the data collection process, should be designed strategically so as to minimize the occurrence of missing data. This means properly accounting for the various aspects of the study and considering:
  - the population under study/target group (e.g. elderly people need simplified questions)
  - easily obtained outcomes
  - alternative data collection schemes
  - possible sources of missing data (e.g. due to the disease itself or the studied population).
1. Leucht S, Cipriani A, Spineli L, Mavridis D, Orey D, Richter F, Samara M, Barbui C, Engel RR, Geddes JR, Kissling W, Stapf MP, Lassig B, Salanti G, Davis JM. Comparative efficacy and tolerability of 15 antipsychotic drugs in schizophrenia: a multiple-treatments meta-analysis. Lancet 2013;382(9896):951-62.
2. Mallinckrodt CH, Watkin JG, Molenberghs G, Carroll RJ. Choice of the primary analysis in longitudinal clinical trials. Pharmaceut Statist 2004;3(3):161-9.
3. Gamble C, Hollis S. Uncertainty method improved on best-worst case analysis in a binary meta-analysis. J Clin Epidemiol 2005;58(6):579-88.
4. Higgins JP, White IR, Wood AM. Imputation methods for missing outcome data in meta-analysis of clinical trials. Clin Trials 2008;5(3):225-39.
5. White IR, Higgins JPT, Wood AM. Allowing for uncertainty due to missing data in meta-analysis--part 1: two-stage methods. Stat Med 2008;27(5):711-27.
6. White IR, Welton NJ, Wood AM, Ades AE, Higgins JPT. Allowing for uncertainty due to missing data in meta-analysis--part 2: hierarchical models. Stat Med 2008;27(5):728-45.