Time Series Analysis of Malaria Trends — Burkina Faso (2016–2023)

R
Epidemiology
Malaria
DHS
climate
Time series
Quantify the trend in malaria incidence at health district level and identify the factors associated with malaria incidence in Burkina Faso from 2016-2023 using routine cases data
Published

August 30, 2025

Period: 2016–2023 · Role: Lead Data Analyst · Location: Burkina Faso

Note

View Code
← Back to Projects

Overview

The aim of this study was to assess the malaria trends in Burkina Faso at the health district level to help understand the malaria transmission level between 2016 and 2023 and to determine the factors associated with malaria incidence to better understand the drivers of malaria in Burkina Faso. This analysis combined publicly available DHS/MIS and climate data with restricted HMIS surveillance data obtained from PNLP under data sharing agreement.

Role: Sole author responsible for data cleaning, quality assessment, analysis, modeling, mapping, and reporting.

Methods

Data sources

Variable Definition Source Temporal Period
Suspected cases Clinical malaria suspicion (fever) HMIS Monthly 2016–2023 (excl. 2019)
Tested cases Received RDT or microscopy HMIS Monthly 2016–2023
Confirmed cases Positive RDT or microscopy HMIS Monthly 2016–2023
Presumed cases Treated without positive test HMIS Monthly 2016–2023
Population District population HMIS Annual (Monthy est) 2016–2023
Treatment-seeking (U5) Public/private/no care DHS/MIS Every 3 years 2014, 2017–18, 2021

Data management

I applied a standard data quality workflow before analysis:

  • Standardized district and facility names across years.

  • Removed duplicates/inconsistencies; harmonized facility identifiers.

  • Classified facility reporting activity (active vs. inactive).

  • Detected and imputed outliers using mean ± 3 SD, Median absolute deviation (MAD), and Interquartile range (IQR) rules.

  • Managed missingness; validated reporting rates.

Incidence adjustments

Crude malaria incidence was calculated by dividing the number of reported confirmed cases for each health district and month by the district population and multiplying by 1000. Then, crude incidence was adjusted for each factor in accordance with the WHO framework.

Trend quantification

  • STL decomposition (LOESS) to separate seasonal, trend, residual components.
  • Sen’s slope for monotonic trend magnitude.
  • Mann–Kendall test for trend significance (α = 0.05).

Factors associated with incidence

Outcome: annual incidence adjusted for testing, reporting, treatment-seeking (Adjustment 3).
Covariates (2016–2023): climate, Insecticide Treated nets access/use, Seasonal Malaria Chemoprevention coverage, stockouts (Rapid Diagnostic Test/ACT/IPTp).
Model: Generalized Additive Models (GAMs) for nonlinear effects.

Key Deliverables

  • Malaria burden estimated under four incidence definitions (2016–2023).
  • Trend analysis (seasonal + long-term) by district for each incidence definition.
  • Significant covariates associated with adjusted incidence.

Visual Highlights

1. Reporting rate

Fig. Monthly reporting rates per health district
Note

Key Insight: Data quality varied considerably from district to district. Some districts reported less than 50% during certain periods, which significantly skewed the crude incidence estimates.


2. Incidence estimates

Fig. Incidence estimates following WHO framework. A-D: Temporal trend for the four incidences (crude incidence, incidence adjusted by testing rate, incidence adjusted by testing rate and reporting rate, incidence adjusted by testing rate, reporting rate and care-seeking rate; E-D: Spatial analysis for the four incidences from 2023.
Note

Key Insight: Crude incidence and Adjustment 1 were nearly identical across districts, reflecting strong case management policies that ensure treatment follows testing. However, Adjustments 2 and 3 revealed hidden burden: poor reporting completeness in 2023 inflated incidence in several northern and eastern districts, while accounting for care-seeking (Adjustment 3) added ~450 cases per 1,000 in Gorom-Gorom, Gaoua, Kaya, Boussouma, and Fada. This underscores the importance of integrating care-seeking behavior into malaria burden estimates.


Code example

Normalize Function

This function standardizes the four malaria incidence indicators so they can be compared on the same scale across districts.

Tip

Reproducibility Note:
All analyses are fully reproducible in R. This snippet shows how indicators were normalized prior to trend analysis.

Code
## Function for normalize the four incidences rates (comparaison purpose)
getNormalized <- function(vec) {
  if (!is.numeric(vec) || all(is.na(vec))) {
    warning("Input vector is non-numeric or all NA; returning original vector")
    return(vec)
  } 
  vec_mean <- mean(vec, na.rm = TRUE) 
  vec_sd <- sd(vec, na.rm = TRUE) 
  if (is.na(vec_sd) || vec_sd == 0) {
    warning("Standard deviation is 0 or NA; returning original vector")
    return(vec)
  }
  (vec - vec_mean) / vec_sd 
}

## Create new variables to use in the trend analysis

monthly_DS_incidence <- monthly_DS_incide %>%
  dplyr::mutate(
    mal_cases_norm = getNormalized(`Incidence brute`),
    incidence_adj_presumed_cases_norm = getNormalized(Adj1),
    incidence_adj_presumed_cases_RR_norm = getNormalized(Adj2),
    incidence_adj_presumed_cases_RR_TSR_norm = getNormalized(Adj3)
  )

STL Decomposition

This function decomposes the normalized time series into seasonal, trend, and remainder components, and applies Mann-Kendall and Sen’s slope tests to assess statistical significance.

Tip

Key Insight:
STL decomposition helped disentangle seasonal variation from long-term malaria trends, allowing robust detection of increasing or decreasing trajectories at the district level.

Code
## STL decomposition

STL_result_DF_norm_list <- list()
indicators <- list(
  list(col = "mal_cases_norm", type = "Crude Incidence"),
  list(col = "incidence_adj_presumed_cases_norm", type = "Adjusted Presumed Cases"),
  list(col = "incidence_adj_presumed_cases_RR_norm", type = "Adjusted Presumed Cases RR"),
  list(col = "incidence_adj_presumed_cases_RR_TSR_norm", type = "Adjusted Presumed Cases RR TSR")
)

## Monthly data
monthly_DS_incidence = monthly_DS_incidence 

## loop
for (DS in base::sort(base::unique(monthly_DS_incidence$districts))) {
  cases_dist <- monthly_DS_incidence %>%
    dplyr::filter(districts == DS) %>%
    dplyr::arrange(Date)
  
  if (nrow(cases_dist) == 0) {
    warning(paste("No data for district:", DS, "- skipping"))
    next
  }
  
  for (ind in indicators) {
    if (!ind$col %in% names(cases_dist)) {
      warning(paste("Column", ind$col, "not found in district", DS, "- skipping"))
      next
    }
    
    ind_values <- cases_dist[[ind$col]]
    valid_idx <- !is.na(ind_values)
    valid_ind_norm <- ind_values[valid_idx]
    valid_dates <- cases_dist$Date[valid_idx]
    
    if (length(valid_ind_norm) < 2) {
      warning(paste("Fewer than 2 valid observations for", ind$col, "in district", DS, "- skipping"))
      next
    }
    
    start_year <- base::as.numeric(format(valid_dates[1], "%Y"))
    start_month <- base::as.numeric(format(valid_dates[1], "%m"))
    ind_ts <- stats::ts(valid_ind_norm, start = c(start_year, start_month), deltat = 1/12)
    ind_stl <- stlplus::stlplus(ind_ts, s.window = "periodic")
    
    ind_stl_ts <- base::as.data.frame(ind_stl$data[, 1:4])
    ind_stl_ts$type <- ind$type
    ind_stl_ts$dates <- valid_dates
    ind_stl_ts$MK_p <- trend::smk.test(ind_ts)$p.value
    ind_stl_ts$sens_slope <- trend::sea.sens.slope(ind_ts)
    ind_stl_ts$District <- DS
    
    STL_result_DF_norm_list[[base::paste(DS, ind$col, sep = "_")]] <- ind_stl_ts
  }
}

STL_result_DF_norm <- data.table::rbindlist(STL_result_DF_norm_list, fill = TRUE)
STL_result_DF_slopes <- base::unique(STL_result_DF_norm[, c("type", "MK_p", "sens_slope", "District")])

Repository & Reproducibility

Note
  • Repository: Full code

  • Data:

    • Public datasets: DHS, MIS, and climate datasets (CHIRPS rainfall) are fully open-access.
    • Restricted datasets: Routine HMIS surveillance data and intervention coverage data (ITN, SMC, RDT, ACT, IPTp stockouts) are not publicly available and were accessed under agreement with the Burkina Faso National Malaria Control Programme (PNLP).

    For reproducibility, all data management scripts and statistical analysis code are provided.

Public Health Relevance

  • Demonstrated how data quality issues (reporting gaps, inactive facilities) bias malaria burden estimation.
  • Provided robust adjusted incidence trends to better inform malaria control strategy in Burkina Faso.
  • Delivered reproducible R workflow (cleaning, validation, modeling, mapping) for future NMCP analyses.