Analysis Data Subject Level (ADSL) Generation with SAS

Automated creation of ADSL dataset for clinical trials analysis using ADaM standards and multi-domain SDTM integration
Author

Ousmane DIALLO

Published

August 30, 2025

Keywords

ADaM, ADSL, CDISC, SAS, clinical trials, biostatistics, regulatory

Analysis Data Subject Level (ADSL) Generation with SAS

Automated creation of ADSL dataset for clinical trials analysis using ADaM standards and multi-domain SDTM integration.

View Code on Github

Introduction

The Analysis Data Subject Level (ADSL) dataset is the cornerstone of clinical trial statistical analysis under the CDISC Analysis Data Model (ADaM) standard. ADSL provides one record per subject containing all subject-level information needed for efficacy and safety analyses.

What is ADSL?

ADSL (Analysis Data Subject-Level) dataset is the foundation dataset in ADaM.

It:

  • Contains one record per subject

  • Includes required subject-level variables (demographic, disposition, baseline values, key date, etc..) as well as other subject-level analysis variables

  • Defines analysis populations (Safety, Intention-To-Treat, Per-Protocol)

  • Provides treatment information (planned and actual treatment) and study periods variables

  • Links to all other ADaM datasets through USUBJID (unique subject identifier)

  • Ensures consistency and traceability across all statistical analyses and reporting

Regulatory Importance

  • FDA Requirement: Essential for regulatory submissions

  • Analysis Foundation: Referenced by all Basic Data Structure (BSD) and occurrence Data Structure (OCCDS) datasets

  • Population Definition: Critical for primary endpoint analyses

  • Traceability: Links analysis results back to SDTM source

Project Overview

Objective

Create a fully compliant ADSL dataset from multiple SDTM domains, implementing analysis populations, treatment variables, and derived endpoints following ADaM Implementation Guide principles.

Key Benefits

  • ADSL Dataset: Subject-level analysis file with proper ADaM structure

  • Population Flags: SAFFL, ITTFL, COMPLFL, MITT1FL definitions

  • Treatment Variables: Planned and actual treatment assignments

  • Timing Variables: Study milestones and treatment periods

  • Validation Reports: QC checks ensuring data integrity

Technical Approach

Multi-domain integration workflow:

SDTM (DM+DS+EX+AE) → Business Logic → Population Flags → Treatment Periods → ADSL

Data Integration Strategy

SDTM Source Domains

Domain Purpose Key Variables
DM Demographics & baseline USUBJID, AGE, SEX, RACE, ARM
DS Disposition events DSSTDTC, DSTERM, DSDECOD
EX Exposure/treatment EXSTDTC, EXENDTC, EXDOSE
AE Adverse events AESTDTC, AETERM, AESEV
VS Vital signs (if needed) VSSTDTC, VSTEST, VSORRES

ADaM Variables to Derive

Variable Class Examples Source Logic
Required ADaM STUDYID, USUBJID, SUBJID Direct from DM
Population Flags SAFFL, ITTFL, COMPLFL Business rules from DS/EX
Treatment TRT01P, TRT01A, TRT01PN From DM.ARM + actual exposure
Timing TRTSDT, TRTEDT, LSTALVDT From EX + DS disposition
Demographics AGE, AGEGR1, SEX, RACE Enhanced from DM

Technical Implementation

1. Library Setup

/* Library and macro setup */
%include "/home/u64197545/ADaM/programs/setup.sas";

/* Expected libraries:
   - SDTM: Source SDTM datasets
   - ADAM: Output ADaM datasets  
   - LIBRARY: Formats and macros
*/

libname sdtm "/home/u64197545/data/sdtm";
libname adam "/home/u64197545/data/adam";
libname library "/home/u64197545/data/library";

/* Load standard ADaM formats */
%include library(adamfmt);

2. Population Definition Logic

/* Define analysis populations based on business rules */
%macro define_populations();
    
    /* Safety Population: Any exposure to study drug */
    proc sql;
        create table safety_pop as
        select distinct usubjid, 'Y' as saffl
        from sdtm.ex
        where exdose > 0 and exstdtc ne '';
    quit;
    
    /* ITT Population: Randomized subjects */
    proc sql;
        create table itt_pop as
        select distinct usubjid, 'Y' as ittfl
        from sdtm.dm
        where arm ne '' and upcase(arm) not in ('SCREEN FAILURE', 'NOT TREATED');
    quit;
    
    /* Completed Population: Finished study per protocol */
    proc sql;
        create table completed_pop as
        select distinct usubjid, 'Y' as complfl
        from sdtm.ds
        where upcase(dsdecod) in ('COMPLETED', 'STUDY COMPLETED')
          and dscat = 'DISPOSITION EVENT';
    quit;

%mend define_populations;

%define_populations();

3. Treatment Period Derivation

/* Calculate treatment start and end dates */
proc sql;
    create table treatment_period as
    select usubjid,
           min(exstdtc) as first_dose_dt format=yymmdd10.,
           max(case when exendtc ne '' then exendtc else exstdtc end) as last_dose_dt format=yymmdd10.,
           sum(case when exdose > 0 then 1 else 0 end) as exposure_days
    from sdtm.ex
    where exstdtc ne ''
    group by usubjid;
quit;

/* Derive analysis treatment dates */
data treatment_analysis;
    set treatment_period;
    
    /* Treatment start/end dates for analysis */
    trtsdt = first_dose_dt;
    trtedt = last_dose_dt;
    
    /* Treatment duration */
    if trtsdt ne . and trtedt ne . then
        trtdurd = trtedt - trtsdt + 1;
    
    format trtsdt trtedt yymmdd10.;
run;

4. Study Disposition Integration

/* Get key disposition dates and reasons */
proc sql;
    create table disposition as
    select usubjid,
           case 
               when upcase(dsdecod) = 'INFORMED CONSENT' then dsstdtc
           end as rficdt format=yymmdd10.,
           case 
               when upcase(dsdecod) in ('COMPLETED', 'DISCONTINUED') then dsstdtc
           end as rfendt format=yymmdd10.,
           case 
               when upcase(dsdecod) = 'DISCONTINUED' then dsterm
           end as dcsreas
    from sdtm.ds
    where dscat = 'DISPOSITION EVENT' and dsstdtc ne '';
quit;

/* Consolidate to one record per subject */
proc sql;
    create table disposition_final as
    select usubjid,
           max(rficdt) as rficdt format=yymmdd10.,
           max(rfendt) as rfendt format=yymmdd10.,
           max(dcsreas) as dcsreas
    from disposition
    group by usubjid;
quit;

5. Last Alive Date Calculation

/* Calculate last known alive date from multiple sources */
proc sql;
    create table last_alive as
    select usubjid,
           max(aestdtc) as last_ae_dt format=yymmdd10.
    from sdtm.ae
    where aestdtc ne ''
    group by usubjid;
quit;

data last_alive_calc;
    merge treatment_analysis
          disposition_final  
          last_alive;
    by usubjid;
    
    /* Last alive date = latest of multiple sources */
    lstalvdt = max(trtedt, rfendt, last_ae_dt);
    format lstalvdt yymmdd10.;
run;

6. Combine All ADSL Dataset

/* Merge all components to create ADSL */
proc sql;
    create table pre_adsl as
    select dm.*,
           sp.saffl,
           itt.ittfl, 
           cp.complfl,
           ta.trtsdt,
           ta.trtedt,
           ta.trtdurd,
           lac.lstalvdt,
           disp.rficdt,
           disp.rfendt,
           disp.dcsreas
    from sdtm.dm as dm
    left join safety_pop as sp on dm.usubjid = sp.usubjid
    left join itt_pop as itt on dm.usubjid = itt.usubjid  
    left join completed_pop as cp on dm.usubjid = cp.usubjid
    left join treatment_analysis as ta on dm.usubjid = ta.usubjid
    left join last_alive_calc as lac on dm.usubjid = lac.usubjid
    left join disposition_final as disp on dm.usubjid = disp.usubjid;
quit;

/* Apply ADaM standards and derive final variables */
data adam.adsl;
    set pre_adsl;
    
    /* Required ADaM variables */
    studyid = upcase(studyid);
    
    /* Population flags - default to 'N' if missing */
    if saffl = '' then saffl = 'N';
    if ittfl = '' then ittfl = 'N'; 
    if complfl = '' then complfl = 'N';
    
    /* Treatment variables */
    trt01p = strip(arm);        /* Planned treatment */
    trt01pn = input(armcd, best.); /* Planned treatment (N) */
    trt01a = strip(actarm);     /* Actual treatment */
    trt01an = input(actarmcd, best.); /* Actual treatment (N) */
    
    /* Age groupings for analysis */
    if age ne . then do;
        if age < 65 then agegr1 = '<65';
        else if age >= 65 then agegr1 = '>=65';
        
        if age < 18 then agegr1n = 1;
        else if 18 <= age < 65 then agegr1n = 2;
        else if age >= 65 then agegr1n = 3;
    end;
    
    /* Race groupings */
    if upcase(race) = 'WHITE' then racen = 1;
    else if upcase(race) = 'BLACK OR AFRICAN AMERICAN' then racen = 2;
    else if upcase(race) = 'ASIAN' then racen = 3;
    else racen = 4;
    
    /* Sex numeric */
    if upcase(sex) = 'M' then sexn = 1;
    else if upcase(sex) = 'F' then sexn = 2;
    
    /* Study day calculations */
    if trtsdt ne . and rfstdtc ne '' then
        trtsdt = input(rfstdtc, yymmdd10.);
    
    if trtedt ne . and rfendtc ne '' then
        trtedt = input(rfendtc, yymmdd10.);
    
    /* Format assignments */
    format trtsdt trtedt lstalvdt rficdt rfendt yymmdd10.;
    
    /* Variable labels */
    label studyid = 'Study Identifier'
          usubjid = 'Unique Subject Identifier'
          subjid = 'Subject Identifier for the Study'
          siteid = 'Study Site Identifier'
          age = 'Age'
          agegr1 = 'Pooled Age Group 1'
          agegr1n = 'Pooled Age Group 1 (N)'
          sex = 'Sex'
          sexn = 'Sex (N)'
          race = 'Race'
          racen = 'Race (N)'
          saffl = 'Safety Population Flag'
          ittfl = 'Intent-To-Treat Population Flag'
          complfl = 'Completers Population Flag'
          trt01p = 'Planned Treatment for Period 01'
          trt01pn = 'Planned Treatment for Period 01 (N)'
          trt01a = 'Actual Treatment for Period 01'
          trt01an = 'Actual Treatment for Period 01 (N)'
          trtsdt = 'Date of First Exposure to Treatment'
          trtedt = 'Date of Last Exposure to Treatment'
          trtdurd = 'Total Treatment Duration (Days)'
          lstalvdt = 'Last Known Alive Date'
          dcsreas = 'Reason for Discontinuation from Study';
run;

Quality Control & Validation

1. Population Reconciliation

/* Verify population counts */
proc freq data=adam.adsl;
    title "Analysis Population Counts";
    tables saffl ittfl complfl / missing;
run;

/* Cross-tabulation of populations */
proc freq data=adam.adsl;
    title "Population Overlap Analysis";
    tables saffl*ittfl*complfl / missing list;
run;

2. Treatment Assigment Validation

/* Planned vs Actual treatment comparison */
proc freq data=adam.adsl;
    title "Planned vs Actual Treatment";
    tables trt01p*trt01a / missing list;
run;

/* Treatment duration summary */
proc means data=adam.adsl n mean std min max;
    title "Treatment Duration Summary";
    var trtdurd;
    class trt01p;
run;

3. Data Integrity Checks

/* Check for missing key variables */
proc sql;
    title "Missing Key Variables Check";
    select 
        sum(case when usubjid = '' then 1 else 0 end) as missing_usubjid,
        sum(case when studyid = '' then 1 else 0 end) as missing_studyid,
        sum(case when saffl = '' then 1 else 0 end) as missing_saffl,
        sum(case when ittfl = '' then 1 else 0 end) as missing_ittfl
    from adam.adsl;
quit;

/* Validate date logic */
proc sql;
    title "Date Logic Validation";
    select usubjid, trtsdt, trtedt, trtdurd
    from adam.adsl
    where trtsdt > trtedt or (trtsdt ne . and trtedt ne . and trtdurd ne (trtedt-trtsdt+1));
quit;

4. Cross-Domain Consistency

/* Compare ADSL counts with SDTM DM */
proc sql;
    title "Subject Count Reconciliation";
    select 'SDTM.DM' as dataset, count(*) as n_subjects from sdtm.dm
    union
    select 'ADAM.ADSL' as dataset, count(*) as n_subjects from adam.adsl;
quit;

Results

Sample output:

SUBJID STUDYID TRT01P SAFFL ITTFL TRTSDT TRTEDT AGE SEX
001-001 XYZ123 Placebo Y Y 2023-01-15 2023-03-15 45 M
001-002 XYZ123 Active Y Y 2023-01-16 2023-03-14 52 F

Dataset Specifications

Aspect Specification
Structure One record per subject
Key Variable USUBJID (unique identifier)
Records N = total randomized subjects
Variables 25+ analysis variables
Populations Safety, ITT, Completers defined
Standards ADaM IG v1.3 compliant

Business Impact

Analytical Foundation

  • Primary Analysis: Enables all efficacy endpoints

  • Safety Analysis: Population definitions for AE analysis

  • Regulatory Submission: Core dataset for FDA/EMA review

  • Study Operations: Treatment duration and compliance metrics

Technical Achievements

  • Multi-domain Integration: Seamless SDTM to ADaM workflow

  • Business Logic Implementation: Complex population definitions

  • Data Quality: Comprehensive validation framework

  • Regulatory Compliance: 100% ADaM IG adherence

Contact

Interested in this project or my expertise in biostatistics and CDISC standards?

Send an Email | LinkedIn