Analysis Data Subject Level (ADSL) Generation with SAS
ADaM, ADSL, CDISC, SAS, clinical trials, biostatistics, regulatory
Analysis Data Subject Level (ADSL) Generation with SAS
Automated creation of ADSL dataset for clinical trials analysis using ADaM standards and multi-domain SDTM integration.
Introduction
The Analysis Data Subject Level (ADSL) dataset is the cornerstone of clinical trial statistical analysis under the CDISC Analysis Data Model (ADaM) standard. ADSL provides one record per subject containing all subject-level information needed for efficacy and safety analyses.
What is ADSL?
ADSL (Analysis Data Subject-Level) dataset is the foundation dataset in ADaM.
It:
Contains one record per subject
Includes required subject-level variables (demographic, disposition, baseline values, key date, etc..) as well as other subject-level analysis variables
Defines analysis populations (Safety, Intention-To-Treat, Per-Protocol)
Provides treatment information (planned and actual treatment) and study periods variables
Links to all other ADaM datasets through USUBJID (unique subject identifier)
Ensures consistency and traceability across all statistical analyses and reporting
Regulatory Importance
FDA Requirement: Essential for regulatory submissions
Analysis Foundation: Referenced by all Basic Data Structure (BSD) and occurrence Data Structure (OCCDS) datasets
Population Definition: Critical for primary endpoint analyses
Traceability: Links analysis results back to SDTM source
Project Overview
Objective
Create a fully compliant ADSL dataset from multiple SDTM domains, implementing analysis populations, treatment variables, and derived endpoints following ADaM Implementation Guide principles.
Key Benefits
ADSL Dataset: Subject-level analysis file with proper ADaM structure
Population Flags: SAFFL, ITTFL, COMPLFL, MITT1FL definitions
Treatment Variables: Planned and actual treatment assignments
Timing Variables: Study milestones and treatment periods
Validation Reports: QC checks ensuring data integrity
Technical Approach
Multi-domain integration workflow:
SDTM (DM+DS+EX+AE) → Business Logic → Population Flags → Treatment Periods → ADSL
Data Integration Strategy
SDTM Source Domains
Domain | Purpose | Key Variables |
---|---|---|
DM | Demographics & baseline | USUBJID, AGE, SEX, RACE, ARM |
DS | Disposition events | DSSTDTC, DSTERM, DSDECOD |
EX | Exposure/treatment | EXSTDTC, EXENDTC, EXDOSE |
AE | Adverse events | AESTDTC, AETERM, AESEV |
VS | Vital signs (if needed) | VSSTDTC, VSTEST, VSORRES |
ADaM Variables to Derive
Variable Class | Examples | Source Logic |
---|---|---|
Required ADaM | STUDYID, USUBJID, SUBJID | Direct from DM |
Population Flags | SAFFL, ITTFL, COMPLFL | Business rules from DS/EX |
Treatment | TRT01P, TRT01A, TRT01PN | From DM.ARM + actual exposure |
Timing | TRTSDT, TRTEDT, LSTALVDT | From EX + DS disposition |
Demographics | AGE, AGEGR1, SEX, RACE | Enhanced from DM |
Technical Implementation
1. Library Setup
/* Library and macro setup */
%include "/home/u64197545/ADaM/programs/setup.sas";
/* Expected libraries:
- SDTM: Source SDTM datasets
- ADAM: Output ADaM datasets
- LIBRARY: Formats and macros
*/
libname sdtm "/home/u64197545/data/sdtm";
libname adam "/home/u64197545/data/adam";
libname library "/home/u64197545/data/library";
/* Load standard ADaM formats */
%include library(adamfmt);
2. Population Definition Logic
/* Define analysis populations based on business rules */
%macro define_populations();
/* Safety Population: Any exposure to study drug */
proc sql;
create table safety_pop as
select distinct usubjid, 'Y' as saffl
from sdtm.ex
where exdose > 0 and exstdtc ne '';
quit;
/* ITT Population: Randomized subjects */
proc sql;
create table itt_pop as
select distinct usubjid, 'Y' as ittfl
from sdtm.dm
where arm ne '' and upcase(arm) not in ('SCREEN FAILURE', 'NOT TREATED');
quit;
/* Completed Population: Finished study per protocol */
proc sql;
create table completed_pop as
select distinct usubjid, 'Y' as complfl
from sdtm.ds
where upcase(dsdecod) in ('COMPLETED', 'STUDY COMPLETED')
and dscat = 'DISPOSITION EVENT';
quit;
%mend define_populations;
%define_populations();
3. Treatment Period Derivation
/* Calculate treatment start and end dates */
proc sql;
create table treatment_period as
select usubjid,
min(exstdtc) as first_dose_dt format=yymmdd10.,
max(case when exendtc ne '' then exendtc else exstdtc end) as last_dose_dt format=yymmdd10.,
sum(case when exdose > 0 then 1 else 0 end) as exposure_days
from sdtm.ex
where exstdtc ne ''
group by usubjid;
quit;
/* Derive analysis treatment dates */
data treatment_analysis;
set treatment_period;
/* Treatment start/end dates for analysis */
trtsdt = first_dose_dt;
trtedt = last_dose_dt;
/* Treatment duration */
if trtsdt ne . and trtedt ne . then
trtdurd = trtedt - trtsdt + 1;
format trtsdt trtedt yymmdd10.;
run;
4. Study Disposition Integration
/* Get key disposition dates and reasons */
proc sql;
create table disposition as
select usubjid,
case
when upcase(dsdecod) = 'INFORMED CONSENT' then dsstdtc
end as rficdt format=yymmdd10.,
case
when upcase(dsdecod) in ('COMPLETED', 'DISCONTINUED') then dsstdtc
end as rfendt format=yymmdd10.,
case
when upcase(dsdecod) = 'DISCONTINUED' then dsterm
end as dcsreas
from sdtm.ds
where dscat = 'DISPOSITION EVENT' and dsstdtc ne '';
quit;
/* Consolidate to one record per subject */
proc sql;
create table disposition_final as
select usubjid,
max(rficdt) as rficdt format=yymmdd10.,
max(rfendt) as rfendt format=yymmdd10.,
max(dcsreas) as dcsreas
from disposition
group by usubjid;
quit;
5. Last Alive Date Calculation
/* Calculate last known alive date from multiple sources */
proc sql;
create table last_alive as
select usubjid,
max(aestdtc) as last_ae_dt format=yymmdd10.
from sdtm.ae
where aestdtc ne ''
group by usubjid;
quit;
data last_alive_calc;
merge treatment_analysis
disposition_final
last_alive;
by usubjid;
/* Last alive date = latest of multiple sources */
lstalvdt = max(trtedt, rfendt, last_ae_dt);
format lstalvdt yymmdd10.;
run;
6. Combine All ADSL Dataset
/* Merge all components to create ADSL */
proc sql;
create table pre_adsl as
select dm.*,
sp.saffl,
itt.ittfl,
cp.complfl,
ta.trtsdt,
ta.trtedt,
ta.trtdurd,
lac.lstalvdt,
disp.rficdt,
disp.rfendt,
disp.dcsreas
from sdtm.dm as dm
left join safety_pop as sp on dm.usubjid = sp.usubjid
left join itt_pop as itt on dm.usubjid = itt.usubjid
left join completed_pop as cp on dm.usubjid = cp.usubjid
left join treatment_analysis as ta on dm.usubjid = ta.usubjid
left join last_alive_calc as lac on dm.usubjid = lac.usubjid
left join disposition_final as disp on dm.usubjid = disp.usubjid;
quit;
/* Apply ADaM standards and derive final variables */
data adam.adsl;
set pre_adsl;
/* Required ADaM variables */
studyid = upcase(studyid);
/* Population flags - default to 'N' if missing */
if saffl = '' then saffl = 'N';
if ittfl = '' then ittfl = 'N';
if complfl = '' then complfl = 'N';
/* Treatment variables */
trt01p = strip(arm); /* Planned treatment */
trt01pn = input(armcd, best.); /* Planned treatment (N) */
trt01a = strip(actarm); /* Actual treatment */
trt01an = input(actarmcd, best.); /* Actual treatment (N) */
/* Age groupings for analysis */
if age ne . then do;
if age < 65 then agegr1 = '<65';
else if age >= 65 then agegr1 = '>=65';
if age < 18 then agegr1n = 1;
else if 18 <= age < 65 then agegr1n = 2;
else if age >= 65 then agegr1n = 3;
end;
/* Race groupings */
if upcase(race) = 'WHITE' then racen = 1;
else if upcase(race) = 'BLACK OR AFRICAN AMERICAN' then racen = 2;
else if upcase(race) = 'ASIAN' then racen = 3;
else racen = 4;
/* Sex numeric */
if upcase(sex) = 'M' then sexn = 1;
else if upcase(sex) = 'F' then sexn = 2;
/* Study day calculations */
if trtsdt ne . and rfstdtc ne '' then
trtsdt = input(rfstdtc, yymmdd10.);
if trtedt ne . and rfendtc ne '' then
trtedt = input(rfendtc, yymmdd10.);
/* Format assignments */
format trtsdt trtedt lstalvdt rficdt rfendt yymmdd10.;
/* Variable labels */
label studyid = 'Study Identifier'
usubjid = 'Unique Subject Identifier'
subjid = 'Subject Identifier for the Study'
siteid = 'Study Site Identifier'
age = 'Age'
agegr1 = 'Pooled Age Group 1'
agegr1n = 'Pooled Age Group 1 (N)'
sex = 'Sex'
sexn = 'Sex (N)'
race = 'Race'
racen = 'Race (N)'
saffl = 'Safety Population Flag'
ittfl = 'Intent-To-Treat Population Flag'
complfl = 'Completers Population Flag'
trt01p = 'Planned Treatment for Period 01'
trt01pn = 'Planned Treatment for Period 01 (N)'
trt01a = 'Actual Treatment for Period 01'
trt01an = 'Actual Treatment for Period 01 (N)'
trtsdt = 'Date of First Exposure to Treatment'
trtedt = 'Date of Last Exposure to Treatment'
trtdurd = 'Total Treatment Duration (Days)'
lstalvdt = 'Last Known Alive Date'
dcsreas = 'Reason for Discontinuation from Study';
run;
Quality Control & Validation
1. Population Reconciliation
/* Verify population counts */
proc freq data=adam.adsl;
title "Analysis Population Counts";
tables saffl ittfl complfl / missing;
run;
/* Cross-tabulation of populations */
proc freq data=adam.adsl;
title "Population Overlap Analysis";
tables saffl*ittfl*complfl / missing list;
run;
2. Treatment Assigment Validation
/* Planned vs Actual treatment comparison */
proc freq data=adam.adsl;
title "Planned vs Actual Treatment";
tables trt01p*trt01a / missing list;
run;
/* Treatment duration summary */
proc means data=adam.adsl n mean std min max;
title "Treatment Duration Summary";
var trtdurd;
class trt01p;
run;
3. Data Integrity Checks
/* Check for missing key variables */
proc sql;
title "Missing Key Variables Check";
select
sum(case when usubjid = '' then 1 else 0 end) as missing_usubjid,
sum(case when studyid = '' then 1 else 0 end) as missing_studyid,
sum(case when saffl = '' then 1 else 0 end) as missing_saffl,
sum(case when ittfl = '' then 1 else 0 end) as missing_ittfl
from adam.adsl;
quit;
/* Validate date logic */
proc sql;
title "Date Logic Validation";
select usubjid, trtsdt, trtedt, trtdurd
from adam.adsl
where trtsdt > trtedt or (trtsdt ne . and trtedt ne . and trtdurd ne (trtedt-trtsdt+1));
quit;
4. Cross-Domain Consistency
/* Compare ADSL counts with SDTM DM */
proc sql;
title "Subject Count Reconciliation";
select 'SDTM.DM' as dataset, count(*) as n_subjects from sdtm.dm
union
select 'ADAM.ADSL' as dataset, count(*) as n_subjects from adam.adsl;
quit;
Results
Sample output:
SUBJID | STUDYID | TRT01P | SAFFL | ITTFL | TRTSDT | TRTEDT | AGE | SEX |
---|---|---|---|---|---|---|---|---|
001-001 | XYZ123 | Placebo | Y | Y | 2023-01-15 | 2023-03-15 | 45 | M |
001-002 | XYZ123 | Active | Y | Y | 2023-01-16 | 2023-03-14 | 52 | F |
Dataset Specifications
Aspect | Specification |
---|---|
Structure | One record per subject |
Key Variable | USUBJID (unique identifier) |
Records | N = total randomized subjects |
Variables | 25+ analysis variables |
Populations | Safety, ITT, Completers defined |
Standards | ADaM IG v1.3 compliant |
Business Impact
Analytical Foundation
Primary Analysis: Enables all efficacy endpoints
Safety Analysis: Population definitions for AE analysis
Regulatory Submission: Core dataset for FDA/EMA review
Study Operations: Treatment duration and compliance metrics
Technical Achievements
Multi-domain Integration: Seamless SDTM to ADaM workflow
Business Logic Implementation: Complex population definitions
Data Quality: Comprehensive validation framework
Regulatory Compliance: 100% ADaM IG adherence
Contact
Interested in this project or my expertise in biostatistics and CDISC standards?