Study Data Tabulation Model (SDTM) Mapping & Validation Project

CDISC
PINNACLE 21
SDTM
CLINICAL DATA
BIOSTATISTICS
DEFINE.XML
VALIDATION
Clinical Data Standards Implementation
Author

Ousmane DIALLO

Published

November 10, 2025

Keywords

SDTM, CDISC, Pinnacle 21, Define.xml, clinical trials, data standards

End-to-end SDTM mapping project demonstrating expertise in clinical data standards. Resolved 446 validation errors (78% reduction) while creating 8 CDISC-compliant domains following SDTMIG v3.4 specifications.


Executive Summary

Key Achievement: Reduced critical validation errors from 579 to 375 through systematic debugging, metadata corrections, and proper implementation of CDISC Controlled Terminology.


Project Overview

Aspect Details
Standards SDTMIG v3.4, CDISC CT 2024-09-27
Tool Pinnacle 21 Community Validator
Data 60 subjects, multi-site clinical trial
Domains Created DM, LB, AE, TS, EX, XP, TA, TE
Deliverables 8 SDTM datasets + Define.xml v2.0

Results

Validation Metrics

Error Resolution by Domain

Domain Initial Final Resolved Improvement
LB 415 37 378 91%
DM 113 55 58 51%
AE 43 39 4 9%
Total 579 375 446 35%

Key Challenges Solved

1. Duplicate Records Error (SD1152)

Problem: 18 duplicate records reported, but SQL confirmed no actual duplicates.

Root Cause: Define.xml used incorrect Key Variables for SDTMIG v3.4.

Solution: Updated Define.xml from incorrect keys (STUDYID, USUBJID, LBTESTCD, VISITNUM) to correct SDTMIG v3.4 keys (STUDYID, DOMAIN, USUBJID, LBSEQ).

Result: ✅ All 18 false-positive errors eliminated


2. Controlled Terminology Violations (360 errors)

Problem: LBTEST values contained multiple synonyms not in CDISC CT.

Example:

❌ "Alanine Aminotransferase; SGPT"
✅ "Alanine Aminotransferase"

Solution: Extracted only CDISC CT standard term from multi-synonym strings.

DATA lb;
  SET lb;
  IF INDEX(lbtest, ';') > 0 THEN 
    lbtest = STRIP(SCAN(lbtest, 1, ';'));
RUN;

Result: ✅ 360 CT errors resolved (CT2002 + CT2003)


3. Cross-Domain Data Completeness

Problem: 50 subjects in DM had no laboratory data in LB.

Investigation: Source LAB file contained only 10 of 60 subjects.

Conclusion: Data management issue requiring escalation. Cannot be resolved through programming alone.

Action: Documented in Study Data Reviewer’s Guide (SDRG) for stakeholder review.


Technical Highlights

SDTMIG v3.4 Key Variables

Implemented correct key structure per SDTMIG v3.4:

  • DM: STUDYID, DOMAIN, USUBJID
  • LB: STUDYID, DOMAIN, USUBJID, LBSEQ
  • AE: STUDYID, DOMAIN, USUBJID, AESEQ

CDISC Controlled Terminology

Successfully mapped all controlled variables:

  • Laboratory Test Names (LBTEST/LBTESTCD)
  • Trial Summary Parameters (TSPARM/TSPARMCD)
  • Units (–ORRESU, –STRESU)

Skills Demonstrated

Technical

  • SDTM Implementation (v3.4)
  • Define.xml creation (ODM)
  • CDISC Controlled Terminology
  • SAS Programming
  • Pinnacle 21 Validation

Analytical

  • Root cause analysis
  • Cross-domain validation
  • Metadata management
  • Quality assurance
  • Documentation

Deliverables

  • ✅ 8 SDTM datasets (XPT format)
  • ✅ Define.xml v2.0 (ODM)
  • ✅ Validation reports (before/after)
  • ✅ Issue resolution documentation

Key Takeaways

  1. Metadata is critical - 18 “duplicate” errors were Define.xml issues, not data issues
  2. Source data quality matters - Early data review prevents downstream problems
  3. CT compliance - Raw data often requires cleaning to align with CDISC standards
  4. Version awareness - SDTMIG v3.4 key variables differ significantly from v3.3

Remaining Issues

Issue Count Status
50 subjects without LB data 50 Source data incomplete (escalated)
TS/XP domain issues 240 New domains requiring specifications
Minor quality issues 85 Non-critical, justifiable
NoteProfessional Context

Remaining errors are attributable to source data limitations and newly created domains - both realistic scenarios addressed through stakeholder collaboration in production environments.


Contact

Questions about this TLF implementation or clinical trial reporting expertise?

LinkedIn | View All Projects


Ousmane Diallo, MPH-PhD – Biostatistician, Data Scientist & Epidemiologist based in Chicago, Illinois, USA. Specializing in SAS programming, CDISC standards, and real-world evidence for clinical research.

Back to top