
Study Data Tabulation Model (SDTM) Mapping & Validation Project
SDTM, CDISC, Pinnacle 21, Define.xml, clinical trials, data standards
End-to-end SDTM mapping project demonstrating expertise in clinical data standards. Resolved 446 validation errors (78% reduction) while creating 8 CDISC-compliant domains following SDTMIG v3.4 specifications.
Executive Summary
Key Achievement: Reduced critical validation errors from 579 to 375 through systematic debugging, metadata corrections, and proper implementation of CDISC Controlled Terminology.
Project Overview
| Aspect | Details |
|---|---|
| Standards | SDTMIG v3.4, CDISC CT 2024-09-27 |
| Tool | Pinnacle 21 Community Validator |
| Data | 60 subjects, multi-site clinical trial |
| Domains Created | DM, LB, AE, TS, EX, XP, TA, TE |
| Deliverables | 8 SDTM datasets + Define.xml v2.0 |
Results
Validation Metrics
Error Resolution by Domain
| Domain | Initial | Final | Resolved | Improvement |
|---|---|---|---|---|
| LB | 415 | 37 | 378 | 91% |
| DM | 113 | 55 | 58 | 51% |
| AE | 43 | 39 | 4 | 9% |
| Total | 579 | 375 | 446 | 35% |
Key Challenges Solved
1. Duplicate Records Error (SD1152)
Problem: 18 duplicate records reported, but SQL confirmed no actual duplicates.
Root Cause: Define.xml used incorrect Key Variables for SDTMIG v3.4.
Solution: Updated Define.xml from incorrect keys (STUDYID, USUBJID, LBTESTCD, VISITNUM) to correct SDTMIG v3.4 keys (STUDYID, DOMAIN, USUBJID, LBSEQ).
Result: ✅ All 18 false-positive errors eliminated
2. Controlled Terminology Violations (360 errors)
Problem: LBTEST values contained multiple synonyms not in CDISC CT.
Example:
❌ "Alanine Aminotransferase; SGPT"
✅ "Alanine Aminotransferase"
Solution: Extracted only CDISC CT standard term from multi-synonym strings.
DATA lb;
SET lb;
IF INDEX(lbtest, ';') > 0 THEN
lbtest = STRIP(SCAN(lbtest, 1, ';'));
RUN;
Result: ✅ 360 CT errors resolved (CT2002 + CT2003)
3. Cross-Domain Data Completeness
Problem: 50 subjects in DM had no laboratory data in LB.
Investigation: Source LAB file contained only 10 of 60 subjects.
Conclusion: Data management issue requiring escalation. Cannot be resolved through programming alone.
Action: Documented in Study Data Reviewer’s Guide (SDRG) for stakeholder review.
Technical Highlights
SDTMIG v3.4 Key Variables
Implemented correct key structure per SDTMIG v3.4:
- DM: STUDYID, DOMAIN, USUBJID
- LB: STUDYID, DOMAIN, USUBJID, LBSEQ
- AE: STUDYID, DOMAIN, USUBJID, AESEQ
CDISC Controlled Terminology
Successfully mapped all controlled variables:
- Laboratory Test Names (LBTEST/LBTESTCD)
- Trial Summary Parameters (TSPARM/TSPARMCD)
- Units (–ORRESU, –STRESU)
Skills Demonstrated
Technical
- SDTM Implementation (v3.4)
- Define.xml creation (ODM)
- CDISC Controlled Terminology
- SAS Programming
- Pinnacle 21 Validation
Analytical
- Root cause analysis
- Cross-domain validation
- Metadata management
- Quality assurance
- Documentation
Deliverables
- ✅ 8 SDTM datasets (XPT format)
- ✅ Define.xml v2.0 (ODM)
- ✅ Validation reports (before/after)
- ✅ Issue resolution documentation
Key Takeaways
- Metadata is critical - 18 “duplicate” errors were Define.xml issues, not data issues
- Source data quality matters - Early data review prevents downstream problems
- CT compliance - Raw data often requires cleaning to align with CDISC standards
- Version awareness - SDTMIG v3.4 key variables differ significantly from v3.3
Remaining Issues
| Issue | Count | Status |
|---|---|---|
| 50 subjects without LB data | 50 | Source data incomplete (escalated) |
| TS/XP domain issues | 240 | New domains requiring specifications |
| Minor quality issues | 85 | Non-critical, justifiable |
Remaining errors are attributable to source data limitations and newly created domains - both realistic scenarios addressed through stakeholder collaboration in production environments.
Contact
Questions about this TLF implementation or clinical trial reporting expertise?
Ousmane Diallo, MPH-PhD – Biostatistician, Data Scientist & Epidemiologist based in Chicago, Illinois, USA. Specializing in SAS programming, CDISC standards, and real-world evidence for clinical research.
Back to top