Thursday, December 29, 2011

One step further - SDTM in XML

We already submit the metadata of our SDTM submissions in XML format (define.xml).
So why not submit the SDTM/SEND/ADaM data themselves as XML?
It would have so many advantages!
Here are just a few (I could name about 100):
  • really vendor-neutral (SAS XPT is not vendor-neutral at all)
  • vendors can develop tools for working with the submissions much more easily, as there are so many libraries for working with XML in Java, C#, C++, Python, PHP, Perl, ... (I do not know of a single library in any of these languages for working with SAS XPT).
  • easier to validate (e.g. using XML-Schema and Schematron)
  • easy to display (through XSLT stylesheets)
  • easy to develop different "views" on the data (through stylesheets)
  • extremely easy to combine different datasets / studies etc.
  • display stylesheets can mark (e.g. by background color) violations of the SDTM-IG rules
  • much more compact than SAS XPT (one of the major complaints of the FDA is that they cannot open large SAS XPT files)
  • ...
But what could such an SDTM dataset look like?
Here is an example:


This is just simple ODM. It can be generated by a simple transformation from the source ODM "ClinicalData" that every modern EDC system produces; no expensive statistical software is necessary. It can, however, also be generated very simply using statistical software (if that is your preference).
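
To give an idea of how little tooling is needed to read such a file, here is a minimal Python sketch that uses only the standard library. The XML fragment in it is made up for illustration: the OIDs, the values and the flat "ItemGroupData directly under ClinicalData" layout are assumptions, not the example from this post, and XML namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ODM-like fragment; all OIDs and values are invented.
SAMPLE = """
<ClinicalData StudyOID="ST.001" MetaDataVersionOID="MV.1">
  <ItemGroupData ItemGroupOID="DM" ItemGroupRepeatKey="1" TransactionType="Insert">
    <ItemData ItemOID="STUDYID" Value="ST001"/>
    <ItemData ItemOID="USUBJID" Value="ST001-001"/>
    <ItemData ItemOID="SEX" Value="F"/>
  </ItemGroupData>
  <ItemGroupData ItemGroupOID="DM" ItemGroupRepeatKey="2" TransactionType="Insert">
    <ItemData ItemOID="STUDYID" Value="ST001"/>
    <ItemData ItemOID="USUBJID" Value="ST001-002"/>
    <ItemData ItemOID="SEX" Value="M"/>
  </ItemGroupData>
</ClinicalData>
"""


def read_records(xml_text):
    """Return one dict per ItemGroupData element, mapping ItemOID to Value."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for group in root.iter("ItemGroupData"):
        record = {
            item.get("ItemOID"): item.get("Value")
            for item in group.iter("ItemData")
        }
        records.append(record)
    return records


if __name__ == "__main__":
    # Print each record as a simple row of variable=value pairs.
    for row in read_records(SAMPLE):
        print(", ".join(f"{name}={value}" for name, value in row.items()))
```

Each record comes back as a plain dictionary, so building a table, a filter or another "view" on top of it takes only a few more lines.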

Just a few remarks about the advantages (I will discuss many more in later posts):
  • the first line states 'TransactionType="Insert"'.
    One of the FDA's complaints is that when they receive updates of submissions, they must load the complete data sets into their tools again, without any possibility of comparing the old data sets with the new ones. The "TransactionType" mechanism of ODM, however, makes it possible to send only the updates themselves, i.e. only the data points that were changed, and it is always clear what the status of each data point is.
  • this is human-readable! Did you ever try to open SAS XPT files with a tool other than SAS? SAS XPT is binary, so one needs to write special software (writing software is always expensive) or use SAS to be able to inspect the contents of the file
  • one can easily develop different stylesheets to get different "views" on the same set of data. With the conventional tools (SASViewer, JMP, ...) you get only one view: the tabular view.
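
The incremental-update idea from the first remark can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "Insert", "Update" and "Remove" are TransactionType values defined by ODM, but the flat fragment layout, the use of ItemGroupRepeatKey as record key, the invented OIDs and values, and the merge logic in the hypothetical function `apply_transactions` are my own simplifications, not something prescribed by the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical initial delivery: two records, both inserted.
INITIAL = """
<ClinicalData StudyOID="ST.001" MetaDataVersionOID="MV.1">
  <ItemGroupData ItemGroupOID="DM" ItemGroupRepeatKey="1" TransactionType="Insert">
    <ItemData ItemOID="USUBJID" Value="ST001-001"/>
    <ItemData ItemOID="AGE" Value="34"/>
  </ItemGroupData>
  <ItemGroupData ItemGroupOID="DM" ItemGroupRepeatKey="2" TransactionType="Insert">
    <ItemData ItemOID="USUBJID" Value="ST001-002"/>
    <ItemData ItemOID="AGE" Value="51"/>
  </ItemGroupData>
</ClinicalData>
"""

# Hypothetical follow-up delivery: only the changes are sent.
UPDATE = """
<ClinicalData StudyOID="ST.001" MetaDataVersionOID="MV.1">
  <ItemGroupData ItemGroupOID="DM" ItemGroupRepeatKey="1" TransactionType="Update">
    <ItemData ItemOID="AGE" Value="35"/>
  </ItemGroupData>
  <ItemGroupData ItemGroupOID="DM" ItemGroupRepeatKey="2" TransactionType="Remove"/>
  <ItemGroupData ItemGroupOID="DM" ItemGroupRepeatKey="3" TransactionType="Insert">
    <ItemData ItemOID="USUBJID" Value="ST001-003"/>
    <ItemData ItemOID="AGE" Value="47"/>
  </ItemGroupData>
</ClinicalData>
"""


def apply_transactions(dataset, xml_text):
    """Apply the ItemGroupData transactions in xml_text to dataset (a dict of dicts)."""
    root = ET.fromstring(xml_text.strip())
    for group in root.iter("ItemGroupData"):
        key = group.get("ItemGroupRepeatKey")
        action = group.get("TransactionType")
        values = {i.get("ItemOID"): i.get("Value") for i in group.iter("ItemData")}
        if action == "Insert":
            dataset[key] = values                        # new record
        elif action == "Update":
            dataset.setdefault(key, {}).update(values)   # change only the sent items
        elif action == "Remove":
            dataset.pop(key, None)                       # drop the record
    return dataset


if __name__ == "__main__":
    data = apply_transactions({}, INITIAL)
    apply_transactions(data, UPDATE)
    print(data)
```

The second file contains only what changed, so the receiver never has to reload the complete data set, and the difference between the old and the new state is explicit in the file itself.
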
You may see redundant information in the XML snippet. But this is due to the SDTM standard itself, as a result of the restrictions of the SAS XPT format: because of these restrictions, the developers of the SDTM standard had to add additional variables to make some information visible.

So the next step could be to write an extension (similar to define.xml) for submission data in XML that allows us to get rid of much of the redundant information that is now present. This would make it possible to further reduce the file size of SDTM submission data sets.
But that is material for another post.

Speaking of file size: it was the FDA that chose SAS XPT in the past (although there were alternatives). The SAS XPT format wastes enormous amounts of (disk) space. And now the same FDA is complaining about the file size of SDTM submissions! Well, they got what they wanted (i.e. trouble), didn't they?
So it is high time that the FDA starts investing in XML knowledge, as XML is the standard for the exchange of information worldwide, not only in the healthcare world, but also in the financial world, the travel world, in bioinformatics, in chemical research, in astronomy, etc.

One could now argue that HL7-v3 messages may be a good alternative (they are also based on XML, though they are more an abuse of the XML standard). I have already written about why that is not a good idea, and will write some more about it in one of my next posts.
