Saturday, September 20, 2014

SDTM: let the service do the work - not the dataset

This week, I found some time to continue working on SDTM. Or better: on services for SDTM. In my previous blog entry, I already showed how web services can help when working with controlled terminology such as the LOINC codelist for laboratory tests.
I have now extended this to CDISC controlled terminology (CDISC-CT) in general, based on the work of my student Wolfgang Hof. First, I downloaded the latest CDISC-CT (June 26) from the NCI website as a set of XML files. Starting from these, I generated and populated a database with about 6 tables. I then wrote some RESTful services so that remote applications can retrieve information for answering questions like:
  • what is the test name for test code XYZ?
  • what is the NCI code for test code or test name XYZ (or the other way around)?
  • what is the CDISC definition of controlled term ABC?
  • are there any synonyms for controlled term ABC?
I hope to make these services available for the general public in the next few weeks.
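To give an idea of how a remote application could ask such questions, here is a minimal Python sketch of a client. Since the services are not public yet, everything specific in it is an assumption made for illustration: the base URL, the resource names in the path, the `codelist` query parameter and the JSON field names (`testName`, `nciCode`, `definition`, `synonyms`) are hypothetical, not the actual interface.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical base URL: the real service address has not been published.
BASE_URL = "https://example.org/cdisc-ct/rest"


def build_lookup_url(resource, term, codelist=None):
    """Build the GET URL for one lookup, e.g. the test name for a test code.

    'resource' (such as "testname" or "definition") is an assumed path element.
    """
    url = f"{BASE_URL}/{quote(resource)}/{quote(term, safe='')}"
    if codelist:
        url += "?" + urlencode({"codelist": codelist})
    return url


def parse_term_response(body):
    """Pick the fields a viewer tooltip would need from a JSON reply.

    The JSON layout is assumed; missing fields come back as None or [].
    """
    data = json.loads(body)
    return {
        "test_name": data.get("testName"),
        "nci_code": data.get("nciCode"),
        "definition": data.get("definition"),
        "synonyms": data.get("synonyms", []),
    }
```

A viewer would fetch the URL (for example with `urllib.request.urlopen`) and pass the response body to `parse_term_response` to fill a tooltip or a definition window.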

Then I implemented a good number of these services in the "Smart Dataset-XML Viewer". Here is a screenshot as an example:

What you see is that when the user hovers the mouse over a test code (in this case an LBTESTCD: SPGRAV), the web service is triggered, and the test name and NCI code are retrieved from the remote server/database and displayed as a tooltip on the cell contents.
When the user right-clicks the LBTESTCD cell, the web service is triggered, looks up the "CDISC definition" for the given test code, and displays it in a separate window (upper left corner).
When the user right-clicks the LOINC code for this test (in this case 2965-2), a request is sent to the RESTful web service of the "National Library of Medicine", which returns the address of a website with explanations about the test. That page is then displayed in a browser window that pops up.

On the right, you also see some yellow-colored cells. These indicate that there is something special with the data. In the current case, the cell is colored because its value is lower than the low normal range limit. This is not done by a web service, but by the viewer software itself. Thus, when using this feature, the SDTM variable LBNRIND is superfluous and can be removed from the SDTM specification ("let the service do the work - not the dataset"). Other such features that are already present in the "Smart Dataset-XML Viewer" are:

  • show date of first and last exposure in the DM dataset (retrieved from EX)
  • show --DY value on any --DTC value (calculated from difference with RFSTDTC in DM)
  • show visit name on VISITNUM (retrieved from TV)
Essentially this means that many of the SDTM variables (all the ones that are "derived") are superfluous. We estimate that about 1 in 3 SDTM variables could be removed from the SDTM-IG, as they can be calculated "on the fly" from the data that are already present in the datasets, or retrieved by a web service. For example, all --TEST variables are superfluous, as their value can be obtained from a web service.
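Two of these "on the fly" derivations can be sketched in a few lines of Python. This is not the viewer's actual code, just an illustration under stated assumptions: the function names are made up, the flag values "LOW", "NORMAL" and "HIGH" are the ones I assume for LBNRIND, and the dates are assumed to be complete ISO 8601 dates (partial dates such as "2014-09" are not handled).

```python
from datetime import date


def normal_range_indicator(value, low, high):
    """Derive an LBNRIND-style flag from a numeric result and its range limits.

    Returns None when the result or both limits are missing.
    """
    if value is None or (low is None and high is None):
        return None
    if low is not None and value < low:
        return "LOW"
    if high is not None and value > high:
        return "HIGH"
    return "NORMAL"


def study_day(dtc, rfstdtc):
    """Derive a --DY value from a --DTC value and RFSTDTC.

    Only the date part (first 10 characters) is used. SDTM has no day 0:
    the reference date itself is day 1, the day before it is day -1.
    """
    observed = date.fromisoformat(dtc[:10])
    reference = date.fromisoformat(rfstdtc[:10])
    delta = (observed - reference).days
    return delta + 1 if delta >= 0 else delta
```

With such functions in the viewer, a cell can be colored or annotated at display time without the derived value being stored in the dataset.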

Now, this is just the tip of the iceberg. So many other things are possible which can considerably contribute to data quality in SDTM submissions. A few examples:

  • the web service reports the usual units for the test or observation. For example: mm[Hg] for SYSBP and DIABP, cm and [in_i] (inches) for HEIGHT, no units for SPGRAV. This can be used to test whether the combination of ORRES and ORRESU is reasonable and acceptable
  • if UCUM notation were allowed for ORRESU/STRESU (unfortunately it is not yet, although all EHR systems and Hospital Information Systems work with UCUM - it is even mandated by Meaningful Use), then the value of --STRESN could be calculated automatically. The combination of --TESTCD, --ORRES and --ORRESU could be sent to the web service with the request "please calculate the standardized numerical value", as the web service already knows which unit the value must be standardized to for the specific test. This would work even when some of the blood pressure values were recorded in, e.g., [psi] (pounds per square inch)
In my opinion, these are the kind of features and services people will expect from SHARE in the future. SHARE should be more than a repository of standard specifications: it should behave as a semi-intelligent system that helps sponsors and reviewers improve the data quality of electronic submissions.
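The two unit-related ideas above can be illustrated with a small Python sketch. It assumes that UCUM notation is used for the units, which SDTM does not allow yet. The lookup tables are tiny hand-made stand-ins for what a real service would hold in its database: the usual units come from the examples above, the choice of standard units (mm[Hg] and cm) is my assumption, and the [psi] factor is an approximate value.

```python
# Usual units per test code, from the examples above (not an official list).
USUAL_UNITS = {
    "SYSBP": {"mm[Hg]"},
    "DIABP": {"mm[Hg]"},
    "HEIGHT": {"cm", "[in_i]"},
    "SPGRAV": {""},  # dimensionless: no unit expected
}

# Assumed standard unit per test code.
STANDARD_UNIT = {"SYSBP": "mm[Hg]", "DIABP": "mm[Hg]", "HEIGHT": "cm"}

# Multiplication factors (from unit, to unit); the [psi] factor is approximate.
TO_STANDARD = {
    ("[in_i]", "cm"): 2.54,
    ("[psi]", "mm[Hg]"): 51.715,
}


def unit_is_usual(testcd, orresu):
    """Return True/False for known tests, None when the test is unknown."""
    usual = USUAL_UNITS.get(testcd)
    if usual is None:
        return None
    return (orresu or "") in usual


def standardize(testcd, orres, orresu):
    """Return (standardized numeric value, standard unit) for one result."""
    target = STANDARD_UNIT.get(testcd)
    if target is None:
        raise KeyError(f"no standard unit known for {testcd}")
    if orresu == target:
        return float(orres), target
    factor = TO_STANDARD.get((orresu, target))
    if factor is None:
        raise ValueError(f"cannot convert {orresu} to {target}")
    return float(orres) * factor, target
```

A blood pressure reported in [psi] would fail the `unit_is_usual` check, which is a useful warning in itself, but `standardize` could still convert it to mm[Hg].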

For those who like these features of the "Smart Dataset-XML Viewer" and these web services: I am still working on improving and extending them, and I hope to make a new (branched off) version of the Viewer available on the Sourceforge website within the next 1-3 weeks. So please be a bit patient ...

Comments are of course always welcome!

1 comment:

  1. NCI vocabulary should be accessible via Common Terminology Services. There is an informatics standard for this problem, so that would be the best approach. But in practice, doing it manually (locally) may be best.