Friday, January 23, 2015

ValueList Web Services in the "Smart Dataset-XML Viewer"

I have now also implemented these services in the "Smart Dataset-XML Viewer". I still need to QC it, and will then make the new version available through SourceForge.

Here is a snapshot of a VS dataset. It validates with no errors nor warnings in OpenCDISC (2.0).

 See something special?

Now I let the "Smart Dataset-XML Viewer" validate the data using the following web services:

  • check whether a CDISC unit is a correct unit for a given (VS) test code
    ({testcode}/{cdiscunit})
  • check whether a vital signs "position" (VSPOS) is a correct "position" for a given (VS) test code
    ({testcode}/{position})
Here is the result:

 The "Smart Dataset-XML Viewer" finds the following problems:
  • mm[Hg] is not a valid unit for VSTESTCD=SYSBP (second row)
    (note that this data point came from an EHR, where UCUM notation is mandatory, but CDISC still does not allow UCUM...)
  • cm is not a valid unit for VSSTRESU with VSTESTCD=SYSBP (same row)
  • cm is not a valid unit for VSORRESU with VSTESTCD=SYSBP
    obviously a data management error (although a mapping error cannot be excluded)
  • SITTING is not a valid VSPOS ("position") with VSTESTCD=HEIGHT
Once again, this dataset passed without errors or warnings through OpenCDISC.
The reason is that the latter does not implement this kind of plausibility rule. For example, it just checks whether "cm" is a valid member of the [UNIT] codelist (which it is). But of course "cm" is not applicable to a blood pressure.
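
The difference between the two kinds of check can be sketched in a few lines of Python. This is only an illustration: the tiny codelist and the test-code-to-unit table below are made-up stand-ins, not the actual CDISC controlled terminology, and the function names are hypothetical.

```python
# Illustrative data only: NOT the real CDISC controlled terminology.
UNIT_CODELIST = {"cm", "mmHg", "kg"}

# Hypothetical mapping of VS test codes to the units that make sense for them.
UNITS_BY_TESTCD = {
    "SYSBP": {"mmHg"},
    "HEIGHT": {"cm"},
}


def in_codelist(unit):
    """Codelist check: is the unit a member of the [UNIT] codelist at all?"""
    return unit in UNIT_CODELIST


def is_plausible(testcd, unit):
    """Plausibility check: is the unit allowed for this specific test code?"""
    allowed = UNITS_BY_TESTCD.get(testcd)
    if allowed is None:
        return True  # no rule known for this test code, so nothing to report
    return unit in allowed


def check(testcd, unit):
    """Return a list of findings for one (test code, unit) pair."""
    findings = []
    if not in_codelist(unit):
        findings.append(f"{unit} is not a member of the UNIT codelist")
    elif not is_plausible(testcd, unit):
        findings.append(f"{unit} is not a valid unit for VSTESTCD={testcd}")
    return findings
```

With this toy data, "cm" passes the codelist check on its own, but `check("SYSBP", "cm")` still reports a finding, which is exactly the case that a codelist-only validation misses.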

Now, one could implement such plausibility rules in software (hardcoding them, as OpenCDISC mostly does for other rules), but why do that (with zero transparency) when a web service is available?

I must explicitly thank Anthony Chow (CDISC), who published these rules in the form of an Excel worksheet (see "CT Mapping/Alignment Across CodeLists" at the CDISC-CT website).
All I did was move this information into a database and write the RESTful web service for it.
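
As a rough idea of what "a database plus a lookup by {testcode}/{cdiscunit}" can look like, here is a minimal Python sketch using an in-memory SQLite database. It is not the actual implementation behind the service: the table name, the column names, the two sample rows and the shape of the returned dictionary are all assumptions made for illustration.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical test-code/unit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_unit ("
        " testcode TEXT NOT NULL,"
        " cdiscunit TEXT NOT NULL,"
        " PRIMARY KEY (testcode, cdiscunit))"
    )
    # Illustrative rows only, not the content of the CDISC worksheet.
    conn.executemany(
        "INSERT INTO test_unit (testcode, cdiscunit) VALUES (?, ?)",
        [("SYSBP", "mmHg"), ("HEIGHT", "cm")],
    )
    conn.commit()
    return conn


def lookup(conn, path):
    """Answer a request path of the form '{testcode}/{cdiscunit}'."""
    # Split only on the first slash, so a unit containing '/' stays intact.
    testcode, cdiscunit = path.lstrip("/").split("/", 1)
    row = conn.execute(
        "SELECT 1 FROM test_unit WHERE testcode = ? AND cdiscunit = ?",
        (testcode, cdiscunit),
    ).fetchone()
    return {"testcode": testcode, "cdiscunit": cdiscunit, "valid": row is not None}
```

For example, `lookup(build_db(), "SYSBP/cm")` reports the pair as not valid, while `"SYSBP/mmHg"` is accepted. A real service would additionally have to deal with URL encoding of special characters in unit names.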

This kind of functionality is exactly what CDISC users want to see in SHARE. My implementation is just a prototype or "proof of concept", and of course I am talking with CDISC about how this kind of web service can be provided by the real SHARE.


  1. "Here is a snapshot of a VS dataset. It validates with no errors nor warnings in OpenCDISC (2.0)."

    That is not correct. The OpenCDISC validator complains about the non-standard term "mm[Hg]".

    Please keep in mind that the OpenCDISC validation specifications you refer to are based purely on the FDA Business Rules for SDTM data. The goal of the OpenCDISC Validator is to provide executable FDA checks.

    Please believe me that the OpenCDISC engine has functionality mature enough to handle value-level validation.

    Kind Regards,
    Sergiy Sirichenko

  2. Thanks Sergiy - you are right, OpenCDISC would give an error on mm[Hg].
    But it should not (not OpenCDISC's fault), as 99.9% of the healthcare industry uses this notation (UCUM); it is even mandatory in EHRs and, I think, also required by Meaningful Use. The CDISC-CT team, however, decided to reinvent the wheel.
    This is, however, the only such error, and it has no relationship with value-level metadata.

    I cannot consider the FDA rules to be machine-executable. The rules as published in the Excel worksheets are neither precise nor directly interpretable by machines. This makes them unsuitable for direct implementation in a validation engine: they first need to be interpreted by a human and then coded into software by a human.

    In computer science, machine-executable rules are rules that are both readable by humans and directly executable by computers.

    It is to OpenCDISC's credit that it has interpreted these rules and implemented them in executable software, but we must blame the FDA for having published rules that are not exact, in some cases even wrong (the ORRESU discussion), open to interpretation, and not directly machine-executable. With the (financial) resources they have, they could do much better.