Friday, July 7, 2017

CDISC-CT offers more than you think!

But even CDISC doesn't know that ...

Now that classes, exams and thesis corrections are all done, I finally found the time today to work on one of my favorite topics: connecting CDISC Controlled Terminology (CDISC-CT) with other (and better) terminologies.

As I already stated in prior posts, CDISC-CT just consists of lists, with almost no relations between the terms in the lists. The only (indirect) relations that are described are those between --TESTCD (test code) and --TEST (test name) through the NCI code.
For example, in CDISC-CT the only relationship between "diastolic blood pressure" (DIABP, NCI code C25299) and "systolic blood pressure" (SYSBP, NCI code C25298) is that they are both vital signs tests (NCI code C66741). But so are "height" (HEIGHT, C25347) and "body frame size" (FRMSIZE, C49680). So, as far as CDISC-CT is concerned, the relationship between SYSBP and DIABP is exactly the same as the one between SYSBP and HEIGHT. However, we all know that systolic and diastolic blood pressure are highly related, and that there is no relation at all between systolic blood pressure and height.
All this knowledge is NOT in CDISC-CT, as it only contains ... lists.

However, other people are smarter and have developed UMLS, the "Unified Medical Language System". UMLS tries to connect all terminologies in medicine and healthcare, and very fortunately this includes CDISC-CT (through NCI controlled terminology).
Fairly recently, the National Library of Medicine (NLM) made a RESTful web service available for working with UMLS. It allows one to submit a term or code in one system (e.g. CDISC-CT, LOINC, SNOMED-CT) and then ask for all terms related to it (parents, children, mappings to other terminology systems, and much more). One can then use the result list in different ways, e.g. pick a related term, submit it and ask for its related terms, get the list, pick ...
In this way, one can perform "chaining" and build networks of related terms with different kinds of relationships, not only within a specific coding system, but also between coding systems.
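The chaining idea can be sketched in a few lines of Python. This is only an illustration, not the software described in this post: `chain`, `toy_lookup` and `TOY_RELATIONS` are made-up names, and the tiny in-memory table stands in for the real web service. Only the ALB/ALBGLYCA pair is taken from this post; the "PLACEHOLDER-CODE" entry is invented to show a mapping step.

```python
from collections import deque


def chain(start, lookup, max_depth=2):
    """Breadth-first chaining: follow related terms up to max_depth steps.

    `lookup(term)` must return a list of (relation, related_term) pairs.
    Returns a list of (source, relation, target) edges in discovery order.
    """
    edges = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand terms at the depth limit
        for relation, related in lookup(term):
            edges.append((term, relation, related))
            if related not in seen:
                seen.add(related)
                queue.append((related, depth + 1))
    return edges


# Toy stand-in for the web service (hypothetical data except ALB -> ALBGLYCA).
TOY_RELATIONS = {
    "ALB": [("child", "ALBGLYCA")],
    "ALBGLYCA": [("parent", "ALB"), ("mapping", "PLACEHOLDER-CODE")],
}


def toy_lookup(term):
    return TOY_RELATIONS.get(term, [])


edges = chain("ALB", toy_lookup)
for source, relation, target in edges:
    print(f"{source} --{relation}--> {target}")
```

In a real setting, `lookup` would wrap a REST call, and the `seen` set keeps the traversal from looping when a child points back to its parent.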

The RESTful API is not so easy to work with, as it requires registration and works with "ticket granting tickets", which allow one to retrieve a "ticket" that is only valid for a single REST request. Also, the response comes as JSON, which is not my strong suit yet, so I transform it to XML, which is then parsed to retrieve the information.
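To make the ticket mechanism a bit more concrete, here is a minimal Python sketch of the two-step flow. It is not the software described in this post. The endpoint URL, the service name, the `apikey` field and the assumption that the ticket-granting ticket comes back as the `action` attribute of an HTML form are all written from memory of the UTS documentation and should be checked against the current docs. `TicketClient` and `http_post` are made-up names. The sketch also assumes that the ticket-granting ticket may be reused for several requests, so it is fetched only once.

```python
import re
import urllib.parse
import urllib.request

# Assumed endpoint and service name; verify against the UTS documentation.
TGT_URL = "https://utslogin.nlm.nih.gov/cas/v1/api-key"
SERVICE = "http://umlsks.nlm.nih.gov"


def http_post(url, fields):
    """POST form-encoded fields and return the response body as text."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(url, data=data)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


class TicketClient:
    """Fetches the ticket-granting ticket once, then one ticket per request."""

    def __init__(self, api_key, post=http_post):
        self.api_key = api_key
        self.post = post      # injectable, so the flow can be tried offline
        self.tgt_url = None   # cached location of the ticket-granting ticket

    def _get_tgt_url(self):
        if self.tgt_url is None:
            body = self.post(TGT_URL, {"apikey": self.api_key})
            # Assumption: the TGT location is the form's action attribute.
            match = re.search(r'action="([^"]+)"', body)
            if match is None:
                raise RuntimeError("no ticket-granting ticket in response")
            self.tgt_url = match.group(1)
        return self.tgt_url

    def new_ticket(self):
        """Return a fresh single-use ticket for the next REST request."""
        return self.post(self._get_tgt_url(), {"service": SERVICE}).strip()
```

With a real key, `TicketClient(key).new_ticket()` would be called before each REST request and the returned ticket sent along with that request. Passing a fake `post` function makes it possible to try the flow without network access.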

My first "chaining" experiments were pretty successful. I developed some simple software that allows one to submit a CDISC-CT, LOINC or SNOMED-CT term (others to come), and then produces, from the response, a list of mapped terms (in other systems), parent terms (in the same or another system) and child terms. The user can then select one of the related items and submit it for further chaining. At the moment, the software is still very simple, and choices must still be entered through the console.
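The step from a JSON response to separate lists of parents and children can also be done without the XML detour. The following Python sketch groups related terms by their relation label. The sample response is invented, and the field names `result`, `relationLabel` and `relatedIdName` are assumptions about the JSON layout, not something verified here. `group_relations` is a made-up name.

```python
import json
from collections import defaultdict

# Hypothetical response body; field names are assumptions about the layout.
SAMPLE_RESPONSE = """
{"result": [
  {"relationLabel": "CHD", "relatedIdName": "Example child term"},
  {"relationLabel": "PAR", "relatedIdName": "Example parent term"},
  {"relationLabel": "CHD", "relatedIdName": "Another child term"}
]}
"""


def group_relations(response_text):
    """Group related term names by relation label, keeping response order."""
    grouped = defaultdict(list)
    for item in json.loads(response_text).get("result", []):
        grouped[item["relationLabel"]].append(item["relatedIdName"])
    return dict(grouped)


grouped = group_relations(SAMPLE_RESPONSE)
for label, names in grouped.items():
    print(label, names)
```

The labels "PAR" and "CHD" are the UMLS abbreviations for parent and child relations. Which side of the relation a label refers to should be checked against real responses before relying on it.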

[SCREENSHOT TO COME HERE]

As I already stated, if you submit a CDISC-CT term and ask for the parent and child terms, you would not expect to get far, as such relations are hardly present in CDISC-CT itself. The nice thing however is that the smart people at NIH and NLM added them as well as possible. So you will e.g. find that CDISC "ALBGLYCA" (Glycated Albumin test) is a child of "ALB" (Albumin test), although that is not described in CDISC-CT at all.

I first did something very simple: I submitted CDISC-CT "ALB" (C64431) and asked for parent and child elements. Remember that CDISC-CT as published by CDISC does not provide any such information. Here is a selection of the results (only the most interesting terms are displayed):


OK, extremely simple, but already much more than is in CDISC-CT itself!

I did something very similar for CDISC-CT "DIABP" (C25299). Here are the results in a simple tabular way (instead of a picture). Again, this is a selection only:


Note that the term NCI C54706 is not even in CDISC SDTM-CT!

I also tried out "chaining". I again started from DIABP and first looked for the child terms, picked one of them, looked for its child elements, ... Here is a partial result:

And similarly, but then looking for ancestors:


I already hear a lot of my colleagues scream "why don't you use semantic web and RDF?"
They are completely right! But I am still at the beginning: exploring the possibilities of the RESTful web services, thinking about filters (no, I do not want all the MeSH translations in my results), thinking about how to use this in a way that makes sense, and optimizing my code (for each RESTful request, one needs to retrieve a new "ticket", which makes it pretty slow).
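A filter for the MeSH translations could look roughly like the Python sketch below. It assumes that each result item carries a `rootSource` field with the UMLS source abbreviation, and that the translated MeSH editions use abbreviations that start with "MSH" followed by a language suffix (such as "MSHGER"), while plain "MSH" is the English original. Both points are assumptions to verify, the sample items are invented, and `drop_mesh_translations` is a made-up name.

```python
def drop_mesh_translations(items):
    """Keep every item except those from translated MeSH editions.

    Assumption: translated editions have a source abbreviation that starts
    with "MSH" but is longer than "MSH" itself (e.g. "MSHGER").
    """
    kept = []
    for item in items:
        source = item.get("rootSource", "")
        is_translation = source.startswith("MSH") and source != "MSH"
        if not is_translation:
            kept.append(item)
    return kept


# Hypothetical result items, for illustration only.
sample_items = [
    {"rootSource": "NCI", "name": "Example NCI term"},
    {"rootSource": "MSH", "name": "Example MeSH term"},
    {"rootSource": "MSHGER", "name": "Beispielbegriff"},
    {"rootSource": "LNC", "name": "Example LOINC term"},
]

filtered = drop_mesh_translations(sample_items)
print([item["rootSource"] for item in filtered])
```

A user-selectable filter would simply replace the hard-coded rule with a set of source abbreviations chosen by the user.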

I already have a master's project in mind for a good student: building a graphical interface around this, so that the user can just click on a node, and all parents, children or mappings to other code systems (or all of them) are retrieved through the RESTful web service and then displayed, with the possibility of user-selected filters, ...




All I wanted to show today is that when using UMLS, there is more in CDISC-CT than one thinks, but even CDISC does not know that ...

