Saturday, October 28, 2017

LOINC and the mapping to SDTM-LB

LOINC is coming! Last year, when the FDA announced that it would require the use of LOINC coding in SDTM, there was a lot of panic (and resistance) in CDISC and at some sponsors. Recently (October 2017), the FDA moved the date of absolute requirement to March 2020, but also stated that support for LOINC codes starts immediately.

The mapping of (local) lab test codes to the lab test controlled terminology in CDISC-SDTM can be challenging. In the words of Paul Vervuren: "One has to find candidates in the extensive controlled terminology list. Then there can be multiple lab tests that map to a single SDTM controlled term. This means additional variables must be used in order to produce a unique test definition (e.g. LBCAT, LBSPEC, LBMETHOD and/or LBELTM). Finally, it can occur that a controlled term is not available and a code needs to be defined in agreement with the rules for Lab tests".

In this blog entry we will explain such a mapping starting from a lab test defined by a LOINC test code. LOINC is a system (not just a list of terms) used worldwide for test codes in healthcare, not only for lab tests, but also for vital signs and many other tests. The use of LOINC coding is found (or even mandated) in:
  • HL7-v2 messages in hospital information systems (HIS)
  • Interoperable electronic health records (HL7-v3, CDA/CCD, FHIR)
  • messages that use the CDISC-Lab standard
Many laboratory instrument vendors already have the LOINC codes "baked into" their instruments and devices. The IVD Industry Connectivity Consortium (IVD = In Vitro Diagnostics) fully embraces the use of LOINC and helps its members implement it. The FDA is also a member of the consortium.

The SDTM lab test codes, however, are used nowhere in the world except in SDTM submissions. CDISC does not (yet) allow LOINC codes to be used in LBTESTCD, even though the LOINC code uniquely describes the lab test, which the CDISC controlled terminology for LBTESTCD does not do at all, not even in combination with LBCAT, LBSPEC, LBMETHOD, ...

So unfortunately, we need to map from a universal system (LOINC) to a local system (only used in submissions). How can this be done?

Let us take an example.

The LOINC code 73710-6 uniquely defines the test "Weed Allergen Mix 209 (Common ragweed+Western ragweed+Giant ragweed) IgE Ab [Units/volume] in Serum by Multidisk". It is usually measured in k[IU]/L (thousand international units per liter). "k[IU]/L" is UCUM notation, which is again a system for units, and not just a list like the CDISC [UNIT] controlled terminology.

Now, how do we map this test to SDTM?

First we must understand that the LOINC code is just a number for the "LOINC Name", which consists of 5-6 parts ("dimensions"). The "LOINC Name" of code 73710-6 is "(Ambrosia elatior+Ambrosia psilostachya+Ambrosia trifida) Ab.IgE:ACnc:Pt:Ser:Qn:Multidisk" with each part being separated by a colon (":"). So the parts are:
  • Component: (Ambrosia elatior+Ambrosia psilostachya+Ambrosia trifida) Ab.IgE
  • Property measured: ACnc ("arbitrary concentration")
  • Time aspect: Pt ("point in time")
  • System/Specimen: Ser (serum)
  • Scale: Qn (quantitative)
  • Method: Multidisk
The latter (method) is only added when absolutely necessary, e.g. when the results depend on the method used.
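Splitting the "LOINC Name" into its parts is easy to script. Here is a minimal Python sketch, assuming that the colon only ever appears as the separator between the parts. The class `LoincName` and the function `parse_loinc_name` are names I made up for this illustration; they are not part of any LOINC tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoincName:
    """Hypothetical container for the parts of a LOINC name."""
    component: str
    prop: str            # property measured, e.g. "ACnc"
    time_aspect: str     # e.g. "Pt"
    system: str          # system/specimen, e.g. "Ser"
    scale: str           # e.g. "Qn"
    method: Optional[str] = None  # only present when the method matters


def parse_loinc_name(name: str) -> LoincName:
    """Split a colon-separated LOINC name into its 5 or 6 parts."""
    parts = [p.strip() for p in name.split(":")]
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 parts, got {len(parts)}: {name!r}")
    return LoincName(*parts)


parsed = parse_loinc_name(
    "(Ambrosia elatior+Ambrosia psilostachya+Ambrosia trifida) "
    "Ab.IgE:ACnc:Pt:Ser:Qn:Multidisk"
)
print(parsed.system, parsed.method)
```

When the name has only 5 parts, `method` simply stays `None`.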
All this information can easily be retrieved using a local copy of the LOINC database, one of the several RESTful webservices for machine-to-machine communication, a simple Google search, or the LOINC search website. For example:


For the mapping, let us first start with LBTESTCD (lab test code). Can we find something in the latest (2017-09-29) CDISC codelist LBTESTCD (NCI code C65047)?
When searching for "Ambrosia", we can find 4 terms for "Ambrosia psilostachya pollen antigen XXX antibody" where "XXX" is either "IgA", "IgE", "IgG" or "IgG4" (C130092 to C130092 ). That's it. In any case, considerably fewer than the number of hits (49) when searching for "Ambrosia" in LOINC. None of them fits, as our test is about a mixture of pollen. In the SDTM LBTESTCD codelist, if we look for "mix 209" or even for "209", we find ... nothing.
There are now 2 possibilities:
  1. extend the LBTESTCD codelist with our own invented term (8 characters maximum, not starting with a number).
  2. request a new term from the CDISC-CT development team
Option 1 is fine but means that this term will not be standardized (except maybe within our company). So for the regulatory authorities, it cannot be used for comparison with other studies.
Option 2 means that we will need to wait 6 months or more for the new term to be approved (with the risk that our request is turned down). Do we want to delay our submission for 6 months?

As there is no "hit" for LBTESTCD, there cannot be one for "LBTEST" (laboratory test name) either. So this codelist also needs to be extended. The most logical choice seems to be to use the "LOINC long (common) name", which is "Weed Allergen Mix 209 (Common ragweed+Western ragweed+Giant ragweed) IgE Ab [Units/volume] in Serum by Multidisk". However, we can't, as it is more than 40 characters long and LBTEST is limited to 40 characters, a relic of the SAS Transport 5 limitations. So we need to shorten it, maybe to "Weed Allergen Mix 209 IgE Ab [Units/volume] in Serum by Multidisk", which is still more than 40 characters. Limiting it to the absolutely necessary, we can use "Weed Allergen Mix 209 IgE Ab". Note that the wording "in Serum" must be removed, as it belongs to LBSPEC (specimen) and the test name must be the same independent of the specimen type (at least in the CDISC-CT philosophy).
Finally, less than 40 characters!
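Counting characters by hand is error-prone, so this is a check I would script. A minimal Python sketch that tests the three candidate names from above against the 40-character limit (`fits_lbtest` is just a helper name chosen for this example):

```python
LBTEST_MAX_LENGTH = 40  # the LBTEST limit mentioned above

candidates = [
    "Weed Allergen Mix 209 (Common ragweed+Western ragweed+Giant ragweed) "
    "IgE Ab [Units/volume] in Serum by Multidisk",
    "Weed Allergen Mix 209 IgE Ab [Units/volume] in Serum by Multidisk",
    "Weed Allergen Mix 209 IgE Ab",
]


def fits_lbtest(name: str) -> bool:
    """Return True when the name respects the LBTEST length limit."""
    return len(name) <= LBTEST_MAX_LENGTH


for name in candidates:
    verdict = "OK" if fits_lbtest(name) else "too long"
    print(f"{len(name):3d}  {verdict:8s}  {name}")
```

Only the last candidate passes.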

The next SDTM variable describing the test that needs to be populated is LBCAT. It is an "expected" SDTM variable, but there is no CDISC controlled terminology. So we need to define something ourselves. We can e.g. choose "ALLERGY ANTIGEN ANTIBODY". Essentially, it is almost useless for the regulatory authorities, as each sponsor will use different naming for the categories. If we look into the LOINC database, we can also use the value of "Class", which delivers "ALLERGY".

The next one is "LBSPEC" (specimen). It is "permissible", but we usually need it to at least try to uniquely describe the test. There is CDISC controlled terminology for this (codelist SPECTYPE, NCI code C78734) where we find the code "SERUM" (NCI code C119550).
This is very nice, but we also need to take into account that we will need to program this in our mapping scripts, as our computer does not know that "Ser" and "Serum" are the same thing.
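In a mapping script, this typically ends up as a lookup table. The following Python sketch shows the idea. Only the pair "Ser" / "SERUM" comes from the example above; the other entries are my own assumptions for illustration and would need to be checked against the SPECTYPE codelist version you are using.

```python
# Illustrative, partial lookup: LOINC "System" abbreviation -> SDTM LBSPEC term.
# Only "Ser" -> "SERUM" is taken from the text; the other entries are
# assumptions that must be verified against the SPECTYPE codelist in use.
LOINC_SYSTEM_TO_LBSPEC = {
    "Ser": "SERUM",
    "Plas": "PLASMA",
    "Urine": "URINE",
}


def map_specimen(loinc_system: str) -> str:
    """Translate a LOINC system abbreviation, failing loudly when unmapped."""
    try:
        return LOINC_SYSTEM_TO_LBSPEC[loinc_system]
    except KeyError:
        raise KeyError(
            f"no LBSPEC mapping defined for LOINC system {loinc_system!r}"
        ) from None


print(map_specimen("Ser"))
```

Failing loudly on an unknown abbreviation is deliberate: a silent fallback would hide exactly the cases where the codelist needs to be extended.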

The next SDTM variable is "LBMETHOD". Also here, it is subject to CDISC controlled terminology (codelist METHOD, NCI code C85492). This codelist does not only contain lab methods, but any type of methods for different SDTM domains. It does however not contain the term "MULTIDISK" which is clearly the one we need. So we again need to either extend the codelist or do a "new term request" and wait 6 months at least.

We still need to populate "LBORRESU" (Original Result Units). Unfortunately, it is under controlled terminology (i.m.o. a major SDTM design error) by the codelist "UNIT" (NCI code C71620). In this codelist, we don't find our unit "k[IU]/L" as it is UCUM notation and CDISC-CT still refuses to work with UCUM notation. So let us try "kIU/L" which would be the equivalent CDISC notation. No success either. We do however find "kIU/L" as a synonym for "IU/mL", i.e. 1 kIU/L = 1 IU/mL. Fortunately, searching synonym-test code pairs can be automated through a RESTful web service.
So we need to populate LBORRESU with "IU/mL", as the SDTM-IG states that "When sponsors have units that are not in this column, they should first check to see if their unit is a synonym of an existing unit and submit their lab values using that unit" (section 6.3 - Assumptions). Unfortunately, this also means that we lose traceability to the original unit, which is "k[IU]/L".
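The equivalence itself is simple arithmetic: 1 kIU/L = 1000 IU per 1000 mL = 1 IU/mL. A small Python sketch of how such a unit substitution could look in a mapping script follows. The table `UCUM_TO_CDISC` and the function `to_cdisc_unit` are hypothetical names for this example, and the table contains only the one pair discussed here.

```python
KILO = 1000.0        # the "k" prefix in k[IU]/L
ML_PER_L = 1000.0    # millilitres in one litre

# UCUM unit -> (CDISC unit, conversion factor); only the pair from the text
UCUM_TO_CDISC = {
    "k[IU]/L": ("IU/mL", KILO / ML_PER_L),  # factor works out to 1.0
}


def to_cdisc_unit(value: float, ucum_unit: str) -> tuple:
    """Return the value and unit expressed in the CDISC notation."""
    cdisc_unit, factor = UCUM_TO_CDISC[ucum_unit]
    return value * factor, cdisc_unit


print(to_cdisc_unit(0.35, "k[IU]/L"))
```

As the factor is 1, the numeric result does not change; only the unit label does, which is exactly where the original UCUM unit gets lost.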

So the mapping between our test with LOINC code 73710-6 and SDTM is not only tedious, but it also leads to a non-unique description of our test, and we also lose traceability. We can of course also populate "LBLOINC" with our LOINC code, but this does not relieve us of populating LBTESTCD, LBTEST, etc.

But is this all necessary? Let us do a test, and create an SDTM-LB dataset. Being a rebel, I put "L73710_6" as the value of LBTESTCD (adding an "L" in front and replacing the dash with an underscore due to the SAS-XPT rules) and add that to the (extended) codelist in my define.xml. For the other things, I do as described above.
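The rule for deriving such an LBTESTCD value from a LOINC code (prefix an "L", replace the dash by an underscore, stay within 8 characters) can be sketched in Python as follows. The function name `loinc_to_lbtestcd` is my own, and the pattern check is a simplification that only looks at the "digits-dash-check digit" shape of the code.

```python
import re


def loinc_to_lbtestcd(loinc_code: str) -> str:
    """Derive an LBTESTCD-style value such as 'L73710_6' from '73710-6'."""
    # Simplified shape check: digits, a dash, one check digit
    if not re.fullmatch(r"\d+-\d", loinc_code):
        raise ValueError(f"not a LOINC-like code: {loinc_code!r}")
    testcd = "L" + loinc_code.replace("-", "_")
    # LBTESTCD: 8 characters maximum, not starting with a number
    if len(testcd) > 8:
        raise ValueError(f"{testcd!r} exceeds 8 characters")
    return testcd


print(loinc_to_lbtestcd("73710-6"))  # prints L73710_6
```

Note that this only works as long as the LOINC code itself has at most 7 characters including the dash; longer codes would exceed the 8-character limit and are rejected here.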


Let us now inspect the record in the "Smart Dataset-XML Viewer", which is an open-source software for inspecting, visualizing and validating SDTM, SEND and ADaM datasets. Here is a snapshot:



And when keeping the mouse over the LBLOINC column:



showing us much more information, provided by a free RESTful web service.

When right-clicking the LBLOINC cell, a RESTful web service (delivered by the US National Library of Medicine) is triggered, popping up a window in our favorite browser, delivering even more information:


One could also submit the LOINC code to the UMLS RESTful web services to find relationships of this test with other tests, diagnoses for diseases (e.g. ICD-10) and much much more, thus building "networks of information and knowledge".
Can you do this with CDISC-CT for LBTESTCD? No way!
We are currently working on an application to generate and display such "networks of information and knowledge" and will later add it to the "Smart Dataset-XML Viewer".

Essentially, when looking at this, it means that when the LOINC code is provided, LBTESTCD, LBTEST, LBSPEC and a number of other SDTM-LB variables are completely unnecessary, as the LOINC code already contains this information, and in a better structured and more consistent way.
However, the SDTM-IG still forces us to perform the tedious mappings to these (in this case unnecessary) variables. What a waste of time!

So it is really time to rethink the SDTM-LB domain for the case that the LOINC code is available (which will be the case for 95% of the data within the next few years). A first proposal has already been published, which can serve as a discussion start point for a better (or new) SDTM-LB domain.

Conclusions

Mapping from a LOINC code to CDISC controlled terminology can be very challenging. This applies not only to mapping starting from LOINC coding, but also to local lab codes. These problems are not due to the LOINC code system itself, but due to the SDTM controlled terminology being unable to uniquely describe lab tests, and the "reinvention of the wheel" of lab test codes by CDISC.
"In 5 years from now, everything will be e-Source in clinical research" is a statement I often hear. It also means that all our lab tests will be transmitted using LOINC coding. Instead of trying to map these to CDISC-SDTM, which is very tedious, we should rather rethink SDTM, and especially the LB domain and the controlled terminology for it. A first proposal for an "LB domain for use with electronic health record data" was already published. It can be used as a starting point for a discussion about a better SDTM domain and considerably better SDTM controlled terminology.

P.S. Special thanks to Thierry Lambert for pointing me to a few errors that have now been corrected.

Sunday, October 22, 2017

Trip to Japan: the Japanese CDISC Experience

Somewhat more than a week ago (11-14 October), I was in Japan at the invitation of UMIN (University hospital Medical Information Network), in my capacity as professor of Medical Informatics at the University of Applied Sciences FH Joanneum. In fact, I already started traveling on Monday the 9th, from Graz to Vienna by train, as I needed an early flight to London Heathrow, so I stayed overnight in Vienna. From Heathrow, I then had a flight to Tokyo Narita. Unlike the week before, when I traveled to Oxford to give an ODM course at IDDO on behalf of CDISC Education, the journey went very well; I could even sleep for about 6 hours (the flight takes 11 hours). Arriving at Narita airport at noon, I was still happy that Yoshiteru Chiba, the initiator of this visit and driving force of CDISC in Japan, picked me up.

We then traveled to Tokyo city, where I got a nice room in the University Hospital's Guest House, located on the Tokyo University campus, a very nice oasis in the middle of a busy city.

The afternoon was spent on a visit to UMIN itself, where I was very well received by Prof. Kiuchi, director of UMIN, and taken care of by his coworkers Ms. Karube and Ms. Watari. This was really necessary, as my voice was in not-too-good shape: I had caught a cold on my trip to Oxford the week before. So they provided me with lots of hot drinks and sweets that are good for the throat. Dr. Masafumi Okada was also there, who has done such marvelous things with the CDISC ODM standard and surely is the number 1 expert on ODM in Japan.

On our way from the guest house to the UMIN offices, I also observed Japanese students playing American Football. This was a bit surprising for me - I presume this sport was introduced by American soldiers.

After dinner in a nice local restaurant (popular with students),

Yoshiteru Chiba, driving force of CDISC in Japan

With my colleague Dr. Masafumi Okada

Thursday morning was devoted to a meeting with Japanese specialists in the field of clinical research, with representatives of several universities and of AMED, the Japanese Agency for Medical Research and Development, a rather young organization that is coordinating medical and clinical research in Japan. This is a very good thing, as one needs to realize that due to demographic evolution and a low birth rate, 40% of the population will be over 65 years old in 2050!
During this meeting, I gave a short presentation titled "Integration of Electronic Health Records in Clinical Research", depicting the work done at CDISC, EHR4CR in Europe and my personal perspective on the use of electronic health records in clinical research. This meeting gave me great insight into the Japanese situation, and provided me with many useful contacts. I was especially impressed by the presentation and work of Prof. Matsumura, who has set up a large system for the Osaka region. You can find one of his publications here. The other presentations were also very interesting, but I can't describe them all here.

In the afternoon, we had the "CDISC Symposium", attended by over 150 participants, in one of the larger lecture halls at the university. You can find the program here (in Japanese). If you would like to obtain a copy in English, just drop me a mail. This symposium (mostly in Japanese) covered several aspects of CDISC in Japan, and also included a presentation from a PMDA representative.

Prof. Kiuchi (Director of UMIN) giving the introductory presentation

Dr. Masafumi Okada and Yoshiteru Chiba also gave presentations.


Dr. Okada's presentation was titled "Metadata Mapping and eSource". You can find the English version of his slides here.
It was then my turn to present (but not in Japanese 😉), and I received a very kind introduction from Dr. Okada:


I gave a presentation titled "The Use of Electronic Health Records in Clinical Research - The Value of CDISC Standards".



The slides will be posted on the UMIN website very soon. If you would like to receive a copy sooner, please again let me know. The slides of my colleagues' presentations will also be available soon.
Later that evening we then had a "Japanese hot plate cooking" (teppanyaki) at a local restaurant with all the symposium speakers. It was great fun (and tasted very good), but I was glad we had several specialists in the group explaining how to handle the dishes.


The next morning, we traveled to Nagoya by Shinkansen train to attend a CJUG (CDISC Japanese User Group) meeting. This high-speed train (we rode a type N700A, maximum speed 285 km/h) took only somewhat more than 1.5 hours for the roughly 356 km journey. Unbelievable! And this is not even the fastest model! There is also the E5 model, which reaches 320 km/h! The train connection between the 2 largest cities in Austria (Graz and Vienna, distance 200 km) takes 2.5 hours. Yes, our own train system is really underdeveloped.

The meeting was held in the "board hall" of the Nagoya Medical Center hospital and was attended by about 50 CJUG members (so I also exchanged that number of business cards 😊). Here is an impression from the meeting room:




The meeting started with a web conference presentation by Lauren Becnel of CDISC about the SHARE API 2.0. This brought us a number of new insights, bringing some hope that we can finally come to a modern set of CDISC standards that can be used through RESTful web services.
My own presentation was titled "CDISC and Artificial Intelligence", explaining what needs to change and what needs to be done to be able to use artificial intelligence in working with CDISC standards. One of the topics was machine-readable (SDTM and other) standards documents, especially Implementation Guides, as well as considerably better controlled terminology (now essentially just lists without a system or relationships between terms).
I will probably publish the slides in the near future.

After my own presentation, my voice was finally ruined, so I was unfortunately not able to discuss much with my Japanese colleagues, many of whom I knew from the CJUG mailing lists. But it was great that I could finally meet all these great people in person.
As part of the meeting, we also had a visit to the local clinical research center, bringing even more insight into how clinical research works in Japan. We saw a list of all the research projects that were executed, some with very modern EDC systems, but also found projects with MS Excel as the "EDC system" even for larger clinical trials. With respect to that, Japan is not different from the rest of the world...

After the meeting, some of the participants went to the city center by bus. I learned that in Nagoya, you pay when you get off the bus, using the exact amount in coins. Most people however pay electronically, using FeliCa (RFID) technology. Close to the train station (and our hotel for that night), we had a last very nice dinner, where I was almost the only person not having a beer (due to my throat problems):





Yoshiteru Chiba (on the right in the picture) made sure that we didn't stay out late, as we would need to leave at 06:20 the next morning. We then traveled to Nagoya airport, from where we flew back to Tokyo Narita airport. Unfortunately the weather was pretty bad, so we did not get a view of Mount Fuji. At Tokyo airport, I then needed to say "goodbye" to Yoshiteru, who took such good care of me all these days. A big "thank you" to him, his colleagues at UMIN, and all the wonderful people I met in Japan.
My journey had not come to an end yet. Counting from Nagoya, it took almost 24 hours to reach Vienna (again via London Heathrow). In the following days, my voice recovered only slowly - it took several days before I could give presentations again.

This trip has been an unforgettable experience for me. Although I was in Japan for only a few days, I learned to appreciate the Japanese people and culture, and especially experienced the very high level at which CDISC standards are implemented, despite the language barrier. This language barrier was also not so easy for me to overcome, and there is a large number of discussions I will need to follow up by e-mail. I will however do this with great pleasure.

Acknowledgments: Thank you to Dr. Masafumi Okada for providing the photographs. These are and remain under Dr. Okada's copyright.