Saturday, January 13, 2018

Why changing "Submission Value" into "Preferred Term" is a bad idea

The CDISC-CT team recently published a new Controlled Terminology Package 33 for public review. At the same time, a proposal for changing the column header from "CDISC Submission Value" to "CDISC Preferred Term" was published:


In this blog, I will explain why this is a bad idea and why CDISC members should protest against it.

You can already find my own protest here:

First of all, we need to take into account that CDISC controlled terminology is based on tradition rather than on science. CDISC controlled terminology is a set of "lists", without any relations between the terms. CDISC members can ask to add terms based on their own, local usage of a term.
For example, last autumn I asked to add "centimeter mercury column" to the "UNIT" list, because in the country I originate from (Belgium) blood pressure is traditionally measured in "centimeter mercury column" rather than in "millimeter mercury column". So CDISC added it to the list. What is not visible from that list, however, is the relation between "centimeter mercury column" and "millimeter mercury column". As a human, I know that 1 cmHg = 10 mmHg. But how does my computer know that? Does CDISC-CT tell it how to convert "pounds per square inch" into "millimeter mercury column"? If CDISC allowed UCUM notation, such unit conversions could easily be automated. And how does my computer know that (for CDISC codes) "SEVERE" is worse than "MODERATE", which is worse than "MILD"? None of this is part of CDISC-CT.
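To make concrete what "automated" could mean here, below is a minimal sketch in Python. It is not a UCUM implementation: the table is a hand-picked subset keyed by UCUM codes, the conversion factors to pascal are rounded conventional values, and the names PASCAL_PER_UNIT, convert_pressure, SEVERITY_RANK and is_worse are made up for this illustration. The point is only that once units and severities carry machine-readable relations, the computer can answer the questions above.

```python
# Illustrative subset only: rounded conventional factors, keyed by UCUM code.
# A real system would rely on a complete UCUM implementation instead.
PASCAL_PER_UNIT = {
    "Pa": 1.0,
    "mm[Hg]": 133.322,   # millimeter mercury column
    "cm[Hg]": 1333.22,   # centimeter mercury column
    "[psi]": 6894.757,   # pounds per square inch
}

def convert_pressure(value, from_unit, to_unit):
    """Convert a pressure by going through pascal as the common base unit."""
    return value * PASCAL_PER_UNIT[from_unit] / PASCAL_PER_UNIT[to_unit]

# An explicit ordering is all that is needed to compare severities.
# The ranks are an assumption of this sketch, not part of CDISC-CT.
SEVERITY_RANK = {"MILD": 1, "MODERATE": 2, "SEVERE": 3}

def is_worse(a, b):
    """Return True when severity a ranks above severity b."""
    return SEVERITY_RANK[a] > SEVERITY_RANK[b]

if __name__ == "__main__":
    print(convert_pressure(12, "cm[Hg]", "mm[Hg]"))
    print(convert_pressure(1, "[psi]", "mm[Hg]"))
    print(is_worse("SEVERE", "MODERATE"))
```

With a flat list of unit names, none of these relations exist, so every such conversion or comparison has to be hard-coded by each implementer.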

Also, CDISC is publishing codelists for things it has no authority over. For example, it publishes "lists" of microorganisms (codelist MICROORG), whereas specialists in the field have developed taxonomies (for example NCBI), and SNOMED-CT also has a full taxonomy of microorganisms:


The NCBI and SNOMED-CT taxonomies of microorganisms are based on science. The CDISC "list" of microorganisms is based on allowing members to add terms according to how they traditionally name a microorganism locally. In the CDISC-CT list of microorganisms, you will not find any information on how these organisms are related to each other: it is just a list.
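The difference between a list and a taxonomy can be illustrated with a small Python sketch. The fragment below is hand-written for this post and heavily simplified: it uses plain names rather than real NCBI or SNOMED-CT identifiers, stops at family level, and the names PARENT, lineage and common_ancestor are hypothetical. It only shows the kind of question a taxonomy can answer and a flat list cannot.

```python
# Hand-written fragment for illustration only; real NCBI or SNOMED-CT
# content uses its own identifiers and has many more levels.
PARENT = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Staphylococcus aureus": "Staphylococcus",
    "Staphylococcus": "Staphylococcaceae",
}

def lineage(name):
    """Return the name followed by all of its ancestors, nearest first."""
    chain = [name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def common_ancestor(a, b):
    """Return the nearest taxon shared by both lineages, or None."""
    ancestors_of_b = set(lineage(b))
    for taxon in lineage(a):
        if taxon in ancestors_of_b:
            return taxon
    return None

if __name__ == "__main__":
    print(lineage("Escherichia coli"))
    print(common_ancestor("Escherichia coli", "Salmonella enterica"))
```

In a plain codelist, the three species above would simply be three unrelated strings.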

There are some cases where these "lists" based on tradition make sense, for example for "vital signs test code" (VSTESTCD/VSTEST), although this is also already covered by a scientific taxonomy developed by LOINC:

We indeed need to realize that LOINC is not yet used in every hospital, although it is mandated to be used in electronic health records in many countries and by the US "Meaningful Use" program, so such a VSTESTCD codelist can be used as a temporary solution, but it should not be forever.



So the proposal to change the column header from "CDISC Submission Value" to "CDISC Preferred Term" suggests that in the whole clinical research process (and thus not only in submissions to regulatory authorities) we should start using terms that are based "on tradition" and forget about all the science. It suggests that instead of writing "Glucose" in our protocols, we should start writing "GLUC". Or that instead of writing "measure the number of Metamyelocytes/100 leukocytes" (LOINC code 28541-1) in our protocols, we should put "BASOMM", as that is the CDISC "preferred term", then add a "method" from the "METHOD" CDISC codelists, and add further terms from other CDISC-CT lists to complete the description of "measure the number of Metamyelocytes/100 leukocytes, use LOINC code 28541-1".


Changing the designation "CDISC Submission Value" into "CDISC Preferred Term" would be a very dangerous evolution. It would isolate us further from other standards development organizations (SDOs) whose application areas overlap with ours. It would send these SDOs the message "We don't need you".
And it would mean that CDISC completely "says goodbye" to the use of concepts that are based on science.





A second major problem is that CDISC controlled terminology is tightly bound to the 30-year-old, obsolete SAS Transport 5 format (XPT format), with its 8-character and 40-character limitations. This format is only used within CDISC; no other industry worldwide uses it anymore. For example, CDISC "test codes" (--TESTCD) are limited to 8 characters, which must be ASCII characters, and may not start with a number. Test names (--TEST) are limited to 40 characters and must also be ASCII. This has led to some idiotic test codes and names, such as "Corpuscular HGB Conc Distribution Width" as "test name" for "test code" "CHDW" (NCI-ID C139068), where the word "Concentration" needed to be shortened to "Conc" because of the 40-character limitation. "CHDW" is also meaningless as a mnemonic, due to the 8-character limitation for --TESTCD.

So, if this proposal were accepted, we would be pinning everything we do in terminology, whether in submissions or in non-regulated research, to the outdated XPT format. This means that everything that is "CDISC preferred"
  • is limited to 8 characters when it is a code
  • is limited to 40 characters when it is a name or description
  • is not allowed to have any characters outside the ASCII range, so no "ñ", "ü", "á" (Spanish characters), no German characters like "ß" or "ü", no Norwegian characters like "å" or "æ", no Japanese, no Chinese, no Arabic, no Korean, no ...
  • may not start with a number
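
To show how restrictive these rules are in practice, here is a minimal Python sketch of a check for them. The function name xpt_violations and the "code"/"name" labels are invented for this illustration; the limits of 8 and 40 characters and the ASCII rule are the ones listed above. Applying the "may not start with a number" rule only to codes follows the --TESTCD description earlier and is an assumption of this sketch.

```python
def xpt_violations(value, kind):
    """List the XPT-derived rules that a term breaks.

    kind is "code" (8-character limit) or "name" (40-character limit).
    An empty list means the term passes all checks in this sketch.
    """
    max_len = {"code": 8, "name": 40}[kind]
    problems = []
    if len(value) > max_len:
        problems.append(f"longer than {max_len} characters")
    if not value.isascii():
        problems.append("contains non-ASCII characters")
    # Assumption: the leading-digit rule is applied to codes only.
    if kind == "code" and value[:1].isdigit():
        problems.append("starts with a digit")
    return problems

if __name__ == "__main__":
    for term, kind in [
        ("CHDW", "code"),
        ("Corpuscular HGB Conc Distribution Width", "name"),
        ("Corpuscular HGB Concentration Distribution Width", "name"),
        ("Größe", "code"),
    ]:
        print(term, kind, xpt_violations(term, kind))
```

The full, readable test name fails the 40-character check, which is why "Concentration" had to become "Conc".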

Do we really want this? Do we really want to tell people who do not submit to regulatory authorities, but do want to use CDISC standards, that they should keep away from LOINC, UCUM, SNOMED-CT and NCBI coding, and instead use CDISC terms that
  • are used nowhere else in the world
  • are based on tradition
  • are not based on science at all
Do we want to tell them that their codes should be no longer than 8 characters, and that non-ASCII characters are not allowed because they do not comply with "CDISC preferred"? Should we force them to implement the limitations of the XPT format in their systems? Most probably, they do not use SAS-XPT at all.
This CDISC-CT proposal indeed looks like "megalomania" to me.



It is already bad/sad/mad enough that for submissions, we are obliged to use controlled terminology that is not based on science, and now the CDISC-CT team wants to extend this to everything we do in clinical research. Have they really gone mad?

If you agree and/or feel the same way, please comment directly to CDISC on their JIRA "issue" site: https://jira.cdisc.org/projects/CT/issues/. You will need an account, but if you don't have one, you can create one using https://jira.cdisc.org/secure/Signup!default.jspa. Please take into account that this account is not the same as your "CDISC members" account.

Your comments here are of course always welcome!