Monday, September 29, 2014

The meaning of "Unit"

Last week, I worked on a mapping between the CDISC-CT [UNIT] codelist and UCUM (the Unified Code for Units of Measure). For every "unit" published by CDISC (there are about 550 of them), I tried to find an appropriate UCUM notation. I did the mapping by extending the Excel worksheet provided by CDISC, then used it to generate a relational database and a RESTful web service.
So, when you make the HTTP request "http://www.xml4pharmaserver.com:8080/CDISCCTService/rest/getUCUMFromCDISCUnit/BEATS/MIN", the corresponding UCUM notation "{beats}/min" is returned. Similarly, if you submit "http://www.xml4pharmaserver.com:8080/CDISCCTService/rest/getUCUMFromCDISCUnit/mmHg", then "mm[Hg]" is returned.
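
For readers who want to call the service from a script, here is a minimal Python sketch. The base URL is the one quoted above; the function names are my own for illustration, and I am assuming that the response body is plain UTF-8 text containing only the UCUM notation, which the real service may or may not do.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as quoted in the text above.
BASE_URL = (
    "http://www.xml4pharmaserver.com:8080/CDISCCTService/rest/"
    "getUCUMFromCDISCUnit/"
)


def build_request_url(cdisc_unit: str) -> str:
    """Build the request URL for a CDISC unit (hypothetical helper).

    The slash is kept unescaped because the example URL above
    passes "BEATS/MIN" with a literal slash in the path.
    """
    return BASE_URL + quote(cdisc_unit, safe="/")


def get_ucum_from_cdisc_unit(cdisc_unit: str, timeout: float = 10.0) -> str:
    """Ask the web service for the UCUM notation of a CDISC unit.

    Assumption: the response body is plain UTF-8 text.
    """
    with urlopen(build_request_url(cdisc_unit), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling get_ucum_from_cdisc_unit("mmHg") should then give the "mm[Hg]" mentioned above, provided the service is reachable and answers in the assumed format.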

I then implemented this web service in the Smart Dataset-XML viewer: when the user right-clicks a cell containing a unit (e.g. an --ORRESU or --STRESU value), the web service is called and the UCUM notation is shown, provided the cell value is a valid unit from the [UNIT] list. A few screenshots are shown below:







In some cases, the CDISC notation follows the UCUM notation, but this is certainly not always the case. For non-SI units in particular, the deviations are considerable.

What difficulties did I encounter during the mapping exercise?
Quite a few ...
Some CDISC "units" are not units at all. For example "Virtual Pixel" (NCI C71620).
Other "units" mix up the unit itself with the object it is about. For example "g/mol Creatinine". UCUM has recognized that this bad habit exists and has solved it with so-called "annotations" (see the UCUM specification). So the UCUM notation for this is "g/mol{creatinine}".
In my opinion, CDISC should control annotations for use in clinical research, not the units themselves.
One difficulty that cost me quite some time is the "unit" "U/kg". The CDISC definition is: "An arbitrary unit of substance content expressed in units of biological activity per unit of mass equal to one kilogram. Unit per kilogram is also used as a dose calculation unit expressed in arbitrary units per one kilogram of body mass". This sounds like a dual definition, i.e. "U" is used for two different things. When it is a unit of biological or catalytic activity, the UCUM unit "U" can be used, which is equal to 1 umol/min.


So when a biological activity is meant, the corresponding UCUM notation for "U/kg" is simply "U/kg", which is equal to 1 umol/min/kg.

When "arbitrary units per one kilogram of body mass" is meant (the second part of the CDISC definition), the unit is arbitrary and depends on what is measured. In such a case, an annotation must be used. So, in the second case, the UCUM notation must be "{Unit}/kg".
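
To make the difference between the two readings concrete: in UCUM, an annotation in curly braces carries no unit semantics, so a parser can drop it and what remains is the real unit. The following is a minimal Python sketch of that idea. The function name is hypothetical, and the regular expression is a simplification of my own, not a full UCUM parser.

```python
import re

# Matches one UCUM annotation, i.e. text in curly braces.
# Simplification: real UCUM parsing involves far more than this.
ANNOTATION = re.compile(r"\{[^{}]*\}")


def strip_annotations(ucum: str) -> str:
    """Remove annotations from a UCUM expression (hypothetical helper).

    A bare annotation stands for the unity, so an expression that
    consists only of annotations is returned as "1".
    """
    stripped = ANNOTATION.sub("", ucum)
    return stripped or "1"


if __name__ == "__main__":
    for expression in ("U/kg", "{Unit}/kg", "g/mol{creatinine}", "{beats}/min"):
        print(expression, "->", strip_annotations(expression))
```

With the annotation removed, "{Unit}/kg" reduces to "/kg", whereas "U/kg" stays "U/kg": the first says nothing about what is counted per kilogram, while the second is a true catalytic activity per kilogram.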

Is it OK that a "CDISC unit" means two completely different things depending on the use case? I don't think so. Is an "arbitrary unit" a unit at all? Isn't the wording "arbitrary units" a "contradictio in terminis"?

Do you also think CDISC should stop developing controlled terminology for "units" and use UCUM?

Your reactions are, as always, highly appreciated.
 




