What is a Simple Refset in SNOMED CT?

Overview

The term “refset” is a portmanteau of the words “reference” and “set”. Refsets are, in essence, subsets of the full SNOMED CT dataset.

The International Release has more than 350,000 active concepts, described by more than 750,000 terms, covering healthcare topics from abscess to zygote. Not every user needs all of this content; most need only the parts relevant to their situation. A practical way of limiting the content to what is actually needed is therefore necessary.

Fortunately, SNOMED International has made this possible by defining and publishing a standardized mechanism for it, termed reference sets (refsets). The Reference Set mechanism provides a standard way to refer to a subset of SNOMED CT components. It can also be used to attach customized information to a component.

Because they are standards-compliant, refsets can be shared externally without difficulty. They address a variety of use cases: language refsets (part of every International Release) for language-specific terms; map refsets for mapping to other code systems such as ICD; ordered refsets, which specify an ordering of their members for more advanced requirements; and simple refsets, which handle situations such as excluding or including content, catering to national, regional, or specialty variations, or supporting data entry protocols by limiting the number of concepts or terms available for data entry or analysis.

To create a new refset, an organization needs a namespace identifier from either its National Release Centre or SNOMED International. The namespace identifies the source of the refset and maintains provenance across releases. The organization must also hold an Affiliate License to use the SNOMED CT data files legally.
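The namespace is embedded in every identifier (SCTID) an organization mints, which is how provenance survives from release to release. The Python sketch below assumes the SCTID layout from the SNOMED CT technical documentation: an item identifier, an optional 7-digit namespace, a 2-digit partition identifier, and a Verhoeff check digit. The function names `verhoeff_valid` and `describe_sctid` are hypothetical, and the namespace `1234567` in the usage note is made up.

```python
# Verhoeff tables used for the SCTID check digit (dihedral group D5).
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]

# Second digit of the partition identifier -> component type.
_COMPONENT_TYPES = {"0": "concept", "1": "description", "2": "relationship"}


def verhoeff_valid(sctid: str) -> bool:
    """Return True if the last digit is a correct Verhoeff check digit."""
    c = 0
    for i, ch in enumerate(reversed(sctid)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


def describe_sctid(sctid: str) -> dict:
    """Split an SCTID into check-digit validity, component type and namespace."""
    if not (sctid.isascii() and sctid.isdigit()) or not 6 <= len(sctid) <= 18:
        raise ValueError(f"not an SCTID: {sctid!r}")
    partition = sctid[-3:-1]  # the two digits just before the check digit
    long_format = partition[0] == "1"  # '1x' partitions carry a namespace
    return {
        "valid_check_digit": verhoeff_valid(sctid),
        "component_type": _COMPONENT_TYPES.get(partition[1], "other"),
        # In the long format the 7 digits before the partition are the namespace.
        "namespace": sctid[-10:-3] if long_format else None,
    }
```

For example, `describe_sctid("723264001")` reports a valid concept identifier with no namespace (it is International Release content), while a hypothetical extension identifier such as `"11234567100"` reports namespace `"1234567"`.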

Every refset needs to be regenerated after each applicable SNOMED CT release, whether International, national, or internal. This process can be automated with purpose-built scripts, usually written in ECL (Expression Constraint Language) or SQL.
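Part of regeneration is working out which members to add and which to retire. The Python sketch below assumes that the desired membership for the new release has already been computed by a script, and that the current active members are held as a mapping from `referencedComponentId` to member `id`. The function name `refset_delta` and the module and refset identifiers passed to it are hypothetical. It follows the RF2 convention that a member is never deleted: a removed member is re-issued with the same `id` and `active` set to 0.

```python
from __future__ import annotations

import uuid


def refset_delta(current: dict[str, str], desired: set[str],
                 effective_time: str, module_id: str,
                 refset_id: str) -> list[tuple[str, ...]]:
    """Return rows (id, effectiveTime, active, moduleId, refsetId,
    referencedComponentId) that move the refset from `current` to `desired`.

    `current` maps referencedComponentId -> existing member id (active only).
    """
    rows = []
    current_ids = set(current)
    # New members get a fresh UUID and active = 1.
    for component_id in sorted(desired - current_ids):
        rows.append((str(uuid.uuid4()), effective_time, "1",
                     module_id, refset_id, component_id))
    # Members no longer wanted keep their id and are re-issued with active = 0.
    for component_id in sorted(current_ids - desired):
        rows.append((current[component_id], effective_time, "0",
                     module_id, refset_id, component_id))
    return rows
```

Members present in both sets produce no rows, so an unchanged refset yields an empty delta.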

Refsets are of two types: simple refsets, explained below, and extended refsets, which add further data columns. Map refsets and language refsets are examples of extended refsets.
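The difference is visible in the header line of an RF2 refset file: an extended refset keeps the six simple columns and appends its own. The Python sketch below uses the RF2 column names; `acceptabilityId` (language refsets) and `mapTarget` (simple map refsets) are two real examples of added columns, while the function `refset_kind` is a hypothetical helper.

```python
SIMPLE_COLUMNS = ["id", "effectiveTime", "active", "moduleId",
                  "refsetId", "referencedComponentId"]

# Extended refsets keep the six simple columns and append their own.
LANGUAGE_COLUMNS = SIMPLE_COLUMNS + ["acceptabilityId"]
SIMPLE_MAP_COLUMNS = SIMPLE_COLUMNS + ["mapTarget"]


def refset_kind(header_line: str) -> str:
    """Classify an RF2 refset file from its tab-separated header line."""
    columns = header_line.rstrip("\r\n").split("\t")
    if columns[:6] != SIMPLE_COLUMNS:
        return "not a refset file"
    return "simple" if len(columns) == 6 else "extended"
```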

Simple Refset Data File Structure

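Field (always present)	Description
id	Identifier of the row
effectiveTime	Versioning component: date from which the row is effective
active	Versioning component: active (1) or inactive (0) status
moduleId	Versioning component: identifier of the module the row belongs to
refsetId	The refset identifier
referencedComponentId	The identifier of the component referred to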

Table structure adapted from a similar one published by NRCeS, India.

Data Structure Explained

Every refset must contain the six columns above. The first, “id”, is the unique identifier of the row (a UUID). The next, “effectiveTime”, is the date (in YYYYMMDD format) from which the row is effective; “active” denotes active (1) or inactive (0) status; and “moduleId” is the identifier of the module, which is a valid SNOMED CT concept with the semantic tag “core metadata concept”. These three columns form the versioning component of the data file. The second-last column, “refsetId”, is the identifier of the refset. The last column, “referencedComponentId”, is the identifier of the component referred to. It must refer to an existing component available in the International Release, a national release, or an extension. Referenced components are mostly concepts but may also be descriptions (terms) or relationships.
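The six-column layout maps naturally onto a small record type. The Python sketch below parses one tab-separated data line of a simple refset file and checks the constraints described above; the class `SimpleRefsetMember` and the function `parse_member` are hypothetical names, not part of any SNOMED CT library.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SimpleRefsetMember:
    member_id: str                # row identifier ("id", a UUID in RF2)
    effective_time: date          # date from which this version of the row applies
    active: bool                  # True = active (1), False = inactive (0)
    module_id: str                # module concept the row belongs to
    refset_id: str                # the refset this row is a member of
    referenced_component_id: str  # concept, description or relationship


def parse_member(line: str) -> SimpleRefsetMember:
    """Parse one tab-separated data line of a simple refset file."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    row_id, eff, active, module_id, refset_id, component_id = fields
    if active not in ("0", "1"):
        raise ValueError(f"active must be 0 or 1, got {active!r}")
    return SimpleRefsetMember(
        member_id=row_id,
        # strptime rejects anything that is not a valid YYYYMMDD date.
        effective_time=datetime.strptime(eff, "%Y%m%d").date(),
        active=active == "1",
        module_id=module_id,
        refset_id=refset_id,
        referenced_component_id=component_id,
    )
```

Using a frozen dataclass keeps parsed rows immutable, which matches the RF2 model in which a row is never edited in place, only superseded by a newer row.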

Example

From the July 2021 International Release:

Field	Content	Readable version (not present in the actual data file)
id	800aa109-431f-4407-a431-6fe65e9db160
effectiveTime	20170731	31 July 2017
active	1	Yes
moduleId	900000000000207008	SNOMED CT core module (core metadata concept)
refsetId	723264001	Lateralizable body structure reference set (foundation metadata concept)
referencedComponentId	731819006	Entire synovial membrane of upper limb (body structure)
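The “Readable version” column can be produced mechanically. The Python sketch below assumes a hypothetical lookup table `TERMS` that holds the three terms from the example; in practice these terms would come from the SNOMED CT description file. The function name `readable` is also hypothetical.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical lookup: in practice these terms come from the description file.
TERMS = {
    "900000000000207008": "SNOMED CT core module (core metadata concept)",
    "723264001": "Lateralizable body structure reference set (foundation metadata concept)",
    "731819006": "Entire synovial membrane of upper limb (body structure)",
}


def readable(row: dict[str, str]) -> dict[str, str]:
    """Produce the readable version of one simple refset row."""
    eff = datetime.strptime(row["effectiveTime"], "%Y%m%d")
    out = dict(row)
    out["effectiveTime"] = f"{eff.day} {eff:%B} {eff.year}"
    out["active"] = "Yes" if row["active"] == "1" else "No"
    # Replace identifiers with terms where the lookup knows them.
    for key in ("moduleId", "refsetId", "referencedComponentId"):
        out[key] = TERMS.get(row[key], row[key])
    return out
```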

Uses

Simple refsets are used to indicate whether a particular component belongs to a particular group.

The International Release contains only one simple refset: the Lateralizable body structure reference set, whose refset concept is a foundation metadata concept. It lists all body structures that have left and right sides and can therefore be “lateralised” using the “Laterality” attribute.
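Checking group membership requires a little care, because an RF2 full file keeps every historical version of a member row. The Python sketch below assumes rows already split into string tuples in column order, and takes the most recent row for each member `id` as authoritative; the function names `current_members` and `is_member` are hypothetical.

```python
from __future__ import annotations


def current_members(rows: list[tuple[str, ...]]) -> set[str]:
    """Return referencedComponentIds whose latest row (per member id) is active.

    Each row is (id, effectiveTime, active, moduleId, refsetId,
    referencedComponentId) as strings, as read from an RF2 file.
    """
    latest: dict[str, tuple[str, ...]] = {}
    for row in rows:
        member_id, effective_time = row[0], row[1]
        # effectiveTime is YYYYMMDD, so string comparison orders it by date.
        if member_id not in latest or effective_time > latest[member_id][1]:
            latest[member_id] = row
    return {row[5] for row in latest.values() if row[2] == "1"}


def is_member(component_id: str, members: set[str]) -> bool:
    """Does this component currently belong to the group?"""
    return component_id in members
```

With a snapshot file each member `id` appears only once, so the same function works unchanged.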

Organizations can also create their own simple refsets, for example for concepts related to laterality, allergies, or allergens; lists of operation codes for a hospital; diagnoses for use by a specialty such as psychiatry; or religions for use in a patient registration system.
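A custom simple refset is ultimately just a tab-separated file in the six-column layout. The Python sketch below assumes the organization has already created its refset concept and module concept in its own namespace; the function `write_simple_refset` is hypothetical and the identifiers in the demo are placeholders, not real SCTIDs. It uses CR LF line endings, as RF2 release files do.

```python
import csv
import io
import uuid

HEADER = ["id", "effectiveTime", "active", "moduleId",
          "refsetId", "referencedComponentId"]


def write_simple_refset(out, concept_ids, effective_time, module_id, refset_id):
    """Write a header plus one active member row per distinct concept ID."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\r\n")
    writer.writerow(HEADER)
    for concept_id in sorted(set(concept_ids)):
        # Each new member gets a freshly generated UUID as its row id.
        writer.writerow([str(uuid.uuid4()), effective_time, "1",
                         module_id, refset_id, concept_id])


if __name__ == "__main__":
    # Placeholder identifiers only: use SCTIDs minted in your own namespace.
    buf = io.StringIO()
    write_simple_refset(buf, ["<concept-id-1>", "<concept-id-2>"],
                        "20220131", "<module-id>", "<refset-id>")
    print(buf.getvalue())
```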

Practical Applications

The easiest, and probably the best, way to harness the power of refsets is to use a terminology server. The alternative is to set up the environment oneself.
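Setting up the environment oneself essentially means joining the refset members with the description (term) file. The Python sketch below assumes descriptions reduced to `(conceptId, term, active)` tuples and a precomputed set of active refset members; `search_in_refset` is a hypothetical name. It uses a plain case-insensitive substring match, which is far simpler than what a terminology server does (word boundaries, acceptability filtering, ranking).

```python
from __future__ import annotations


def search_in_refset(word: str,
                     descriptions: list[tuple[str, str, bool]],
                     members: set[str]) -> list[tuple[str, str]]:
    """Return sorted (conceptId, term) pairs whose term contains `word`,
    limited to active descriptions of concepts that are refset members."""
    needle = word.lower()
    return sorted(
        (concept_id, term)
        for concept_id, term, active in descriptions
        if active and concept_id in members and needle in term.lower()
    )
```

Because the filter on `members` runs before any string matching, shrinking the member set directly shrinks the work done per query.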

For the following example, it is assumed that the free SNOMED CT terminology server Csnoserv from NRCeS, Pune, India, has been set up and is running locally on port 8080.

To retrieve suggested terms containing the word “limb” from refset 723264001 [Lateralizable body structure reference set (foundation metadata concept)], use the suggest endpoint:

localhost:8080/csnoserv/api/search/suggest?term=limb&state=active&acceptability=preferred&refsetid=723264001

To search the same refset for terms containing the word “limb”, use the search endpoint instead:

localhost:8080/csnoserv/api/search/search?term=limb&state=active&acceptability=preferred&refsetid=723264001
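The same requests can be issued from a script. The Python sketch below rebuilds the two URLs shown above with the standard library; it assumes Csnoserv is reachable over plain HTTP at `localhost:8080`, and the helper names `csnoserv_url` and `query` are hypothetical. The response format is not described here, so `query` returns the raw body without parsing it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Csnoserv runs locally over plain HTTP, as in the URLs above.
BASE_URL = "http://localhost:8080/csnoserv/api/search"


def csnoserv_url(endpoint: str, term: str, refset_id: str) -> str:
    """Build a suggest or search URL with the query parameters shown above."""
    if endpoint not in ("suggest", "search"):
        raise ValueError("endpoint must be 'suggest' or 'search'")
    params = {
        "term": term,
        "state": "active",
        "acceptability": "preferred",
        "refsetid": refset_id,
    }
    # urlencode keeps the dict's insertion order and escapes special characters.
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


def query(endpoint: str, term: str, refset_id: str) -> str:
    """Fetch the raw response body (its format is not assumed here)."""
    with urlopen(csnoserv_url(endpoint, term, refset_id), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

For instance, `query("suggest", "limb", "723264001")` sends the first request above, provided the server is running.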

Advantages

Restricting searches in a terminology server to a refset can improve performance significantly, because far fewer concepts and terms need to be searched, and this in turn improves usability.