The PCORnet i2b2 Information Model

SCILHS (with significant help from the Greater Plains Collaborative) has developed an i2b2 information model that represents the PCORnet Common Data Model (CDM). This information model consists of an i2b2 ontology/terminology and a process for mapping local data elements to the ontology without changing the underlying imported data. This approach highlights i2b2’s ability to separate the data model from both the information model and the underlying data format.

By conforming to this ontology, our sites will be able to programmatically generate data marts in the PCORnet CDM format, which will enable detailed analysis through the PCORnet Distributed Research Network (DRN). Likewise, if PCORnet queries were rewritten to run against the i2b2 schema, they could run reliably at every site that uses the PCORnet CDM i2b2 ontology.

The ontology is live at Change the username to pcori, leave the password as demouser, and click ‘Login’. The existing demo data has been mapped to the PCORI ontology, so most queries involving demographics, diagnoses, and enrollment will work. This did not require any changes to the demo data, only to the information model (ontology). Adding new demo data for the remaining sections of the ontology is high on our priority list.

The ontology is currently in an alpha state. We will release it publicly once we have vetted it with our sites. If you are part of a CDRN or PPRN and interested in being an alpha tester, feel free to contact Jeff Klann directly at Jeff dot Klann at Our mapping documentation and tools are also forthcoming.

About the ontology

What’s in the ontology?

  • Core ontology – based on Dan Connolly’s code to generate ontologies from the PCORnet CDM spec, currently v1 of that spec.
  • Lori Phillips’s ontology trees on BioPortal – ICD-10 2014AA, ICD-9 2014AA
  • Partners’ RPDR CMS-DRG tree (we do not have MS-DRG yet)
  • Nathan Wilson’s HCPCS tree
  • 3-digit zip codes derived from the i2b2 demo data
  • An age tree that is a more granular version of that found in the demo data (to support pediatric age queries)
  • Some clarifications (e.g., meaning of the procedure code types) from the coordinating center appear in the tooltips.

What’s not in the ontology?

LOINC v236 and SNOMED Clinical Findings v1.2 are available on BioPortal, but they are not included in this version of the ontology. We suspect SNOMED is not presently used, and the coordinating center’s language indicates that CPT and HCPCS are currently preferred over LOINC for procedures: “Only billed procedures should be included in the PROCEDURE table. The ORDER concept may be incorporated into future phases of the CDM.”

CPT is not included because it is not freely available.

Mapping to the Ontology

More information on this process will be released shortly. Our goal is to support PCORnet CDM queries in i2b2 with minimal changes to local sites’ data and ETL processes.

The key insight is that because i2b2 performs dimensional queries, a query on a node includes all of that node’s descendants in the hierarchy. Therefore, given a standard parent node (such as ICD9:250 for diabetes), local child nodes representing local terminologies can be added beneath it, and the query process will pick them up automatically.
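To make the parent/child behavior concrete, here is a minimal Python sketch. It models ontology terms as i2b2-style backslash-delimited paths and treats a query on a node as a prefix match over those paths, which is roughly how i2b2 expands a concept-dimension term (a LIKE match on the concept path). The paths, the local code LOCAL:DM2, and the helper names expand and patients_matching are hypothetical, chosen only for illustration; they are not taken from the PCORnet ontology itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    path: str        # i2b2-style hierarchical path, backslash-delimited
    concept_cd: str  # code as it appears in the fact table


# Hypothetical ontology fragment: a standard parent, a standard child,
# and a local code attached beneath the same parent.
PARENT = "\\PCORI\\DIAGNOSIS\\09\\250\\"
ONTOLOGY = [
    Term(PARENT, "ICD9:250"),
    Term(PARENT + "250.00\\", "ICD9:250.00"),
    Term(PARENT + "LOCAL_DM2\\", "LOCAL:DM2"),  # local child; no data change
    Term("\\PCORI\\DIAGNOSIS\\09\\401\\", "ICD9:401"),
]

# Facts as (patient_num, concept_cd) pairs, left exactly as imported.
FACTS = [(1, "ICD9:250.00"), (2, "LOCAL:DM2"), (3, "ICD9:401")]


def expand(node_path: str, ontology: list[Term]) -> set[str]:
    """Codes for a node and all of its descendants, via a path-prefix match."""
    return {t.concept_cd for t in ontology if t.path.startswith(node_path)}


def patients_matching(node_path: str, ontology: list[Term],
                      facts: list[tuple[int, str]]) -> set[int]:
    """Patients with at least one fact coded anywhere under node_path."""
    codes = expand(node_path, ontology)
    return {patient for patient, code in facts if code in codes}


print(sorted(patients_matching(PARENT, ONTOLOGY, FACTS)))  # [1, 2]
```

Because the local code sits beneath the ICD9:250 node, patient 2 is returned even though only the ontology changed, not the fact that carries LOCAL:DM2. Ending each path with the delimiter keeps the prefix match from crossing into a sibling whose code merely starts with the same characters.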

There are other, simpler methods of performing this mapping in some cases. For example, mapping a term that uses the patient dimension or concept dimension requires modifying only one field (C_DIMCODE), regardless of how many local codes are involved. Also, some terms can be implemented as computed values (such as encounter-based enrollment), which are calculated from the data warehouse without changes to the imported data.
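The C_DIMCODE shortcut can be sketched the same way. The Python below takes a subset of the i2b2 ontology metadata columns (C_FACTTABLECOLUMN, C_TABLENAME, C_COLUMNNAME, C_OPERATOR, C_DIMCODE) and renders an approximation of the subquery a term implies. It assumes, as a simplification, that an IN term keeps its already-quoted value list in C_DIMCODE and that a LIKE term’s path is matched as a prefix. The helpers quote and to_sql and the local codes such as 'female' and 'FEMALE' are hypothetical, and the real i2b2 query engine handles more operators, data types, and escaping than this does.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OntologyRow:
    """A subset of the i2b2 ontology metadata columns that drive a term's query."""
    c_facttablecolumn: str
    c_tablename: str
    c_columnname: str
    c_operator: str
    c_dimcode: str


def quote(text: str) -> str:
    """Wrap text in single quotes, doubling embedded quotes (SQL string literal)."""
    return "'" + text.replace("'", "''") + "'"


def to_sql(row: OntologyRow) -> str:
    """Render the dimension lookup one term implies (simplified; metadata is trusted)."""
    op = row.c_operator.strip().upper()
    if op == "LIKE":
        # Hierarchical term: match the node's path and every path beneath it.
        value = quote(row.c_dimcode + "%")
    elif op == "IN":
        # Assumption: C_DIMCODE already holds a parenthesized, quoted value list.
        value = row.c_dimcode
    elif op == "=":
        value = quote(row.c_dimcode)
    else:
        raise ValueError(f"unsupported operator: {row.c_operator}")
    col = row.c_facttablecolumn
    return (f"{col} IN (SELECT {col} FROM {row.c_tablename} "
            f"WHERE {row.c_columnname} {op} {value})")


# Hypothetical patient-dimension term: several local codes all mean "female".
female = OntologyRow("patient_num", "patient_dimension", "sex_cd", "IN",
                     "('F','female','2')")
print(to_sql(female))

# Mapping one more local code touches only C_DIMCODE.
female_extended = replace(female, c_dimcode="('F','female','2','FEMALE')")
print(to_sql(female_extended))

# Hypothetical concept-dimension term for the ICD9:250 subtree.
diabetes = OntologyRow("concept_cd", "concept_dimension", "concept_path", "LIKE",
                       "\\PCORI\\DIAGNOSIS\\09\\250\\")
print(to_sql(diabetes))
```

In this sketch, adding a local sex code means editing C_DIMCODE on the existing term; no new ontology rows are created and patient_dimension is left untouched.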

– Jeffrey Klann, PhD