SCILHS - Scalable Collaborative Infrastructure for a Learning Health System

The PCORnet i2b2 Information Model

by Jessica Lyons

SCILHS (with significant help from the Greater Plains Collaborative) has developed an i2b2 information model that represents the PCORnet Common Data Model (CDM). This information model consists of an i2b2 ontology/terminology and a process for mapping local data elements to the ontology without changing the underlying imported data. This approach highlights i2b2’s ability to separate the data model from both the information model and the underlying data format.

By conforming to this ontology, our sites will be able to programmatically generate data marts in the PCORnet data model format, which will enable detailed analysis through the PCORnet Distributed Research Network (DRN). Likewise, if PCORnet queries were rewritten to run against the i2b2 schema, they could run reliably at every site that uses the PCORnet CDM i2b2 ontology.

The ontology is live. To try it, change the username to pcori, leave the password as demouser, and click ‘Login’. The existing demo data has been mapped to the PCORI ontology, so most queries involving demographics, diagnoses, and enrollment will work. This did not require any changes to the demo data, only to the information model (ontology). Adding new demo data for the remaining sections of the ontology is high on our priority list.

The ontology is currently in an alpha state. We will release it publicly once we have vetted it with our sites. If you are part of a CDRN or PPRN and interested in being an alpha tester, feel free to contact Jeff Klann directly. Our mapping documentation and tools are also forthcoming.

About the ontology

What’s in the ontology?

  • Core ontology – based on Dan Connolly’s code to generate ontologies from the PCORnet CDM spec, currently v1 of that spec.
  • Lori Phillips’s ontology trees on BioPortal – ICD-10 2014AA, ICD-9 2014AA
  • Partners’ RPDR CMS-DRG tree (we do not have MS-DRG yet)
  • Nathan Wilson’s HCPCS tree
  • 3-digit zip codes derived from the i2b2 demo data
  • An age tree that is a more granular version of that found in the demo data (to support pediatric age queries)
  • Some clarifications (e.g., meaning of the procedure code types) from the coordinating center appear in the tooltips.

What’s not in the ontology?

LOINC v236 and SNOMED Clinical Findings v1.2 are available on BioPortal, but they are not included in this version of the ontology. We suspect SNOMED is not presently used, and the language from the coordinating center indicates CPT and HCPCS are preferred over LOINC for procedures at present: “Only billed procedures should be included in the PROCEDURE table. The ORDER concept may be incorporated into future phases of the CDM.”

CPT is not included because it is not freely available.

Mapping to the Ontology

More information on this process will be released shortly. Our goal is to support PCORnet CDM queries in i2b2 with minimal changes to local sites’ data and ETL processes.

The key insight is that because i2b2 performs dimensional queries, all children in a hierarchy are included in a query. Therefore, given a standard parent node (such as ICD9:250 for diabetes), local child nodes can be added that represent local terminologies. These will automatically be picked up by the query process.
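The hierarchy behaviour described above can be sketched with a small in-memory database. This is an illustration only, not the SCILHS mapping tooling: it assumes the usual i2b2 convention in which a term's path is matched with a LIKE prefix against concept_dimension, and the paths, the local code LOCAL:DM2, and the patient rows are all hypothetical.

```python
import sqlite3

def path(*parts):
    """Build an i2b2-style backslash-delimited path with a trailing backslash."""
    return "\\" + "\\".join(parts) + "\\"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept_dimension (concept_path TEXT, concept_cd TEXT);
    CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT);
""")

# Standard parent node (ICD9:250, diabetes). The path layout is an assumption.
parent = path("PCORI", "DIAGNOSIS", "09", "250")

conn.executemany(
    "INSERT INTO concept_dimension VALUES (?, ?)",
    [
        (parent, "ICD9:250"),
        (path("PCORI", "DIAGNOSIS", "09", "250", "250.0"), "ICD9:250.0"),
        # Hypothetical local term added as a child of the standard parent.
        (path("PCORI", "DIAGNOSIS", "09", "250", "LOCAL_DM2"), "LOCAL:DM2"),
        # Unrelated term outside the diabetes subtree.
        (path("PCORI", "DIAGNOSIS", "09", "401"), "ICD9:401"),
    ],
)

# Imported facts are left untouched; patient 2 is coded only with the local code.
conn.executemany(
    "INSERT INTO observation_fact VALUES (?, ?)",
    [(1, "ICD9:250.0"), (2, "LOCAL:DM2"), (3, "ICD9:401")],
)

def patients_for(conn, concept_path):
    """Return patients with any fact coded at or below the given ontology path."""
    rows = conn.execute(
        """
        SELECT DISTINCT patient_num
        FROM observation_fact
        WHERE concept_cd IN (
            SELECT concept_cd FROM concept_dimension WHERE concept_path LIKE ?
        )
        ORDER BY patient_num
        """,
        (concept_path + "%",),
    )
    return [r[0] for r in rows]

diabetes_patients = patients_for(conn, parent)
print(diabetes_patients)  # [1, 2]
```

Patient 2 is returned even though the fact row carries only a local code, because the local term sits beneath the standard parent in the hierarchy.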

There are other, simpler methods of performing this mapping in some cases. For example, terms that use the patient dimension or concept dimension can be remapped by modifying a single field (C_DIMCODE), regardless of how many local codes are involved. Also, some terms can be implemented as computed values (such as encounter-based enrollment), which are calculated on the data warehouse at query time without changes to the imported data.
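As a rough sketch of the single-field approach, the example below treats one ontology row as a dictionary and builds a patient-dimension query from it. The way the SQL is assembled is a simplification of what the i2b2 query engine does, and the term, the sex_cd values, and the local codes 'Female' and 'W' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient_dimension (patient_num INTEGER, sex_cd TEXT);
    INSERT INTO patient_dimension VALUES (1, 'F'), (2, 'Female'), (3, 'M'), (4, 'W');
""")

# One simplified ontology row; only the columns needed for the sketch are shown.
female_term = {
    "c_name": "Female",
    "c_tablename": "patient_dimension",
    "c_columnname": "sex_cd",
    "c_operator": "IN",
    "c_dimcode": "'F'",
}

def build_query(term):
    """Assemble a dimension query from ontology metadata (trusted input only)."""
    return (
        "SELECT patient_num FROM {c_tablename} "
        "WHERE {c_columnname} {c_operator} ({c_dimcode}) "
        "ORDER BY patient_num"
    ).format(**term)

def run(term):
    return [r[0] for r in conn.execute(build_query(term))]

before = run(female_term)

# Map every local code by editing C_DIMCODE alone; patient rows are unchanged.
female_term["c_dimcode"] = "'F', 'Female', 'W'"
after = run(female_term)

print(before, after)  # [1] [1, 2, 4]
```

Only the ontology row changes between the two runs, which is the point of the technique: the imported patient data stays as loaded.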

– Jeffrey Klann, PhD


Query Health: standards-based, cross-platform population health surveillance

by scilhs

Check out Jeff Klann’s recent publication in JAMIA that describes Query Health, a government-led initiative to enable distributed, secure, standards-based population health measurement. The reference implementation uses two technologies also being used within PCORnet: the i2b2 clinical data analytics warehouse and the PopMedNet secure query distribution system.

Query Health supports participation of disparate organizations, with different underlying platforms, data structures, and governance policies. For this it uses a distributed query model. PopMedNet sends ‘questions to the data’ in a standards-based, platform-agnostic format. Data partners transform the question into code that can be executed on their local platform. In our i2b2-PopMedNet integration, we wrote a translator to convert the standardized questions (in the Health Quality Measures Format) into the i2b2 query format. These queries execute against a Query Health ontology that uses an agreed-upon data model. hQuery, a document database with a very different structure, was also integrated with PopMedNet and executed the same inputs, using a JavaScript translator. We piloted the Query Health reference implementation successfully at three sites.

This paper demonstrates a unique way to study health information across many sites while allowing individual organizations to process queries, disclose only the minimum necessary information to answer the query, and leverage their existing information technology investment through platform-independent queries.

The full-text publication can be found at the following link:


SCILHS Query Workflow

by kmandl

Workflows for issuing simple and complex PCORnet queries to SCILHS.

With the goal of reducing the overall expenditure on data transformation, we agreed upon a revised data query plan with the coordinating center (Platt, Brown, Curtis, Murphy and Mandl meeting at Harvard Pilgrim on March 10, 2014). This plan in no way precludes later transforming our full data into the PCORnet Common Data Model. It does, however, enable early success and gives CDRNs a gradual, shared learning process for data transformation.

The workflow we are implementing is shown in the diagram above.

  • Simple queries originate from the PCORnet Coordinating Center, expressed in plain English, and are then refined into a SCILHS query.

  • SCILHS will form SHRINE queries to define a query cohort using existing i2b2 databases.

  • Complex queries are accomplished through a three-step process:

    • A simple query is run (as above). The result of the simple query is a patient list.

    • The patient list is used to create a disease-specific data subset, which is transformed into the PCORnet Common Data Model and transferred to the analysis platform.

    • SCILHS runs PCORnet SQL against the transformed results only.
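The three-step process for complex queries can be sketched end to end as follows. This is a toy illustration under stated assumptions, not the SCILHS implementation: a plain SQL filter stands in for the SHRINE cohort query, the diagnosis table is a reduced stand-in for the CDM DIAGNOSIS table, and all rows and the final analysis query are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for a site's existing i2b2 fact table.
    CREATE TABLE observation_fact (
        patient_num INTEGER, encounter_num INTEGER, concept_cd TEXT
    );
    INSERT INTO observation_fact VALUES
        (1, 10, 'ICD9:250.00'), (1, 11, 'ICD9:401.9'),
        (2, 20, 'ICD9:401.9'),
        (3, 30, 'ICD9:250.02');
    -- Reduced stand-in for the CDM DIAGNOSIS table (only four columns).
    CREATE TABLE diagnosis (patid TEXT, encounterid TEXT, dx TEXT, dx_type TEXT);
""")

# Step 1: the simple query yields a patient list (here, any diabetes code).
cohort = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT patient_num FROM observation_fact "
        "WHERE concept_cd LIKE 'ICD9:250%' ORDER BY patient_num"
    )
]

# Step 2: transform only the cohort's records into the CDM-shaped table.
placeholders = ",".join("?" * len(cohort))
subset = conn.execute(
    "SELECT patient_num, encounter_num, concept_cd FROM observation_fact "
    f"WHERE patient_num IN ({placeholders}) AND concept_cd LIKE 'ICD9:%'",
    cohort,
).fetchall()
conn.executemany(
    "INSERT INTO diagnosis VALUES (?, ?, ?, ?)",
    [(str(p), str(e), cd.split(":", 1)[1], "09") for p, e, cd in subset],
)

# Step 3: run the analysis SQL against the transformed subset only.
result = conn.execute(
    "SELECT dx, COUNT(DISTINCT patid) FROM diagnosis GROUP BY dx ORDER BY dx"
).fetchall()

print(cohort)  # [1, 3]
print(result)  # [('250.00', 1), ('250.02', 1), ('401.9', 1)]
```

Patient 2 never reaches the transformed table, so the transformation cost scales with the size of the cohort rather than the size of the warehouse.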

Query results will be shared with PCORnet and/or used to generate patient lists for our outreach efforts.

For the early phases of PCORnet, this approach requires transforming thousands, rather than millions, of records.