Download T6: Record Locator Service: Technical Background from the Massachusetts Prototype Community
Following the federated data architecture principle, RLS persists minimal patient data centrally. The core of the RLS data store is a community Master Patient Index (CMPI) that supports lookup of patient electronic health record locations based on basic demographic attributes.
A canonical information model is used to develop reference XML schema that CDX Gateways use to send and receive messages based on the HL7 Reference Information Model (RIM).19 A logical data model using standard Entity-Relationship diagramming notation is derived from the information model. All messaging services that RLS supports are integrated with the physical implementation of the logical model in the form of a relational database.
The RLS information model view derived from the HL7 RIM is shown in Figure 24 for reference.
Figure 24: Information Model View
The community Master Patient Index follows the traditional MPI structure, storing only ‘pointers’ to provider systems and the patient identifiers therein, in addition to the essential demographic attributes that can be searched on. The pointer to patient records is the id attribute of the communityMasterPatientIndex (CMPI) class. The list of attributes shown in the model view represents the set of all possible patient demographics. An RLS implementation would choose a specific subset of demographic attributes for the CMPI based on the community’s policies and requirements.
As can be seen from the information model, the patient EHR that the RLS index points to is a hierarchical abstraction of the RIM classes. With the patient index provided by the CMPI, users can retrieve and select from visits or patient encounters at the provider facility. Users may then navigate from encounters to individual care records, represented by the generalized clinical act class, which may refer to procedures, observations (covering laboratory results, diagnostic images, etc.) and substanceAdministration (medication) lists.
The classes and attributes in the RLS information model are translated into entities and attributes of a logical data model that can be implemented physically in a relational (SQL) DBMS. The logical data model derived from the information model is shown in Figure 25.
The Entity-Relationship (ER) modeling notation used here is directly translatable to physical SQL databases. Entities have attributes corresponding to the class attributes of the information model. Attributes above the dividing line form the ‘primary key’ of the entity. Relationships between entities are denoted with lines that use crow’s-foot notation to mark the ‘many’ end of a one-to-many relationship. In each such relationship, the entity at the ‘many’ end inherits the primary key of the entity at the ‘one’ end as a foreign key, marked as (FK).
The identifiedPerson entity represents the person as maintained in the source (EHR) system. The attributes of the CMPI entity shown are illustrative and not normative. Select attributes from the identifiedPerson entity are replicated into the CMPI entity, based on community requirements for patient record matching. The identifying information (primary key) of patients in the communityMasterPatientIndex (CMPI) is formed by concatenating the identifiers of the assigningOrganization and the identifiedPerson. This combination of identifiers provides a unique key for the CMPI record.
Figure 25: Logical Data Model
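The composite primary key and inherited foreign key described above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table and key column names follow the entities named in the text; the column types, the demographic columns, and the OID values used in any sample data are assumptions chosen for illustration, not part of the RLS specification.

```python
import sqlite3

# Hypothetical DDL: entity names come from the logical model, but column
# types and the chosen demographic subset are assumptions.
SCHEMA = """
CREATE TABLE assigningOrganization (
    idRoot TEXT PRIMARY KEY,   -- OID or UUID of the organization
    name   TEXT
);
CREATE TABLE communityMasterPatientIndex (
    personIDRoot      TEXT NOT NULL,   -- FK inherited from the 'one' end
    personIDExtension TEXT NOT NULL,   -- local identifier, e.g. MRN
    familyName        TEXT,
    givenName         TEXT,
    birthDate         TEXT,
    PRIMARY KEY (personIDRoot, personIDExtension),
    FOREIGN KEY (personIDRoot) REFERENCES assigningOrganization(idRoot)
);
"""


def build_db():
    """Create an in-memory database holding the two sketch tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_patient(conn, root, extension, family, given, birth_date):
    """Insert one CMPI row; raises sqlite3.IntegrityError on a key violation."""
    conn.execute(
        "INSERT INTO communityMasterPatientIndex VALUES (?, ?, ?, ?, ?)",
        (root, extension, family, given, birth_date),
    )
```

Under this sketch the same local identifier may appear at two organizations without conflict, while a repeated (root, extension) pair or an unknown organization is rejected by the database.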
Within a clinical data source, patients are usually assigned local identifiers (e.g. MRN, chart number, etc.). In some instances alternate standard identifiers (e.g. Social Security Number, Medicaid number, etc.) are used. Given the expected variability in data quality across diverse clinical systems, and the privacy constraints around some of the standard identifiers (such as SSN), RLS does not distinguish between these two types of identifiers. Each is treated as a non-intelligent key to the patient record in the clinical data source.
The gatewayDevice entity is used to store the network address of the clinical data source managed by the assigningOrganization. Thus, along with the patient pointer information the CMPI returns the network address (of the Gateway) to which queries for patient medical records should be sent. When an EHR Gateway receives a patient medical data request it resolves the medical record location using the personIDRoot (assigningOrganization’s ID) part of the patient index, and redirects the query to the appropriate clinical data source.
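The Gateway-side resolution step described above can be sketched as a simple dispatch on personIDRoot. This is an illustration only: the class names, the callable-per-source design, and the sample identifiers are hypothetical and do not describe the actual CDX Gateway implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientPointer:
    person_id_root: str       # assigningOrganization ID
    person_id_extension: str  # local patient ID, e.g. MRN
    gateway_uri: str          # network address taken from gatewayDevice


class Gateway:
    """Hypothetical Gateway that routes requests to clinical data sources."""

    def __init__(self, sources):
        # sources maps an assigningOrganization ID to a callable that
        # accepts a local patient ID and returns that patient's records.
        self._sources = dict(sources)

    def handle_request(self, pointer):
        try:
            source = self._sources[pointer.person_id_root]
        except KeyError:
            raise LookupError(
                f"unknown assigningOrganization: {pointer.person_id_root}"
            ) from None
        # The extension is passed through as an opaque key.
        return source(pointer.person_id_extension)
```

The Gateway never interprets the extension; it only uses the root to pick the data source, which matches the treatment of patient identifiers as non-intelligent keys.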
The RLS supports the use of multiple other identifiers for a patient, such as identifiers used by ancillary systems. These additional identifiers are used more as attributes than as identifiers, and may be used to search for the patient in the RLS. Standard identifiers, e.g. SSN, may be explicitly used as otherIDs, if the RLS implementation policy and regulations permit. The otherIDRoot attribute represents organizations such as the Social Security Administration (for SSN) or a state Registry of Motor Vehicles (for driver’s licenses).
The assignedPerson entity represents the user who has access to the gatewayDevice. The user role that determines access rights of the user is carried in the ‘code’ attribute (following the HL7 v3 implementation guide).
In addition to the business domain entities, messages and message logs are represented in the model. Messages are not stored physically in the RLS database except as XML strings in the message logs. Message logs are generic entities that may be used to store all messages that flow through the RLS/Gateway. This entity also carries patient and user attributes related to the message, which supports auditing of the logs.
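A message log of this kind might look like the following minimal in-memory sketch. The field names and the `MessageLog` class are assumptions made for illustration; the source states only that messages are kept as XML strings together with patient and user attributes that support auditing.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MessageLogEntry:
    message_id: str            # UUID identifying the logged message
    logged_at: datetime
    patient_id_root: str
    patient_id_extension: str
    user_id: str               # the assignedPerson who sent the message
    payload_xml: str           # message kept verbatim as an XML string


class MessageLog:
    """Hypothetical append-only log with simple audit queries."""

    def __init__(self):
        self._entries = []

    def record(self, payload_xml, patient_id_root, patient_id_extension, user_id):
        entry = MessageLogEntry(
            message_id=str(uuid.uuid4()),
            logged_at=datetime.now(timezone.utc),
            patient_id_root=patient_id_root,
            patient_id_extension=patient_id_extension,
            user_id=user_id,
            payload_xml=payload_xml,
        )
        self._entries.append(entry)
        return entry

    def audit_patient(self, root, extension):
        """Return every logged message that concerns one patient."""
        return [
            e for e in self._entries
            if (e.patient_id_root, e.patient_id_extension) == (root, extension)
        ]

    def audit_user(self, user_id):
        """Return every logged message sent by one user."""
        return [e for e in self._entries if e.user_id == user_id]
```

Because the patient and user attributes are stored beside the payload, audit queries do not need to parse the XML.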
8.2.1 Identifier Attributes
Translation from the object-oriented information model to classic relational data structures requires that the HL7 v3 data types be converted to SQL data types. The conversion is for the most part straightforward: the components of object attribute types such as II, EN, AD, etc. are flattened out to sequences of SQL data types.
Identifiers in the logical data model are formed from HL7 v3 instance identifiers (type II), and are worth examining in more detail since they are critical to understanding of the data returned by RLS. The II data type is defined as:20
An identifier that uniquely identifies a thing or object. Examples are object identifier for HL7 RIM objects, medical record number, order id, service catalog item id, Vehicle Identification Number (VIN), etc. Instance identifiers are defined based on ISO object identifiers.
Instance identifiers are used for patients, organizations, devices, etc. The HL7 v3 data type II has the following structure:
| Element | Description |
|---|---|
| root | A unique identifier that guarantees the global uniqueness of the instance identifier. The root alone may be the entire instance identifier. Its value matches the pattern of a DCE UUID or an ISO OID, or is a string consisting only of (US-ASCII) letters, digits and hyphens, where the first character must be a letter. |
| extension | A character string as a unique identifier within the scope of the identifier root. If the root is used as a unique identifier, the extension is null. |
| assigningAuthority | A human readable name or mnemonic for the assigning authority. Note: no automated processing must depend on the assigning authority name to be present in any form. |
| displayable | Specifies if the identifier is intended for human display and data entry (displayable = true) as opposed to pure machine interoperation (displayable = false). |
The extension, assigningAuthority, and displayable attributes are all optional. The root may itself be used as a unique identifier, such as when it contains a UUID. In the RLS data model, the convention is to use UUIDs for transactional entities such as messages. Entities such as organizations and devices have fixed identifiers set up by the RLS, which may use OIDs or UUIDs. Patients have two-part identifiers, where the root maps to the assigningOrganization’s id and the extension to the specific person id (e.g. MRN).
The root attribute of the patient identifier is set to the OID or UUID of the ‘assigningOrganization’ that defines the id namespace (within which the id is unique). The personIDRoot of the CMPI is a foreign key mapping to the assigningOrganization primary key idRoot and the personIDExtension maps to the identifiedPerson primary key idExtension. The personIDRoot may be considered the prefix that RLS attaches to make the pointer unique in the CMPI.
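The II structure and its flattening into relational columns can be sketched as follows. The regular expressions are an informal reading of the three root patterns listed in the table (UUID, OID, or a letter-led string), and the `to_columns` helper and its naming scheme are assumptions, not part of the HL7 specification or the RLS design.

```python
import re
import uuid
from dataclasses import dataclass
from typing import Optional

# Informal patterns for the three root forms named in the table.
_UUID = re.compile(r"^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$")
_OID = re.compile(r"^[0-2](\.(0|[1-9][0-9]*))+$")
_LETTER_LED = re.compile(r"^[A-Za-z][A-Za-z0-9-]*$")


@dataclass(frozen=True)
class II:
    root: str
    extension: Optional[str] = None   # null when the root alone is the identifier

    def __post_init__(self):
        if not any(p.match(self.root) for p in (_UUID, _OID, _LETTER_LED)):
            raise ValueError(f"root is not a UUID, OID, or letter-led string: {self.root!r}")

    def to_columns(self, prefix):
        """Flatten to SQL-style columns, e.g. personIDRoot / personIDExtension."""
        return {f"{prefix}Root": self.root, f"{prefix}Extension": self.extension}


def new_message_id():
    """Transactional entities use a UUID root with no extension."""
    return II(root=str(uuid.uuid4()))


def patient_id(assigning_org_id, local_id):
    """Patients use a two-part identifier: organization root plus local ID."""
    return II(root=assigning_org_id, extension=local_id)
```

For example, `patient_id("1.2.3.1", "MRN-1").to_columns("personID")` yields the two CMPI key columns, where the OID shown is a made-up value.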
Unlike the demographic attributes, the primary key of the CMPI is not used for searching. Searchable identifiers are stored in the otherIDs entity. For example, when a user specifies the patient’s SSN as a query criterion (where permitted), the RLS derives the OID of the SSA from a lookup table of standard OIDs. That OID is matched against the patientIDRoot of the otherIDs table, and the given SSN is matched against the patientIDExtension.
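The SSN lookup just described can be sketched with plain dictionaries. The OID shown for SSN is the value commonly listed for US Social Security Numbers in the HL7 OID registry and should be confirmed against the registry before use; the row layout linking an otherIDs entry back to a CMPI key, and all sample values, are assumptions.

```python
# Lookup table of standard OIDs (verify values against the HL7 OID registry).
STANDARD_OIDS = {
    "SSN": "2.16.840.1.113883.4.1",
}


def find_by_other_id(other_ids, id_type, value, oid_table=STANDARD_OIDS):
    """Return the CMPI keys whose otherIDs entry matches the given identifier.

    other_ids is an iterable of dicts with the keys patientIDRoot and
    patientIDExtension (the searchable identifier) plus personIDRoot and
    personIDExtension (the assumed link back to the CMPI record).
    """
    root = oid_table.get(id_type)
    if root is None:
        raise KeyError(f"no standard OID registered for {id_type!r}")
    return [
        (row["personIDRoot"], row["personIDExtension"])
        for row in other_ids
        if row["patientIDRoot"] == root and row["patientIDExtension"] == value
    ]
```

A single SSN can resolve to several CMPI keys when the same person is registered at more than one organization, so the function returns a list rather than one match.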
The physical data model maps very closely to the logical model shown above. However, the physical tables are not all implemented in the same database instance, since the architecture posits the RLS as a combination of a Patient Index service and a distributed Gateway service. The distribution of tables across the Patient Index and the Gateway is worth further discussion. The problem of OID management is also relevant to this design discussion.
The tables generated from the CMPI, patientConsent, otherIDs and otherNames entities reside in the Patient Index database. In addition, the patient matching algorithm may create persistent secondary indexes to improve the performance of lookup queries. For example, a probabilistic matching method may need to maintain a secondary index of Soundex-transformed names. Since the RLS architecture needs to work with multiple matching algorithms, the patient matching component is treated as a separate service that maintains all the secondary indexes it needs. Optionally, the record matching algorithm may also generate a linking identifier that would be persisted along with the index. However, this would increase maintenance overhead.
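As an illustration of such a secondary index, the sketch below implements the standard American Soundex code and groups CMPI keys by the code of the family name. The index layout and function names are assumptions; the source does not prescribe a particular matching algorithm or index structure.

```python
from collections import defaultdict

_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit:
            if digit != prev:        # adjacent letters with the same code collapse
                code += digit
            prev = digit
        elif ch not in "HW":         # vowels and Y separate repeats; H and W do not
            prev = ""
    return (code + "000")[:4]


def build_soundex_index(cmpi_rows):
    """Map Soundex code -> set of (personIDRoot, personIDExtension) keys.

    cmpi_rows is an iterable of (root, extension, family_name) tuples.
    """
    index = defaultdict(set)
    for root, extension, family_name in cmpi_rows:
        index[soundex(family_name)].add((root, extension))
    return index


def candidates(index, family_name):
    """Return the CMPI keys whose family name sounds like the query name."""
    return index.get(soundex(family_name), set())
```

The index only narrows the candidate set; a matching service would still compare the remaining demographic attributes before returning record locations.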
The remaining tables in the logical data model are created in the Gateway service data storage layer. The authorized user identity (assignedPerson) and messageLog tables serve the purposes described above in the Gateway service at each node in the healthcare information network, including the RLS node.
Organization and gateway information is maintained at each node based on the message processing requirements at the node. The RLS maintains the master list of Gateway services at all the network nodes. The Gateway service at each participating node maintains information on the various clinical data sources that it supports. The RLS does not need to know the details of the individual clinical data sources at each node. That information is abstracted by the Gateway service at that node. The RLS maintains a local copy of the OIDs for standard identifier assigning authorities, replicated from centrally maintained registries, e.g. the HL7 OID registry.
RLS accepts patient index data from the distributed sources with patient identifiers qualified by the assigningOrganization id. If these assigningOrganizations are not stored in the RLS Gateway, then the patient record in the CMPI is given an additional prefix: the gatewayDevice.idRoot. The sequence of actions to build up the patient index in the CMPI is shown in Figure 26.
Figure 26: Patient Identifier Composition in CMPI
The patient record location provided by RLS in response to patient index lookup requests will contain the composite patient index made up of: personIDRoot, personIDExtension and the gatewayDevice.telecomURI. The recipient of this patient record location sends a query to the gatewayDevice.telecomURI with the composite patientID information. Since the remote Gateway maintains the assigningOrganization IDs it is able to resolve the composite patient identifier and retrieve the requested medical data.
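The composition of a record location on the RLS side can be sketched as a join between matched CMPI rows and the gatewayDevice table. The sketch assumes each CMPI row carries the gateway prefix (gatewayDevice.idRoot) alongside the patient identifier; the key names `gatewayIDRoot` and the example URI are hypothetical.

```python
def record_locations(cmpi_matches, gateway_devices):
    """Build record locations for the CMPI rows returned by a patient lookup.

    cmpi_matches: iterable of dicts with the keys gatewayIDRoot (assumed
        prefix column), personIDRoot and personIDExtension.
    gateway_devices: dict mapping gatewayDevice.idRoot to its telecomURI.
    """
    locations = []
    for row in cmpi_matches:
        uri = gateway_devices.get(row["gatewayIDRoot"])
        if uri is None:
            # A pointer without a reachable Gateway cannot be queried.
            continue
        locations.append({
            "personIDRoot": row["personIDRoot"],
            "personIDExtension": row["personIDExtension"],
            "telecomURI": uri,
        })
    return locations
```

The recipient would then send its medical record query to each `telecomURI`, passing the root and extension unchanged so that the remote Gateway can resolve them.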
As the above discussion shows, the division of labor between the Gateways at the clinical data source and the RLS requires that the appropriate cross-reference tables be maintained accordingly. At the RLS the various remote Gateways are assigned OIDs and maintained in the gatewayDevice table. The Gateway at the clinical data source, in turn, assigns OIDs to each clinical data source and maintains the cross-reference in the assigningOrg table. The use of OIDs is not mandatory; any local identifier system may be used as long as the patient identifier composition process described above is followed.
8.3.1 Data Quality Management
A general principle is that the CMPI database be a read-only version of patient records as they exist on source systems. The CMPI is intended purely for record matching and is not to be considered a patient registry. Data quality issues are expected to be resolved on the source systems, and cleansed data would replicate to the CMPI. The CMPI is therefore different from an Enterprise MPI in that no central data management organization is envisaged for an RLS. Data cleansing and quality services are not thought to be viable in a community of disparate, autonomous enterprises contributing data into the CMPI.
8.3.2 Data Cache
Message caching (logging) is likely to be required, over and above the persistence service offered by the MQ engine.
The data services layer in the CDX Gateway could serve as a cache for patient EHR data. Alternate architectures use clinical data repositories at the edge to serve the data requests received via information exchanges, so that core clinical data sources are not hit by queries that could degrade the performance of the core clinical system. This aspect is discussed in more detail in Section 6.3. Some nodes may not want to push their EHR data directly to the CMPI, and may instead choose to expose their EMPI or replicate their MPIs to the CMPI. A replicated MPI should have a ‘time-to-live’ based expiry, after which it must be refreshed.
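A ‘time-to-live’ expiry for replicated MPI entries could be sketched as below. The class name, the per-entry expiry policy, and the injectable clock are assumptions for illustration; the source does not specify how expiry or refresh should be implemented.

```python
import time


class ReplicatedMPICache:
    """Hypothetical cache of replicated MPI entries with a fixed time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (stored_at, entry)

    def put(self, key, entry):
        """Store or refresh an entry, restarting its time-to-live."""
        self._entries[key] = (self._clock(), entry)

    def get(self, key):
        """Return the entry, or None if it is missing or has expired."""
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, entry = item
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]   # expired: the caller must refresh from the source
            return None
        return entry

    def stale_keys(self):
        """List keys whose entries have expired and need a refresh."""
        now = self._clock()
        return [k for k, (t, _) in self._entries.items() if now - t >= self._ttl]
```

A `None` result tells the caller to re-replicate the entry from the contributing node rather than serve stale demographics.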