HL7 SQL Ex - Enterprise Only

HL7 SQL Ex - Enterprise Only

Overview

The HL7 SQL feature enables you to query HL7 messages using standard SQL, making it easy to search, filter, and analyze data. Behind the scenes, HL7Spy leverages DuckDB — a high-performance analytics database engine — to deliver fast and efficient query processing.

Getting Started

How HL7Spy Processes Your Query

  • When you execute the query (via the Execute button or by pressing F5), HL7Spy scans the query to identify all referenced HL7 Paths.

  • HL7 messages in the current tab are parsed, and an in-memory DuckDB database is generated using only the fields identified in the query.

  • Each Segment defined in the query is given its own table, and each segment-field is given it’s own column

  • Once the database is populated, the query is passed to the DuckDB engine for processing

If the same HL7 Paths were used in a previous query, the parsing step is skipped to improve performance.

Predefined HL7 Paths (under Options → Define HL7 Paths to Load) are always included during parsing. This can minimize unnecessary reparsing across queries—but adding too many may slow down the initial parsing step.

Data Model

  • Each HL7 segment is represented as a separate table (e.g., MSH, PID, OBX)

  • All tables have an index column and sourceId column (only required when the “Include All Tabs“ option is enabled) for performing joins

  • The combination of sourceId and index uniquely identifies a message in the MSH table

  • All values are stored as strings

  • Tables are created dynamically based on the paths used in the query

HL7 Path Format

HL7 paths follow this format: SEG-F[R].C.S

  • SEG = Segment name (e.g., MSH, PID, OBX)

  • F = Field number

  • [R] = Field repeat (optional, * = any repeat)

  • .C = Component number (optional)

  • .S = Subcomponent number (optional)

Examples:

  • PID-3 = Field 3 in PID segment

  • OBX-5.1.2 = Field 5, component 1, subcomponent 2 in the OBX segment

  • OBX-5[*] = The entire OBX-5 field including all repeats

Writing Queries

Basic Query Structure

--region find all ORUs SELECT msh.index, msh."MSH-9", pid."PID-3", obx."OBX-5[*]" as result FROM MSH msh JOIN PID pid ON msh.index = pid.index JOIN OBX obx ON msh.index = obx.index WHERE msh."MSH-9.2" = 'R01'; --endregion

Key Points

  1. Always join tables using the index column

  2. Use HL7 paths in both SELECT and WHERE clauses

  3. Enclose column names in double quotes

  4. Use table aliases for clarity

  5. Always return the “index” column so HL7Spy can hyperlink between the result row, and the message in the tab

  6. Use ILIKE for case insensitive queries

  7. Use regions to demarcating queries

Regions can be expanded or collapsed and allow you to run a query by placing the cursor anywhere within the region—no need to manually select the text. All SQL within the region boundaries is sent to the database engine for execution.

Sample Queries

Finding Messages by Type

SELECT * FROM MSH WHERE "MSH-9" = 'ADT^A01';

Patient Demographics

SELECT msh.index, msh."MSH-9.1" as message_type, pid."PID-3" as patient_id, pid."PID-5.1" as last_name, pid."PID-5.2" as first_name, pid."PID-7" as birth_date FROM MSH msh JOIN PID pid ON msh.index = pid.index;

Lab Results

SELECT msh.index, msh."MSH-9.1" as message_type, obr."OBR-4.1" as test_name, obx."OBX-5[*]" as result_value, obx."OBX-6" as units, obx."OBX-8" as reference_range FROM MSH msh JOIN OBR obr ON msh.index = obr.index JOIN OBX obx ON msh.index = obx.index WHERE obr."OBR-4.1" ILIKE '%GLU%';

Find all messages that have more than 1 IN1 segment

WITH in1_counts AS ( SELECT in1."IN1-0", -- IN1-0 is the segment name IN1 index, COUNT(*) AS in1_count FROM IN1 in1 GROUP BY in1."IN1-0", index ) SELECT MSH.* FROM MSH JOIN in1_counts USING (index) WHERE in1_counts.in1_count > 1;

Find all messages that have more than 1 IN1 segment across all loaded tabs

WITH in1_counts AS ( SELECT in1."IN1-0", -- Segment identifier in1.index, in1.sourceId, COUNT(*) AS in1_count FROM IN1 in1 GROUP BY in1.sourceId, in1.index, in1."IN1-0" ) SELECT msh.* FROM MSH msh JOIN in1_counts in1c ON msh.sourceId = in1c.sourceId AND msh.index = in1c.index WHERE in1c.in1_count > 1;

Advanced - Find the last ORU foreach male patient

WITH ORU_R01_Messages AS ( SELECT index FROM MSH WHERE "MSH-9" = 'ORU^R01' ), FemalePatients AS ( SELECT index, "PID-3" AS patient_id FROM PID WHERE "PID-8" = 'M' ), MatchingMessages AS ( SELECT p.patient_id, p.index FROM ORU_R01_Messages r JOIN FemalePatients p USING (index) ), LastMessagePerPatient AS ( SELECT patient_id, MAX(index) AS last_index FROM MatchingMessages GROUP BY patient_id ) SELECT * FROM LastMessagePerPatient l JOIN MSH m on l.last_index = m.index ORDER BY last_index;

Tips and Best Practices

Query Structure

  • Start with MSH table for message type filtering

  • Use meaningful table aliases

  • Include relevant fields in SELECT clause

  • Add appropriate WHERE clauses for filtering

  • Use SEG-0 to refer to a segment name in a query. For example, IN1-0 evaluates to IN1

Error Prevention

  • Double-check HL7 path syntax

  • Verify segment names are correct

  • Ensure proper quoting of column names

  • Test queries with a small dataset first

Troubleshooting

Common Issues

Query Execution Errors

  • Error: "Table not found": Ensure the segment exists in your data

  • Error: "Column not found": Verify the HL7 path syntax

  • Error: "Syntax error": Check SQL syntax and quotes

Query Structure Issues

  • Missing joins on index column: Always join tables using the index column

  • Incorrect table aliases: Use meaningful and consistent aliases

  • Missing quotes around column names: Always use double quotes for HL7 paths

  • Incorrect string matching in WHERE clauses: Use exact string matches

Data Handling Issues

  • Assuming numeric comparisons on string values: All values are stored as strings

  • Not handling empty or null values: Use IS NULL or IS NOT NULL in WHERE clauses

  • Incorrect composite value matching: Use exact string matches for composite values

  • Missing field repeats in joins: Include field repeats in join conditions when necessary

Additional Resources

  • HL7 v2 Message Structure Documentation

  • DuckDB SQL Documentation

ChatGPT Instructions

ChatGPT is an invaluable resource for writing complex queries. Below is a prompt designed to help ChatGPT generate SQL queries tailored specifically for HL7 data.

 

You are an SQL assistant for querying HL7 data in a DuckDB database. ### Data Model - Each HL7 segment is a table (e.g., MSH, PID) - A combination of the index and sourceId columns uniquely identifies a row in the MSH table and can be used for joining other tables. The sourceId column is only required if multiple tabs are being queried. - Columns are double quoted HL7 Paths. Use "SEG-F" unless explicitly specified by the user - All values are strings ### HL7 Path Format "SEG-F[R].C.S" - [R] = field repeat (optional, * = all repeats) - .C = component (optional) - .S = subcomponent (optional) Examples: - PID-3 → field 3 in PID - OBX-5.1.2 → field 5, comp 1, subcomp 2 ### Instructions - Write DuckDB SQL only. Do not explain. - Join tables on both 'index' and 'sourceId' columns - Column names (HL7 Paths) must double-quoted and be used verbatim in both the SELECT and WHERE clauses - Do not simplify or normalize column names (e.g., "OBX-5[*]" is a valid column name and must be used as-is). - Treat composite values (e.g., 'ORU^R01') as full string matches - Use CTEs and aliases when clearer