HL7 SQL Ex - Enterprise Only

Overview

The HL7 SQL feature enables you to query HL7 messages using standard SQL, making it easy to search, filter, and analyze data. Behind the scenes, HL7Spy leverages DuckDB — a high-performance analytics database engine — to deliver fast and efficient query processing.

Getting Started

How HL7Spy Processes Your Query

When you execute the query (via the Execute button or by pressing F5), HL7Spy scans the query to identify all referenced HL7 Paths.
HL7 messages in the current tab are parsed, and an in-memory DuckDB database is generated using only the fields identified in the query.
Each Segment defined in the query is given its own table, and each segment-field is given it’s own column
Once the database is populated, the query is passed to the DuckDB engine for processing

If the same HL7 Paths were used in a previous query, the parsing step is skipped to improve performance.

Predefined HL7 Paths (under Options → Define HL7 Paths to Load) are always included during parsing. This can minimize unnecessary reparsing across queries—but adding too many may slow down the initial parsing step.

Data Model

Each HL7 segment is represented as a separate table (e.g., MSH, PID, OBX)
All tables have an index column and sourceId column (only required when the “Include All Tabs“ option is enabled) for performing joins
The combination of sourceId and index uniquely identifies a message in the MSH table
All values are stored as strings
Tables are created dynamically based on the paths used in the query

HL7 Path Format

HL7 paths follow this format: SEG-F[R].C.S

SEG = Segment name (e.g., MSH, PID, OBX)
F = Field number
[R] = Field repeat (optional, * = any repeat)
.C = Component number (optional)
.S = Subcomponent number (optional)

Examples:

PID-3 = Field 3 in PID segment
OBX-5.1.2 = Field 5, component 1, subcomponent 2 in the OBX segment
OBX-5[*] = The entire OBX-5 field including all repeats

Writing Queries

Basic Query Structure

--region find all ORUs
SELECT msh.index, msh."MSH-9",
       pid."PID-3",
       obx."OBX-5[*]" as result
FROM MSH msh
JOIN PID pid ON msh.index = pid.index
JOIN OBX obx ON msh.index = obx.index
WHERE msh."MSH-9.2" = 'R01';
--endregion

Key Points

Always join tables using the index column
Use HL7 paths in both SELECT and WHERE clauses
Enclose column names in double quotes
Use table aliases for clarity
Always return the “index” column so HL7Spy can hyperlink between the result row, and the message in the tab
Use ILIKE for case insensitive queries
Use regions to demarcating queries

Regions can be expanded or collapsed and allow you to run a query by placing the cursor anywhere within the region—no need to manually select the text. All SQL within the region boundaries is sent to the database engine for execution.

Sample Queries

Finding Messages by Type

SELECT * 
FROM MSH 
WHERE "MSH-9" = 'ADT^A01';

Patient Demographics

SELECT msh.index, msh."MSH-9.1" as message_type,
       pid."PID-3" as patient_id,
       pid."PID-5.1" as last_name,
       pid."PID-5.2" as first_name,
       pid."PID-7" as birth_date
FROM MSH msh
JOIN PID pid ON msh.index = pid.index;

Lab Results

SELECT msh.index, msh."MSH-9.1" as message_type,
       obr."OBR-4.1" as test_name,
       obx."OBX-5[*]" as result_value,
       obx."OBX-6" as units,
       obx."OBX-8" as reference_range
FROM MSH msh
JOIN OBR obr ON msh.index = obr.index
JOIN OBX obx ON msh.index = obx.index
WHERE obr."OBR-4.1" ILIKE '%GLU%';

Find all messages that have more than 1 IN1 segment

WITH in1_counts AS (
  SELECT
    in1."IN1-0", -- IN1-0 is the segment name IN1
    index,
    COUNT(*) AS in1_count
  FROM IN1 in1
  GROUP BY in1."IN1-0", index
)
SELECT
  MSH.*
FROM MSH
JOIN in1_counts USING (index)
WHERE in1_counts.in1_count > 1;

Find all messages that have more than 1 IN1 segment across all loaded tabs

WITH in1_counts AS (
  SELECT
    in1."IN1-0",        -- Segment identifier
    in1.index,
    in1.sourceId,
    COUNT(*) AS in1_count
  FROM IN1 in1
  GROUP BY in1.sourceId, in1.index, in1."IN1-0"
)
SELECT
  msh.*
FROM MSH msh
JOIN in1_counts in1c 
  ON msh.sourceId = in1c.sourceId
 AND msh.index = in1c.index
WHERE in1c.in1_count > 1;

Advanced - Find the last ORU foreach male patient

WITH ORU_R01_Messages AS (
  SELECT index
FROM MSH
WHERE "MSH-9" = 'ORU^R01'
  ),
FemalePatients AS (
  SELECT index, "PID-3" AS patient_id
FROM PID
WHERE "PID-8" = 'M'
  ),
MatchingMessages AS (
  SELECT p.patient_id, p.index
  FROM ORU_R01_Messages r
JOIN FemalePatients p USING (index)
  ),
LastMessagePerPatient AS (
  SELECT patient_id, MAX(index) AS last_index
    FROM MatchingMessages
    GROUP BY patient_id
)
SELECT *
  FROM LastMessagePerPatient l
JOIN MSH m on l.last_index = m.index 
ORDER BY last_index;

Tips and Best Practices

Query Structure

Start with MSH table for message type filtering
Use meaningful table aliases
Include relevant fields in SELECT clause
Add appropriate WHERE clauses for filtering
Use SEG-0 to refer to a segment name in a query. For example, IN1-0 evaluates to IN1

Error Prevention

Double-check HL7 path syntax
Verify segment names are correct
Ensure proper quoting of column names
Test queries with a small dataset first

Troubleshooting

Common Issues

Query Execution Errors

Error: "Table not found": Ensure the segment exists in your data
Error: "Column not found": Verify the HL7 path syntax
Error: "Syntax error": Check SQL syntax and quotes

Query Structure Issues

Missing joins on index column: Always join tables using the index column
Incorrect table aliases: Use meaningful and consistent aliases
Missing quotes around column names: Always use double quotes for HL7 paths
Incorrect string matching in WHERE clauses: Use exact string matches

Data Handling Issues

Assuming numeric comparisons on string values: All values are stored as strings
Not handling empty or null values: Use IS NULL or IS NOT NULL in WHERE clauses
Incorrect composite value matching: Use exact string matches for composite values
Missing field repeats in joins: Include field repeats in join conditions when necessary

Additional Resources

HL7 v2 Message Structure Documentation
DuckDB SQL Documentation

ChatGPT Instructions

ChatGPT is an invaluable resource for writing complex queries. Below is a prompt designed to help ChatGPT generate SQL queries tailored specifically for HL7 data.

You are an SQL assistant for querying HL7 data in a DuckDB database.

### Data Model
- Each HL7 segment is a table (e.g., MSH, PID)
- A combination of the index and sourceId columns uniquely identifies a row in the MSH table
  and can be used for joining other tables. The sourceId column is only required if multiple tabs are being queried.
- Columns are double quoted HL7 Paths. Use "SEG-F" unless explicitly specified by the user
- All values are strings

### HL7 Path Format
"SEG-F[R].C.S"  
- [R] = field repeat (optional, * = all repeats)
- .C = component (optional)
- .S = subcomponent (optional)

Examples:
- PID-3 → field 3 in PID
- OBX-5.1.2 → field 5, comp 1, subcomp 2

### Instructions
- Write DuckDB SQL only. Do not explain.
- Join tables on both 'index' and 'sourceId' columns
- Column names (HL7 Paths) must double-quoted and be used verbatim in both the SELECT and WHERE clauses
- Do not simplify or normalize column names (e.g., "OBX-5[*]" is a valid column name and must be used as-is).
- Treat composite values (e.g., 'ORU^R01') as full string matches
- Use CTEs and aliases when clearer