HL7 SQL Ex - Enterprise Only
Overview
The HL7 SQL feature enables you to query HL7 messages using standard SQL, making it easy to search, filter, and analyze data. Behind the scenes, HL7Spy leverages DuckDB — a high-performance analytics database engine — to deliver fast and efficient query processing.
Getting Started
How HL7Spy Processes Your Query
When you execute the query (via the Execute button or by pressing F5), HL7Spy scans the query to identify all referenced HL7 Paths.
HL7 messages in the current tab are parsed, and an in-memory DuckDB database is generated using only the fields identified in the query.
Each Segment defined in the query is given its own table, and each segment-field is given it’s own column
Once the database is populated, the query is passed to the DuckDB engine for processing
If the same HL7 Paths were used in a previous query, the parsing step is skipped to improve performance.
Predefined HL7 Paths (under Options → Define HL7 Paths to Load) are always included during parsing. This can minimize unnecessary reparsing across queries—but adding too many may slow down the initial parsing step.
Data Model
Each HL7 segment is represented as a separate table (e.g., MSH, PID, OBX)
All tables have an index column and sourceId column (only required when the “Include All Tabs“ option is enabled) for performing joins
The combination of sourceId and index uniquely identifies a message in the MSH table
All values are stored as strings
Tables are created dynamically based on the paths used in the query
HL7 Path Format
HL7 paths follow this format: SEG-F[R].C.S
SEG = Segment name (e.g., MSH, PID, OBX)
F = Field number
[R] = Field repeat (optional, * = any repeat)
.C = Component number (optional)
.S = Subcomponent number (optional)
Examples:
PID-3 = Field 3 in PID segment
OBX-5.1.2 = Field 5, component 1, subcomponent 2 in the OBX segment
OBX-5[*] = The entire OBX-5 field including all repeats
Writing Queries
Basic Query Structure
--region find all ORUs
SELECT msh.index, msh."MSH-9",
pid."PID-3",
obx."OBX-5[*]" as result
FROM MSH msh
JOIN PID pid ON msh.index = pid.index
JOIN OBX obx ON msh.index = obx.index
WHERE msh."MSH-9.2" = 'R01';
--endregionKey Points
Always join tables using the index column
Use HL7 paths in both SELECT and WHERE clauses
Enclose column names in double quotes
Use table aliases for clarity
Always return the “index” column so HL7Spy can hyperlink between the result row, and the message in the tab
Use ILIKE for case insensitive queries
Use regions to demarcating queries
Regions can be expanded or collapsed and allow you to run a query by placing the cursor anywhere within the region—no need to manually select the text. All SQL within the region boundaries is sent to the database engine for execution.
Sample Queries
Finding Messages by Type
SELECT *
FROM MSH
WHERE "MSH-9" = 'ADT^A01';Patient Demographics
SELECT msh.index, msh."MSH-9.1" as message_type,
pid."PID-3" as patient_id,
pid."PID-5.1" as last_name,
pid."PID-5.2" as first_name,
pid."PID-7" as birth_date
FROM MSH msh
JOIN PID pid ON msh.index = pid.index;Lab Results
SELECT msh.index, msh."MSH-9.1" as message_type,
obr."OBR-4.1" as test_name,
obx."OBX-5[*]" as result_value,
obx."OBX-6" as units,
obx."OBX-8" as reference_range
FROM MSH msh
JOIN OBR obr ON msh.index = obr.index
JOIN OBX obx ON msh.index = obx.index
WHERE obr."OBR-4.1" ILIKE '%GLU%';Find all messages that have more than 1 IN1 segment
WITH in1_counts AS (
SELECT
in1."IN1-0", -- IN1-0 is the segment name IN1
index,
COUNT(*) AS in1_count
FROM IN1 in1
GROUP BY in1."IN1-0", index
)
SELECT
MSH.*
FROM MSH
JOIN in1_counts USING (index)
WHERE in1_counts.in1_count > 1;Find all messages that have more than 1 IN1 segment across all loaded tabs
WITH in1_counts AS (
SELECT
in1."IN1-0", -- Segment identifier
in1.index,
in1.sourceId,
COUNT(*) AS in1_count
FROM IN1 in1
GROUP BY in1.sourceId, in1.index, in1."IN1-0"
)
SELECT
msh.*
FROM MSH msh
JOIN in1_counts in1c
ON msh.sourceId = in1c.sourceId
AND msh.index = in1c.index
WHERE in1c.in1_count > 1;Advanced - Find the last ORU foreach male patient
WITH ORU_R01_Messages AS (
SELECT index
FROM MSH
WHERE "MSH-9" = 'ORU^R01'
),
FemalePatients AS (
SELECT index, "PID-3" AS patient_id
FROM PID
WHERE "PID-8" = 'M'
),
MatchingMessages AS (
SELECT p.patient_id, p.index
FROM ORU_R01_Messages r
JOIN FemalePatients p USING (index)
),
LastMessagePerPatient AS (
SELECT patient_id, MAX(index) AS last_index
FROM MatchingMessages
GROUP BY patient_id
)
SELECT *
FROM LastMessagePerPatient l
JOIN MSH m on l.last_index = m.index
ORDER BY last_index;Tips and Best Practices
Query Structure
Start with MSH table for message type filtering
Use meaningful table aliases
Include relevant fields in SELECT clause
Add appropriate WHERE clauses for filtering
Use SEG-0 to refer to a segment name in a query. For example, IN1-0 evaluates to IN1
Error Prevention
Double-check HL7 path syntax
Verify segment names are correct
Ensure proper quoting of column names
Test queries with a small dataset first
Troubleshooting
Common Issues
Query Execution Errors
Error: "Table not found": Ensure the segment exists in your data
Error: "Column not found": Verify the HL7 path syntax
Error: "Syntax error": Check SQL syntax and quotes
Query Structure Issues
Missing joins on index column: Always join tables using the index column
Incorrect table aliases: Use meaningful and consistent aliases
Missing quotes around column names: Always use double quotes for HL7 paths
Incorrect string matching in WHERE clauses: Use exact string matches
Data Handling Issues
Assuming numeric comparisons on string values: All values are stored as strings
Not handling empty or null values: Use IS NULL or IS NOT NULL in WHERE clauses
Incorrect composite value matching: Use exact string matches for composite values
Missing field repeats in joins: Include field repeats in join conditions when necessary
Additional Resources
HL7 v2 Message Structure Documentation
DuckDB SQL Documentation
ChatGPT Instructions
ChatGPT is an invaluable resource for writing complex queries. Below is a prompt designed to help ChatGPT generate SQL queries tailored specifically for HL7 data.
You are an SQL assistant for querying HL7 data in a DuckDB database.
### Data Model
- Each HL7 segment is a table (e.g., MSH, PID)
- A combination of the index and sourceId columns uniquely identifies a row in the MSH table
and can be used for joining other tables. The sourceId column is only required if multiple tabs are being queried.
- Columns are double quoted HL7 Paths. Use "SEG-F" unless explicitly specified by the user
- All values are strings
### HL7 Path Format
"SEG-F[R].C.S"
- [R] = field repeat (optional, * = all repeats)
- .C = component (optional)
- .S = subcomponent (optional)
Examples:
- PID-3 → field 3 in PID
- OBX-5.1.2 → field 5, comp 1, subcomp 2
### Instructions
- Write DuckDB SQL only. Do not explain.
- Join tables on both 'index' and 'sourceId' columns
- Column names (HL7 Paths) must double-quoted and be used verbatim in both the SELECT and WHERE clauses
- Do not simplify or normalize column names (e.g., "OBX-5[*]" is a valid column name and must be used as-is).
- Treat composite values (e.g., 'ORU^R01') as full string matches
- Use CTEs and aliases when clearer