Hardware & Performance Observations
Over the last week I have been running Connexion with different hardware configurations to help pin down some platform recommendations. The base system is an Ivy Bridge i7-3770K (quad-core desktop) with 8 GB of DDR3-1600 RAM. Connexion and SQL Server Standard are both running on the same box.
Channels used
Disk tests were performed with four identical channels, each consisting of a file reader, a queue, an HL7 transform, and a file writer.
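As a rough sketch only (this is not Connexion's implementation; the transform and function names are made up for illustration), a channel of this shape can be modeled as a reader feeding a queue, a transform stage, and a writer:

```python
import queue
import threading

def hl7_transform(msg: str) -> str:
    """Placeholder transform; a real channel would remap HL7 segments/fields."""
    return msg.replace("PID|", "PID|X|")

def run_channel(messages):
    """Sketch of one test channel: file reader -> queue -> HL7 transform -> file writer."""
    q = queue.Queue()
    written = []                      # stands in for the file writer's output

    def writer():
        while True:
            msg = q.get()
            if msg is None:           # sentinel: reader is done
                break
            written.append(hl7_transform(msg))

    t = threading.Thread(target=writer)
    t.start()
    for m in messages:                # the "file reader" stage
        q.put(m)
    q.put(None)
    t.join()
    return written
```

In the real tests, four channels of this shape ran concurrently, which is why CPU saturation tracks the channel count in the recommendations below.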
There are several measures of interest:
Steady-state throughput: What is the maximum throughput of Connexion when queueing and processing multiple channels, and what are the resource bottlenecks?
Steady-state consistency: Is the performance consistent during processing or do the speeds fluctuate?
DB locking: Do maintenance and message querying during processing cause database locks and slowdowns?
Summary
| Test | Throughput (msg/s) | Consistency | CPU% | Disk% | Index Rebuild | Limit |
|---|---|---|---|---|---|---|
| Single spindle, all DB files | 3,200 | Fair | 30% | 100% | Very slow (>5:00/7M records) | Disk |
| RAID 0 spindles, all DB files | 5,300 | Fair | 50% | 96% | Very slow (3:53/7M records) | Disk |
| LDF spindle, MDF/NDF spindle | 5,000 | Good | 45% | ~100% log, <5% MDF/NDF | Very slow (1:45/2.5M records) | Disk |
| MDF/NDF on RAID 0 spindles, log on SSD | 11,000 | Good | 100% | ~50% SSD, ~5% HDD | Very slow (0:45/2.5M records) | CPU |
| RAID 0 SSD, all DB files | 11,000 | Good | 100% | 100% | Fast (0:03/4.5M records) | CPU |
| MDF/NDF on SSD, log on RAID 0 SSD | 11,000 | Good | 100% | Log @ 30%, MDF/NDF @ 5% | Fast (0:16/7.4M records) | CPU |
| SQL via 100 Mbit | 4,000 | Good | 20% app serv / 60% DB serv | 30% DB serv | Fast (0:03/1M records) | Network |
| SQL via 1000 Mbit | 4,800 | Good | 24% app serv / 80% DB serv | 37% DB serv | Fast (0:03/1M records) | Network |
| SQL via 1000 Mbit direct attach | 5,000 | Good | 25% app serv / 80% DB serv | 37% DB serv | Fast (0:03/1M records) | Network |
| Async mirror (primary = MDF/NDF on SSD, LDF on SSD; fail-over = all on SSD) | 9,200 | Good | 100% primary, 55% fail-over | 30% primary, 97% fail-over | Fast (0:14/3M records) | CPU |
| Sync mirror (primary = MDF/NDF on SSD, LDF on SSD; fail-over = all on SSD) | 2,900 | Good | 40% primary, 30% fail-over | 4% primary, 90% fail-over | Fast (0:14/6M records) | Network |
| Cxn on 3770K, SQL on 4770K, gigabit, all SSD | 15,000 | Good | 80% Cxn, 60% SQL | Log @ 30%, MDF/NDF @ 5% | Fast (0:20/8M records) | ? |
In both the single-spindle and RAID 0 spindle tests, disk I/O was the limiting resource; once the CPU was saturated, the speed was fixed regardless of disk configuration. Placing the database log file on a separate drive provided the biggest increase in throughput; however, with the MDF/NDF files on spindles, the index rebuild was still very slow. Splitting the MDF/NDF and the log file onto separate SSDs provided the most headroom for the system, and most likely enough throughput to saturate a second CPU.
The log file saw the highest write speeds, capping at about 30 MB/s. This translates into about 2.6 terabytes of data written over 24 hours for a fully saturated 4-core system. Enterprise SSDs such as the Intel S3700 are rated for 10 complete drive writes per day over a 5-year lifespan. Two 512 GB SSDs in a RAID 1 configuration would therefore provide a minimum of 10 years of life for a fully saturated 4-core system; on an 8-core system, this lifespan would be halved to a minimum of 5 years. On our test setup we are processing 65 million messages per hour (1.5 billion messages per day), so this endurance should be adequate.
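The endurance arithmetic above can be checked directly. This is a sketch; the 30 MB/s write rate and the S3700 rating (10 drive writes per day for 5 years) are the figures from the text:

```python
SECONDS_PER_DAY = 86_400

log_write_mb_s = 30                               # observed peak log write rate
daily_write_tb = log_write_mb_s * SECONDS_PER_DAY / 1_000_000
# about 2.6 TB written per day on a fully saturated 4-core system

drive_tb = 0.512                                  # one 512 GB SSD (RAID 1 mirrors all writes)
rated_endurance_tb = drive_tb * 10 * 365 * 5      # 10 drive writes/day for 5 years
lifespan_years = rated_endurance_tb / (daily_write_tb * 365)
# roughly 10 years at this write rate; halved on an 8-core system writing twice as much
```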
The data and index files benefit greatly from SSDs as well, since this minimizes database locking during maintenance. Querying for messages via the queue UI should also perform much better, as an SSD can service reads and writes concurrently whereas a spindle only operates serially. Assuming the database will be medium-sized (up to 2 TB), a RAID 10 array of SSDs (4 x 1TB, 8 x 512GB, etc.) could be used. SSD capacities are increasing quickly, so 2+ TB drives are probably not far away.
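For sizing, RAID 10 usable capacity is simply half the raw total, which is why both example arrays land at about 2 TB:

```python
def raid10_usable_tb(n_drives: int, drive_tb: float) -> float:
    """RAID 10 stripes across mirrored pairs, so usable space is half the raw capacity."""
    if n_drives % 2 != 0 or n_drives < 4:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return n_drives * drive_tb / 2
```

For example, `raid10_usable_tb(4, 1.0)` gives 2.0 TB and `raid10_usable_tb(8, 0.512)` gives about 2.05 TB.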
I would think that with decent hardware, the following should be considered for performance (from most important to least):
1. Connexion and SQL Server on the same OS, if there is sufficient CPU/disk headroom. If a second box is available for SQL Server, it should have similar specs/clock to the Cxn server.
2. Log file on a non-shared, fast drive (queueing and processing performance).
3. MDF/NDF on a non-shared, fast drive (database maintenance performance).
4. If the CPU is now the limiting factor: increase the core count when the number of channels exceeds the number of cores, or increase the clock speed when the number of channels is equal to or less than the number of cores.
5. Ensure that SQL Server has adequate memory allocated (8 GB of total system memory appears to be adequate for smaller systems).
Networking impact
When Connexion was tested with SQL Server on a separate physical machine over a consumer-grade gigabit network, performance was cut by roughly 55%. Interestingly, we are nowhere near saturating the Ethernet pipe (pushing about 110 Mbit/s combined), so moving to a lower-latency platform could shift the bottleneck back to the disks and/or CPU. Some CPU overhead and latency is expected as the packet stream is written and read, but the actual performance drop is quite significant. If adequate hardware is available, it makes sense to host both Connexion and SQL Server on the same physical machine.
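One plausible explanation for a large drop with an unsaturated pipe is latency rather than bandwidth: if each message waits on a synchronous round trip to the database, throughput per connection is capped at the inverse of the round-trip time. The round-trip values below are illustrative assumptions, not measurements from these tests:

```python
def max_msgs_per_sec(round_trip_ms: float, trips_per_msg: int = 1) -> float:
    """Upper bound on per-connection throughput when each message blocks on
    synchronous database round trips (latency-bound, not bandwidth-bound)."""
    return 1000.0 / (round_trip_ms * trips_per_msg)

# Assumed RTTs for illustration: ~0.2 ms over consumer gigabit vs ~0.02 ms locally.
# max_msgs_per_sec(0.2)  -> 5,000/s, close to the observed networked rate
# max_msgs_per_sec(0.02) -> 50,000/s, at which point latency is no longer the cap
```

If this model is roughly right, batching multiple messages per round trip or reducing network latency would recover most of the lost throughput.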
Update:
We added a new computer for testing (i7 4770K, 16GB RAM, SSDs), provisioned as a SQL Server host for Nick's dev box (i7 3770K, 8GB, SSDs). This setup saw the queueing-only speed drop from ~13K/s on a single box to ~11K/s across both. Neither CPU was breaking a sweat, nor were the disks; I suspect this is the impact of gigabit network latency, as the throughput numbers were well below the theoretical maximum. As soon as both queueing and processing were happening simultaneously, the single-box setup was limited by available CPU and total throughput dropped to about 9K/s. In the dual-box setup, the Cxn server hit about 80% CPU and the SQL server about 50%, and the processing rate increased to 15K/s.
There appears to be about a 15% performance penalty when moving to a two-box setup using consumer-grade gigabit equipment. This does, however, allow us to scale the CPU, memory, and disks horizontally and therefore end up with a significantly higher overall throughput. It does appear that the CPU of the SQL Server host needs to be similar to that used by the Connexion host. When SQL Server was hosted on a slower CPU, but with fast disks and gigabit ethernet, the processing speed dropped significantly (and in this test, none of the SQL Server host's resources were maxed out).
I would conclude that it is probably still more cost-effective and 'management-effective' to scale vertically to two-socket hardware with quality CPUs than to use two single-socket servers with the same CPUs (i.e., Cxn + SQL on 2 x Xeon vs. Cxn on 1 x Xeon + SQL on 1 x Xeon). This may not hold, however, if your Cxn servers are CPU-bound but you have lots of headroom on your SQL host; in that case it makes sense to add Cxn boxes until the SQL host is saturated, then provision additional SQL hosts as required.
Thoughts
Connexion's performance depends almost entirely on the workload (channels and devices). We've seen some impressive performance numbers on our in-house development machines (quad-core i7) with local SQL Server instances using relatively light test channels. It seems likely that commodity servers (single quad-core Xeon) with SSDs should be able to handle reasonable loads in an extremely cost-effective way. In single-socket systems with SSDs, I would expect CPU to be the limiting resource, so a move to a dual-socket architecture and processors with more cores should provide an easy and economical upgrade path.
| Low-end server | Cost (USD) |
|---|---|
| Dell Single Xeon | 6,500 |
| SSDs | 4,600 |
| Software | Included |
| Total Cost | 11,100 |
| High-end server | Cost (USD) |
|---|---|
| Dell Dual Xeon | 6,700 |
| SSDs | 4,600 |
| Software (SQL) | 3,000 |
| Total Cost | 14,300 |
Individual Tests:
Single Spindle (All Files)
7200 RPM spindle (WD Caviar Green 1TB).
Max Throughput | 3200 |
Consistency | Fair |
CPU | ~30% |
Disk | 100% |
Index Rbld | more than 5 minutes for 7M records |
Dual Spindle (Raid 0, All Files)
7200 RPM spindles (WD Caviar Green 1TB) x 2 in RAID 0.
Max Throughput | 5300 |
Consistency | Fair |
CPU | ~50% |
Disk | 96% |
Index Rbld | 3:53/7M records |
Two Spindles
7200 RPM spindles (WD Caviar Green 1TB) x 2; log file on one, MDF/NDF on the other.
Max Throughput | 5000 |
Consistency | Good |
CPU | 45% |
Disk | ~100% Log, <5% MDF/NDF |
Index Rbld | 1:45/2.5M records |
MDF & NDF on Raid 0 Spindles, Log on SSD
7200 RPM spindles (WD Caviar Green 1TB) x 2 in RAID 0 hosting the MDF and NDF only. Log file on RAID 0 SSD (2 x Samsung 840 Pro 125GB).
Max Throughput | 11000 |
Consistency | Good |
CPU | 100% |
Disk | ~50% SSD, ~5% HDD |
Index Rbld | 0:45/2.5M records |
Raid 0 SSD Only
All DB files on Raid 0 SSD (2 x Samsung 840 Pro 125GB).
Max Throughput | 11000 |
Consistency | Good |
CPU | 100% |
Disk | ~100% SSD |
Index Rbld | 0:03/4.5M records |
Log on Raid 0 SSD, MDF/NDF on Single SSD
Log on RAID 0 SSD (2 x Samsung 840 Pro 125GB); MDF/NDF on a 64GB OCZ Vertex 3.
Max Throughput | 11000 |
Consistency | Good |
CPU | 100% |
Disk | Log @ 30%, MDF/NDF @ 5% |
Index Rbld | 0:16/7.4M records |
100 Mb Network (with switch)
SQL Server running on an i3-530, connected via a 100 Mbit switch.
Max Throughput | 4000 |
Consistency | Good |
CPU (app serv) | 20% |
CPU (db serv) | 60% |
Disk (db serv) | 30% |
Index Rbld | 0:03/1M records |
(Performance graphs: DB server and app server.)
1000 Mb Network
SQL Server running on an i3-530, connected via a 1000 Mbit switch.
Max Throughput | 4800 |
Consistency | Good |
CPU (app serv) | 24% |
CPU (db serv) | 80% |
Disk (db serv) | 37% |
Index Rbld | 0:03/1M records |
(Performance graphs: DB server and app server.)
1000 Mb Network Direct connection (no switch)
Interestingly, there appeared to be a very slight speed bump when bypassing the switch altogether, although the 4% increase may have been due to other factors.
Jumbo Packets
No difference in speed was observed using jumbo packets.
Database Mirroring (SQL 2012) Async
Database mirroring was set up between Nick's dev machine (Cxn + SQL Server) and the VM server (i3, 8GB, SSDs).
Max Throughput | 9200 |
Consistency | Good |
CPU (app serv) | 100% |
CPU (db serv) | 55% |
Disk | 30% Primary, 97% Fail-over |
Index Rbld | 0:14/3M records |
It appears that SQL Server requires significantly more CPU with mirroring enabled: when queueing only, we see the CPU spike to 35-50% vs. 10-15% for an unmirrored database. Disk writes on the primary don't appear to be any higher than normal, but the fail-over sees quite high disk-write speeds, averaging 70-100 MB/s with spikes in the high 100s; high enough to warrant an SSD. From the database mirroring monitor, the fail-over instance appeared to keep up quite well, as the unsent and unrestored log sizes stayed at 0 most of the time.
It would appear that CPU is the limiting factor in the processing speed, with SQL Server stealing more CPU away from Connexion.
(Performance graphs: primary and fail-over.)
Database Mirroring (SQL 2012) Sync
Database mirroring was set up between Nick's dev machine (Cxn + SQL Server) and the VM server (i3, 8GB, SSDs).
Max Throughput | 2,900 |
Consistency | Good |
CPU (app serv) | 40% |
CPU (db serv) | 30% |
Disk | 4% Primary, 90% Fail-over |
Index Rbld | 0:14/6M records |
(Performance graphs: primary and fail-over.)