Hardware & Performance Observations

Over the last week I have been running Connexion with different hardware configurations to help pin down some platform recommendations. The base system is an Ivy Bridge i7-3770K (quad-core desktop) with 8 GB of 1600 MHz RAM. Connexion and SQL Server Standard are both running on the same box.

Channels used

Disk tests were performed with four identical channels, each consisting of a file reader, a queue, an HL7 transform, and a file writer.

There are several measures of interest:

  1. Steady-state throughput: What is the maximum throughput of Connexion when queueing and processing across multiple channels, and what are the resource bottlenecks?

  2. Steady-state consistency: Is the performance consistent during processing or do the speeds fluctuate?

  3. DB locking: Do maintenance and message querying during processing cause database locks and slowdowns?

Summary

| Test | Throughput (msgs/sec) | Consistency | CPU % | Disk % | Index Rebuild | Limit |
|------|----------------------|-------------|-------|--------|---------------|-------|
| Single spindle, all DB files | 3,200 | Fair | 30% | 100% | Very Slow (>5:00/7M records) | Disk |
| RAID 0 spindles, all DB files | 5,300 | Fair | 50% | 96% | Very Slow (3:53/7M records) | Disk |
| LDF spindle, MDF/NDF spindle | 5,000 | Good | 45% | ~100% log, <5% MDF/NDF | Very Slow (1:45/2.5M records) | Disk |
| MDF/NDF on RAID 0 spindles, log on SSD | 11,000 | Good | 100% | ~50% SSD, ~5% HDD | Very Slow (0:45/2.5M records) | CPU |
| RAID 0 SSD, all DB files | 11,000 | Good | 100% | 100% | Fast (0:03/4.5M records) | CPU |
| MDF/NDF on SSD, log on RAID 0 SSD | 11,000 | Good | 100% | Log @ 30%, MDF/NDF @ 5% | Fast (0:16/7.4M records) | CPU |
| SQL via 100 Mbit | 4,000 | Good | 20% app / 60% DB | 30% DB | Fast (0:03/1M records) | Network |
| SQL via 1000 Mbit | 4,800 | Good | 24% app / 80% DB | 37% DB | Fast (0:03/1M records) | Network |
| SQL via 1000 Mbit, direct attach | 5,000 | Good | 25% app / 80% DB | 37% DB | Fast (0:03/1M records) | Network |
| Async mirror (primary = MDF/NDF on SSD, LDF on SSD; fail-over = all on SSD) | 9,200 | Good | 100% primary, 55% fail-over | 30% primary, 97% fail-over | Fast (0:14/3M records) | CPU |
| Sync mirror (primary = MDF/NDF on SSD, LDF on SSD; fail-over = all on SSD) | 2,900 | Good | 40% primary, 30% fail-over | 4% primary, 90% fail-over | Fast (0:14/6M records) | Network |
| Cxn on 3770K, SQL on 4770K, gigabit, all SSD | 15,000 | Good | 80% Cxn, 60% SQL | Log @ 30%, MDF/NDF @ 5% | Fast (0:20/8M records) | ? |

In both the single-spindle and RAID 0 spindle tests, disk I/O was the limiting resource. Once the CPU was saturated, throughput was fixed regardless of disk configuration. Placing the database log file on a separate drive provided the biggest increase in throughput; however, with the MDF/NDF files on spindles, the index rebuild was still very slow. Splitting the MDF/NDF onto one SSD and the log file onto another provided the most headroom for the system, and would most likely provide enough throughput to saturate a second CPU.

The log file saw the highest write speeds, capping at about 30 MB/s. This translates into roughly 2.6 terabytes of data written over 24 hours for a fully saturated 4-core system. Most enterprise SSDs, such as the Intel S3700, are rated for 10 complete drive writes per day over a 5-year lifespan. Two 512 GB SSDs in a RAID 1 configuration would therefore provide a minimum of 10 years of life for a fully saturated 4-core system; on an 8-core system, this lifespan would be reduced to a minimum of 5 years. On our test setup, we are processing 65 million messages per hour (about 1.5 billion per day), which should be adequate.
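
The endurance arithmetic above can be sketched out as a back-of-envelope estimate. The 30 MB/s peak log write rate and the S3700's 10-drive-writes-per-day/5-year rating come from the figures above; everything else is straightforward arithmetic.

```python
# Back-of-envelope SSD endurance estimate for the transaction log drive.
# The 30 MB/s peak log write rate and the Intel S3700 rating (10 drive
# writes per day, 5-year lifespan) come from the text above.

LOG_WRITE_MBPS = 30
SECONDS_PER_DAY = 86_400

daily_write_tb = LOG_WRITE_MBPS * SECONDS_PER_DAY / 1_000_000
print(f"daily log writes: {daily_write_tb:.1f} TB")  # ~2.6 TB/day

DRIVE_GB = 512      # one member of the RAID 1 pair (writes are mirrored)
RATED_DWPD = 10     # rated drive writes per day
RATED_YEARS = 5

actual_dwpd = daily_write_tb * 1_000 / DRIVE_GB
est_lifespan_years = RATED_YEARS * RATED_DWPD / actual_dwpd
print(f"estimated lifespan: {est_lifespan_years:.0f} years")  # ~10 years
```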

The data and index files also benefit greatly from SSDs, as this minimized any database locking during maintenance. It should also give much better performance when querying for messages via the queue UI, since an SSD can service reads and writes concurrently whereas a spindle only operates serially. Assuming the database will be a medium size (up to 2 TB), a RAID 10 array of SSDs (4 x 1 TB, 8 x 512 GB, etc.) could be used. SSD capacities are increasing quite quickly, so 2+ TB drives are probably not far away.

I would think that, with decent hardware, the following should be considered for performance (from most important to least):

  1. Connexion and SQL Server on the same OS, if there is sufficient CPU/Disk headroom. If a second box is available for SQL Server, it should have similar specs/clock as the Cxn server.

  2. LOG file on a non-shared, fast drive (queuing and processing performance).

  3. MDF/NDF on a non-shared, fast drive (database maintenance performance).

  4. If the CPU is now the limiting factor, increase the core count when the number of channels is greater than the number of cores, or increase the clock speed if the number of channels is equal to or less than the number of cores.

  5. Ensure that SQL Server has adequate memory allocated (8 GB total system memory appears to be adequate for smaller systems).
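
The core-count vs clock-speed rule in item 4 can be sketched as follows. The function name and return strings are illustrative only, not part of any product API:

```python
# Illustrative encoding of the rule in item 4: with more channels than
# cores, extra cores still help; otherwise per-channel (single-thread)
# speed is the limit, so a higher clock wins. Names are hypothetical.

def cpu_upgrade_hint(channels: int, cores: int) -> str:
    if channels > cores:
        return "add cores"          # parallel channels can use them
    return "raise clock speed"      # per-channel speed is the limit

print(cpu_upgrade_hint(channels=8, cores=4))  # add cores
print(cpu_upgrade_hint(channels=4, cores=4))  # raise clock speed
```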

Networking impact

When Connexion was tested with SQL Server on a separate physical machine over a consumer-grade gigabit network, performance was cut by roughly 55%. Interestingly, it appears we are nowhere near saturating the Ethernet pipe (pushing about 110 Mbit/s combined), so moving to a lower-latency platform could push the bottleneck back to the disks and/or CPU. Some CPU overhead and latency is expected as the packet stream is written and read, but the actual performance drop is quite significant. If adequate hardware is available, it makes sense to host both Connexion and SQL Server on the same physical machine.
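
A quick sanity check of the "nowhere near saturating" claim, using the observed 110 Mbit/s combined traffic and the 4,800 msgs/sec rate from the gigabit switch test. This is a rough sketch; it assumes the observed traffic figure covers both directions:

```python
# Sanity check on link utilization: observed combined traffic vs. gigabit
# capacity, and the implied bytes on the wire per message. Figures are
# taken from the 1000 Mbit switch test in this report.

LINK_MBIT = 1_000       # gigabit Ethernet
OBSERVED_MBIT = 110     # combined traffic observed during the test
MSGS_PER_SEC = 4_800    # throughput over the 1000 Mbit switch

utilization = OBSERVED_MBIT / LINK_MBIT
bytes_per_msg = OBSERVED_MBIT * 1_000_000 / 8 / MSGS_PER_SEC
print(f"link utilization: {utilization:.0%}")            # 11%
print(f"wire bytes per message: {bytes_per_msg:,.0f}")   # ~2,865
```

At roughly 11% utilization, bandwidth clearly is not the constraint, which is why latency is the prime suspect.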

Update:

We added a new computer for testing (i7-4770K, 16 GB RAM, SSDs), which was provisioned as a SQL Server for Nick's dev box (i7-3770K, 8 GB, SSDs). With this setup the queueing-only speed dropped from ~13K/s on a single box to about ~11K/s across both. Neither CPU was breaking a sweat, nor were the disks; I suspect this is the impact of gigabit network latency, as the throughput numbers were well below the theoretical maximum. As soon as both queueing and processing were happening simultaneously, the single-box setup was limited by available CPU and total throughput dropped to about 9K/s. In the dual-box setup, the Cxn server hit about 80% CPU, the SQL server about 50%, and the processing rate increased to 15K/s.
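
One rough way to quantify that latency impact: treating each message as serial work plus a fixed per-message overhead (a simplification that ignores batching and pipelining), the queueing-only rates imply:

```python
# Implied per-message overhead of moving SQL Server to a second box, from
# the queueing-only rates above (13K/s single box, 11K/s dual box). This
# models each message as serial work plus a fixed per-message cost, which
# ignores batching/pipelining, so treat it as a rough estimate only.

SINGLE_BOX_RATE = 13_000    # msgs/sec, Cxn + SQL Server on one machine
DUAL_BOX_RATE = 11_000      # msgs/sec, SQL Server on a second box

overhead_us = (1 / DUAL_BOX_RATE - 1 / SINGLE_BOX_RATE) * 1_000_000
print(f"implied per-message network overhead: {overhead_us:.0f} us")  # ~14 us
```

An overhead on the order of 14 µs per message is consistent with consumer gigabit round-trip latency, supporting the suspicion above.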

There appears to be about a 15% performance penalty when moving to a two-box setup using consumer-grade gigabit equipment. This does, however, allow us to scale the CPU, memory, and disks horizontally and therefore end up with a significantly higher overall throughput. It does appear that the CPU of the SQL Server host needs to be similar to that used by the Connexion host. When SQL Server was hosted on a slower CPU, but with fast disks and gigabit ethernet, the processing speed dropped significantly (and in this test, none of the SQL Server host's resources were maxed out).

I would conclude that it is probably still more cost-effective and 'management-effective' to scale vertically to the 2-socket level of hardware, with quality CPUs, than to use two single-socket servers with the same CPUs (i.e., Cxn + SQL on 2 x Xeon vs Cxn on 1 x Xeon + SQL on 1 x Xeon). This may not be the case, however, if your Cxn servers are CPU-bound but you have lots of headroom on your SQL host; in that case it makes sense to add Cxn boxes until the SQL host is saturated, then provision additional SQL hosts as required.

Thoughts

Connexion's performance depends almost entirely on the workload (channels and devices). We've seen some impressive performance numbers on our in-house development machines (quad-core i7) with local SQL Server instances using relatively light test channels. It seems likely that commodity servers (a single quad-core Xeon) with SSDs should be able to handle reasonable loads in an extremely cost-effective way. In single-socket systems with SSDs, I would expect CPU to be the limiting resource, so a move to a dual-socket architecture and processors with more cores should provide an easy and economical upgrade path.

 

Low-end server

| Item | Cost (USD) |
|------|------------|
| Dell Single Xeon | 6,500 |
| SSDs | 4,600 |
| Software | Included |
| Total | 11,100 |

High-end server

| Item | Cost (USD) |
|------|------------|
| Dell Dual Xeon | 6,700 |
| SSDs | 4,600 |
| Software (SQL) | 3,000 |
| Total | 14,300 |

 

Individual Tests:

Single Spindle (All Files)

7200 RPM spindle (WD Caviar Green 1 TB).

Max throughput: 3,200 msgs/sec
Consistency: Fair
CPU: ~30%
Disk: 100%
Index rebuild: >5:00/7M records

Dual Spindle (Raid 0, All Files)

7200 RPM spindles (WD Caviar Green 1 TB) x 2 in RAID 0.

Max throughput: 5,300 msgs/sec
Consistency: Fair
CPU: ~50%
Disk: 96%
Index rebuild: 3:53/7M records

Two Spindles

7200 RPM spindles (WD Caviar Green 1 TB) x 2; log file on one, MDF/NDF on the other.

Max throughput: 5,000 msgs/sec
Consistency: Good
CPU: 45%
Disk: ~100% log, <5% MDF/NDF
Index rebuild: 1:45/2.5M records

MDF & NDF on Raid 0 Spindles, Log on SSD

7200 RPM spindles (WD Caviar Green 1 TB) x 2 in RAID 0 hosting MDF and NDF only; log file on RAID 0 SSD (2 x Samsung 840 Pro 125GB).

Max throughput: 11,000 msgs/sec
Consistency: Good
CPU: 100%
Disk: ~50% SSD, ~5% HDD
Index rebuild: 0:45/2.5M records

Raid 0 SSD Only

All DB files on RAID 0 SSD (2 x Samsung 840 Pro 125GB).

Max throughput: 11,000 msgs/sec
Consistency: Good
CPU: 100%
Disk: ~100% SSD
Index rebuild: 0:03/4.5M records

 

Log on Raid 0 SSD, MDF/NDF on Single SSD

Log on RAID 0 SSD (2 x Samsung 840 Pro 125GB), MDF/NDF on 64GB OCZ Vertex 3.

Max throughput: 11,000 msgs/sec
Consistency: Good
CPU: 100%
Disk: log @ 30%, MDF/NDF @ 5%
Index rebuild: 0:16/7.4M records

100 Mbit Network (with switch)

SQL Server running on an i3-530, connected via a 100 Mbit switch.

Max throughput: 4,000 msgs/sec
Consistency: Good
CPU (app server): 20%
CPU (DB server): 60%
Disk (DB server): 30%
Index rebuild: 0:03/1M records

[Performance graphs: DB server and app server]

1000 Mbit Network

SQL Server running on an i3-530, connected via a 1000 Mbit switch.

Max throughput: 4,800 msgs/sec
Consistency: Good
CPU (app server): 24%
CPU (DB server): 80%
Disk (DB server): 37%
Index rebuild: 0:03/1M records

[Performance graphs: DB server and app server]

1000 Mbit Network, Direct Connection (no switch)

Interestingly, there appeared to be a very slight speed bump when bypassing the switch altogether, although the 4% increase may have been due to other factors.

Jumbo Frames

No difference in speed was observed with jumbo frames enabled.

Database Mirroring (SQL 2012), Async

Database mirroring was set up between Nick's dev machine (Cxn + SQL Server) and the VM server (i3, 8GB, SSDs).

Max throughput: 9,200 msgs/sec
Consistency: Good
CPU (app server): 100%
CPU (DB server): 55%
Disk: 30% primary, 97% fail-over
Index rebuild: 0:14/3M records

It appears that SQL Server requires significantly more CPU with mirroring enabled: when queueing only, CPU spikes to 35-50% vs the 10-15% seen for an unmirrored database. Disk writes on the primary don't appear to be any higher than normal, but the fail-over sees quite high disk-write speeds, averaging 70-100 MB/s with spikes into the high 100s; high enough to warrant an SSD. Judging from the database mirroring monitor, the fail-over instance was able to keep up quite well, as the unsent and unrestored log sizes stayed at 0 most of the time.

It would appear that CPU is the limiting factor in the processing speed, with SQL Server stealing more CPU away from Connexion.

[Performance graphs: primary and fail-over]

Database Mirroring (SQL 2012), Sync

Database mirroring was set up between Nick's dev machine (Cxn + SQL Server) and the VM server (i3, 8GB, SSDs).

Max throughput: 2,900 msgs/sec
Consistency: Good
CPU (app server): 40%
CPU (DB server): 30%
Disk: 4% primary, 90% fail-over
Index rebuild: 0:14/6M records

[Performance graphs: primary and fail-over]