Hardware & Performance Observations

Over the last week I have been running Connexion with different hardware configurations to help pin down some platform recommendations. The base system is an Ivy Bridge i7-3770K (quad-core desktop) with 8 GB of 1600 MHz RAM. Connexion and SQL Server Standard are both running on the same box.

Channels used

Disk tests were performed with four identical channels, each consisting of a file reader, a queue, an HL7 transform, and a file writer.

There are several measures of interest:

  1. Steady-state throughput: What is the maximum throughput of Connexion when queueing and processing across multiple channels, and what are the resource bottlenecks?

  2. Steady-state consistency: Is the performance consistent during processing or do the speeds fluctuate?

  3. DB locking: Do maintenance and message querying during processing cause database locks and slowdowns?

Summary

| Test | Throughput (msgs/sec) | Consistency | CPU % | Disk % | Index Rebuild | Limit |
|------|----------------------|-------------|-------|--------|---------------|-------|
| Single spindle, all DB files | 3,200 | Fair | 30% | 100% | Very Slow (>5:00/7M records) | Disk |
| RAID 0 spindles, all DB files | 5,300 | Fair | 50% | 96% | Very Slow (3:53/7M records) | Disk |
| LDF spindle, MDF/NDF spindle | 5,000 | Good | 45% | ~100% log, <5% MDF/NDF | Very Slow (1:45/2.5M records) | Disk |
| MDF/NDF on RAID 0 spindles, log on SSD | 11,000 | Good | 100% | ~50% SSD, ~5% HDD | Very Slow (0:45/2.5M records) | CPU |
| RAID 0 SSD, all DB files | 11,000 | Good | 100% | 100% | Fast (0:03/4.5M records) | CPU |
| MDF/NDF on SSD, log on RAID 0 SSD | 11,000 | Good | 100% | Log @ 30%, MDF/NDF @ 5% | Fast (0:16/7.4M records) | CPU |
| SQL via 100 Mbit | 4,000 | Good | 20% app / 60% DB | 30% DB | Fast (0:03/1M records) | Network |
| SQL via 1000 Mbit | 4,800 | Good | 24% app / 80% DB | 37% DB | Fast (0:03/1M records) | Network |
| SQL via 1000 Mbit, direct attach | 5,000 | Good | 25% app / 80% DB | 37% DB | Fast (0:03/1M records) | Network |
| Async mirror (primary = MDF/NDF on SSD, LDF on SSD; fail-over = all on SSD) | 9,200 | Good | 100% primary, 55% fail-over | 30% primary, 97% fail-over | Fast (0:14/3M records) | CPU |
| Sync mirror (primary = MDF/NDF on SSD, LDF on SSD; fail-over = all on SSD) | 2,900 | Good | 40% primary, 30% fail-over | 4% primary, 90% fail-over | Fast (0:14/6M records) | Network |
| Cxn on 3770K, SQL on 4770K, gigabit, all SSD | 15,000 | Good | 80% Cxn, 60% SQL | Log @ 30%, MDF/NDF @ 5% | Fast (0:20/8M records) | ? |

In both the single-spindle and RAID 0 spindle tests, disk I/O was the limiting resource. Once the CPU was saturated, throughput was fixed regardless of disk configuration. Placing the database log file on a separate drive provided the biggest increase in throughput; however, with the MDF/NDF files on spindles, the index rebuild was still very slow. Splitting the MDF/NDF onto one SSD and the log file onto another provided the most headroom for the system, and would most likely provide enough throughput to saturate a second CPU.

The log file saw the highest write speeds, capping at about 30 MB/s. This translates into roughly 2.6 terabytes of data written over 24 hours for a fully saturated 4-core system. Most enterprise SSDs, such as the Intel S3700, are rated for 10 complete drive writes per day over a 5-year lifespan. Two 512 GB SSDs in a RAID 1 configuration would therefore provide a minimum of 10 years of life for a fully saturated 4-core system; on an 8-core system, this lifespan would be reduced to a minimum of 5 years. On our test setup, we are processing 65 million messages per hour (about 1.5 billion per day), which should be adequate.
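
The endurance arithmetic above can be sketched out as a back-of-envelope estimate. The 30 MB/s peak log write rate and the S3700's 10-drive-writes-per-day/5-year rating come from the figures above; everything else is straightforward arithmetic.

```python
# Back-of-envelope SSD endurance estimate for the transaction log drive.
# The 30 MB/s peak log write rate and the Intel S3700 rating (10 drive
# writes per day, 5-year lifespan) come from the text above.

LOG_WRITE_MBPS = 30
SECONDS_PER_DAY = 86_400

daily_write_tb = LOG_WRITE_MBPS * SECONDS_PER_DAY / 1_000_000
print(f"daily log writes: {daily_write_tb:.1f} TB")  # ~2.6 TB/day

DRIVE_GB = 512      # one member of the RAID 1 pair (writes are mirrored)
RATED_DWPD = 10     # rated drive writes per day
RATED_YEARS = 5

actual_dwpd = daily_write_tb * 1_000 / DRIVE_GB
est_lifespan_years = RATED_YEARS * RATED_DWPD / actual_dwpd
print(f"estimated lifespan: {est_lifespan_years:.0f} years")  # ~10 years
```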

The data and index files also benefit greatly from SSDs, as this minimized any database locking during maintenance. It should also give much better performance when querying for messages via the queue UI, since an SSD can service reads and writes concurrently whereas a spindle only operates serially. Assuming the database will be a medium size (up to 2 TB), a RAID 10 array of SSDs (4 x 1 TB, 8 x 512 GB, etc.) could be used. SSD capacities are increasing quite quickly, so 2+ TB drives are probably not far away.

I would think that, with decent hardware, the following should be considered for performance (from most important to least):

  1. Connexion and SQL Server on the same OS, if there is sufficient CPU/Disk headroom. If a second box is available for SQL Server, it should have similar specs/clock as the Cxn server.

  2. LOG file on a non-shared, fast drive (queuing and processing performance).

  3. MDF/NDF on a non-shared, fast drive (database maintenance performance).

  4. If the CPU is now the limiting factor, increase the core count when the number of channels is greater than the number of cores, or increase the clock speed if the number of channels is equal to or less than the number of cores.

  5. Ensure that SQL Server has adequate memory allocated (8 GB total system memory appears to be adequate for smaller systems).
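
The core-count vs clock-speed rule in item 4 can be sketched as follows. The function name and return strings are illustrative only, not part of any product API:

```python
# Illustrative encoding of the rule in item 4: with more channels than
# cores, extra cores still help; otherwise per-channel (single-thread)
# speed is the limit, so a higher clock wins. Names are hypothetical.

def cpu_upgrade_hint(channels: int, cores: int) -> str:
    if channels > cores:
        return "add cores"          # parallel channels can use them
    return "raise clock speed"      # per-channel speed is the limit

print(cpu_upgrade_hint(channels=8, cores=4))  # add cores
print(cpu_upgrade_hint(channels=4, cores=4))  # raise clock speed
```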

Networking impact

When Connexion was tested with SQL Server on a separate physical machine over a consumer-grade gigabit network, performance was cut by roughly 55%. Interestingly, it appears we are nowhere near saturating the Ethernet pipe (pushing about 110 Mbit/s combined), so moving to a lower-latency platform could push the bottleneck back to the disks and/or CPU. Some CPU overhead and latency is expected as the packet stream is written and read, but the actual performance drop is quite significant. If adequate hardware is available, it makes sense to host both Connexion and SQL Server on the same physical machine.
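
A quick sanity check of the "nowhere near saturating" claim, using the observed 110 Mbit/s combined traffic and the 4,800 msgs/sec rate from the gigabit switch test. This is a rough sketch; it assumes the observed traffic figure covers both directions:

```python
# Sanity check on link utilization: observed combined traffic vs. gigabit
# capacity, and the implied bytes on the wire per message. Figures are
# taken from the 1000 Mbit switch test in this report.

LINK_MBIT = 1_000       # gigabit Ethernet
OBSERVED_MBIT = 110     # combined traffic observed during the test
MSGS_PER_SEC = 4_800    # throughput over the 1000 Mbit switch

utilization = OBSERVED_MBIT / LINK_MBIT
bytes_per_msg = OBSERVED_MBIT * 1_000_000 / 8 / MSGS_PER_SEC
print(f"link utilization: {utilization:.0%}")            # 11%
print(f"wire bytes per message: {bytes_per_msg:,.0f}")   # ~2,865
```

At roughly 11% utilization, bandwidth clearly is not the constraint, which is why latency is the prime suspect.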

Update:

We added a new computer for testing (i7-4770K, 16 GB RAM, SSDs), which was provisioned as a SQL Server for Nick's dev box (i7-3770K, 8 GB, SSDs). With this setup the queueing-only speed dropped from ~13K/s on a single box to about ~11K/s across both. Neither CPU was breaking a sweat, nor were the disks; I suspect this is the impact of gigabit network latency, as the throughput numbers were well below the theoretical maximum. As soon as both queueing and processing were happening simultaneously, the single-box setup was limited by available CPU and total throughput dropped to about 9K/s. In the dual-box setup, the Cxn server hit about 80% CPU, the SQL server about 50%, and the processing rate increased to 15K/s.
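
One rough way to quantify that latency impact: treating each message as serial work plus a fixed per-message overhead (a simplification that ignores batching and pipelining), the queueing-only rates imply:

```python
# Implied per-message overhead of moving SQL Server to a second box, from
# the queueing-only rates above (13K/s single box, 11K/s dual box). This
# models each message as serial work plus a fixed per-message cost, which
# ignores batching/pipelining, so treat it as a rough estimate only.

SINGLE_BOX_RATE = 13_000    # msgs/sec, Cxn + SQL Server on one machine
DUAL_BOX_RATE = 11_000      # msgs/sec, SQL Server on a second box

overhead_us = (1 / DUAL_BOX_RATE - 1 / SINGLE_BOX_RATE) * 1_000_000
print(f"implied per-message network overhead: {overhead_us:.0f} us")  # ~14 us
```

An overhead on the order of 14 µs per message is consistent with consumer gigabit round-trip latency, supporting the suspicion above.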

There appears to be about a 15% performance penalty when moving to a two-box setup using consumer-grade gigabit equipment. This does, however, allow us to scale the CPU, memory, and disks horizontally and therefore end up with a significantly higher overall throughput. It does appear that the CPU of the SQL Server host needs to be similar to that used by the Connexion host. When SQL Server was hosted on a slower CPU, but with fast disks and gigabit ethernet, the processing speed dropped significantly (and in this test, none of the SQL Server host's resources were maxed out).

I would conclude that it is probably still more cost-effective and 'management-effective' to scale vertically to the 2-socket level of hardware, with quality CPUs, than to use two single-socket servers with the same CPUs (i.e., Cxn + SQL on 2 x Xeon vs Cxn on 1 x Xeon + SQL on 1 x Xeon). This may not be the case, however, if your Cxn servers are CPU-bound but you have lots of headroom on your SQL host; in that case it makes sense to add Cxn boxes until the SQL host is saturated, then provision additional SQL hosts as required.

Thoughts

Connexion's performance depends almost entirely on the workload (channels and devices). We've seen some impressive performance numbers on our in-house development machines (quad-core i7) with local SQL Server instances using relatively light test channels. It seems likely that commodity servers (a single quad-core Xeon) with SSDs should be able to handle reasonable loads in an extremely cost-effective way. In single-socket systems with SSDs, I would expect CPU to be the limiting resource, so a move to a dual-socket architecture and processors with more cores should provide an easy and economical upgrade path.

 

Low-end server

| Item | Cost (USD) |
|------|------------|
| Dell Single Xeon | 6,500 |
| SSDs | 4,600 |
| Software | Included |
| Total | 11,100 |

High-end server

| Item | Cost (USD) |
|------|------------|
| Dell Dual Xeon | 6,700 |
| SSDs | 4,600 |
| Software (SQL) | 3,000 |
| Total | 14,300 |

 

Individual Tests:

Single Spindle (All Files)

7200 RPM spindle (WD Caviar Green 1 TB).

Max throughput: 3,200 msgs/sec
Consistency: Fair
CPU: ~30%
Disk: 100%
Index rebuild: >5:00/7M records

Dual Spindle (Raid 0, All Files)

7200 RPM spindles (WD Caviar Green 1 TB) x 2 in RAID 0.

Max throughput: 5,300 msgs/sec
Consistency: Fair
CPU: ~50%
Disk: 96%
Index rebuild: 3:53/7M records

Two Spindles

7200 RPM spindles (WD Caviar Green 1 TB) x 2; log file on one, MDF/NDF on the other.

Max throughput: 5,000 msgs/sec
Consistency: Good
CPU: 45%
Disk: ~100% log, <5% MDF/NDF
Index rebuild: 1:45/2.5M records

MDF & NDF on Raid 0 Spindles, Log on SSD

7200 RPM spindles (WD Caviar Green 1 TB) x 2 in RAID 0 hosting MDF and NDF only; log file on RAID 0 SSD (2 x Samsung 840 Pro 125GB).

Max throughput: 11,000 msgs/sec
Consistency: Good
CPU: 100%
Disk: ~50% SSD, ~5% HDD
Index rebuild: 0:45/2.5M records

Raid 0 SSD Only

All DB files on RAID 0 SSD (2 x Samsung 840 Pro 125GB).

Max throughput: 11,000 msgs/sec
Consistency: Good
CPU: 100%
Disk: ~100% SSD
Index rebuild: 0:03/4.5M records

 

Log on Raid 0 SSD, MDF/NDF on Single SSD

Log on RAID 0 SSD (2 x Samsung 840 Pro 125GB), MDF/NDF on 64GB OCZ Vertex 3.

Max throughput: 11,000 msgs/sec
Consistency: Good
CPU: 100%
Disk: log @ 30%, MDF/NDF @ 5%
Index rebuild: 0:16/7.4M records

100 Mbit Network (with switch)

SQL Server running on an i3-530, connected via a 100 Mbit switch.

Max throughput: 4,000 msgs/sec
Consistency: Good
CPU (app server): 20%
CPU (DB server): 60%
Disk (DB server): 30%
Index rebuild: 0:03/1M records

[Performance graphs: DB server and app server]

1000 Mbit Network

SQL Server running on an i3-530, connected via a 1000 Mbit switch.

Max throughput: 4,800 msgs/sec
Consistency: Good
CPU (app server): 24%
CPU (DB server): 80%
Disk (DB server): 37%
Index rebuild: 0:03/1M records

[Performance graphs: DB server and app server]

1000 Mbit Network, Direct Connection (no switch)

Interestingly, there appeared to be a very slight speed bump when bypassing the switch altogether, although the 4% increase may have been due to other factors.

Jumbo Frames

No difference in speed was observed with jumbo frames enabled.

Database Mirroring (SQL 2012), Async

Database mirroring was set up between Nick's dev machine (Cxn + SQL Server) and the VM server (i3, 8GB, SSDs).

Max throughput: 9,200 msgs/sec
Consistency: Good
CPU (app server): 100%
CPU (DB server): 55%
Disk: 30% primary, 97% fail-over
Index rebuild: 0:14/3M records

It appears that SQL Server requires significantly more CPU with mirroring enabled: when queueing only, CPU spikes to 35-50% vs the 10-15% seen for an unmirrored database. Disk writes on the primary don't appear to be any higher than normal, but the fail-over sees quite high disk-write speeds, averaging 70-100 MB/s with spikes into the high 100s; high enough to warrant an SSD. Judging from the database mirroring monitor, the fail-over instance was able to keep up quite well, as the unsent and unrestored log sizes stayed at 0 most of the time.

It would appear that CPU is the limiting factor in the processing speed, with SQL Server stealing more CPU away from Connexion.

[Performance graphs: primary and fail-over]

Database Mirroring (SQL 2012), Sync

Database mirroring was set up between Nick's dev machine (Cxn + SQL Server) and the VM server (i3, 8GB, SSDs).

Max throughput: 2,900 msgs/sec
Consistency: Good
CPU (app server): 40%
CPU (DB server): 30%
Disk: 4% primary, 90% fail-over
Index rebuild: 0:14/6M records

[Performance graphs: primary and fail-over]