Archive for the ‘Sysadmin Tales’ Category

Move to NCP

Chris Twa | August 31st, 2010 | No Comments »

Over the weekend we completed a number of upgrades:

  • Main storage is now based on Nexenta Core 3.0
  • zpools were upgraded
  • System memory on main storage was increased to 12GB
  • Additional gigabit ethernet interfaces were installed and aggregated (LACP with a layer 4 hashing policy) for the storage area network
  • A new storage switch was deployed
  • STP root was reassigned
  • CPUs were upgraded on one virtual host server
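The layer 4 policy mentioned in the list above picks the outgoing link for each frame by hashing transport-layer information such as TCP/UDP ports. The sketch below only illustrates that idea: the CRC32 hash, the port numbers, and the link count are assumptions for illustration, not what the OS or the switch actually computes.

```python
import zlib

def pick_link(src_port, dst_port, n_links):
    """Map a flow to one link of the aggregate by hashing its L4 ports.

    CRC32 is a stand-in hash chosen for this sketch; real implementations
    use their own functions.
    """
    key = f"{src_port}:{dst_port}".encode()
    return zlib.crc32(key) % n_links

# Hypothetical flows from several client ports to one storage port.
flows = [(49152 + i, 2049) for i in range(8)]
for src, dst in flows:
    print(src, dst, "-> link", pick_link(src, dst, 3))
```

The point to notice is that a given flow always lands on the same link, so aggregation adds bandwidth across many flows rather than speeding up any single one.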

The performance increase has been fantastic and we’re still not done!  Our new U320 storage (two raidz2 arrays of seven disks each) isn’t in use yet.  We have four SSDs to play around with and lots of benchmarks to run.  We’ll be trying various ZFS tricks, like using an SSD for the ZIL and experimenting with L2ARC.
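For reference, a dedicated ZIL device and L2ARC devices are added to a pool with `zpool add <pool> log <device>` and `zpool add <pool> cache <device>...`. The sketch below only builds those command lines without running them; the pool name `tank` and the device names are hypothetical placeholders, not our actual layout.

```python
def zpool_add_commands(pool, log_device=None, cache_devices=()):
    """Build (but do not run) zpool commands for adding SSDs to a pool."""
    commands = []
    if log_device:
        # A dedicated log vdev holds the ZIL (the synchronous write intent log).
        commands.append(["zpool", "add", pool, "log", log_device])
    if cache_devices:
        # Cache vdevs extend the in-memory ARC read cache onto SSD (L2ARC).
        commands.append(["zpool", "add", pool, "cache", *cache_devices])
    return commands

# Hypothetical pool and device names, for illustration only.
for cmd in zpool_add_commands("tank", "c2t0d0", ["c2t1d0", "c2t2d0"]):
    print(" ".join(cmd))
```

Printing the commands first makes it easy to double-check device names before touching a live pool.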

Getting past the hotmail spam filter

Chris Twa | August 17th, 2010 | No Comments »

A client was having some problems sending to @hotmail.com addresses.  They weren’t on any RBLs or anything that bad, but nonetheless ALL their messages were going to the Junk folder.  During an onsite consulting visit with this Saskatoon business, Saskaweb found the following resources to check when delivering email to @hotmail.com:

http://mail.live.com/mail/troubleshooting.aspx

http://mail.live.com/mail/services.aspx

In short: make sure you have an SPF record!  We’re still waiting for Hotmail to notice that we’ve added an SPF record, but all of our checks now look good.  Fingers crossed!
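An SPF record is a DNS TXT record that starts with `v=spf1` and lists who may send mail for the domain. As a minimal sketch, the check below inspects a TXT value that has already been looked up; the record shown uses a documentation-range IP address and is a placeholder, not the client’s real record.

```python
def parse_spf(txt_record):
    """Return the list of SPF terms if the TXT value is an SPF record, else None."""
    terms = txt_record.strip().split()
    if not terms or terms[0].lower() != "v=spf1":
        return None
    return terms[1:]

def has_terminal_all(terms):
    """True if the record ends with an 'all' mechanism such as -all or ~all."""
    return bool(terms) and terms[-1].lstrip("+-~?").lower() == "all"

# Placeholder record: allow the domain's MX hosts and one IP, reject the rest.
record = "v=spf1 mx ip4:192.0.2.25 -all"
terms = parse_spf(record)
print(terms, has_terminal_all(terms))
```

This only checks the shape of the record. Some valid records end in a `redirect=` modifier instead of `all`, so a False result from `has_terminal_all` is a prompt to look closer, not proof of a mistake.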

OpenSolaris is dead

Chris Twa | August 14th, 2010 | 3 Comments »

Days after we ordered parts to optimize our OpenSolaris system (including 4 new SSDs), Oracle has pulled the plug.  This is quite a letdown for Saskaweb.

Where to next?  Should we move over to Nexenta?  Although we’ve been very happy with our OSOL server, ongoing support (at least community driven) is a requirement.

HyperDB

Chris Twa | August 9th, 2010 | No Comments »

This is my first post after adding HyperDB to this blog.  Doesn’t look any different?  Good!

Our master MySQL server is now averaging 30 queries per second and I’m looking to spread some of that load over replicated spares.  (We switched from a cluster to a replicated setup about a month ago.)  After a bit of reading, I decided to give HyperDB a try.  The process wasn’t too bad, though I wish there were a bit more documentation.  Hint:  Make sure PHP error reporting is sent to your Apache logs and keep an eye on them.  After changing the config, HyperDB works fine and is (hopefully) spreading the load over our replicated spares.
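The general idea behind spreading load over replicas is read/write splitting: writes go to the master, reads can go to any replica. The sketch below illustrates that idea in Python and is not HyperDB’s code or configuration; the server names and the list of write verbs are assumptions for illustration.

```python
import random

# Statements treated as writes in this sketch (an assumed, incomplete list).
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

def choose_server(sql, master, replicas, rng=random):
    """Send writes to the master and spread reads over the replicas."""
    parts = sql.split()
    verb = parts[0].upper() if parts else ""
    if verb in WRITE_VERBS or not replicas:
        return master
    return rng.choice(replicas)

# Hypothetical server names.
replicas = ["db-replica-1", "db-replica-2"]
print(choose_server("UPDATE wp_posts SET post_title = 'x'", "db-master", replicas))
print(choose_server("SELECT * FROM wp_posts", "db-master", replicas))
```

One caveat with any scheme like this is replication lag: a read sent to a replica immediately after a write may not see that write yet.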

Parts ordered for ZFS expansion

Chris Twa | August 3rd, 2010 | 2 Comments »

We’ve been impressed with ZFS and will add another terabyte and lots of spindles. We’ll take advantage of the downtime to add two more network interfaces which will be bonded for greater bandwidth. With luck, that’ll be online by the end of August.

Our current storage switch is an older Cisco which doesn’t support LACP, so that’ll have to be replaced as well. From what I’ve read, it doesn’t look like PAgP is really supported on OpenSolaris. It’d be nice to stick with Cisco but a new gigabit Cisco just isn’t in the budget.  Ah well, it’s a single VLAN so it shouldn’t be that hard for a non-Cisco switch to handle.  We have a nice 3Com on the shelf that should be up to the challenge.

So after the expansion, our ZFS rig will have 4x1TB SATA in a raidz1, 7x73GB 15k u320 in a raidz2, and 7x146GB 10k u320 in another raidz2.

The plan will be:

Large files with relatively little I/O (some of our offsite backup files) and ZFS backups from some client VMs will be stored on the 3TB SATA array.  Our little four-disk SATA raidz can’t produce a lot of IOPS, so we don’t want to put things like live VMs on it.

We’ll be moving one client’s small business server over to the 7x73GB array to see how it performs.  It’d be nice to get more than one Exchange server per array, but Exchange is a real pig for disk I/O.  Even when nothing seems to be happening, the lights are all blinking.  We’ll probably move our webstore to the 7x146GB array, although it’s not really hurting now so we’ll have to see.

If this all works out, it’ll be a pretty inexpensive and fast little SAN.  A terabyte of storage over 14 u320 spindles with all the benefits of ZFS for under $1000.  I don’t want to get too far ahead of myself, but I’d really like to beat a Dell MD3000i with this ZFS rig.  Can’t wait until the parts arrive!
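The capacity figures in this post follow from simple raidz arithmetic: usable space is roughly the number of data disks (total disks minus parity disks) times the disk size. The sketch below works that out, assuming nominal sizes of 1000GB, 73GB, and 146GB per disk and ignoring ZFS metadata overhead.

```python
def raidz_usable_gb(disks, disk_gb, parity):
    """Approximate usable capacity: data disks times disk size (ignores ZFS overhead)."""
    return (disks - parity) * disk_gb

arrays = {
    "4x1TB SATA raidz1": raidz_usable_gb(4, 1000, 1),
    "7x73GB U320 raidz2": raidz_usable_gb(7, 73, 2),
    "7x146GB U320 raidz2": raidz_usable_gb(7, 146, 2),
}
for name, gb in arrays.items():
    print(f"{name}: ~{gb} GB usable")

u320_total = arrays["7x73GB U320 raidz2"] + arrays["7x146GB U320 raidz2"]
print(f"U320 total: ~{u320_total} GB over 14 spindles")
```

That gives about 3TB on the SATA array and about 1.1TB across the two U320 arrays, which is where the “3TB” and “a terabyte over 14 spindles” figures come from.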

Switch from MySQL Cluster to Replication

Chris Twa | July 25th, 2010 | No Comments »

We’ve switched from MySQL cluster to replication.  We found that although the cluster gave us excellent throughput, a few cluster crashes combined with lengthy startup times forced the issue.

During our move, we changed to a master/slave replicated structure.  We will be trying out various technologies to load balance SQL queries in the near future.

New space is now being used

Chris Twa | July 16th, 2010 | No Comments »

It certainly wasn’t as seamless as we wanted, but the server move has been completed.

Most of our servers have been moved but we are maintaining some networking gear and a backup server in the original space.

Although we had a lot of trouble with bringing an Exchange server online, our legacy email was operational within two hours.  This included moving locations and changing our database structure from clustered to replicated.  So that part went well.

What didn’t go well was Exchange panicking over some active directory work we decided to do within our scheduled downtime.  That was a mistake — we shouldn’t have tried to get too much accomplished within the maintenance window.  The result was too many variables which made for difficult troubleshooting.

Another mistake that we made was focusing all of our attention on this Exchange problem.  We should’ve taken a break from it and brought other services online.  It took only thirty minutes to bring legacy flat web pages online, but this process wasn’t started until we had turned the corner on our Exchange problem.  We hadn’t prepared a triage plan, which was a mistake.

Back to the good news:

The cooling in the new space is much better.

Figure: Cooling comparison, old space versus new

This server was in a stubborn hot spot in our old room.  Notice the end of daily temperature fluctuations: our old system relied on building air circulation, which was shut off at night.  The new system uses chilled water and the temperature control is much tighter.  In our climate, the chilled water system will get ‘free cooling’ in the winter, which means lower electricity use.  That’s good for everyone!

Additional layer three functionality (new toys)

Chris Twa | July 2nd, 2010 | No Comments »

In a previous article we talked about linking two spaces via L2TPv3.  We’ve decided to drop this in favour of layer 3 management.

We’ve wanted to add layer 3 management for a while now, for a few reasons:

  • Faster inter-VLAN routing.  Why burn all those CPU cycles on our router when the routing can be handled by a layer 3 switch?
  • Better QoS control for colocated servers.  We’ve been providing network access to our clients via NAT.  Although this has fit the requirements of our current clients, we wanted to be able to offer WAN IPs on our clients’ colocated servers.  In order to provide QoS without NAT, we needed more equipment.
  • More sensible topology.  We’ll be using OSPF to better manage the unique needs of our clients and our two spaces.  By going with layer 3, we’ll segment our networks to ease management and add redundancy.

Additional server space online in mid July

Chris Twa | July 2nd, 2010 | No Comments »

We’ve secured additional hosting center space in Innovation Place’s Concourse building in Saskatoon.  The room isn’t huge but it will allow us some additional possibilities.

The biggest advantage of this additional space is a chilled-water based cooling system that will allow us greater density and lower our electrical power requirements.  We’re really trying to “green” up our business and this new system will be much more efficient for our hosting center.

Here are a few things we’re hoping to add with this second space:

  • Failover router for our primary ISP.  Due to constraints with one of our clients this project probably won’t happen until September.  We’re planning on configuring a router in each space as a redundant, failover pair using HSRP and OSPF.
  • More servers. Of course!  We’ll be moving most of our existing equipment to this space, although the long term plan is to have a redundant SAN in each space.  As above, we’re waiting on a client and probably won’t be setting up the second SAN until the fall.
  • More reliable service. Cooling and power concerns pushed us to acquire this additional space. We expect downtime to decrease.

Testing OpenSolaris for VM storage

Chris Twa | April 9th, 2010 | No Comments »

We’re currently testing a ZFS-based fileserver for VM storage.  Everyone was right: hardware compatibility is a problem with OpenSolaris.

Aside from the hardware compatibility, the OpenSolaris install is very easy.

It took me about five minutes to fall in love with ZFS.  Wow, I can’t believe how easy it is to manage disks.  NFS was similarly a snap and we now have a few test VMs running on it under VMware.  All in all, I’m pretty pleased with OpenSolaris and I think we’ll be expanding its use once we get some more hardware.