Server Fault Tolerance Opinions

JaseVaughn

As the title of this thread suggests, I would like to get opinions on a server fault tolerance solution that I am designing for my company.

Here is the scope.

We are implementing a new Time Management System at my company. The software is from a company called Replicon. The software is web based and is going to be running off of IIS 7 on Windows Server 2008 Enterprise Servers. It ties into a backend Database running SQL Server 2005 Enterprise. All systems involved in this setup are running Windows Server 2008 Enterprise.

The CIO has asked me to build this system with fault tolerance and has suggested that we use Failover Clustering in Windows Server 2008. His plan is as follows: three systems at our office location (two web application servers and a database server), then at our remote DR site the other database server and another web application server. So five systems in total. He would like the three web application servers in a Server 2008 failover cluster, which would require the one at the DR site to communicate over a WAN link with the servers here at our office. The database servers would, of course, be in a SQL Server 2005 cluster replicating over the WAN link.

Now for my thoughts. As far as the SQL side of this project is concerned, I fully agree with the solution stated above. Since all the data for the application is housed in the SQL Server database, having one server here at the office and one at the DR site in a cluster makes perfect sense, and I support that solution.

My thoughts on the Web Application Servers are different. I disagree with putting those three machines in a Server 2008 Failover Cluster for some obvious reasons.

1. Although Server 2008 has made it much easier to create failover clusters over a WAN link between geographically dispersed locations, since you no longer have to extend the VLAN the servers are on, there is still a lot of configuration and work involved in making it work correctly. You also still need to buy third-party replication software such as Double-Take to do replication over the WAN link.

2. These servers are web application servers and will be constantly serving client requests to enter time and do various tasks in the time management system. If these three systems are in a Server 2008 failover cluster, only one of them will be serving client requests while the other two sit on standby listening to each other's heartbeat, which wastes a lot of processing power.

What I propose for the web application servers is that the two on site be placed in a Server 2008 Network Load Balancing configuration. Why? Because this solution seems to make the most practical sense. Since we are working with web application servers that will be taking client requests, NLB will split the load between the two servers, giving better performance and actually using their processing power. It also adds fault tolerance: if one of the machines goes down, the other will keep serving requests until we bring the failed one back online. And if we have to do maintenance on one of the servers, I can use the drainstop command in NLB so that the server stops taking new client connections, which go to the other server instead, while its existing connections finish; then I can suspend the server I want to work on. That means zero downtime for users if I have to do something during normal operating hours. As much as I love sitting in an empty office building at 9 PM, I'll pass.
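To make the drainstop idea concrete, here is a rough toy model in Python. It is only a sketch under my own assumptions: the class `ToyBalancer`, the host names `web1`/`web2` and the hash-by-client-IP rule are made up for illustration and are not how NLB is actually implemented.

```python
import hashlib


class ToyBalancer:
    """Toy model of a two-node load-balanced web tier (illustration only)."""

    def __init__(self, hosts):
        # Each host is "active", "draining" or "stopped".
        self.state = {h: "active" for h in hosts}
        self.connections = {h: set() for h in hosts}

    def _accepting(self):
        # Only active hosts take new connections.
        return sorted(h for h, s in self.state.items() if s == "active")

    def connect(self, client_ip):
        hosts = self._accepting()
        if not hosts:
            raise RuntimeError("no host is accepting new connections")
        # Assumed distribution rule: hash the client IP onto the active hosts.
        digest = hashlib.sha256(client_ip.encode()).digest()
        host = hosts[digest[0] % len(hosts)]
        self.connections[host].add(client_ip)
        return host

    def drainstop(self, host):
        # Stop taking new connections; existing ones are left to finish.
        self.state[host] = "draining"
        self._maybe_stop(host)

    def disconnect(self, host, client_ip):
        self.connections[host].discard(client_ip)
        self._maybe_stop(host)

    def _maybe_stop(self, host):
        # A draining host stops once its last connection has gone.
        if self.state[host] == "draining" and not self.connections[host]:
            self.state[host] = "stopped"


lb = ToyBalancer(["web1", "web2"])
lb.connect("10.0.0.15")      # lands on web1 or web2
lb.drainstop("web1")         # from here on, new clients only reach web2
print(lb.connect("10.0.0.99"))
```

The point of the sketch is the ordering: new traffic moves to the other server straight away, and the drained server is only taken down once nobody is connected to it.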

As for the other application server, that's easy: have it at the DR site, connected and talking to the database server that's there with it. It will be on standby and ready to go in the event of a major disaster, and I'll just make a DNS change to direct clients to it if for some reason I lose the other two. Remember, since the database servers are clustered over the WAN link and the data is being replicated, all the company data will already be at that site. I just have to point clients to the web server there if something were to happen.

Anyway, these are my thoughts on this. I welcome your ideas and thoughts on this matter. :)
 
Hmm, I don't see the point of the 2 web servers at location A for a simple timesheet system

are you expecting a lot of traffic?
if you are then as you say NLB is the only way to go,
of course having 2 servers does allow for downtime as well, which can only be a bonus.
though I do have to ask, if you're expecting a lot of traffic, why not have 2 servers at your DR site as well?


as for the database,
you're only having 2, and clustering them across sites? I'd say that this is a pretty bad idea, as a drop in the inter-site connection could lead to the failover server attempting to take on the master role, assuming that the current server has failed.
plus the only way I can think of that you'd be able to use shared storage would be iSCSI. this doesn't work over NAT, and I don't think it'd be reliable over a VPN between the sites either.

I think that a better solution is to use 6 servers.

2 web front ends at the main site (if you feel that you need these) and one at the DR site

then 3 database servers:
2 clustered servers at your main site, then a DR server, with the data log shipped out to the DR site from the main servers.
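In case log shipping is unfamiliar, here is a toy Python model of the cycle (back up the log at the main site, copy it across, restore it at the DR site). It is a sketch only: `ToyLogShipping` and its method names are made up for illustration and are not SQL Server commands.

```python
class ToyLogShipping:
    """Toy model of log shipping from a main site to a DR site."""

    def __init__(self):
        self.primary_log = []   # committed at the main site, not yet backed up
        self.shipped = []       # log backups copied to DR, not yet restored
        self.dr_data = []       # transactions restored on the DR server

    def commit(self, txn):
        self.primary_log.append(txn)

    def backup_and_copy(self):
        # Back up the log at the main site and copy the file over the WAN.
        if self.primary_log:
            self.shipped.append(list(self.primary_log))
            self.primary_log.clear()

    def restore_at_dr(self):
        # Restore the copied backups on the DR server, oldest first.
        for backup in self.shipped:
            self.dr_data.extend(backup)
        self.shipped.clear()

    def exposure(self):
        # Work that would be lost if the main site vanished right now.
        return list(self.primary_log)


ls = ToyLogShipping()
ls.commit("timesheet-1")
ls.commit("timesheet-2")
ls.backup_and_copy()
ls.commit("timesheet-3")     # committed after the last log backup
ls.restore_at_dr()
print(ls.dr_data)            # what the DR server holds
print(ls.exposure())         # what is at risk until the next backup
```

The trade-off it shows: the DR copy always lags by whatever was committed since the last log backup, but a dropped WAN link only delays the copy. It cannot cause the DR server to seize the active role.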
 
Yes, I will be expecting a lot of traffic because these servers will be hosting other front end web applications as well. That's why I care about the use of the processing power and the fault tolerance.

As for your clustering statement: suppose I had just two database servers configured in a cluster over the WAN link. If for some strange reason the link were to drop, a transfer of services couldn't take place in the first place, because the server here would be holding those services, and with the link down how would my database server here transfer them to the one at the DR site? In a real disaster I would already be on my way to the DR site to do any configuration that I need. The main reason for having the disaster recovery site is in case this site goes down. The main reason for the cluster is to have the databases replicating the information. Also, in the event that the one here went down, at least I could still direct clients to the one at the DR site until I restored the one here. At degraded performance, of course, but nevertheless.
 
I've probably missed something...

but in order to cluster the database servers you'd have to have a shared data source?

how do you intend to do this over the WAN?

I understand that you plan to do the heartbeat for the servers over the WAN (possibly via VPN), but in practise this means that if the WAN link drops, the heartbeat drops and the passive node will start up as the active node (now you have two active nodes, because both assume that with no heartbeat the other node is dead).

Also, the general best advice is that heartbeats are done either directly with a crossover cable or over their own dedicated hub. you might find that the WAN link has too much latency for the heartbeat to work properly, so you may find servers attempting to grab the active role, assuming that the other server has failed anyway.

I see what you are saying: usually when you shut down the active node, it initiates failover. but the passive node will (when connectivity to the active node fails) assume that the active node has had a power failure and take over responsibility as the active node.
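To show the two-active-nodes problem in a few lines, here is a rough Python sketch. The function names, the node names `main`/`dr` and the extra `witness` vote are my own assumptions for illustration; the second function is a generic majority-vote rule, not the actual Windows clustering logic.

```python
def naive_takeover(dr_sees_main):
    """Heartbeat-only rule for a two-node cluster where 'main' is active."""
    active = {"main"}        # main keeps running; it cannot tell why dr vanished
    if not dr_sees_main:
        active.add("dr")     # dr assumes main is dead and starts the service
    return active


def eligible_hosts(visible, total_votes):
    """Majority rule: a node may host the service only if it can see more
    than half of all votes, counting its own.

    visible maps each node name to the set of other voters it can reach.
    """
    return {node for node, others in visible.items()
            if 2 * (1 + len(others)) > total_votes}


# WAN link drops, heartbeat only: both nodes end up active.
print(sorted(naive_takeover(dr_sees_main=False)))

# Same outage with a hypothetical third vote ("witness") at the main site:
# main still sees 2 of 3 votes, dr sees only itself and must stay passive.
print(sorted(eligible_hosts({"main": {"witness"}, "dr": set()}, total_votes=3)))
```

With only two votes and no third party, a link drop leaves each node with 1 of 2 votes, so under a majority rule neither may run the service. With heartbeat alone, both do.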


when you talk about replication do you mean what I said when I said log shipping to replicate the data?

i.e

you have your web front end cluster and backend database

which are completely independent of your DR site?

the backend DB server replicates to the datacentre so that in the event of failure at your main site you can just transfer the DNS records to the DR site and resume service as normal?

in this case, it wouldn't be a failover cluster that I'd first assumed that you meant.
 