Old 05-28-2010, 17:25   #1
betyourlife
Using SANs with virtual servers: doesn't it get in the way of DR?

So you can create highly fault-tolerant systems with clusters, SANs, etc. But what happens when the building falls down? You really need servers in geographically separate locations.

It seems to me that SANs complicate having geographically separate virtual servers set up so that if one location is destroyed, the other location picks up for it.

So why not just eliminate the SAN and have everything run on clustered servers, with replication to other virtual machines in geographically separate parts of the world?

A SAN seems great when we're just talking about a cluster in one facility with no concern for disaster recovery, but it seems to complicate things greatly when planning disaster recovery/business continuity.

Eliminating it would simplify things greatly from a business continuity perspective.

Old 05-29-2010, 00:13   #2
Pierre!
Well, it really comes down to data change rate. Then there is bandwidth... Now add on: what is the recovery time objective (RTO), and what is the acceptable recovery point objective (RPO)?

Data cannot be completely mirrored at all times. Period. Latency comes into play, database locking, files in use that are locked... but most important of all is BUDGET!

Bandwidth, Bandwidth... BANDWIDTH is a great part of the issue. Ya got to have a big pipe and secure it too. The bummer is that security can eat a portion of the bandwidth. And synchronous or even asynchronous pipes can get expensive when you get to the 6Mb/s level, let alone the DS3 45Mb/s level!!!

Even if you resort to snapshots, there can still be a lot of data to move depending on the size of the organization and the data change rate. It only takes one little hiccup in the line to backlog a tremendous amount of data! At some point, catching up becomes a weeks-long project!
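
To put rough numbers on that catch-up problem, here's a quick back-of-the-envelope sketch (the change rates, pipe sizes, and outage length are made-up illustrative values, not from any real deployment):

Code:
# Back-of-the-envelope: how long does async replication take to catch up
# after a line hiccup? All numbers here are hypothetical illustrations.

def catchup_hours(change_mbps, link_mbps, outage_hours):
    """Hours to drain the backlog once the link is back up."""
    backlog_mbits = change_mbps * outage_hours * 3600   # megabits accumulated
    headroom_mbps = link_mbps - change_mbps             # spare capacity after new changes
    if headroom_mbps <= 0:
        return float("inf")                             # you never catch up
    return backlog_mbits / headroom_mbps / 3600

# A 4-hour outage with 10 Mb/s of change on a DS3 (45 Mb/s): ~1.1 hours.
print(round(catchup_hours(10, 45, 4), 1))
# Same outage with 40 Mb/s of change: 32 hours of catch-up.
print(round(catchup_hours(40, 45, 4), 1))

The closer the change rate sits to the size of the pipe, the faster one hiccup turns into that weeks-long catch-up project.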

Add it all together, and the cost begins to escalate exponentially. This is where you need to draw the line, and management has to hand down what is an acceptable cost for the RTO/RPO required... and typically what management WANTS and what they will PAY FOR come down to two completely different solutions.

Does that make sense?

SAN? Yes, it may be appropriate, or not... again, it depends on the factors stated above... but it's the data change rate, the data type (database or files), actual data size, RTO/RPO requirements, and bandwidth that will actually determine the solution for a particular organization.

In a perfect world there would be no data locks, and bandwidth would be plentiful.

That just isn't the world we are living in, however.

HTH
Old 05-29-2010, 18:49   #3
betyourlife
So let's assume that the pipe is huge. Ultimately, there is no significant degradation (IMO) in performance by storing the VMs on the servers' disks via RAID rather than on a SAN.

From a disaster recovery perspective, having multiple SANs replicating out there is not only a pain, but also increases the cost.

I like the KISS concept of storing things on the server rather than a SAN, then just replicating the VM to another virtual environment in a geographically separate area.

I think it simplifies things in general, with no significant issues.
Old 05-29-2010, 23:11   #4
Radian
It all boils down to money...

With infinite amounts you extend fiber FC or FICON networks and use systems that do synchronous writes, i.e. your stock trade is recorded in 3 places before it is journaled in NYC.

This is a multimillion-dollar animal; it's complex, and can be brittle. You will pay hundreds of thousands a month for a point-to-point fiber link, not counting the gear to transmit on it. But it is a hot DR scenario, and it assumes you have extensive parts in place to swing your workload over to a redundant site if one fails, i.e. external load balancers, basically a dynamic system to deal with a failure.

Next is semi-sync: you have a warm site. You can lose some info and refuse a few transactions while you go live in a DR site.

Finally you have async systems that lag from a minute to hours. If you process transactions in batch (EDI stuff, not credit card or real-time stuff) this works and is "cheap".
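
A toy sketch of where the client's write gets acknowledged in each of those three tiers (the Site class and the queue are stand-ins made up for illustration, not any vendor's API):

Code:
# Toy model of sync vs. semi-sync vs. async replication acknowledgment.
import queue

class Site:
    def __init__(self, name):
        self.name, self.log = name, []
    def write(self, data):
        self.log.append(data)        # pretend this is a committed disk write

replication_queue = queue.Queue()    # stand-in for the WAN pipe

def synchronous_write(data, local, remote):
    local.write(data)
    remote.write(data)               # block until the far site commits too
    return "ack"                     # RPO ~ 0, but every write pays the WAN round trip

def semi_sync_write(data, local, remote):
    local.write(data)
    replication_queue.put(data)      # far site acknowledged receipt, not yet committed
    return "ack"                     # RPO ~ seconds: a few transactions can be lost

def async_write(data, local):
    local.write(data)
    replication_queue.put(data)      # drained in batches, minutes to hours behind
    return "ack"                     # RPO = however far behind the queue is

nyc, dr_site = Site("NYC"), Site("DR")
synchronous_write("trade #1", nyc, dr_site)   # recorded in both places before the ack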

Running ANY replication technology, even inexpensive stuff, is vastly superior to recovering from tape.

VM systems are just files at the end of the day.
Old 05-30-2010, 08:22   #5
Radian
Follow-up on local disk and real DR

Local disk is actually far more complex in VM land. First, vMotion, SRM (the DR automation tool), and many other features DO NOT WORK with local disk.

Local disk is much slower than either 10Gb/s NFS or 8Gb/s FC. An HA storage array (active/active hosts) with hundreds of disks provides far more reliability than local disk hanging off a local bus.

The high end is an OS cluster (MS / AIX / Oracle grid, whatever) where half your active disk is in one building and the other half is 80 km away. The cluster is load balanced externally, i.e. either node can run the system (PCI transactions go wherever).

The quorum disk's location is arbitrary, but in the event you fail the cluster, the other half (or parts) takes over. This event is HOT. There is your DR. DONE.
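
The quorum mechanics boil down to a majority vote; here's a toy illustration (the vote counts are hypothetical, and real clusters configure this per vendor):

Code:
# Toy illustration of quorum in a stretched cluster: whichever half can
# still see a majority of votes keeps running the workload, hot.
def has_quorum(votes_visible, total_votes):
    return votes_visible > total_votes // 2

# Say: 2 nodes per site + 1 quorum disk = 5 votes. Site A burns down:
print(has_quorum(3, 5))   # True  -- site B plus the quorum disk carries on
print(has_quorum(2, 5))   # False -- the isolated half stops (split-brain guard)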

Now compare Amazon and Brownells. Amazon NEEDS this capability; they cannot go down for an hour while a semi-sync or async array is brought online and traffic is flipped around.

Brownells does not need a multimillion-dollar system to guarantee PCI transactions take place in any event short of thermonuclear war. The structure is the same; you just don't need the point-to-point fiber with sub-7 ms latency. Replication is done over cheap MPLS and WAN accelerators.
Old 05-30-2010, 12:24   #6
Pierre!
And....

All of this is really irrelevant until the C-levels say what the RTO/RPO really is, and they give you a budget to work with...

Bandwidth doesn't last long when the bank account is empty... (LOL)

Good Luck with the project!
Old 05-30-2010, 15:04   #7
vafish
If you've got the budget and the bandwidth, mirrored SANs are great.

I did a demo a couple of years ago using 2 Hitachi SANs mirrored across our corporate network.

We ran streaming video from a VM, and we could unplug the server the VMs were on and the backup server would take over; with the buffering in the video player you never missed a frame.
Old 05-30-2010, 22:05   #8
RTmarc
SAN-to-SAN replication + VMware infrastructure + VMware SRM = simple DR.

I'm an iSCSI guy myself, so LeftHand SANs are typically my primary choice. Replication and snapshot capabilities are included in the price of the SAN, not expensive add-ons.
Old 06-20-2010, 20:28   #9
betyourlife
Quote:
Originally Posted by Radian View Post
Local disk is actually far more complex in VM land. First, vMotion, SRM (the DR automation tool), and many other features DO NOT WORK with local disk. [...] The structure is the same; you just don't need the point-to-point fiber with sub-7 ms latency. Replication is done over cheap MPLS and WAN accelerators.
FC SAN may be FASTER, but I'm not as concerned with speed or throughput as I am with having a failover site. I don't anticipate a significant degradation in performance in my applications.

My point is that if I had a limited budget, or at least wanted to keep the cost as low as possible, why not just move away from a cluster setup that relies on an expensive and complex SAN?

Originally I was designing it in a way that added another SAN and moved one server to another location. Then I thought, OK, why not just get rid of the SAN I have now, forget about clustering or multi-site clustering that relies on another SAN somewhere else, and just have two physical servers, geographically separated, with replication software moving files to the failover server?

Right now my systems (which I inherited) provide a high level of fault tolerance. They do not, however, provide redundancy. Heck, one of the critical services my application depends on is only running on one physical server. The other app servers, which are configured with NLB, are running on VMs, and there is no easy way to migrate the app from the physical server to another physical server where the app is running on two other VMs. If the physical server this critical service runs on fails, technically the application will still work, but many of its functions would no longer work, because the service that some features of the app rely on only ran on that physical server.

Working with the equipment I have, I want to move away from a SAN/cluster. I want to consolidate my servers and rely on virtualization to provide fault tolerance and disaster recovery on two physical servers geographically spread out. At most I would have to purchase replication software to replicate the files from one machine to another. Or, if I wanted to keep it uber cheap, I could just export/import the VMs to the failover server daily, and in the rare event that a component fails, I fire up the VM with the latest backup/snapshot.
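
A minimal sketch of that uber-cheap nightly option, assuming a cron-style scheduler and rsync for the copy; the paths, host name, and the export step are all hypothetical placeholders, since the real export command depends on the hypervisor:

Code:
# Nightly "poor man's replication": export the VMs, then ship the changed
# blocks to the failover box. All paths and hosts below are made up.
import subprocess

EXPORT_DIR = "/exports/vms"            # where the hypervisor drops exports
FAILOVER = "failover.example.com"      # hypothetical DR-site server

def replicate_nightly():
    # 1. Export the VMs -- hypervisor-specific; placeholder command here.
    subprocess.run(["echo", "export VMs to " + EXPORT_DIR], check=True)
    # 2. rsync moves only changed blocks, so the nightly delta (not the
    #    full VM size) is what has to fit through the pipe.
    subprocess.run(
        ["rsync", "-a", "--partial", EXPORT_DIR + "/", FAILOVER + ":/imports/vms/"],
        check=True,
    )

if __name__ == "__main__":
    replicate_nightly()    # schedule once a day via cron / Task Scheduler

The trade-off is the RPO: a nightly copy means accepting up to a day of lost data, which is exactly the RTO/RPO conversation from earlier in the thread.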

Bandwidth is plentiful. I inherited a system that was set up barely well enough to run, but not well from any failover or management perspective.

Old 06-21-2010, 07:23   #10
RTmarc
SANs are neither as costly nor as complex as you are making them out to be. Starter SAN kits can be had from most major vendors for $25-30k depending on size and drive type. Depending on the size of your environment, the price of redundant physical servers will quickly exceed that amount, especially if you are talking virtualization host servers.

I'm not sure if you are using VMware or not, but all of your vMotion / DRS / SRM functionality will rely on centralized storage. All host servers must see all datastores if those VMs are to be included in those features.

Almost all vendors are going to recommend no more than 5-10ms of latency with synchronous replication, so unless you are running a 100Mb+ pipe from your primary to your secondary location, you are going to be stuck with asynchronous.
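
For a rough sense of where that 5-10 ms budget goes, here's a quick sketch using only the speed of light in fiber (it ignores switch and equipment delay, which only makes things worse):

Code:
# Why distance caps synchronous replication: every write waits out the
# round trip, and fiber latency is set by physics.
LIGHT_IN_FIBER_KM_PER_S = 200_000   # roughly 2/3 the speed of light in vacuum

def round_trip_ms(distance_km):
    return 2 * distance_km / LIGHT_IN_FIBER_KM_PER_S * 1000

for km in (80, 500, 1000):
    print(f"{km:>5} km -> {round_trip_ms(km):.1f} ms RTT")
#    80 km ->  0.8 ms   (comfortably inside a 5-10 ms budget)
#   500 km ->  5.0 ms   (budget nearly gone before any gear delay)
#  1000 km -> 10.0 ms   (synchronous is off the table)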

Personally, I'd get the primary site built like it ought to be first, with solid - and consistent - backups, and then build out the DR site as a separate project with a separate budget. Buying servers and storage may look relatively cheap on paper, but when you consider everything else that comes with it, the costs add up quickly. Remember, you'll need redundant servers, storage, network appliances (switches, firewalls, routers), software licenses, etc.

One final note: if you are running a VMware environment (host servers, centralized storage, etc.) and run Site Recovery Manager (SRM), you can test your DR at the remote site at any time without interrupting your main office. SRM relies on S2S replication on the SAN and, once triggered, will automatically stand up your entire virtual environment, according to a script you pre-configure, in an isolated test environment.
Old 06-21-2010, 12:33   #11
MavsX
We got tasked with implementing free WiFi in a 13-story apartment complex. The owners wanted it to be free for all residents, as a way to attract potential residents. Anyway, one problem we ran into was having enough bandwidth for the number of people that would be using the WiFi. We didn't know if Comcast could bring in enough bandwidth. Enter Cogent: they can extend a fiber connection from where they already have a presence, and they can give us 100Mbps for $2400 a month. Of course, we are going to have to spend something like $100k in Cisco equipment on top of that, but still.