Data Server at Lowest Cost

Re: Storage Server at Lowest Cost

I don't understand: if each piece of data has a duplicate somewhere, then no matter how it is duplicated the system has to use twice the capacity, doesn't it?
Well, I don't understand the exact inner workings of it either - but fortunately, some very clever people in a dark basement do. With RAID 6 you lose the capacity of two drives, and can lose any two drives but still keep all of your data. RAID 5 is similar but for 1 drive. Do some more reading into RAID - there's a lot more to it than you're currently assuming!
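For anyone curious how a drive can be lost without keeping a full second copy, here is a rough sketch of the single-parity idea behind RAID 5. It is a toy illustration only, not how any real controller or driver is written: the block contents and the helper names (xor_blocks, rebuild) are made up for the example, and real RAID 5 rotates the parity across all the drives. RAID 6 adds a second, differently computed parity block, which is not shown here.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data "drives" holding one block each (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all the data blocks.
# It takes up one drive's worth of space, not a full copy of everything.
parity = xor_blocks(data)

def rebuild(surviving_blocks, parity_block):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity_block])

# Pretend drive 1 died: rebuild its block from drives 0 and 2 plus parity.
recovered = rebuild([data[0], data[2]], parity)
print(recovered)
```

XOR-ing the surviving blocks with the parity cancels everything except the missing block, which is why one extra drive is enough to cover the loss of any one drive.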
 
Re: Storage Server at Lowest Cost

I looked at the Wikipedia page for RAID but I still don't understand RAID 5 and 6.
 
Re: Storage Server at Lowest Cost

Don't worry about how it works :) All you really need to know is that with RAID 5 you lose the capacity of 1 drive in the array, but you can lose any one disk and not lose any data. The same applies to RAID 6 but with 2 disks rather than one (you lose the capacity of 2, you have the redundancy of 2.)
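To put numbers on that rule of thumb, here is a minimal sketch. The function name raid_summary and the 10 x 2 TB example are assumptions for illustration; it only covers a single RAID 5 or RAID 6 array and ignores filesystem overhead and hot spares.

```python
def raid_summary(drive_count, drive_size_tb, level):
    """Return (usable_tb, drives_that_can_fail) for a single array."""
    if level == "raid5":
        parity_drives = 1
    elif level == "raid6":
        parity_drives = 2
    else:
        raise ValueError("only 'raid5' and 'raid6' are handled in this sketch")
    # RAID 5 needs at least 3 drives, RAID 6 at least 4.
    if drive_count < parity_drives + 2:
        raise ValueError("not enough drives for " + level)
    usable_tb = (drive_count - parity_drives) * drive_size_tb
    return usable_tb, parity_drives

# Example: ten 2 TB drives (hypothetical sizes).
for level in ("raid5", "raid6"):
    usable, can_fail = raid_summary(10, 2, level)
    print(level, "usable TB:", usable, "drives that can fail:", can_fail)
```

With ten 2 TB drives that works out to 18 TB usable on RAID 5 and 16 TB on RAID 6, against 10 TB if everything were mirrored.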
 
Re: Storage Server at Lowest Cost

I posted a thread on the hardware forum to try to understand how RAID6 works. If I use it I need to understand how it works.
 
I've merged your two threads into one so that you can hopefully see all the information at once.
 
I don't have any experience with linux at all but I learn fast so if that is the solution I should use I will.
using Linux would let you get a "free" iSCSI server, which would allow the disks in your "module" servers to be presented as if they were locally "in" your "master" server (the one that's using all the disks).
iSCSI server on Windows (IIRC) is only available in the data-centre edition.

Another question: Is it possible to overlay several software based RAID arrays?
Yes, but be careful that you don't connect a load of RAID 5 arrays together with RAID 0: a complete failure of one array would be a complete failure of all your data sets.
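A minimal sketch of why that layout is fragile, assuming several RAID 5 arrays striped together with RAID 0 on top. The function name and the failure counts are hypothetical, for illustration only.

```python
def striped_raid5_survives(failed_per_array):
    """failed_per_array: number of failed drives in each RAID 5 sub-array.

    Each RAID 5 array survives at most one failed drive, and the RAID 0
    stripe on top survives only if every sub-array survives.
    """
    return all(failed <= 1 for failed in failed_per_array)

# One failed drive in each of three arrays: everything is still readable.
print(striped_raid5_survives([1, 1, 1]))

# Two failed drives in the same array: that array is gone, and because the
# data is striped across all arrays, so is the whole data set.
print(striped_raid5_survives([2, 0, 0]))
```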



My concerns are more focused on profitability and cost per unit of storage than on raw costs.
Are you trying to sell space as a service?
if so, you're going to find it much more cost effective to start by buying one server, perhaps with only a couple of TB of space, then adding more space as needed. think about how the cost of disks falls over time whilst the size also gets bigger.
I'm old enough to remember drives costing £100 per GB, then £10 per GB, then £1 per GB, now it's less than a tenth of that!

buy what you absolutely need now, expand later...

I would need RAID 1 arrays for data redundancy, however, I would also need to split the data of all users among all the drives of the system in order to maximize performance.

I see what you're getting at, but for the number of disks you're talking about RAID 1 is silly expensive! (and I believe you're actually thinking of RAID 1+0 anyway).

think about using RAID 5, so that the data is striped across all the disks. that way, if you have 10 disks, the data is spread across 10 spindles: 1 user connecting has the speed of ten disks, 2 users have the speed of 5 disks each.
(and neither of them is able to actually use that, as their connection is maxed out anyway!) basically, even if they had the speed of just 1 spindle, that's more than their internet line could cope with!
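As a back-of-the-envelope sketch of that point: the per-disk speed (100 MB/s) and the user's line speed (20 Mbit/s) below are assumed round numbers, not measurements, and the model ignores parity overhead, seek times and caching.

```python
def per_user_mb_s(disks, users, disk_mb_s=100.0, line_mbit_s=20.0):
    """Idealised speed each user sees, in MB/s.

    The array's total throughput is shared evenly between users, but a
    user can never go faster than their own internet line.
    """
    array_share_mb_s = disks * disk_mb_s / users
    line_mb_s = line_mbit_s / 8  # 8 bits per byte
    return min(array_share_mb_s, line_mb_s)

print(per_user_mb_s(disks=10, users=1))  # limited by the line, not the array
print(per_user_mb_s(disks=10, users=2))
print(per_user_mb_s(disks=1, users=1))   # a single spindle already outruns the line
```

Under those assumptions the user's line is the bottleneck in every case, which is the point being made above.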


For example, instead of having a RAID 1 at location 1, I could simply have a server X at location 1 and a server X' (exact copy of server X) at location 2.
Symantec offer a product called Continuous protection service; it's a bit like enhanced DFS in that data written to one location is replicated to another.


This mirroring, however, would need to be designed in such a way as to use exactly the same (or almost the same) bandwidth as what would normally be used. It cannot be designed in a way that data is received and written at location 1 and then mirrored at location 2, as this would cost me double the bandwidth and have absolutely no pros on the performance side.
well, if you load balance your external connections then there would be a performance benefit.

but the crux of what you're asking for is impossible: you can't have the data appear in a second location magically, it has to be replicated to that second place, and you'll have to pay for the bandwidth to do this!
(but that's OK as if this is a service you'll be billing that cost back to your customers)

That means if a user at location 3 wants to save data on the servers he would write the data to both location 1 and location 2 at the same time
you'll hit a wall quickly with this method.

first, you're going to need some pretty good client software to do this, or rely on the user manually uploading their files twice.
also, what if the users don't upload twice? if your second data centre is down for a brief period, the replication of the files won't have happened. and what if, at the user's end, their firewall blocks access to your second data centre?

if you want data replicated reliably, you need to look after that replication.

There is no budget for the project, but rather a question of profitability. In large volumes I would like to be able to end up with a total cost of less than $40/TB.
at this time, I don't think that you could buy decent server grade disks at that price...

(and your consumer units will have a higher rate of failure)

Mac OS X Server doesn't support Software RAID 6
a software implementation will be slow anyway; you'll want to buy a card that does this in hardware.

I don't want to use hardware based RAID so if I do overlay RAID arrays it will be software based.
Two things:
software RAID is slower than hardware RAID, so you're condemning the speed of your solution here.
most software implementations aren't going to be able to do this.

you could overlay RAID implementations either with hardware that supports this, or with the first layer being hardware that presents each array to the software as a single disk, and the software then combining those "disks" into a further array.


I posted a thread on the hardware forum to try to understand how RAID6 works. If I use it I need to understand how it works.
I wrote a guide after reading this thread.
http://www.computerforums.org/server-articles/guide-raid-104434.html
 
This link may help.

Petabytes on a budget: How to build cheap cloud storage | Backblaze Blog

to get a TCO of less than $40 per TB you're going to have to go big from the start (trouble is that, as I said earlier, you'll be starting with acres of unused space whilst the purchase price of the drives is falling).

you'll need to go and find some custom RAID stuff to make a setup like that.

they (2 years ago) got the cost down to $81,000 per PB.
that makes the storage you're talking about cost $81 per TB.

it's still twice what you want, though media costs have fallen so much that this may be possible now!
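The arithmetic behind that comparison, as a quick sketch. It takes the $81,000 per PB figure quoted in this post and the $40 per TB target from earlier in the thread, and assumes 1 PB = 1,000 TB (decimal units, as drive makers count).

```python
cost_per_pb_usd = 81_000     # figure quoted in the post above
tb_per_pb = 1_000            # assumption: decimal units, 1 PB = 1,000 TB
target_per_tb_usd = 40       # target stated earlier in the thread

cost_per_tb_usd = cost_per_pb_usd / tb_per_pb
ratio_to_target = cost_per_tb_usd / target_per_tb_usd

print(f"cost per TB: ${cost_per_tb_usd:.2f}")
print(f"that is {ratio_to_target:.2f}x the ${target_per_tb_usd}/TB target")
```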
 