I don't have any experience with Linux at all, but I learn fast, so if that is the solution I should use, I will.
Using Linux would let you run a "free" iSCSI server, which would allow the disks in your "module" servers to be presented as if they were local disks "in" your "master" server (the one that's using all the disks).
An iSCSI server on Windows is (IIRC) only available in the Datacenter edition.
Another question: Is it possible to overlay several software based RAID arrays?
Yes, but be careful not to join a load of RAID 5 arrays together with RAID 0: a complete failure of one array would mean the complete loss of all your data sets.
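To make that failure mode concrete, here is a minimal sketch in Python. It is only a model, not a real RAID tool: the function names and the three-array layout are made up for illustration, and it assumes each RAID 5 array survives one failed disk while a RAID 0 stripe survives none.

```python
def raid5_alive(failed_disks: int) -> bool:
    """A RAID 5 array survives at most one failed member disk."""
    return failed_disks <= 1


def raid0_alive(members_alive: list) -> bool:
    """A RAID 0 stripe is lost as soon as any one of its members is lost."""
    return all(members_alive)


# Hypothetical layout: three RAID 5 arrays striped together with RAID 0.
failed_per_array = [0, 2, 0]  # the second array has lost two disks
arrays_alive = [raid5_alive(f) for f in failed_per_array]

print(arrays_alive)               # [True, False, True]
print(raid0_alive(arrays_alive))  # False: everything on the stripe is gone
```

Two of the three arrays are perfectly healthy here, but the data striped across all three is still unreadable.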
My concerns are more focused on profitability and cost per unit of storage than on raw costs.
Are you trying to sell space as a service?
If so, you'll find it much more cost-effective to start by buying one server, perhaps with only a couple of TB of space, and then add more space as needed. Think about how the cost of disks falls over time while their size also gets bigger.
I'm old enough to remember drives costing £100 per GB, then £10 per GB, then £1 per GB, now it's less than a tenth of that!
Buy what you absolutely need now, expand later...
I would need RAID 1 arrays for data redundancy, however, I would also need to split the data of all users among all the drives of the system in order to maximize performance.
I see what you're getting at, but for the number of disks you're talking about, RAID 1 is silly expensive! (And I believe you're actually thinking of RAID 1+0 anyway.)
Think about using RAID 5, so that the data is striped across all the disks. That way, if you have 10 disks, the data is spread across 10 spindles: 1 user connecting has the speed of ten disks, 2 users have the speed of 5 disks each.
(And neither of them is able to actually use that, as their connection is maxed out anyway! Basically, even if they had the speed of just 1 spindle, that's more than their internet line could cope with!)
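As a rough illustration of both points (space lost to redundancy, and the user's line being the real bottleneck), here is a small Python sketch. The disk speed and line speed are assumed round numbers rather than measurements, the function names are made up, and the model is idealised: it ignores parity overhead and seek times.

```python
def usable_disks(level: str, n_disks: int) -> int:
    """Disks' worth of usable space from n_disks equal-sized disks."""
    if level in ("raid1", "raid10"):
        return n_disks // 2  # two-way mirroring: half the disks hold copies
    if level == "raid5":
        return n_disks - 1   # one disk's worth of space goes to parity
    raise ValueError(f"unknown RAID level: {level}")


def per_user_mb_s(n_disks: int, disk_mb_s: float, users: int, line_mb_s: float) -> float:
    """Idealised speed per user: an even share of the array, capped by their line."""
    array_share = n_disks * disk_mb_s / users
    return min(array_share, line_mb_s)


DISKS = 10
DISK_MB_S = 80.0  # assumed sequential speed of one spindle
LINE_MB_S = 1.0   # assumed speed of a user's internet line (about 8 Mbit/s)

print(usable_disks("raid10", DISKS))  # 5
print(usable_disks("raid5", DISKS))   # 9
print(per_user_mb_s(DISKS, DISK_MB_S, 1, LINE_MB_S))  # 1.0
print(per_user_mb_s(DISKS, DISK_MB_S, 2, LINE_MB_S))  # 1.0
```

With these assumptions, mirroring gives you 5 disks of space out of 10 where RAID 5 gives you 9, and in both the one-user and two-user cases the user's line is the limit, not the array.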
For example, instead of having a RAID 1 at location 1, I could simply have a server X at location 1 and a server X' (exact copy of server X) at location 2.
Symantec offer a product called Continuous protection service; it's a bit like enhanced DFS in that data written to one location is replicated to another.
This mirroring, however, would need to be designed in such a way as to use exactly the same (or almost the same) bandwidth as what would normally be used. It cannot be designed in a way that data is received and written at location 1 and then mirrored at location 2, as this would cost me double the bandwidth and have absolutely no pros on the performance side.
Well, if you load balance your external connections then there would be a performance benefit.
But the crux of what you're asking for is impossible: you can't have the data appear in a second location magically, it has to be replicated to that second place, and you'll have to pay for the bandwidth to do this!
(But that's OK, as if this is a service you'll be billing that cost back to your customers.)
That means if a user at location 3 wants to save data on the servers, he would write the data to both location 1 and location 2 at the same time
You'll hit a wall quickly with this method.
First, you're going to need some pretty good client software to do this, or rely on the user manually uploading their files twice.
Also, what if the users don't upload twice? Or if your second data centre is down for a brief period, the replication of the files won't have happened. And what if, at the user's end, their firewall blocks access to your second data centre?
If you want data replicated reliably, you need to look after that replication yourself.
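To show why the copy can't come for free, here is a back-of-the-envelope Python sketch comparing the two approaches. The function names and the 10 GB figure are purely illustrative; it just counts how much data each party sends and receives.

```python
def server_side_replication(upload_gb: float) -> dict:
    """User uploads once to site 1; site 1 then replicates to site 2."""
    return {
        "client_out": upload_gb,
        "site1_in": upload_gb,
        "site1_out": upload_gb,  # the replication traffic you pay for
        "site2_in": upload_gb,
    }


def client_dual_upload(upload_gb: float) -> dict:
    """User uploads the same data to site 1 and site 2 themselves."""
    return {
        "client_out": 2 * upload_gb,  # the user's line carries everything twice
        "site1_in": upload_gb,
        "site1_out": 0.0,
        "site2_in": upload_gb,
    }


for plan in (server_side_replication, client_dual_upload):
    traffic = plan(10.0)  # a hypothetical 10 GB upload
    total_sent = traffic["client_out"] + traffic["site1_out"]
    print(plan.__name__, traffic, "total sent:", total_sent)
```

Either way 20 GB gets sent for a 10 GB upload. The only question is whether the second copy travels over your link or over the user's (usually much slower) line.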
There is no budget for the project but rather a question of profitability. In large volumes I would like to be able to end up with a total cost of less than $40/TB.
At this time, I don't think you could buy decent server-grade disks at that price...
(and consumer units will have a higher rate of failure)
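To see how redundancy squeezes that target, here is a small Python sketch. It assumes the $40/TB figure is meant per usable TB and counts disks only (no servers, power or bandwidth); the function name is made up for illustration.

```python
def max_price_per_raw_tb(target_per_usable_tb: float, usable_fraction: float) -> float:
    """Most you can pay per raw TB of disk and still hit the usable-TB target."""
    return target_per_usable_tb * usable_fraction


TARGET = 40.0  # dollars per usable TB, from the question

mirrored = max_price_per_raw_tb(TARGET, 1 / 2)  # two-way mirror: half is usable
raid5 = max_price_per_raw_tb(TARGET, 9 / 10)    # 10-disk RAID 5: 9 of 10 usable

print(round(mirrored, 2))  # 20.0
print(round(raid5, 2))     # 36.0
```

So with mirroring the disks themselves would have to come in at about $20 per raw TB, and even with a 10-disk RAID 5 at about $36, before anything else in the server is paid for.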
Mac OS X Server doesn't support Software RAID 6
A software implementation will be slow anyway; you'll want to buy a card that does this in hardware.
I don't want to use hardware based RAID so if I do overlay RAID arrays it will be software based.
Two things:
Software RAID is slower than hardware RAID, so you're condemning the speed of your solution here.
Most software implementations aren't going to be able to do this.
You could overlay RAID implementations either with hardware that supports this, OR in two layers: the first layer is hardware RAID that presents each group of disks to the OS as just one disk, and the second layer is software RAID that takes those arrays (which it sees as single disks) and combines them into a further array.
I posted a thread on the hardware forum to try to understand how RAID6 works. If I use it I need to understand how it works.
I wrote a guide after reading this thread.
http://www.computerforums.org/server-articles/guide-raid-104434.html