Thanks a lot for your reply, berry120.
I was thinking of using two "Supermicro AOC-SAT2-MV8 8 Channel 300MB/S Per Channel 64 Bit PCI-X Interface Serial ATA Adapter" cards along with the standard ports on the motherboard, which would give 20-22 SATA drives per unit (40-44 TB raw with 2 TB drives). I would use software-based RAID 1.
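As a quick sanity check on those numbers (a minimal sketch; the drive count and size are from above, and the halving assumes plain RAID 1 mirroring):

```python
# Rough capacity math for one unit (plain RAID 1 halves the usable space).
drives = 22        # 2 x 8 ports on the AOC-SAT2-MV8 cards plus ~6 onboard SATA ports
drive_tb = 2       # 2 TB drives

raw_tb = drives * drive_tb
usable_tb = raw_tb / 2   # every drive is mirrored, so only half the raw space is usable

print(f"raw: {raw_tb} TB, usable after RAID 1: {usable_tb:.0f} TB")
# raw: 44 TB, usable after RAID 1: 22 TB
```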
I have no idea what the PCI-X bus transfer speed limits are, which I would need to take into account when choosing a motherboard.
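For reference, this is the kind of back-of-the-envelope check I mean (the theoretical bus peaks are published figures, but the ~100 MB/s sustained rate per drive is an assumption on my part):

```python
# Back-of-the-envelope bus budget (theoretical peaks; real throughput is lower).
bus_mb_s = {
    "PCI 32-bit/33MHz":    133,   # a plain PCI slot, shared by everything on the bus
    "PCI-X 64-bit/66MHz":  533,
    "PCI-X 64-bit/100MHz": 800,
    "PCI-X 64-bit/133MHz": 1067,
}
drives_per_card = 8
drive_mb_s = 100   # assumed sustained rate per SATA drive

need = drives_per_card * drive_mb_s   # 800 MB/s if all 8 drives stream at once
for bus, cap in bus_mb_s.items():
    verdict = "OK" if cap >= need else "bottleneck"
    print(f"{bus}: {cap} MB/s -> {verdict} for {need} MB/s of drive traffic")
```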
I also don't know what power supply to choose for such a configuration, nor what kind of processor.
The AMD Athlon II X4 quad-core looks like a processor with a very good price/performance ratio; however, even allowing for the computing requirements of software-based RAID, it might be overkill. That raises the question of whether I should stick with it and try to increase the number of drives served per unit, or choose a smaller processor and build a larger number of systems. In financial terms it seems better to use the most cost-efficient processor and saturate it with as much storage as it can handle, rather than build more systems around smaller processors. In technical terms, however, this might not be a good solution, as building a system with more than 100 TB of storage is likely to come with a few issues. First, I might not find a motherboard supporting 4-5 or more of the previously mentioned SATA adapter cards. Second, I wonder whether systems are designed (whether on the software side or the hardware side) to handle that many drives. Finally, I would need more than Gigabit Ethernet to satisfy the maximum networking needs of such a system; I could use two ports, but I don't know if that is possible, or, if it is, how data transfer is split between the two (rough numbers below).
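To show why Ethernet is the worry (a sketch with assumed figures; I believe Linux can bond two Gigabit ports with link aggregation, but how traffic is balanced across them is something I'd need to confirm):

```python
# Compare aggregate drive throughput to network capacity (assumed figures).
drives = 22
drive_mb_s = 100        # assumed per-drive sustained rate
gige_mb_s = 125         # 1 Gbit/s is ~125 MB/s in theory

drive_total = drives * drive_mb_s   # 2200 MB/s of potential drive traffic
for ports in (1, 2, 4):
    net = ports * gige_mb_s
    print(f"{ports} x GigE = {net} MB/s; the drives can deliver {drive_total} MB/s "
          f"({drive_total / net:.1f}x the network)")
```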
The case will be custom built in order to fit into an energy-efficient cooling setup, and won't cost more than $30 per unit at most (or $50 for units of 40-50 drives).
My concerns are focused more on profitability and cost per unit of storage than on raw costs. Spending $10K on a server is not a problem for me; spending $100 or more per TB is. The expense of building these systems will be covered by the income they generate. All I need is good prospects in terms of final cost per unit of storage.
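To make that concrete, here is a cost-per-TB sketch for one 22-drive unit; every price in it is a placeholder I made up, not a quote, so real figures would need to be substituted:

```python
# Hypothetical cost-per-TB estimate (all prices are placeholders, not quotes).
n_drives, drive_tb, drive_price = 22, 2, 120   # assumed price of a 2 TB drive
controllers = 2 * 100       # two AOC-SAT2-MV8 cards, assumed $100 each
board_cpu_ram = 350         # motherboard + Athlon II X4 + RAM, assumed
psu = 150                   # assumed
case = 30                   # the custom case figure from above

total = n_drives * drive_price + controllers + board_cpu_ram + psu + case
raw_tb = n_drives * drive_tb
usable_tb = raw_tb / 2      # RAID 1 halves usable capacity

print(f"total ${total}: ${total / raw_tb:.0f}/TB raw, ${total / usable_tb:.0f}/TB usable")
# With these placeholders: $3370 total, $77/TB raw, $153/TB usable --
# the RAID 1 mirroring alone makes a $40/TB target hard to reach.
```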
I don't have any experience with Linux at all, but I learn fast, so if that is the solution I should use, I will.
These storage systems are to be used by several other servers. Those servers will be designed for very high speed and computing power, and will access the storage systems for storage applications. The storage might be accessed simultaneously by thousands of users, with the processes through which they access it handled by the servers designed for high computing power. The storage systems themselves only need to be able to serve all their hard drives at their maximum transfer speeds.
Another question: is it possible to overlay several software-based RAID arrays?
I would need RAID 1 arrays for data redundancy; however, I would also need to split the data of all users among all the drives of the system in order to maximize performance. I would like to spread the load equally across all drives, so as to avoid the random situation where users whose data happens to sit on the same drive access it at the same time, each getting 1/n of that drive's transfer speed while other drives sit unused. The ideal situation would be all users enjoying approximately the same transfer speed (not a speed I cap, but the best effective transfer speed available) at any moment, given, of course, that they all have comparable systems and network speeds. That would mean the transfer speed each user gets depends on the overall load of the data server rather than on the random drive allocation of user data.
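If I understand it correctly, what I am describing is striping on top of mirrored pairs, which I believe is called RAID 1+0. A minimal sketch of the idea (the block-to-drive mapping here is purely illustrative, not how any particular implementation lays data out):

```python
# Striping over mirrored pairs (RAID 1+0): consecutive logical blocks rotate
# across all mirror pairs, so any one user's data is spread over every drive.
n_pairs = 10   # e.g. 20 data drives arranged as 10 RAID 1 pairs

def place(block: int) -> tuple[int, int]:
    """Return the two physical drives (one mirror pair) holding a logical block."""
    pair = block % n_pairs            # round-robin striping across the pairs
    return (2 * pair, 2 * pair + 1)   # both drives of the pair store a copy

for block in range(5):
    print(f"logical block {block} -> drives {place(block)}")
# Consecutive blocks land on different pairs, so large transfers use all spindles
# and two users rarely contend for the same drive.
```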
Also, I would like to know if it is possible to mirror the data on different servers at different locations without it costing me more bandwidth than I normally need.
For example, instead of having a RAID 1 at location 1, I could simply have a server X at location 1 and a server X' (an exact copy of server X) at location 2. This way I would have the same protection against data loss from drive failure, but also protection against physical threats like fire, theft, power failure, etc., for the same price. Additionally, I would maximize network efficiency. If network access to server X is very high at some point compared to the other functionality served by other servers, then I am limited by the network speed of location 1 while the servers at location 2 might only be using a small share of that location's network speed. With mirrored server locations, my total bandwidth is the sum of the bandwidth at each location, at each point in time, and just as with the drives, I am limited by my total network capacity rather than by the random allocation of data to server locations.
This mirroring, however, would need to be designed to use exactly (or almost exactly) the same bandwidth as what would normally be used. It cannot be designed so that data is received and written at location 1 and then mirrored to location 2, as this would cost me double the bandwidth with no gain on the performance side. I would need the data to be read and written simultaneously at both locations. That means if a user at location 3 wants to save data on the servers, he would write the data to both location 1 and location 2 at the same time, and read it the same way. This would give him transfer speeds equivalent to writing to a RAID 1 array over the network. However, it cannot be the user himself who sends the data to the two locations, as that would mean sending the information twice, halving his own transfer speed. The transfer would need to be dispatched in a way that does not divide bandwidth at any location (if that is possible; the arithmetic below shows what each option costs).
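Here is the bandwidth arithmetic behind that reasoning (just a sketch; the payload size and the client's upstream speed are assumed figures, and whether the "ideal dispatch" case is achievable in practice is exactly my question):

```python
# Bandwidth cost of three ways to get one write onto both sites (assumed figures).
payload_mb = 100        # size of the write
client_up_mb_s = 10     # assumed client upstream bandwidth

# (MB the client must upload, total MB crossing any network)
schemes = {
    "client sends a copy to each site":     (2 * payload_mb, 2 * payload_mb),
    "site 1 stores, then relays to site 2": (payload_mb,     2 * payload_mb),
    "ideal dispatch (each byte sent once)": (payload_mb,     payload_mb),
}
for name, (client_mb, total_mb) in schemes.items():
    print(f"{name}: client uploads {client_mb} MB "
          f"(~{client_mb / client_up_mb_s:.0f} s), {total_mb} MB total on the wire")
```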
Originally Posted by berry120
Hello "We Multiply You"
Indeed what you're asking for does sound...(to be respectful)...different. But hey, go big or go home. lol
The machine you want to build sounds like a server with a hard drive farm installed in it. It's essentially a cage with drive after drive filled with data. The largest server tower I know of that supports a lot of drives and is fairly inexpensive is an equipment-rack server case; it holds up to 11 full-sized 3.5" hard drives. I have built a couple of servers, and it all depends on what your network needs, so if you would like my suggestions I need some input...
How many clients or computers on the network?
Is it a Local Area Network (LAN - briefly, computers connected to a switch, including the server) or a Wide Area Network (WAN - briefly, switches and computers connected to routers connected to other routers)?
What's your estimated budget on the entire project?
Which OS are you most comfortable using? (Windows, Linux/Unix, Solaris, Novell, Macintosh)
What sort of environment are you in? (Business/enterprise or home multimedia/home lab)
What computers and/or equipment do you already possess?
Is it just a "Big Ole" server that you want or an entire network infrastructure to go with it since you mentioned performance?
Other than that, I have a decent idea of what you're looking for. It's also called a SAN rather than just one "stand-alone server". A SAN is a Storage Area Network that houses multiple servers with multiple hard drive configurations in them.
Thank you very much for your answer. Because I didn't get any answer on this part of the forum, I thought I might have posted in the wrong place and opened a similar topic here: Storage Server at Lowest Cost
You'll find most of the answers to your questions in my other thread, but I will answer them here as well.
How many clients or computers on the network?:
Thousands. The storage server will be used by high-performance servers, which themselves will be accessed by thousands of clients via the internet.
What's your estimated budget on the entire project?
There is no fixed budget for the project; it's rather a question of profitability. In large volumes I would like to end up with a total cost of less than $40/TB.
Which OS are you most comfortable using?
The only OS I know is Snow Leopard Server which I like a lot (though I haven't used all its functionalities yet).
What sort of environment are you in? I am starting a project which, if successful, will grow to a large business and finance its own development along the way.
What computers and/or equipment do you already possess? I have a MacBook Air, a Mac Mini Server, a 17" iMac i7, and an ASUS F3J PC with Windows 7.
Is it just a "Big Ole" server that you want or an entire network infrastructure to go with it since you mentioned performance? It's an entire infrastructure. What I want to build here is just one of several units of storage accessed by several different servers, several different applications, and many, many different clients (both from the local network and the internet).
Thanks again for your answer and for trying to help me. I will post this on my second thread so as to bring this information there as well.