I'm not sure whether you're just looking to pick a fight here BAS so, if you are, lemme know and I'll just get the popcorn and enjoy the show.
No, he's not picking a fight, just being BAS. We all know what he's like, he's made his mind up, made statements (or at least implied things), and can't possibly deviate from that first position - regardless of what anyone else says.
Sometimes I wonder if low cost hardware leads to some "strange" ways of providing backup services using complex software procedures and switching (re-routing) of data paths. This often adds to the complexity and introduces more potential for failure/error.
Yes, it does!
It can come down to this: do you buy a high-spec server with redundant PSUs, etc., backed by a 4-hour on-site guarantee from the vendor? Or do you cobble together several lower-spec machines with shared storage, take on all the overhead and complexity that goes with that, and rely on not too much hardware failing at once?
If you've never looked at what they do, Google is an interesting case of taking low cost to the extreme and providing reliability by other means (in their case, writing their own filesystem!). Their reasoning goes that if the MTBF of a single node is (say) 1000 days, then on average a single machine will break down roughly once every three years. If you have 1000 machines in a cluster, then on average you can expect to lose one machine per day. If you have 20,000 machines in a cluster, then your losses will average 20/day, or nearly one per hour.
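To put the back-of-the-envelope sum in concrete terms, here's a quick sketch of that arithmetic. The only input is the nominal 1000-day MTBF figure above, and it assumes failures are independent:

    # Back-of-the-envelope failure-rate arithmetic, assuming independent
    # failures and a nominal MTBF of 1000 days per node.
    MTBF_DAYS = 1000

    for nodes in (1, 1000, 20000):
        failures_per_day = nodes / MTBF_DAYS
        hours_between_failures = 24 / failures_per_day
        print(f"{nodes:>6} nodes: ~{failures_per_day:g} failures/day "
              f"(one roughly every {hours_between_failures:g} hours)")

Which gives you the numbers quoted: one failure every 1000 days for a single box, one a day at 1000 nodes, and one every 1.2 hours at 20,000 nodes.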
So their answer is to buy cheap, and write redundancy into the software. Their GFS (Google File System) breaks large datasets into chunks, and the controller ensures that there are at least three copies of any chunk, spread across three nodes in the cluster, and not all in the same rack. Thus you can take out any node, or even two nodes, and the data is still there. You can take out a whole rack and the data is still there - the software will automatically allocate another storage node to replicate the data to, bringing the number of available copies back up to three.
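The "three copies, never all in one rack" placement rule is simple enough to sketch in a few lines. This is only a toy illustration, not Google's actual algorithm, and the rack/node names are invented:

    import random

    # Toy illustration of the "three copies, never all in one rack" rule.
    def place_chunk(nodes_by_rack, replicas=3):
        """Pick `replicas` nodes for a chunk so the copies span at least two racks."""
        racks = [rack for rack, nodes in nodes_by_rack.items() if nodes]
        if len(racks) < 2:
            raise RuntimeError("need live nodes in at least two racks")
        # First two copies go to nodes in different racks...
        chosen = [random.choice(nodes_by_rack[rack]) for rack in random.sample(racks, 2)]
        # ...any further copies go on nodes not already holding the chunk.
        spare = [n for rack in racks for n in nodes_by_rack[rack] if n not in chosen]
        chosen.extend(random.sample(spare, replicas - 2))
        return chosen

    cluster = {
        "rack-a": ["node-a1", "node-a2"],
        "rack-b": ["node-b1", "node-b2"],
        "rack-c": ["node-c1", "node-c2"],
    }
    print(place_chunk(cluster))  # e.g. ['node-b2', 'node-a1', 'node-c1']

Lose a node, or a whole rack, and the controller just re-runs the same placement logic to get back to three copies.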
That's great for a big read-only dataset (such as the back end indexes used for serving up queries) - trying to do that with read-write data introduces the nightmare of ensuring write concurrency and consistency.
They also automate their config, so when a node is replaced, all the operator needs to do is tell the management system its identity and its role - then when the node comes online, the management system can serve it the right image, it can self-install, and it goes into service without further intervention.
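In outline it's a very small amount of logic - register the box's identity and role, then hand out the matching image when it boots. A minimal sketch of the idea (all the role and image names here are invented, and a real setup would be doing this via PXE boot and image servers):

    # Toy sketch of role-based auto-provisioning.
    ROLE_IMAGES = {
        "storage": "storage-node-image-v42",
        "index": "index-server-image-v17",
    }

    registered = {}  # node identity -> role, filled in by the operator

    def register_node(identity, role):
        """Operator step: record what the new box is and what job it should do."""
        if role not in ROLE_IMAGES:
            raise ValueError(f"unknown role {role!r}")
        registered[identity] = role

    def on_node_boot(identity):
        """Management-system step: serve the image that matches the node's role."""
        role = registered.get(identity)
        if role is None:
            raise LookupError(f"node {identity!r} has not been registered")
        return ROLE_IMAGES[role]

    register_node("rack-b/slot-07", "storage")
    print(on_node_boot("rack-b/slot-07"))  # -> storage-node-image-v42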
All brilliant, if you've an operation of such a scale that the investment is worthwhile.
For a small business, your best bet is to buy something reasonably reliable, and have a business continuity plan that will allow you to keep going during any outage. Well actually, the BC plan should drive what you put in, since the BC plan will tell you what your technical recovery time objective needs to be.
If your TRTO is "several days", then anything more than something you can repair and have running again in a day or two is overkill. On the other hand, if you end up with a TRTO of 3 hours, something you can only get going by the next day isn't worth having - you'll be out of business.