Appliance testing - none of you would do this would you ?

I'm shutting our site totally down next month for about 15 mins. The head of IT is getting a bit twitchy about his UPSs ! :D
 
Asking people to log off to be able to test is common, and with servers, finding windows to do work is again common. But switching off is still a problem.
So virtualise them all, then you can move the VMs and empty the physical server.


Often they need switching on in a set order.
Do they?
 
However, servers should not be supplied with extension leads where loss of power will cause problems. They should all have been on their own UPS, and no two servers should be connected to the same UPS, in the same way as one would not plug two shavers into the same shaver socket.

All down to loss of earth connection when power is lost. One machine goes to one supply.
Twaddle.
If you unplug/cut the mains lead into a small UPS, then yes, you may remove the earth from the cases of some equipment. But you also remove all earth referencing such that in the event of a supply-case fault in any item of equipment, you have a floating case connected to a floating supply. If someone touches it, then they'll not get a shock as they will "earth" the case (albeit very weakly) and the end effect will be that the "live" becomes earthy and the "neutral" becomes live.

There is a big difference here between multiple devices running from an islanded UPS and two shavers sharing a shaver socket. In the latter you don't have any "earthing", in the former you have "earthing" such that if you do in fact have multiple faults that would cause a dangerous situation, there will also be fault currents in the CPCs of the islanded group of kit that will cause overcurrent protection to trip.
If there is a second fault, then either nothing happens, or it'll trip the UPS output (say if it's a neutral-earth fault).
So although the guy doing PAT testing may have made an error, clearly so had the guy or guys setting up the servers in the first place. There simply should not have been an extension lead between server and supply.
There wasn't in this case - just a "kettle lead" from machine to wall.
As to lead sets, again there should be a "do not unplug" label. It is not good enough to expect people to know the difference between a PC and a server.
There probably was a label at the wall socket. But, regardless of whether it is a PC or a server, there isn't an excuse for someone unplugging a piece of IT equipment that's obviously running without getting someone to shut it down first. That's a very fundamental thing to learn. If it's sat there with no screen, keyboard, or mouse attached then that should start making anyone with 2 brain cells turned on think about what it might be there for !
One place I worked, all servers had RED plugs, which was something of an argument as RED = 400 volt and we wanted BLUE plugs rather than white or black. However, the idea of a set colour for special items makes sense, and since hospitals and the like use RED plugs I suppose red is a standard colour for items not to be unplugged.
Meanwhile, back in the real world, most customers won't pay for that. Besides which, I can assure you that having a red plug or anything like that doesn't make the slightest difference to most people. Especially when they get reused for non-important stuff and so having red plugs on anything at all is commonplace.
Yes, it's sad isn't it - but that's what experience tells me.


So virtualise them all, then you can move the VMs and empty the physical server.
Says the man demonstrating a politician's level of knowledge of technical matters :rolleyes:

Often they need switching on in a set order.
Do they?
Actually, yes it's very common to have dependencies that dictate services need to be started in a set sequence. It's one of the issues we have at work if we have to do a cold start* - some stuff just won't work without other stuff already running. Simple example - if your DNS services aren't running, then anything that needs to resolve names to addresses won't work - and often that means it fails to startup properly.
Yes, in theory it would be possible to use IP addresses instead of names, but that adds a whole shedload of its own problems which are a hassle all the time rather than just the rare occasions when a cold start is needed. Trust me, using IP addresses where you can use names is just asking for trouble.
Of course, you can end up with circular dependencies - that adds a whole extra level of complexity and means having to get one thing partially running and then go round the loop (re)starting stuff until everything is running. Thankfully we have very few of those !
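
For what it's worth, the sequencing and cycle-spotting part is the easy bit to script. Here's a minimal sketch, with made-up service names, of how a start-up script could derive an order from declared dependencies and flag a circular one before it bites:

```python
# Minimal sketch: derive a start-up order from declared dependencies and
# flag circular ones. Service names and dependencies are illustrative only.
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

# service -> the services it needs running first
deps = {
    "dns":        set(),
    "storage":    {"dns"},
    "database":   {"storage", "dns"},
    "app_server": {"database", "dns"},
    "mail":       {"app_server", "dns"},
}

try:
    order = list(TopologicalSorter(deps).static_order())
    print("Start-up order:", " -> ".join(order))
except CycleError as e:
    # e.args[1] lists the services forming the loop
    print("Circular dependency, needs a manual/staged start:", e.args[1])
```

The hard bit, of course, is everything around that - knowing when a service is actually "up" rather than merely started.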

* Bringing everything up from scratch - which rather embarrassingly we have had to do a number of times. Note comment earlier about not being able to say too much in public, and comments from others about UPS batteries failing. You may think those are connected, I couldn't possibly comment. On the other hand, lead has a decent scrap value (especially when you've a 1/4 ton of dead batteries) and we subsidised the office Christmas do with a trip to the scrapyard :)
 
Says the man demonstrating a politician's level of knowledge of technical matters :rolleyes:
Actually, says the man who knows that there are lots of virtualisation solutions out there, and that a large number of them support live migration of VMs.


Actually, yes it's very common to have dependencies that dictate services need to be started in a set sequence.
Sorry - a big chunk of my reply went missing...

After "Do they?" should have been:

In that case, if it's a problem, you need an automation/orchestration tool.

Again, virtualising can help, as the VM Manager may have automation tools.


Thankfully we have very few of those !
You shouldn't have any.

Automation tools worth their salt should pick up on circular dependencies.

Score 3 for virtualisation - you don't have to run multiple services in one VM if you don't want to.
 
Sorry BAS, there is a saying that when you are in a hole, stop digging. You clearly know the buzzwords, but also clearly don't have practical experience - either that or your identity is revealed as one of those IT tools salesmen who will happily sell your tool as doing all that and more (probably even makes the coffee) but which actually turns out to have a lot of gaps.

Yes, automation tools can help, but mostly (for the general case) they are all custom - our setup isn't the same as other people's, the next guy's setup is different, and so on, so there is no such thing as an off the shelf tool that will do it. There's a trade-off between how much effort you put into automating a rare event, vs spending that time doing more useful stuff. Or put another way, for the very infrequent times it happens, and the little amount of manual intervention required, it simply isn't a good use of man hours to automate most of it.
 
Sorry BAS, there is a saying that when you are in a hole, stop digging.
So stop digging, then - I am 100% correct that if you virtualise your servers you can move VMs around if you need to empty a server but keep the services running. Along with increased server utilisation the ability to relocate VMs is a major benefit of virtualisation.


You clearly know the buzzwords, but also clearly don't have practical experience - either that or your identity is revealed as one of those IT tools salesmen who will happily sell your tool as doing all that and more (probably even makes the coffee) but which actually turns out to have a lot of gaps.
No tool will do everything, and no responsible, professional salesperson will claim otherwise.


Yes, automation tools can help, but mostly (for the general case) they are all custom - our setup isn't the same as other people's, the next guy's setup is different, and so on, so there is no such thing as an off the shelf tool that will do it.
I never claimed that there was, of course you'll need to customise it and quite possibly have to write some scripts.


There's a trade-off between how much effort you put into automating a rare event, vs spending that time doing more useful stuff. Or put another way, for the very infrequent times it happens, and the little amount of manual intervention required, it simply isn't a good use of man hours to automate most of it.
Rare and infrequent, eh?

Bringing everything up from scratch - which rather embarrassingly we have had to do a number of times.
:LOL:

But even so, automation is beneficial. It enforces a discipline to remove circular dependencies. It lessens the pressure on administrators at a time of crisis - the fewer decisions they have to make under pressure the better. It builds on rapid provisioning and policy/time-based automation of VM relocation.
 
I am 100% correct that if you virtualise your servers you can move VMs around if you need to empty a server but keep the services running. Along with increased server utilisation the ability to relocate VMs is a major benefit of virtualisation.
Someone has been reading the glossy handouts.
Yes, virtualisation may allow VMs to be moved around. It depends on which virtualisation technology and what your infrastructure is. It's not as simple as "click here, and your VM magically moves" though. Firstly you have to have space for it on the destination - which means having excess capacity which is what virtualisation is (in part) intended to avoid. In the case of the original situation I posted about, this was a standalone server, specifically located in a separate building to their main servers - and used in a manner where it would not have been a problem to shut it down.

I never claimed that there was, of course you'll need to customise it and quite possibly have to write some scripts.
In other words, build the tools needed :rolleyes:

But even so, automation is beneficial. It enforces a discipline to remove circular dependencies.
Assuming they are resolvable. How about resolving this then. A depends on B, B depends on C, B also depends on A, and C also depends (to a lesser extent) on A. The dependencies between A and B aren't negotiable* - B will not start up correctly without A being running, and A will not start up correctly without B running.

There are also a shedload of other systems, from disparate vendors, running a variety of operating systems and virtualisation technologies - that is the nature of the beast when you want different systems to be what is appropriate to the job, rather than picking a vendor and shoehorning your requirements into what that vendor provides (you may guess that I'm having a bit of a dig at houses that only run systems that came from Redmond). These other systems also depend on B (amongst others) - but are on multiple separate bits of hardware (some are even in another office in the building and on a different supply).

Having discussed it with others in similar situations, it's clear that there isn't a simple answer - and definitely not one off the shelf. I do have some cunning ideas to deal with some of the key dependencies, but as I said, it's a rare event and time is better spent on other tasks at the moment. Last time, I reckon it only took 10-20 minutes to get everything running - so not really worth spending days of time over.
It builds on rapid provisioning and policy/time-based automation of VM relocation.
No, I change my mind. You're not an IT salesman, it's far far worse than that - you speak like a ... and I feel a bit dirty even using the words ... a management consultant :eek:


* OK, it can be made so that A is not dependent on B, but that then means that there is an ongoing requirement to monitor changes and change A to match. That is highly prone to error, especially when some of the items are under the control of third parties with whom we have no direct relationship. And yes, I do have some ideas how to automate that process.
 
Someone has been reading the glossy handouts.
Someone knows what the state of the art is.

Another person is trying, for FKW reason, to pretend that that someone doesn't know what he's talking about.


Yes, virtualisation may allow VMs to be moved around. It depends on which virtualisation technology
What do you want? VMware? KVM? Xen? Hyper-V? Oracle VM? HP IVM? IBM PowerVM?


and what your infrastructure is.
Well, yes - I thought I could take it as read that without all server resources virtualised and without a common storage and network infrastructure shared by source and target then live migration won't work.


It's not as simple as "click here, and your VM magically moves" though.
True - it may take more than one click.


Firstly you have to have space for it on the destination - which means having excess capacity which is what virtualisation is (in part) intended to avoid.
So you've got an infrastructure which cannot cope with unplanned system outages, i.e. no high availability clustering? No DR?

You've got no lower priority VMs which can be shut down, or hibernated, or have their resource allocation screwed right down?


I never claimed that there was, of course you'll need to customise it and quite possibly have to write some scripts.
In other words, build the tools needed :rolleyes:
No - in other words use the tools to build your environment. What's wrong with you?


How about resolving this then. A depends on B, B depends on C, B also depends on A, and C also depends (to a lesser extent) on A. The dependencies between A and B aren't negotiable* - B will not start up correctly without A being running, and A will not start up correctly without B running.
If that were really the case then you could never actually start A or B, if those are services. If they are servers then your physical deployment model is wrong and you need to find the architect responsible for that and break his legs.


There are also a shedload of other systems, from disparate vendors, running a variety of operating systems and virtualisation technologies - that is the nature of the beast when you want different systems to be what is appropriate to the job, rather than picking a vendor and shoehorning your requirements into what that vendor provides
Of course there are, but the whole point of live migration is that things carry on running, and if a VM moves then it moves, and users or other systems using the services it provides also carry on running. A heterogeneous environment does not prevent VM migration any more than it prevents you manually starting things up. Of course you can't move a Windoze VM to a Linux/HP-UX/AIX server.


Having discussed it with others in similar situations, it's clear that there isn't a simple answer - and definitely not one off the shelf.
Why does it have to be off the shelf? Virtually nothing else is.


No, I change my mind. You're not an IT salesman, it's far far worse than that - you speak like a ... and I feel a bit dirty even using the words ... a management consultant :eek:
Neither.
 
At last! Six years I've been a member and finally a topic where I have some expertise :D

Firstly, on the UPS side, of course you don't (beyond a very small server room) have a UPS per server. In decent sized data centres the UPS isn't even in the server room, it/they are way back in the power room.
Ideally you run two different supplies, protected by two different UPSs, down each rack and every server / switch has two PSUs, one plugged into each rail.

The servers do, of course, have IEC power leads (the very lead that our ex-electrician slapped TESTED stickers on without any, err, testing) and these run to power rails at the side or top of the racks.

As for virtualisation, then yes, of course it's a huge benefit in utilising server resources. We get to spread maybe 140% of the server load across the physical devices, knowing that not all applications will be using all the resources at all times.
As BAS mentions, you get some additional benefits with the higher end solutions such as VMWare ESX. The first is load balancing, where the farm will move virtual servers seamlessly to less busy hosts.
Next you have failover; when a host fails, the virtual server stutters a bit and then begins running on another host.
The final benefit circles back round to the initial discussions; maintenance of hardware becomes easier because you can "evacuate" virtual machines from a host onto one or more of the remaining hosts, then bring it down for maintenance, patching, etc.
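
To illustrate that "evacuate" step for anyone not on the big-ticket licences, here's a rough sketch of the same idea on a plain KVM/libvirt setup with shared storage - the host URIs are made up and error handling is left out:

```python
# Rough sketch only: live-migrate every running VM off one KVM host so the
# box can be shut down for maintenance or testing. Assumes shared storage
# and compatible hosts; URIs are hypothetical. Needs the libvirt-python bindings.
import libvirt

SOURCE = "qemu+ssh://host-a.example/system"   # host being emptied (hypothetical)
TARGET = "qemu+ssh://host-b.example/system"   # host receiving the VMs (hypothetical)

src = libvirt.open(SOURCE)
dst = libvirt.open(TARGET)

for dom in src.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE):
    # Crude capacity check: is there enough free RAM on the target?
    if dom.maxMemory() * 1024 > dst.getFreeMemory():   # maxMemory() is in KiB
        print("Not enough room on the target for", dom.name(), "- skipping")
        continue
    print("Migrating", dom.name())
    dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

src.close()
dst.close()
```

The commercial stacks wrap that sort of thing up with admission control and automatic placement, which is a large part of what you are paying for.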

The point about restarting hosts in a certain order is critical and, as BAS suggests, there are tools that help you here but it is a very complex art.
If you take a single application that we run in house, we need to start the following apps, in the following order (and they are each homed on their own dedicated virtual server): database server, application server, integration server, purchase-processing app, mail import app.
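
The "start one, wait for it, start the next" part of that is simple enough to script - here's a rough sketch, with invented hosts, ports and start mechanism, just to show the shape of it:

```python
# Rough sketch: bring up a chain of services in order, waiting for each one's
# TCP port to answer before starting the next. Hosts, ports and the start
# mechanism are invented; the real power-on call (vSphere API, ssh, etc.)
# would go where start_vm() is.
import socket
import subprocess
import time

CHAIN = [  # (name, host, port) - hypothetical values
    ("database",    "db01.example",    1433),
    ("application", "app01.example",   8080),
    ("integration", "int01.example",   8443),
    ("purchasing",  "purch01.example", 9000),
    ("mail-import", "mail01.example",  2525),
]

def start_vm(name):
    # Placeholder: replace with whatever actually powers the VM on.
    subprocess.run(["echo", "powering on", name], check=True)

def wait_for_port(host, port, timeout=600):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            time.sleep(10)
    return False

for name, host, port in CHAIN:
    start_vm(name)
    if not wait_for_port(host, port):
        raise SystemExit(name + " never came up - stopping the sequence")
    print(name, "is answering, moving on")
```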

Tools like VMWare help you, by allowing you to state dependencies on servers which influences startup order. However, you quickly realise that you have circular dependencies on individual servers, due to different application requirements. So, BAS, it does help but with a total outage (our UPSs running dry) we would face probably around eight hours to get everything restarted. That's the difference between Business Continuity (UPS, virtual farms, load-balancing and failover, etc) and Disaster Recovery (the poo really did hit the fan).
 
So, BAS, it does help but with a total outage (our UPSs running dry) we would face probably around eight hours to get everything restarted. That's the difference between Business Continuity (UPS, virtual farms, load-balancing and failover, etc) and Disaster Recovery (the poo really did hit the fan).
Indeed, BAS seems to work in a very perfect environment. No-one seems to know what he does, but from his writings here he clearly doesn't actually do this sort of thing. What the glossies say is not the same thing as what happens in real life.

In real life we do (for example) have UPSs without infinite runtime, or UPS batteries that despite having been tested as good a few weeks ago fail when needed, and so on. There's no way you can eliminate all SPFs (single points of failure) - or more correctly, you probably can but at a cost that's disproportionate to what you are trying to achieve. That's what the analysis done for a BC plan tells you - where to draw the line.

Even people who have experts employed full time to do this, and have spent many millions on multiply redundant power supplies etc, still get caught in the dark. No, I can't be bothered going and looking for the stories of data centre outages.

But back to my earlier question BAS. A and B are real - A is the border router, B is a DNS server. A won't start correctly without B, and B can't resolve external addresses without A. Startup is actually quite simple - start A without certain elements running (route packets without some of the filtering and accounting functions), wait for B to be running, then start the missing services on A. Easy peasy, could be automated, but for the time it takes and the infrequency of it occurring - it would take many many years to break even on the man hours expended.
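
If it ever did become worth automating, that staged start maps onto a very small script. The sketch below is only an illustration, with the real router commands replaced by placeholders:

```python
# Sketch of the staged start described above: bring the router (A) up with
# basic forwarding only, wait until the DNS server (B) is resolving names,
# then enable the remaining filtering/accounting services on A. The router
# commands are placeholders - the real ones depend entirely on the kit in use.
import socket
import subprocess
import time

def router(cmd):
    # Placeholder for however the router is actually driven (ssh, API, ...)
    subprocess.run(["echo", "router:", cmd], check=True)

def dns_is_up(name="www.example.com"):     # test name is arbitrary
    try:
        socket.getaddrinfo(name, 80)
        return True
    except socket.gaierror:
        return False

router("start basic forwarding only")      # phase 1: A routes, nothing else

while not dns_is_up():                     # wait for B to come up and resolve
    time.sleep(15)

router("enable filtering and accounting")  # phase 2: finish bringing A up
print("Router fully up, DNS answering")
```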


But of course, this is all a long way away from the original problem of someone yanking a power cord on a server because they were too ignorant to realise that it's not the sort of thing to do.
In that case, the server was exactly the sort of thing where you don't want to virtualise it and put its storage on your shared storage box. Who on earth (apart from BAS) wants to put their BACKUP data on the same storage as their live data, and be able to switch the VM of the backup to the same box as their live server ?
 
Oh yes, just checked some uptimes.
It's over a year since some of my hosts have been rebooted. Given that this whole "cold start" thing only takes (from memory) an hour or two at most (of which some would still be needed even if you fully automated all the startup dependencies), how much time would you spend eliminating a couple of man hours for a job that hasn't occurred in over a year ?
 
Indeed, BAS seems to work in a very perfect environment. No-one seems to know what he does, but from his writings here he clearly doesn't actually do this sort of thing. What the glossies say is not the same thing as what happens in real life.
Indeed it is not.

So I tell you what then, as real life is imperfect we won't virtualise, we won't automate, and we won't do anything at all to help deal with unplanned outages or avoid planned ones.


But of course, this is all a long way away from the original problem of someone yanking a power cord on a server because they were too ignorant to realise that it's not the sort of thing to do.
If you recall my suggestion about virtualising so that you can use live migration to empty a server was in response to the problem of planning a shutdown, not as a way to deal with the problem of a server failure.


In that case, the server was exactly the sort of thing where you don't want to virtualise it and put its storage on your shared storage box. Who on earth (apart from BAS) wants to put their BACKUP data on the same storage as their live data, and be able to switch the VM of the backup to the same box as their live server ?
And I've suggested those things where, exactly?
 
Indeed, BAS seems to work in a very perfect environment. No-one seems to know what he does, but from his writings here he clearly doesn't actually do this sort of thing. What the glossies say is not the same thing as what happens in real life.
Indeed it is not.

So I tell you what then, as real life is imperfect we won't virtualise, we won't automate, and we won't do anything at all to help deal with unplanned outages or avoid planned ones.
I didn't say we won't do any of those, but that in the absence of unlimited budgets and unlimited resources, we don't aim for perfection - only what is "reasonable" taking into account all the factors.

But of course, this is all a long way away from the original problem of someone yanking a power cord on a server because they were too ignorant to realise that it's not the sort of thing to do.
If you recall my suggestion about virtualising so that you can use live migration to empty a server was in response to the problem of planning a shutdown, not as a way to deal with the problem of a server failure.


In that case, the server was exactly the sort of thing where you don't want to virtualise it and put its storage on your shared storage box. Who on earth (apart from BAS) wants to put their BACKUP data on the same storage as their live data, and be able to switch the VM of the backup to the same box as their live server ?
And I've suggested those things where, exactly?

Where you wrote :
Asking people to log off to be able to test is common, and with servers, finding windows to do work is again common. But switching off is still a problem.
So virtualise them all, then you can move the VMs and empty the physical server.
Put forward, it would appear, as a universal answer to any problem requiring a server shutdown. That IS what you appear to have been pushing. You may not have meant that, but it certainly came across that way.

Contrary to what you seem to be suggesting, there are in fact a great many situations where virtualising everything and using shared storage is not the "best"* answer. And for many situations, just shutting something down can be simpler/better than shifting VMs around.

* For whatever criteria you define "best".


EDIT: I assume you have more than just a hammer in your toolbox - I'd assume screwdrivers, pliers, etc.
We use more than one tool at work - we don't just have one hammer (full virtualisation, with full management stack and shared storage etc), we have a range of tools which are chosen according to the job in hand.
 
I didn't say we won't do any of those, but that in the absence of unlimited budgets and unlimited resources, we don't aim for perfection - only what is "reasonable" taking into account all the factors.
I can assure you that you did not start out giving the impression of anything other than implacable opposition to and rejection of those.

So virtualise them all, then you can move the VMs and empty the physical server.
Says the man demonstrating a politician's level of knowledge of technical matters :rolleyes:
That to me, reads as nothing less than an unequivocal rejection of the idea of using the live VM migration facilities offered by many VM managers in order to avoid service outages during planned server shutdowns.

And nothing less than an unequivocal assertion that anybody who suggests such a thing doesn't know what he's talking about.


Actually, yes it's very common to have dependencies that dictate services need to be started in a set sequence.
In that case, if it's a problem, you need an automation/orchestration tool.
I've added some emphasis, as you gave every impression of not having read that part.

If it's a problem.

Just what is so objectionable, or so indicative of ignorance on my part, about suggesting that if dependencies make the need to properly sequence startup a problem then a tool to automate it would be of value?

You didn't like it though, did you, trying to twist what I wrote into a claim that virtualisation and associated tools were some kind of magic wand which you could just install and with a few clicks have it up and doing everything, and that therefore I didn't know what I was talking about.

Sorry BAS, there is a saying that when you are in a hole, stop digging. You clearly know the buzzwords, but also clearly don't have practical experience - either that or your identity is revealed as one of those IT tools salesmen who will happily sell your tool as doing all that and more (probably even makes the coffee) but which actually turns out to have a lot of gaps.


You came up with fatuous "objections" such as (paraphrasing) "well you might be able to move VMs, but that needs you to choose a virtualisation environment which provides that", or "you need the right infrastructure" or "how can you move something if there's nowhere to move it to?".

And when I responded (paraphrasing) "Well, durr, yes of course", it was in much the same way as if someone had asked how to take his family to a holiday destination, I'd said "you could drive", and you'd come back with "Huh - you sound like a car salesman, or someone who's just been reading the glossy brochures from car makers, not someone who actually knows about using cars. Saying 'you could drive' is no good, as he'd need to have a car with enough seats and boot space. He'd have to know how to drive, you can't just hop into the driver's seat and press a button, you know. And there would need to be roads for him to drive on and space to park his car."


And you're still doing it.

Who on earth (apart from BAS) wants to put their BACKUP data on the same storage as their live data, and be able to switch the VM of the backup to the same box as their live server ?
And I've suggested those things where, exactly?
Your answer?

Where you wrote :
Asking people to log off to be able to test is common, and with servers, finding windows to do work is again common. But switching off is still a problem.
So virtualise them all, then you can move the VMs and empty the physical server.
So I'll ask you again:

Where have I suggested putting backup data on the same storage as the live data, and where have I suggested moving the VM of the backup to the same box as the live one?
 
Where you wrote :
Asking people to log off to be able to test is common, and with servers, finding windows to do work is again common. But switching off is still a problem.
So virtualise them all, then you can move the VMs and empty the physical server.
So I'll ask you again:

Where have I suggested putting backup data on the same storage as the live data, and where have I suggested moving the VM of the backup to the same box as the live one?

I'm not sure whether you're just looking to pick a fight here BAS so, if you are, lemme know and I'll just get the popcorn and enjoy the show.

But I'm not sure what you're taking umbrage at here (other than the suggestion that you don't know your VM technologies) - I didn't see any previous discussion of you advocating what to do with storage.
Let's rewind to the original point: if you have to switch a physical server off to carry out maintenance / electrical testing, then that can have a knock-on effect for people using that server (or its dependencies).

Virtualising it is a useful step and then, yes, you could evacuate its guest VMs to other hosts and bring the physical box down. For that to work effectively, though, you need n+1 or n+2 physical hosts (where n+2 allows a box to be taken down for maintenance while still leaving capacity for failure of another host). You also need shared storage to host the VMs on, and you need one of the heavier-duty virtualisation solutions (ESX rather than ESXi, for example).
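
The n+1 / n+2 sums are worth doing explicitly. A back-of-envelope sketch, with made-up numbers, of the check you'd run before taking a host out:

```python
# Back-of-envelope n+1 / n+2 check: can the remaining hosts absorb the
# committed load with one box out for maintenance, and with a second one
# failed on top of that? All figures are made up for illustration.
hosts = 5                 # physical hosts in the cluster
host_capacity = 100       # usable capacity per host (arbitrary units)
total_load = 320          # committed VM load in the same units

for out_of_service in (1, 2):   # maintenance only, then maintenance + a failure
    remaining = (hosts - out_of_service) * host_capacity
    verdict = "OK" if total_load <= remaining else "OVERCOMMITTED"
    print(f"{out_of_service} host(s) out: load {total_load} vs capacity {remaining} -> {verdict}")
```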

And then we have the rest of the hardware which, presumably, also needs to be maintained and tested. That means multiple paths through multiple fabric switches, load balancers and ethernet switches.

This is a pretty major investment in hardware, (VM) software licences and the expertise to maintain it effectively.

So, I am all for virtualisation - my business couldn't operate as competitively without it - but it's not the case that virtualisation alone removes the impact of performing physical maintenance on kit.
 
