Although the 2017 hurricane season is over and many organizations have restored operations, some face an approaching storm that most won’t recognize until it’s too late. Organizations whose mission-critical electronics were shut down by power outages (or are still running on generators) need to take note – there’s a very real threat of equipment failures brewing. The storm ahead can be endured (or even avoided) as long as it’s addressed before it becomes even worse than the hurricane itself.
Why electronics fail…
Even if your data center or other mission-critical electronic equipment wasn’t directly exposed to rain or flood water during the hurricane, it may have been compromised by environmental conditions that developed after the storm passed. A sudden loss of environmental controls (e.g., air conditioning) can spell disaster for electronics, especially for installations in the tropics and near the ocean.
Consider the following scenario:
A cell phone company starts to experience unusual failures of its tower equipment months after the passing of a major hurricane… on equipment that wasn’t believed to be damaged. Technicians notice that the failing components all exhibit obvious signs of corrosion on unprotected metal surfaces. As they investigate the origin of the corrosion, they conclude that the enclosures were not breached by wind-driven rain – the problem came from within the very enclosures that were designed to protect the sensitive electronics. Only after engaging experts in electronics recovery does the company realize why it is losing so many components to the damaging effects of corrosion.
Telecommunication companies use Remote Terminal (RT) sites to secure the electronic equipment at their cell towers. RTs are self-contained data centers, generally designed to be installed in remote – and very harsh – locations (like mountaintops on Caribbean islands). Loaded with electronic equipment that produces a lot of heat, the RTs need to be cooled for the equipment to operate continuously. Ideally, the internal environment of the RT is maintained at a constant temperature of 65˚F and a relative humidity (RH) below 40%. While RTs are sealed well enough to keep most contaminants out, the units cannot be hermetically sealed… over time there is always a build-up of whatever microscopic contaminants are present outside the RT.
When warm, moist air comes in contact with cooler surfaces, the moisture condenses. That’s because the cooler air surrounding those surfaces cannot hold as much moisture as warmer air. Each time the generators ran out of fuel, the RT lost environmental control. Each time the RT lost environmental control, condensation formed on all cooled surfaces. Each time the equipment was exposed to condensation, the moisture was absorbed by the build-up of microscopic contaminants. Each time the contaminants absorbed moisture there was a chemical reaction… one that produced a dilute acid (HCl) that began to degrade unprotected metal surfaces. While this kind of corrosion generally causes minimal damage, it becomes problematic when it affects critical components like circuits and interconnection points.
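The condensation mechanism described above can be made concrete with a dew point estimate. The Python sketch below uses the Magnus approximation, which the scenario itself does not mention; the coefficients (17.62 and 243.12 °C) are one commonly used set, and the function names and the example outdoor conditions (30 °C at 85% RH) are illustrative assumptions rather than measurements from the affected sites.

```python
import math

# Magnus approximation coefficients (one commonly used set, valid for
# typical ambient temperatures over liquid water). Assumed for illustration.
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # degrees Celsius


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit temperature to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def dew_point_c(air_temp_c: float, relative_humidity_pct: float) -> float:
    """Estimate the dew point (in Celsius) of air at the given conditions."""
    if not 0.0 < relative_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in the range (0, 100]")
    gamma = (math.log(relative_humidity_pct / 100.0)
             + MAGNUS_B * air_temp_c / (MAGNUS_C + air_temp_c))
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def condensation_expected(surface_temp_c: float,
                          air_temp_c: float,
                          relative_humidity_pct: float) -> bool:
    """Return True when a surface is colder than the dew point of the air."""
    return surface_temp_c < dew_point_c(air_temp_c, relative_humidity_pct)


# Hypothetical case: equipment surfaces still near the 65 F setpoint when
# warm, humid outside air (30 C, 85% RH) leaks into the enclosure.
surface_c = fahrenheit_to_celsius(65.0)
print(condensation_expected(surface_c, air_temp_c=30.0,
                            relative_humidity_pct=85.0))
```

Under these assumed conditions the dew point of the incoming air is about 27 °C, well above the roughly 18 °C surface temperature, so moisture condenses on the equipment until the surfaces warm up.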
How to steer clear of the storm:
The single most important thing to do is to be proactive. Instead of reacting to problems after the fact (as happened with the cell tower RTs), you need to be able to recognize the “tell-tale” signs of potential problems… like the unusual service tickets that were opened but declared NPF (No Problem Found). In retrospect, it was apparent that the cell tower failures could have been prevented had the series of intermittent problems been recognized as the “canary in the coal mine” before the point of no return. For organizations that may not have the resources to forensically review failure trends and/or recognize potential threats, it makes sense to engage an expert in disaster recovery of mission-critical equipment.
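One way to watch for that early warning sign is to count NPF tickets per site over a recent window. The following Python sketch is only an illustration: the `Ticket` record, the `"NPF"` resolution code, the site names, and the threshold of three tickets are all hypothetical and would need to be adapted to whatever a real ticketing system exports.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Ticket:
    """Hypothetical service-ticket record; field names are assumptions."""
    site_id: str
    opened: date
    resolution: str  # e.g. "NPF" (No Problem Found) or "REPLACED"


def sites_with_repeated_npf(tickets, since, min_npf=3):
    """Return sorted site IDs with at least `min_npf` NPF tickets since `since`."""
    npf_counts = Counter(
        t.site_id
        for t in tickets
        if t.resolution == "NPF" and t.opened >= since
    )
    return sorted(site for site, count in npf_counts.items() if count >= min_npf)


# Made-up example data for two RT sites.
tickets = [
    Ticket("RT-07", date(2017, 10, 2), "NPF"),
    Ticket("RT-07", date(2017, 10, 19), "NPF"),
    Ticket("RT-07", date(2017, 11, 5), "NPF"),
    Ticket("RT-12", date(2017, 10, 8), "NPF"),
    Ticket("RT-12", date(2017, 10, 30), "REPLACED"),
]
print(sites_with_repeated_npf(tickets, since=date(2017, 10, 1)))
```

A site that keeps generating intermittent, unexplained tickets is a candidate for a physical inspection for corrosion, even though no single ticket pointed to a fault.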
Fortunately for the cell phone company, the sudden upswing in equipment failures was held in check by implementing an aggressive corrosion-control program at every RT installation. As time permits, the company plans to circle back to each site and determine whether additional processing (such as decontamination or isolated component replacements) is necessary to restore the operational integrity of the equipment to “pre-incident” condition.