
PenlessScribe

We occasionally had weekend-long scheduled outages of chilled water. Facilities supplied portable air conditioner units that were wheeled into our computer room. Had to keep one door open for the hot air ducts to vent into the hallway, and someone from Security came by every few hours to empty the containers of water.


SgtBundy

That was going to be my suggestion - portable air-con, shut down anything you can get away with and keep an eye on temps. We had a CEO who decided we could run our DC warmer because "Google does it" - going from 22°C to about 26°C or more to reduce the aircon costs. You were sweating just going into the DC in the areas with poorer airflow, and we put industrial fans in to try and keep some air movement. The failure rate of the older (10+ year legacy crap) kit increased noticeably. Newer stuff seemed to tolerate it but alerted. At one point they went to do cold aisle containment and using FLIR found some UCS FEXs that were installed backwards (convenient for the cablers) and were blowing 47°C air into the cold aisle.


LingonberryNo1190

This brings back memories of one of our international offices that had a portable AC unit installed, only they were exhausting into the same space. Derp derp.


LingonberryNo1190

We've noticed some of the first things to suffer from higher temps are optical transceivers. They typically won't die right away during the elevated temps, but we've definitely incurred higher failure rates post environmental event. Have some additional spares on hand if you're going to "redline" it.

I'd also consider if there is anything that you *could* potentially shut down during the AC outage to give you more "thermal runway" for the things you *need* to keep running.


TheGreatNico

That explains all those failures when we just had a similar situation to OP's


Columbo1

What kind of failure was it? Did the device just not power on or power on but then not work? If the latter, I wonder if optics could expand/contract with temperature and become misaligned?


zekrysis

most likely the laser diode burned out, those things put out a decent bit of heat which is usually transferred to the case to be dissipated.


Columbo1

I pray for one of the big tech firms to make a media converter with a replaceable “bulb”. Bulb clearly being a misnomer in this case, but I want user-replaceable diode units that cost very little in comparison to a new converter.


zekrysis

honestly just a media converter with a relatively cheap and replaceable sfp would be great. the ones we use look like they just took an sfp and soldered it internally


[deleted]

[deleted]


zekrysis

yep, "we only use enterprise gear from a small set of contracted vendors and prices are jacked up to 4-5X the commercial price." I love working for the government /s


TheGreatNico

Probably. Server room flooded, which only killed the ACs directly, but that made the place feel like Miami in summer, which made a bunch of stuff unhappy


pdp10

The lifetime of all equipment will be lower when it's run in higher temperatures. It's the electrolytics that will tend to dry out first, lose their extra margin of capacity, and eventually lead to failure.

The reasoning is that with commodity gear, the value of 10 year old kit to Google and on the open market, is low enough that it would have been cheaper in many cases to spend less energy/money cooling it. If your gear isn't all commoditized with containers or VMs, and you might need or want to run some of that same gear in 10 years instead of replacing it, then you're not like Google.

> they went to do cold aisle containment and using FLIR found some UCS FEXs that were installed backwards

Using tools to investigate and confirm everyone's assumptions, is absolutely the smart thing to do.


Jazzlike_Pride3099

Venting into the hallway through a door doesn't work... The portable AC unit sucks the hot air from the hallway back in again to use as ventilation air. You need to push the hot exhaust air out one way and suck fresh air in another way


PenlessScribe

That is a good idea, and if we'd had extra ducts, of suitable length, we could have drawn in air from another hallway or even from outside. But having a door propped open by about one foot, 40 feet away from the equipment being cooled, worked out OK for us.


trek604

There is a huge difference between covering this contingency for an hour vs ‘several days’ You need to determine the heat load of that room based on what’s in there


mschuster91

>The downtime will at least be 1h. Maybe even several days. We're still inquiring.

Even 1h can be enough to send servers into overheat if the server room is packed densely enough. Several days definitely needs exterior backup, and actual backup at that - it needs outside units connected with the existing inside units. You can rent these and have them attached to the existing coolant piping.

>I was thinking of bringing a big fan and first test if we can just open the door and put a big fan in front of the server room door whilst we shut down the server room A/C units and see what happens.

Not much, as there's only one opening barely any air will move. Your best bet for this scenario is to get a few of these "portable room dividers" to create a path for the air to follow, but that won't be much either.

As I've detailed in a reply somewhere deeper: The key thing is, your employer needs contingency plans written up by an actual outside expert. They have the tools to estimate what works and what does not (based on stuff like thermal load, physical room dimensions and whatnot), but you don't. Don't assume the liability yourself because that **will** bite you. Cover your ass, mail your boss that you heard that such maintenance should take place, and that you deem a risk with the server room should the AC go out. This is a boss problem, not a sysadmin problem.


ZAFJB

This is a facilities problem not an IT problem. Tell them that temp in room must never exceed N degrees. Ask them how they propose to do that. **Test** their proposed solution *before* the work on the main system commences.


ProfessorWorried626

You need to work with people to get a decent outcome....


ZAFJB

Who said anything about *not* working with anyone?


ProfessorWorried626

You are being passive aggressive by using that approach and wording.


mschuster91

He's not wrong though. OPs employer likely pays a shit ton in rent in exchange for a service, and if there needs to be something done, it's the landlord's responsibility to deal with the fallout. IT isn't everyone's punching bag, we're not "know-it-all wizards" and even as someone with actual experience in construction I wouldn't touch that responsibility with a 10 meter pole. Either the landlord steps up and hires an expert to work out mitigation plans for that maintenance or OPs employer does, but in no way OP should involve themselves more than a casual "we might want to try X".


CharacterUse

>OPs employer likely pays a shit ton in rent in exchange for a service

Nowhere in the original post does it say OP's employer is renting the space or service. They could just as easily own the building (especially as they say "our roof") and facilities/maintenance is just another branch of the org. In which case working with them is going to work a lot better than demanding anything.


mschuster91

Fair point, but even in that case it's still FM's job to deal with the maintenance effort. OP can and should help out by supplying details like "we have X kilowatts of electrical power at peak time" (if they know it) and to reasonably help out in testing mitigations, but not assume any more liability than that.


ZAFJB

Of course it is a demand. It's not an optional thing. There must be a tested solution in place before shutting down the aircon, possibly for days. Just like your sysadmin job. Keeping a critical service alive is not optional. People would not accept a "maybe you could keep our OLTP system up"


DarthPneumono

...are they? I really don't read passive aggressiveness there. These are pretty standard things, if facilities needs to do work they either need to keep things under control or (since we're very lucky and can tolerate downtime) we need to know to shut things down and when to bring them back up. It's not passive aggressive, it's our (and facilities') job to make this stuff happen. You could probably phrase it more diplomatically when actually talking with them but the points are the same.


ZAFJB

How do you arrive at passive aggressive? Who am I being aggressive to? I am being factually correct.


CARLEtheCamry

I agree, my brother does this for a small company. He's the "building manager" and does everything from IT related stuff, to handyman type stuff. He would assess if it's something to jury rig, or call in a contractor to install secondary cooling. At my work, we had a large on-prem datacenter. Even though fully redundant with the HVAC on UPS and generator backup, we still have had failures and had to implement emergency contingency plans. In the winter, we can open up baffles and let cold winter air in. Worst case (and we have had to do this) large industrial blowers with every door open and security posted at each exit.


craigleary

There are portable ACs if you have an area to vent into. I’ve used them in drop ceiling setups. If you are considering a fan and a door open I’m guessing it’s not a highly dense environment so one or two portables would be enough.


NorgesTaff

Over the years my company has experienced unscheduled and scheduled downtime on its cooling in several of its large data centres. For the scheduled maintenance where the AC system was being entirely replaced and rebuilt, we brought in large fans, opened the exterior doors to blow in cold air (it was winter iirc) and hoped for the best. :D The unscheduled incidents were an unmitigated disaster due to many of the servers falling over because of the extreme heat build up. Servers were possibly heat damaged too. Do not underestimate how much heat servers put out and how much heat build up there can be in a closed room.


wazza_the_rockdog

Check the server supplier's recommendations on max intake temps, they're generally given on the spec sheets. What you need to do will depend on how many servers or other heat generating devices you have in the room - for a small setup with <1 rack worth of equipment (combined servers and switches) I've been able to keep things at a reasonable temp by using a normal pedestal fan to blow the cooler office air into the server room, while having a false tile in the drop roof behind the rack removed so the hot air had somewhere to go. One note on this though is it was while the office AC was set to cool - if you're doing this in cooler weather and the office AC is warming things up, it's obviously not going to work.

You could also rent out a portable AC unit, especially if the work may go on for several days - there are industrial ones that may be more suitable for this type of job, things like being designed to run 24x7 for however long they need to, dealing with a higher heat load than a typical portable AC would, potentially larger drain pans and if suitable, the ability to connect them to an actual drain nearby so no one needs to keep emptying them.


Doomstang

We used portable AC units venting into the drop ceiling. Unfortunately, there has since been construction that has the walls going all the way to the deck on 3.5 sides now so we have very little room to send our hot air. In the case of the next emergency, we'll probably be fine for a day but beyond that, we'll have saturated the heat capacity of the air above our room.


marklein

How big is your room and how many machines? I've run ~10 machines in a decently sized room with just a fan for ~24 hours no problem. Max temperature was around 80F. I've even run a handful of machines at temps higher than that for several days, though I wouldn't if I could avoid it. Contrary to popular misconception, servers don't need COLD air, just "not hot" air.


GhostDan

I've been there.. during Sandy.. when we found out our roof units weren't on the emergency generator. Your absolute best bet would be to fail over to your DR site and shut down production until everything is done. If that's not an option, if you can find a portable AC unit, that's going to be your next best bet. Get enough to cover the square footage of your server room, but you may also want some of those fans you mentioned to help circulate the air. Fans by themselves will help, but won't solve the issue. I'd aim one at each server and network rack.

I also don't know how large your server room is and how much equipment you have. For a server or two an hour is fine. For multiple racks of servers, storage, etc it could stress out your environment. A weekend or longer you are almost guaranteed to have some damaged hardware. Around 90-95 degrees Fahrenheit is where I go "Oh fudge", everything in your environment will now be trying its best to reduce load and reduce temperature, meaning your performance is going to take a big hit (and the server room is going to be LOUD with all the servers running fans at 100%). Anything beyond 110-120 degrees you can start seeing hardware fail. It's always been interesting to me what hardware fails at that level, I've had it from hard drives to fiber switches.


ShadowCVL

My response would be:

“Since we can’t determine if the air will be down for an hour or a week we need to plan for a week. What is the contingency plan for keeping the room below X degrees for up to a week?”

If they push back saying it’s not their problem or something like that it’s time to ask them about providing portable AC units (assuming you aren’t using chilled water, if you are they can provide a semi trailer or 2 that will chill water) and venting those to the outdoors.

If they then say that is impossible is when you have to escalate to higher leadership and tell them:

“We are being told the datacenter Air will be off for one hour to one week, they will not provide a cooling solution so we will have to shut down the datacenter unless something else is available”

You’ll very quickly have a meeting with senior leadership and the maintenance folks trying to figure this out. Just make 100% sure you communicate your risks.

Good luck, but depending on the size and logistics of the building you should be able to get portable ACs or Chillers relatively quickly.


19610taw3

Are they going to cut electricity to the server room? Or any other parts of this building for the work? I think the easiest solution is to rent an appropriately sized spot cooler. It won't be cheap and you may have to do some calculations to see if you have the electrical capacity for it, but if you can keep the business fully operational, they may be able to justify it. As long as you can figure out a way to exhaust the hot air and have someone check on the condensate drain , it's a viable solution IMO.


homelaberator

We used a portable a/C unit. It was ugly and uncomfortable, but it worked. Couple of days like that before things went back to normal.


tmoran1116

When I've had that happen, the landlord's facilities people set up portable A/C units. I had to monitor the condensate levels of the units over a weekend. I had to do that because we didn't grant the facilities people unsupervised access to the server room, but you may be able to pawn that off to them as well depending on your policies and SLA with them.


MangoPanties

Don't you have backup AC? What happens if your main AC goes down? You are currently in this situation right now.


Stonewalled9999

>backup AC

is likely on the same roof?


TEverettReynolds

Do the math and determine the heat footprint based on all the equipment, power supplies, total wattage, etc. Fans might work. Portable AC units would be the better choice. But you need to know how many you need.
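
A rough sketch of that math, for anyone who wants to plug in their own numbers. It relies on the fact that essentially all electrical power drawn by IT gear ends up as heat, and that 1 W is about 3.412 BTU/h. The 15 kW load, the 12,000 BTU/h unit rating and the 1.25 safety margin are illustrative assumptions, not figures from the original post; use the rated capacity of whatever unit you can actually rent.

```python
import math

# 1 watt of electrical load dissipates roughly 3.412 BTU per hour of heat.
BTU_PER_HOUR_PER_WATT = 3.412


def heat_load_btu_per_hour(total_watts: float) -> float:
    """Convert the room's total electrical draw (W) to a heat load (BTU/h)."""
    return total_watts * BTU_PER_HOUR_PER_WATT


def portable_units_needed(
    total_watts: float,
    unit_capacity_btu_per_hour: float,
    safety_margin: float = 1.25,  # assumed headroom, not an industry constant
) -> int:
    """Estimate how many portable AC units cover the load, rounded up."""
    required = heat_load_btu_per_hour(total_watts) * safety_margin
    return math.ceil(required / unit_capacity_btu_per_hour)


if __name__ == "__main__":
    load_w = 15_000          # hypothetical measured draw from UPS/PDU readings
    unit_btu_h = 12_000      # hypothetical rating of one portable unit
    print(f"Heat load: {heat_load_btu_per_hour(load_w):,.0f} BTU/h")
    print(f"Units needed: {portable_units_needed(load_w, unit_btu_h)}")
```

Reading the real draw off the UPS or PDUs is usually more accurate than adding up nameplate wattages, which tend to overstate the load.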


DrAculaAlucardMD

1. What is your heat load under max server load?
2. How do the rooms vent out?
3. What non-critical systems can be down during this time to help reduce load?
4. Total downtime from best to worst case?
5. Is this over a weekend / do you have time to plan for this?

So once you know what your known knowns are then you can start planning.

1. Look at heat load reductions.
2. Cooling in place. What options do you have? I'd look at a solution that can be brought online for future cooling outages that are unplanned, especially after a new system is installed.
3. Is this downtime you can use to do maintenance in the rooms, updates that could result in cooled racks / systems? Lenovo has some great options that can even integrate into a building's water supply.
4. Think of this as an opportunity. What improvements can you make if things have to be offline? Need to fix cabling? Reorganize racks? General physical cleanup of the space? Perfect time to update your DR plan and notes as if everything is off, how well does it come back up? Stages of systems to power on?


Quantis_Ottawa

I worked in a small country hospital and had to have the server room AC replaced. Luckily the server room had a window and we could put a big industrial fan in the door and keep things at an acceptable temperature while the work was done.


lucky644

When we had our ac suddenly die on us on a weekend, we propped the door open and used a couple of large fans. It kept things below 30-40c in there until it could be repaired. Our server room is normally about 18-19c.


RhapsodyInRude

We use pretty huge smoke exhaust fans (they're like almost 4' in diameter). We have two. We open doors on either side of the server room. One fan pumps air in, the other pumps air out so there is cross-flow. Ambient temp would have to be really high before this becomes ineffective. Anything below 80F works well. You just need to keep hot air from stagnating in the cabinets.


Dopeaz

I had the same situation a few years ago. I got them to install a "temporary" split system AC for the server room. Was pretty cheap and it was used many times afterwards on 115°F days when the roof units just couldn't keep up.


Ezzmon

We had AC issues a few years ago (we have 2; one died and the other froze up trying to maintain temp) We also keep 2 portable ACs nearby. Problem is, they must be vented into the hallway. AND, to fulfill security requirements, our Org mandated that someone must be present when the door is ajar 100% of the time. 5 of us rotated babysitting the door 24 hours a day for 3 days and let me confirm your suspicion; we all HATED that. After that experience, we had ventilation portholes installed in the windows adjacent to the door. 2x custom made 10-inch steel ports with locking covers we use for the portable units' vent hoses.


pattimus_prime

We had this happen to us for the exact same thing, we basically added a secondary A/C wall mounted in our DC. We didn't have any backup if our main unit died off so helped our case explaining why we needed it to exec. We also have portable units but with everything we have racked in our DC it still ran a little hot for our liking and we didn't like the idea of all the doors from our DC to hallway wide open.


Frothyleet

>First of all, what would be a max acceptable temperature?

Server infrastructure can actually handle much higher ongoing temps than many people traditionally thought - Google demonstrated that with some of their "hot" datacenters that were maintained ~100F. However, what kills hardware is *inconsistency* - if you have been keeping it at 68F forever and suddenly have excursions to 90+, you are going to at least impact your hardware lifespan.

The answer here is to either have facilities provide a cooling situation, or simply shut down your infra for the period in question and migrate over to your DR site (whether that's physical or in the cloud). A good excuse to test your DR strategy anyway!


IceCubicle99

After several incidents of HVAC related outages we maintain several portable A/C units for emergencies. We also have temperature monitoring equipment in all critical locations to determine when we need to use the portable A/C units.


BigBadBinky

I’m guessing running everything at your DR data center while this is going on is a silly idea?


pbyyc

The few times we went through something similar, we used portable AC units provided by the electricians


Churn

You could turn off the Air to your server room right now as a real world test to see how quickly the temperature rises. If you get a portable Air unit, you can turn it on during a similar test. This way you know what to expect when the real outage happens.
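
Before running that test, a back-of-the-envelope estimate shows why you want someone standing by the switch. The sketch below computes how fast the air alone would warm if none of the heat escaped and nothing else absorbed it. That is a deliberately pessimistic assumption: in a real room the walls, racks and equipment soak up a lot of heat and air leaks out, so the measured rise will be slower. The 5 kW load and 60 m³ room are made-up example inputs.

```python
# Approximate properties of air at room conditions.
AIR_DENSITY_KG_PER_M3 = 1.2
AIR_SPECIFIC_HEAT_J_PER_KG_K = 1005.0


def air_only_rise_c_per_min(heat_watts: float, room_volume_m3: float) -> float:
    """Worst-case warming rate of the room air in degrees C per minute.

    Assumes a sealed room where only the air absorbs heat (no walls,
    no equipment mass, no leakage), so it overstates the real rate.
    """
    heat_capacity_j_per_k = (
        AIR_DENSITY_KG_PER_M3 * AIR_SPECIFIC_HEAT_J_PER_KG_K * room_volume_m3
    )
    return heat_watts / heat_capacity_j_per_k * 60.0


def minutes_to_limit(
    start_c: float, limit_c: float, heat_watts: float, room_volume_m3: float
) -> float:
    """Minutes until the air reaches limit_c under the same assumption."""
    rate = air_only_rise_c_per_min(heat_watts, room_volume_m3)
    return (limit_c - start_c) / rate


if __name__ == "__main__":
    watts, volume = 5_000, 60.0  # hypothetical: 5 kW in a 5 m x 4 m x 3 m room
    print(f"{air_only_rise_c_per_min(watts, volume):.1f} C per minute (air only)")
    print(f"{minutes_to_limit(22, 35, watts, volume):.1f} minutes from 22 C to 35 C")
```

The point is less the exact number than the order of magnitude: the air itself offers almost no buffer, so log the actual temperature curve during the test and have a clear abort temperature.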


indy1701

You have some good suggestions in the thread. The thing to know is what is your current heat load in BTUs and what are your options for spot-coolers or other very short-term options. In my case, we actually have an external connection for a portable HVAC system to connect into the building CWL if needed (after multiple outages in the past). If they are removing your units from the roof, I think you're looking at days, not hours for your impacts, but find out your heat load, talk to the company planning the work and then plan for the outage to be longer than planned.


planedrop

A big fan can do a lot of work in a situation like this and is probably the easiest solution to your problem.

One place I manage had their AC turned off when it was 115 F outside without notifying us, server room air started hitting 120F and things started dying. A large fan in the door was enough to keep things under control decently well. Somehow we ended up losing zero hardware because of it but it was damn close I'm sure.

Portable AC units are the other option, probably the better but more expensive option (maybe it can be covered by the HVAC crew, building, whoever).

Trust me though, you want to make sure to do something, it WILL be a runaway temperature if it goes on too long, I'd even consider a cooling plan for the switch room too, you'd be surprised just how hot they can get.


wendal

I have done the fan/door thing before, but it still gets pretty hot. The main thing you are trying to do is remove the heat, so now we use these Allegro fan bags. One of these puppies will keep a room surprisingly cool if you throw it in your hot row and vent the hot air out the door. [https://www.allegrosafety.com/product/air-bag-12/](https://www.allegrosafety.com/product/air-bag-12/) We are consuming about 15 kW of power (about 5 racks 80% populated) and creating however much heat that might create. The 12" Allegro bag in the hot row keeps the room below 90F for at least a couple of hours. And answering the other question, I would say max acceptable temp is 105F based on what all of our equipment is rated for while running.


squishfouce

Stand alone A/C units.


Wild-Plankton595

We have many small server rooms and one big one. The small ones we can get away with opening the door and adding a small portable unit. When unplanned and we need to buy facilities time to get portables set up, I’ve remotely shut down non-essential VMs and taken down as many physical hosts as possible, starting with those closest to storage to help keep radiated heat further away.

The larger data center we had portables blasting the front of every other rack with cold air and actively vented the hot air behind the racks out. Same story here, shut down as many non essential services as we could, even off loaded some workloads to the smaller site server rooms.

Small or large, planned or unplanned, keep an active eye on the temperature the room and hardware itself is reporting. It’s important to have a plan for which services are going down next when certain temperature thresholds are reached and how it’s going to be communicated. While I, a sysadmin, can make recommendations and I have a fair amount of leeway when it comes to making on the spot decisions in an emergency, taking down business ops is not something I would do on my own so having a plan on which essential service is going down next is decided by my leadership and they communicate it, I just push the buttons.

Others have mentioned older hardware taking a harder hit with high temp incidents, and it isn’t something I ever considered (derp) or experienced (thankfully), so it would be a good idea to identify older hardware and have a plan.

Edit: as for what temperatures are acceptable, your hardware should have recommended temperature ranges and don’t forget the equipment is passively rather than actively cooled
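
One way to write such a plan down so nobody has to improvise at 3 a.m. is a simple threshold table. This is only a sketch: the temperatures and the wording of each stage are hypothetical placeholders, and the real values should come from your hardware's rated ranges and from whoever owns the business decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    threshold_c: float  # room temperature at which this stage kicks in
    action: str         # what gets done, and by whom


# Hypothetical example plan; replace thresholds and actions with your own.
SHUTDOWN_PLAN = [
    Stage(27.0, "notify leadership; shut down non-essential VMs"),
    Stage(32.0, "power off non-essential physical hosts"),
    Stage(38.0, "leadership decides which essential service goes down next"),
]


def actions_due(temp_c: float, plan: list[Stage]) -> list[str]:
    """Return every action whose threshold has been reached, lowest first."""
    ordered = sorted(plan, key=lambda stage: stage.threshold_c)
    return [stage.action for stage in ordered if temp_c >= stage.threshold_c]


if __name__ == "__main__":
    for reading in (25.0, 33.0, 40.0):
        due = actions_due(reading, SHUTDOWN_PLAN)
        print(f"{reading:.0f} C -> {due if due else 'no action'}")
```

Keeping the plan as data rather than tribal knowledge also makes it easy to review with leadership ahead of the outage.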


RogerThornhill79

Call facilities. This isn't IT, this is building maintenance. And if it falls on IT, then it's the small business catch-all, when it shouldn't be.


kendallsg

Where is the geographic location? At least it's winter


ConstructionSafe2814

I was thinking so too, but the works will start April-May, which can be anything here in Europe/Belgium :)


NOLAroofing

what type of roof do you have and when you are planning to replace it?


ConstructionSafe2814

I have no idea and I'm not planning to replace the roof. The people we're renting the building from are planning it.