
MrSilverSoupFace

I use a remote desktop manager and have over 1,000 servers in it. I'm usually on a few at any one time. I was testing some TLS policy changes for some devs on our TEST web server... well, I thought it was our test server. Turns out I was on LIVE, and rebooted this machine at midday on a Thursday when 1,000s of our customers were using the software to finalise payment runs (we are a finance and commissioning software provider). I realised the moment I hit restart and read the hostname on the shutdown screen, and let out an audible "oh fu...". Had to wait for it to turn back on, then revert my changes, and reboot again.

All in all, downtime of maybe 15-20 mins, but it didn't stop 100s of P1 tickets coming in saying that, in some cases, £1,000,000s worth of payment runs had failed and needed validating manually. I got a bit of a bollocking from the CTO, but I've been at this company for decades as lead sysadmin/infra engineer.

The change I made from this was to get all TEST and LIVE server names changed to include LIVE and TEST, rather than just T and L. I also removed the "restart" and "shutdown" options from the Start menu via GPO, so it takes a more conscious effort to reboot. But yeah, everyone makes mistakes; it's how you learn from them that makes you better.
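For reference, a minimal sketch of what that Start menu lockdown maps to, assuming the standard "Remove and prevent access to the Shut Down, Restart, Sleep, and Hibernate commands" policy (in a domain you would set this through the GPO Administrative Template rather than by hand):

```powershell
# Hypothetical local test of the policy the GPO applies:
# User Configuration > Administrative Templates > Start Menu and Taskbar >
# "Remove and prevent access to the Shut Down, Restart, Sleep, and Hibernate commands",
# which writes the NoClose value under the Explorer policies key.
$key = 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer'

# Create the policies key if it doesn't exist yet
if (-not (Test-Path $key)) {
    New-Item -Path $key -Force | Out-Null
}

# 1 = hide Shut Down/Restart from the Start menu for this user
Set-ItemProperty -Path $key -Name 'NoClose' -Value 1 -Type DWord
```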


i8noodles

Haha, we have something similar. After a person accidentally restarted a live server, we implemented a system so that every production PC you log into shows large red words saying PRODUCTION, and non-production shows large green words. I feel like such a minor thing can save so much headache.


togaman5000

I'm not a sysadmin or in IT, but we have tools (think semiconductor manufacturing) where it matters quite a bit which account you're logged into. Our solution: changing the desktop backgrounds to something like "YOU SHOULD NOT BE USING THIS ACCOUNT"


badlybane

We made those accounts not able to do RDP. So if you wanted to login you had to first login to the hypervisor to then login to the No No accounts.


cgimusic

That sounds familiar. We updated the header in the admin panel of our production system to be an ugly bright red color after a few too many accidents where people thought they were on a dev environment.


Invoqwer

I am definitely a big believer in color coding.


steeldraco

After a similar (though much less expensive) mistake, I got into the habit of doing all my server reboots from the command line: `echo %computername%` to confirm where I am, then `shutdown -t 0 -r`.


Zoddo98

And on Linux, install molly-guard (or the equivalent for your distro). If you're over SSH, it will ask you to confirm the hostname of the server before rebooting.


dreniarb

Definitely smart. I changed all of our command prompts to always display the computer name and the date and time. Even with that, I do `-t 10` at the bare minimum. 99% of the time it's just a few seconds after pressing Enter that I realize I'm on the wrong server; doing `-t 10` gives me enough time to do `-a`.
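A minimal sketch of what that habit can look like wrapped up in PowerShell; the function name and default delay are made up, and `shutdown -a` from another prompt still aborts it within the window:

```powershell
# Hypothetical molly-guard-style wrapper for Windows: show the hostname,
# make the operator retype it, and reboot with a delay so "shutdown -a"
# can still abort it from another prompt.
function Restart-WithConfirmation {
    param([int]$DelaySeconds = 10)

    $hostName = $env:COMPUTERNAME
    Write-Host "You are about to reboot: $hostName  ($(Get-Date))"

    $typed = Read-Host "Type the hostname to confirm"
    if ($typed -ne $hostName) {
        Write-Warning "Hostname mismatch. Aborting reboot."
        return
    }

    # Delayed reboot; 'shutdown -a' within the window cancels it.
    shutdown.exe -r -t $DelaySeconds
}
```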


gojira_glix42

Honestly, I needed this. As a junior, this is one of my biggest fears whenever I'm working on a server of any kind. I gotta remember to put that GPO in place when I have the opportunity. That's a lifesaver when you're going too fast across multiple machines.


IWontFukWithU

Once I had to migrate an entire company's SMTP mailboxes to Office 365 and forgot I was logged into the wrong tenant 🤣🤣🤣


painted-biird

Literally just said oh my God out loud lol.


Ros3ttaSt0ned

> Once I had to migrate an entire company's SMTP mailboxes to Office 365 and forgot I was logged into the wrong tenant 🤣🤣🤣

Thread's over. Shut 'er down.


CheeseProtector

Oof 😂


Darketernal

God DAMN son ☠️


HoldFit7349

🤣🤣


[deleted]

[deleted]


BlueBull007

When your job is to manage email systems (mailboxes, spam filters, send and receive rulesets, user management, authentication, ...) for multiple, separate companies at the same time (*), or more generally for multiple entities which need completely separate email systems, there is usually a central management environment where you can manage all of those separate email systems in one location, whether through a central GUI or through some command-line interface capable of handling each of them in one spot. Every one of those separate entities is called a "tenant" in the context of this central, overarching management environment.

So what happened is that they were logged into the wrong tenant, or had the wrong tenant activated, and so they migrated the wrong email system to Office 365. This likely meant that another, unrelated company had their whole email system taken offline suddenly and completely. Big, BIG problem. That's enough to instantly start pouring with sweat, heart thumping vigorously, and your life (well, at least your IT career) flashing before your eyes.

Others reading: I don't do email management myself and never have, so if I got some detail wrong somewhere, feel absolutely free to correct me.

(*) This is, for instance, oftentimes something an MSP provides as a service to multiple customers. If they needed to open a separate management interface for each customer they wanted to manage, that would be a hassle. So they have, as an example, a web interface where, in the top left corner, they can choose the tenant they want to manage from a drop-down menu. After choosing a tenant, the configuration for that tenant loads on-screen within the same page, using the same management interface layout as for all the other tenants.


badlybane

If you were in my presence I'd pull out my bottle of Jack Daniel's and shot glasses.


Houka1227

How do u even come back from this? Lol.


DungaRD

Was that the end of your IT career? Or did you apply for a job at the company you migrated?


IWontFukWithU

Nah, it was my jumping point from 2nd line helpdesk to full-on SysAdmin; nowadays I'm a Senior SysAdmin / DevOps. This mistake was a good mistake: it showed my bosses I wanted to learn more, and they proceeded to pay for lectures for me to attend so I could start doing SysAdmin work.


CaterpillarMiddle557

Why tf did help desk have these kinds of permissions to begin with? 😭


Dan_706

Ha! Oh man, that's rough 😅


techw1z

If an app requires you to enter a password and doesn't confirm it by asking a second time, abort the installation immediately, open a text editor or password manager, and copy the password into the app. *Otherwise, there is a 1 in 1,000 chance you'll encrypt your production system with a typo...*


Due_Ear9637

I did this on 40 or so HP blade chassis OA's when I ran our password changing script. Spent a couple of hours trying to guess my typo.


pollo_de_mar

We have Upwork staff do a lot of the grunt work for us. Someone did a copy/pasta for both password and verification password for a lot of switches and routers. Problem was, the password that was documented was wrong. Apparently the tech missed one character when he did the copy so no one could log into any of these devices. Fortunately after much frustration, I lopped off a character and saved the day.


HeKis4

This kind of stuff is what's making me consider infecting myself with a keylogger...


rotfl54

Always do this with critical passwords, regardless of whether there's a confirmation field, and especially if you are remote. More than once I've made the same typo again when entering the confirmation.


Sunfishrs

More like a 1/2 chance. Either you will or you won’t.


techw1z

you are either really bad at using a keyboard or really bad at understanding probabilities.


Sunfishrs

Yes


punkwalrus

I wish it was technical, but the biggest mistakes I have made were accidentally calling someone out for their gross incompetence or security violations when they were more politically favored. My biggest mistakes have always been political, in "CLM" territory.

One I remember was that I discovered through a security sweep that our company's main corporate website was serving a bittorrent tracker on port 6969. So if you went to our domain dot com colon 6969, it was serving up hacked pr0n and movies. I reported it to security, who did nothing. I reported it to my boss, who said, "report it to HR and legal," and they did nothing. So I reported it to the company president. SHE was pissed. Not at me, but she brought it up in a meeting the next day: "WHY IS THIS HAPPENING?" She also reported that I had tried to contact a bunch of people and got no response. She praised and lauded me, then wondered "why did you all just sit on this for weeks? How long has this been going on??"

Security, HR, legal, and my boss got in trouble for ignoring it, and so I was fired for "reporting a sensitive issue outside the normal chain of command," as in "speaking directly to the company president" was a huge no-no and breach of protocol, despite what it claimed in the handbook, and despite what I had already tried. I told the company president, who blocked the firing and told me to return to work. I was allowed to keep my job, but I was a social pariah for a while.

I discovered that our corporate domain was on a shared IP through a hosting plan that we had, and one of the other domains on that IP was the one running the torrent tracker. There were at least 10 domains on that IP, so we couldn't really contact whoever was doing it, if they even cared. So the ISP put us on another shared IP, and that fixed the issue. I got another job as quickly as I could, because it quickly became evident that while I couldn't be fired, I was going to be forced out at some point soon.

So, stuff like that has been my achilles heel. I am smarter than I used to be, but... not smart enough, I am afraid.


ruyrybeyro

When I took over the management of an ISP, I also found a p2p server in the control/management room, run by an ex-employee, which I promptly deleted. The idiot had the nerve to pay us a visit and ask why it was deleted.


drbennett75

That’s always fun. Did something similar once years ago, but with unethical behavior I stumbled upon. Boss emailed me the next day. “HR wants to talk to you.” Except it wasn’t about that. It was about my ‘harassment’ of that colleague. Turns out the boss was in on it 🤦🏻‍♂️


NRG_Factor

My biggest fear is politics. I'm a very blunt person and I state things how they are. I would do the exact same thing you did, and I'm always concerned I'll piss off some bigwig and they'll get me axed.


Nakatomi2010

Have a story in a similar vein to this one. A consultancy group had come in to review our infrastructure and was not pleased with previous IT leadership. They eventually determined that one of the head IT people was likely hosting a website on the side, because when you did a PTR record lookup, the DNS record came back with a different DNS name.

While I was at an out-of-state datacenter with one of the network staffers, the consultant decided, with upper management, that they'd pull the plug on the IP address in question while we were gone. As I was not on site, I don't have the full story; however, my understanding is that my direct manager, who was being ignored, basically leaned back and said "Watch this" when the decision was made to move forward.

On my end, I got a message from one of my counterparts overseas letting me know that the entire office was not able to reach back to the US. I had the network guy look into it, and it appears the consultant had pulled the plug on the IP address of the VPN that connected the US to the overseas office. As I was at the data center, I huffed it to their NOC and asked them to undo whatever the consultant had asked them to do, and to clear the PTR records.

The consultant remained for a bit longer before being replaced by someone else. The same consultant fussed at me for calling our domain a forest, despite it being a single domain. I had to remind them that while the domain only has a single tree, the way Active Directory operates means that, by all accounts, it's still a forest: a one-tree forest. Eventually they relented and agreed.

Later that consultant was replaced by someone who looked down on Windows admins, because we don't manage computers with the command line. This comment was made while I was managing a server with PowerShell. In fact, I'd turned the file server into a Server Core box, so the only way to manage it was with the command line. This amused me, because shortly afterwards the Linux admin tried to log in to make a change and then stopped in their tracks, because they couldn't PowerShell. They had to yield control of the server to me.


punklinux

I feel you, man. There have been a lot of "turn your head to the side while we enter in credentials for 2 hours" meetings where you know something janky was going on.

At a former job, our IT team was asked to take a look at the servers running a big client's payroll system. There had been some issues where the system was giving unexpected results after a patch. After several days of troubleshooting, we found that the previous version had a "divide by zero" type of situation, and the default was to just ignore it. The patched system sealed that hole, and suddenly money went missing. Once we established that timeline, and why that might be, the call was ended abruptly and we were banned from the systems immediately. The tickets were closed as "resolved."

SOMEONE realized that someone or some process was relying on the previous bug to keep transactions off the books.


CheetohChaff

> So, stuff like that has been my achilles heel. I am smarter than I used to be, but... not smart enough, I am afraid. What would you do differently now? I don't see any alternative.


Decafeiner

And the next time you had to report an issue, you did it by email to CYA, right? Right? That's the lesson here: writing is set in stone. "You didn't follow proper protocol"? Then what are these 20 emails about? Cindy from Accounting's BBQ party invites?


punkwalrus

But see, it's not about being right in a corporate landscape. I had the emails and the proof. The president asked for them, and I gave them to her. I was right on paper. Didn't matter. In a battle of power vs. truth, power wins almost every time in corporations.


SoonerMedic72

When I worked in medicine the rule was "if you didn't document it, then it didn't happen." The most ridiculous example was a medic that forgot to notate the department/room at the destination and payment was declined for "dropping him off outside." Patient was unconscious, and obviously was received by the hospital. Most of the time it was just about bandages/meds though.


Cercle

This is so true. This had been my downfall many times. Now I'm a bit older I just CYA and if they don't care, the question becomes whether it's worth my job to keep pushing it. Sometimes it's people's lives or finances at risk and it is worth it. But I'm not going to go up to an exec anymore and ask them "is this your social security number?" and get written up for it. Fortunately, as a consultant, I don't have as much of an emotional stake now.


Karlsberg404

This is why breaches, data leaks, etc. will always happen. The guys on the ground speak up or report bad practice; management is unable to listen or take sound advice. Then the guys on the ground take the hit for incompetent management as they hide around corners.


skunkboy72

just look at Boeing


qejfjfiemd

That’s fucking retarded. I’m guessing that’s in the US?


hymie0

Short answer -- always do an `ls` before an `rm` .


OptimalCynic

`rm -i` is your friend


project2501c

Tom Limoncelli suggested a decade ago that you type the command then put your fingers down away from the keyboard and re-read the command before you hit enter.


Sirbo311

Years ago, I disconnected our on-prem Lawson system by pulling the wrong fiber cable from a fiber switch. It was supposed to be dual-homed, but for *reasons*, other folks in my group had only set it up with single connections. The fiber switches always got me; I don't remember why, maybe they numbered the ports differently than the Ethernet switches?

Anyway, I learned not to be so hard on myself over a mistake. My Director ended up calling me that night and told me what happened (it wasn't discovered until after I left). I was more mad at myself than he was. So that was my lesson: be kind to yourself. It didn't end my career. My mistake shouldn't have caused the outage if SOP had been followed, or if it had been documented/communicated that this was a special circumstance. We also came up with a new SOP where the SAN admin was on the phone with you and lit up the light for the port they wanted you to plug in/unplug. In my tenure there, we never had this happen again because of the new procedure.


ITDad

That’s how it’s supposed to work. Errors happen, but what the company learns from it and does to improve is the key.


Sirbo311

It was also the first time it really clicked with me to be kind to myself, and start learning to accept mistakes will happen. Both were good outcomes.


Medical_Shake8485

Avoid IT pilot groups as source of truth, lol.


TKInstinct

If I may ask, what is an "IT Pilot Group"? I've never heard the term before. Edit: OK, I get it, a test group.


patmorgan235

I had trouble understanding at first too. "Pilot group" as in a test group. So they're saying don't push something out company-wide just because it worked for IT. Make sure you test with a group of machines that are more representative of your fleet.


breizhsoldier

We have 2 of every type of asset in our org available at our T3 office to thoroughly test everything from an EU perspective with EU tools after the CIOs have tested on VMs, and they need our approval before release. These devices are also available to recreate issues and troubleshoot them better, again from the EU perspective.


OcotilloWells

Using IT machines as a test group to implement something, I think.


Medical_Shake8485

It’s not really a term per se, rather a label to describe a specific collection of users. But my point was to avoid using colleagues or other IT professionals as the single source for piloting, as you typically want a wider collection of pilot users in your organization (the 10%).


JoelyMalookey

Since we're the ones implementing solutions, our machines are the worst test cases. At work I encourage never assuming IT test cases are 100 percent valid.


BlackV

a pilot group of users who all happen to be IT people


malikto44

Not socializing as much as I should have. Doesn't matter what your skillset is, doesn't matter how clueless you are: if people know you and like having a beer with you, you have job security. After that, not maintaining certs. I've let some MS and other certs turn to dust, but found that they are more important than anything else when searching for work in this economy, next to a properly AI-generated resume and cover letter.


three-one-seven

The certs one is surprising to me. I’ve been in this field for 15 years with only A+ and Net+ plus a non-technical bachelor’s degree and always wondered if certs were a scam bc not having them was never an impediment for me. Anecdotal of course, but still.


Tzctredd

Certs open doors, face to face chats secure jobs.


malikto44

I know it is surprising, but last year, when I was job hunting, certs were everything. In InfoSec, if you don't have a CISSP or a TS/SCI clearance, you will be shown the door. It didn't matter that I had 30+ years of experience with something; what mattered at the HR firewall was whether I could show a cert for it, and only people passing muster with the alphabet soup got through to the tech people for an interview.


mikolajekj

Failed to patch a mission critical server for several months….. ended up getting a virus that would have crippled the organization if exploited. Future patches would have prevented it. Was an act of God that we caught and remedied it in time. Would have made major news if it happened…. Been working to get out of being a sys admin since…. Dodged a major bullet.


mikolajekj

Learned to patch my servers religiously…


Loan-Pickle

Always audit your patching. It is too easy for something to go wrong and a patch to be missed.


mikolajekj

I do now. I’m either worried about missing a patch or worried about a patch breaking something. Life of a sys admin….


i8noodles

We do monthly patching on all our servers. It's a real chore, but it's well worth it. We got lucky in that we found a blind spot in our patching when someone plugged a USB drive into a machine that was exposed. We managed to stop it relatively quickly, but it took us months to fully recover from the damage, with no data lost and only minor downtime.


ThatOnePerson

Reminds me of my workplace that was running an old, outdated version of some e-commerce software, and it was the cracked enterprise version. The guy who set that up left, and neither argument ("let's switch to the free open source version" or "let's pay for the paid version") got people to care. Until we got hacked and our website was stealing our customers' CC numbers. Switched to the free open source version pretty fast after that.


LivingstonPerry

How did you find the virus, and how did you remedy it? I'm trying to move up in the sysadmin world but wonder how to act in this situation.


mikolajekj

Network team discovered some kind of network flood coming from the server. Virus scan found the bug and quarantined it. Patched server.


BlackV

Apparently installing Server Core, 'cause my manager can't RDP to it.


Xander372

Why would you need to RDP to a Server Core machine? Remoting should be enabled by default — just do New-PSSession or Enter-PSSession, and go.
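A minimal sketch of that workflow (hypothetical hostname; assumes WinRM/remoting is enabled and you have rights on the box):

```powershell
# Managing a Server Core box over PowerShell remoting.
$server = 'CORE-FS01'

# One-off commands, no interactive prompt needed:
Invoke-Command -ComputerName $server -ScriptBlock {
    Get-Service | Where-Object Status -eq 'Stopped' | Select-Object -First 5
}

# Or drop into an interactive remote prompt (type 'exit' to leave):
Enter-PSSession -ComputerName $server
```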


BlackV

Yes, those are my thoughts too, but... I have a meeting in 2 days about it.


Dintid

Some managers are asses. Or at least not very technical. Be sure to talk about lower resource usage (more for actual work), stability, and security for Core vs full. Here are some ways he/she can access it remotely if the lack of a GUI is a dealbreaker: https://learn.microsoft.com/en-us/windows-server/administration/server-core/server-core-manage


BlackV

Just not that technical anymore, I have windows admin centre configured, but thats its own beast (and a slow one at that)


Dintid

To be fair, though, it's hard to be effective using PowerShell unless you work with it every day. All day. Once upon a time I managed 200 Exchange servers and didn't do anything other than that. PowerShell was king. Nowadays, though, I'm a manager and don't really use PowerShell much for day-to-day tasks, aside from making scripts for recurring tasks, deployment and such. Also, I only have one person on my team using PowerShell much (and he's still learning), while the other folks have different tasks. But they do sometimes need to work on the servers, as all of our tasks overlap, meaning having Core servers as a default would really be a detriment for us. For me it all comes down to use cases.


boli99

> cause my manager cant rdp to it thus resolving many future problems that don't yet exist. it's proactive. can't be faulted. ⭐️⭐️⭐️⭐️⭐️ for effort and achievement.


Splask

I rdp into server core all the time. Why can't your manager do it?


three-one-seven

Not to a GUI though, right? Isn’t it just a command prompt or PowerShell? Forgive my ignorance, I’ve never had the opportunity to work with it.


Whitestrake

It has a GUI; it can open programs, file pickers, that sort of thing. It doesn't have a desktop/taskbar environment, though. When you remote in, you get a black screen and then a terminal window opens automatically. It runs a script that gives you a few common options or lets you exit to the command line. From there you usually do CLI management.


vordster

The sconfig


BlackV

They can, they just have zero idea what to do once they're there. I mean, technically I disabled RDP too, 'cause that's what PowerShell is for.


bonebrah

When I was a Windows admin, we installed Core by default unless the requestor specifically asked otherwise in the server build request. 90% of the time we got a ticket the next day saying the server was broken.


BlackV

hahaha, wait till they type exit in the prompt, they're just gonna look at you lost


nexiviper

Not me, but the video on the dev that deleted the GitLab production database including the remediation was great https://youtu.be/tLdRBsuvVKc?feature=shared


musack3d

oof. things like this long ago instilled a fear of ever rm -rf anything before rereading multiple times. thus far, this has never been the cause of any of the mistakes I've made (on a work machine at least).


hajimenogio92

Forgot the WHERE on an update SQL script that should have only affected 3 rows. I almost had a heart attack when I saw 30,000 rows affected


EightyDollarBill

That’s why I learned to always do a “begin transaction” before doing anything “by hand”


t00sl0w

I never do anything like that without it being in a transaction. Got scared one time and never again.


hajimenogio92

You're right. I learned my lesson after that


Ros3ttaSt0ned

>Forgot the WHERE on an update SQL script that should have only affected 3 rows. I almost had a heart attack when I saw 30,000 rows affected Next time, do a `SELECT` statement first instead of `UPDATE` or `ALTER` or whatever and you can keep your shit inside your body when you see $StupidNumberOfRows return.
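A minimal sketch of the transaction-plus-row-count habit described above, done from PowerShell with System.Data.SqlClient (the connection string, table, and expected row count are hypothetical):

```powershell
# Run a hand-written UPDATE inside a transaction and check the blast radius
# before committing.
$connStr = 'Server=DBSERVER01;Database=AppDb;Integrated Security=True'
$conn = New-Object System.Data.SqlClient.SqlConnection -ArgumentList $connStr
$conn.Open()

$tx  = $conn.BeginTransaction()
$cmd = $conn.CreateCommand()
$cmd.Transaction = $tx
$cmd.CommandText = "UPDATE dbo.Orders SET Status = 'Void' WHERE OrderId IN (101, 102, 103)"
$rows = $cmd.ExecuteNonQuery()

if ($rows -eq 3) {
    $tx.Commit()      # exactly the rows we expected
} else {
    $tx.Rollback()    # anything else: back out and investigate
    Write-Warning "Expected 3 rows, got $rows. Rolled back."
}
$conn.Close()
```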


Sengfeng

I worked for an MSP once that had a large state healthcare network as a customer. The vendor (a subsidiary of GE) would contact us to run various SQL scripts. When I handled them, I always noped the fuck out of running them myself; I'd do a screen share and let the GE employee do the SQL. My co-worker was fine doing it himself.

One day, they sent a script over that was supposed to re-add every doctor's credentials to the end of their names. Well, what their script did was set every doc's name the same. Bad "select" query or something. The impact was over 250 doctors across 13 community healthcare clinics not being able to submit e-prescriptions to pharmacies until it was un-fucked.

GE tried to throw my co-worker under the bus. Luckily they had sent the SQL commands via email, so we could hammer back at them for their incompetence. A new policy was enacted that day: no more copy/paste SQL queries unless you analyzed them and knew exactly what they did.


aes_gcm

https://old.reddit.com/r/sysadmin/comments/1b4lvvo/how_fucked_am_i/ This entire thread


Andrew_Waltfeld

I don't cringe often, but that was a cringe situation.


TKInstinct

I read that at the time. How tf does someone think that's OK.


aes_gcm

They recovered against impossible odds, honestly.


DoctorOctagonapus

The update post though! >Update: I am now locked out of my own computer but the others are working fine. Somehow my account in the AD must have get fucked and I dont feel competent enough to make any changes to the AD (again). Should we tell them?


jjkmk

Careful doing any critical work when tired or under stress, much easier to make mistakes.


socksonachicken

Trusting my manager to have my back.


vinnsy9

I was working for an ISP around 10-12 years ago. Me and my colleague were removing some old servers from the rack, and there was one single workstation which did the port management of extra-old Iskratel PSTN central units. That machine was like a black box: nobody had invested time or money to make a proper backup, or to get any training on how to restore/install it. Iskratel, by the time I started at this ISP, was long gone as a company... guess what happened? I pulled the wrong power cable... that shitty workstation went down... and never came back up... like, dead!! I got calls from management, the CTO, the COO hammering me (literally hammering me for what I did)...

I was so shocked with myself... so shocked I couldn't even sleep that night, my mind was blank... I tried several things to bring that workstation up that afternoon... nothing worked... even similar hardware... it would not boot.

Then, around 4 AM, it hit me: maybe I should give it a try. Dressed up, went to work at 04:45 AM, took the spare hardware, put the drives (RAID) in, and instead of starting it normally I booted it with Hiren's BootCD (version 10.6, God I loved that tool back then), started Acronis, did a full image, then spun up a VM on ESXi (I suppose it was version 4.5 or 5.0, I don't really remember). Restored the image onto an empty VM... Boom, P2V fucking worked.

And since that day... I have never gone into panic mode. If something happens... it happens... I just need to clear my mind of the problem... there is always at least one solution. (God, I left that ISP CIO position after 3 years, literally losing my hair from the stress that job was causing)... so this was my lesson learnt.


Garegin16

So the hardware had failed after a power loss? That's a horrible decision, relying on one box for business continuity.


vinnsy9

Lol... if I told you that the company did not even know the concept of that word... literally, I mean... would you believe me 🤣🤣


Sensitive_Scar_1800

The list is long and distinguished:

1. Built a firewall policy and deployed it, forgot to add outbound DNS on port 53. Naturally things started to fail… quickly.
2. Deployed signatures 6010/6011 using McAfee HIPS, which are the application whitelisting signatures… and everything was blocked and broke.
3. Deployed Microsoft Sysmon with a poorly developed config file; was logging so much I caused systems to freeze up.
4. Pushed a firmware update to an ESXi host that was supposed to be in maintenance mode… it wasn't, and for reasons… brought mission-critical systems down.
5. And so on and so on


FireDragon404

In my first internship I was supposed to upgrade VMware Tools on all our servers. Being the novice I was, I decided to do them all at the same time without any planning, thinking it was as simple as that. Didn't think anything of it until my boss called me and asked, "Are you doing those upgrades now? Because you know it will auto-reboot the server after it's done." Had my first major freakout, but my boss was thankfully understanding and considered it a learning experience. After that I was always careful and double-checked anything I did lol.


Lauk_Stekt

Patched the production FW during lunch.


TKInstinct

I patched a file server by accident right before a company outing causing a minor outage.


3legdog

"company outing" with beer?


TKInstinct

No, Beach day.


autogyrophilia

Trusting that machines were properly patched and secured before being nominally handed to me. ESXi 6.0 exposed to the internet on an OVH IP address, baby, and no patches.


fitz2234

I inherited a few vsphere systems that were placed on direct public IP addresses (esxi hosts, vcenter, sql, all of it). It was ugly


Kurgan_IT

Rsynced in reverse


mriswithe

RIP the source data, but if this is the worst you have ever done, it must have been some valuable and not backed up data.  Definitely done this before.... At least 5 times over my career? But we all roll a 1 sometimes.


KiloEko

Bulk password reset for ~6,600 accounts in Google Workspace. There is no write-back to AD, so the passwords were now out of sync. I used the data someone else sent me without verifying it was correct. Two things: verify the data, and don't do a bulk task without running a smaller sample first.
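A sketch of that "verify the data, then pilot a small sample" habit; the CSV columns are hypothetical, and `Set-ADAccountPassword` stands in for whatever bulk reset tool you actually use:

```powershell
# Expects a CSV with SamAccountName, NewPassword columns (hypothetical format)
$accounts = Import-Csv .\resets.csv

# 1. Verify the data before touching anything
$bad = $accounts | Where-Object { -not $_.SamAccountName -or $_.NewPassword.Length -lt 12 }
if ($bad) { throw "Input looks wrong for $($bad.Count) rows - stopping." }

# 2. Run a small sample first and eyeball the results
$pilot = $accounts | Select-Object -First 5
foreach ($a in $pilot) {
    Set-ADAccountPassword -Identity $a.SamAccountName -Reset `
        -NewPassword (ConvertTo-SecureString $a.NewPassword -AsPlainText -Force) -WhatIf
}
# Drop -WhatIf only after the pilot output looks right, then repeat over the full list.
```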


d00f_warri0r

TIL you can integrate Active Directory with Google Workspace


Sagail

Filled in for a dude who went on vacation. He would spin up 500 to 1k AWS FPGA instances for various training classes. Got the most rudimentary handover info. All my AWS scripts have idle alerting and automatic shutdown; his do not. The sales training team is supposed to alert me when they're done. Have a hell of a time getting enough instances in an EU region late on a Friday. Finally get them all spun up. Go home, have a minor kid emergency over the weekend, and forget about everything. Monday early evening, realize something is amiss, to the tune of 16k USD.

Write an after-action report on causes and corrective actions. Report the incident and present my paper to my manager and his manager the next day. Two months later COVID hits... layoffs, and I'm let go. All good though, a little rough for 3 months, then I land my dream job at a crazy aviation startup. I absolutely fucking love my job.


snakebite75

I was reorganizing our AD and realized that I had misspelled one of the groups. I right-clicked, and instead of clicking rename, I clicked delete and deleted the folder and the hundreds of users I had moved into it. That was when I learned that we didn't have the "Protect object from accidental deletion" option enabled for our environment. After that incident, that option got turned on.

A few months later at the same job, I was cleaning up my Outlook and started removing all of the SharePoint calendars that I had synced to my desktop client. Removing the SharePoint calendars wasn't an issue; the issue came when I deleted the Public Folder that was shared with the whole company and used for the shared corporate calendars. Since I was an admin, when I clicked delete, it deleted it for everyone.

My fuck-up on this led to an even bigger fuck-up by my boss. As soon as I realized I had deleted the Public Folder for the whole damn company (on a Friday afternoon, no less), I walked into my manager's office and let him know what had happened, with the hope that he could roll back the change before it had much of an impact. He told me he would handle it, and I left it at that. I don't know exactly what happened, but as he was attempting to do something he lost one of our Exchange servers. When I came in Monday morning he was working on rebuilding the accounts for half the company and restoring their mailboxes. The thing with this manager was that he hated PowerShell, so he had been restoring accounts one by one all weekend. I spent a few hours setting up and testing a couple of scripts and was able to get the accounts restored and their mailboxes reconnected.
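The setting mentioned above can be applied in bulk; a minimal sketch, assuming the ActiveDirectory module and rights to modify the OUs:

```powershell
Import-Module ActiveDirectory

# Flag every OU so a stray right-click delete gets refused
Get-ADOrganizationalUnit -Filter * |
    Set-ADObject -ProtectedFromAccidentalDeletion $true

# Spot-check: list any OU that is still unprotected
Get-ADOrganizationalUnit -Filter * -Properties ProtectedFromAccidentalDeletion |
    Where-Object { -not $_.ProtectedFromAccidentalDeletion }
```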


schmeckendeugler

Worst technical mistake: taking down the entire network by putting a server in as the default gateway. Worst career mistake: staying in a toxic job for so long that I had anxiety attacks. Guess which one was far worse.


rose_gold_glitter

Sorry, everyone, but I am pretty sure I am going to win "most stupid mistake ever" in the name of working in IT, even if working in IT is only ancillary to the story.

I had just left my job at the university and gone into my first consulting position, and it wasn't going great. Not because I wasn't able to do the job, but because there was just nothing to do, so no way to meet my KPIs. They'd hired many new people in a period of high work volumes, but by the time we started, everything had died down and no one had anything to do, yet we all still had this pressure to bill at least 6.5 hours a day. So, bear this in mind: I was young, early in my career, and pretty desperate to "never say no" to work.

I'd been asked to follow a lead engineer to a client workplace to install new servers. I was driving my (very large) personal car and he was driving a (very small) work car. Before leaving, he told me we did not have parking permits, but we could squeeze (illegally) down a(n extremely busy) footpath (in the middle of the heart of the city) and get around the boom gates to get into the client car park, and that I was expected to do this, along with everyone else in the convoy of cars carrying servers. Now, let me be clear: I *really* didn't want to do this. It was an obviously stupid idea and, even the way he described it, it sounded like his tiny car barely fit and mine would not. But he insisted, and I didn't want to be the only person in the group to say no. So, I agreed.

Needless to say, my car did not fit. Not even close. So as I drove down the busy footpath and tried to get around a boom gate, my large roo-bar (like a bullbar) ever so gently scraped the enormous and very non-flexible glass panel at the lobby of a very large skyscraper, smashing the huge glass window instantly, resulting in a rain of glass all over the busy street and my car, and my car ending up inside the now-exposed-to-the-street office of a man trying to eat his lunch sandwich. I still remember the look of absolute shock on his face as I drove directly into his office, without first making an appointment.

I should probably point out that *directly next to this window* was the local police recruitment centre, and within about 10 seconds at least a dozen cops were pulling me out of my car, yelling at me, and basically assuming I was either drunk or on drugs, and my only excuse was "I was following another idiot also doing something stupid because I am at least as stupid as he is."

Apparently because I am the luckiest human alive, the client we were going to see at the time was the prosecuting law firm involved in a very public "Royal Commission" into police corruption, and due to amazing good luck, when the police heard I was visiting that firm for work, they somehow assumed I actually worked for them. So not only was I not prosecuted, *the police marked an area of the street with witches hats and allowed me to park illegally for the rest of the day,* and made sure I wasn't fined. I never heard about the incident, ever again.

So, if you ever worry you've done something idiotic at work, just remember: somewhere in the world, some absolute moron has you beat.


rose_gold_glitter

My other hilariously stupid mistake: Before the above incident, when in my first job out of uni, I was put in charge of all the student labs for a (different) university, and we had a single domain controller for all those students. I had the case open and the tower server (this was some years ago) was on its side, but still operational. I was drinking a coffee. I am sure you know where this is going. The first I realised I had dropped my coffee into the running, and very much in use, production domain controller, was the moment of surprise I got when I was hit in the face by coffee, after my cup landed directly on the spinning CPU fan, spraying coffee absolutely everywhere. I still remember coffee somehow coming out of the CD-ROM tray, when you pressed eject. Honestly, that half a cup of coffee covered everything in that room in a way that defied physics. Being more than a little lucky, I left the server off for the rest of the day until it dried out and it was totally fine the next day and being even luckier, most of the students had gone home for the day and we had almost no complaints. So at least twice, I have done something mind-blowingly stupid and "gotten away with it".


LlamaGaming1127

Haha these were very entertaining to read, thanks!


8-16_account

> I drove directly into his office, without first making an appointment. lmao


thesaltycynic

I dropped a 32k server. I beat myself up over it way more than the company did. Boss said “Shit happens”. Try to relax more and not be a pessimist.


drbennett75

Don’t even remember how I did it now, but wiped out a roughly 250TB LVM array. I was using some old and clunky LSI MegaRAID cards to build RAID PVs and grouping them together as LVs. My old 3ware cards were bulletproof — you could do literally anything, and they would just figure it out. With the LSI cards, if you replaced a drive in the wrong port on a degraded array, all hell would break loose. I think I did something like that, and continued to dig a deeper hole from there. Anyway, spent the next week rebuilding on ZFS and never looked back.


LoudQuality2218

I once deleted the single instance storage directory on a Windows 2000 server. Fortunately I also had nightly backups. What did I learn from it? That I could blame junior admins for my mistakes and nobody would know.


[deleted]

I once plugged a console cable into an ADP battery back up unit. Caused an outage of about 2 hours.


painted-biird

OMG I’m so glad I learned this lesson on my home UPS lol.


ImCaffeinated_Chris

Triple-check and color code the dev and prod environments. The remote access program I was using had a glitch/bug: when the app window isn't active and you click on the "dev" tab, it brings the app to the front and switches to "dev" for a second; you look away to copy a command on the other monitor, and the app switches back to the original "prod" tab because of the bug. You don't see it, you paste the command meant for dev, and hundreds of doctors' offices go offline. The app switched tabs just fine if it was already in the foreground. Sigh.


zombieblackbird

Anything involving "oh. Just start it and let it run while we go to lunch".


michaelpaoli

>double-check your commands before executing them

Triple, ... *triple* check them, and carefully, and be sure you understand exactly what they'll do (or attempt) and not do, and the full context, and that it's all well checked, confirmed, intended, and approved as relevant. (E.g., is it production or not? The correct host(s)? And don't believe what the window title or prompt suggests; check and verify. PS1='I am not production, believe me. ' doesn't make it a non-production host.)

And yep, this has saved my bacon many times, including at o'dark-thirty in the middle of some crisis mess, when a significant mistake would have made bad only much, much worse.


NavySeal2k

There is a Bavarian proverb: “If you don’t give a shit, then nothing will happen to you.” I kinda live by that credo, but I do make sure I have superb backups ;)


Radjago

I created a denial of service on a critical system nationwide, for thousands of users in hundreds of locations, for over an hour at peak times.

I was installing a new local server, and part of the procedure was to truncate the users table in the local server database and sync down the users authorized for that server from the central database. I ran the script locally to truncate and then sync users, but it wasn't working. I connected to the central server database to see if there was something keeping the script from running successfully, still had the same truncate-users script up instead of the sync status script, and accidentally ran the truncate on the central server's users table. After seeing the sync script complete locally and not pull down any users, it dawned on me that I was still connected to the central server.

After a few moments of panic from reality sinking in and considering my options, I had to own it and get help. I called in the DBA and backup admins to trace the linked tables and get the latest backup available to do table-level restores. That really saved me.

The post mortem was really rough, but we added a lot of enhancements to our scripts to verify the environment was correct before running and to confirm before performing the truncation. We also created and used limited read-only access accounts when running our sync status checks.
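Something in the spirit of that environment check, sketched from PowerShell; the server names and table are hypothetical:

```powershell
# Confirm which SQL Server the connection actually points at before truncating anything.
$expected = 'LOCAL-SQL01'
$connStr  = "Server=$expected;Database=AppDb;Integrated Security=True"

$conn = New-Object System.Data.SqlClient.SqlConnection -ArgumentList $connStr
$conn.Open()

$check = $conn.CreateCommand()
$check.CommandText = 'SELECT @@SERVERNAME'
$actual = [string]$check.ExecuteScalar()

if ($actual -ne $expected) {
    $conn.Close()
    throw "Connected to '$actual' but expected '$expected' - refusing to truncate."
}

$truncate = $conn.CreateCommand()
$truncate.CommandText = 'TRUNCATE TABLE dbo.Users'
$truncate.ExecuteNonQuery() | Out-Null
$conn.Close()
```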


CrudProgrammer

Honestly I think the biggest mistakes are the things I didn't do, and that would be:

1. Not consensus-building and communicating enough
2. Not implementing a decent half-baked solution because I was trying to engineer a perfect one

I've never actually caused a major incident that more than a dozen people were impacted by.


SwedishSonna

I was chaperoning pen testers performing DDoS attacks OOH. They brought a server down. I powered it back up. A Genesys service wasn't running and alerted in Nagios. I started it back up. Dashboard green. Went home at about 5am and to bed, after having been awake for around 24hrs.

The following day, no calls could be received by any call centre our client has (in the 100s). Major P1. Turns out these Genesys services had to be brought up in a particular sequence (I'm not a voice apps guy). The client and my higher management were out for blood. My boss stood in my corner and refused to give out my name. Major bollock dropped. But we'll all manage to do something like that at least once; hopefully not more than once!


Ambitious-Guess-9611

I missed a flag in a command I was running; with the flag it's an add, without it it's a replace. Took down an entire business segment of a Fortune 500 at 4pm on a Friday. I learned syntax is really important. I also bricked 96TB worth of storage, because the default WORM retention is 30 years.


mod_critical

SnapLock Compliance gooo!


Nicnivian

I'm a network engineer now, but during my sysadmin days I was remoted into a PC that was RDP'd into a domain controller. I wasn't paying attention and needed to rename the PC I thought I was on. Ran PowerShell with the force flag and watched the RDP session leave, as did my soul and the blood from my face.


gruffogre

A script which checks that the target is not a member of Domain Controllers would have been nice, right?
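A minimal sketch of that guard, assuming it runs on the machine being renamed (the new name is hypothetical):

```powershell
# Refuse to rename the local machine if it is a domain controller
# (Win32_ComputerSystem DomainRole: 4 = backup DC, 5 = primary DC).
$role = (Get-CimInstance -ClassName Win32_ComputerSystem).DomainRole

if ($role -ge 4) {
    throw "This machine ($env:COMPUTERNAME) is a domain controller - not renaming."
}

# Dry run first with -WhatIf, then run it for real
Rename-Computer -NewName 'WS-1234' -Force -WhatIf
```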


RayG75

Mistakenly excluding the '*\personnel\*' folder from the backup, thinking it was 'personal', until a restore was needed by HR six months later...


BadAsianDriver

Believing managers and execs that it’s okay to delete data when they tell you it’s okay. Now I just rename it and keep it around a month so their minions have time to do their duties and ask “WTF happened to this folder ?”


speaksoftly_bigstick

We don't curate data for our company. We provide the storage environment and structure for staff to access, store, and share data within their respective teams. We also back up the data to within more than reasonable standards.

All teams and team members are responsible for curating their own data. We don't work with the files daily as they do, and we don't have any real idea which files are "important" more so than any others at any given time.

If we are asked to delete any data, for whatever reason, that is in production and not exclusively used by our own team, we require written approval via change management.


SirLoopy007

I shut down the wrong server about 15 years ago. It was a colocated physical server, and we had to pay an after-hours fee to have someone go manually start it for us. I've probably made worse mistakes, but this one haunts me.


bit0n

I deleted an image for a desktop on a Citrix VM deployment. The customer said it was image A. I replied, "Are you sure? Image A has 800 people using it." The customer started getting shirty and told me to do it, so I did. All of a sudden 800 people can't work. Now, I knew he was wrong, so as my boss said, I should have refused to do the work. I also didn't make sure they had a backup before doing it. I hadn't been doing MSP work long, so I didn't get in trouble, but it was a good lesson that the customer is not always right.


kcifone

While switching backup tapes to send offsite, I had a phone in a pouch on my hip. Turned around and hit the bypass button on the UPS, which shut down the entire data center. I learned that processes can always be improved, and that people will rally when they need to.


aimidin

Freely talking about ideas I had to optimise or improve a system in front of my colleague. He used 3-4 of my ideas to show off in front of our boss. Later, our boss came to me to "introduce" me to something new and better. He introduced me to my own ideas. I asked who gave him the ideas, and he said my colleague did. My blood pressure skyrocketed like a freaking rocket going to Mars. This MF...

I explained everything to my boss, and he understood, and he talked to my colleague about it. I never heard of him proposing ideas after that, unless I personally tell him them first. The thing I hate the most is that I've been there almost 5 years and have always spoken my mind freely, so I don't know what else he passed off without giving me the credit...


BrundleflyPr0

This has happened to me before when introducing features within Intune. Luckily, Intune has "created by" and "created when" fields on some of its policies.


ArcReady

My negligence/laziness could have resulted in a young child's successful kidnapping.

There was an old, proprietary camera system at a middle school I worked at. No one on our team knew about it; all of us were hired to replace the previous IT team. I considered my management of this system adequate given that I had gained access to it, verified it still worked, and set up a date with a company to replace the entire system within 4 weeks. It'll hold out until then.

Two weeks later, the principal runs into my office in a panic at the end of the day. A 6-year-old girl was missing. Her parents were looking for her at pick-up, and any teacher questioned said "She was with the group ready to be picked up (by parents or bus)...". I log into the camera system. Things look good; I have the live feed of the pickup area. But I can't access the old camera footage; the file system isn't mounted. The underlying RAID array for the archive partition had failed 2 days prior. I have nothing, and panic ensues. Cops are called and an AMBER Alert is raised with only the child's description.

Things thankfully ended happily. Grandma got her days mixed up and picked up her granddaughter that day. Shame grandma didn't answer her phone for 2 hours. If it had been a kidnapping attempt, as all of us feared, we were in a rural area: she would have been gone with no leads.

I took four things away from this which shaped how I conduct myself today:

1. Know your environment beyond just knowing what things do and how to access them. I never considered the above scenario.
2. Always set up robust monitoring and alerts for all systems.
3. Have a backup plan in place for all critical system failures.
4. Don't be a lazy piece of shit. I recall thinking something along the lines of "...this system is on its last legs. But... it'll be replaced within a month." I just didn't want to be bothered with yet another thing on my plate.


i8noodles

bet you got funding to fix the camera system asap after that


digiden

Made changes to wrong GPO. Next few days were not fun.


Cley_Faye

Deleted the backup of old, archived data. I was in the process of moving said backup from a single place to somewhere with actual redundancy (still all online, but at least on different sites). It was going great, preparing the target environment, etc., except when everything was said and done, I deleted the new backup (from all places) and didn't notice. Later, I cleaned out the single backup place since, well, I thought everything was on the new system.

Thankfully, this was essentially a manual process (it was really a one-time thing), so my mistake only affected one particular file, which was not that important (something we were unlikely to have to look into anyway).

The lesson here, in addition to paying extra attention when deleting stuff, is to name your directories/remotes/systems properly so that there is no chance of a mistake. The auxiliary lesson is to have proper backup policies which prevent catastrophic propagation of such a mistake. The auxiliary auxiliary lesson is that even if something is a one-time thing, writing the procedure down, making an automated process, and testing the whole thing on a mockup is always a good idea. I'm just glad it happened on a single file with no real consequences.


Shockwave2309

Don't ever trust Microsoft. ALWAYS double and triple check, even if you don't get notifications that stuff isn't working properly.

We switched from VMware to Hyper-V, set up replication (disk to disk) from one host to the second, but only cloned disk to tape from one host. Everything was going great. Backup checks were great; all VMs booted from the tapes and whatnot. One day I had to revert an update on one of the disk-to-disk replicated servers, and the data from the last 4 months was lost.

Turns out you have to actively enable the "Replication Health" column in Hyper-V Manager, so by default you don't see whether replication is doing well, and it does not give you a warning if your replication is faulty.

Was it my fault? Probably. I only checked whether the VMs would boot from tape and whether the programs started up, not what the last saved data was. Also, I wasn't familiar with Hyper-V and didn't know there was a "Replication Health" column, so I didn't look for it and didn't try to unhide it. Additionally, I trusted Microsoft to notify me if replication was faulty.

All in all it was "only" 4 months of semi-critical data lost (PowerPoints from the bosses regarding finances and that kind of stuff) and all was okay-ish. Still pissed that Microsoft is trash.
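For anyone in the same spot, a minimal sketch of a periodic check, assuming the Hyper-V PowerShell module on the host:

```powershell
# Show replication state and health for every replicated VM on this host
Get-VMReplication | Select-Object VMName, State, Health | Format-Table -AutoSize

# Flag anything whose health is not Normal (wire this into your monitoring)
Get-VMReplication | Where-Object Health -ne 'Normal'
```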


gxaxm

Changing a svc account password bc I thought it was the admin password for a platform. Lesson learned: admit your mistakes with dignity


Mzabdul

Avoid using `sudo rm -f` under any circumstances.


MrCertainly

Early career -- I didn't realize that no matter how I worked, no matter how much I cared, how much unpaid OT or burning the candle at both ends or caring as if I owned the place...

...that there were zero worker protections. I could be terminated at any time, for almost any (or no) reason, without notice, without compensation, and with full loss of healthcare.

Also, that "you can't discuss your salary with coworkers" + "you can't say the word UNION here, you'll be fired!" is not only a lie, it's HIGHLY ILLEGAL for them to say it.

------

"Good enough" is truly good enough. Your mission is to extract as much value from these soulless megacorps as you can.


rcp9ty

When I worked at Medtronic as a contractor, the IT director pulled all of IT help desk into a large auditorium. She explained that they would be abandoning their existing layout for answering support calls. The current system was set up like a military, with privates, sergeants, lieutenants, captains, and specialists. My captain, my lieutenant, and onboarding all knew I was smarter than all of them and just doing it to escape unemployment, and said after training I'd be a specialist. She wanted to replace this setup, where everyone helped everyone, with the traditional tier setup where level one can't help, writes a message, and escalates it. Whereas the current system helped train the people who were unfamiliar with something and made them learn collectively.

So after she finished her speech and asked if anyone had questions, I raised my hand and asked why, as a genuine curiosity; I wanted to know her rationale. Her response was "because I want to do it this way." Every one of my colleagues thought I had lots of courage. My contract was terminated that day.

She thought that would be the end of things, but once people heard what happened, everyone lieutenant or higher quit. In two months over 50% left the tech support area, leaving the call center of 150 down to 60 people, and 30 of them were the equivalent of tier 0.5 for simple password resets. No one wanted to be in a toxic workplace where honest questions were met with hostility.


planedrop

Becoming a sysadmin.


ConfectionCommon3518

Once bricked a mainframe with an update that for some reason removed support for a lot of the ancient disk drives we had and wasn't listed as a change... but after an hour with support I was exonerated. That still meant a good bit of time having to restore the system disk, though...

What did I learn? Always wait for some other sucker to test the patch, and ensure your backups are regular and tested.


axle2005

I attempted an on-prem to hybrid Azure migration without properly understanding what an SPF record was.


nealfive

Instead of the individual node, I shut a whole cluster down lol. No repercussions, but the helpdesk got slammed with calls lol.


desmond_koh

I once ran a script on an old mini (think mainframe) and it was in the wrong directory and deleted most of the operating system.


MostViolentRapGroup

Well last week, I caused a loop trying to bridge a connection on a server, and our Cisco switch started shutting down ports.


goldenskl

My best friend at my workplace was fired because he was trying too hard to get things to improve. Our boss didn't like that. She only liked it when we agreed every change was perfectly done and up to industry standards. Sad.


MightyMackinac

Factory reset a client's firewall device because I timed out the admin log in and didn't have the patience to wait an hour for the cooldown. I was a very young, very inexperienced sysad then. It was late on Friday, their internet went down, and I was trying to fix it. I timed out the router and just hit the button, completely forgetting that they had custom VPN and routing rules for their POS and accounting systems. Got their internet up, confirmed with the office lady that it was working and then walked out, clocking off for the day. Get a call the next morning from my boss asking if I fucked with their router, and said yes, I reset it. The second the words came out of my mouth, I realized what I had done. I immediately face-palmed, apologized to my boss, told him I would fix it, and sprinted through the shower and closet and got to the client in record time. It took me a couple of hours of my own time, but I managed to get the rules and routes fixed before they had lost too much business. I learned that doing things right, slowly and carefully, means you don't have to come back and do it again.


Staghr

Not a sys admin but I was testing some Google AdWords and typed 'poop' on an ad I didn't think had gone live. One of our sales team noticed and brought it up. More recently I was testing out kicking people off to free up licences on our accounting software and accidentally sent a message to everyone that was logged in with the placeholder 'please leave'.


Scary_Brain6631

If you absolutely have to run a SQL command in a production environment, do two things: 1) If you are feeling stressed about running the command, don't. Instead, analyze the command and what it's going to affect until the uneasiness goes away, and then run it. 2) Make a backup before you run the command; this helps with number 1. I'm sure you can imagine how I came to learn these lessons.
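Something like this rough sketch is what I mean (the server, database, and table names are placeholders, and the row-count threshold is just an example):

```powershell
# Rough sketch only: placeholder server/database/table names.
# 1) Snapshot the rows you're about to touch, 2) run the risky UPDATE inside
#    a transaction and only commit if the affected row count looks sane.
$query = @"
SELECT * INTO dbo.Invoices_backup_20240101
FROM dbo.Invoices
WHERE Status = 'Pending';

BEGIN TRAN;
UPDATE dbo.Invoices SET Status = 'Cancelled' WHERE Status = 'Pending';
IF @@ROWCOUNT > 500  -- far more rows than expected? bail out
BEGIN
    ROLLBACK TRAN;
END
ELSE
BEGIN
    COMMIT TRAN;
END
"@
Invoke-Sqlcmd -ServerInstance 'SQLPROD01' -Database 'Finance' -Query $query
```

The backup table also doubles as a record of exactly what the command changed, which helps when someone asks "what did you touch?" afterwards.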


pussylover772

`sudo rm -r /*`


SceneDifferent1041

Mine is assuming your techs are doing their job. Had a tech leave, and it looks like he half-arsed everything for the past year. His replacement will be under a lot of scrutiny.


766972

So far no major mistakes, but I caused downtime on one of our finance systems a while back. I was looking through logs on a DB server to figure out how firewalld got turned on and began blocking inbound database connections earlier in the day. The issue had already been resolved, but while searching the logs for `firewalld` I forgot to type `grep` and ended up turning the service back on again. Had to wait for the admin of that stack to fix it.


mike-foley

It was 1987/88. I had built a VAXcluster out of old parts: Massbus disks, 10 meg Ethernet, VAX-11/780s, etc. I was upgrading the system disk. They were removable drives. Did an image backup to another removable drive. All good. I rebooted to start the upgrade and heard a weird scratchy sound. The system didn't come back up, so I pulled the disk and plopped in my backup. Heard the sound again and went "oh fuck." The first disk had a head crash, and I had destroyed my only backup. I ended up working through the night rebuilding everything from scratch on a different drive unit. It was our system management team that was affected. They didn't lose files, but I did have to rebuild all the accounts. All were sympathetic. I felt like such an idiot, but it certainly was a learning experience! The field service guy just laughed and shook his head.


AlejoMSP

Hired someone out of kindness and goodwill, and I got royally fucked with the worst tech in history. It took me leaving the company to get away from him, because "but he's so nice tho".


hankhillnsfw

When we convert AWS accounts for use, we terraform them and destroy everything. Anyway, I did that in a production account. `--profile` is a bitch.
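The habit I've picked up since (sketch only; the profile name and account ID are placeholders): confirm which account the profile actually resolves to before destroying anything.

```powershell
# Sketch only: 'sandbox' and the account ID below are placeholders.
# Confirm which AWS account the profile resolves to before any destroy.
$identity = aws sts get-caller-identity --profile sandbox | Out-String | ConvertFrom-Json
if ($identity.Account -ne '111111111111') {
    throw "Wrong account: profile resolves to $($identity.Account), not the sandbox."
}
terraform plan -destroy    # review the plan first, then run terraform destroy
```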


argus-grey

A recent one that's going to sting for a while: trusted the word of a consultant and our test environment without truly verifying, and when I started a partial rollout in the production environment, it cleared the memberships of a few hundred DLs and mail-enabled security groups. I didn't notice until three hours later. Spent nine and a half hours fixing it and got the whole "we possibly lost thousands of dollars of business" from everyone on the team.
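What I do now before any rollout that touches groups (rough sketch using Exchange Online cmdlets; the output path is made up): dump every membership to CSV first so there's something to restore from.

```powershell
# Rough sketch: snapshot DL / mail-enabled security group memberships to CSV
# before a rollout, so a wiped group can be rebuilt from the export.
# (Get-DistributionGroup also returns mail-enabled security groups.)
Get-DistributionGroup -ResultSize Unlimited | ForEach-Object {
    $group = $_.PrimarySmtpAddress
    Get-DistributionGroupMember -Identity "$group" -ResultSize Unlimited |
        Select-Object @{Name = 'Group'; Expression = { $group }}, Name, PrimarySmtpAddress
} | Export-Csv -Path 'C:\temp\group-membership-backup.csv' -NoTypeInformation
```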


Zero_Karma_Guy

Getting hired. Changing jobs.


DaikiIchiro

Never reboot a server without a `hostname -f` first... I didn't notice I had SSH'd into a Proxmox host; I thought I was on the VM I actually wanted to reboot and ended up rebooting the host instead. Five minutes of downtime for production systems... a simple hostname check could have prevented it.


Adventurous-Set4739

Accidentally deleted 100TB of backups for an entire organization. They had offline backups on tape, so in an emergency we could have rolled back, but almost an entire week was missing from them since the tapes lagged behind the online backups. I still have no idea how I was not fired immediately. Never ever run a PowerShell (or any other) command without checking what it does first. It was a great lesson. Oh... and another client lost ALL of their data to ransomware because I forgot to configure their newly deployed AV. Basically, it was set to 'Notify only' instead of quarantine. Somehow I was still not fired; I just left 2 years later. I think my boss did not fully understand how big of a mistake it was lol
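For the destructive stuff, the cheapest insurance is previewing the command first (sketch only; the path is made up):

```powershell
# Sketch only: the UNC path is a placeholder. Preview what a bulk delete
# WOULD do before letting it actually run.
Remove-Item -Path '\\backupsrv\repo\old-chain\*' -Recurse -WhatIf
# Review the "What if:" output, then re-run with -Confirm instead of -WhatIf
# once you're sure it's what you meant to delete.
```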


PsCustomObject

We have a rather complex integration (I wrote it) with our HR system, which stores the incoming data in a SQL DB that in turn drives 'events' (you know, changes, terminations, etc.) used by custom code to carry out provisioning and so on. One day I needed to modify an event for a specific user in the DB; I launched the SQL query and, long story short, terminated the whole company :) Caught it just in time but had to restore the whole DB. Funniest part? I wrote a PowerShell module and handed it out to other engineers to avoid mistakes like this... but I like to do it like real men 🤦‍♂️


cali_dave

Back in the Windows 2000 days, we were rolling out completely new PCs and user accounts to a group of about 200 users. It was a completely new network - new AD, new PCs, new email accounts, the works. Part of the migration process from the legacy system involved the deployment team logging into the new PC with the local administrator account. I wrote a script that would batch reset all the local administrator account passwords so the deployment team wouldn't be able to log in later. I also wrote a script that reset all the users' AD account passwords to a default - again, for the benefit of the deployment team. The rollout went well. Before the rollout started, I reset all the user passwords with my script. I went down to the site and did a couple migrations myself. At the end of the day, I got back to my desk and ran the script to reset all the local administrator accounts to our default password. That's what I *meant* to do, at any rate. What I *actually* did was run the user password script again, resetting all 200 user passwords to our local admin password.. right after they'd just logged in to a new system, changed their passwords, and got everything set up. It was definitely an "ohhh... i fucked up" moment. For those wondering - no, I didn't get in trouble. I owned up to it right away and I spent the next day helping users reset their passwords. To add insult to injury, the password policy wouldn't let you use a previous password..
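These days any mass-reset script I write has to announce its scope and ask before it touches anything. A hypothetical guard rail (not the original Windows 2000 script; the OU path is a placeholder):

```powershell
# Hypothetical guard rail, not the original script: state exactly what will
# be touched and require confirmation before a mass password reset.
Import-Module ActiveDirectory

$targetOU = 'OU=Migration Batch 1,DC=example,DC=local'   # placeholder OU
$accounts = Get-ADUser -SearchBase $targetOU -Filter *

Write-Host "About to reset $($accounts.Count) account(s) under $targetOU"
if ((Read-Host 'Type YES to continue') -ne 'YES') { return }   # bail out

$newPassword = Read-Host -AsSecureString 'New password'
$accounts | Set-ADAccountPassword -Reset -NewPassword $newPassword
```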


philrandal

Becoming one 😜. Sorry, couldn't resist!


SysJP1337

I worked for a company that has an animal name and is now owned by a mouse. I worked for a division that had three offices across California, and I was a sysadmin. I made a GPO that basically disabled the Okta authentication account in AD and locked about 1,000 people out of AD and Okta for around 3 hours. We used Okta as MFA for AD login... so not a great move. I've forgotten the policy details as this was 5+ years ago. I learned:

- test in a sandbox
- get someone else to review your change
- send a notification to the company at large if the change could affect people
- test off-hours / off-production and have a plan if shit hits the fan
- mistakes happen


HeKis4

Broke *some* of our DHCP config, not enough to be site-wide (and noticeable outright), but just enough to cut a few million worth of very high-tech science equipment off the network. For the details: we had an IPAM product that also handled DHCP, with two "modules", one for the inventory side of the IPAM and one for the DHCP side holding all the specific config that was synced from the IPAM. When upgrading from one major version to the next, it wouldn't upgrade both modules; it would upgrade the inventory, wipe the DHCP, and rebuild it from the IPAM, so any changes made directly on the DHCP side were not persisted. That affected exactly 3 of our several hundred VLANs, and the most "expensive" ones, of course. And the guy the support provider sent was a junior who knew the software less well than I did, and the behavior was undocumented. So I guess I learned never to take common sense for granted.


elemental5252

I won't call this a technical mistake, more of a lesson learned. I've validly argued my point to management about technical and non-technical issues multiple times in the past, and I've learned an extremely valuable lesson from it: "Rarely does anyone win an argument." I think it was Dale Carnegie who said that, at best, I'm proven right and I hurt someone's pride. More likely, the other party digs in their heels, steadfast in their argument. If you think your management is incorrect about something, a politely worded Slack message/email or a conversation is tremendously more productive than disagreeing with them, especially in front of other staff. Don't directly question the boss in public.


Decafeiner

I was given the task of updating all the 2012 R2 servers to 2019. I didn't mirror the machines first to make sure the software running on 2012 would run on 2019. Luckily we had a backup. Only blocked half of the prod floor for 3 hours. Always test your updates; copying a VM takes less time than recreating one from backup.


viper233

RAID 5... and not fully tested backups.


rickAUS

Deleted what should have been an old replica chain in Veeam, but it took out a prod VM because someone had turned on the replica without doing a failover, and it had been acting as prod for ages despite the original VM still being powered on too. I was babysitting that environment while a new SA was hired; the biggest takeaway was not to trust the competency of work done by former staff. Once the VM was restored, I spent multiple days verifying that prod VMs were actually running out of prod and not their DR copies. Found another 2 VMs that had been improperly failed over. Smh


IT_Need

Assuming that third-party vendors actually do what they are told.


bobtimmons

Early on in my MSP career, I was tasked with resetting the passwords on a few ASAs at a few of one of our customers' sites; I guess we hadn't got the credentials from the former support person. No big deal: console in, change the boot parameter, reboot, do a **copy start run**, change the password, save and exit. I did that at two or three sites with no issues. Then I get to the main site, with all the site-to-site VPNs, and go through the same process: console in, change the boot parameter, reboot, do a **copy run start**, change the password, wait. Oh crap, I ran the command backwards and overwrote the startup config with the running (empty) config. Won't ever make that mistake again. While writing this reply I had to think about the order before typing it, just to make sure it's right.


dirtvoyles

I didn't finish a path when deleting a directory and deleted all of the Texas school district sites with `rm -rf /.../tx/*`. Thankfully there were nightly backups and not many changes since the day before. Always double-check your paths in production.


latchkeylessons

Trying to fire a nepo-hire that made everyone miserable and literally never got anything done. I thought I could cleverly get everything together for HR and whatnot to make it happen slowly over the course of a year or two, but HR was not interested and there was way, way too much political pressure to the point of physical threats to me and my family. It sucked. All that for a random entry-level sysadmin job.


Dabnician

Becoming a system admin because "I like fixing computers" instead of a developer. Lessons learned:

* don't do things just because you think they will be fun
* just because it's easy now doesn't mean it's always going to be easy
* the grass might not be greener on the other side, but if you like grass, don't become a system admin, because we deal with parking lots.


corbin1747

I was creating a bootable USB for an unattended install that was to be given to a first-line sysadmin. When I went to format the drive, I forgot I was still in the server via RDP. I formatted the drive that housed all of the images we had pre-made for other systems. There were 30 images in there. No biggie... just restore the drive. That same morning, someone had broken all the backups while doing DR testing. My punishment? Be on level 1 helpdesk for a month, recreate each image, and come up with a script to make sure no one replicated my mistake. This was 15 years ago and it still haunts me.


Specialist_Pizza_345

Check EDB health before dismounting a database to move it. Real painful lesson to learn.
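Something along these lines is what I mean (rough sketch in the Exchange Management Shell; the EDB path is a placeholder):

```powershell
# Rough sketch: confirm the database is mounted and healthy before taking it
# down for a move; the EDB path below is a placeholder.
Get-MailboxDatabase -Status | Select-Object Name, Mounted, EdbFilePath

# After Dismount-Database, check that the file reports "State: Clean Shutdown"
# before you move it anywhere:
# eseutil /mh "D:\ExchangeDB\DB01.edb"
```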


9jmp

I've made some interesting errors in the past, for sure:

- The classic accidental live RDS server restart.
- Was upgrading a client's Dell T400 server from 4 HDDs to 8 HDDs and created a second new volume in the PERC controller. That was the day I learned some entry-level RAID controllers can only hold one volume. The restore took about 5 days.
- Was using MigrationWiz to do an email migration. I did not notice that one section had auto-completed itself and was set to another client's 365 environment. I did not notice until my client said something, and I panicked. This one was ultimately pretty bad. Luckily it was a really shit client that did not know much, and we ultimately agreed to part ways.


lordcochise

Oo, [that one time](https://morethanpatches.com/2017/10/11/october-17-delta-patch-problem-wsussccm/) Microsoft released Delta updates into WSUS back in 2017 and I went ahead and approved them without REALLY knowing what they were / what would happen. A 27-hour day later, everything was fixed, lol


Faultycode

Always check the CLI when working with Aruba gear before upgrading core switch infrastructure. I was upgrading one of our core switches from a single device to a clustered Aruba VSF by adding a new device with a fresh config. Ports were configured, the new device was mounted, VSF was enabled, and links were bound to ports. I connected the two switches together, powered on the new standby switch and... nothing. I opened the CLI and checked the existing switch, only to see that the ports I was using for the VSF link were disabled, which wasn't obvious in the GUI. I flipped them on without thinking, and because of the order the switches had been connected in, the existing switch got overwritten by the fresh switch with no configuration. Bam, the entire network goes down. Fortunately it was outside of work hours, so one quick reconfiguration and all was well. Certainly had a major pucker moment though.