Three Sysadmin Rules You Can’t (And Shouldn’t) Break (2024)

Comments on this entry are closed.

  • Felix Frank, July 27, 2010, 2:47 am

    Good one, I like the tone and agree with most of what's said.

    Using the command line is advisable, though one should be aware of whatever advantages GUIs may offer (and if they have any, leverage them!). The real strength of the CLI arises from rule 3: automation is possible, powerful, and makes your tasks safer, since the margin for human error shrinks.

  • Mike Hall, July 27, 2010, 3:31 am

    The “Friday” rule.
    Don’t schedule an outage for the last day of your work week.
    – Use the last day of the work week to do and review preventive tasks that keep you from being contacted during your time off.
    – Most of the time, Murphy’s Law seems to intervene on changes scheduled for the last work day: you end up departing later than originally planned, and you fail to explain the outage issues to your co-workers. That combination usually leads to calls at home (or calls to return to work) on your days off.

  • Prasad Parolekar, July 27, 2010, 4:15 am

    Really nice article.
    I especially like the first line of the 3rd point:
    “Lazy sysadmin is the best sysadmin!!!!!!!!!!”

  • Greg Rickson, July 27, 2010, 4:51 am

    There is ALWAYS a better way of doing something … you just haven’t found it yet!

  • Rich, July 27, 2010, 5:37 am

    Rule #2, right after perform backups: document everything. All your system configs, all your system inter-relationships, all your processes. This, in effect, is an extension of rule 1: back up everything. In the case of disaster recovery, backing up data is only useful if you can recreate the system environment the data was running on. Also, you are backing up another critical resource: you! If you were hit by a bus, could someone pick up where you left off?

  • Phil, July 27, 2010, 5:52 am

    Never change the root shell, unless you have an alternative root account set up!

  • Francisco Fiesta, July 27, 2010, 5:53 am

    I see that’s more or less the approach I’ve been following. One question: how can the backup be validated? With which method, script, or program?

    Thanks.

  • Slavko, July 27, 2010, 6:39 am

    Using the shell goes hand in hand with automating tasks, because shell commands are far easier to automate than mouse clicks 😀

  • Cotamayor, July 27, 2010, 7:48 am

    I agree with Francisco: how do you validate the data? I am new to sysadmin work and still learning, so a good pointer would be greatly appreciated.

  • komradebob, July 27, 2010, 9:03 am

    The simplest way to validate data is to restore from your backup media and compare the result to the existing data. The most basic check is to run ‘sum’ on the files and compare. If the data is more dynamic, run a sum on the files before they get backed up and include that checksum file in the backup. Then restore to a different directory and check it against what was restored.

    To generate a checksum file:

    find . -type f -exec sum '{}' \; > checksumfile

    then back it up.

    Restore it someplace, run the same command (but put the output in a different file!) and diff the two checksum files.

  • carlos, July 27, 2010, 9:37 am

    Well … I think the author did not write “validate the data”, but validate the backup …

    Kind of confusing, ha …

    In my opinion, a sysadmin must validate the media now and then, ideally on a regular schedule.

    For example, you might have a backup of a particular application, typically on tape, or you might hire an online service; it depends on your budget and/or your needs.

    You can restore this backup to another server, now that prices are dropping, or if you upgrade your server, you can keep the “old” server just for this purpose.

    Use the checksum tools wisely (md5sum, cksum, whatever hash fits your needs). That is, make a checksum before the backup and another checksum in the new location, and compare.

    Well, surely there are a lot of techniques to accomplish this.

    At a glance, this is only my “two cents”.
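    The checksum-before/checksum-after workflow described in the last two comments can be sketched as a short POSIX shell script. This is a toy demonstration: the mktemp directories stand in for your real data directory and a scratch restore location.

    ```shell
    #!/bin/sh
    # Toy demonstration of the checksum-before/checksum-after workflow.
    # The mktemp directories stand in for your real data directory and
    # a scratch restore location.
    SRC=$(mktemp -d)
    RESTORE=$(mktemp -d)
    echo "payroll records" > "$SRC/important.txt"   # stand-in for real data

    # 1. Before backing up, record a checksum for every file.
    ( cd "$SRC" && find . -type f -exec md5sum {} \; | sort ) > "$SRC.manifest"

    # 2. Run your backup, then restore to a scratch directory
    #    (simulated here with a plain copy).
    cp -a "$SRC/." "$RESTORE/"

    # 3. Checksum the restored copy the same way and compare.
    ( cd "$RESTORE" && find . -type f -exec md5sum {} \; | sort ) > "$RESTORE.manifest"
    if diff "$SRC.manifest" "$RESTORE.manifest" >/dev/null; then
        echo "backup verified: checksums match"
    else
        echo "MISMATCH: do not trust this backup" >&2
    fi
    ```

    Sorting the manifests makes the diff stable regardless of the order find visits the files in.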

  • wintermoot, July 27, 2010, 2:41 pm

    You did it wrong, 1st rule is RTFM.

  • BA, July 27, 2010, 9:47 pm

    Backups (and disaster recovery plans) are rarely shown the respect they deserve. A failed backup could literally put your company out of business, yet backups are often handled by junior sysadmins with no supervision. I always tell new sysadmins they will never truly know the importance of backups until they have to look a user in the face while telling them their data is gone forever. Gone because you didn’t do your job properly.

  • Hamilton, July 27, 2010, 9:55 pm

    Very nice article; I can say it’s one of the best I’ve read here on thegeekstuff. I’ll take notes and keep them very close to my PC. I’m not an administrator yet, but that’s no reason not to apply these rules to my habits right now, especially rules #2 and #3.

  • Kuric, July 28, 2010, 1:09 am

    Two things … validating the backup media is very important. I know a sysadmin for a large hotel in a well-known tourist area on the east coast who had been running backups for years; after a hurricane hit, he needed to do a restore, only to find out that the tape drive had a bad head and none of the backups had ever had anything written to them …

    Also along the backup line: before making ANY changes to configuration files, back up the current config so that you can revert and start over if the change has an unexpected consequence.

    Very good article …
    -Mike

  • Paranoicster, July 28, 2010, 1:17 pm

  • Francisco Fiesta, July 28, 2010, 2:33 pm

    Thanks a lot to all of you for your answers to my question. Very useful.
    I’m not really a sysadmin, but at home I also have important information to keep safe, and systems I wouldn’t like to reinstall from scratch again and again. So I take backing up seriously, making sure my backups will work when I need them, and I also find it important to delegate as many repetitive tasks as possible to scripts or programs. Trying to become lazy involves learning scripting, more Linux shell, and learning in general. So I suppose laziness is the prize for knowledge.

  • MikeFM, July 29, 2010, 12:06 am

    Redundant backups are important too. Expect that when your system fails, your primary backup device will fail as well; unfortunately, I’ve learned this the hard way. It’s best if your secondary backup is kept off site, so that if something happens that flattens your entire block, you can drop the backup onto a spare machine somewhere and be the hero when the company continues with minimal downtime, instead of being out of business.

    Our servers are mostly virtualized now, so I back up the entire VM as well as making backups at the OS level; if one backup stops functioning as expected for some reason, I have an alternate.

    Employers tend to give admins crap for spending so much time and money on backups, but when the sh*t hits the fan they are much happier that you were prepared.

  • gus3, July 29, 2010, 12:08 am

    Mike Hall has it wrong. The admins’ needs are totally disjoint from the end users’ needs. The end users need a reliable system; the admins need a system that isn’t unreliable. The end users need a system that won’t fail; the admins need to make sure that the system won’t fail, and so need the time to test the system in ways that might make it fail.

    I’ve been an admin, and I know that sometimes the weekend is the best time for an outage. If a system will fail after an update, the weekend gives the most time to recover from the failure.

  • Michael, July 29, 2010, 2:15 am

    Rule #4: Chaos theory (the “butterfly effect”) is for real.
    … or: if it works, don’t change it!

    As sysadmins, we all know that even simple tasks such as unplugging a network printer can lead to an unpredictable series of events that eventually ends in disasters such as email servers not working.

    Even though the two things are not connected in any way, disasters may pop up 😉

  • RedRyder, July 29, 2010, 6:44 am

    One good argument for rule #2: when you have to administer a server a couple of time zones away, and all your network traffic goes through headquarters in the next state over, the difference between the command line and a GUI (XDMCP, VNC, etc.) can be hours of downtime for the customer.

    Also, on the topic of backups, at home at least, I find the best option is to store everything on my networked RAID drives. Every so often I swap out one of the drives for my spare and store it in the fireproof safe.

  • Andres Arenas, July 29, 2010, 6:46 am

    I find the backup strategy one of the most challenging tasks of all. It is not as simple as backing up everything, or you will end up backing up all your users’ music, family and party pictures, and tons of crap. I agree that the most important part of the backup process is to test whether you can effectively restore the data; in the end, that is the purpose of backing up.

    I would recommend checking your plan against the purpose of the backup:
    [1] Disaster recovery
    [2] Archive, or long-term preservation of data.
    The first strategy aims to save the most current data so you can get your systems up and running as soon as possible with minimal data loss. Usually you don’t need old data for this purpose.
    The second strategy is more complicated. Should you preserve all versions of your files? For how long? What data needs to be preserved, and what data can be ignored (for preservation purposes)?

    A final note: especially for archival purposes, it is important to back up in a tool/format that you can still use in the future. Try to use standard tools, and test whether you can still restore old data with your shiny new tape drive or backup software.

  • Go2Doug, July 29, 2010, 7:29 am

    I have to disagree partially with rule #2, “master the command line”.

    “Mastering” the command line implies that one should know by heart nearly all the command line commands and their associated options. Ever seen the book Linux in a Nutshell? There’s no way that somebody could memorize even half the commands in that book. Instead of using the command line for each and every task, I would advise learning it by heart for more common tasks.

    By the way, in the *Nix world shouldn’t you be referring to the “shell prompt”? “Command line” is Windows jargon, isn’t it?

  • Keith Edward Brown, July 29, 2010, 8:34 am

    Never, ever deploy version 1.0, or for that matter any brand spanking new product version that is significant to your daily operations, until it has been service-packed (OS, backup, database, email server, etc.). Let the early adopters toil and suffer. Case in point: an IT services org deployed Exchange 2010 two weeks after its release. Four weeks later there are still ongoing problems, including the pres/CEO not being able to open attachments on emails more recent than 6/22/10. Besides a product that is surely filled with bugs, the services org had only a few weeks of newsgroup postings to draw on while deploying and remediating this new service. And then consider how monumentally poor Backup Exec 12.x was at backing up Exchange 2007. Is there any reason to believe Backup Exec 2010 will be any better?

  • Ron S., July 29, 2010, 9:54 am

    Thou Shalt Not Maketh System Changes on Fridays
    (Unless thou wishest to work weekends)

  • Ken, August 1, 2010, 9:44 pm

    Just because you use the GUI doesn’t mean you aren’t comfortable with the cli.

  • Marco, August 9, 2010, 7:47 am

    I could add:
    * Practice any change on a non-critical environment before trying it on the production environment.

    To the description of Rule #3, I would add that notifications are essential for ensuring the availability of the process.

  • satheesh, August 11, 2010, 6:05 am

    How can we automate regular tasks? Is this done by writing scripts, or by other means? If anyone can explain, it would be very helpful to me and to others who are new to this field and want advice like this.
    Thank you,
    satheesh
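    To satheesh's question: yes, the usual means is a shell script scheduled with cron. Here is a minimal, hypothetical housekeeping script; the directory and the two tasks are placeholders for whatever you actually need to repeat.

    ```shell
    #!/bin/sh
    # Hypothetical daily housekeeping script, saved as e.g.
    # /usr/local/bin/daily-clean.sh. The directory and the two tasks
    # are placeholders for whatever you actually need to repeat.
    LOGDIR=${LOGDIR:-/tmp/demo-logs}
    mkdir -p "$LOGDIR"

    # Task 1: delete temp files older than 7 days (a no-op on the first run).
    find "$LOGDIR" -type f -name '*.tmp' -mtime +7 -delete

    # Task 2: append today's root-filesystem usage to a running history.
    df -P / | tail -1 >> "$LOGDIR/disk-usage.log"
    echo "housekeeping done"
    ```

    Scheduling it is then a single crontab entry, for example "0 2 * * * /usr/local/bin/daily-clean.sh" to run it at 2 a.m. every day; "crontab -e" edits the table.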

  • Doug, December 26, 2010, 6:46 pm

    Restoring from backup should be a last resort. Yes, you should take backups, but recovery scenarios should avoid the need for them whenever possible. Applications and configuration should be deployed in a repeatable fashion, such that you can start from a bare piece of hardware and know exactly how long it will take to rebuild the system exactly as it was before.

  • Ola, February 10, 2011, 3:53 am

    Re the Friday rule: always plan outages before Wednesday. You then have at least 3 days to correct whatever can (and will) happen.

  • skape, February 25, 2011, 11:50 am

    I agree with the rules; the third one is my very nature. I admit I’m lazy! Let the machine do the recurring tasks that are boring, which I would certainly mess up because I’m human.

    But there is a 4th rule that I’ve suffered for because someone else didn’t follow it:

    Document everything! You may be a single point of failure in the system. The people who step in to keep things going will have a hard time figuring everything out.

  • Roy, May 6, 2011, 12:50 pm

    Agree with 1 and 2.
    Automation not only makes you lazy, but it also makes you forget the less-used procedures/commands.

    Also, I would add a 4th one: documentation. A good sysadmin will document everything so that another sysadmin (or management) can understand what the previous admin has done.

  • vikas maske, May 12, 2011, 11:38 pm

    This is really really nice information.

  • Doug Meier, May 19, 2011, 9:20 am

    lazy sys admin is great IF lazy sys admin documents his automated processes and trains junior lazy sys admin to lazily manage them. otherwise lazy sys admin becomes a point of failure for the organization.

  • Erik, June 12, 2011, 2:57 pm

    Hi,

    Yes, and what happened to “only be root when it is absolutely necessary”? Basic, but an important one …

  • srinivas, June 17, 2011, 5:24 am

    Hi Nataraj,

    The rules are very good. I don’t think you have a full article on these three; it would be great if you could provide one, as I am new to the Linux domain.

  • Solaria, June 25, 2011, 9:57 am

    Rule #2: GUIs are great, but what if the GUI isn’t available? As a sysadmin you will be called on to fix the system when major parts have failed; your only interface may be the CLI while you make repairs. The rule is “always have an alternate path into the machine”. If you can’t connect to X Windows, maybe you can SSH or RSH from another server, or use the remote system console (iLO, RSC, etc.) through a remote KVM or serial switch. I’ve had to restart system processes using an SSH client from my cell phone while parked in a supermarket parking lot. The CLI is always available, more reliable, faster, easier to document, etc. Sort of like using the vi editor: other editors may have an easier UI, but vi is always available.

    For Windows, I install Cygwin sshd on each server. Not quite the same as Linux, but much of the admin work can be done from the CLI, and automated.

  • Pat, October 23, 2011, 4:13 pm

    Here are my own rules.
    1.) Always check the logs.
    2.) Google is your friend.
    3.) Think twice before pressing Enter.
    4.) Don’t try to fix things that aren’t broken. (Most of us, if not all, keep tinkering with our servers and eventually end up messing them up.)

  • Hellmut Weber, January 2, 2012, 5:52 pm

    IMHO an equally important rule for the sysadmin is:

    Document every change you make on your system!

    Clearly, for every change made, the reason for the change should be recorded, along with a reference to test results for the change.
    A hint on how to revert the change would probably also be helpful, IF that is possible.

    Best regards

    Hellmut

  • George Reimer, February 17, 2012, 11:49 am

    NEVER, EVER, make a significant change to anything on a Friday afternoon!!

  • Damir, March 19, 2012, 5:50 pm

    I wonder where this prejudice about Windows sysadmins and the CLI comes from.
    Anyway, it’s all a bunch of bullsh*t. Any serious (corporate) Windows sysadmin knows his cmd/PowerShell, not to mention WSH scripting.
    Setting up Windows XP and IIS over a weekend does not make one a Windows sysadmin …

  • Ian, April 12, 2012, 5:07 am

    I am also a new sysadmin now, handling manufacturing servers, though I have not yet fully mastered the whole system. I agree with all of the rules I have heard here, and I will put them into practice. I already write some scripts to automate my common tasks, and I am also planning a backup system. I will take a training course (Storage Management) in the 3rd week of this month, and it will be a big part of my plan to build a reliable backup system. I want to hear more advice here on how to become a lazy sysadmin, with all administrative tasks automated and running in the background.

  • Mike, April 18, 2012, 12:36 pm

    Always “cover your ….”. Remember that all customers lie or leave things out when you’re trying to troubleshoot problems. Test, test and then test again!

  • karthikreddy, May 5, 2012, 12:05 am

    Super.

  • Karl Sandfort, May 10, 2012, 4:43 pm

    Just found your site and I like it very much. You present your knowledge very well. Thank you for sharing it with us.

  • Digambar, June 7, 2012, 5:13 am

    Excellent Rules of Sysadmin.

  • Pradeepsingh, July 20, 2012, 4:34 am

    Really nice article. I will share it on my office notice board.
    Good luck!

  • Anonymous, July 24, 2012, 11:50 pm

    Loved this article!!!

  • Rich Bryant, July 28, 2012, 4:28 pm

    One of my basic rules as a longtime *IX admin is to (almost) never delete a file directly. Instead, I rename it and set an ‘at’ job to delete it some time in the future. This can greatly reduce your need to restore files from backup. It’s not uncommon for me to let a file sit around for a month before it finally gets deleted.
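    That rename-now, delete-later habit can be sketched as a tiny helper. The 30-day window and the ".trash" suffix are arbitrary choices, and the at(1) step is skipped if at isn't installed:

    ```shell
    #!/bin/sh
    # safe_rm: park a file out of the way instead of deleting it, and
    # schedule the real delete for later. The 30-day window and the
    # ".trash" suffix are arbitrary choices.
    safe_rm() {
        f=$1
        trashed="$f.trash.$(date +%Y%m%d%H%M%S)"
        mv -- "$f" "$trashed"
        # Schedule the real deletion only if at(1) is available.
        if command -v at >/dev/null 2>&1; then
            echo "rm -f -- '$trashed'" | at now + 30 days 2>/dev/null
        fi
        echo "parked as $trashed"
    }

    # Demo on a throwaway file.
    demo=$(mktemp)
    safe_rm "$demo"
    ```

    If the "deleted" file turns out to be needed after all, a plain mv brings it back; nothing was ever actually destroyed.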

  • Justin P mathew, September 4, 2012, 11:24 pm

    Thank you,
    A simple and strong article …

  • skonealone, September 7, 2012, 6:19 pm

    The FOURTH and important rule to consider:

    Rule #4: Set up a monitoring system (tuned to thresholds you choose) that sends periodic alerts about system health (disk, memory, swap, directory-level space utilization). It always helps to know the production system’s health, so you can head off any forthcoming disaster.

    – SK (Shekhar Koli)
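    A minimal sketch of such a threshold alert, for disk usage only; the 90% threshold is a placeholder, and in production the output would feed mail(1) or a monitoring system rather than stdout:

    ```shell
    #!/bin/sh
    # Sketch of a disk-usage threshold alert. In production, cron would
    # run this and the output would feed mail(1) or an alerting system.
    disk_alert() {
        # Print a warning for every filesystem whose use% exceeds $1.
        df -P | awk -v t="$1" 'NR > 1 {
            use = $5; sub(/%/, "", use)   # strip the trailing % sign
            if (use + 0 > t)
                printf "WARNING: %s is at %s%% (mounted on %s)\n", $1, use, $6
        }'
    }

    disk_alert 90   # placeholder threshold; tune per system
    ```

    Memory, swap, and per-directory checks follow the same pattern: gather a number, compare it to a threshold, and print only when something is wrong, so cron's output (and therefore its mail) stays silent on healthy days.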

  • Khider Allos, October 12, 2012, 2:58 pm

    #4 Ensure that automated task scripts send email notifications to data center operators/help desk and to SAs’ cell phones when things go wrong.

    Thanks for sharing your knowledge.

    Khider Allos
    Software Engineer

  • ali fraz, January 2, 2013, 11:26 pm

    sincere guidance …… the whole truth….love u

  • Jack in TN, January 8, 2013, 12:17 pm

    Document everything, even if it is in your ‘personal notes’ spiral notebook.
    — It just makes life-changing things more reproducible. It doesn’t have to be ready for ‘publication’ either … it just MUST communicate to the author.

    Get another set of eyes on the problem if it is taking enough time that it is on your critical path.

    Document your ‘upgrade scenario/process’ before doing it. Print it out to use during the ‘fire fight’. Then use it as a guideline, and write down the times when you start/stop each step (so you can see if you are keeping on track). Write notes on the printout about any deviations (so you can explain them to yourself later). These are NOT notes for your boss; they are YOURS.

    I have been through several upgrades over weekends, with sleepless nights. The notes help get through the Tuesday morning ‘post mortem’. (I reserve Mondays for recovering from the disasters that seldom happen, and for getting sleep. Bosses can wait; I am more interested in keeping production going than in reports.)

  • Srinu, January 12, 2013, 1:38 pm

    Very nice. Keep up the good work.

  • Benhankerson, August 1, 2013, 8:58 am

    What sir, are the other 4 rules?

  • Rameez, September 28, 2013, 4:19 am

    I like the last rule the most; very innovative. I expect more Linux-based articles from you. Thank you.

  • chandan, November 29, 2013, 4:00 am

    Hi Ramesh,

    Your stuff is very informative and valuable; I visit the website frequently. Regarding backup: we back up our files monthly to iOmega storage, which is mounted via NFS on the Linux server, doing an incremental backup with the rsync command. We never bothered to verify the backed-up data; after reading this article, I realize it is an important aspect.

    The comments suggest checksums as one method. Since we do incremental backups, can anyone suggest how the backup can be validated?

    Nice article. Thanks for sharing.
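    One possible answer, sketched here with stand-in directories: rsync itself can re-compare source and backup by checksum in a dry run, and a plain diff -r works where rsync isn't available. Empty comparison output means the backup matches the source.

    ```shell
    #!/bin/sh
    # Validate that a backup tree still matches its source. SRC and
    # BACKUP are demo stand-ins; point them at your rsync source and
    # destination (e.g. the NFS mount) in real use.
    SRC=$(mktemp -d)
    BACKUP=$(mktemp -d)
    echo "records" > "$SRC/db.txt"
    cp "$SRC/db.txt" "$BACKUP/db.txt"   # simulate a previous backup run

    if command -v rsync >/dev/null 2>&1; then
        # Dry run (-n), recursive (-r), compare by checksum (-c),
        # itemize differences (-i): anything printed is out of sync.
        rsync -rnic "$SRC/" "$BACKUP/"
    else
        diff -r "$SRC" "$BACKUP"
    fi && echo "backup matches source"
    ```

    Because the dry run reads and checksums every file, it doubles as a check that the backup media itself is still readable, which is exactly the failure mode the hotel story above warns about.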

  • raviteja, February 3, 2014, 9:11 pm

    How do you add a disk/LUN from storage in Linux?

  • Hans, October 31, 2014, 9:19 am

    The “lazy sysadmin” is a nice one. However, with Linux being utterly reliable, it is easy to forget the things you automated. At the very least, have those tasks send you e-mails so you remember which automated tasks you run.

    The most horrible example is this: while performing a kernel upgrade on a server, I built a script with the instruction “if you can’t ping server X, the network did not survive the new kernel, so reboot into the old kernel”. I put that script in cron to be executed every 10 minutes. But of course, after the successful kernel upgrade, I forgot to deactivate the script.

    Three years later, this server suddenly started to reboot every 10 minutes. What the F*? Well, I had decommissioned server X, hence it was no longer pingable. When I finally discovered the culprit, I could not even remember that I had ever written the script.

    Had my script sent me an e-mail, it might have occurred to me sooner that I had some loose ends dangling somewhere.
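    Hans's lesson can be baked into the automation itself: have every temporary check announce its result and refuse to run past a self-imposed expiry date, so a forgotten cron job complains loudly instead of lingering for three years. This is a sketch; the expiry date, the check command, and the notification are placeholders.

    ```shell
    #!/bin/sh
    # Wrapper for a temporary automated check: it announces its result
    # and refuses to run past an expiry date, so a forgotten cron job
    # complains instead of silently lingering.
    # Usage: check_with_expiry <expiry-YYYYMMDD> <command...>
    check_with_expiry() {
        expiry=$1; shift
        if [ "$(date +%Y%m%d)" -gt "$expiry" ]; then
            echo "EXPIRED: remove this cron job (expiry was $expiry)" >&2
            return 2
        fi
        if "$@"; then
            echo "check ok: $*"
        else
            # Placeholder notification; a real script might call mail(1).
            echo "CHECK FAILED: $* (notify the admin here)" >&2
            return 1
        fi
    }

    check_with_expiry 99991231 true   # demo: a check that passes
    ```

    In Hans's scenario, the ping check would be the wrapped command, and once the post-upgrade window passed, every cron run would produce an EXPIRED complaint in the mail instead of an unexplained reboot.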

  • Brian, January 2, 2015, 6:01 pm

    #4 – NEVER give out Admin level passwords to anyone not directly responsible for the environment.
    #5 – Avoid any “shared user” account creation like the plague.
    #6 – If they can’t spell “sudo”, they don’t get Admin privileges.

  • Vipul Jain, May 13, 2015, 3:59 am

    Very nice article ! Lessons for budding system admins like me ! Thanks for sharing.

  • Joel Andrews, June 26, 2015, 2:51 pm

    Those are great axioms to live your life as a sysadmin by. I would, however, like to add one for thought, one that has served me well over the years: “always have a plan”. I know it may sound somewhat basic, and in a way it is, but it is one of those basic tenets that will save your proverbial derrière.

  • Farrell O’Connor, September 30, 2015, 3:10 pm

    Aside from the 3 rules given, 2 more important ones. First, don’t walk around with a round in the chamber: become root only when necessary! Second, I have seen sysadmins whose system backups are bombproof, but they may still need to consult the application users (turf wars) about what data needs to be backed up. You can rebuild an OS and you can re-install applications, but if you lose data, you could well be unemployed!

  • Lord Rybec, February 3, 2017, 1:39 pm

    Someone suggested that automation is bad, because then you forget rarely used commands. This is wrong on several levels. First, if you are only automating things you do frequently (it is a waste of time to automate things you do rarely), your automation will never contain rarely used commands. Second, you will forget rarely used commands anyway, because you rarely use them!

  • Abuchu, March 1, 2017, 1:38 pm

    I really like your page. I am a beginner network admin at Insa Ethiopia. I need your help to learn more in this field of study. Thank you guys a lot.
