Usenix 2008 - Automated System Management, by Æleen Frisch of Exponential Consulting (and author of numerous books)

What is automation?

Generic Perl or shell scripts, run from cron or at.

Problem: overlap of effort

So folks developed automation systems. General-purpose automation tools include:

cfengine, Puppet, Bcfg2

These are general-purpose: they manage files, directories, permissions and so on, so you don’t need to call chmod, chown and the other underlying commands yourself.
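To make the “you don’t call chmod yourself” point concrete, here is a minimal Python sketch of the idempotent, desired-state style these tools use: you declare what a file should look like, and the code only acts when reality differs. The function names `ensure_file` and `ensure_mode` are hypothetical illustrations, not the API of cfengine, Puppet or Bcfg2, and ownership (chown) is left out because it needs root.

```python
import os
import stat
import tempfile


def ensure_mode(path: str, mode: int) -> bool:
    """Bring path's permission bits to `mode`; return True only if a change was made."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == mode:
        return False  # already in the desired state: do nothing
    os.chmod(path, mode)
    return True


def ensure_file(path: str, content: str, mode: int) -> bool:
    """Create or rewrite path so it holds `content` with `mode`; return True if anything changed."""
    changed = False
    try:
        with open(path) as f:
            existing = f.read()
    except FileNotFoundError:
        existing = None
    if existing != content:
        with open(path, "w") as f:
            f.write(content)
        changed = True
    if ensure_mode(path, mode):
        changed = True
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        motd = os.path.join(d, "motd")
        print(ensure_file(motd, "Authorized use only\n", 0o644))  # first run creates the file
        print(ensure_file(motd, "Authorized use only\n", 0o644))  # second run finds nothing to do
```

The design choice that matters is the return value: running the same declaration twice is harmless, and the second run reports that nothing changed, which is what lets these tools run repeatedly from cron without piling up side effects.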

However, they don’t really survive reboots well. For that, we tend to use install-time tools such as JumpStart and Kickstart.

Monitoring: Nagios, plus related RRDtool-based graphing tools such as Cacti, Cricket and Munin, “or any of 8,000 others.” These automate the kind of performance checking you would otherwise do by hand with tools like iostat.
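As an illustration of how a Nagios check turns an iostat-style number into an alert, here is a minimal sketch in the Nagios plugin style: exit code 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN, one line of status text, and performance data after a `|`. The `check_util` name and the 80%/95% thresholds are assumptions for illustration; a real plugin would take its sample from iostat or /proc/diskstats rather than from an argument.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_util(util_pct, warn=80.0, crit=95.0):
    """Classify a disk-utilization reading (percent) against warning/critical thresholds.

    Returns (exit_code, status_line). The status line carries perfdata in the
    form label=value;warn;crit;min;max so graphing tools can pick it up.
    Thresholds are hypothetical defaults, not values from the talk.
    """
    if util_pct is None:
        return UNKNOWN, "UNKNOWN - no utilization sample available"
    if util_pct >= crit:
        code = CRITICAL
    elif util_pct >= warn:
        code = WARNING
    else:
        code = OK
    line = (f"{LABELS[code]} - disk util {util_pct:.1f}% "
            f"| util={util_pct:.1f}%;{warn:g};{crit:g};0;100")
    return code, line
```

A wrapper script would print the returned line and exit with the returned code; Nagios decides the alert state from the exit code alone, while tools in the RRDtool family can graph the perfdata.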

Nessus is a security testing tool.

So the categories are homegrown, general, performance-related, and automated backups (Bacula, Amanda, Legato).

What do you want automated?

“Coffee machines”.

A lot of the unsolved problems involve human interaction.

Other problems have been solved, for example with remote power management.

Inventory management is another issue. HP OpenView is one option, but Frisch says folks are not happy with it. You can also pay for high-end monitoring systems.

A question came up about an inventory of users on systems. LDAP, NIS or Active Directory is the traditional solution, with no local accounts. There’s authentication and then authorization, and the automated tools usually have authentication information but not authorization information. (You can handle it, but making groups in these tools is usually painful.) Authorization is usually handled either locally or as “if you’re authenticated, you’re authorized.”

We talked about how to power down 500 machines when the air conditioning goes out or the power is going down. Combinations of temperature probes, Wake-on-LAN, and remote power on and off were discussed.
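For the “bring them back up afterwards” half of that discussion, Wake-on-LAN is simple enough to sketch. A magic packet is 6 bytes of 0xFF followed by the target’s 48-bit MAC address repeated 16 times (102 bytes in all), usually sent as a UDP broadcast to port 9. The function names below are hypothetical, the broadcast address is an assumption you would adjust per subnet, and the target NIC must have Wake-on-LAN enabled in firmware for any of this to work.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    if len(mac_bytes) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (port 9 is the common choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))
```

Looping `wake()` over a list of MAC addresses once the temperature probes read normal again gives you a crude staged power-on; powering down is the job of the remote power management side.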

What do people use to automate installs and configuration on Windows? For installation, the Windows native tools are great. It was noted that efs works better on Windows.

Anyone using Splunk with Windows? One answer: it works OK, and there are some daemon tools that convert the Windows Event Log to syslog.
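The core of such an Event Log to syslog converter is small, so here is a rough Python sketch of just the formatting step, assuming classic BSD syslog (RFC 3164) output. The PRI value is facility × 8 + severity; mapping Error, Warning and Information to syslog severities 3, 4 and 6, using facility local0, and the parameter names are all assumptions for illustration, not how any particular daemon tool does it. Reading the Event Log itself and sending the line over UDP 514 are left out.

```python
from datetime import datetime

# Assumed mapping from Windows event levels to syslog severities.
SEVERITY = {"Error": 3, "Warning": 4, "Information": 6}
LOCAL0 = 16  # syslog facility code for local0
MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()


def to_syslog(level, source, event_id, message, host, when, facility=LOCAL0):
    """Format one event as an RFC 3164-style syslog line.

    `level`, `source` and `event_id` are hypothetical field names for whatever
    the Event Log reader provides; unknown levels fall back to notice (5).
    """
    pri = facility * 8 + SEVERITY.get(level, 5)
    # RFC 3164 timestamp: "Mmm dd hh:mm:ss", day padded with a space.
    # Month names come from a fixed list so the output does not depend on locale.
    stamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    # The TAG field must be alphanumeric and at most 32 characters.
    tag = "".join(ch for ch in source if ch.isalnum())[:32]
    return f"<{pri}>{stamp} {host} {tag}[{event_id}]: {message}"


if __name__ == "__main__":
    print(to_syslog("Error", "Service Control Manager", 7034,
                    "The service terminated unexpectedly.",
                    "winbox01", datetime(2008, 6, 26, 14, 3, 9)))
```

With local0 (16) and Error (3), the PRI works out to 16 × 8 + 3 = 131, so the example line starts with `<131>Jun 26 14:03:09 winbox01 ServiceControlManager[7034]:`.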

Splunk came up as a topic of discussion: it’s great log management software, and it solves a problem we’ve had for decades, namely how to deal with logs. Frisch says, “Splunk is the most promising thing out there.”

Time tracking and time management came up too. That’s basically what we do at Pythian, so I explained how we do things. Other folks brought up ticketing systems: Jira, RT (Request Tracker) and OTRS (Open Ticket Request System).

For change management, some folks use ClearCase (not open source), others use RANCID, and others use Trac or Bugzilla plus a version control system such as Subversion. Jira was recommended as a product that does both (with an add-on).

Use DHCP to help automate IP address assignment. rsync is your friend too.

(It occurs to me that a dishwasher is an interesting problem: why do we have a separate dishwasher instead of a sink/dishwasher hybrid? Similarly, why not a hamper that does the laundry for you when it’s full?)