December 25, 2010

Day 25 - Wikis and Documentation Part 2

Intro

This article is part two of a two part series and a collaboration between Brandon Burton and Philip J. Hollenback where they explore the problems with Mediawiki, the challenges of writing good documentation with today's tools, improvements that Brandon has implemented, and ideas for further improvements.

In part two, we focus on overcoming Mediawiki's poor defaults by covering existing improvements that Brandon has implemented at work and exploring a number of additional ideas for ways to improve it further.

Things I've implemented at work

As I mentioned, I have implemented a number of improvements to the Mediawiki installation at work. They fall into two categories:

  1. Visual Improvements
  2. Layout/Style Improvements

Visual Improvements

The first thing you'll notice about a default Mediawiki installation is that it is ugly; it's very Web 1.0 in all the worst ways. I chose to improve on this with four things:

  1. A better theme
  2. Tweaking the fonts
  3. Tweaking the URL colors
  4. Finding some better CSS for the tables, as we use tables heavily

Theme

Choosing a visually appealing, but not crazy, theme is very important. I chose the daddio theme because it has a simple color scheme and the shiny blue in the navigation bar adds some pop but doesn't distract. I also liked how it puts the user links at the top as a navigation aid. This particular design choice offers a lot of possibilities if you want to do custom navigation along the top.

Fonts

The default font for the daddio theme is just font-family: sans-serif;, which isn't very pleasant to look at for long periods of time. I improved this with font-family: Verdana;. Of course, a picture is worth a thousand words, so -

Before:

[screenshot: default sans-serif font]

After:

[screenshot: Verdana font]
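
For reference, the change amounts to a line or two of skin CSS. A minimal sketch, with the selector as an assumption (match it to whatever your skin actually uses for body text):

/* switch the body copy from the generic sans-serif to Verdana */
body, #content {
    font-family: Verdana, sans-serif;
}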

URL Colors

I found, and I think most people would agree, that the default URL colors in Mediawiki are pretty much an eyesore, so I chose to do something a little more pleasant.

There are three main kinds of links in Mediawiki and I adjusted the colors for all three. They are:

  1. Visited
  2. Normal
  3. Page Doesn't Exist

By default they have the following colors:

  1. Visited = Purple
  2. Normal = Dark Blue
  3. Page Doesn't Exist = Red

[screenshot: default link colors]

After I finished tweaking them, they were:

  1. Visited = Dark Green
  2. Normal = Light Blue
  3. Page Doesn't Exist = Light Orange

[screenshot: adjusted link colors]
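
Again, this is only a few CSS rules. A rough sketch; the hex values are placeholders chosen for illustration, and a.new is the class Mediawiki applies to links pointing at pages that don't exist yet:

a { color: #6babf2; }                      /* normal links: light blue */
a:visited { color: #1e6b2e; }              /* visited links: dark green */
a.new, a.new:visited { color: #f2a35e; }   /* missing pages: light orange */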

Tables

I think that the kind of documentation we produce as systems administrators, and so the kind we produce a lot of at my work, is best formatted in two ways.

  1. Lists (Bulleted or Numbered)
  2. Tables

I'm a big fan of tables, and having memorized the table syntax for the Mediawiki markup, I use a lot of tables. Unfortunately, the default table style is pretty bad. It didn't take a lot of CSS to fix that though. There isn't much else to say, so I'll show you the difference.

Before:

[screenshot: default table style]

After:

[screenshot: restyled table]
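
To give a rough idea of the kind of rules involved (the class name and colors here are placeholders rather than the exact ones I used):

table.wikitable { border-collapse: collapse; }
table.wikitable th, table.wikitable td {
    border: 1px solid #aaa;
    padding: 0.3em 0.6em;
}
table.wikitable th { background-color: #f0f0f0; }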

In the spirit of sharing, I've put the CSS I used and the change I made to the daddio theme up as a Gist.

Layout/Style Improvements

A default Mediawiki is a wide open thing, and if consistent structure and style are not enforced from the beginning, it becomes a giant pile of chaos.

The two key things we did to prevent this were the creation of:

  1. Templates
  2. Style Guidelines

Templates

One of the keys to effective documentation and increasing people's likelihood of producing good documentation is consistency. And, as any good systems administrator knows, a key to consistency is templates.

Now, I want to stop and clarify: the templates I'm talking about here are not Mediawiki templates, but just pages in Mediawiki with all the markup written out, so each user can simply copy and paste them as they make new pages. I investigated Mediawiki templates, but they serve a slightly different function from what I was aiming for.

For my work, we have three main types of documentation that we made templates for:

  1. Customer Infrastructure
  2. Software Specific
  3. Procedure Specific

What I did was make a dedicated /Templates/ section of our wiki and create a template for each type of documentation. This, coupled with a set of guidelines on how we should document, means that when one of us creates a new page, perhaps for a new customer or a new piece of software we've begun using, we just open the template page, Ctrl-A, Ctrl-C, switch tabs, Ctrl-V, and all the structure is in place.

An example of the template we use for software-specific pages is available as a Gist.
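
To give a flavor of what such a template page looks like, here is a minimal sketch in Mediawiki markup; the section names are invented for illustration and are not the actual template from the Gist:

== Overview ==
What the software does and why we run it.

== Installation ==
# Where the packages come from
# How it gets installed

== Configuration ==
{| class="wikitable"
! Setting !! Location !! Notes
|-
| main config file || /etc/example.conf || owned by configuration management
|}

== Troubleshooting ==
* Common failure modes and how to fix them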

Style Guidelines

Finally, we put together some basic guidelines on how we were going to document things. A lot of these ideas came from the OpenSolaris Guidelines, which say:

Readers project some significance onto every change in language, tone, or typographic convention. A consistent style enables readers to internalize the language and text conventions of the document. As a result, understanding occurs more easily and significant points stand out more clearly. Consistency is one of the most valuable aspects of good style. In addition, consistency improves clarity for translation purposes.

Some of the guidelines we came up with were:

  1. Always start from a template.
  2. Update the templates as new items that should exist in them are discovered.
  3. Lists are used to break out information from the paragraph format and to structure the information into an easier-to-read format. Lists must include at least two items. If a list has more than nine items, try to identify a way to divide the list into two or more lists. Use unnumbered (bulleted) lists when the items are not dependent on the sequence in which you present them. When the items are dependent on sequence, use numbered lists with numerals and letters to build the hierarchy.
  4. The purpose of many technical documents is to explain how to use a product to accomplish specific tasks. In such documents, detailed instructions on how to accomplish tasks are often provided in the form of procedures. Procedures contain an ordered set of steps.

Those are the steps that we've taken so far to improve our Mediawiki installation, a combination of:

  1. Visual Improvements
  2. Layout/Style Improvements

Additional ideas

The improvements we've made are by no means the limit of what can be done with Mediawiki. In fact, I have a number of ideas for further improvements:

Extensions

Some of the potential extensions are:

  1. Dynamic Page List
     • Has a simple query language you can use to generate a list from any other wiki content.
     • Can be used with either a tag or as a parser function.
  2. PDF Export
  3. Syntax highlighting
  4. Make your own
     • Building your own extensions is relatively easy.
     • Extensions can query other databases or data sources and integrate them into your wiki.

TODO Tracker

  • A TODO is a "bread crumb" that you leave which indicates further action, improvements, etc, that should be done for a specific piece of documentation.
  • An example is shown below:

    TODO: Write more detailed instructions on how to document DNS records for a customer.

  • The idea is that you leave "TODO: foo" whenever there is something left to be done. Then we have an external script which crawls the wiki and generates a report of documentation TODOs (a rough sketch of such a script appears below).

  • This allows us to embed the action in its relevant context while still generating a total view of the outstanding actions for assignment, and it should help keep documentation from growing stale.
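
We haven't published that crawler, but a minimal sketch of the idea, using the Mediawiki search API with curl, might look like the following; the wiki URL is a placeholder and your api.php path may differ:

#!/bin/sh
# List wiki pages whose text contains "TODO:" via the search API.
WIKI_API="http://wiki.example.com/api.php"
curl -s "${WIKI_API}?action=query&list=search&srsearch=TODO:&srlimit=50&format=xml" |
  grep -o 'title="[^"]*"' | sort -u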

Categories

Use Mediawiki categories to create "automagic" index pages.
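
Tagging a page with a category is a one-line edit, and the category page then lists every member automatically. For example (the category name here is invented):

[[Category:Customer Infrastructure]]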

Automation

The pywikipediabot toolkit offers a lot of potential for doing automation.

Conclusion

Wikis have a number of poor defaults and drawbacks, and producing good documentation is difficult. But with a good base like Mediawiki, some customization, and some good guidelines, you can go a long way towards making documentation easier, and more fun, to produce and use.

Further Reading

  1. Mediawiki Book
  2. The Practice of System and Network Administration
  3. Writing Great Documentation, what to write
  4. Documentation Tips for Sysadmins

December 24, 2010

Day 24 - Terminal Multiplexers

Written by Jon Heise

In the grand world of the command line and in dealing with servers, persistence is key. To that end, there are ways to run things that live beyond your current ssh session; this article will focus on screen and tmux. These tools help you run shell sessions that persist across logins (among other awesome capabilities).

GNU Screen is by far the older of the two, first released in 1987; tmux was released much more recently, in 2007.

Similarities

First off, let's talk similarities. They both have the core functionality of being able to “detach” a session from active usage by a user. Let's say a user wants to run irssi (an irc client) and detach it - the workflow would be as follows:

Note: screen and tmux are generally controlled by multiple keystrokes. In the examples below, C-a means pressing "control + a" keys. A sequence 'C-a d' means press "control + a", release, then press 'd'. This syntax is common in screen and tmux documentation.

goal           screen                tmux
run irssi      screen -S irc irssi   tmux new irssi
detach         C-a d                 C-b d
attach again   screen -r irc         tmux attach

FYI: The default screen command key is C-a, while the default tmux command key is C-b. Most things you do in screen and tmux, by default, will always start by pressing the command key.

The above examples create a screen session called 'irc' and an unnamed tmux session. Both tmux and screen also support multiple terminals in one session, that is, you can run multiple commands in separate windows in the same screen or tmux session.

goal                    screen   tmux
create new window       C-a c    C-b c
go to previous window   C-a p    C-b p

Using multiple windows in tmux is a bit easier, by default, as there is an omnipresent status bar at the bottom of the terminal showing open windows.

You can get a list of all known sessions:

goal                screen       tmux
list all sessions   screen -ls   tmux ls

That's the basics of each - first: creating, attaching, and detaching sessions, and second: creating new windows in a session.

Differences

Having listed the features that they both share, it's time to discuss features that one possesses that the other does not.

Screen has special features like serial port support and multiuser access. Screen can directly open serial connections to other machines, network gear, or anything that wants to spew data forward on a serial terminal. This feature can be handy for sysadmins with MacBooks who need to configure some network gear, as there is no good Mac-specific terminal emulator software. Connecting to a serial port is simple:

screen /dev/ttyN
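
You can also pass the line speed (and other stty settings) after the device; the device name below is only an example:

screen /dev/ttyUSB0 9600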

Additionally, you can share your screen sessions with other users on the system with screen's "multiuser" feature. This allows you to share terminals with remote coworkers to do pair debugging or shadowing without having to use a shared account. See the screen documentation for the multiuser and acladd commands.
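
As a quick sketch of how that works (the coworker's username, alice, is made up, and multiuser mode may require the screen binary to be setuid root; check the docs):

# At screen's C-a : command prompt in your running session:
:multiuser on
:acladd alice

# alice can then attach to your session with:
screen -x youruser/sessionname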

Tmux's main feature is the client/server model it uses. You may have noticed that reattaching above didn't require naming a specific session; that's because of this model. As long as there is a session of tmux running (even in the background), the tmux server will exist to manage it. Since there is a central server dealing with every tmux client and session, it is far simpler for the client that the user has launched to be aware of all of them. If you have multiple tmux sessions running (via tmux new), you can ask any tmux session to list them by typing 'C-b s'; the result looks something like this:

[screenshot: tmux session list]

From the above session list, you can switch to any other open session.

Until recently, tmux was the only one of the two supporting both vertical and horizontal splits. In tmux, you create horizontal splits with C-b " and vertical splits with C-b %. In screen, horizontal splits are made with C-a S and vertical splits with C-a | (pipe).

[screenshot: tmux split panes]

Tricks

You can nest screens within screens, or tmux within tmux. This is most common when running tmux or screen, sshing to another server, and running screen/tmux from there.

goal                                  screen           tmux
start up                              screen -S main   tmux new -s main
ssh somewhere                         ssh ...          ssh ...
create a new session on remote host   screen -S foo    tmux new -s foo

The main problem with nesting is knowing how to talk to the nested session. To detach from the second (nested) screen session, you have to send 'C-a a', which sends a literal C-a from the first screen to the program running inside it, which here is another screen session. Detaching from the nested screen, then, is 'C-a a d'.

The same applies to tmux, though your tmux may not be configured for this out of the box. You may have to add this to your ~/.tmux.conf:

bind-key C-b send-prefix

Now pressing 'C-b C-b d' will detach from the nested tmux session.

The above only applies if you nest screen-in-screen and tmux-in-tmux. If you have screen-in-tmux, you would just press the normal C-a to talk to screen. Same with tmux.

Making them similar again

Screen's defaults don't usually include a status bar, but you can make one similar to tmux by adding this to your ~/.screenrc:

hardstatus alwayslastline "%w"

You can also tune tmux to behave more like screen by changing the command key. In your ~/.tmux.conf:

set-option -g prefix C-a  # make the command key C-a
unbind-key C-b            # unbind the old command key
bind-key a send-prefix    # 'C-a a' sends literal 'C-a'

Why would you do this? If you've used screen for years, and want some of your muscle memory to function in tmux, doing the above is a good start.

Cheat sheet

In closing, here is a recap of commands in convenient cheat sheet form:

action                               screen              tmux
new named session                    screen -S foobar    tmux new -s foobar
detach session                       C-a d               C-b d
reattach session                     screen -dr foobar   tmux a -t foobar
new terminal                         C-a c               C-b c
next terminal                        C-a n               C-b n
lock terminal                        C-a x               lock (from command prompt)
large clock                          not supported       C-b t
not as good smaller clock            C-a t               not supported
split screen horizontal              C-a S               C-b "
split screen vertical                C-a | (pipe)        C-b %
change to other portion of a split   C-a tab             C-b arrowkey (up or down)
send prefix                          C-a a               C-b C-b (if bound as above)

Happy tmux'ing and screen'ing!

Further reading:

December 23, 2010

Day 23 - Package vs Config management.

Written by Joshua Timberman

Package management is a best practice in system administration. So is automated configuration management. However, the maintainer scripts run by package management tools are an anti-pattern almost in direct conflict or competition with configuration management systems.

In my examples I'm going to talk about Debian packages and Chef, because that is what I use. Adapt your mindset for your own favorite distribution and configuration management tool.

Server Lifecycle

When almost all the modern, popular Linux distributions were created, servers had a general lifecycle, and an expected supportability throughout that lifecycle. Some distributions have a commercial entity that provides paid support. Others have an excellent user community that volunteers their time to help users and administrators. Many considerations in the development of a Linux distribution stem from the expectation that someone will require support, and that the distribution should provide a supportable release. In addition, a package's maintainer scripts are what provide additional configuration, such as creating users or starting the services provided by the package.

Package Management

One of the value-adds of most Linux distributions is the package management system. Package management behavior and maintainer scripts are well documented by the distribution so that they can be supported by a company of support engineers, or a community of volunteers. For system administrators, however, the main reason to use package management is to get pre-compiled software onto the system and to resolve and install any dependencies that package may have; it is less important that a service start on package install. For example, CouchDB requires Erlang and various other libraries, so the package manager installs those libraries, Erlang, and CouchDB. Package management has many other benefits, such as version management, and packages can do things like drop off configuration files and start up the daemons they install. There is definite business value in using packages, and that's why it is a sysadmin best practice.
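
On a Debian or Ubuntu system you can see that dependency resolution in action; the package name below may vary by release:

# Show what couchdb pulls in, then install it along with its dependencies
apt-cache depends couchdb
sudo apt-get install couchdb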

Many system administrators create their own packages and host them on an internal repository. In most of the environments I've worked in, these packages were as simple as possible, just managing the files included in the package and usually ignoring the upstream culture of maintainer scripts and other policies, because the system administrator planned to use a configuration management tool to automate setup and maintenance of the software that runs the business application. In these cases, the software provided by the distribution did not meet the needs of the business in some way. Perhaps an application required a newer library version, or you needed to patch in a feature or bug fix, or the default setup of a package conflicted with the way a business application was deployed.

Configuration Management

There are as many different application deployments as there are businesses. The different ways the application stacks are deployed provide a specific business value. The application stack often includes a number of the distribution-provided packages, as well as the code written by the business's software developers.

However, most companies have unique needs when it comes to how the software runs in their environment. Perhaps the HTTP server default configuration isn't properly tuned for the web application that it serves. Maybe the business requires that the MySQL server have replication slaves, and this configuration is not enabled by default. Perhaps the system administrator(s) that run the servers have tuned a particular web server for performance, but it conflicts with another web server package. The actual conflict is based on configuration, not on binaries that are created - both packages by default listen on the same port when the service is started.

For these reasons and more, automated configuration management tools such as Chef are now modern system administration best practice.

The problem we face is that the packages we install often run a number of maintainer scripts to ensure that the package is set up and configured. The distribution includes these scripts to enforce some policy, such as where to put certain configuration files, which services to start, or where to locate data files created by the packaged software. In some cases, the package maintainer scripts only perform actions when the package is removed (postrm in Debian/Ubuntu), and if there are problems, they don't surface until the package is removed.

Example of the Conflict

To illustrate the conflict between package maintainer scripts and configuration management systems, let's look at a couple of use cases with MySQL. We are using Chef to automatically install the mysql-server package on Ubuntu 10.04 LTS running on an instance in Amazon EC2. Our two business requirements are to set a randomly generated root password and to move the MySQL data directory to ephemeral storage, as the default location is on a smaller filesystem. Normally, the package installation on Ubuntu prompts the user for the password, which we need to work around to automate the installation. We'll need to generate a preseed file to give the proper settings to the package manager. We install mysql-server on a test system:

sudo apt-get install mysql-server

(And enter a bogus password when prompted, which is what we are trying to avoid).

To get the preseed settings for the package, we need the debconf-get-selections tool, which ships in the debconf-utils package:

sudo apt-get install debconf-utils

Then we get the mysql-server settings for our preseed file:

sudo debconf-get-selections | grep ^mysql-server > mysql-server.seed

We'll use a template that has a generated password (@mysql_root_password), along with the rest of the contents in the file:

mysql-server-5.1 mysql-server/root_password_again password <%= @mysql_root_password %>
mysql-server-5.1 mysql-server/root_password password <%= @mysql_root_password %>

And we set this up with Chef using a template and execute resource:

template "/var/cache/local/preseeding/mysql-server.seed" do
  source "mysql-server.seed.erb"
  owner "root"
  group "root"
  mode "0600"
  notifies :run, "execute[preseed mysql-server]", :immediately
end

execute "preseed mysql-server" do
  command "debconf-set-selections /var/cache/local/preseeding/mysql-server.seed"
  action :nothing
end

Then we have a package resource that installs mysql-server:

package "mysql-server"

Next, we want to configure an alternate location for the MySQL database on the ephemeral storage, as the database size may grow beyond the default root partition size (10G). An example Chef recipe to do this might look like:

service "mysql" do
  action :stop
end

execute "install-mysql" do
  command "mv /var/lib/mysql /mnt/mysql"
  not_if do FileTest.directory?("/mnt/mysql") end
end

directory "/mnt/mysql" do
  owner "mysql"
  group "mysql"
end

mount "/var/lib/mysql" do
  device "/mnt/mysql"
  fstype "none"
  options "bind,rw"
  action :mount
end

service "mysql" do
  action :start
end

We have to stop MySQL, move the directory, and restart MySQL. We use a bind mount so the configuration in /etc/mysql/my.cnf does not need to be changed. If we wanted to do that, there's additional configuration required.

Neither of these scenarios takes into account the additional complexity required to manage the Debian system maintenance user set up by the MySQL package, or the countless possible MySQL tuning parameters and database formats.

We're forced, here, to do extra work to skirt around problems created by the package management tool trying to be responsible for things outside of packages. The anti-pattern is exacerbated if we have to manage the package and installation on a different OS; then we'd have to redo the whole dance for another platform. If our package manager simply dropped off the binaries and libraries and let us handle this configuration directly in the configuration management system, it would be much easier to manage a heterogeneous environment.

Conclusion

Package management certainly has value! It allows system administrators to install a base OS image that gives all the hardware support and user-land well known and loved in Unix/Linux systems. When it comes to the application stack required by the business, custom configuration is often required. Package maintainers don't, and can't be expected to, imagine every possible custom configuration. Configuration management tools can, however, be used to cover any custom configuration, since that is their job.

After all, part of the Unix (and Linux) philosophy is that each program should do one thing well.

Further Reading

About the author

Joshua Timberman is a Technical Evangelist for Opscode. He has worked for a wide range of companies as a system administrator: from small company IT support to Enterprise web infrastructure delivery for Fortune 500 companies. He helps companies and individuals learn how to use Chef and the Opscode Platform. He wrote the majority of the Chef cookbooks Opscode publishes, teaches the Chef Fundamentals class, and speaks at user groups and conferences. He can be found as jtimberman on Twitter, Skype, Freenode, GitHub, and more, or via email at joshua@opscode.com.

December 22, 2010

Day 22 - DevOps: Where Are We Now (part 2)

Intro

This article is part two of a two part series by Brandon Burton which explores what DevOps is and why it matters, where it has come since Brandon's article in July, and where you should be looking if you want to be involved.

Part two focuses on where DevOps has come since July and where you should be looking if you want to be involved.

What's happened since July

Back in July, I wrote an article entitled DevOps (and Reliam) and why it matters which covered the idea of CAMS, discussed briefly where Reliam sees DevOps fitting into its business, and highlighted a number of resources for following the growth of DevOps. Things have been very busy since then. There has been an explosion of blogs, articles, conferences, user groups, and online discussion about DevOps.

If you're still a little unsure what this DevOps thing is all about, I'd highly recommend you read or revisit the following URLs before digging into the rest of the content in this article.

Since DevOps is first and foremost about people, the most exciting goings on since July have been the meetups, conference talks, and videos people have been producing.

Meetups and Conferences

There have been a growing number of meetups, including

Additionally, we've seen DevOps Days conferences begin being organized, including

Some specific talks and videos

There have been DevOps related talks and videos at meetups and other conferences; some of the ones I found excellent were:

Blog Articles

There have been so many blog articles, but I'll try to highlight a few that I think are worth reading, probably twice.

I'm going to highlight all of the articles in a series that Matthias Marschall (@mmarschall) just wrapped up, because they are all high quality and, in my opinion, must-reads.

Where to go next

So now that you've digested all the awesome DevOps content that's been cranked out since July, you're probably wondering where you should be looking so you can keep up on all the awesome content that's going to be produced in the future. I'm happy to share all the places I'm watching, which span your usual collection of blogs, podcasts, mailing lists, and Twitter.

Mailing Lists/Group Discussion

Twitter folks to Follow

And of course, you can follow the #devops hashtag on Twitter.

Blogs to Follow

Podcasts

Summary

In summary, DevOps has seen a lot of activity since July. There has been an explosion of blog posts, meetups, and conferences, particularly in the last couple of months. There are many places to be watching and hopefully you've been able to add a few to your list.

December 21, 2010

Day 21 - Wikis and Documentation

Intro

This article is part one of a two part series and is a collaboration between Brandon Burton and Philip J. Hollenback where they explore the problems with Wikis, the challenges of writing good documentation with today's tools, improvements to Mediawiki that Brandon has implemented, and ideas for further improvements.

Part one focuses on the problems and challenges.

The problem(s) with Wikis is...

It's a wiki! Wikis are a slightly less-worse alternative to all other documentation and publishing mechanisms. What's the worst thing about wikis? Well...

Here it is, 2010, and guess what? Every wiki works pretty much exactly the same as they did back in 2001 (or, for that matter, back in 1995 when WikiWiki was invented). Why has absolutely no real development happened in the world of wikis? I realize that there may be amazing commercial wikis out there like Microsoft Sharepoint or Confluence, but who uses them? Instead we all blindly set up our own Mediawiki installations over and over again, with all the same annoyances and problems. We are all unquestioning worshippers at the altar of the wiki.

Let's get down to business: here are some of the numerous things wrong with wikis, in no particular order:

CamelCase

This seemed amazing back in 2001 because it allowed you to create your own web pages on the fly. Amazing! However, the really cool thing was the autocreation of web pages, not the mechanism of CamelCase. Camel case was just an easy way to tell early wiki syntax parsers to create a link to a new page. Nine years later, camel case is faintly embarrassing. It's like those pictures from the early 80s where guys all had perms - seemed a good idea at the time. Every single time you try to explain wikis to someone, you have to apologize for how camel case works.

Markup Languages

Wiki markup languages must be amazing and precious, because we have dozens of them to choose from. Seriously? I have to remember whether to write

[[www.hollenback.net][http://www.hollenback.net]]

or

[www.hollenback.net|http://www.hollenback.net]

or

[http://www.hollenback.net www.hollenback.net]

based on which wiki I'm using? That's awesome.

Tables

If you ask 99% of office workers how to create a table, the answer is fire up Excel. Wikis actually manage to make that worse due to the pain of creating tables. The canonical table representation in wikis is vertical bars and spaces, and you better not accidentally add an additional column unless you want to spend 15 minutes tracking down that one extra vertical bar somewhere.

| *this* | *is an* | *awesome table* |
| there | are | many like it | but this one | is mine |

Attaching Images and Documents

Looking for a standard way to drop images into a document? Good luck with that. If you are lucky you can attach an image to a page, assuming you don't accidentally exceed the web server file upload size. Wait, did you also say you want to flow the text around your image? You just made milk come out of my nose. Next you will be asking for the ability to right-justify your image on the page! What is this, QuarkXpress?

Attaching documents to a wiki is just as bad, because most wiki software uses the same horrible upload mechanism. As a bonus, any Excel spreadsheet you attach to a page becomes an inert lump of non-displayable, non-searchable data.

Organizational Structure

We all love really shallow document hierarchies, right? Must be true because that's how every wiki works. Oh sure we all pretend there is a tree structure in wikis but nobody ever uses it. We all end up creating zillions of top-level documents. Which then brings us to the issue of wiki search, which is also essentially nonexistent. Most people cheat and use a domain-specific google search instead, but then you surrender your site to the whims of the almighty google. That means your search mechanism doesn't have any domain-specific optimizations.

The problem with documentation is...

The problem with documentation is that it's a lot of effort to write clear, correct, and usable documentation. It takes time, not just any time, but concentrated, distraction free time. The sort of time that there is never enough of. Further, it takes a plan - a design for serving your intended audience reasonably. It does not help that most of the common tools that are chosen as the repository of the documentation are not very good. Bad tools drain your time. Sadly, this includes the most popular tool (in my experience), wikis, particularly my favorite tool for keeping documentation, Mediawiki.

Documentation + Mediawiki == Maybe better

Having said all that, wikis are still the best widely available documentation solution out there.

Of all the available wikis, Mediawiki is the wiki most commonly chosen, and this is the one that Brandon has had the opportunity to make a number of improvements to.

Since Mediawiki is open source software and just PHP + MySQL + Text + CSS, it is relatively easy to improve how it can be used to keep more effective documentation. I've had the opportunity to make a number of changes to the Mediawiki installation at my day job, and I'm going to take part two of this article to share those with you. Additionally, I have some other ideas on how Mediawiki could be improved even further, a number of which have come from reading the Mediawiki book.

In part two, we'll explore some improvements that Brandon Burton has implemented at his work and some ideas Brandon and Philip have for further improvements.

December 20, 2010

Day 20 - Github Gist

This article was written by Phil Hollenback (@philiph)

I assume everyone is familiar with the idea of a pastebin - a website for sharing text fragments with an emphasis on code fragments. Pastebins have been around since 2002, according to Wikipedia. They're an incredibly useful resource for sharing textual data and are something we, as sysadmins, need to do on an almost continual basis. However, there are several problems with some existing pastebin implementations:

  • lack of command-line integration
  • no version control
  • no privacy settings

I recently came across a new (to me, anyway) alternative to the traditional pastebin: github gists. The following is a description of how gists work and how they differ from traditional pastebin clippings. I'll also describe some ways you can collaboratively edit gists with one or more people.

What's a Gist?

A gist is simply a text clipping with optional syntax highlighting, the same as you would find in any other pastebin. You can go look at some right now to get the idea.

So, why would you want to use this instead of a traditional pastebin? Pick a file (say, a perl script) and hold on to your socks:

$ gist test.pl
https://gist.github.com/737292

That's it! You just created a syntax-highlighted text clipping anyone on the internet can view.

Unfortunately, there is some up-front work to get this all set up. You can't just post anonymous gists to github.com like you can with some pastebins. I'll detail that setup info below. And, here's the really exciting part: there's an emacs script to automate all of this!

Initial Setup

As I mentioned, you have to have a github account to create gists (or to comment on existing gists). The good news is that's free and just takes a moment to set up. Once you have your account created, go to your account page and click on Account Admin. You will find your API token on this page. Take a moment to copy that down as you will need it to set up your command-line gist client.

You should also click on SSH Public Keys in the account settings page and upload your ssh public key. You're going to need this to edit gists shortly. Did I mention that gists are version controlled with git?

I'm assuming you already have the git client installed on your Linux or Mac box; if you don't, go get it now, as you'll be using git a lot for all of this. One thing that was not clear to me initially was how to set up your local git config for gist access. This is controlled by your ~/.gitconfig file, which will look something like this:

[user]
    name = <your name>
    email = <your email>
[github]
    user = <your github username>
    token = <your api token>

You can actually read and write from this file via git config on the command-line, like this:

git config --global github.user username
git config --global github.token blah

The gist command-line and emacs clients use this mechanism to read from your ~/.gitconfig.

Once you have this all configured, download and install the gist command-line client. I used the gem install gist method, which worked just fine. Verify your setup works by creating a gist, as above.

Now What?

At this point you've got a simple command-line pastebin client, which is a pretty useful thing. For example, suppose you want to demonstrate some code to someone on Twitter. Instead of mucking about pasting your code into a regular pastebin website, feed your script directly to the command-line gist client, and right there you've got a URL you can paste into your tweet. If the viewer of your gist goes through the small hoop of creating their own github account, they can leave comments about your gist too.
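
As a sketch of that workflow (the file name is made up, and whether your gist client reads standard input depends on the version you installed, so treat the second command as an assumption):

# Post a script and paste the resulting URL into a tweet
gist fizzbuzz.pl

# If your client accepts stdin, you can pipe command output straight up
uptime | gist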

Don't worry, though - there's lots of other ways to use gists. For example, there's an emacs interface to gists!

Emacs Mode

The emacs interface for gists is gist.el. It supports mostly the same options as the regular command-line client, with a few twists. For example, you can use gist-list to select from and open one of your public gists.

I've been using the emacs gist interface quite heavily to share gists with others. For example, if someone tells me 'check out my gist 741773', I can just hit <Meta>-x gist-fetch<RET>741773 to pop that gist right into an emacs buffer.

Unfortunately the emacs mode suffers from some glitches due to problems with ssl access in emacs. I had to hack on gist.el somewhat myself to get it working with Aquamacs on my mac. Thus while I'm pretty excited about gist.el, it's not really ready for primetime.

Markdown

In addition to plain text and programming language markup, gists also support Markdown. Actually they support Github Flavored Markdown, which includes a few small tweaks of the original Markdown language.

I assume most readers are familiar with Markdown, but if you aren't, it's a simple way to write structured ASCII text that can be easily turned into HTML or other document formats. The beauty of Markdown is it's completely readable as straight ASCII as well as HTML.

To force interpretation of your gists as markdown, use the .md file extension on the file you upload to create a gist. When you view your gist on github you will see it all dressed up with headers and bullets and everything.

Private Gists

By default, gists are public. This is the standard convention for pastebins - everyone can see what you post. This usually works just fine. However, if you want to protect your information, you can create a private gist. There are two differences between private and public gists:

  1. public gists show up on the gist main page.
  2. public gists use easily guessable sequence numbers, private ones use hash identifiers.

For #2, public gists have incremented IDs like 73962 while private gists use hashes like d17b2652f7896c795723. In practice, this makes it difficult to guess the ID (and URL) of a private gist. Note there is no real security here in the form of access controls - if someone obtains your private gist ID, they can access it. Thus, don't use private gists for passwords or other sensitive information.

However, private gists work just great for information you want to protect but isn't super critical. I would feel fine pasting config files as private gists, for example.

With the gist command-line client, use the -p switch to create a private gist, or use git-config to set your default gist posting mechanism to private. If you are going to use gist as a pastebin to share system information such as config files and scripts, you should probably use private gists by default. The emacs interface supports similar functionality.
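
For example (the file name is only an illustration):

# Create a private gist of a config file
gist -p my.cnf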

Using git for Gists

As I mentioned earlier, gists are stored in a git repository on github. That means you can use them to collaborate on a documentation project. Here's the workflow:

  1. Create a gist through web interface, cli, etc.
  2. Give your friend Joe the url to that gist on github.
  3. Joe visits that url and clicks 'fork' to get his own repository
  4. Joe makes edits to his forked copy of your gist
  5. Joe commits his changes to his repo, gives you his private clone url
  6. cd into your local repository on your computer
  7. Merge Joe's changes into yours with git pull <Joe's private clone url> master
  8. commit your merged changes to your repo with git commit -a and git push

That's it! You're now collaborating with someone on a shared script, config file, markdown document, or whatever. Also, since this is a distributed version control service, your collaborator can always fork your gist and start modifying their own copy.
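
Concretely, steps 6 through 8 might look something like this; the directory, gist ID, and clone URL are placeholders:

cd ~/src/gist-737292        # your local clone of the gist
git pull git://gist.github.com/d17b2652f7896c795723.git master
# resolve any merge conflicts, then:
git commit -a -m "merge Joe's edits"
git push                    # publish the merged gist back to github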

Remember that the gist web interface supports comments, so if you don't want to do a full collaboration with someone, they can always just leave gist comments instead (although commenters do need github accounts). Note that comments don't seem to be exposed in the git repository, unfortunately.

I've focused on single-file gists in this description, but note that gists can contain multiple files. You can add additional files via the web interface or by creating additional files in your local git repository. This is another important difference from traditional pastebins.

Why Should I Care About This?

As a sysadmin, I'm excited about this tool for a number of reasons. Mainly, I have a need to share scripts, config files, and the like with other sysadmins. Currently, that involves emails or traditional cut-n-paste pastebins. Neither of these solutions are very satisfactory.

What I want is a way to create public and private pastebins from the command-line and share those via a URL. I also want a way to mark up and collaborate on text files. Finally, it would be pretty handy if those files were automatically version-controlled and stored somewhere out on the internet for me.

Oh, also, that tool better not cost me anything, because I'm cheap and/or poor. Hey look, github gists support all those features! That's why I've started using gists instead of the old pastebins. The command-line and emacs integration are the real power of gists. Gists are a direct interface between your terminal and the cloud, all wrapped up in a sysadmin-friendly package.

Further Reading

December 19, 2010

Day 19 - Upstart

This article was written by Jordan Sissel (@jordansissel)

In past sysadvents, I've talked about babysitting services and showed how to use supervisord to achieve it. This year, Ubuntu started shipping its release with a new init system called Upstart that has babysitting built in, so let's talk about that. I'll be doing all of these examples on Ubuntu 10.04, but any upstart-using system should work.

For me, the most important two features of Upstart are babysitting and events. Upstart supports the simple runner scripts that daemontools, supervisord, and other similar-class tools support. It also lets you configure jobs to respond to arbitrary events.

Diving in, let's take a look at the ssh server configuration Ubuntu ships for Upstart (I edited it for clarity). This file lives at /etc/init/ssh.conf:

description     "OpenSSH server"

# Start when we get the 'filesystem' event, presumably once the file
# systems are mounted. Stop when shutting down.
start on filesystem
stop on runlevel S

expect fork
respawn
respawn limit 10 5
umask 022
oom never

exec /usr/sbin/sshd

Some points:

  • respawn - tells Upstart to restart it if sshd ever stops abnormally (which means every exit except for those caused by you telling it to stop).
  • oom never - Gives hints to the Out-Of-Memory killer. In this case, we say never kill this process. This is super useful as a built-in feature.
  • exec /usr/sbin/sshd - no massive SysV init script, just one line saying what binary to run. Awesome!

Notice:

  • No poorly-written 'status' commands.
  • No poorly-written /bin/sh scripts
  • No confusing/misunderstood restart vs reload vs stop/start semantics.

The initctl(8) command is the main interface to upstart, but there are shorthand commands status, stop, start, and restart. Let's query status:

% sudo initctl status ssh
ssh start/running, process 1141

# Or this works, too (/sbin/status is a symlink to /sbin/initctl):
% sudo status ssh 
ssh start/running, process 1141

# Stop the ssh server
% sudo initctl stop ssh
ssh stop/waiting

# And start it again
% sudo initctl start ssh 
ssh start/running, process 28919

Honestly, I'm less interested in how to be a user of upstart and more interested in running processes in upstart.

How about running nagios with upstart? Make /etc/init/nagios.conf:

description "Nagios"
start on filesystem
stop on runlevel S
respawn

# Run nagios
exec /usr/bin/nagios3 /etc/nagios3/nagios.cfg

Let's start it:

% sudo initctl start nagios
nagios start/running, process 1207
% sudo initctl start nagios
initctl: Job is already running: nagios

Most importantly, if something goes wrong and nagios crashes or otherwise dies, it should restart, right? Let's see:

% sudo initctl status nagios
nagios start/running, process 4825
% sudo kill 4825            
% sudo initctl status nagios
nagios start/running, process 4904

Excellent.

Events

Upstart supports simple messages. That is, you can create events with 'initctl emit <event> [KEY=VALUE] ...'. You can subscribe to an event in your job config by specifying 'start on ...', and the same goes for 'stop on'. A very simple example:

# /etc/init/helloworld.conf
start on helloworld
exec env | logger -t helloworld

Now send the 'helloworld' message, but also set some parameters in that message.

% sudo initctl emit helloworld foo=bar baz=fizz

And look at the logger results (writes to syslog)

2010-12-19T11:03:29.000+00:00 ops helloworld: UPSTART_INSTANCE=
2010-12-19T11:03:29.000+00:00 ops helloworld: foo=bar
2010-12-19T11:03:29.000+00:00 ops helloworld: baz=fizz
2010-12-19T11:03:29.000+00:00 ops helloworld: UPSTART_JOB=helloworld
2010-12-19T11:03:29.000+00:00 ops helloworld: TERM=linux
2010-12-19T11:03:29.000+00:00 ops helloworld: PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
2010-12-19T11:03:29.000+00:00 ops helloworld: UPSTART_EVENTS=helloworld
2010-12-19T11:03:29.000+00:00 ops helloworld: PWD=/

You can also conditionally accept events with key/value settings, too. See the init(5) manpage for more details.

Additionally, you can start jobs and pass parameters to the job with start helloworld key1=value1 ...
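
Those parameters show up as environment variables inside the job, just like in the emit example above. A minimal sketch; the job name and variable are invented for illustration:

# /etc/init/greeter.conf
# Start manually with: sudo start greeter NAME=world
exec echo "hello, ${NAME:-nobody}" | logger -t greeter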

Problems

Upstart has issues.

First: Debugging it sucks. Why is your pre-start script failing? There's no built-in way to capture the output and log it. You're best off doing 'exec 2> /var/log/upstart.${UPSTART_JOB}.log' or something similar. Your only other option for capturing output is the 'console' setting, which lets you send output to /dev/console, but that's not useful.

Second: The common 'graceful restart' idiom (test the config, then restart) is hard to implement directly in Upstart. I tried one approach, which is to perform a config test in 'pre-start' and, on success, copy the file to a 'good' file and run on that, but that doesn't work well for things like Nagios that can have many config files:

# Set two variables for easier maintainability:
env CONFIG_FILE=/etc/nagios3/nagios.cfg
env NAGIOS=/usr/sbin/nagios3

pre-start script
  if $NAGIOS -v $CONFIG_FILE ; then
    # Copy to '<config file>.test_ok'
    cp $CONFIG_FILE ${CONFIG_FILE}.test_ok
  else
    echo "Config check failed, using old config."
  fi
end script

# Use the verified 'test_ok' config
exec $NAGIOS $CONFIG_FILE.test_ok

The above solution kind of sucks. The right way to implement graceful restart with Upstart is to implement the 'test' yourself and only call initctl restart nagios on success; that is, keep it external to Upstart.
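
A minimal sketch of such an external wrapper, reusing the paths from the example above:

#!/bin/sh
# Only restart nagios if its configuration passes the syntax check.
if /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg; then
  initctl restart nagios
else
  echo "config check failed; not restarting" >&2
  exit 1
fi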

Third: D-Bus (the message backend for Upstart) has very bad user documentation. The system seems to support access control, but I couldn't find any docs on the subject. Upstart doesn't seem to document how, but you can see access control in action when you try to 'start' ssh as non-root:

initctl: Rejected send message, 1 matched rules; type="method_call",
sender=":1.328" (uid=1000 pid=29686 comm="initctl)
interface="com.ubuntu.Upstart0_6.Job" member="Start" error name="(unset)"
requested_reply=0 destination="com.ubuntu.Upstart" (uid=0 pid=1 comm="/sbin/init"))

So, there's access control, but I'm not sure anyone knows how to use it.

Fourth: There's no "died" or "exited" event to otherwise indicate that a process has exited unexpectedly, so you can't have event-driven tasks that alert you if a process is flapping or to notify you otherwise that it died.

Fifth: Again on the debugging problem, there's no way to watch events passing along to upstart. strace doesn't help very much:

% sudo strace -s1500 -p 1 |& grep com.ubuntu.Upstart
# output edited for sanity, I ran 'sudo initctl start ssh'
read(10, "BEGIN ... binary mess ... /com/ubuntu/Upstart ... GetJobByName ...ssh\0", 2048) = 127
...

Lastly, the system feels like it was built for desktops: lack of 'exited' event, confusing or missing access control, stopped state likely being lost across reboots, no slow-starts or backoff, little/no output on failures, etc.

Conclusion

Anyway, despite some problems, Upstart seems like a promising solution to the problem of babysitting your daemons. If it has no other benefit, the best benefit is that it comes with Ubuntu 10.04 and beyond by default, so if you run an Ubuntu infrastructure, it's worth learning.

Further reading:

December 18, 2010

Day 18 - DevOps

This article was written by Brandon Burton

This article is part one of a two-part series exploring what DevOps is, what it means for systems administrators (in all his/her many forms), where it has come since my post earlier this year, and where you should be looking if you want to be involved.

Part one focuses on what DevOps and what it means for the Systems Administrator.

DevOps

DevOps is really just a label for a rallying point around a variety of trends (primarily led by Web Operations) that have begun to coalesce over the last couple of years. These trends, ideas, and tools have existed in various forms and have been adopted to various extents for many years, but have seen a huge coming together in the last two years. Early voices in this area include John Allspaw's talk at Velocity 2009 and Patrick Debois's coining of the 'DevOps' term, and the history is covered well in a writeup by John M. Willis.

CAMS

The principles that DevOps is about are best summarized as CAMS, the four pillars of DevOps. CAMS is:

Culture

People and process first. If you don't have culture, all automation attempts will be fruitless.

Automation

This is one of the places you start once you understand your culture. At this point, the tools can start to stitch together an automation fabric for DevOps. Tools for release management, provisioning, configuration management, systems integration, monitoring and control, and orchestration become important pieces in building a DevOps fabric.

Measurement

If you can't measure, you can't improve. A successful DevOps implementation will measure everything it can as often as it can: performance metrics, process metrics, and even people metrics.

Sharing

Sharing is the feedback part of the CAMS cycle. Creating a culture where people share ideas and problems is critical. Another interesting motivation in the DevOps movement is the way sharing DevOps success stories helps others. First, it attracts talent, and second, there is a belief that by exposing ideas you create a great open feedback loop that in the end helps everyone improve.

Why it's important

DevOps is important because it represents a shift in the way we are thinking about and practicing operations and how we interact with developers -- and beyond to other groups within businesses.

DevOps is important because it is about helping, it's about an attitude that says, "I'm going to make a difference, I'm going to cooperate and communicate, and I'm going to understand that in the business of delivering great software, we're all in it together."

For the Systems Administrator

Don't let any potential hype or the fact that the discussion seems focused on things like the cloud, nosql, or agile deter you. DevOps represents the things that Systems Administrators love and advocate.

It's about planning for failure, choosing good tools, creating and using good lines of communications. Half of DevOps is operations, and operations is systems administrators. DevOps advocates ensuring systems administrators are involved in the design process and architectural decisions. It's about promoting operations as a provider of value and not as a cost center.

I think that as systems administrators we can see the value in these things being given a voice and DevOps represents an opportunity to be a part of an ongoing conversation and see the values, tools, and practices we believe in reach a wider audience and wider adoption.

Where to go from here?

In part 2, I'll explore past and future events, how and where to learn, and how to get more involved.

Further reading:

December 17, 2010

Day 17 - Smoke Testing Deployments using Cucumber

Written by Gareth Rushgrove (@garethr)

Developers love tests: testing, quality, and inspection tools. Modern testing practices yield automated tests for code and means to run them constantly while developing an application. When the app hits production, the operations team often have a different set of tools for monitoring the health of everything from disk space to requests per second to service health. To get the tested code into the well-monitored production environment, you need to deploy it, and that's where smoke testing comes in.

But What is Smoke Testing?

Smoke testing is non-exhaustive software testing aimed at ensuring that the most crucial functions of an application meet expectations, while skipping the finer details. Smoke testing your deployments simply means having a test suite you can run very quickly against your production environment just after your deployment to make sure you've not broken everything.

You can write your smoke tests with whatever tool you choose, but I'm going to show some simple examples using the Ruby tool: Cucumber. I've found Cucumber useful for smoke tests as it makes it very simple for everyone involved, including project manager and business stakeholders, to understand what is being tested. Cucumber is useful because smoke tests need to be very fast and targeted to be useful, which means making judgements about what is critical, which requires a common language for communicating what is critical and how it should be tested.

I'm going to run this example using jruby in order to use another great tool, Celerity, which is a jruby wrapper around a headless browser. You don't have to use Celerity to do this; lots of people use the Webrat library to make web requests instead. I like Celerity because it can execute the javascript on a page, meaning you can test more complex applications and more complete functionality.

An Example

I'm going to show a real world test from FreeAgent, which checks that the login box appears when you click the login button on the homepage. This would be just one part of a larger smoke test suite but is hopefully a good simple but non-trivial example.

First we need a few dependencies. Here are the instructions for installing on a recent version of Ubuntu, although any system you can run jruby on should be fine.

apt-get install jruby
jruby -S gem update --system
jruby -S gem install jruby-openssl gherkin cucumber celerity rspec

Next we create a cucumber feature file in features/homepage.feature, which describes in a structured but human readable format exactly what we're testing.

Feature: Homepage
  So we can keep existing users happy
  Visitors to the site
  Should be able to login

Scenario: Check login box appears when login button is clicked
    Given I'm on the homepage
    When I click the login button
    Then I should see the login box

You don't have to use cucumber for writing smoke tests, but I find it useful because I can easily discuss what is being tested with other non-developers simply by sharing the feature file contents (above).

Next we write the actual code that makes the test work in features/steps/homepage.rb. I've included everything in one file for simplicity's sake, but in a larger example you would probably separate utility functions and configuration from the step code. For a larger test suite you'll also find that you can reuse many steps by passing in arguments from the feature files.

require 'rubygems'
require 'celerity'
require 'rspec'

BROWSER = Celerity::Browser.new
TIMEOUT = 20

# this is a simple utility function I use to find content on a page
# even if it might not appear straight away
def check_for_presence_of(content)
  begin
    timeout(TIMEOUT) do
      sleep 1 until BROWSER.html.include? content
    end
  rescue Timeout::Error
    raise "Content not found in #{TIMEOUT} seconds"
  end
end

Given /^I'm on the homepage$/ do
  BROWSER.goto("http://www.freeagentcentral.com")
end

When /^I click the login button$/ do
  check_for_presence_of "Log In"
  BROWSER.div(:id, "login_box").visible?.should == false  
  BROWSER.link(:id, "login_link").click
end

Then /^I should see the login box$/ do
  BROWSER.div(:id, "login_box").visible?.should == true
end

To run the feature we've just created, run the following command in the directory where you created the files; cucumber looks for a 'features' directory.

jruby -S cucumber

This should output test results showing what ran and whether it passed:

Feature: Marketing Site
  So we can keep existing users happy
  Vistors to the site
  Should be able to login

Scenario: Check login box appears when login button is clicked # features/homepage.feature:6
    Given I'm on the homepage             # features/steps/homepage.rb:1
    When I click the login button         # features/steps/homepage.rb:5
    Then I should see the login box       # features/steps/homepage.rb:11

1 scenario (1 passed)
3 steps (3 passed)
0m8.411s

Cucumber provides a number of other output formats that might also be useful (html, etc.), and the cucumber-nagios gem adds an output formatter for the nagios plugin format, too:

% jruby -S gem install cucumber-nagios
% jruby -S cucumber --format Cucumber::Formatter::Nagios
CUCUMBER OK - Critical: 0, Warning: 0, 3 okay | passed=3; failed=0; nosteps=0; total=3

Start Simple

A more complex example might step through a multi-stage form to test the purchase of a product, or it could run a series of searches to check that a search index has been populated. Smoke testing and Cucumber are not just for web apps either; you should be testing all of your important services and systems. Unlike lower-level testing, you want to touch as many individual parts of the app as possible, including checking that third-party APIs and parts of your infrastructure are up and running. You definitely don't want to mock out your database calls and then find that the app actually fails because of a problem with the database coming back up after a deployment.

Once you have a smoke test you can run manually each time you deploy, you can take the next step: automation. Running the smoke tests automatically as part of whatever deployment mechanism you have might be useful; logging the results or integrating the output into a reporting tool like nagios might work well for your team, too. Automated deployment followed by a smoke test failure could even trigger an automated rollback, as in the sketch below.
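As a rough sketch of what that automation could look like, here's a minimal shell post-deploy hook; deploy.sh, rollback.sh, and the /opt/smoke-tests path are hypothetical stand-ins for whatever tooling and layout you actually use:

#!/bin/sh
# Hypothetical post-deploy hook: run the smoke tests, roll back on failure.
# deploy.sh and rollback.sh stand in for your real deployment tooling.
./deploy.sh || exit 1

cd /opt/smoke-tests   # assumed location of the cucumber 'features' directory
if jruby -S cucumber --format Cucumber::Formatter::Nagios; then
    echo "Smoke tests passed, deploy complete."
else
    echo "Smoke tests failed, rolling back." >&2
    ./rollback.sh
    exit 1
fi

Using the nagios formatter here means the same output can be logged or fed straight into your monitoring system.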

Deployment actions are still a time of higher-than-average risk for most projects, and strong smoke tests are important regardless of your deployment frequency or project size. If you have only a few machines, then smoke testing might tell you that you need to roll back a deployment immediately. If you have a larger infrastructure, then smoke testing a newly upgraded application server before putting it back into a load balancer rotation could save you from any ill effects in production at all.

Further reading:

December 16, 2010

Day 16 - Introduction to LVM

This was written by Ben Cotton (@funnelfiasco)

Logical volume management (LVM) is not a new concept -- it first appeared in Linux in 1998 and had existed in HP-UX before then. Still, some sysadmins, new and old, aren't familiar with it. LVM is a form of storage virtualization that allows for more configuration flexibility than the traditional on-disk partitions. In fact, LVM is a kind of anti-partitioning, where multiple devices can be grouped together. In this article, we'll assume you've got a spare machine or VM to follow along on. If not, you can use losetup to create file-based "disks" (see the man page for instructions).
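For example, a loopback setup might look something like the following; the file names, sizes, and loop device numbers are only illustrative:

# Create two sparse 1 GB files to act as "disks"
dd if=/dev/zero of=/tmp/disk0.img bs=1M count=1 seek=1023
dd if=/dev/zero of=/tmp/disk1.img bs=1M count=1 seek=1023
# Attach them to loop devices
losetup /dev/loop0 /tmp/disk0.img
losetup /dev/loop1 /tmp/disk1.img
# /dev/loop0 and /dev/loop1 can now stand in for the real partitions used below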

LVM setup starts with physical volumes. You mark disk devices (drives or partitions) as physical volumes with the pvcreate command. Physical volumes are then grouped together into one or more volume groups with vgcreate. It's most common to create a single volume group, but there are cases when multiple volume groups might be desirable. For example, one vg may be created on a solid-state drive to use for read-mostly data, with spinning disk(s) in a separate vg for more volatile data.

# Initialize two drives for use in LVM
pvcreate /dev/sda1 /dev/sdb1
# Create a single volume group called "myvg"
vgcreate myvg /dev/sda1 /dev/sdb1

Once the vg is created, you can generally think of it as a single disk. Instead of partitioning it as you would with a traditional flat disk, space is carved up by creating logical volumes with lvcreate. After a lv is created, any OS-supported file system can be made on it. (Note: it's important to consider your use case when selecting the file system to put on an lv. Not only do file systems have different performance characteristics, but if you want to grow or shrink the file system later, you'll have to use a file system that supports such operations. XFS and JFS in particular do not support shrinking.)

# Create a 10 GB logical volume for MySQL in myvg
lvcreate -L 10G -n mysql myvg
# Create a 1 TB logical volume for MythTV in myvg
lvcreate -L  1T -n mythtv myvg
# file system creation omitted
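For completeness, the omitted file system step might look something like this; ext3 and the mount points are only examples, so use whatever suits your workload:

# Create file systems on the new logical volumes
mkfs.ext3 /dev/myvg/mysql
mkfs.ext3 /dev/myvg/mythtv
# Mount them wherever they're needed
mount /dev/myvg/mysql /var/lib/mysql
mount /dev/myvg/mythtv /var/lib/mythtv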

Instead of the traditional /dev/sdxn nomenclature, the logical volumes are available as /dev/mapper/$vgname-$lvname or /dev/$vgname/$lvname (e.g. /dev/mapper/vg00-usr or /dev/vg00/usr). This lack of numbering hints at the most beneficial feature of LVM -- the ability to flexibly re-allocate disk space. In traditional partitions, you generally create the partition layout you think you need and then hope your needs don't change. Although it's possible to re-configure a disk after it's been used, it can get messy, and is generally very difficult to do on a live system. With LVM, you can start out by provisioning a small amount of disk space and then growing the file systems as needed.

One real-life example is the case of a file server. In a previous job, we had a many-TB file server on which groups in the department purchased space. Five slices were available on each drive (actually a LUN from a SAN), and if a group outgrew the LUN it was on, we had to split their data across two LUNs. Additionally, if two groups were on the same LUN, we'd have to shift data around as they competed for space (which happened a lot). Had we used LVM instead, each group could have had an LV of the correct size and they'd only compete with themselves.

In my current job, we leave most of the disk space on infrastructure servers unallocated. As the need for a particular file system grows, we grow that file system. Does the new temperature monitoring system blow up the /var file system? Grow it. Need to add more applications to /opt? Grow it. The flexibility LVM provides allows us to quickly adapt to the needs of the research we support.

For file systems that support online resizing, growing an lv is a simple process. The first step is to check that you've got enough disk space available by looking at the "Free PE / Size" line of the output from vgdisplay. The lv is resized with the lvresize command. Absolute and relative sizes can be used, and unit abbreviations like G and T are supported. After the lv has been resized, the file system still needs to be grown with the appropriate tools (e.g. resize2fs), as in the fuller example below.

# All your data[base] are belong to us. Need more space for MySQL
lvresize -L +5G /dev/myvg/mysql
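Putting the whole grow operation together, assuming an ext3 file system (which can be grown while mounted):

# Check the "Free  PE / Size" line for space remaining in the vg
vgdisplay myvg
# Grow the lv by 5 GB...
lvresize -L +5G /dev/myvg/mysql
# ...then grow the file system to fill the new space
resize2fs /dev/myvg/mysql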

Once the volume group is completely allocated, lvresize can be used to shrink an overgrown lv after the file system has been shrunk (a fuller shrink example follows below). Additional disks or partitions can be added to the volume group as well. After pvcreate has been run as above, the vgextend command can be used to add the physical volume to the volume group. This makes LVM somewhat similar to JBOD in that it can take disks of various sizes and combine them into a single usable unit.

# There's nothing good on TV. Shrink the MythTV lv
lvresize -L -10G /dev/myvg/mythtv
# We added another disk, put it in myvg
pvcreate /dev/sdc1
vgextend myvg /dev/sdc1
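Shrinking deserves a fuller example because the order matters: the file system must be shrunk before the lv, and ext3 can only be shrunk while unmounted. A rough sketch, with sizes and mount point only illustrative:

# Shrink the MythTV file system to safely below the target lv size
umount /dev/myvg/mythtv
e2fsck -f /dev/myvg/mythtv
resize2fs /dev/myvg/mythtv 1000G
# Shrink the lv by 10 GB, then grow the file system back to fill it exactly
lvresize -L -10G /dev/myvg/mythtv
resize2fs /dev/myvg/mythtv
mount /dev/myvg/mythtv /var/lib/mythtv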

Flexibility isn't the only advantage that LVM provides sysadmins. LVM also has optional snapshots. If there's a volatile file system that doesn't lend itself to live backups (e.g. our database), an LVM snapshot volume can be used to allow the backup to run. Snapshot volumes generally only need to be a small fraction of the original lv (10-15% is a number I've seen frequently), but the key point is that the snapshot lv must be large enough to hold the changes made to the original lv while the snapshot exists. Thus it is better to overestimate the size of the snapshot volume.

# Create a snapshot volume for our MySQL file system
lvcreate -L 2G -s -n dbbackup /dev/myvg/mysql
# Mount the snapshot for the backup program (tar? rsync?)
mount /dev/myvg/dbbackup /mnt
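Once the backup completes, remember to unmount and remove the snapshot so it stops consuming space and I/O; here's a sketch using tar, with the backup path only an example:

# Back up from the snapshot while MySQL keeps writing to the original lv
tar czf /backup/mysql-snapshot.tar.gz -C /mnt .
# Clean up the snapshot when the backup is done
umount /mnt
lvremove -f /dev/myvg/dbbackup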

Although LVM can be set up on software RAID (md) devices, it also has some built-in RAID-like features. If the volume group has at least three devices, mirroring can be used for logical volumes (the third device holds the mirror log, with a live copy on one device and a mirrored copy on the second device). To use mirroring, the logical volume needs to be created with the -m or --mirrors option, which takes the argument n-1 for n copies.

# Create an important data volume with two copies
lvcreate -L 50G -m 1 -n important myvg

LVM also has a striping feature which allows the logical volume to be striped across two or more devices. Unlike the striping in RAID 5/6, LVM striping has no parity and therefore provides no data protection (it's closer to RAID 0). However, because I/O operations are spread across multiple devices, performance is improved. The number of stripes can vary between 2 and the number of devices and is specified by -i or --stripes. The size of the stripe (in KB) is specified with -I or --stripesize and must be a power of 2.

# Create a 100GB 3-striped lv
lvcreate -L 100G -i 3 -I 4 -n fastlv myvg

LVM isn't all sunshine and rainbows, though. In the past, LVM support was not baked into initrd files, and so it had to be manually included after a kernel update. That's not as much of a concern anymore as most distributions include LVM support, but users of custom kernels should make sure the initrd contains LVM support if the root partition is on a logical volume. Because of this historical issue, many admins still opt for paranoia and put / (or at least /boot) on a traditional partition. Additionally, LVM presents an additional level of abstraction that can make recovery via Knoppix or a similar method more difficult. Still, LVM offers a great deal of flexibility that makes it indispensable to the system administrator.

Further reading:

December 15, 2010

Day 15 - Down the 'ls' Rabbit Hole

By: Adam Fletcher (@adamfblahblah)

From ls(1) to the kernel and back again.

Too often sysadmins are afraid to dive into the source code of our core utilities to see how they really work. We're happy to edit our scripts but we don't do the same with our command line utilities, libraries, and kernel. Today we're going to do some source diving in those core components. We'll answer the age-old interview question, "What happens when you type ls at the command line and press enter?" The answer to this question has infinite depth, so I'll leave out some detail, but I'll capture the essence of what is going on, and I'll show the source in each component as we go. The pedants in the crowd may find much to gripe about but hopefully they'll do so by posting further detail in the comments.

Requirements

It'll be helpful if you install the source on your machine for the software we'll be looking at. Below are the commands I used to get the source for the needed packages on Ubuntu 9.10, and similar packages are available for your Linux distribution.


apt-get install linux-source 
apt-get source coreutils 
apt-get source bash
apt-get source libc6
apt-get install manpages-dev

I'm using linux-source version 2.6.31.22.35, coreutils (for the code to ls) version 7.4-2ubuntu1, bash version 3.5.21, and libc6 version 2.10.1-0ubuntu18, and finally manpages-dev to get the programmer's man pages.

Starting Out - strace & bash

One of the most useful tools in the sysadmin's arsenal is strace, a command that will show you most of the standard library and system calls a program makes while it executes. We'll use this tool extensively to figure out what code we are looking for in each component.

Let's start by strace'ing bash when it runs ls. To do so, we'll start a new instance of bash under strace. Note that I'll be cutting the output of strace down a lot in the post for readability.

adamf@kid-charlemagne:~/foo$ strace bash
execve("/bin/bash", ["bash"], [/* 30 vars */]) = 0

[... wow that's a lot of output ...]


write(2, "adamf@kid-charlemagne:~/foo$ ", 29adamf@kid-charlemagne:~/foo$ ) = 29
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0,

... and that's where the output stops. If you're new to strace, the key to reading it is to make liberal use of man pages to figure out what each library call does. Be aware that the relevant pages you want are in section 2 of the man pages, so you'll need to do man 2 read to find the page on read; this is because many of the system functions have the same name as regular commands that are found in section 1 of the man pages.

The read call is waiting for input on file descriptor 0, which is standard input. So we type ls and hit enter (you'll see more read & write calls as you type).

There's a lot of output, but we know we want to see ls related output, so let's do the simple thing and look at the lines that have ls in them:

stat("/usr/local/sbin/ls", 0x7fff03f1fd60) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/ls", 0x7fff03f1fd60) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/ls", 0x7fff03f1fd60)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/ls", 0x7fff03f1fd60)     = -1 ENOENT (No such file or directory)
stat("/sbin/ls", 0x7fff03f1fd60)        = -1 ENOENT (No such file or directory)
stat("/bin/ls", {st_mode=S_IFREG|0755, st_size=114032, ...}) = 0
stat("/bin/ls", {st_mode=S_IFREG|0755, st_size=114032, ...}) = 0

If we man 2 stat we see that stat returns information about a file if it can find it, and an error if it can't (much more on stat later). In this case what bash is doing is searching my $PATH environment variable in hopes of finding an executable file with the name ls. Bash will stat every directory in my $PATH, and if it can't find the file it returns command not found. In this case, Bash found ls in /bin, and then that's the last we see of the string ls in our output.

We don't see ls in our output anymore because once Bash knows it can execute the program it spawns a child process to execute that program, and we haven't told strace to follow children of the command it is tracing. It's the next few lines of strace that give this spawning away:

pipe([3, 4])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2c853217c0) = 30125

If we man 2 pipe and man 2 clone we see that bash is creating a pipe (two file descriptors that can be read and written to; this is how a shell can link commands' input and output together when you give the shell a | character) and clone'ing itself so that there are two copies of bash running. Remember, every UNIX process is a child of another process, and a brand new process starts out as a copy of its parent. So when does ls actually happen? Let's strace ls and find out!
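Incidentally, strace does have a -f flag that makes it follow child processes, so tracing bash that way would show the execve happening in the cloned child:

adamf@kid-charlemagne:~/foo$ strace -f bash

The combined output is noisy, though, so it's easier to trace ls on its own: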

adamf@kid-charlemagne:~/foo$ strace ls
execve("/bin/ls", ["ls"], [/* 30 vars */]) = 0

That first line is the key. execve is the library call to load and run a new executable. Once execve runs we're actually ls (well, the loader runs first, but that's another article). Interestingly, the call to execve is in the bash source code, not the ls source code. Let's find it in the bash code:

adamf@kid-charlemagne:/usr/src/bash-4.0/bash-4.0$ find . | xargs grep -n "execve ("
./builtins/exec.def:201:  shell_execve (command, args, env);
./execute_cmd.c:4323:   5) execve ()
./execute_cmd.c:4466:      exit (shell_execve (command, args, export_env));
./execute_cmd.c:4577:  return (shell_execve (execname, args, env));
./execute_cmd.c:4653:/* Call execve (), handling interpreting shell scripts, and handling
./execute_cmd.c:4656:shell_execve (command, args, env)
./execute_cmd.c:4665:  execve (command, args, env);

If we look at line 4323 in execute_cmd.c we see this helpful comment:

/* Execute a simple command that is hopefully defined in a disk file
somewhere.

1) fork ()
2) connect pipes
3) look up the command
4) do redirections
5) execve ()
6) If the execve failed, see if the file has executable mode set.
If so, and it isn't a directory, then execute its contents as
a shell script.
[...]
*/

And looking at line 4665 we do see the call to execve. Take a look at the code around execve - it's a bunch of error handling but nothing too hard to understand. What's interesting is what is not there; the code exists only to handle errors and nothing to handle success. That is because execve will only return if there's a failure, which makes sense - a successful call to execve means we're running something completely different!

Look around execute_cmd.c at the code around calls to shell_execve and you'll see that that code is fairly straightforward.

Inside ls(1)

Let's look at what ls is doing by creating a single file in our directory and ls'ing that file under strace.

adamf@kid-charlemagne:~/foo$ touch bar
adamf@kid-charlemagne:~/foo$ strace ls bar
execve("/bin/ls", ["ls", "bar"], [/* 30 vars */]) = 0

Interesting! We can see that bar is now being passed to our execve call. Let's keep looking at the strace output to find bar:

stat("bar", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
lstat("bar", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f467abbe000
write(1, "bar\n", 4bar
)                    = 4

Right at the end of the strace output we see bar a few times. It looks like bar gets passed to stat, lstat, and write. Working backwards, we can man 2 write to figure out that write sends data to a file descriptor, in this case standard out, which is our screen. So the call to write is just ls printing out bar. The next two library calls, stat and lstat, share a man page, with the difference between the commands being that lstat will get information on a symbolic link while stat will only get information on a file. Let's look in the ls source code for these calls to see why ls does both lstat and stat:

adamf@kid-charlemagne:/usr/src/coreutils-7.4/src$ grep -n "stat (" ls.c
967:      assert (0 <= stat (Name, &sb));       \ 
2437:      ? fstat (fd, &dir_stat)
2438:      : stat (name, &dir_stat)) < 0)
2721:     err = stat (absolute_name, &f->stat);
2730:         err = stat (absolute_name, &f->stat);
2749:     err = lstat (absolute_name, &f->stat);
2837:         && stat (linkname, &linkstats) == 0)

That call to lstat stands out amongst the other calls, and so it is a pretty good guess that lstat happens for some exceptional reason that the programmer would note with a comment. Looking at line 2749 in ls.c we see an interesting comment a few lines above:

         /* stat failed because of ENOENT, maybe indicating a dangling
             symlink.  Or stat succeeded, ABSOLUTE_NAME does not refer to a
             directory, and --dereference-command-line-symlink-to-dir is
             in effect.  Fall through so that we call lstat instead.  */
        }

    default: /* DEREF_NEVER */
      err = lstat (absolute_name, &f->stat);
      do_deref = false;
      break;
    }

That comment means that if we're not talking about a directory and stat has already succeeded, we need to see if we are looking at a symlink. We can see that this is true by ls'ing a directory under strace:

adamf@kid-charlemagne:~/foo$ strace ls /home/adamf/foo/
[...]
stat("/home/adamf/foo/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/home/adamf/foo/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fcntl(3, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
getdents(3, /* 3 entries */, 32768)     = 72
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f873dda4000
write(1, "bar\n", 4bar

Note that there was no call to lstat this time.

Where We Are Going There Is No strace

It is time to bid our friend strace a fond farewell as he doesn't have what it takes to show us what stat is doing. For that we need to look into the standard library, or as it is commonly known, libc.

The libc code provides a common API for UNIX programs, and a portion of that API is the system calls. These are functions that provide a way for a programmer to ask the kernel for a resource that is managed by the kernel, including the resource we're interested in: the filesystem. The code we'd like to look at is for the system call stat. However, because kernels are very dependent on the hardware architecture they run on, and libc needs to talk to the kernel, much of the code you'll find in the libc source is organized by architecture. This makes finding the code for stat tricky; if we look in io/stat.c we see basically a single line of code that calls a function called __xstat. If we find . -name xstat.c we'll see that we want ./sysdeps/unix/sysv/linux/i386/xstat.c, which is the implementation of stat for Linux on i386.

The code in xstat.c that isn't a reference to a C #include looks like:

return INLINE_SYSCALL (stat, 2, CHECK_STRING (name), CHECK_1 ((struct kernel_stat *) buf));

And:

result = INLINE_SYSCALL (stat64, 2, CHECK_STRING (name), __ptrvalue (&buf64));

Reading the comments in the code we can see that stat64 is for 64-bit platforms. We'll stick to 32-bits for now, but either way we need to figure out what INLINE_SYSCALL is. A convention in C programming is that FUNCTIONS IN ALL CAPS are pre-processor macros, which means you can typically find out what those macros are by grep'ing for define <macroname>:

adamf@kid-charlemagne:/usr/src/eglibc-2.10.1/sysdeps/unix/sysv/linux/i386$ grep -n "define INLINE_SYSCALL" *
sysdep.h:348:#define INLINE_SYSCALL(name, nr, args...) \

At first, the code we find at line 348 in sysdep.h looks confusing:

#define INLINE_SYSCALL(name, nr, args...) \ 
({                                                                          \ 
    unsigned int resultvar = INTERNAL_SYSCALL (name, , nr, args);             \ 
    if (__builtin_expect (INTERNAL_SYSCALL_ERROR_P (resultvar, ), 0))         \ 
    {                                                                       \ 
        __set_errno (INTERNAL_SYSCALL_ERRNO (resultvar, ));                   \ 
        resultvar = 0xffffffff;                                               \ 
    }                                                                       \ 
    (int) resultvar; })

Looking at the code the call to INTERNAL_SYSCALL stands out - it appears that all INLINE_SYSCALL is doing is calling INTERNAL_SYSCALL. Conveniently we can scroll down in sysdep.h to find the definition of INTERNAL_SYSCALL:

/* Define a macro which expands inline into the wrapper code for a system
call.  This use is for internal calls that do not need to handle errors
normally.  It will never touch errno.  This returns just what the kernel
gave back.

The _NCS variant allows non-constant syscall numbers but it is not
possible to use more than four parameters.  */
#undef INTERNAL_SYSCALL
#ifdef I386_USE_SYSENTER
# ifdef SHARED
#  define INTERNAL_SYSCALL(name, err, nr, args...) \

... but it appears to define INTERNAL_SYSCALL a few times, and I'm not sure which one is actually used.

A good practice in a situation like this is to stop looking at the code and instead take some time to understand the concept the code is trying to implement. Googling for something like i386 system calls linux gets us to Implementing A System Call On i386 Linux (http://tldp.org/HOWTO/html_single/Implement-Sys-Call-Linux-2.6-i386/), which says:

A system call executes in the kernel mode. Every system call has a number associated with it. This number is passed to the kernel and that's how the kernel knows which system call was made. When a user program issues a system call, it is actually calling a library routine. The library routine issues a trap to the Linux operating system by executing INT 0x80 assembly instruction. It also passes the system call number to the kernel using the EAX register. The arguments of the system call are also passed to the kernel using other registers (EBX, ECX, etc.). The kernel executes the system call and returns the result to the user program using a register. If the system call needs to supply the user program with large amounts of data, it will use another mechanism (e.g., copy_to_user call).

Okay, so I think the implementation of INTERNAL_SYSCALL we'll want will have 0x80 in it and some assembly code that puts stuff in the eax register (newer x86 machines can use sysenter instead of int 0x80 to make syscalls). Line 419 in sysdep.h does the trick:

# define INTERNAL_SYSCALL(name, err, nr, args...) \ 
({                                                                          \ 
    register unsigned int resultvar;                                          \ 
    EXTRAVAR_##nr                                                             \ 
    asm volatile (                                                            \ 
    LOADARGS_##nr                                                             \ 
    "movl %1, %%eax\n\t"                                                      \ 
    "int $0x80\n\t"                                                           \ 
    RESTOREARGS_##nr                                                          \ 
    : "=a" (resultvar)                                                        \ 
    : "i" (__NR_##name) ASMFMT_##nr(args) : "memory", "cc");                  \ 
    (int) resultvar; })

If we go back to xstat.c we see that the name we pass to INTERNAL_SYSCALL is stat, and in the code above the name argument will expand from __NR_##name to __NR_stat. The web page we found describing syscalls says that syscalls are represented by a number, so there has to be some piece of code that turns __NR_stat into a number. However, when I grep through all of the libc6 source I can't find any definition of __NR_stat for i386.

It turns out that the code that translates __NR_stat into a number is inside the Linux kernel:

adamf@kid-charlemagne:/usr/src/linux-source-2.6.31$ find . | grep x86 | xargs grep -n "define __NR_stat"
./arch/x86/include/asm/unistd_64.h:23:#define __NR_stat             4
./arch/x86/include/asm/unistd_64.h:309:#define __NR_statfs              137
./arch/x86/include/asm/unistd_32.h:107:#define __NR_statfs       99
./arch/x86/include/asm/unistd_32.h:114:#define __NR_stat        106
./arch/x86/include/asm/unistd_32.h:203:#define __NR_stat64      195
./arch/x86/include/asm/unistd_32.h:276:#define __NR_statfs64        268

The Amulet Of Yendor: Inside The Kernel

The syscall number definitions being inside the kernel makes sense, as the kernel is the owner of the syscall API and as such will have the final say on what numbers get assigned to each syscall. As we're running on 32-bit Linux, it appears the syscall number that libc is going to put in eax is 106.

The table in unistd_32.h is great (look at all those syscalls!) but it doesn't tell us where the code for handling a call to stat actually lives in the kernel. find is our friend again:

adamf@kid-charlemagne:/usr/src/linux-source-2.6.31$ find . -name stat.c
./fs/stat.c
./fs/proc/stat.c

Well that was easy. Opening up fs/stat.c we find what we're looking for:

SYSCALL_DEFINE2(stat, char __user *, filename, struct __old_kernel_stat __user *, statbuf)
{
        struct kstat stat;
        int error;

        error = vfs_stat(filename, &stat);
        if (error)
                return error;

        return cp_old_stat(&stat, statbuf);
}

It looks like this is just a wrapper around vfs_stat, which is also in stat.c and is a wrapper around vfs_fstatat, which again is in stat.c and is a wrapper around two functions, user_path_at() and vfs_getattr(). We'll ignore user_path_at() for now (it figures out if the file exists) and instead follow vfs_getattr():

int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
{
        struct inode *inode = dentry->d_inode;
        int retval;

        retval = security_inode_getattr(mnt, dentry);
        if (retval)
                return retval;

        if (inode->i_op->getattr)
                return inode->i_op->getattr(mnt, dentry, stat);

        generic_fillattr(inode, stat);
        return 0;
}

One thing that is helpful to do in a case like this is to look back at any documentation I have on the function whose implementation I'm tracking down, which in this case is the library call to stat. Back to man 2 stat we see:

   All of these system calls return a stat structure, which contains the following fields:

       struct stat {
           dev_t     st_dev;     /* ID of device containing file */
           ino_t     st_ino;     /* inode number */
           mode_t    st_mode;    /* protection */
           nlink_t   st_nlink;   /* number of hard links */
           uid_t     st_uid;     /* user ID of owner */
           gid_t     st_gid;     /* group ID of owner */
           dev_t     st_rdev;    /* device ID (if special file) */
           off_t     st_size;    /* total size, in bytes */
           blksize_t st_blksize; /* blocksize for file system I/O */
           blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
           time_t    st_atime;   /* time of last access */
           time_t    st_mtime;   /* time of last modification */
           time_t    st_ctime;   /* time of last status change */
       };

So our vfs_getattr function is trying to fill out these fields, which must be in the struct kstat *stat argument to vfs_getattr. vfs_getattr tries to fill out the stat struct in two ways:

    if (inode->i_op->getattr)
            return inode->i_op->getattr(mnt, dentry, stat);

    generic_fillattr(inode, stat);

In the first attempt to fill stat, vfs_getattr checks to see if this inode struct has a special function defined to fill the stat structure. Each inode has an i_op struct which can have a getattr function, if needed. This getattr function is not defined in fs.h but rather is defined by the specific file system the inode is on. This makes good sense as it allows the application programmer to call libc's stat without caring if the underlying file system is ext2, ext3, NTFS, NFS, etc. This abstraction layer is called the 'Virtual File System' and is why the syscall above is prefixed with 'vfs'.

Some filesystems, like NFS, implement a specific getattr handler, but the filesystem I'm running (ext3) does not. In the case where there is no special getattr function defined vfs_getattr will call generic_fillattr (helpfully defined in stat.c) which simply copies the relevant data from the inode struct to the stat struct:

void generic_fillattr(struct inode *inode, struct kstat *stat)
{
        stat->dev = inode->i_sb->s_dev;
        stat->ino = inode->i_ino;
        stat->mode = inode->i_mode;
        stat->nlink = inode->i_nlink;
        stat->uid = inode->i_uid;
        stat->gid = inode->i_gid;
        stat->rdev = inode->i_rdev;
        stat->atime = inode->i_atime;
        stat->mtime = inode->i_mtime;
        stat->ctime = inode->i_ctime;
        stat->size = i_size_read(inode);
        stat->blocks = inode->i_blocks;
        stat->blksize = (1 << inode->i_blkbits);
}

If you squint a little bit at this struct you'll see all the fields you can get out of a single ls command! Our adventure into the kernel has yielded fruit.

Just One More Turn...

If you'd like to keep going, the next thing to figure out is how the inode struct gets populated (hint: ext3_iget) and updated, and from there figure out how the kernel reads that data from the block device (and then how the block device talks to the disk controller, and how the disk controller finds the data on the disk, and so on).

I hope this has been instructive. Digging through the actual source code to a program isn't as easy as reading a summary of how something works, but it is more rewarding and you'll know how the program actually works. Don't be intimidated by an unknown language or concept! We found our way through the internals of the kernel with strace, find and grep, tools a sysadmin uses every day.

Other Resources