June 23, 2013

Brenda Butler
bjb
linuxbutler
» prism-break

A site that promotes alternatives to the software and cloud services that the US government (and others) mines as if they were its own databases.

http://prism-break.org

» EFF save podcasting campaign

The EFF is raising funds to pay for a challenge to Personal Audio’s patent, which Personal Audio is using to squeeze podcasters. Now might be a good time to contribute, or even to join and make regular contributions.

It looks like they’ve already reached their goal, but it doesn’t hurt to support or join the EFF in this and other causes.

Also, in order to make their case that the patent is baseless, they have issued a call for prior art. If you can contribute information to their case, that will also help them win.

» python's setuptools

One of the nice things about the Ottawa Python Authors Group IRC channel (oftc.net, #opag) is that they occasionally mention a great but under-advertised reference, like this one for setuptools:

http://peak.telecommunity.com/DevCenter/setuptools#basic-use

Thanks Ian!

February 24, 2012

» awstats overview for sysadmins in a hurry

I needed to install awstats into an existing web installation recently, and finding the info needed for that was a bit annoying. The documentation I could find gets into the nitty gritty without giving you the big picture.

So here is the big picture for awstats. Because it is meant to be a “big picture”, I’m putting the configuration discussion last; I want to cover how the system works overall before getting into configuration specifics.

Overview for awstats

awstats is a script for analyzing web server logs (it has been extended to analyze other types of logs, like mail logs). It analyzes the logs and stores the statistics, and you can view the results as graphs and charts on a web page. It is a venerable old tool, meaning it doesn’t quite fit into modern ways of handling log files, init scripts or script parameters. It is also designed to be lean, so it can analyze quite large logfiles without bogging down the whole system; as a result, the parser for log lines is a bit simple and can get confused. A line that confuses it is simply thrown away, and the rest of the file still gets processed.

awstats.pl is a perl script. On my Debian system it got installed into /usr/lib/cgi-bin/awstats.pl. It can run as a cgi-bin script, but doesn’t have to.

After configuring, you use it in two stages:

  1. analyze the web server logs
  2. generate the results page.

Stage 1

In stage 1, you run awstats.pl -update on the log file. This will produce a bunch of .txt files. There will be a .txt file for each time period (usually a month, but could be a year). There will generally be a .txt file series for each set of logfiles for a domain or virtualhost. If one log file spans two calendar months (say, covers Jan 28 — Feb 3), then it will produce two .txt files — one for January and one for February. When you process the next logfile (that might span Feb 3 — Feb 10), then no new .txt files will be created but the existing one for February will be updated.
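
For example, a single update run looks something like this (using the Debian paths above and the sourcerer.ca config file from the configuration section below):

/usr/lib/cgi-bin/awstats.pl -config=sourcerer.ca -update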

Generally, the documentation assumes you will not be trying to “catch up” with your old log files. If you want to run old log files through awstats, you will need to analyze them in chronological order: awstats.pl is meant to run on the same logfiles over and over, processing only the entries that are new since the last run. It does this by storing a date and comparing each log record against it to decide whether the record is old or new. I wrote a script (catchup.py) that processes all the old log files in order.

Also, as far as I know, awstats doesn’t understand compressed files, so you will have to uncompress the logfiles before analyzing them. My script handles that too, but for that it needs write permission in the logfile directory.
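
To sketch the idea in shell (a rough illustration only, with hypothetical log names and an mtime-based ordering; catchup.py is more careful):

# process rotated logs oldest-first, decompressing as we go
for log in $(ls -tr /var/log/apache2/access.log.*.gz); do
    gunzip "$log"                    # awstats can't read compressed logs
    /usr/lib/cgi-bin/awstats.pl -config=sourcerer.ca -LogFile="${log%.gz}" -update
done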

The “chronological ordering” requirement implies that everything logging to a given log file had better agree on the time. If one app logs in local time (say -0500) and another logs in UTC, then generally only the records whose timestamps appear 5 hours later will be picked up by awstats.pl; the other records will be regarded as “corrupted” and ignored.
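
To illustrate with two made-up log records: both describe the same instant, but the offsets are not normalized, so the second record appears five hours newer than the first:

127.0.0.1 - - [24/Feb/2012:10:00:00 -0500] "GET /a HTTP/1.1" 200 512
127.0.0.1 - - [24/Feb/2012:15:00:00 +0000] "GET /b HTTP/1.1" 200 512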

You can run this stage as a cgi script — but it can also be run by a cron job. Running it as a cron job means you don’t have to give your web server user permission to write to its DocumentRoot. Running it as a cgi script means you can see the very latest statistics (right up to the moment before you run the update) — but if you don’t do it often enough, you may miss analyzing some of the web server logs (eg, if they get rotated before you run awstats.pl on them). If that happens you have the relatively painful task of trying to fix the mess, or just abandoning stats for those months. You could run it as a cron job and still allow web users to run it as well, to avoid losing info when logs are rotated.
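
A sketch of the cron variant (hypothetical file and schedule; it runs as root precisely so the web server user needs no write access to the stats directories):

# /etc/cron.d/awstats (hypothetical): update the stats four times a day
0 */6 * * * root /usr/lib/cgi-bin/awstats.pl -config=sourcerer.ca -update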

Stage 2

Once you have the web server logs digested into statistics in .txt files, then you can view the results. There are two ways to view the results:

  1. dynamically, via a cgi script
  2. statically, as pre-generated static html pages

To see the results dynamically, you need to configure your web server to call the cgi script.

To see the results statically, you need to make a place for the generated html, and then call awstats.pl -output for each report you might want to have available. There are quite a few reports, and you need to do it for each time period as well. awstats supplies a script (in my Debian system it went here:
/usr/share/awstats/tools/awstats_buildstaticpages.pl) that will generate all the reports for a given time period (i.e. month) so you just have to loop over the months. And virtualhosts, if you’re doing it for more than one web server/domain name.
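
For example, one month of static reports might be generated with something like this (output directory hypothetical):

/usr/share/awstats/tools/awstats_buildstaticpages.pl -config=sourcerer.ca \
    -month=02 -year=2012 -awstatsprog=/usr/lib/cgi-bin/awstats.pl \
    -dir=/var/www/awstats-static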

Configuration Considerations

There are two things to configure with awstats: one is awstats itself (a config file for each “web site”) and one is the web server that you will use to view the results (if that is how you are going to view the results). Below, I discuss only configuration of awstats itself.

The awstats.pl script is configured with files in /etc/awstats/awstats.domainname.conf (again, this is for my Debian system). You would copy the awstats example conf file to a file with your domain name in the middle, eg:

cp /etc/awstats/awstats.conf /etc/awstats/awstats.sourcerer.ca.conf

And then edit the file to have the configuration you want.

awstats works best if you have a separate series of web server logfiles for each host for which you want graphs. If you have some virtualhosts, you might want to configure them each to have their own log files.
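
In apache2 terms, that means one CustomLog per VirtualHost, something like this hypothetical snippet (the combined format is what LogFormat=1, discussed below, expects):

<VirtualHost *:80>
    ServerName sourcerer.ca
    DocumentRoot /var/www/sourcerer.ca
    CustomLog /var/log/apache2/sourcerer.ca-access.log combined
</VirtualHost>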

On my Debian system using apache2 for a web server, all the log files go into the same directory, /var/log/apache2. The catchup.py script can handle this, and it would be easy to make a set of cron commands that each update a different virtual host. At the moment, I have all the stats files going into one directory and all the static html files into another. Maybe I should have a directory per virtualhost for the html files, though; they are getting quite numerous. A directory per virtualhost also means you can more easily apply different access policies to the different domains.

The things I changed in the awstats.conf file for my purposes were:

LogFile
LogFormat
SiteDomain
HostAliases
DirData

There are lots of other options, but customizing those was enough to get some charts to start with.

LogFile is used if you don’t specify -LogFile on the command line. The catchup script passes -LogFile on the command line, but the cron jobs that keep the stats updated can probably just use the current logfile name, domain-access.log.

LogFormat — it’s important to match the LogFormat to the actual format that your logfiles are written in, or every line will be classified as corrupted. I used format 1 for my apache2 logfiles. awstats has 4 predefined log formats, or you can specify a custom log format in exquisite detail field by field.

SiteDomain is the name of your site, as your web server knows it.

HostAliases is a list of other names for “self” for the web server (for the domain being analyzed).

DirData is the directory where the statistical output will go (all the .txt files).
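
Put together, the edited part of awstats.sourcerer.ca.conf looks roughly like this (values hypothetical, but consistent with the Debian layout above):

LogFile="/var/log/apache2/sourcerer.ca-access.log"
LogFormat=1
SiteDomain="sourcerer.ca"
HostAliases="www.sourcerer.ca localhost 127.0.0.1"
DirData="/var/lib/awstats"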

The web sites I administer use a hired monitoring service, and I added its user agents to the robots file (/usr/share/awstats/lib/robots.pm) so that they would be counted. Adding them to SkipHosts just meant they weren’t counted and didn’t show up in the stats at all.

Last words

Hopefully that will give you an idea of what you’re aiming for as you follow the other, more detailed explanations of how to set up awstats. Remember, when you lose the data in a logfile and have to leave it behind — it’s only stats. You’ll manage without them. The stats are approximate anyway — very little attempt is made in the program to give an exact account of the activity. Records are thrown away and not counted almost every time awstats is run — so don’t sweat it if you lose a log file or two on the way. Once the cron jobs are set up and time passes, you’ll get fairly good coverage of the activity of the web server. Keep tuning your web app (eg, ensure that times logged use the same timezone across all apps), look at the config options for awstats and tune your .conf files for more interesting reports, and eventually you’ll have a great resource for security monitoring, marketing analysis and for web site usability and effectiveness reviews.

In fact, you probably can just set up awstats and let it accumulate statistics over time — don’t bother with the catchup script. I did it because I did have old logs, and I wanted to see what a year’s worth of web log statistics looked like in awstats and other packages. It did help me with choosing a web log analysis package, and in choosing among the various extra options for awstats not discussed above — but it was also time-consuming.

November 25, 2011

» excellent twisted documentation

I found a site with some not just good, but outright excellent Twisted documentation.

http://krondo.com/blog/?page_id=1327

November 20, 2011

» debian kernel source build package bug

The official Debian kernel building tools are a thing of wonder. But they didn’t do what I wanted, which was to build the exact version of the kernel that I’m running. I guess they are only ever used to build the latest version.

debian bug 649394

Here is the best documentation I found for this task. It refers to this, which is also pretty good.

Also, reportbug failed (it was unable to get the list of open bugs for this package from the Bug Tracking System), so I used debian-bug from the debian-el package (as noted at the bottom of this page). To actually send the mail, use ctrl-c ctrl-s in the mail buffer (or ctrl-c ctrl-c to send the email and exit the mail buffer).

UPDATE:

Maybe I misunderstood … maybe the -5 is not the patch level I’m aiming for. We shall see.

UPDATE:

No, the -5 is the “ABI” level, and has nothing to do with the Debian patch level. So there was no bug. I was supposed to build with all the patches. Live and learn …

October 29, 2011

» Still can't start X under Xen 4.0 hypervisor

X starts and promptly exits.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=646987

But only under the Xen Hypervisor.

This time the keyboard device is there even under the hypervisor, but xinit “cannot invoke xkbcomp” under the hypervisor. It’s there in /usr/bin/xkbcomp, but xinit cannot “invoke” it under the hypervisor while it can invoke it when it’s not running under the hypervisor. Mysterious.

September 27, 2011

» power supply shopping

I had to shop for a power supply for my desktop. I visited a couple of computer-part retailers and looked at their lists of power supplies. There are a lot of brands out there! And they make competing claims about what to look for: “One rail! Better than two or more!” “4 rails! Better than 1 rail!” “80+ Bronze” “780 Watts — Peak” “620 Watts — Continuous” So I looked for some help in interpreting all this.

I found jonnyguru.com. What a great site! They explain all the features you might want in a power supply, and explain why vendors make the competing claims (like 1 rail vs 4 rails) in plain English. Check out the FAQ for everything you need to know, concisely. They have some very nice reviews too — worth a read just to admire the review.

September 17, 2011

» Software Freedom Day

Today is International Software Freedom Day. Enjoy your free software! While you still have it.

Consider becoming a supporting member of the Free Software Foundation and/or the Electronic Frontier Foundation if you want to do something more concrete towards supporting free software.

August 26, 2011

» gcc function attributes

I recently worked on some C code on an embedded platform with this declaration:

__attribute__((critical)) void somefunc (void) {
    /* function body ... */
}

I’d never seen anything like that.

It turns out that “critical” (and “atomic” and a few other keywords) are part of the OpenMP spec, where multiprocessing support is being built into compilers. This has been moving into gcc since 2005 (at least, that’s when I first see mention of “omp” in the changelogs).

Dunno when it will be available on x86 though … it didn’t work on my desktop:

bjb@blueeyes:~/junk/foo$ gcc try1.c -o try1
try1.c:12: warning: 'critical' attribute directive ignored
try1.c:23: warning: 'critical' attribute directive ignored
bjb@blueeyes:~/junk/foo$ gcc --version
gcc (Debian 4.4.5-8) 4.4.5
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

bjb@blueeyes:~/junk/foo$

There was no warning on the customer platform, but its presence did not produce a different executable from the same source code without it.

July 5, 2011

» Another wait for DSL service

The next “activation date” came and went, with no DSL service. The reason given was that my address did not exactly match the address on my phone bill. I said the address as it should be over the phone when I ordered the service, but the customer service rep(s) on the other end insisted on entering the letter after the number as a “unit” number or “suite” number. I told them I don’t write 999 unit A, I write 999A. But they didn’t know how to enter that into their (or their supplier’s) system.

I wonder why that is, when I had DSL service through NCF, which uses TekSavvy as its upstream, and that worked on the first try?

Why is TekSavvy so certain this time that the activation will work, when it failed the last two times?

Why did TekSavvy decline my offer to fax my phone bill (with address) to their office?

If they’ve done something to ensure that it will work this time — why didn’t they do that the first time?

June 24, 2011

» TekSavvy: Stumbling Out of the Gate

I’m having a rocky start with TekSavvy. In spite of ordering DSL service a while ago, the order got messed up and I’ve had to call a few times to try to sort it out. Today, NCF cutoff day, I find that the order has been so badly mangled that I have to start over from scratch and be internetless for a week.

In the three calls I’ve made, I have spoken to two newbie customer service reps (they volunteered the info). One insisted that the only way to switch between payment types was to start over (but he didn’t tell me that it would delay the start date). He didn’t cancel the first order … that confused TekSavvy no end; I even got a call from them asking about it. But the person who called didn’t make things better, so when I spoke to the next newbie (today), we had to start over - again.

June 19, 2011

» My New ISP: TekSavvy

I’ve ordered DSL from a new supplier and have cancelled the old one; the transfer date is June 24. So if I go offline on June 24, that might be why. I’ll be back.

I haven’t got my new static IP address yet, nor my new IPv6 subnet. Stay tuned! Hopefully I’ll find out what they are before June 24 (so I can put them in DNS on time).

NCF (National Capital FreeNet) has been great — but I wanted a native IPv6 supplier. So, I’m trying out TekSavvy. TekSavvy is NCF’s upstream, as it happens.

I will try to stay in touch with NCF by visiting the fora and asking/answering questions there, if I see anything I can respond to.

June 16, 2011

» Which PostgreSQL daemon does psql talk to?

The ways that psql can be configured to connect to a different port:

  1. compiled in default
  2. PGPORT environment variable
  3. --port or -p option
  4. .pgpass setting

If you are running more than one version of PostgreSQL, you might wonder which one the psql client will talk to by default.

(DJANGO-1-3)bjb@spidy:~$ bash
bjb@spidy:~$ echo $PGPORT

bjb@spidy:~$ unset PGPORT
bjb@spidy:~$ ls -la ~/.pgpass
-rw------- 1 bjb bjb 0 May 20 17:33 /home/bjb/.pgpass
bjb@spidy:~$ locate psql | egrep bin
/usr/bin/psql
/usr/lib/postgresql/8.4/bin/psql
bjb@spidy:~$ /usr/lib/postgresql/8.4/bin/psql template1
psql (8.4.8)
Type "help" for help.

template1=> \echo :PORT
5432
template1=> \q
bjb@spidy:~$ exit
(DJANGO-1-3)bjb@spidy:~$ 

Voila. The psql in /usr/bin is a perl wrapper script for the real psql. To find the “compiled-in” default port number, run the real psql without the command-line argument that changes the port number (--port 5555 or -p 5555), and unset PGPORT first (if it’s set). I have a .pgpass, but it’s empty, so I didn’t have to do anything special for it. If you have a non-empty .pgpass, you might copy it aside before running psql if you want to try this test. Don’t forget to put it back when you’re done.

On my work machine, I had two versions of PostgreSQL running: 8.4 and 8.3. 8.3 was listening on 5432 and 8.4 was listening on 5433. psql was configured to go to port 5432 by default (and therefore PostgreSQL 8.3).
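
With both servers up, you can aim psql at one of them explicitly and confirm which one answered:

psql --port 5433 template1 -c 'SELECT version();'   # the 8.4 server, in my case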

» giving a legacy table a primary key in postgres/django

I found this very helpful; it worked great with Django 1.3 and PostgreSQL 8.3 in 2011/06.

And the penultimate step (filling in the primary key in existing rows) took around half a minute for almost 200,000 records on a not-particularly well-endowed laptop.

Summary:

CREATE SEQUENCE rcvh7_id_seq;
ALTER TABLE rcvh7 ADD id INT UNIQUE;
ALTER TABLE rcvh7 ALTER COLUMN id SET DEFAULT NEXTVAL('rcvh7_id_seq');
UPDATE rcvh7 SET id = NEXTVAL('rcvh7_id_seq');
ALTER TABLE rcvh7 ALTER COLUMN id SET NOT NULL;
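
For completeness, a hypothetical Django 1.3 model over the newly-keyed table might look like this (names assumed from the SQL above; managed = False keeps syncdb from touching the legacy table):

from django.db import models

class Rcvh7(models.Model):
    id = models.AutoField(primary_key=True)  # filled by the rcvh7_id_seq default

    class Meta:
        db_table = 'rcvh7'
        managed = False  # legacy table: don't let Django create or drop it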

June 9, 2011

» unicodedata to the rescue

I have a database full of strings where the accented characters have been replaced by their non-accented equivalents, and a spreadsheet full of strings with accents in them. I’m supposed to look up the info in the database given the info in the spreadsheet.

I found this great stackoverflow post that helped me out:

title = u"some string with accented characters in it like b\xe9cancour"
import unicodedata
unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
'some string with accented characters in it like becancour'

Normalizing with ‘NFKD’ decomposes each character in the string into its component characters. For example, an e with an acute accent is separated into an e and a combining acute accent. The K (compatibility) part of NFKD ensures the ‘e’ is the simplest possible e (presumably, if there is a plain ASCII ‘e’, it will prefer that one). Then encode('ascii', 'ignore') drops all the non-ASCII characters, which by now are just the accents that have been separated from their letters.

Awesome. And it works in python 2.5.

June 8, 2011

» Happy World IPv6 day!

Today is the day (starting from midnight GMT).

Some test sites:

June 4, 2011

» identd in ipv6 under inetd

identd provides the “auth” service (see /etc/services). On Debian, it can be provided by any of several packages, including ident2, nullidentd and oidentd.

If you want to have an identd that can talk IPv6, you can choose oidentd.

If you are running it from inetd, you should configure your inetd to respond to IPv6 as well. I’m using openbsd-inetd, and the lines in /etc/inetd.conf to make it listen on both IPv4 and IPv6 for the auth service are:

auth stream tcp4 nowait root /usr/sbin/tcpd /usr/sbin/oidentd -I
auth stream tcp6 nowait root /usr/sbin/tcpd /usr/sbin/oidentd -I

Note the protocol field, which specifies IPv4 or IPv6. Also note the -I option to oidentd, which makes it read from stdin, write to stdout, and exit after answering one request (needed for inetd operation).
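
After editing /etc/inetd.conf, make inetd re-read it; with the openbsd-inetd package, that’s:

/etc/init.d/openbsd-inetd restart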

I briefly considered nullidentd, but the description made it sound like it would only ever return one static string. Not quite what I was looking for, and I didn’t investigate further.

April 28, 2011

» sessions in django

I was looking at django sessions and was a bit confused until I read the source code.

django/contrib/sessions/models.py
django/contrib/sessions/backends/base.py
django/contrib/sessions/backends/db.py
django/contrib/sessions/backends/cache.py
django/contrib/sessions/backends/cached_db.py
django/contrib/sessions/backends/file.py

You might think the interesting file to look at is django/contrib/sessions/models.py, but really the “toplevel” session object is defined in base.py. The object in base.py is SessionBase, a base class for the various session implementations.

If you’re using a database-backed SessionStore, then you’ll be using base.py, db.py and models.py; db.py uses models.py for the database model and database interaction. In a view, the session object on the request object, supplied by the session middleware, is actually a SessionStore object. That is the object that has methods like get_expiry_age.

I wanted to get some info, including the expiry age, out of every session in batch mode, so I needed to go through a SessionStore for each stored session rather than using the Session model rows directly.
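
A minimal sketch of that kind of batch traversal (not my actual script; it assumes the database backend and wraps each stored Session in a SessionStore to reach methods like get_expiry_age):

from django.contrib.sessions.models import Session
from django.contrib.sessions.backends.db import SessionStore

for s in Session.objects.all():
    store = SessionStore(session_key=s.session_key)  # per-session store object
    print s.session_key, store.get_expiry_age()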
