Database locking during kprops, MIT 1.8

Paul B. Henson henson at acm.org
Wed Jun 29 22:15:46 EDT 2011


On Wed, Jun 22, 2011 at 03:01:52PM -0700, John Devitofranceschi wrote:
> I was wondering if there was any more clarity to be had around 'issue
> 3' here. I am interested in a reliable way to detect this error
> condition on the slave server itself (or even reproducing it).

We switched to incremental propagation about 6-8 months ago. Prior to the
switch, we'd have kadmin operations fail on a fairly regular basis due to
lock contention. Since the switch, we've seen only one update failure.

Being paranoid, we implemented a simple consistency check to verify the
slaves were staying up to date. On the master server, a cron job runs
every two minutes and dumps the current master serial number into a
file:

*/2 * * * * /usr/sbin/kproplog -h -e 1 | /bin/grep "Last serial #" | /bin/cut -d " " -f 5 > /var/run/krb5kdc_last_serial.new && /bin/mv /var/run/krb5kdc_last_serial.new /var/run/krb5kdc_last_serial
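
The same two steps (pick out the "Last serial #" value, then publish it
by writing a temporary file and renaming it into place) can also be
sketched in Perl. This is only an illustration under assumptions: the
sample header text, the function names last_serial and publish_serial,
and the 0644 file mode are hypothetical and should be checked against
what kproplog -h prints on your own system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Pull the "Last serial #" value out of kproplog header text.
# Returns the serial as a string of digits, or undef if not found.
sub last_serial {
    my ($text) = @_;
    return $text =~ /^\s*Last serial #\s*:\s*(\d+)\s*$/m ? $1 : undef;
}

# Write $serial to $path via a temporary file in the same directory
# followed by rename(), so a reader never sees a partially written file.
sub publish_serial {
    my ($serial, $path) = @_;
    my ($fh, $tmp) = tempfile("serial.XXXXXX", DIR => dirname($path));
    print {$fh} "$serial\n" or die "write $tmp: $!";
    close($fh)              or die "close $tmp: $!";
    # File::Temp creates the file with mode 0600; loosen it (0644 is an
    # assumption) so a listener running as another user can read it.
    chmod(0644, $tmp)       or die "chmod $tmp: $!";
    rename($tmp, $path)     or die "rename $tmp -> $path: $!";
    return;
}

# Hypothetical sample of header output, for illustration only.
my $sample = <<"END";
Update log dump :
\tLog version # : 1
\tLog state : Stable
\tNumber of entries : 13
\tFirst serial # : 1
\tLast serial # : 13
END

print last_serial($sample), "\n";
```

The rename is what keeps the listener from ever reading a half-written
file, which is the same reason the cron entry writes to a .new file and
then moves it.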

There's a simple Perl script running on the master that makes that file
available over the network (access is controlled by firewall rules):

-------------------------------------------------------------
#!/usr/bin/perl -w -T

use strict;

package PropMon;
use base qw(Net::Server::Fork);

my $port     = shift;
my $filename = shift;

if (!$port or !$filename) {
    print STDERR "propmon: must specify port and filename\n";
    print STDERR "usage: propmon.pl <port> <file>\n";
    exit 1;
}

sub process_request {
    my $self = shift;

    # Send the current contents of the serial number file to the client.
    if (open(my $fh, "<", $filename)) {
        print <$fh>;
        close $fh;
    }
    else {
        $self->log("info", "error opening file $filename: $!");
    }
}

PropMon->run(
    port          => $port, 
    log_file      => "Sys::Syslog", 
    log_level     => 3, # 3 == LOG_INFO
    syslog_ident  => "propmon",
    syslog_logopt => "",
    background    => 1
);

1;
-------------------------------------------------------------

A slightly more complicated script runs on each slave (currently every 30
minutes), pulls over the master serial number, and makes sure the slave isn't
out of date. So far the slaves have never been out of date, but if replication
ever fails for some reason, we'll get alerted and can go fix it:

-------------------------------------------------------------
#!/usr/bin/perl -W

use strict;
use Getopt::Long qw(:config no_ignore_case);
use Socket;

=head1 NAME

krb_check_repl.pl [--delta=n --repl_time=n --host=hostname --port=n]

=head1 DESCRIPTION

Compare master and slave database serial numbers, raising an alarm if
they go out of sync by too much.

The optional --delta=n argument specifies how far the master and slave
serial numbers are allowed to differ. By default, delta is 10.

The optional --repl_time=n argument specifies the interval between
incremental propagation updates. By default, this is 120. This value
controls how long the script waits to compare serial numbers again if
they differ on a first comparison.

The optional --host=hostname argument specifies which server is
the master (or which server reports for the master).

The optional --port=n argument specifies the port that the client should
poll the master on.

=cut

my $delta = 10;
my $repl_time = 120;
my $master_hostname = "halfy";
my $master_port = "6623";

if (!GetOptions("delta=s"     => \$delta,
                "repl_time=s" => \$repl_time,
                "host=s"      => \$master_hostname,
                "port=s"      => \$master_port)) {
        error_exit("unable to parse command-line arguments");
}

# Get the serial numbers from the master and from the local slave.
my $master_sn = get_master_sn();
my $slave_sn = get_my_sn();

# It is possible for $slave_sn to be larger than $master_sn; this means
# that there are updates on the master side that haven't yet propagated
# to the file that the listener reads, but which have propagated to the
# slave via a recent sync operation. So $slave_sn > $master_sn implies
# that replication is working, and we can bail in that case.
if ($master_sn > $slave_sn) {
        my $prev_master_sn = $master_sn;
        my $prev_slave_sn = $slave_sn;

        sleep $repl_time;

        $master_sn = get_master_sn();
        $slave_sn = get_my_sn();

        if ($master_sn > $slave_sn) {
                if (($prev_slave_sn ne $slave_sn) and ($master_sn - $slave_sn > $delta)) {
                        print STDERR "Error: Kerberos replica out of synchronization\n";
                        print STDERR "       master sn: $master_sn, slave sn: $slave_sn\n";
                        exit(1);
                }
                elsif ($prev_slave_sn eq $slave_sn) {
                        print STDERR "Error: Kerberos replica out of synchronization, no apparent update progress seen\n";
                        print STDERR "       master sn = $master_sn, slave sn = $slave_sn\n";
                        exit(1);
                }
        }
}

sub get_master_sn {
        my $proto = getprotobyname("tcp");
        my $sock;

        if (!socket($sock, PF_INET, SOCK_STREAM, $proto)) {
                error_exit("failed to create socket: $!");
        }

        my $iaddr;
        if (!($iaddr = inet_aton($master_hostname))) {
                error_exit("failed to lookup $master_hostname");
        }

        my $paddr = sockaddr_in($master_port, $iaddr);

        if (!connect($sock, $paddr)) {
                error_exit("failed to connect to $master_hostname: $!");
        }

        my $sn;
        if (!($sn = readline $sock)) {
                error_exit("failed to read from socket");
        }
        close($sock);

        if ($sn eq "\n") {
                error_exit("got empty SN from master");
        }
        chomp($sn);

        if ($sn =~ m/\D/) {
                error_exit("got invalid sn ($sn) from master");
        }
        return $sn;
}

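sub get_my_sn {
        my $cmd = '/usr/sbin/kproplog -h -e 1 | grep "Last serial #" | cut -d " " -f 5';

        # Run the pipeline through the shell and read its first output line.
        my $out;
        if (!open($out, "-|", $cmd)) {
                error_exit("failed to get local serial number");
        }

        my $sn = <$out>;
        close($out);

        if (!$sn) {
                error_exit("failed to get local serial number");
        }

        chomp($sn);

        # Accept digits only, mirroring the check in get_master_sn.
        if ($sn !~ m/^\d+$/) {
                error_exit("got invalid sn ($sn) from kproplog");
        }
        return $sn;
}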

sub error_exit {
        my ($msg) = @_;

        print STDERR "Error: " . $msg . "\n";
        exit(1);
}
--------------------------------------------------------------------
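
The decision made after the second pair of samples in the script above
can be isolated as a small pure function, which makes the three outcomes
easier to see and to test. This is a sketch, not part of the script as
posted: the name repl_status and the return values 'ok', 'lagging' and
'stalled' are hypothetical labels. The script above takes the second
sample only if the master was ahead on the first one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify replication state from two samples taken $repl_time apart.
# Arguments: the slave serial from the first sample, the master and
# slave serials from the second sample, and the allowed lag ($delta).
sub repl_status {
    my ($prev_slave, $master, $slave, $delta) = @_;
    return 'ok'      if $master <= $slave;          # slave has caught up
    return 'stalled' if $slave == $prev_slave;      # behind, no progress
    return 'lagging' if $master - $slave > $delta;  # moving, but too far behind
    return 'ok';                                    # moving and within delta
}

print repl_status(100, 105, 100, 10), "\n";   # stalled
print repl_status(100, 150, 120, 10), "\n";   # lagging
print repl_status(100, 125, 120, 10), "\n";   # ok
```

Both 'stalled' and 'lagging' correspond to the two exit(1) branches in
the script. A slave that is behind but still advancing and within delta
is treated as healthy.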


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


