modperl_transaction_handling_across_multiple_machines

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.

To: Ged Haywood <ged@jubileegroup.co.uk>, modperl@apache.org
From: mkennedy@hssinc.com (Matthew Kennedy)
Subject: Re: mod_perl advocacy project resurrection
Date: Wed, 06 Dec 2000 15:33:47 -0600

Ged Haywood wrote:
> 
> Hi there,
> 
> This isn't a silly question.  At least I hope it isn't.
> 
> On Wed, 6 Dec 2000, Jeffrey W. Baker wrote:
> [snip,snip]
> > A modifies a row in X and adds a row to Y.  A commits X, which succeeds.
> > A commits Y, which fails.
> >
> > The only thing that Machine A can do now is send an email to the DBA
> >
> > "..." says the DBA,
> 
> Given that it's designed to fail sooner or later, are there good
> reasons why someone would put together a system in that way?

There's probably no reason one would _design_ a system like that per se.
However, there are plenty of times it just _turns_out_ like that --
usually as the result of a system evolving through time. Another example
might be the B2B case of consulting your own DB etc. and then
communicating some change based on that to another organization's DB
system. I've seen that particular situation arise many times.

===

To: "Jeffrey W. Baker" <jwbaker@acm.org>, modperl@apache.org
From: Eric Strovink <strovink@acm.org>
Subject: [OT] 1%, two-phase commits, etc.
Date: Wed, 06 Dec 2000 17:58:46 -0500

"Jeffrey W. Baker" wrote:

> Machine A is controlling a transaction across Machine X
> and Machine Y.  A modifies a row in X and adds a row to Y.
> A commits X, which succeeds.  A commits Y, which fails.

> A cannot guarantee a recovery on machine X because there
> might already be other transactions in flight on that
> record in that database.  A cannot just try to put the
> record back the way it used to be, because now the commit
> might fail on X.  The data is inconsistent.

As a couple of others have noted, two-phase (prepare-commit)
commits solve the above problem.  But two-phase commits are
not a panacea.  They just move Jeffrey's problem elsewhere.
Suppose A (in a two phase commit implementation, the
transaction coordinator) prepares X and Y, but then dies
just after committing X, but before committing Y.  How does
X know that Y hasn't committed, and that he should roll
back?  He doesn't, unless we cons up some magic secondary
communication channel between X and Y.
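
The failure described above can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions,
not any real middleware API: Participant, CoordinatorCrash,
two_phase_commit and the crash_after parameter are all
hypothetical names, and the "servers" are in-memory objects.

```python
class Participant:
    """Hypothetical stand-in for a database server such as X or Y."""

    def __init__(self, name):
        self.name = name
        self.state = "idle"  # idle -> prepared -> committed / rolled_back

    def prepare(self):
        self.state = "prepared"
        return True  # vote "yes"; a real server could also vote "no"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


class CoordinatorCrash(Exception):
    """Simulates the coordinator (A) dying partway through phase two."""


def two_phase_commit(participants, crash_after=None):
    # Phase one: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    if not all(votes):
        for p in participants:
            p.rollback()
        return False
    # Phase two: commit one participant at a time.
    for index, p in enumerate(participants):
        p.commit()
        if crash_after == index:
            # The coordinator dies; later participants stay "prepared".
            raise CoordinatorCrash("died after committing " + p.name)
    return True


if __name__ == "__main__":
    x, y = Participant("X"), Participant("Y")
    try:
        two_phase_commit([x, y], crash_after=0)
    except CoordinatorCrash:
        pass
    print(x.name, x.state)  # X is committed
    print(y.name, y.state)  # Y is still only prepared
```

After the simulated crash, X has committed while Y is still
waiting in the prepared state, and neither one can tell on its
own what the other did.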

One way to recover from the "A dies after telling X to
commit" problem is to shadow A with a mirror, A'.  A' takes
over for A, interrogates the servers to find out who has
committed the last transaction and who hasn't, and if
necessary completes the transaction or rolls it back
(assuming all writes are serialized through {A, A'}, so
there hasn't been any intervening activity to confuse
things).  Steady-state synchronization between A and A' is
straightforward, as are failure detection and fail-over
recovery.  In fact, if this is properly implemented, the
transaction processing system can keep going without a hitch
through any one failure, and the distributed dataset stays
consistent.
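
A minimal sketch of the decision A' has to make when it takes
over, assuming (as stated above) that all writes are serialized
through {A, A'} so nothing else has touched the servers in the
meantime.  The Participant class and the recover function are
hypothetical names used only for illustration.

```python
class Participant:
    """Hypothetical stand-in for a database server such as X or Y."""

    def __init__(self, name, state):
        self.name = name
        self.state = state  # "prepared", "committed" or "rolled_back"


def recover(participants):
    """What the mirror A' might do after the coordinator A has died."""
    # Interrogate the servers to find out who has committed.
    states = [p.state for p in participants]
    # If anyone committed, the decision must have been "commit", so the
    # rest are rolled forward.  If nobody committed, rolling back is safe.
    decision = "commit" if "committed" in states else "rollback"
    for p in participants:
        if p.state == "prepared":
            p.state = "committed" if decision == "commit" else "rolled_back"
    return decision


if __name__ == "__main__":
    x = Participant("X", "committed")  # A committed X, then died
    y = Participant("Y", "prepared")   # Y never heard the commit
    print(recover([x, y]), x.state, y.state)
```

The rule only works because of the serialization assumption; if
other writers could reach X or Y directly, the states A' reads
might already be stale.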

Unfortunately, in many transaction middleware systems, you
discover in the fine print that A' is actually a
semi-automated recovery process under the control of a human
administrator.  Human?  That would be Dork, down the hall,
the Certified Microsoft Solutions Fuckwad.  Feel safe.

But let's go back to the example, and stipulate that a
reasonable A' exists.  Are we now 100% consistent 100% of
the time?  No, in fact we're not.  Because after X commits
and A dies, but before A' realizes that A has died and
patches things up, any reader of X and Y could potentially
see an inconsistent view of the data. Do we therefore
serialize our reads through the transaction monitor, too?
With a distributed database, we have to, if we want a
guaranteed-consistent view.  Of course, we could choose not
to, for performance reasons.
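
The window can be illustrated with a toy model.  Everything
here is an assumption made for the sketch: the Monitor class,
the InDoubt exception, the direct_read function and the
"version" field are hypothetical, and X and Y are plain
dictionaries rather than real databases.

```python
class InDoubt(Exception):
    """Raised when a read arrives while a commit is only partly applied."""


class Monitor:
    """Hypothetical transaction monitor that serializes writes and reads."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.in_doubt = False

    def commit(self, new_version, crash_after_x=False):
        self.in_doubt = True
        self.x["version"] = new_version
        if crash_after_x:
            return  # A died here; A' has not yet repaired Y
        self.y["version"] = new_version
        self.in_doubt = False

    def read(self):
        # Reads through the monitor refuse to return a half-applied state.
        if self.in_doubt:
            raise InDoubt("transaction not yet resolved")
        return self.x["version"], self.y["version"]


def direct_read(x, y):
    # A reader that bypasses the monitor sees whatever is there right now.
    return x["version"], y["version"]


if __name__ == "__main__":
    x, y = {"version": 1}, {"version": 1}
    monitor = Monitor(x, y)
    monitor.commit(2, crash_after_x=True)
    print(direct_read(x, y))  # X and Y disagree during the window
```

The direct reader gets an answer immediately but may see X and
Y disagree; the reader that goes through the monitor gets a
consistent view or nothing, which is the performance trade-off
mentioned above.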

Does all of this make your head spin just a bit?  Hence
Jeffrey's point.  There's a lot of margin for error, and the
more that's buried in mysterious middleware, the less
confident you should be.  If you can get away with a single
server, you dodge all these bullets.


===