Esthetic Randomness

There are different things we might mean by the word "random": some applications (e.g. cryptography) need true, mathematical randomness, but for others (e.g. "randomly" choosing an image to display), a softer randomness is more appropriate.

My point is that for esthetic purposes, hard randomness isn't just overkill, it can be detrimental.

Think about audio "shuffle play". Sometimes when you press "shuffle" it will:

play tracks from the same album back-to-back
lead off with a track unrepresentative of the entire set

There's no reason things like this shouldn't happen with a truly random distribution -- and that shows that true randomness can be a problem.

The central fact:
Human beings have a very poor grasp of probability.

There are differences between human perception of randomness
and actual randomness, and because of that:

if you give them something really random,
they won't believe that it is.

Often it doesn't register right: something seems off about it.

I used to play xgammon, and I often got the weird feeling that the programmer had rigged the dice: if the programmer wanted to convince me that they were perfectly honest, then the dice really would need to be rigged.

In esthetic randomness we want:

(1) streaks to be minimized or eliminated
(2) special handling for end points of a series
- for many apps, they need to seem average
- but for some apps, they need to be forced to an extreme

Point (1) "no streaks"

This requires a "random" distribution with memory.
We want to simulate a world in which the Gambler's Fallacy is correct:

"Hm... red has come up a lot, I bet
that black is going to come up now."

Shuffle play sometimes gives you "streaks" -- a selection of similar things all in a row. But when you press "shuffle" you really don't want that to happen: you're asking for something more mixed up than that.

Point (2) "end points are special"

This is a little more subtle, but it comes up frequently. If the first card out of the deck is the Ace of Spades, people are going to wonder if you stacked the deck.

With music shuffle play, the first track is likely to be treated as representative of the entire set: if you're doing a classic rock shuffle, and the first track is Tiny Tim's "Tiptoe Through the Tulips", you may find people demanding you shut off your mix before it even gets started -- but if an outlier like that showed up as the twentieth track, it would be recognized as an odd pick, just a joke.

I've always been interested in using randomness for weird effects...

And I still think it's a cool technique -- it's one of the things that a designer-programmer can do that a designer usually can't.

A "real world" example:

I wrote a CPAN module "Text::Capitalize" to do title casing, but just for the hell of it (because the name was so general) I threw in some routines to do "weird effects", like: "wEiRD eFfECts".

   use Text::Capitalize qw(scramble_case);
   print scramble_case('It depends on what you mean by "mean"'), "\n";
   # one possible output (it varies from run to run):
   #    iT dEpenDS On wHAT YOu mEan by "meAn"

My first try was "random_case", but I quickly saw the need for a "scramble_case":

The function "random_case" does a straightforward randomization of capitalization, so that each letter has a 50-50 chance of being upper or lower case (int(rand(2))). The function "scramble_case" does much the same thing, but does a slightly better job of producing something "weird-looking".

Of the sixteen ways that the four-letter word "word" can be capitalized, three of them are rather boring (so the obvious solution is about 19% broken: 3/16 = 18.75%):


      boring         weird/whacky
      ---------      ---------
      word           worD   wORd   WOrd
      Word           woRd   wORD   WOrD
      WORD           woRD   WorD   WORd
                     wOrd   WoRd
                     wOrD   WoRD
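
The count in the table can be checked mechanically. This is only a sketch: the helper names (capitalizations, is_boring) are made up for the illustration, and "boring" is assumed to mean all lower-case, all upper-case, or a single initial capital, as in the left-hand column.

```perl
use strict;
use warnings;

# Build every capitalization of a word: bit $i of $mask decides
# whether letter $i is upper-cased.
sub capitalizations {
    my ($word) = @_;
    my @letters = split //, lc $word;
    my $count   = 2**scalar(@letters);
    my @variants;
    for my $mask ( 0 .. $count - 1 ) {
        push @variants, join '',
            map { ( ( $mask >> $_ ) & 1 ) ? uc( $letters[$_] ) : $letters[$_] }
            0 .. $#letters;
    }
    return @variants;
}

# Assumption: "boring" means word, WORD or Word.
sub is_boring {
    my ($variant) = @_;
    return $variant eq lc($variant)
        || $variant eq uc($variant)
        || $variant eq ucfirst( lc($variant) );
}

my @all    = capitalizations('word');
my @boring = grep { is_boring($_) } @all;
printf "%d of %d are boring (%.2f%%)\n",
    scalar(@boring), scalar(@all), 100 * scalar(@boring) / scalar(@all);
```

For "word" this prints "3 of 16 are boring (18.75%)", which is where the 19% figure comes from.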

To make it less likely that scramble_case will produce dull output when you want "weird" output, you need a modified probability distribution.

scramble_case records the history of previous outcomes, and tweaks the likelihood of the next decision in the opposite direction, back toward the expected average.

Plus: the probability that the first character of the input string will become upper-case has been reduced.

The actual code (in the style I used to write perl a decade ago):

# Function to provide a special effect: sCraMBliNg tHe CaSe
sub scramble_case {
   local $_;
   my $string = shift;
   my (@chars, $uppity, $newstring, $total, $uppers, $downers, $tweak);

   @chars = split /(?=.)/, $string;

   $uppers = 2;  # bias against initial upper (also avoids division by 0)
   $downers = 1;
   foreach (@chars) {

      $tweak = $downers/$uppers;
      $uppity = int( rand(1 + $tweak) );

      if ($uppity) {                  # "int(rand(2))" would generate
         $_ = uc;                     # 50/50 series of 0s and 1s,
         $uppers++;                   # here $tweak is a restoring
       } else {                       # force back to the middle
         $_ = lc;
         $downers++;                  # So here we want $tweak:
       }                              # to go to 1 when $uppers = $downers
   }                                  # to be larger than 1 if $downers > $uppers
   $newstring = join '', @chars;      # to be less than 1 if $uppers > $downers
   return $newstring;
}
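
To see why $tweak acts as a restoring force: int(rand(1 + $tweak)) is zero only when the random number falls below 1, so the chance of an upper-case letter is $tweak / (1 + $tweak), which works out to $downers / ($uppers + $downers). A small sketch of that arithmetic follows; upper_probability is a name invented for the illustration and is not part of Text::Capitalize.

```perl
use strict;
use warnings;

# Chance that int(rand(1 + $tweak)) is non-zero, i.e. that the next
# character gets upper-cased, given the running counts.
sub upper_probability {
    my ( $uppers, $downers ) = @_;
    my $tweak = $downers / $uppers;
    return $tweak / ( 1 + $tweak );    # same as $downers / ($uppers + $downers)
}

printf "first character:       %.3f\n", upper_probability( 2, 1 );
printf "balanced counts:       %.3f\n", upper_probability( 5, 5 );
printf "after a run of lowers: %.3f\n", upper_probability( 2, 6 );
```

The three cases come out as 0.333, 0.500 and 0.750: the initial bias against a leading capital, an even coin when the counts match, and a push toward upper case after a run of lower-case letters.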

Another example: web pages with "randomly" selected images.

I've been working on one recently: DiagonalGrid.
This is a "Hipster Tourist's" guide to San Francisco.

The page for Soma: http://diagonalgrid.com/sf/soma.html
I don't have that many images up for Soma yet, so it's easier to see the problems with "sameyness": http://diagonalgrid.com/pics/sf/soma/

It's even more obvious with the "Public Transit" page, which has to reshuffle the deck to fill all available slots: http://diagonalgrid.com/sf/public_transit.html
There are only four images for this page right now: http://diagonalgrid.com/pics/sf/public_transit/

Avoiding streaks:

The images can be classified in different ways;
they belong to multiple sets.

The page being displayed defines the primary set we're interested in, but just pulling images from that set isn't good enough:

By chance you can get clumps of images associated with other sets (this seems "samey", obscures the message).

So, we don't want streaks of external set membership.

Special handling of the endpoints:

We're more likely to get "representative" images if they don't have memberships in other sets: use those in the first slot (and possibly the last).
With a short deck, after a reshuffle we can get a duplicate adjacent image (last card of previous shuffle may be the first card of the next). One solution: bury that card if necessary.
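
The "bury that card" idea might look something like the sketch below. It is not the code behind DiagonalGrid: make_dealer and the file names are invented for the example, and it assumes every card in the deck is distinct.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Returns a dealer: each call yields the next card, reshuffling when the
# deck runs out. If the fresh shuffle would repeat the previous card,
# that card is buried by swapping it with a random later position.
# Assumes the cards are all distinct.
sub make_dealer {
    my @cards = @_;
    my @deck;
    my $last;
    return sub {
        if ( !@deck ) {
            @deck = shuffle @cards;
            if ( defined($last) && @deck > 1 && $deck[0] eq $last ) {
                my $swap = 1 + int( rand( @deck - 1 ) );    # 1 .. $#deck
                @deck[ 0, $swap ] = @deck[ $swap, 0 ];
            }
        }
        $last = shift @deck;
        return $last;
    };
}

# Hypothetical file names for a four-image page.
my $next_image = make_dealer(qw(bart.jpg muni.jpg cable_car.jpg ferry.jpg));
print join( ' ', map { $next_image->() } 1 .. 12 ), "\n";
```

Each pass through the deck is still an ordinary shuffle; only the seam between two passes gets special treatment.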

In a photo series representing The Mission, you'll probably want some shots of graffiti, but not three or four of them in a row. In a series about the Tenderloin some shots of front doorway grills are okay, but a half-dozen is too much, and so on.

So, the database schema needs to accommodate ownership of an image by multiple overlapping sets, and when we choose images from one main set, we want to avoid adjacent picks associated with the same other set.
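
One way to avoid adjacent picks from the same external set is a greedy pass over a shuffled pool. This is a sketch under assumptions: the image names, the set names and both helper functions are hypothetical, and the data would really come from the database rather than a hash. When every remaining image clashes with the previous pick, it gives up and accepts the clash.

```perl
use strict;
use warnings;
use List::Util qw(shuffle first);

# Hypothetical data: images in the page's main set, each with the
# other sets it also belongs to.
my %other_sets = (
    'mural_01.jpg'   => ['graffiti'],
    'mural_02.jpg'   => ['graffiti'],
    'mural_03.jpg'   => ['graffiti'],
    'taqueria.jpg'   => ['food'],
    'bakery.jpg'     => ['food'],
    'dolores.jpg'    => [],
    'bart_plaza.jpg' => ['transit'],
);

# Number of external sets that two images have in common.
sub shares_a_set {
    my ( $x, $y, $sets ) = @_;
    my %in_x = map { $_ => 1 } @{ $sets->{$x} };
    return scalar grep { $in_x{$_} } @{ $sets->{$y} };
}

# Shuffle, then repeatedly take the first remaining image that does not
# share an external set with the previous pick.
sub unstreaky_order {
    my ($sets) = @_;
    my @pool = shuffle keys %$sets;
    my @order;
    while (@pool) {
        my $prev = @order ? $order[-1] : undef;
        my $i = first {
            !defined($prev) || !shares_a_set( $prev, $pool[$_], $sets )
        } 0 .. $#pool;
        $i = 0 unless defined $i;    # nothing fits: accept a clash
        push @order, splice( @pool, $i, 1 );
    }
    return @order;
}

print join( "\n", unstreaky_order( \%other_sets ) ), "\n";
```

Being greedy, this can paint itself into a corner (for example, ending with two murals left over), so a fussier version would back up and retry.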

My original schema was too simple, with each picture belonging to only one set.

In an improved schema, each picture can be included in multiple "series", one of which is associated with the displayed web page.

I've considered trying to package up this small insight as a CPAN module:
Acme::GamblersParadise
But it's a little hard to see how to make it useful in the general case.

Possibly I might come up with a drop-in replacement for "rand", such as "soft_rand":

  my $pick = int( soft_rand( $range ) + $zero_shift );

But choosing the right defaults for soft_rand would be tricky.

The full set of options might be:

   soft_rand( $range,
              $delta,    # granularity (how close is too close)
              $horizon,  # how far back to remember
              \$chi,     # CHI handle for persistent memory
            );
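
As a rough sketch of how such a thing could behave, here is a closure-based version that keeps its history in memory instead of in CHI. None of this exists on CPAN: make_soft_rand and the $max_tries cap are assumptions for the example, and the interface differs from the option list above in that $delta and $horizon are fixed when the generator is created.

```perl
use strict;
use warnings;

# Sketch of a "soft" rand: remembers the last $horizon picks and asks
# for a do-over when a new pick lands within $delta of any of them.
# $max_tries (hypothetical) stops it from looping forever when $delta
# and $horizon leave no room in the range.
sub make_soft_rand {
    my ( $delta, $horizon, $max_tries ) = @_;
    $max_tries = 100 unless defined $max_tries;
    my @history;
    return sub {
        my ($range) = @_;
        my $pick;
        for my $try ( 1 .. $max_tries ) {
            $pick = rand($range);
            last unless grep { abs( $pick - $_ ) < $delta } @history;
        }
        push @history, $pick;
        shift @history while @history > $horizon;
        return $pick;
    };
}

my $soft_rand = make_soft_rand( 1, 3 );    # delta of 1, remember 3 picks
my @picks = map { int( $soft_rand->(10) ) } 1 .. 20;
print "@picks\n";
```

With a delta of 1, two picks that truncate to the same integer are always "too close", so the integers printed here should not repeat within any run of four.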

But this only covers one use case. What if streaks aren't defined by identity, but some other quantity associated with the item (a card's face value, membership in some external set, etc.)?

Most often custom programming is the solution.

Suggestions on how to solve these problems:

Think about avoiding streaks first, then worry about endpoint handling. Usually, the solution to the first problem suggests a way to solve the second.
It's often easier to start with a "sloppy" generator that almost gives you what you want, and then filter its output -- if you don't like a pick, call it a do-over.
- Hard randomness is usually easily available (as is uniqueness) so you can start from there and filter it to avoid things that don't seem right.
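
The do-over approach can be sketched with a pair of callbacks: one that generates a hard-random pick, and one that looks at the picks so far and says whether to reject it. The function name pick_with_do_over and its argument order are invented for this example, and the streak rule (no face value twice in a row) is just one possibility.

```perl
use strict;
use warnings;

# Hypothetical helper: keep calling $generate until $reject accepts the
# pick (given the picks so far), or until $max_tries runs out, in which
# case the last pick is returned anyway.
sub pick_with_do_over {
    my ( $generate, $reject, $so_far, $max_tries ) = @_;
    my $pick;
    for my $try ( 1 .. ( $max_tries || 50 ) ) {
        $pick = $generate->();
        last unless $reject->( $pick, $so_far );
    }
    return $pick;
}

# Example: card face values 1..13, never the same value twice running.
my @values;
for my $n ( 1 .. 10 ) {
    push @values, pick_with_do_over(
        sub { 1 + int( rand(13) ) },
        sub {
            my ( $pick, $seen ) = @_;
            return @$seen && $seen->[-1] == $pick;
        },
        \@values,
    );
}
print "@values\n";
```

The reject callback is where the custom programming ends up: it can compare identity, a face value, membership in some other set, or position in the series.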

Summary

To repeat, in esthetic randomness we want:

(1) streaks to be minimized or eliminated
(2) special handling for end points of a series
- for many apps, they need to seem average
- but for some apps, they need to be forced to an extreme

You need to think about what you want:
What will be perceived as a "streak"?
What will seem funny if it's used in first or last place?

"Harder" isn't always better.

Questions and Comments:

Couldn't I sell this to audio player manufacturers for a million dollars?
- Damn, I forgot to patent it. And software patents are looking weaker than they used to, thanks to a recent court case.

Have I done any literature searches on this? It seems like the kind of thing that someone must've figured out already.
- No, I haven't. I did some CPAN searches, and maybe a web search or two, without finding much about it -- but it would hardly be a surprise to find that there's some precedent for this material. In fact, I did run into one other programmer who had independently come to the same conclusions that I had: his application was a random selection of a series of quizzes, where he found that he couldn't begin or end with particularly hard or easy quizzes without being accused of stacking the deck. And a few people in the audience mentioned things that might be precedents: a multi-CD player with a pseudo-random shuffle that worked better than the true-random ones, a slider feature in XMMS that would let you adjust how likely it is for the current track to come up again...

There were some suggestions on how one might get a general Acme::GamblersParadise to work. Perhaps: (1) an interface that would accept a callback to identify when to filter some output; (2) requiring the user to precisely specify the characteristics of the data structure: sequential attributes, non-sequential attributes, etc.
- My guess is that this sort of thing might be workable, but it wouldn't leave much for the core routines to do -- in most cases you might as well code a custom routine.