
BACK_SCRATCH

                                             May 16, 2023

   From Steven Levy's "In the Plex" (2011), p. 17:

   "... But it wasn't at all obvious what
   linked *to* a page.  To find that out,
   you'd have to somehow collect a database
   of links that connected to some other
   page.  Then you'd go *backward*."

   "That's why Page called his system BackRub.
   'The early version of hypertext had a tragic
   flaw: you couldn't follow links in the other     Nelson's Xanadu project
   direction,' Page once told a reporter.           has been written out of
   'BackRub was about reversing that.'"             this history of "early
                                                    versions" of hypertext.
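
   The "go backward" step is easy to sketch on a small
   scale.  What follows is only an illustration, not
   BackRub's actual code: it assumes you already have a
   table of forward links from a crawl, and it inverts
   that table so you can ask what links *to* a page.
   The page names and the invert_links helper are made
   up for the example.

```python
from collections import defaultdict


def invert_links(outlinks):
    """Turn a forward-link table into a backlink table.

    outlinks maps each page to the list of pages it links to.
    The result maps each linked-to page to the sorted list of
    pages that link to it.
    """
    backlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            backlinks[target].add(source)
    return {page: sorted(sources) for page, sources in backlinks.items()}


# Hypothetical toy crawl; these page names are invented.
crawl = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
}
backlinks = invert_links(crawl)
```

   The hard part is not the inversion but collecting the
   forward links in the first place: you only learn who
   links to a page after looking at every other page.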









   p.16:

   "Having a human being determine the ratings
   was out of the question.  First, it was
   inherently impractical.  Further, humans          But it's humans all the
   were unreliable.  Only algorithms-- well          way down-- as presented
   drawn, efficiently executed, and based on         here, this is a quest for
   sound data-- could deliver unbiased results.      something like an
   So the problem became finding the right data      objective view of the web
   to determine whose comments were more             of information, but it
   trustworthy, or interesting than others."         can't be anything but an
                                                     "unbiased" rendering of
                                                     human biases.


                                               Missing from this story is
                                               Yahoo, which was originally
                                               in the business of
                                               presenting a human-curated
                                               collection of links
                                               "yet another hierarchically
                                               organized o--"

                                                             Ontology?

                                               This is indeed difficult--
                                               certainly Yahoo had difficulty
                                               maintaining quality over time--
                                               but not exactly undoable, at
                                               least not in the early days of
                                               the web.


                                         I have a theory that as web
                                         consolidation has progressed and the
                                         intelligence level of the average
                                         contribution continues to be
                                         diluted, we're back in a regime
                                         where it might be a winning
                                         strategy to do collections of
                                         human-curated links.

                                         To get *serious* material on a
                                         subject, you can probably write
                                         down a list of a few dozen sites to
                                         start.  The collections of material
                                         in some of those places may well be
                                         massive, but it's bound to be more
                                         tractable than spidering the entire
                                         web, and more to the point, the
                                         material is going to be *already
                                         indexed* in many cases.  Collating
                                         search results from the search
                                         features at individual sites would
                                         get you a lot of the way there.
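
                                         A rough sketch of the collating
                                         step, under heavy assumptions:
                                         each site's search feature is
                                         represented by a stand-in function
                                         that returns a ranked list of URLs.
                                         The function names and URLs are
                                         hypothetical; real sites would each
                                         need their own query code.  Results
                                         are interleaved round-robin and
                                         duplicates are dropped.

```python
from itertools import zip_longest


def collate(query, site_searches):
    """Interleave per-site results round-robin, dropping duplicate URLs.

    site_searches maps a site label to a function that takes a query
    string and returns that site's results as a ranked list of URLs.
    """
    per_site = [search(query) for search in site_searches.values()]
    seen = set()
    merged = []
    # Take the first hit from every site, then the second, and so on.
    for rank_row in zip_longest(*per_site):
        for url in rank_row:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Stand-ins for real site search features (hypothetical names and URLs).
def search_site_one(query):
    return ["https://one.example/a", "https://shared.example/x"]


def search_site_two(query):
    return [
        "https://shared.example/x",
        "https://two.example/b",
        "https://two.example/c",
    ]


results = collate("backlinks", {"one": search_site_one, "two": search_site_two})
```

                                         Round-robin is just the simplest
                                         way to merge rankings that aren't
                                         comparable across sites; the human
                                         judgment lives in the choice of
                                         which sites go on the list.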

                                                      Consider:

                                                        Google Scholar
                                                        Blekko

                                                And there's a need
                                                for an end-run
                                                around Wikipedia's
                                                "nofollow" policy.


   p. 17:

   "Page, a child of academia, understood that
   web links were like citations in a scholarly
   article.  It was widely recognized that you
   could identify which papers were really
   important without reading them-- simply tally
   up how many other papers cited them in notes
   and bibliographies."
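
   The tally described in the quote can be sketched in a
   few lines.  This is a plain citation count, not
   PageRank, and the paper names and the tally_citations
   helper are invented for illustration.  It assumes each
   paper's bibliography is available as a list, counts
   each citing paper once, and ignores self-citations.

```python
from collections import Counter


def tally_citations(bibliographies):
    """Count how many distinct other papers cite each paper.

    bibliographies maps a paper to the list of papers it cites.
    """
    counts = Counter()
    for paper, cited in bibliographies.items():
        # A set removes repeated citations; drop any self-citation.
        counts.update(set(cited) - {paper})
    return counts


# Hypothetical bibliographies; the paper names are made up.
bibliographies = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_a", "paper_c"],
    "paper_c": [],
    "paper_d": ["paper_c", "paper_c"],
}
counts = tally_citations(bibliographies)
ranked = counts.most_common()
```

   Note that paper_d, which nobody cites, simply scores
   zero, whatever its quality.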


       You can use this to identify "importance",
       but there isn't any way to use this to
       find unfairly ignored high-quality work.

       And people writing academic papers are
       well aware that they have to be
       careful to cite predecessors *when
       they're already regarded as                      SUPERCONDUCTING_STATE
       important*.  These are people who may
       well be critical for your own career;
       you don't want to offend them.

           Citation indexing is a guide to quality
           that relies on the intellectual integrity
           of the human beings publishing the
           research...


           And this problem is even worse for the
           web, where identities remain slippery,
           and motivations are often even more
           corrupt-- political operatives and
           commercial shills abound, in addition
           to outright crazies, and the pathetic 
           sabotage efforts of trolls.


                              ENGINE_TROUBLE


           The idea that you can *automatically*
           navigate this chaff to find the             And there's a bad
           true gold looks increasingly foolhardy.     problem that follows
                                                       from this one: what
                                                       *point* is there in
                                                       creating a new work
                                                       that you know will be
                                                       effectively invisible?








