Feb 06 2014

Google Image Search (web archive) image 1MAZBVC:


  1. MRR776:  Kindly read & follow WHAT, or else reply explaining why, ideally via comments here.
  2. N0L8VC:  If you aren’t already participating (accessing the archive) but want to, reply via a comment here; write access is limited.

-end of WHAT TO DO


  1. N0L9FX:  This documentation is at least ~70% complete, possibly more.
  2. N0L7VS:  Stats (as of 2014.02)
    1. N0N0D1:  used by about 6 Google accounts, which include virtually all of the OCAndroid leadership; I (Destiny) use it daily.
    2. N0L7W9:  in use now: ~3 years
    3. N0L8KF:  “4555 Files”, nearly every one a manually created archive of a web page; that’s a lot of web pages!
    4. N0L8S0:  “.com” websites (url_archive_MAXUKI/com_): qty 79, including the biggest, “/Meetup_”
    5. N0L8MW:  Size:  “459 MB”
    6. N0N1AX:  money cost: $0 (as the size is well under Google Drive’s free storage quota)
    7. N0L8P3:  Known data loss: 0!
  3. N0L4U1:  Does “community archiving of the content of URLs, especially web pages” fairly easily & fairly reliably.
    1. N0L6PN:  fairly easily
        1. N0L6QE:  Each version of content is saved manually, almost always via the web browser’s “Save As” function; no special software is required.
          1. N0L7ZV:  Requires the excellent Insync or an equivalent Google Drive client.
        2. N0MZ30:  once you know how it works, takes (only) ~30 seconds.
      2. N0L6RX:  automatic archiving planned; prior system had it.
      3. N0MWMW:  A TOP CON: On any day you use Google Drive (as by this), you must work around Google Drive’s dangerous version-retention policy; I currently know only manual method(s), so that’s what I do.
      4. N0L70R:  A TOP CON: Writers must NEVER delete any content in the archive, most especially content others created or may be referring to.
        1. N0L74B:  ways to automatically ensure this
          1. N0L768:  no known way now, so
            1. N0LADP:  writers
              1. N0L76V:  are limited to a few (<~10 trusted associates)
              2. N0L77D:  usually given write privileges on just the folders they need until they are well trusted here
              3. N0L798:  must exercise great caution here
      5. N0LAQ7:  Archiving a web page you are about to change and/or have just changed:
        1. N0LAS6:   This is seemingly the biggest use of this archive.
        2. N0LAU0:  General procedure, in order:
          1. N0LAUO:  content-before archiving: Just before you make any changes to any page, archive the page, unless you’re making no edits, only pure additions (notably only adding comments or Greets).
            1. N0LBWB:  This protects against losing anyone else’s work as well as yours, and lets you revert when needed.
          2. N0LBXY:  Edit session: make your edits, meanwhile starting content-after archiving.
          3. N0LC08:  content-after archiving: about every 15 minutes, interrupt whatever you’re doing and archive the page again if certain conditions hold:
            1. N0MXJF:  the conditions are, basically, 1st right before you might otherwise lose significant work, and 2nd before anyone else might make a response change; specifically, if 1 or more of the following is true:
              1. N0MXEX:  your edits/additions so far are visible and might prompt anyone else to make response changes, including additions but especially any removals.
              2. N0MXYL:  the page has changes of yours not yet archived, and at least 1 of the following is true, most timely first:
                1. N0MY0T:  You are about to try making a non-trivial change to the page that may need to be undone.
                2. N0MXFP:  you’ve put in at least about 1 hour of work on these unarchived changes
                3. N0MXL0:  it’s approaching midnight, or it’s been at least 1 day since your last archive of the page.
                  1. N0MY9S:  This finishes the content-after archiving for this edit session.
            2. N0LCC8:  For each edit session, this extra step results in 2 or sometimes more archives, when in an ideal world one would need just the first archive,
              1. N0MYFX:  which is annoying (though storage space is generally cheap) but needed, as it’s NOT an ideal software world.
              2. N0MYG8:  Might be well fixed by software that goes through the history of changes and removes duplicate (or else similar) saved versions, while still somehow keeping a note that the content was sampled on that later date and hadn’t changed (or else noting just the amount of change).
                1. N0MYKW:  Possible with Google Drive, though perhaps not well without a new GUI, as it allows intermediate revisions to be deleted and, as I recall, metadata notes on the file.
        3. N0L7E6:  Archiving a never-before-edited Meetup event listing that Meetup generated automatically (so part of an event series) requires a few extra special steps.
          1. N0L11X: With every auto-generated (notably auto-series) Meetup event listing,
            1. N0L1F3: the page initially has a URL whose event “#” contains no digits, indeed is all letters, as here: www.meetup.com/OCAndroid/events/qlkqkfysdbrb
              1. N0L1GO: This URL is temporary: if the event never happens (as when the series is changed/canceled), and sometimes seemingly spontaneously after a few days, Meetup’s software will drop this URL,
                1. N0L1GZ: possibly redirecting it to another URL, reusing it for another event listing, or making it a bad URL; I haven’t found the pattern.
                2. N0L1H9: bottom line: it can’t be relied on, and I’ve found only reasons NOT to archive under this URL NAME (as opposed to the content).
              2. N0L1J8: the moment such a listing is edited in the slightest (not just a description/date/name edit, but even a comment or an RSVP):
                1. N0L1OH: A permanent URL is generated for the event, ending in all digits, as here: http://www.meetup.com/OCAndroid/events/164406282/comments/308438252
                2. N0L1ON: The prior temp URL is set to redirect to this permanent URL for at least a few days, but after that the redirect ends and the temp URL is possibly reused.
            2. N0L1P4: so to 1st archive such a page,
              1. N0MZSB:  do in order:
                1. N0L1QQ: Make a slight change;
                  1. N0L1UF: I recommend doing as I do: post an attendance thread for yourself, such as “DestinyArchitect ATTENDANCE & REVIEW OF THIS EVENT –post all on that here in this thread<br/>*I am now starting edits of this listing(also generating a permanent URL)<br/>*I plan to attend”
                  2. N0L1VC: This then instantly creates the permanent URL; be sure to do a browser page refresh to see it.
                2. N0L1U0: archive the page as normal.
              2. N0L1YR: If this was not done in advance, so the page got archived under temporary URL(s)
                1. N0L7OG:  as happened for Martin
                  1. N0L1W8: Before any edits to it, it appears you did the _almost always_ right thing and archived it, saving it to a path matching its URL, url_archive_MAXUKI/com_/meetup_/www_/OCAndroid/events/qlkqkfysdbrb, creating https://drive.google.com/#folders/0B1iBaZhjEYO4ZnI5S0V6RjhIQzA
                  2. N0L7P3:  This motivated me to create this section, indeed (finally) this entire article.
                2. N0MZUO:  do in order:
                  1. N0L7QC:  create the proper folder for the permanent URL
                  2. N0L7QU:  for every temporary-URL folder,
                    1. N0L7SD:  add into it a Unix symbolic link or Windows shortcut, named say “See instead”, which points to the permanent URL folder
                    2. N0L7U4:  move & merge all its contents, except this redirect link, into the permanent URL folder.
    2. N0L6Q5:  fairly reliably
      1. N0L7BY:  per the low data loss shown in the stats above.
  4. N0L599:  uses (archives into) “url_archive_MAXUKI: a Google Drive folder”
    1. N0L60S:  URL to folder-path conversion:
      1. N0L5W0:  Real example: http://meetup.com/OCAndroid/events/164406282 is archived into folder url_archive_MAXUKI/com_/meetup_/www_/OCAndroid/events/164406282/
      2. N0L61C:  each domain component ends with “_”, and the components are reversed into biggest-first order; example: “www.meetup.com” becomes “com_/meetup_/www_”
      3. N0L66P:  Nearly every component of the URL except the protocol (http, https, ftp) has its own folder, in the order it occurs in the URL (except for domain components, as above).
      4. N0L67O:  Each variable setting of the URL uses the “&” prefix even if it’s the first setting: “?name=john” and “&name=john” are both represented as “&name=john”
      5. N0L6EY:  Each saved content of a given URL
        1. N0L6G5:  goes in that URL’s own folder (named after that URL)
        2. N0L6HJ:  is saved under a file name that:
          1. N0L6IB:  gives a unique ID
          2. N0L6J4:  if HTML, tells
            1. N0L6KG:  how the content was saved, as “HTML[ Only]” or “Complete” or some others
              1. N0LCHE:  “HTML only”
                1. N0LCK0:  should be used unless it’s proven here not to work and another method (such as “Complete”) is justified despite the much greater space it takes (plus is shown to work)
                2. N0LCKA:  works perfectly for all of Meetup.com, especially since Meetup wonderfully never seems to delete the content its web pages internally link to (pics, CSS files, JavaScript, etc).
            2. N0L6KQ:  browser used to save it, as “Chrome”, “FF”, “IE”
          3. N0L6LO:  notably does NOT tell:
            1. N0L6M4:  Any part of the URL (already covered by its folder path)
            2. N0L6MP:  Any of the content, such as the page title (as that’s already in the content and easily & instantly findable via folder content search)
            3. N0LAWV:  The date of the archive (unless it’s a snapshot not to be edited), as that is given by the Google Drive/Subversion history.
        3. N0LB22:  each archive content file
          1. N0LB2O:  if NOT a snapshot (the usual case)
            1. N0LB3D:  is to be overwritten with the latest content, but only when it’s OK:
              1. N0LB5T:  only once (check for this!) the client (Insync/TortoiseSVN) reports that all present content has been successfully archived/checked in, which it shows via a green (not red or blue) checkmark on the file in the OS’s normal file explorer; otherwise you will permanently overwrite, and so lose, the last (unarchived) contents of the file
            2. N0LBB5:  then Google Drive/Subversion will still have the prior version & generally all prior versions,
              1. N0LBK1:  which you can access read-only (and also delete, so be careful!)
                1. N0LBKW:  NOT via Insync or an equivalent client (currently)
                2. N0LBLK:  but via http://drive.google.com: find the file there, then right-click > Manage Revisions.
      6. N0L8X6:  several more to be documented here.
    2. N0L5A2:  Mostly contains public info
    3. N0L5B1:  Can & sometimes does contain private info
      1. N0L5BE:  via, say,
        1. N0L5DO:  past versions which are to be kept private
        2. N0L5E2:  a folder or file whose public visibility has been turned off
        3. N0L5K2:  content archived here on the assumption that only those with the link will find it.
  5. N0L52B:  This folder and its contents (unless individually overridden) have the access setting “anyone with a link”,
    1. N0L55S:  allowing read-only access by anyone who is simply given the link;
    2. N0L57F:  because of this, and because some content’s privacy (as of past versions kept private) depends on the link not being found, only the URLs of {low-level aka deep} folders may be posted on the public web; posting any higher-level folder URL, most extremely the root folder URL, would endanger privacy through the amount of content it exposes to the public, especially since public search engines may crawl it.
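
The content-after archiving conditions (N0LC08 through N0MXL0) can be read as a small decision rule, checked about every 15 minutes. The Python sketch below is one hedged reading of them, not a tool this archive actually uses: the `EditState` fields and the function name are hypothetical, “approaching midnight” is assumed to mean the last 15 minutes of the day, and it assumes there is nothing to do when all your changes are already archived.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EditState:
    """Hypothetical snapshot of one edit session (field names are illustrative)."""
    has_unarchived_changes: bool      # edits made since the last archive of the page
    edits_visible_to_others: bool     # N0MXEX: others might respond, even by removals
    about_to_make_risky_change: bool  # N0MY0T: a non-trivial change that may need undoing
    unarchived_work: timedelta        # N0MXFP: time put into the unarchived changes
    last_archive: datetime            # when the page was last archived


def should_archive_again(s: EditState, now: datetime) -> bool:
    """Return True if the page should be archived again now (N0MXJF)."""
    if not s.has_unarchived_changes:
        return False  # assumption: nothing new to protect
    if s.edits_visible_to_others:
        return True
    # Assumption: "approaching midnight" means the last 15 minutes of the day.
    approaching_midnight = now.hour == 23 and now.minute >= 45
    return (
        s.about_to_make_risky_change
        or s.unarchived_work >= timedelta(hours=1)
        or approaching_midnight
        or now - s.last_archive >= timedelta(days=1)
    )
```

For example, 20 minutes of unarchived work at noon, an hour after the last archive, gives False; the same state with a full hour of work gives True.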
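
The clean-up idea in N0MYG8 (drop duplicate saved versions but keep a note of when the unchanged content was sampled) could look roughly like the Python sketch below. It is hypothetical: it works on an in-memory list of (timestamp, content) revisions rather than on Google Drive itself, and it collapses only exact consecutive duplicates, not merely similar ones.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class KeptRevision:
    saved_at: str   # timestamp of the revision that is kept
    digest: str     # SHA-256 of its content
    unchanged_on: List[str] = field(default_factory=list)  # later samples, same content


def collapse_duplicates(revisions: List[Tuple[str, bytes]]) -> List[KeptRevision]:
    """Keep the first of each run of identical consecutive revisions.

    Later identical revisions are not kept as copies; only their timestamps are
    noted, recording that the content was sampled then and had not changed.
    """
    kept: List[KeptRevision] = []
    for saved_at, content in revisions:  # assumed oldest first
        digest = hashlib.sha256(content).hexdigest()
        if kept and kept[-1].digest == digest:
            kept[-1].unchanged_on.append(saved_at)
        else:
            kept.append(KeptRevision(saved_at, digest))
    return kept
```

For example, three revisions whose first two are identical collapse to two kept revisions, the first of which notes the second one's timestamp.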

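The temporary vs. permanent Meetup event URLs described under N0L7E6 can be told apart by the event “#”: all letters means temporary, all digits means permanent. A minimal Python sketch, assuming the event “#” is always the path segment right after “/events/” (the function name is hypothetical):

```python
import re

# Assumption: the event "#" is the path segment right after "/events/".
_EVENT_ID = re.compile(r"/events/([^/?#]+)")


def meetup_event_url_kind(url: str) -> str:
    """Classify a Meetup event URL as 'temporary', 'permanent', or otherwise."""
    match = _EVENT_ID.search(url)
    if match is None:
        return "not-an-event"
    event_id = match.group(1)
    if re.fullmatch(r"[0-9]+", event_id):
        return "permanent"   # N0L1OH: all digits, safe to archive under
    if re.fullmatch(r"[A-Za-z]+", event_id):
        return "temporary"   # N0L1F3: all letters, may be dropped or reused
    return "unknown"
```

A result of “temporary” is the cue to make a slight change first (N0L1P4) so that the page gets archived under its permanent URL.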
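The clean-up steps for a page already archived under temporary URL(s) (N0MZUO) could also be scripted against the locally synced copy of the archive. This Python sketch is an assumption-laden illustration, not an existing tool: the function name is hypothetical, it uses a Unix symbolic link (on Windows `os.symlink` may need extra privileges, and a shortcut would be used instead), and rather than merging clashing files or folders it stops before moving anything, since archive content must never be deleted or overwritten (N0L70R).

```python
import os
import shutil
from pathlib import Path

REDIRECT_NAME = "See instead"  # link name suggested in N0L7SD


def merge_into_permanent(temp_dir: Path, perm_dir: Path) -> None:
    """Move a temporary-URL folder's contents into the permanent-URL folder."""
    # N0L7QC: create the proper folder for the permanent URL.
    perm_dir.mkdir(parents=True, exist_ok=True)
    items = [p for p in sorted(temp_dir.iterdir()) if p.name != REDIRECT_NAME]
    # Never overwrite archive content: refuse if any name already exists there.
    clashes = [p.name for p in items if (perm_dir / p.name).exists()]
    if clashes:
        raise FileExistsError(f"already in {perm_dir}, merge by hand: {clashes}")
    # N0L7SD: add a "See instead" link pointing at the permanent URL folder.
    link = temp_dir / REDIRECT_NAME
    if not link.is_symlink():
        os.symlink(perm_dir.resolve(), link, target_is_directory=True)
    # N0L7U4: move everything except the link into the permanent folder.
    for item in items:
        shutil.move(str(item), str(perm_dir / item.name))
```

Afterward the temporary-URL folder holds only the “See instead” link, so anyone following an old reference to it is pointed at the permanent folder.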
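The URL to folder-path conversion rules (N0L60S) can be sketched in Python as below. This is an illustration under assumptions, not the conversion actually used: the function name is hypothetical; it keeps the host exactly as given (so, unlike the real example in N0L5W0, “meetup.com” without “www.” gets no “www_” folder); it drops any port and “#” fragment; it gives each query variable setting its own folder; and it does not cover the further rules still to be documented (N0L8X6).

```python
from urllib.parse import urlsplit

ARCHIVE_ROOT = "url_archive_MAXUKI"


def url_to_folder(url: str, root: str = ARCHIVE_ROOT) -> str:
    """Map a URL to its archive folder path, per the N0L60S rules (sketch)."""
    parts = urlsplit(url)  # the protocol (scheme) gets no folder
    # Domain components are reversed into biggest-first order, each ending in "_".
    host = parts.hostname or ""
    domain = [label + "_" for label in reversed(host.split(".")) if label]
    # Every path component gets its own folder, in URL order.
    path = [segment for segment in parts.path.split("/") if segment]
    # Each variable setting gets the "&" prefix, even the first ("?name=john").
    query = ["&" + setting for setting in parts.query.split("&") if setting]
    return "/".join([root, *domain, *path, *query]) + "/"
```

For example, http://www.meetup.com/OCAndroid/events/164406282 maps to url_archive_MAXUKI/com_/meetup_/www_/OCAndroid/events/164406282/, the folder shown in N0L5W0.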
-end of WHAT


  1. N0L4JT:  Created initially to archive http://Meetup.com content since Meetup.com
    1. N0L4P0:  doesn’t keep past versions of almost any of its content (unlike a wiki)
    2. N0L4PA:  encourages multi-person editing
    3. N0L4PZ:  effectively encourages users to readily destroy each other’s content (as it’s trivial for someone to delete a comment or Greet someone else has written).
    4. N0L8YO:  appears to block archiving by Archive.Org
  2. N0L4RQ:  Extremely useful for archiving all sorts of web pages.
  3. N0L4UX:  Archive.Org archives only every 3 to 6 months, on its own schedule, which usually isn’t frequent enough to work.
  4. N0L4SZ:  I use it as my entire method to archive content of URLs

-end of WHY


  1. N0N16X:  See N0L7W9

-end of WHEN


  1. N0N18F:  anywhere with web access.

-end of WHERE


  1. MNLAB7:  See WHAT

-end of COST


  1. N0N1FX:  Now: see N0N0D1; possible: see N0L8VC.



  1. .



  1. N0L839:  Replaces the prior version, JIT_archive_LJNZCF at http://0.JotHere.com
    1. N0L888:  by storage mechanism:
      1. N0L933:  using Google Drive: mostly dramatically better (notably easier), but with some real dangers of data loss that Google could fix if it bothered,
      2. N0L93Q:  instead of Subversion, which is
        1. N0L949:  Painful to set up & learn, so generally used only by programmers or very tech-savvy users
        2. N0L96H:  Painful to use: you’re routinely interrupted to stop & make a check-in or else suffer data loss (or you use continual WebDAV check-in, but that easily gets out of hand, permanently using up gobs of storage)
        3. N0L9AD:  Very hard & tricky whenever folders are moved & renamed, which corrupts the checkout, as semi-regularly happens
        4. N0L99E:  Generally impossible to delete wasted space from unneeded versions
        5. N0L9A2:  Much safer in terms of preventing accidental deletions.
    2. N0L8C7:   similar but improved URL to folder-path conversion
    3. N0MZ6C:  For Subversion I had developed shell scripts for auto-archiving; not yet for Google Drive.



  1. .


MDE167: POST ADDITIONAL TODO, roughly in order:

  1. .

-end of POST TODO


  1. MEMPEO: The author

  2. MEMPF1: No one else unless attributed.

-end of CREATORS


  1. .



  1. .


M31R7R: POST HISTORY, in order:

  1. N0L47O:  Motivated by N0L7P3
  2. N0L438:  I (Destiny) created this post by copying to a new draft (of http://1.JotHere.com/4250#N0L3UI (latest)), then gave it fresh IDs & content.
  3. N0L9GK:  drafted 1st version, ~70% complete; usable.
  4. N0L9O7:  add to category q(0.JotHere.com LXK5HE) q(Google Drive M8SLXW)
  5. N0L9SY:  in category q(of DestinyArchitect MA0YLR) add category:
    1. N0L9TB:  q(by DestinyArchitect N0L9TB)
      1. N0L9VW:  q(DestinyArchitect creation N0L9VW)
  6. N0L9ZA:  in category q(www=World Wide Web N0C2LO) add category:
    1. N0LA1L:  q({www=web} archive N0LA1L)
  7. N0LA5H:  image cnt: 0 to 1;
  8. N0LBNX:  additions as N0LB22 and N0LAQ7; some fixes
  9. N0LCTQ:  first published; pst2014.02.06Thu1228.
  10. N0MWUT:  Undid main URL lack of 4255 realizing that would likely create problems with copying it; N0MWMW: added;  N0LAU0:  drastically improved;  N0MZ6C: added
  11. N0N04E:  #N0L7E6: convert from pre-tag to std outline, including doing the ExWeb regex replace fr({[0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]}=) to(<sup><a id="\1" class="aself_KEP2FG">\1</a>:</sup> );
  12. N0N133:  N0L7VS: move in WHAT from near bottom to near top as a relevant sell/no-sell point;  N0N16X, N0N18F, N0N1FX: added; M33YGV: replaced links with links to sections; pst2014.02.07Fri1024.