The Grumpy Editor mangles some web pages
LWN looked at greasemonkey back in March. Greasemonkey is a powerful tool, but it requires that the user write scripts to perform the edits; it's also a heavyweight tool for one-time page tweaks. So your editor decided to look at some of the other tools which are available. Thanks to the Firefox extension architecture, there is a wealth of tools out there for would-be page manglers.
Your editor's first stop was aardvark, an extension which,
unlike most others, is not found on the mozdev.org site. Aardvark is a
tool optimized for examination of web pages, and the deletion of items from
those pages.
Aardvark lurks during normal browsing, only making itself visible when the "start aardvark" item is chosen from the right-button context menu. Thereafter, the HTML element containing the pointer will be highlighted; picking the interesting portion of the page is simply a matter of moving the pointer there and, possibly, using "w" to "widen" the scope to larger, containing elements. Once the element of interest is chosen, it is a matter of a keystroke to remove it from the page, blank it out, perform some simple formatting changes, or view the HTML source. The source viewer is a nice touch; it enables easy examination of a specific part of a page which might otherwise be hard to find among the kilobytes of junk that modern editors and content management systems dump into pages.
What aardvark lacks, first of all, is any sort of help facility. The user must simply memorize a dozen or so keystrokes, or keep a pointer to the help information available. There is also no way to make changes permanent. So aardvark can be useful for one-time tweaks (for example, to print a page without wasting sheets of paper on unrelated junk), and as a nicer sort of "view source" function. It is not helpful for making permanent changes, however.
Platypus is an on-the-fly editor
which is very similar to aardvark, but which appears to be somewhat more
advanced in some areas. For starters, platypus has a help screen for
people who cannot remember the keyboard shortcuts. The selection of HTML
elements is very similar to aardvark, except that the arrow keys are used:
Platypus explicitly recognizes the tree structure of web pages, and uses
arrows to move up and down the tree, or to "sibling" elements (stepping
across columns in a table, for example).
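The tree walking described above can be sketched in a few lines. This is only an illustration of the idea, not platypus's (or aardvark's) actual code: it uses Python's standard xml.dom.minidom module in place of a browser DOM, and the page fragment, its ids, and the helper names widen, narrow, and step are all made up for the example.

```python
from xml.dom import minidom

# A made-up page fragment; the ids exist only to make the demo readable.
PAGE = (
    "<html><body><table><tr>"
    "<td id='nav'>links</td><td id='story'>text</td><td id='ads'>junk</td>"
    "</tr></table></body></html>"
)

def is_element(node):
    """True for element nodes; false for text nodes, the document, or None."""
    return node is not None and node.nodeType == node.ELEMENT_NODE

def widen(node):
    """Move up to the containing element; stay put at the root element."""
    parent = node.parentNode
    return parent if is_element(parent) else node

def narrow(node):
    """Move down to the first child element; stay put if there is none."""
    child = node.firstChild
    while child is not None and not is_element(child):
        child = child.nextSibling
    return child if child is not None else node

def step(node, forward=True):
    """Move to the next (or previous) sibling element; stay put at the edge."""
    sib = node.nextSibling if forward else node.previousSibling
    while sib is not None and not is_element(sib):
        sib = sib.nextSibling if forward else sib.previousSibling
    return sib if sib is not None else node

doc = minidom.parseString(PAGE)
selected = doc.getElementsByTagName("td")[0]  # start on the "nav" cell
selected = step(selected)                     # sideways: the "story" cell
selected = widen(selected)                    # up: the enclosing <tr>
```

Each helper returns the current node when there is nowhere to go, so a stray keystroke at the edge of the tree leaves the selection where it was.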
Platypus can do a number of things which aardvark can't. It can relocate elements on the page, should you like things organized in a different way. So it can be used to rearrange navigation links, or put seldom-useful stuff at the bottom of the page. There is a simple CSS editor which can be used to reformat things or change their colors. And, for advanced users, there is a regular expression-based HTML editor which can make no end of changes.
Perhaps the key feature behind platypus, however, is used at the end: once you have mangled a web page to your satisfaction, a keystroke turns all of the edits into a greasemonkey script. Install that script, and the changes become permanent.
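To make the idea concrete, here is a rough sketch of how a list of recorded edits might be turned into a user script. The "==UserScript==" metadata block with @name and @include is greasemonkey's own convention; everything else (the edit tuples, the make_user_script function, and the JavaScript helper names) is an assumption made for illustration, and what platypus really emits is not shown here.

```python
import json

# JavaScript helpers placed at the top of every generated script.
# The helper names are hypothetical; document.evaluate is the standard DOM XPath call.
HELPERS = """\
function find(xpath) {
  return document.evaluate(xpath, document, null,
      XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
}
function removeElement(xpath) {
  var el = find(xpath);
  if (el && el.parentNode) el.parentNode.removeChild(el);
}
function setStyle(xpath, css) {
  var el = find(xpath);
  if (el) el.setAttribute("style", css);
}
"""

def make_user_script(name, include, edits):
    """Turn recorded edits into the text of a user script.

    Each edit is a tuple: ("remove", xpath) or ("style", xpath, css).
    """
    lines = [
        "// ==UserScript==",
        f"// @name     {name}",
        f"// @include  {include}",
        "// ==/UserScript==",
        "",
        HELPERS,
    ]
    for edit in edits:
        kind, xpath = edit[0], edit[1]
        # json.dumps yields a quoted, escaped string that is also valid JavaScript.
        if kind == "remove":
            lines.append(f"removeElement({json.dumps(xpath)});")
        elif kind == "style":
            lines.append(f"setStyle({json.dumps(xpath)}, {json.dumps(edit[2])});")
        else:
            raise ValueError(f"unknown edit type: {kind!r}")
    return "\n".join(lines) + "\n"

script = make_user_script(
    "demo cleanup",
    "http://example.com/*",
    [("remove", "//div[@id='ads']"), ("style", "//h1", "color: red")],
)
```

Recording edits as data and generating the script at the end keeps the interactive editing separate from the permanent result, which is the division of labor described above.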
The biggest downside to platypus, perhaps, is that its source viewer is nearly unusable. Instead of aardvark's nice, hierarchical display, platypus gave your editor a window with everything in one long line of text.
The final stop on this tour is rip, which stands for "remove it permanently." As its name would suggest, rip has a very specific mission: allow the user to select web page elements, rip them out of the page, and never see them again. It cannot perform all of the functions of either aardvark or platypus, but it is effective at what it does do.
Rip's core interface is simple: put the pointer over an undesired web
element, press the right button, and select "remove it permanently" from
the resulting context menu. The affected area will be briefly highlighted
when the pointer reaches the menu item, before it is selected. Rip could benefit
from the more developed mechanisms for selecting elements seen in aardvark
and platypus; it can be hard to communicate to rip exactly what you want to
get rid of.
First-time users may be surprised to learn that rip, when installed, includes "rips" for several popular sites, including Slashdot, BoingBoing, and Wired. There is a wiki page available to host rips created by other users; it probably would be best to put all of them there, and not mess with specific pages without the user's acknowledgment. That said, rip seems like a useful tool for quick simplification of web pages.
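The notion of a stored "rip" applied on every visit can be sketched as a table of per-site removal rules. This is a guess at the general shape, not rip's real storage format: the RIPS table, the host name, the element ids, and the apply_rips function are all hypothetical, and Python's xml.dom.minidom stands in for the browser's DOM.

```python
from urllib.parse import urlparse
from xml.dom import minidom

# Hypothetical rule store: host name -> ids of elements to remove on every visit.
RIPS = {"news.example.com": ["banner", "sidebar-ads"]}

def apply_rips(doc, url, rips=RIPS):
    """Remove every element whose id is listed for this URL's host.

    Returns the number of elements removed.
    """
    wanted = set(rips.get(urlparse(url).hostname, []))
    removed = 0
    # Take a snapshot of the element list before mutating the tree.
    for el in list(doc.getElementsByTagName("*")):
        if el.getAttribute("id") in wanted and el.parentNode is not None:
            el.parentNode.removeChild(el)
            removed += 1
    return removed

page = minidom.parseString(
    "<html><body><div id='banner'>ad</div><p id='story'>text</p></body></html>"
)
count = apply_rips(page, "http://news.example.com/article/1")
```

Keying the rules on the host name is what would let a tool ship with rules for particular sites; it is also why a user might want to see that list before it is applied.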
Which tool would a grumpy editor, made even grumpier by the user-hostile
features of certain web sites, use? Rip is a lightweight tool for quick
removal of unwanted web cruft, but it lacks flexibility and ease of use.
The future in this space almost certainly belongs to the combination of a
powerful script-based facility (like greasemonkey) and a nicer
front end - platypus, for now. With tools like these, control of the web
is moving closer to where it belongs: with the people actually trying to
read all that content.
The Grumpy Editor mangles some web pages
Posted Jun 30, 2005 9:26 UTC (Thu)
by james (subscriber, #1325)
[Link] (4 responses)
Blinking text seems awfully archaic and old-fashioned in these days of flash and javascript atrocities, but we had to manage to get annoyed at the technology that was available at the time; you youngsters won't understand.
Jon, you've done it again. That has to be the funniest thing I've read all week. Thanks!
James.
The Grumpy Editor mangles some web pages
Posted Jun 30, 2005 16:41 UTC (Thu)
by madscientist (subscriber, #16861)
[Link] (3 responses)
Agree... that was worth the coke-drenched keyboard :-)
The Grumpy Editor mangles some web pages
Posted Jun 30, 2005 18:47 UTC (Thu)
by sbergman27 (guest, #10767)
[Link] (2 responses)
Jonathan's dry humor is always a delight. Though my favorite is still his comment about the Debian developer's "predicatable" reaction to the "Hot Babe" applet's graphics files.
The Grumpy Editor mangles some web pages
Posted Jun 30, 2005 18:53 UTC (Thu)
by sbergman27 (guest, #10767)
[Link] (1 responses)
Err.. that was supposed to be "predictable". Any chance of allowing edits in the forum to correct embarrassing spelling and grammatical errors? Or maybe a spell check?
The Grumpy Editor mangles some web pages
Posted Jul 1, 2005 8:25 UTC (Fri)
by mwh (guest, #582)
[Link]
My favourite Corbet-ism is the description of debian as "Excruciatingly free" :)
The Grumpy Editor mangles some web pages
Posted Jun 30, 2005 12:45 UTC (Thu)
by lwn163 (guest, #11797)
[Link]
I find that elinks (or insert your favorite text-browser here)
is an excellent mechanism for non-annoying viewing. :)
DMCA, DRM
Posted Jun 30, 2005 16:55 UTC (Thu)
by martinfick (guest, #4455)
[Link] (1 responses)
Sorry for being pessimistic, but how long before someone gets sued under
the DMCA for these tools? How long before publishers start baking up DRM
schemes for web pages?
DMCA, DRM
Posted Jul 1, 2005 6:34 UTC (Fri)
by man_ls (guest, #15091)
[Link]
Very long, hopefully. Firefox is very polite in that respect: e.g. if I block all images from adserve.advertising.ad, it still downloads every individual image, but then they are blanked on the page. This way the publisher never notices the filter, and cannot really complain. You could argue that click-through rates are worse, but the opposite is true: I tend to click on white areas to change focus, but I never click on banners.
Do these plugins do the same, i.e. download all page elements and just not display some of them? This way we all win at the cost of a little extra bandwidth.
The Grumpy Editor mangles some web pages
Posted Jul 2, 2005 3:16 UTC (Sat)
by shlomif (guest, #11299)
[Link]
It's interesting to learn that Aardvark has these capabilities. So far I've been using it mostly as a kind of debugger for HTML pages, one that lets me see which elements are present at a certain place and what their CSS classes are.