
The Grumpy Editor mangles some web pages [LWN.net]

The Grumpy Editor mangles some web pages

As long as there have been web pages, there have been web page annoyances. Back in the early days, it was <blink> tags. Blinking text seems awfully archaic and old-fashioned in these days of flash and javascript atrocities, but we had to manage to get annoyed at the technology that was available at the time; you youngsters won't understand. Back in those days, the technology for annoyance mitigation was also limited; we had to rely upon special-purpose web proxy processes and other unwieldy hacks.

LWN looked at greasemonkey back in March. Greasemonkey is a powerful tool, but it requires that the user write scripts to perform the edits; it's also a heavyweight tool for one-time page tweaks. So your editor decided to look at some of the other tools which are available. Thanks to the Firefox plugin architecture, there is a wealth of tools out there for would-be page manglers.

Your editor's first stop was aardvark, an extension which, unlike most others, is not found on the mozdev.org site. Aardvark is a tool optimized for examination of web pages, and the deletion of items from those pages.

Aardvark lurks during normal browsing, only making itself visible when the "start aardvark" item is chosen from the right-button context menu. Thereafter, the HTML element containing the pointer will be highlighted; picking the interesting portion of the page is simply a matter of moving the pointer there and, possibly, using "w" to "widen" the scope to larger, containing elements. Once the element of interest is chosen, it is a matter of a keystroke to remove it from the page, blank it out, perform some simple formatting changes, or view the HTML source. The source viewer is a nice touch; it enables easy examination of a specific part of a page which might otherwise be hard to find among the kilobytes of junk that modern editors and content management systems dump into pages.
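The select, widen, and remove cycle described above can be illustrated with a small sketch. This is not aardvark's code; the Node class and the widen() and remove() helpers are hypothetical names invented here, standing in for a real browser's element tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, as a browser would
class Node:
    """Hypothetical, minimal stand-in for an HTML element."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child element and return it."""
        child.parent = self
        self.children.append(child)
        return child


def widen(selected: Node) -> Node:
    """Move the selection to the containing element, if there is one."""
    return selected.parent if selected.parent is not None else selected


def remove(selected: Node) -> None:
    """Detach the selected element (and everything inside it) from the page."""
    if selected.parent is not None:
        selected.parent.children.remove(selected)
        selected.parent = None


# A tiny page: body > table > td > img
body = Node("body")
table = body.add(Node("table"))
cell = table.add(Node("td"))
banner = cell.add(Node("img"))

selection = widen(widen(banner))  # img -> td -> table
remove(selection)                 # the whole table is gone from body
```

Widening twice from the image lands on the table, so a single removal takes the banner and its surrounding layout with it.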

What aardvark lacks, first of all, is any sort of help facility. The user must simply memorize a dozen or so keystrokes, or keep a pointer to the help information available. There is also no way to make changes permanent. So aardvark can be useful for one-time tweaks (useful, for example, to print a page without wasting sheets of paper on unrelated junk), and as a nicer sort of "view source" function. It is not helpful for making permanent changes, however.

Platypus is an on-the-fly editor which is very similar to aardvark, but which appears to be somewhat more advanced in some areas. For starters, platypus has a help screen for people who cannot remember the keyboard shortcuts. The selection of HTML elements is very similar to aardvark, except that the arrow keys are used: Platypus explicitly recognizes the tree structure of web pages, and uses arrows to move up and down the tree, or to "sibling" elements (stepping across columns in a table, for example).
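Arrow-key navigation over a page tree might look roughly like the following sketch. It uses Python's standard xml.etree.ElementTree module on a well-formed fragment; the up(), down(), and sibling() helpers and the sample markup are assumptions made for illustration, not anything taken from platypus itself.

```python
import xml.etree.ElementTree as ET

PAGE = """
<table>
  <tr><td id="nav">links</td><td id="story">text</td><td id="ads">junk</td></tr>
</table>
"""

root = ET.fromstring(PAGE.strip())
# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}


def up(el):
    """Move to the containing element (stay put at the root)."""
    return parent_of.get(el, el)


def down(el):
    """Move to the first child element (stay put on a leaf)."""
    return el[0] if len(el) else el


def sibling(el, step):
    """Move `step` positions among the siblings, clamped at either end."""
    parent = parent_of.get(el)
    if parent is None:
        return el
    kids = list(parent)
    index = kids.index(el) + step
    return kids[max(0, min(index, len(kids) - 1))]


cell = down(down(root))             # table -> tr -> first td
print(cell.get("id"))               # nav
print(sibling(cell, +1).get("id"))  # story
print(up(cell).tag)                 # tr
```

Clamping at the ends of the sibling list is a design choice made for this sketch; a real tool could just as well wrap around.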

Platypus can do a number of things which aardvark can't. It can relocate elements on the page, should you like things organized in a different way. So it can be used to rearrange navigation links, or put seldom-useful stuff at the bottom of the page. There is a simple CSS editor which can be used to reformat things or change their colors. And, for advanced users, there is a regular expression-based HTML editor which can make no end of changes.

Perhaps the key feature behind platypus, however, is used at the end: once you have mangled a web page to your satisfaction, a keystroke turns all of the edits into a greasemonkey script. Install that script, and the changes become permanent.
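To give a feel for what "turning edits into a script" involves, here is a minimal sketch that writes out a user script from a list of removed element ids. The make_userscript() function, the script name, and the example.com URL are all hypothetical, and platypus's real output is certainly more elaborate; only the "==UserScript==" metadata block with its @name and @include keys follows the format greasemonkey expects.

```python
import json


def make_userscript(name, include, removed_ids):
    """Return the text of a user script that deletes the given element ids."""
    header = [
        "// ==UserScript==",
        f"// @name        {name}",
        f"// @include     {include}",
        "// ==/UserScript==",
    ]
    body = [
        # json.dumps yields a valid JavaScript array literal of strings
        f"var ids = {json.dumps(list(removed_ids))};",
        "for (var i = 0; i < ids.length; i++) {",
        "    var el = document.getElementById(ids[i]);",
        "    if (el && el.parentNode) {",
        "        el.parentNode.removeChild(el);",
        "    }",
        "}",
    ]
    return "\n".join(header + [""] + body) + "\n"


# Hypothetical edits recorded during a mangling session
script = make_userscript(
    "Less clutter", "http://example.com/*", ["sidebar", "banner"]
)
print(script)
```

The generated text would be saved with a .user.js name and installed through greasemonkey, after which the removals are replayed on every matching page.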

The biggest downside to platypus, perhaps, is that its source viewer is nearly unusable. Instead of aardvark's nice, hierarchical display, platypus gave your editor a window with everything in one long line of text.

The final stop on this tour is rip, which stands for "remove it permanently." As its name would suggest, rip has a very specific mission: allow the user to select web page elements, rip them out of the page, and never see them again. It cannot perform all of the functions of either aardvark or platypus, but it is effective at what it does do.

Rip's core interface is simple: put the pointer over an undesired web element, put down the right button, and select "remove it permanently" from the resulting context menu. The affected area will be briefly highlighted when the menu item is hit, but before it is selected. Rip could benefit from the more developed mechanisms for selecting elements seen in aardvark and platypus; it can be hard to communicate to rip exactly what you want to get rid of.

First-time users may be surprised to learn that rip, when installed, includes "rips" for several popular sites, including Slashdot, BoingBoing, and Wired. There is a wiki page available to host rips created by other users; it probably would be best to put all of them there, and not mess with specific pages without the user's acknowledgment. That said, rip seems like a useful tool for quick simplification of web pages.

Which tool would a grumpy editor, made even grumpier by the user-hostile features of certain web sites, use? Rip is a lightweight tool for quick removal of unwanted web cruft, but it lacks flexibility and ease of use. The future in this space almost certainly belongs to the combination of a powerful script-based facility (like greasemonkey) and a nicer front end, which means platypus for now. With tools like these, control of the web is moving closer to where it belongs: with the people actually trying to read all that content.



The Grumpy Editor mangles some web pages

Posted Jun 30, 2005 9:26 UTC (Thu) by james (subscriber, #1325) [Link] (4 responses)

Blinking text seems awfully archaic and old-fashioned in these days of flash and javascript atrocities, but we had to manage to get annoyed at the technology that was available at the time; you youngsters won't understand.

Jon, you've done it again. That has to be the funniest thing I've read all week. Thanks!

James.

The Grumpy Editor mangles some web pages

Posted Jun 30, 2005 16:41 UTC (Thu) by madscientist (subscriber, #16861) [Link] (3 responses)

Agree... that was worth the coke-drenched keyboard :-)

The Grumpy Editor mangles some web pages

Posted Jun 30, 2005 18:47 UTC (Thu) by sbergman27 (guest, #10767) [Link] (2 responses)

Jonathan's dry humor is always a delight. Though my favorite is still his comment about the Debian developer's "predicatable" reaction to the "Hot Babe" applet's graphics files.

The Grumpy Editor mangles some web pages

Posted Jun 30, 2005 18:53 UTC (Thu) by sbergman27 (guest, #10767) [Link] (1 responses)

Err.. that was supposed to be "predictable". Any chance of allowing edits in the forum to correct embarrassing spelling and grammatical errors? Or maybe a spell check?

The Grumpy Editor mangles some web pages

Posted Jul 1, 2005 8:25 UTC (Fri) by mwh (guest, #582) [Link]

My favourite Corbet-ism is the description of debian as "Excruciatingly free" :)

The Grumpy Editor mangles some web pages

Posted Jun 30, 2005 12:45 UTC (Thu) by lwn163 (guest, #11797) [Link]

I find that elinks (or insert your favorite text-browser here)
is an excellent mechanism for non-annoying viewing. :)

DMCA, DRM

Posted Jun 30, 2005 16:55 UTC (Thu) by martinfick (guest, #4455) [Link] (1 responses)

Sorry for being pessimistic, but how long before someone gets sued under
the DMCA for these tools? How long before publishers start baking up DRM
schemes for web pages?

DMCA, DRM

Posted Jul 1, 2005 6:34 UTC (Fri) by man_ls (guest, #15091) [Link]

Very long, hopefully. Firefox is very polite in that respect: e.g. if I block all images from adserve.advertising.ad, it still downloads every individual image, but then they are blanked on the image. This way the publisher never notices the filter, and cannot really complain. You could argue that click-through rates are worse, but the opposite is true: I tend to click on white areas to change focus, but I never click on banners.

Do these plugins do the same, i.e. download all page elements and just not display some of them? This way we all win at the cost of a little extra bandwidth.

The Grumpy Editor mangles some web pages

Posted Jul 2, 2005 3:16 UTC (Sat) by shlomif (guest, #11299) [Link]

It's interesting to learn that Aardvark has these capabilities. So far I've been using it mostly as a kind of debugger for HTML pages, one that allows me to see which elements are present at a certain place and what their CSS classes are.


Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds








