Skip to main content

Deep Linking from RSS

One of the more unique and perhaps controversial features of FeedJournal is that it can filter out the meat of an article published on the web.

How does it accomplish this? FeedJournal has four ways of retrieving the actual content for the next issue.

Actual Content
In the trivial case, a site (like this blog for example) decides to include the full article text within its RSS feed. FeedJournal simply published the content; no surprises here. By the way, this is how all standard RSS aggregators work. The problem is when a site decides to only publish summaries or teasers of the full article text. FeedJournal needs to deal with this because it is an offline RSS reader, users cannot click on their printed newspaper to read the full article.

Linked Content
The <link> tag inside the RSS feed specifies the URL for the full article. In case the RSS only includes summaries of the full articles, FeedJournal retrieves the text from this URL.

Rewritten Link
In most cases, just following this link is not a good solution. The web page typically includes lots of irrelevant content, like a navigation menu, a blogroll, or other articles. FeedJournal lets the user write a regular expression for each feed, automatically rewriting the article’s URL to the URL of the printer-friendly version. As an example the URL to a full article in International Herald Tribune is http://www.iht.com/articles/2006/07/28/news/mideast.php while the link to the printer-friendly version is http://www.iht.com/bin/print_ipub.php?file=/articles/2006/07/28/news/mideast.php By inserting bin/print_ipub.php?file=/ in the middle of the URL we will reach the printer-friendly article. This article is much more suitable for publishing in FeedJournal, because it more or less only contains the meat of the article.

Filtered Content
“More or less”, I said in the last sentence. There are usually some unwanted elements left in the printer-friendly version, like a header and a footer. These can be filtered out by letting FeedJournal begin the article after a specified substring in the HTML document source. Likewise, another substring can be selected as ending the relevant content.

By applying these functions it is possible to scoop, or extract, the meat of almost any web published article. Of course it is only necessary to do this once for every feed. To my knowledge, FeedJournal is the only aggregator who has the functionality described in the last three sections.

Is this legal, you ask? Wouldn’t a site owner require each user to actually visit the web site to read the content and click on all those fancy ads sprinkled all over? Well, my stance is that if the content is freely available on the web, I am free to do whatever I want with it for my own purposes. Keep in mind that we are not actually republishing the site’s content, we are only filtering it for our own use. Essentially, I think of this as a pop-up or ad blocker running in your browser.

What is interesting to note is that some web sites have tried to include in their copyright notice a paragraph limiting the usage of their content. Digg.com, for example, initially had a clause in the their copyright effectively prohibiting RSS aggregators from using their RSS feeds! Today, it is removed.

As long as FeedJournal is used for personal use, and the issues are not sold or made available publicly, I do not see any legal problems with the deep linking.

Comments

Popular posts from this blog

HOWTO: Fix a Broken Laptop Lid for $1

A few months ago my laptop lid's hinges gave up and my lid kept falling over. I will show you how I fixed the problem in five minutes by using materials for $1. But first some background info. At first, I assumed there would be a quick and simple fix to this common laptop problem. My laptop is an Evo N800v. HP has bought Compaq since I purchased the computer so that's where I'm supposed to turn for help. I was kind of startled to hear that HP support wanted $500 for fixing the broken hinges - presumably they intended to replace the entire lid. Obviously, shelling out $500 for fixing a 6 year old laptop is not the way to go, so I started to look for alternative solutions. First, I disassembled the laptop numerous times, trying to make the hinges more sturdy (that's spelled S-U-P-E-R-G-L-U-E). Anyway, that didn't help. Option number two was to do something similar to what user xrobevansx did on instructables.com . Basically he bought a lid support in a hardware store...

HOWTO: GTD with Google Docs & PocketMod

Take control of your unwieldy to do-list by combining Google Docs and PocketMod. With the system described here you will always be ready to take notes, and never run the risk of losing an idea! Update (July 30, 2009): Now using a Google Docs template. I use a subset of GTD (" Getting Things Done ") by having a digital copy of my next actions, sorted by context (@Home, @Office, @Shopping, @Computer, etc.). This lets me easily look up what I need to do, depending on where I am. However, a digital copy is not very useful by itself, since it is not accessible when I am offline. Putting it in my PDA is not ideal either, since the overhead of adding a new note is too big (turning on the device, opening the right application, having it recognize my handwriting). That's why I print out my to-do list on paper once a week and carry it in my pocket. It's the ideal way of accessing and editing tasks. Before I print out a new list I spend a minute or two copying the edits from my...

Reading on Paper vs. on Screen

One of the basic premises behind FeedJournal is that it's better to read text on paper than on a screen. While it might not sound like a bold assumption, it still is an assumption and as such worth to examine deeper. Today, office workers and many other professionals are required to focus their eyes on a computer screen during most of their work day. Many of them continue to use the computer at home. FeedJournal was created with many goals in mind; one of them is to release you from the screen while enabling you to read the content you love. You shouldn't have to spend more time reading off a screen, just because you want to access fresh and relevant content. Recent research has found that reading a longer text on paper is 25% faster than reading the same text on a computer screen. At the same time, reading comprehension and article overview are improved. Although screen resolutions have increased and font rendering technologies such as ClearType make it much easier to rea...