Wednesday, October 10, 2012

Feed filtering Stream Spigot tool

Build a Stream Spigot tool for filtering feeds. Unlike Feed Rinse or Yahoo! Pipes, it wouldn't involve giving the tool a feed URL and getting a filtered URL to subscribe to in return. Instead, it would connect to your Google Reader account and mark matching items as read, so that they don't show up (if you use the "new items only" view).

This would have several advantages: no need to re-subscribe when deciding to filter something, the benefit of the regular crawl speed (filtered feed URLs generally have only one subscriber and thus end up in the slow crawl bucket), and preserved item metadata (though that matters less now that likes and shares are gone).

The filtering could be done periodically or, for feeds that are PubSubHubbub-enabled, triggered by PSHB pings. It does mean that there is a window during which items that should be filtered out are visible, but the filtering is likely to be cheap enough (get items since the last check, apply a read tag) that it can be done quite frequently.

In addition to the read tag, a "filtered" state tag should also be applied to the items, so that these items can be differentiated during subsequent API calls.
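A rough sketch of what one filter pass might look like, in Python. Everything here is an assumption made for illustration: `InMemoryReader` is a hypothetical stand-in for the account (not a real Google Reader client), the tag strings are placeholders rather than real tag names, and matching on title words is just one possible filter rule.

```python
from dataclasses import dataclass, field

READ_TAG = "read"          # placeholder for the reader's real "read" state tag
FILTERED_TAG = "filtered"  # hypothetical extra state tag described above


@dataclass
class Item:
    item_id: str
    title: str
    crawl_time: int  # when the reader crawled the item (seconds since epoch)
    tags: set = field(default_factory=set)


class InMemoryReader:
    """Hypothetical stand-in for a reader account; not a real API client."""

    def __init__(self, items):
        self.items = {item.item_id: item for item in items}

    def items_since(self, timestamp):
        # Only items crawled after the previous check need to be looked at.
        return [i for i in self.items.values() if i.crawl_time > timestamp]

    def add_tags(self, item_id, tags):
        self.items[item_id].tags.update(tags)


def run_filter_pass(reader, blocked_words, last_check):
    """Tag matching items newer than last_check; return the new checkpoint."""
    newest = last_check
    for item in reader.items_since(last_check):
        newest = max(newest, item.crawl_time)
        if FILTERED_TAG in item.tags:
            continue  # already handled by an earlier pass
        title = item.title.lower()
        if any(word in title for word in blocked_words):
            # Apply both tags: "read" hides the item, "filtered" records why.
            reader.add_tags(item.item_id, {READ_TAG, FILTERED_TAG})
    return newest


reader = InMemoryReader([
    Item("a", "Sponsored: buy now", 100),
    Item("b", "Release notes", 110),
])
checkpoint = run_filter_pass(reader, ["sponsored"], last_check=0)
```

The same function could be called from a periodic job or from a PSHB ping handler. The returned checkpoint is what would get stored so the next pass only fetches newer items.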