Thursday, March 3, 2011

boilerpipe

The boilerpipe library provides algorithms to detect and remove the surplus 'clutter' (boilerplate, templates) around the main textual content of a web page.
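To make the idea concrete, here is a minimal sketch of how clutter can be separated from main content. It is a toy heuristic, not boilerpipe's actual algorithm or API: the class `BoilerplateSketch`, the `Block` type, and both thresholds are hypothetical names and values chosen for illustration. It assumes the page has already been split into text blocks, and that for each block we know how many of its words sit inside links. A block is kept only if it is reasonably long and not dominated by link text.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: NOT boilerpipe's API or its real algorithm.
class BoilerplateSketch {

    /** A block of page text plus the number of its words that sit inside links. */
    static final class Block {
        final String text;
        final int linkedWords;

        Block(String text, int linkedWords) {
            this.text = text;
            this.linkedWords = linkedWords;
        }

        /** Number of whitespace-separated words in the block. */
        int wordCount() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }

        /** Fraction of the block's words that are link text (0.0 for an empty block). */
        double linkDensity() {
            int words = wordCount();
            return words == 0 ? 0.0 : (double) linkedWords / words;
        }
    }

    // Assumed thresholds, picked for this example rather than taken from boilerpipe.
    static final int MIN_WORDS = 10;
    static final double MAX_LINK_DENSITY = 0.3;

    /** A block counts as main content if it is long enough and mostly not links. */
    static boolean isContent(Block b) {
        return b.wordCount() >= MIN_WORDS && b.linkDensity() <= MAX_LINK_DENSITY;
    }

    /** Joins the text of all content blocks, one block per line. */
    static String extractMainText(List<Block> blocks) {
        StringBuilder sb = new StringBuilder();
        for (Block b : blocks) {
            if (isContent(b)) {
                if (sb.length() > 0) {
                    sb.append("\n");
                }
                sb.append(b.text.trim());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Block> page = new ArrayList<>();
        page.add(new Block("Home About Contact Login", 4)); // navigation bar: short, all links
        page.add(new Block("The boilerpipe library provides algorithms to detect and remove "
                + "the surplus clutter around the main textual content of a web page.", 0));
        page.add(new Block("Copyright 2011", 0)); // footer: too short
        System.out.println(extractMainText(page));
    }
}
```

In this sketch the navigation bar and the footer are dropped and only the long paragraph is kept. Real pages need more robust features than two fixed thresholds, which is the problem the library's algorithms address.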

A hosted instance with a web API is running at http://boilerpipe-web.appspot.com/.