Source: http://htmlsummarizer.codeplex.com/
This is a C# tool that I want to share. It extracts a portion of an HTML document without cutting HTML tags in half or leaving tags unclosed. The length of the extracted part can be measured in letters, words, sentences, closed HTML tags, closed P tags, closed DIV tags, or closed P or DIV tags. When counting letters or words, the characters that make up the HTML tags themselves are not counted.
If you have content stored as HTML in a database, or you want to summarize an HTML page residing on a remote server, you need to extract a certain number of words or letters without counting the HTML tags themselves and without leaving unclosed HTML tags. This tool is built for exactly that kind of scenario.
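To make the idea concrete, here is a minimal sketch of word-based extraction that copies tags whole and closes whatever is still open at the cut point. It is not the tool's actual code or API: the class `HtmlTruncateSketch`, the method `TakeWords`, and the short void-tag list are hypothetical names chosen for illustration, and the sketch assumes reasonably well-formed markup with no `>` inside attribute values or comments.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only; not the HtmlSummarizer API.
public static class HtmlTruncateSketch
{
    // Void elements never get a closing tag (partial list, assumed sufficient here).
    private static readonly HashSet<string> VoidTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "br", "hr", "img", "input", "meta", "link" };

    // Returns the leading part of html that contains at most maxWords words,
    // counting only text outside tags, with all open tags closed.
    public static string TakeWords(string html, int maxWords)
    {
        var output = new StringBuilder();
        var open = new Stack<string>();   // names of tags opened but not yet closed
        int words = 0;
        bool inWord = false;
        int i = 0;

        while (i < html.Length)
        {
            char c = html[i];
            if (c == '<')
            {
                int end = html.IndexOf('>', i);
                if (end < 0) break;       // malformed tail: drop it rather than emit half a tag
                string tag = html.Substring(i, end - i + 1);
                Track(tag, open);
                output.Append(tag);       // tags are copied whole and never counted
                inWord = false;           // simplification: a tag always ends the current word
                i = end + 1;
                continue;
            }

            bool isSpace = char.IsWhiteSpace(c);
            if (!isSpace && !inWord)
            {
                if (words == maxWords)
                {
                    // Limit reached: trim trailing whitespace and stop before the next word.
                    while (output.Length > 0 && char.IsWhiteSpace(output[output.Length - 1]))
                        output.Length--;
                    break;
                }
                words++;
            }
            inWord = !isSpace;
            output.Append(c);
            i++;
        }

        // Close whatever is still open, innermost first.
        while (open.Count > 0)
            output.Append("</").Append(open.Pop()).Append('>');
        return output.ToString();
    }

    // Updates the stack of open tags for one complete tag such as "<p>" or "</b>".
    private static void Track(string tag, Stack<string> open)
    {
        // Comments, doctypes, processing instructions and self-closing tags open nothing.
        if (tag.StartsWith("<!", StringComparison.Ordinal) ||
            tag.StartsWith("<?", StringComparison.Ordinal) ||
            tag.EndsWith("/>", StringComparison.Ordinal)) return;

        bool closing = tag.StartsWith("</", StringComparison.Ordinal);
        int start = closing ? 2 : 1;
        int end = start;
        while (end < tag.Length && char.IsLetterOrDigit(tag[end])) end++;
        string name = tag.Substring(start, end - start);
        if (name.Length == 0 || VoidTags.Contains(name)) return;

        if (!closing) open.Push(name);
        else if (open.Count > 0 &&
                 string.Equals(open.Peek(), name, StringComparison.OrdinalIgnoreCase))
            open.Pop();
    }

    public static void Main()
    {
        Console.WriteLine(TakeWords("<p>Hello <b>brave new</b> world</p>", 2));
        // prints: <p>Hello <b>brave</b></p>
    }
}
```

The stack is what keeps the output well formed: every opening tag copied to the output is remembered until its closing tag is seen, so cutting in the middle of nested elements still yields balanced markup. Counting letters, sentences, or closed P and DIV tags would follow the same scan with a different counter.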