JavaHtmlTidy (JTidy) is a Java port of the WorldWideWebConsortium's Tidy program, which validates and repairs HTML. Because it builds a document tree while cleaning up the markup, it can also be used as a DOM parser for HTML. See http://sourceforge.net/projects/jtidy/ for more information.
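
As a rough illustration of the DOM-parser use, here is a minimal sketch that collects the href values of all links in a page. It assumes the JTidy jar is on the classpath and uses the org.w3c.tidy.Tidy class with its parseDOM(InputStream, OutputStream) method; the class name JTidyDomExample, the helper extractLinks, and the sample markup are made up for this example. JTidy's DOM implementation does not cover every newer DOM method, so the sketch sticks to basic calls.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

// Hypothetical example class; requires the JTidy jar on the classpath.
class JTidyDomExample {

    // Parses (possibly malformed) HTML with JTidy and returns the href
    // attribute of every <a> element, in document order.
    static List<String> extractLinks(String html) {
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);          // suppress the summary banner
        tidy.setShowWarnings(false);  // suppress per-line markup warnings

        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));

        // Passing null as the output stream skips writing the tidied
        // markup; only the DOM tree is wanted here.
        Document doc = tidy.parseDOM(in, null);

        List<String> links = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            String href = anchor.getAttribute("href");
            if (href != null && !href.isEmpty()) {
                links.add(href);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample input with an unclosed <p>, which Tidy repairs while parsing.
        String html = "<html><head><title>demo</title></head><body>"
                + "<a href='http://example.com/a'>A</a>"
                + "<p><a href=\"b.html\">B</a>"
                + "</body></html>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

The result of parseDOM is a standard org.w3c.dom.Document, so the rest of the code depends only on the DOM interfaces that ship with the JDK and could be reused with another parser.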

Other HTML parsers can be found at: http://java-source.net/open-source/html-parsers