
Version 10 by hannes on 25. July 2006, 11:29

* Likes to convert HTML snippets to whole documents, which is not really convenient for some of our purposes (but not a big problem either: just provide convenience methods to drop the html and body tags)

Version 9 by hannes on 24. July 2006, 23:24

==== *Tagsoup|http://home.ccil.org/~cowan/XML/tagsoup/* notes

* Very small (around 50 k)
* Implements a straight SAX2 parser
* Never ever throws an exception
* But does tag balancing, tag insertion, etc.
* Is pretty good at this tag balancing business, probably better than HtmlParser
* Likes to convert HTML snippets to whole documents, which is not really convenient for some of our purposes
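
Tagsoup delivers plain SAX2 events, so the added `html`/`body` wrapper can be dropped by an ordinary `ContentHandler` instead of by post-processing the output. The following is a minimal sketch that assumes we only want the markup inside the wrappers back as a string. The class name `FragmentExtractor` and the short list of void elements are hypothetical. To keep the sketch self-contained, it runs on the JDK's built-in SAX parser, with already balanced input standing in for Tagsoup's output. With Tagsoup on the classpath, the same handler could be registered on `org.ccil.cowan.tagsoup.Parser`, which implements `org.xml.sax.XMLReader`.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Re-serializes SAX events, skipping the html/head/body wrapper elements (hypothetical helper). */
final class FragmentExtractor extends DefaultHandler {
    // Wrapper elements a full-document parser adds around a snippet.
    private static final Set<String> WRAPPERS = Set.of("html", "head", "body");
    // Elements written without an end tag (assumed, incomplete list).
    private static final Set<String> VOID = Set.of("br", "hr", "img", "input", "meta", "link");

    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String name = qName.toLowerCase(Locale.ROOT);
        if (WRAPPERS.contains(name)) {
            return; // drop the wrapper tag itself, its children are still reported
        }
        out.append('<').append(name);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        out.append(VOID.contains(name) ? " />" : ">");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String name = qName.toLowerCase(Locale.ROOT);
        if (!WRAPPERS.contains(name) && !VOID.contains(name)) {
            out.append("</").append(name).append('>');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks; each chunk is escaped and appended.
        out.append(escape(new String(ch, start, length)));
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Parses a document and returns the markup found inside the wrappers. */
    static String extract(String document) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            FragmentExtractor handler = new FragmentExtractor();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(document)));
            return handler.out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

For example, `FragmentExtractor.extract("<html><body><p>Hi</p></body></html>")` returns `<p>Hi</p>`.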

Version 8 by hannes on 07. July 2006, 12:49

==== *HtmlParser|http://htmlparser.sourceforge.net/* notes

* Sufficiently small (full jar is ~300 k)
* The low-level lexer is even smaller (~70 k) and might be enough for us
* A NodeFactory class is used to create Nodes
* The most important Node subinterfaces are Text and Tag
* Tag has the methods getEnders() and getEndTagEnders(), which determine which (end) tags close this tag; this is how tag balancing and injection of virtual end tags are implemented
* Tag balancing is only done if a matching subclass of CompositeTag exists and is registered
* Nodes and NodeLists have a toHtml() method that converts them back to HTML text
* HtmlParser provides an advanced filtering/node-walking framework that would be quite useful for tag/attribute filtering
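
The "enders" idea behind getEnders()/getEndTagEnders() can be illustrated without HtmlParser itself. The following is a minimal sketch that assumes the input is already split into tokens: bare start tags such as `<li>`, end tags such as `</ul>`, and text. Each tag lists the start tags that implicitly close it. A stack-based balancer then emits virtual end tags in three cases: when an ender arrives, when an enclosing tag is closed, or at end of input. The token format, the `ENDERS` table and `TagBalancer` are hypothetical and are not HtmlParser's API. Only the innermost open tag is checked against its enders, attributes are not supported, and void elements such as `<br>` are not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Inserts virtual end tags using per-tag "ender" sets (hypothetical table). */
final class TagBalancer {
    // For each tag: start tags that close it while it is the innermost open tag.
    private static final Map<String, Set<String>> ENDERS = Map.of(
        "p", Set.of("p", "div", "ul", "ol", "table"),
        "li", Set.of("li"),
        "tr", Set.of("tr"),
        "td", Set.of("td", "tr"));

    static String balance(List<String> tokens) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>();
        for (String tok : tokens) {
            if (tok.startsWith("</")) {
                String name = tok.substring(2, tok.length() - 1);
                if (!open.contains(name)) {
                    continue; // stray end tag without a matching open tag: drop it
                }
                // Close everything opened after the matching tag, then the tag itself.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            } else if (tok.startsWith("<")) {
                String name = tok.substring(1, tok.length() - 1);
                // Emit virtual end tags while the new tag is an ender of the innermost open tag.
                while (!open.isEmpty()
                        && ENDERS.getOrDefault(open.peek(), Set.of()).contains(name)) {
                    out.append("</").append(open.pop()).append('>');
                }
                out.append(tok);
                open.push(name);
            } else {
                out.append(tok); // text passes through unchanged
            }
        }
        // Close whatever is still open at end of input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }
}
```

For example, `TagBalancer.balance(List.of("<ul>", "<li>", "one", "<li>", "two", "</ul>"))` yields `<ul><li>one</li><li>two</li></ul>`.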

Version 7 by hannes on 20. June 2006, 16:05

Another interesting package (via *Jürg on helma-dev|http://grazia.helma.at/pipermail/helma-dev/2006-June/002842.html*): <http://htmlparser.sourceforge.net/>

Version 6 by hannes on 20. June 2006, 14:07

Another interesting package (via Jürg on helma-dev): <http://htmlparser.sourceforge.net/>

Version 5 by hannes on 15. December 2005, 12:21

# Plugin architecture for stuff like wiki formatting
# Allow plugins to handle formatting at various stages, e.g. before, after, or instead of the default formatting.
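
The following is a minimal sketch of how before/after/instead hooks could be wired. It assumes formatting can be modelled as a `String` to `String` function. `FormatterPlugin`, its three methods and `FormattingPipeline` are hypothetical names for illustration, not an existing Helma API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.UnaryOperator;

/** Hypothetical hook points around a default formatter; every hook is a no-op by default. */
interface FormatterPlugin {
    /** Runs on the raw text before formatting. */
    default String before(String text) { return text; }
    /** If a value is returned, it replaces the default formatting entirely. */
    default Optional<String> instead(String text) { return Optional.empty(); }
    /** Runs on the formatted result. */
    default String after(String html) { return html; }
}

/** Hypothetical pipeline that applies registered plugins around a default formatter. */
final class FormattingPipeline {
    private final UnaryOperator<String> defaultFormatter;
    private final List<FormatterPlugin> plugins = new ArrayList<>();

    FormattingPipeline(UnaryOperator<String> defaultFormatter) {
        this.defaultFormatter = defaultFormatter;
    }

    FormattingPipeline register(FormatterPlugin plugin) {
        plugins.add(plugin);
        return this;
    }

    String format(String text) {
        // Stage 1: every plugin may preprocess the text, in registration order.
        for (FormatterPlugin p : plugins) {
            text = p.before(text);
        }
        // Stage 2: the first plugin that claims the text replaces the default formatter.
        String html = null;
        for (FormatterPlugin p : plugins) {
            Optional<String> replaced = p.instead(text);
            if (replaced.isPresent()) {
                html = replaced.get();
                break;
            }
        }
        if (html == null) {
            html = defaultFormatter.apply(text);
        }
        // Stage 3: every plugin may postprocess the result.
        for (FormatterPlugin p : plugins) {
            html = p.after(html);
        }
        return html;
    }
}
```

In this design, the first `instead` hook that returns a value wins, while the `before` and `after` hooks of all plugins still run. Whether that ordering suits wiki formatting is left open.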

Version 4 by hannes on 12. October 2005, 23:41

==== Plan

==== Plan B

# Keep <a href="http://adele.helma.org/source/viewcvs.cgi/helma/src/helma/util/HtmlEncoder.java?rev=1.30&cvsroot=hop&content-type=text/vnd.viewcvs-markup">current code</a> for character entity escaping only.
# Use Tagsoup for cleaning up tags and, drawing on the knowledge in Helma's old HTML formatter, for generating break/paragraph tags.
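
The break/paragraph step can be illustrated on plain text. The following is a minimal sketch that assumes a simple convention: one or more blank lines separate paragraphs, and a single newline becomes `<br />`. It also assumes the input is already entity-escaped. It is not a port of Helma's old formatter, and `BreakFormatter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Turns plain text into paragraphs and line breaks (hypothetical helper). */
final class BreakFormatter {
    static String format(String text) {
        // Normalize line endings so \r\n and \r behave like \n.
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n');
        List<String> paragraphs = new ArrayList<>();
        // A newline followed by one or more blank (or whitespace-only) lines separates paragraphs.
        for (String block : normalized.split("\\n(?:[ \\t]*\\n)+")) {
            String trimmed = block.strip();
            if (!trimmed.isEmpty()) {
                // Remaining single newlines inside a paragraph become explicit line breaks.
                paragraphs.add("<p>" + trimmed.replace("\n", "<br />\n") + "</p>");
            }
        }
        return String.join("\n", paragraphs);
    }
}
```

For example, `BreakFormatter.format("Hello\nworld\n\nSecond para")` produces two paragraphs, the first containing a `<br />`.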

==== Open Issues

We should provide a feature that allows only certain tag/attribute combinations, either to exclude scripts or simply to keep people from ruining the layout.
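
Such a feature could be driven by a whitelist that maps each allowed tag to its allowed attributes. The following is a minimal sketch of that policy object alone. It assumes the parser (for example a SAX handler or an HtmlParser filter) calls it for each start tag and drops the tags it rejects. `TagWhitelist` and its methods are hypothetical. A whitelist alone does not make URL-valued attributes such as `href` safe; their values would still need checking, e.g. for `javascript:` URLs.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical whitelist of allowed tags and, per tag, allowed attribute names. */
final class TagWhitelist {
    private final Map<String, Set<String>> allowed;

    /** Keys and attribute names are expected in lower case. */
    TagWhitelist(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    /** True if the tag may appear at all; e.g. "script" is rejected unless listed. */
    boolean allowsTag(String tag) {
        return allowed.containsKey(tag.toLowerCase(Locale.ROOT));
    }

    /** Returns only the attributes permitted for this tag, keeping their original order. */
    Map<String, String> filterAttributes(String tag, Map<String, String> attrs) {
        Set<String> permitted = allowed.getOrDefault(tag.toLowerCase(Locale.ROOT), Set.of());
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            String name = e.getKey().toLowerCase(Locale.ROOT);
            if (permitted.contains(name)) {
                kept.put(name, e.getValue()); // e.g. onclick/style are dropped unless listed
            }
        }
        return kept;
    }
}
```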

Skin parsing might start from this code too if we move to HTML/XML-style skin tags.

Version 3 by hannes on 12. October 2005, 23:29

==== Plan

# Keep <a href="http://adele.helma.org/source/viewcvs.cgi/helma/src/helma/util/HtmlEncoder.java?rev=1.30&cvsroot=hop&content-type=text/vnd.viewcvs-markup">current code</a> as a starting point, as I can't find any other code with a similar feature mix (most importantly smart formatting) that looks worth the switch.
# Separate character entity escaping from the formatting/tag closing.
# Update the list of recognized tags from the Tagsoup project.
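
The following is a minimal sketch of entity escaping as a stand-alone stage. It assumes the stage only has to make text safe to embed in HTML and leaves formatting and tag closing to other code. `EntityEscaper` is a hypothetical name, not the API of Helma's `HtmlEncoder`. Here, non-ASCII characters are written as numeric character references, which is one option among several; a UTF-8 page could leave them as they are.

```java
/** Escapes text for HTML; does no formatting or tag handling (hypothetical helper). */
final class EntityEscaper {
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        // Iterate by code point so characters outside the BMP become a single reference.
        text.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp < 128) {
                        out.appendCodePoint(cp); // plain ASCII passes through
                    } else {
                        out.append("&#").append(cp).append(';'); // decimal numeric reference
                    }
            }
        });
        return out.toString();
    }
}
```

For example, `EntityEscaper.escape("Jürg & <b>")` returns `J&#252;rg &amp; &lt;b&gt;`.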

Version 2 by hannes on 12. October 2005, 23:23

# Parsing HTML for skin rendering
# Adding missing tags
# Adding formatting tags
# Entity encoding
Potentially of interest: <http://mercury.ccil.org/~cowan/XML/tagsoup/>.

Version 1 by hannes on 12. October 2005, 23:22

==== Ideas

Some separation of concerns:

* Parsing HTML for skin rendering
* Adding missing tags
* Adding formatting tags
* Entity encoding

Potentially of interest: <http://mercury.ccil.org/~cowan/XML/tagsoup/>