This is the mail archive of the mailing list .


Re: legacy embedded HTML

>>>>> "R" == Robert Koberg <> writes:

    R> Hi, Are you paying for this feed?? They should process the crap
    R> out first.

Forgive my sounding like a broken CD player, but in the real world,
function prevails over form almost every time ;)

In this particular example, what we buy arrives as pure ASCII (likely
generated from ancient mainframe apps) punctuated by XML-like tags.  I
suppose I could just ring up all the international press agencies from
my little woodland cottage office in Sauble Beach and demand they fix
their whole goddam industry right frickin now, but somehow, I don't
think it would elicit much more than a politely muffled laugh.

Our program ( cleans the received feed into
something that will actually parse, and filters any likely CDATA
through an opensource program called txt2html to trap paragraph
breaks, tables, headlines, urls, emails and other key items ... any
recommendations for better free ascii-html converters are most
welcome.
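
To make the two steps above concrete, here is a minimal sketch in Python.
It is not our actual program and not txt2html itself: the tag names in
KNOWN_TAGS are hypothetical, and text_to_html is only a rough stand-in
that handles paragraph breaks and bare URLs, nothing more.

```python
import re
from html import escape

# Hypothetical tag names; a real wire feed defines its own set.
KNOWN_TAGS = ("story", "headline", "body")

TAG_RE = re.compile(r"(</?(?:%s)>)" % "|".join(KNOWN_TAGS))
URL_RE = re.compile(r"(https?://[^\s<]+)")


def clean_feed(raw):
    """Escape everything except the known tags so the result can parse."""
    parts = TAG_RE.split(raw)
    # re.split with one capture group leaves the matched tags at odd indexes.
    return "".join(p if i % 2 else escape(p, quote=False)
                   for i, p in enumerate(parts))


def text_to_html(text):
    """Rough stand-in for txt2html: blank lines become paragraph breaks."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    html = []
    for p in paragraphs:
        # Collapse hard line wraps, escape markup characters, then link URLs.
        p = escape(" ".join(p.split()), quote=False)
        p = URL_RE.sub(r'<a href="\1">\1</a>', p)
        html.append("<p>%s</p>" % p)
    return "\n".join(html)
```

Stray ampersands and unknown angle brackets are escaped before any XML
parser sees the feed, and the body text is only turned into markup after
it has been pulled back out of the parsed document.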

    R> product by Andy Clark called nekoHTML that can balance your
    R> tags for you. Maybe it is worth another try to get well-formed
    R> XML.

For a newswire application, that only gets us one tiny step farther,
but thanks for that reference; it may still be helpful.

Gary Lawrence Murphy <> TeleDynamics Communications Inc
Business Innovations Through Open Source Systems:
"Computers are useless.  They can only give you answers." (Pablo Picasso)

 XSL-List info and archive: