This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: combine xml files
Hi, Tom,
You are really good! All your assumptions are correct.
One small point: each search record IS an XML file. We are actually searching
over XML records (indexed, of course), and each search result is an XML
record. That's why, when you check two results on the search result screen,
the file C9A1876A75333C9.tomcat1 has two entries, each pointing to the full
path of an XML file, as described in the previous email.
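
A minimal sketch of reading such a session file, written in Python rather
than in the servlet's Java. It assumes one record path per line and UTF-8
text; the helper name read_marked_records is hypothetical.

```python
from pathlib import Path


def read_marked_records(session_file):
    """Return the record paths listed in a session file.

    Assumes one path per line; blank lines are skipped.
    """
    text = Path(session_file).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Each returned path can then be opened and parsed as one XML record.
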
Thanks a lot for your help.
Ming
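
P.S. The preference lookup described in the quoted messages below can be
sketched roughly as follows. This is only an illustration in Python, not
XSLT, under several assumptions: the attribute values in the record are
quoted so that the XML is well-formed, the preference lines are inlined as a
string instead of being read from DbPref.txt, and the names parse_prefs, pick
and display are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed contents of DbPref.txt, inlined for the sketch.
PREFS_TEXT = """\
title: db2 db1 db3
author: db1 db3 db2
"""

# One record, with attribute values quoted (an assumption; see above).
RECORD = """\
<xml>
  <db1>
    <jauthor>
      <author db="db1"> Smith, J</author>
      <author db="db1"> Mou, S </author>
    </jauthor>
    <jtitle>
      <title db="db1"> Preliminary study on network (II) </title>
    </jtitle>
  </db1>
  <db2>
    <jauthor>
      <author db="db2"> Smith, JR </author>
      <author db="db2"> Mou, ST </author>
    </jauthor>
    <jtitle>
      <title db="db2"> Preliminary Study on Network (II) </title>
    </jtitle>
  </db2>
</xml>
"""


def parse_prefs(text):
    """Map each field name to its list of dbs, most preferred first."""
    prefs = {}
    for line in text.splitlines():
        if ":" in line:
            field, dbs = line.split(":", 1)
            prefs[field.strip()] = dbs.split()
    return prefs


def pick(root, field, order):
    """Return the values of `field` from the first db in `order` that has any."""
    for db in order:
        values = [
            e.text.strip()
            for e in root.iter(field)
            if e.get("db") == db and e.text
        ]
        if values:
            return values
    return []


def display(record_xml, prefs):
    """Build the two display lines for one record."""
    root = ET.fromstring(record_xml)
    title = pick(root, "title", prefs["title"])
    authors = pick(root, "author", prefs["author"])
    return "Title: " + "; ".join(title), "Author: " + "; ".join(authors)
```

With the sample record, display(RECORD, parse_prefs(PREFS_TEXT)) takes the
title from db2 and the authors from db1, which matches the display I
described. A db that is missing from a record is simply skipped in favour of
the next one in the preference order.
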
"Thomas B. Passin" wrote:
> [Ming]
> >
> > I think I can make it more clear with an example:
> >
>
> Good. Let me summarize what I think I understand:
>
> 1) Each search record is saved in a single xml file.
>
> 2) All contents of any one of these xml files pertain to a single work.
>
> 3) A single xml file may contain data obtained from several sources (the
> "db" values).
>
> 4) All information relevant to a particular search result is contained in a
> single xml file.
>
> 5) For formatting, reliability, or other reasons, information from a
> particular source may be preferred over that from another (the db preference
> order). The preferred source may be different for titles than for authors.
>
> 6) The data from the most preferred source available is the data to be
> displayed.
>
> Now to check out a few things I am assuming:
>
> a) The db preferences will be the same for all xml files.
>
> b) The db preferences will either not change over searches, or only change
> infrequently.
>
> c) The number of different dbs is small and will always be known before a
> search is processed (in case we want to hard-code them).
>
> If all these things are correct, it should be fairly easy, modulo the time
> needed to process 1000 files.
>
> Let us know if these things are correct.
>
> Tom P
>
> > My saved search file is named C9A1876A75333C9.tomcat1 (the session id).
> > Each entry is saved to this file after a user clicks the check box in
> > front of each search result.
> >
> > In this file, the entries are like these:
> > /records/sci01/1082-6068/30/1/69_DOU-PSOCFPGCWPIIC
> > /records/sci02/0254-3052/24/10/892_BAI-SDJPLGGVRP
> >
> > Each entry is an XML file, and the format of each XML file is like this:
> >
> > <xml>
> >   <db1>
> >     <jauthor>
> >       <author db="db1"> Smith, J</author>
> >       <author db="db1"> Mou, S </author>
> >     </jauthor>
> >     <jtitle>
> >       <title db="db1"> Preliminary study on network (II) </title>
> >     </jtitle>
> >   </db1>
> >
> >   <db2>
> >     <jauthor>
> >       <!-- note here: since it's the same article, the author is the same,
> >            but it is displayed differently in different databases -->
> >       <author db="db2"> Smith, JR </author>
> >       <author db="db2"> Mou, ST </author>
> >     </jauthor>
> >     <jtitle>
> >       <!-- same as author: same article, but the display title is slightly
> >            different -->
> >       <title db="db2"> Preliminary Study on Network (II) </title>
> >     </jtitle>
> >   </db2>
> > </xml>
> >
> > And here is my preference file (it can be in any format; here I just put
> > it in a space-delimited text file):
> > filename: DbPref.txt
> > content:
> > title: db2 db1 db3
> > author: db1 db3 db2
> >
> > Actually, there are about 6 dbs (from db1 to db6), and each xml file (that
> > is, each record) can be in any one or more of them.
> >
> > So, my job is to display something like this on the website:
> > Title: Preliminary Study on Network (II)
> >   <!-- note here, this title is from the title in db2, since db2 is the
> >        preferred title display database -->
> > Author: Smith, J; Mou, S
> >   <!-- note here, the authors are from the authors in db1, since db1 is
> >        the preferred author display database -->
> >
> > I've thought about this over and over again, and I think the approach you
> > mentioned may be a good idea. What I still need to do is add the
> > preference information to the xml file (to do this, I may need to process
> > each xml file in my java servlet and find the preference). Something like:
> > <files>
> >   <file title="db2" author="db1"> xml file 1 </file>
> >   <!-- note here, the record in xml file 2 is in db2 and db3, so with the
> >        preference file above its title comes from db2 and its authors
> >        from db3 -->
> >   <file title="db2" author="db3"> xml file 2 </file>
> > </files>
> >
> > I don't think I have answered your question correctly, but I really don't
> > know how to find a proper answer, so I have given you the complete
> > scenario. I hope this helps to clarify the problem.
> >
> > Thanks a lot.
> >
> > Ming
> >
> > "Thomas B. Passin" wrote:
> >
> > > We're getting closer, I think. If a work can appear, with a different
> > > format, in more than one xml file, then how can you tell when an entry
> > > in one file is for the same work as an entry in another file? You need
> > > to be able to do that, it would seem, or you won't be able to match up
> > > entries.
> > >
> > > What data is contained in any one xml file? Is it data on one single
> > > work from one single database? Is it many works, but all from one
> > > database? Is it one single work, but possibly from many databases?
> > >
> > > Are you expecting to get a fast response when looking through 1000
> > > files for each query? How fast? Or can it be a batch process? Even
> > > doing a directory listing of 1000 files can take some time, depending
> > > on your system, and that's not doing any processing on the files.
> > >
> > > Cheers,
> > >
> > > Tom P
> > >
> > > [Ming]
> > >
> > > >
> > > > To make my explanation easier to understand (sorry for the
> > > > confusion), I'm going to describe my task.
> > > >
> > > > Actually, I'm doing the "View Marked" function after a search. The
> > > > marked search results are saved in a temporary file with the session
> > > > id as the file name, and each entry in the file is a complete path to
> > > > an xml file. So, the number of xml files listed in the temporary file
> > > > can vary from 1 to 1000. After the user clicks on the View Marked
> > > > button, I need to display the title and author information for each
> > > > xml file to the user. So, it's a dynamic process.
> > > >
> > > > The title format in each xml file is slightly different for each
> > > > database, and so are other fields such as author. That's why we have
> > > > a preference list for titles, authors, etc.: different groups of
> > > > people prefer different display formats for titles, authors, etc.
> > > >
> > > > Yes, I need to look through each xml record, since some titles appear
> > > > in only one database and some appear in more than one database. So,
> > > > the <db*> tags are different, and I need to find the most preferred
> > > > one to display according to my preference list.
> > > >
> > > >
> > > ...
> > >
> > > XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list