
Update on UNIPEN Handwriting Benchmark and data




                *******************************
                *            UPDATE           *
                *******************************
                 The First UNIPEN Benchmark of
                On-Line Handwriting Recognizers


   Hi,
   This message is being sent to a list of addresses that I compiled
   from:
	- the list of Benchmark participants
	- Isabelle Guyon's list of people who might be interested in
		the UNIPEN Benchmark and in pen-based handwriting
		recognition in general
	- people who have contacted me and expressed an interest
   I know there are many researchers interested in the UNIPEN data, and
   I thought now would be a good time to update everyone.

   Currently, the more than three dozen UNIPEN Benchmark participants
   have the fourth release of training data to work with. It consists of
   samples from all 50+ data sets that they collectively donated. The
   training data is organized into the 8 test sets such as "printed words",
   and in some cases it is then organized beyond that, for example by
   upper/lower case. At the lowest level, the files are grouped by the
   data set they originally came from. This grouping is largely to help
   identify any problems within a particular data set, and the test sets
   probably will not be grouped this way.

   A fifth release of UNIPEN training data is now being compiled for
   distribution. It incorporates fixes to (hopefully) all of the data
   problems that people have found and reported to me. Twelve data sets
   have been fixed, and I am waiting to hear back from the donor of
   three others to decide how best to proceed with the data.

   The sixth release will follow shortly thereafter, and it will simply
   be larger (consisting of all data not required for the benchmark and
   development test sets).

   The final training data set is expected to be quite large, perhaps
   gigabytes. The Linguistic Data Consortium plans to press CD-ROMs
   of the training set.

   There will be two UNIPEN Benchmarks of recognition performance, one
   this year (once most participants have indicated that they are ready)
   and one next year, after the participants have had a chance to take
   advantage of what they have learned from the first Benchmark.

   At the conclusion of the first Benchmark, its test set and the
   development test set will become available to the public. NIST will
   be pressing CD-ROMs containing the test sets to distribute to the
   Benchmark participants, and will ultimately make extras available
   to other researchers.

   The second Benchmark's test set will also become available after that
   Benchmark concludes.

   Those of you who are waiting for the data to become publicly
   available will receive the announcements from this mailing list.

   This message contains some information that Benchmark participants have
   already been notified of and some information that was contained in
   Lambert Schomaker's report "UNIPEN Scrawl #5", which was sent to members
   of the SCRIB-L mailing list. The report is available on the NICI
   (Nijmegen Institute for Cognition and Information, The Netherlands)
   WWW site:
   	http://www.nici.kun.nl/unipen/scrawls/unipen-scrawl-5.html

   To subscribe to the SCRIB-L mailing list, send email to:
   	LISTSERV@NIC.SURFNET.NL
   and, in the text of your message (not the subject line), write:
   	SUBSCRIBE SCRIB-L
   Lambert Schomaker's email address at NICI is:
   	schomaker@nici.kun.nl

   Isabelle Guyon is no longer with AT&T. Her email address is now:
   	isabelle@cybergold.net
   so please make a note of that. If you have any email aliases set up that
   include her, it would be a good idea to update them now. I am not sure
   whether email sent to the old address at research.att.com gets
   forwarded.

   Publicly available UNIPEN files can be downloaded via anonymous FTP from:
	ftp://ftp.cis.upenn.edu/pub/UNIPEN-pub/
   Included are the original Call for Data, the language definition, and a
   few technical documents.

   You can send a note via email to:
   	unipen-info-list-request@magi.nist.gov
   if:
      (1) you would like to join the list and are not already on it.
          If you are on it, the "To:" field in your email header will be:
		unipen-info-list@magi.nist.gov
          Please supply this information when available:
		name, affiliation, postal address, email address, phone#, fax#
      (2) you would like to be taken off this list
      (3) you want these messages to be sent to you at a different address;
          many of the entries that I have for people list multiple email
          addresses and in those cases, I am using the first one
      (4) you received multiple copies of this email sent directly from NIST,
          as opposed to copies forwarded from other sources; please enclose
          the complete headers of the messages because the "Received:" fields
          will be necessary to pin down the problem

   Regards,
      Stan Janet
      NIST