Update on UNIPEN Handwriting Benchmark and data
- Subject: Update on UNIPEN Handwriting Benchmark and data
- From: unipen-info-list-owner@magi.nist.gov
- Date: Fri, 28 Jun 96 17:06:20 EDT
- Organization: National Institute of Standards and Technology, Gaithersburg, MD
*******************************
* UPDATE *
*******************************
The First UNIPEN Benchmark of
On-Line Handwriting Recognizers
Hi,
This message is being sent to a list I created of addresses compiled
from:
- the list of Benchmark participants
- Isabelle Guyon's list of people who might be interested in
the UNIPEN Benchmark and in pen-based handwriting
recognition in general
- people who have contacted me and expressed an interest
I know there are many researchers interested in the UNIPEN data, and
I thought now would be a good time to update everyone.
Currently, the more than three dozen UNIPEN Benchmark participants
have the fourth release of training data to work with. It consists of
samples from all 50+ data sets that they collectively donated. The
training data is organized into the 8 test sets such as "printed words",
and in some cases it is then organized beyond that, for example by
upper/lower case. At the lowest level, the files are grouped by the
data set they were originally from. That is largely to help identify
any problems within a particular data set and probably won't be the
case with test sets.
A fifth release of UNIPEN training data is now being compiled for
distribution. It incorporates fixes to (hopefully) all of the data
problems that people have found and reported to me. Twelve data sets
have been fixed, and I am waiting to hear back from the donor of
three others to decide how best to proceed with the data.
The sixth release will follow shortly thereafter, and it will simply
be larger (consisting of all data not required for the benchmark and
development test sets).
The final training data set is expected to be quite large, perhaps
gigabytes. The Linguistic Data Consortium has plans to press CD-ROMs
of the training set.
There will be two UNIPEN Benchmarks of recognition performance, one
this year (when the participants have largely indicated they are ready)
and one next year, after the participants have had a chance to take
advantage of what they have learned from the first Benchmark.
At the conclusion of the first Benchmark, its test set and the
development test set will become available to the public. NIST will
be pressing CD-ROMs containing the test sets to distribute to the
Benchmark participants, and will ultimately make extras available
to other researchers.
The second Benchmark's test set will also become available after that
Benchmark concludes.
Those of you who are waiting for the data to become publicly
available will receive the announcements from this mailing list.
This message contains some information that Benchmark participants have
already been notified of and some information that was contained in
Lambert Schomaker's report "UNIPEN Scrawl #5" that was sent to members
of the SCRIB-L mailing list. It is available on the NICI (Nijmegen
Institute for Cognition and Information, The Netherlands) WWW site:
http://www.nici.kun.nl/unipen/scrawls/unipen-scrawl-5.html
To subscribe to the SCRIB-L mailing list, send email to:
LISTSERV@NIC.SURFNET.NL
and, in the text of your message (not the subject line), write:
SUBSCRIBE SCRIB-L
Lambert Schomaker's email address at NICI is:
schomaker@nici.kun.nl
Isabelle Guyon is no longer with AT&T. Her email address is now:
isabelle@cybergold.net
so please make a note of that. If you have any email aliases set up that
include her, it would be a good idea to update them now. I am not sure if
email addressed to the old address at research.att.com gets forwarded or
not.
Publicly available UNIPEN files can be downloaded via anonymous FTP from
ftp://ftp.cis.upenn.edu/pub/UNIPEN-pub/
Included are the original Call for Data, the language definition, and a
few technical documents.
You can send a note via email to:
unipen-info-list-request@magi.nist.gov
if:
(1) you would like to join the list and are not already on it.
If you are on it, the "To:" field in your email header will be:
unipen-info-list@magi.nist.gov
Please supply this information when available:
name, affiliation, postal address, email address, phone#, fax#
(2) you would like to be taken off this list
(3) you want these messages to be sent to you at a different address;
many of the entries that I have for people list multiple email
addresses and in those cases, I am using the first one
(4) you received multiple copies of this email sent directly from NIST,
as opposed to copies forwarded from other sources; please enclose
the complete headers of the messages because the "Received:" fields
will be necessary to pin down the problem
Regards,
Stan Janet
NIST