Two large census datasets available from SGI
- Subject: Two large census datasets available from SGI
- From: Ronny Kohavi <ronnyk@starry.engr.sgi.com>
- Date: Thu, 18 Sep 1997 23:15:35 -0700
A year ago, we at Silicon Graphics created the "adult" dataset, which
is now available at UCI. Thanks to Terran Lane, who interned here this
summer, we now have two larger files based on two years of real US
census data (unlike the previous dataset, these are not filtered to
adults only).
The files are challenging for scale-up experiments because the
training sets are larger than typical UCI files: 101MB and 47MB, respectively.
The files are available at:
http://reality.sgi.com/ronnyk/census-income.tar.gz
http://reality.sgi.com/ronnyk/census-year.tar.gz
The files are in the standard UCI/C4.5 format with some documentation
on the attributes.
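
For readers unfamiliar with the UCI/C4.5 format, here is a minimal Python
sketch of a reader. It assumes the usual conventions: a .names file whose
first entry lists the class labels and whose remaining entries are
"name: continuous." or "name: value1, value2, ...", and a .data file with
one comma-separated record per line, the class label last, and "?" for
unknown values. The function names are hypothetical, and the sketch does
not handle entries that span several lines or other attribute types, so
check it against the documentation shipped with the files before relying
on it.

```python
def strip_comment(line):
    # In C4.5 files, '|' starts a comment that runs to the end of the line.
    return line.split("|", 1)[0].strip()


def parse_names(text):
    """Return (classes, attributes) from the text of a .names file.

    attributes is a list of (name, domain) pairs, where domain is None
    for a continuous attribute or a list of allowed values otherwise.
    """
    entries = []
    for raw in text.splitlines():
        line = strip_comment(raw)
        if line:
            entries.append(line.rstrip("."))  # drop the terminating period

    # First entry: the class labels. Remaining entries: one per attribute.
    classes = [c.strip() for c in entries[0].split(",")]
    attributes = []
    for entry in entries[1:]:
        name, _, spec = entry.partition(":")
        spec = spec.strip()
        if spec == "continuous":
            domain = None
        else:
            domain = [v.strip() for v in spec.split(",")]
        attributes.append((name.strip(), domain))
    return classes, attributes


def parse_data(text, attributes):
    """Return a list of (record, label) pairs from the text of a .data file."""
    rows = []
    for raw in text.splitlines():
        line = strip_comment(raw).rstrip(".")
        if not line:
            continue
        *values, label = [f.strip() for f in line.split(",")]
        record = {}
        for (name, domain), value in zip(attributes, values):
            if value == "?":          # unknown value
                record[name] = None
            elif domain is None:      # continuous attribute
                record[name] = float(value)
            else:                     # enumerated attribute
                record[name] = value
        rows.append((record, label))
    return rows
```

For files of the sizes mentioned above, it would be better to iterate
over an open file object line by line than to read the whole text into
memory as this sketch does.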