How big is your dataset?

Data archives can be many things to many people; it all depends on what you mean by data. In fact, there is no clear agreement as to what constitutes a data archive and what its content is or should be.

Datasets, of course, vary widely according to discipline and data type. Observational (survey) data can consist of a few spreadsheets or a mass of images. Experimental data can likewise vary from a spreadsheet or two to several gigabytes. And some anthropology data, with its use of video, can create massive datasets.

This has implications for data ingest. With a print repository you can get away with HTTP upload, as text objects are usually a lot smaller than 2GB. It’s a different case entirely with data objects, and with the design of any easy self-service mechanism for data upload.
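
As a rough illustration of the sizing question, here is a minimal Python sketch that checks object sizes against an upload ceiling and summarises them. It assumes the 2GB figure means 2 GiB (2**31 bytes). The function names and the sample sizes are made up for illustration and do not come from any real repository.

```python
# Assumed ceiling: the post's "2GB" read as 2 GiB (2**31 bytes).
HTTP_UPLOAD_LIMIT_BYTES = 2 * 1024**3


def fits_http_upload(size_bytes, limit=HTTP_UPLOAD_LIMIT_BYTES):
    """Return True when an object is below the assumed upload ceiling."""
    return size_bytes < limit


def summarise(sizes_bytes):
    """Report the mean object size and how many objects exceed the ceiling."""
    over_limit = [s for s in sizes_bytes if not fits_http_upload(s)]
    return {
        "count": len(sizes_bytes),
        "mean_bytes": sum(sizes_bytes) / len(sizes_bytes),
        "over_limit": len(over_limit),
    }


# Hypothetical sizes in bytes: a spreadsheet, an image set, a video collection.
example_sizes = [250_000, 800_000_000, 12_000_000_000]
report = summarise(example_sizes)
print(report)
```

Note how a single large video object dominates the mean, which is one reason an average on its own may say little about what the upload mechanism has to cope with.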

To this end I’ve created a straw poll on SurveyMonkey about average data object size in existing repositories. It’s completely anonymous, but it should help give some numbers around average object size.

Click here to access the poll

I’ll post a summary of the results in a week or two.

About dgm

IT professional, ex research psychologist, blogger, twitterer, and amateur classical and medieval historian ...