How big is your dataset?

Data archives can be many things to many people; it all depends on what you mean by data. In fact, there is no clear agreement on what constitutes a data archive or what its content is or should be.

Datasets, of course, vary widely according to discipline and data type. Observational (survey) data can consist of a few spreadsheets or a mass of images. Experimental data likewise can vary from a spreadsheet or two to several gigabytes of data. And some anthropology data, with its use of video, can create massive datasets.

This has implications for data ingest. With a print repository you can get away with HTTP upload, as text objects are usually a lot less than 2 GB in size. It’s a different case entirely with data objects, and with the design of any easy self-service mechanism for data upload.
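To make the ingest point concrete, here is a minimal sketch of how a self-service upload tool might decide between a single HTTP upload and a chunked transfer. The 2 GB cut-off is just the rough figure mentioned above, and the function name, the 256 MB chunk size and the whole approach are my own assumptions for illustration, not a description of any particular repository's ingest API.

```python
# Assumed threshold: the rough 2 GB figure for a single HTTP upload.
SINGLE_UPLOAD_LIMIT = 2 * 1024**3
# Assumed chunk size for larger objects (hypothetical, tune to taste).
DEFAULT_CHUNK = 256 * 1024**2


def plan_upload(size_bytes, chunk_bytes=DEFAULT_CHUNK, limit=SINGLE_UPLOAD_LIMIT):
    """Return the (offset, length) byte ranges needed to send an object.

    Objects smaller than `limit` go up in one piece; anything larger is
    split into ranges of at most `chunk_bytes` each.
    """
    if size_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("size must be non-negative and chunk size positive")
    if size_bytes < limit:
        return [(0, size_bytes)]
    return [
        (offset, min(chunk_bytes, size_bytes - offset))
        for offset in range(0, size_bytes, chunk_bytes)
    ]


if __name__ == "__main__":
    # A typical text object versus a 5 GB video file.
    print(len(plan_upload(3 * 1024**2)))
    print(len(plan_upload(5 * 1024**3)))
```

In practice you would call `plan_upload(os.path.getsize(path))` and hand each range to whatever transfer mechanism the repository actually supports.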

To this end I’ve created a straw poll on SurveyMonkey on average data object size in existing repositories. It’s completely anonymous but should help give some numbers around average object size.
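If you are not sure what your own average object size is, something like the following sketch would give a rough answer. It assumes your repository's objects are simply files under a directory tree, which may well not be true of your system; the function names are hypothetical.

```python
import os
import statistics


def object_sizes(root):
    """Collect the size in bytes of every regular file under `root`."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                sizes.append(os.path.getsize(path))
    return sizes


def summarize_sizes(sizes):
    """Return count, mean, median and maximum for a list of sizes."""
    if not sizes:
        return {"count": 0, "mean": 0, "median": 0, "max": 0}
    return {
        "count": len(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
        "max": max(sizes),
    }


if __name__ == "__main__":
    # Summarize the current directory as a stand-in for a repository store.
    print(summarize_sizes(object_sizes(".")))
```

The median is worth reporting alongside the mean, since one or two very large video objects can drag the average well away from the typical object size.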

Click here to access the poll

I’ll post a summary of the results in a week or two.


About dgm

Former IT professional, previously a digital archiving and repository person, ex-research psychologist, blogger, twitterer, and amateur classical, medieval and nineteenth-century historian ...

