Data archives can be many things to many people; it all depends on what you mean by data. In fact, there is no clear agreement as to what constitutes a data archive and what its content is or should be.
Datasets, of course, vary widely according to discipline and data type. Observational (survey) data can consist of a few spreadsheets or a mass of images. Experimental data likewise can vary from a spreadsheet or two to several gigabytes of data. And some anthropology data, with its use of video, can create massive datasets.
This has implications for data ingest. With a print repository you can get away with HTTP upload, as text objects are usually a lot less than 2GB in size. It’s a different case entirely with data objects, and with the design of any easy self-service mechanism for data upload.
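To make the ingest problem a little more concrete, here is a rough Python sketch of how a self-service uploader might decide whether an object is too big for a single HTTP upload and, if so, plan to send it in pieces. The 2GB ceiling is simply the figure mentioned above; the 64MB chunk size and the names `needs_chunking` and `plan_chunks` are my own assumptions for illustration, not anything a particular repository platform provides.

```python
from typing import Iterator, Tuple

# Assumed ceiling for a single HTTP upload, using the 2GB figure from the text.
HTTP_UPLOAD_LIMIT = 2 * 1024**3
# Hypothetical chunk size (64MB); a real service would tune this.
CHUNK_SIZE = 64 * 1024**2


def needs_chunking(size_bytes: int, limit: int = HTTP_UPLOAD_LIMIT) -> bool:
    """Return True when an object is too large for a single upload."""
    return size_bytes >= limit


def plan_chunks(size_bytes: int, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that together cover the whole object."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < size_bytes:
        # The last chunk may be shorter than chunk_size.
        length = min(chunk_size, size_bytes - offset)
        yield offset, length
        offset += length
```

Each `(offset, length)` pair could then be read from disk and sent as a separate request, which is the kind of mechanism a data repository would probably need and a print repository can do without.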
To this end I’ve created a straw poll on SurveyMonkey about average data object size in existing repositories. It’s completely anonymous but should help give some numbers around average object size.
I’ll post a summary of the results in a week or two.