We have our data set. How we acquired it is no longer our concern; it is sitting, spinning, on a filesystem somewhere. We also understand the format of our data set and how much of it we have.
So our questions are now:
- do we know and understand the format we wish to ingest?
- do we need to transform the raw data?
- are there existing libraries or crosswalks available to do this? (a sketch follows this list)
- do we have all the data we need to generate the required metadata?
- how much disk storage will we need for the pre-ingest store?
- how long does processing take?
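To make the crosswalk question concrete, here is a minimal sketch of a field-level metadata crosswalk. The field names are hypothetical; in practice, published crosswalks (MARC to Dublin Core, for instance) often come with existing library support, which is exactly what this question is probing for.

```python
# A minimal metadata crosswalk sketch: map source fields onto the target
# ingest schema. All field names here are hypothetical examples.
CROSSWALK = {
    "obs_title":  "dc:title",
    "observer":   "dc:creator",
    "obs_date":   "dc:date",
    "instrument": "dc:description",
}

def crosswalk(source_record: dict) -> dict:
    """Translate a source metadata record into the target schema."""
    return {CROSSWALK[k]: v for k, v in source_record.items() if k in CROSSWALK}

record = {"obs_title": "M31 survey", "observer": "J. Smith", "exposure": 120}
print(crosswalk(record))  # note: unmapped fields are silently dropped - flag these!
```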
The last two questions are crucial. The data files for ingest should be smaller than the raw data, but we could conceivably want to generate multiple instantiations at this stage (say, a JPEG thumbnail of a FITS image data set) and we will need to store all of these for ingest. Equally, we need to ensure that we have enough processing power: automatic data acquisition gives us a race condition in which all raw data must be processed before the next data set is ready for processing …
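Generating a derivative like that thumbnail is cheap to prototype. A minimal sketch, assuming astropy and Pillow are available, that the first HDU holds the image, and with illustrative file paths and a simple percentile stretch:

```python
import numpy as np
from astropy.io import fits
from PIL import Image

def fits_to_jpeg_thumbnail(fits_path, jpeg_path, size=(256, 256)):
    """Render the first image HDU of a FITS file as a greyscale JPEG thumbnail."""
    with fits.open(fits_path) as hdul:
        data = np.nan_to_num(hdul[0].data.astype(np.float64))
    # Stretch to 0..255, clipping at the 1st/99th percentiles to tame outliers
    lo, hi = np.percentile(data, (1, 99))
    scaled = np.clip((data - lo) / max(hi - lo, 1e-12), 0.0, 1.0) * 255
    img = Image.fromarray(scaled.astype(np.uint8), mode="L")
    img.thumbnail(size)  # shrinks in place, preserving aspect ratio
    img.save(jpeg_path, "JPEG")

fits_to_jpeg_thumbnail("raw/observation.fits", "pre_ingest/observation_thumb.jpg")
```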
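The race condition and the storage question are both worth checking with simple arithmetic before committing to hardware. A back-of-the-envelope sketch with made-up numbers; the acquisition rate, per-set processing time, derivative size and retention period are all assumptions to be replaced with measured values:

```python
# Back-of-the-envelope check: can processing keep pace with acquisition,
# and how big must the pre-ingest store be? All numbers are assumptions.
raw_sets_per_day = 24                 # one new raw data set per hour
processing_minutes_per_set = 45       # measured on a sample data set
derivative_bytes_per_set = 2 * 10**9  # transformed files plus thumbnails
retention_days = 14                   # how long sets wait in the pre-ingest store

busy_minutes = raw_sets_per_day * processing_minutes_per_set
print(f"processing load: {busy_minutes} min/day of {24 * 60} available")
assert busy_minutes < 24 * 60, "processing cannot keep up with acquisition"

store_bytes = raw_sets_per_day * retention_days * derivative_bytes_per_set
print(f"pre-ingest store: ~{store_bytes / 10**12:.1f} TB")
```

If the assertion fails, we have lost the race before we start: either the processing must be parallelised or the acquisition throttled.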