Data ingest questions

We have our data set. How we acquired it is no longer our concern. It is sitting on spinning disk, on a filesystem somewhere. We also understand the format of our data set and how much of it we have.

So our questions are now:

  1. do we know and understand the format we wish to ingest?
  2. do we need to transform the raw data?
  3. are there existing libraries/crosswalks available to do this?
  4. do we have all the data we need to generate the metadata we require?
  5. how much disk space will we need for the pre-ingest store?
  6. how long will processing take?
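Questions 3 and 4 overlap in practice. A crosswalk maps fields in the raw data onto our metadata schema, and running it shows whether anything we need is missing. Below is a minimal sketch, assuming the raw headers have already been read into a dictionary (for example, from FITS headers). The target field names and the mapping are hypothetical illustrations, not an established crosswalk.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical crosswalk: target metadata field -> source header keyword.
# The keywords are standard FITS header keywords, but this mapping is an
# illustration only, not an established crosswalk.
CROSSWALK: Dict[str, str] = {
    "title": "OBJECT",
    "date": "DATE-OBS",
    "source": "TELESCOP",
    "instrument": "INSTRUME",
}


def apply_crosswalk(
    header: Mapping[str, str], crosswalk: Mapping[str, str] = CROSSWALK
) -> Tuple[Dict[str, str], List[str]]:
    """Map header values onto target fields.

    Returns (metadata, missing), where `missing` lists every target field
    whose source keyword is absent or blank, i.e. metadata we cannot
    generate from the data we hold.
    """
    metadata: Dict[str, str] = {}
    missing: List[str] = []
    for target, source in crosswalk.items():
        value = header.get(source)
        if value is None or str(value).strip() == "":
            missing.append(target)
        else:
            metadata[target] = str(value).strip()
    return metadata, missing


if __name__ == "__main__":
    # Example header, already parsed into a dict (values are made up).
    header = {"OBJECT": "M31", "DATE-OBS": "2010-06-01T22:14:05", "TELESCOP": "Example"}
    metadata, missing = apply_crosswalk(header)
    print(metadata)
    print("missing:", missing)
```

A non-empty `missing` list is the answer to question 4: that metadata has to come from somewhere other than the data set itself.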

The last two are crucial. The data files for ingest should be smaller than the raw data, but we could conceivably want to generate multiple instantiations at this stage (say, a JPEG thumbnail of a FITS image data set), and we will need to store these until they are ingested. Equally, we need to ensure that we have enough processing power. Automated data acquisition sets up a race: we need to finish processing all the raw data before the next data set is ready for processing …
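Once we have some figures, both questions reduce to simple arithmetic. Storage is the total size of every instantiation generated per data set, multiplied by the number of sets that sit in the pre-ingest store at once. Processing keeps up only if each set is finished before the next one arrives. Below is a minimal sketch, assuming the size of each derived file can be expressed as a fraction of the raw size. The function names and all figures in the example are hypothetical.

```python
import math
from typing import Sequence


def pre_ingest_bytes(raw_bytes: int, output_ratios: Sequence[float], sets_held: int) -> int:
    """Estimate pre-ingest storage for `sets_held` data sets.

    Each entry in `output_ratios` is the size of one generated instantiation
    (ingest copy, thumbnail, ...) as a fraction of the raw data set size.
    Each output is rounded up to a whole byte so the estimate errs high.
    """
    per_set = sum(math.ceil(raw_bytes * ratio) for ratio in output_ratios)
    return per_set * sets_held


def keeps_up(processing_seconds: float, arrival_interval_seconds: float, workers: int = 1) -> bool:
    """True if processing can keep pace with acquisition.

    With one worker, each data set must be processed before the next arrives.
    With several independent workers, the steady-state condition relaxes to
    processing_seconds / workers <= arrival interval.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return processing_seconds / workers <= arrival_interval_seconds


if __name__ == "__main__":
    # Made-up figures: 2 GB raw sets, an ingest copy at half size and a
    # thumbnail at 0.1%, with ten sets waiting in the pre-ingest store.
    print(pre_ingest_bytes(2_000_000_000, [0.5, 0.001], sets_held=10))
    # 15 minutes of processing per set against a new set every 10 minutes.
    print(keeps_up(900, 600), keeps_up(900, 600, workers=2))
```

The `workers` parameter is an assumption that goes beyond the original question. Parallel workers raise throughput, but they do not shorten the time needed to process any one data set. The store also has to hold every set that is in flight at once.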

About dgm

Former IT professional, previously a digital archiving and repository person, ex research psychologist, blogger, twitterer, and amateur classical, medieval and nineteenth-century historian ...
