Wednesday, February 25, 2015

Storing and sharing research data after the ‘Space Race’

The emerging demands of modern, data-intensive, collaborative research have seen swathes of researchers adopt services for data storage and preservation beyond what their institutions offer.

At the storage stage, many researchers in the UK have benefited from the extra storage space offered by Dropbox as part of its ‘Space Race’ promotion. By convincing colleagues and students to sign up, individuals could gain access to significantly enhanced storage space, all synchronised from local machines in the usual Dropbox way.

The final frontier

However, as with all promotions, the ‘Space Race’ is coming to an end. From 4 March, these additional allocations will be removed and accounts will revert to their initial state, often with substantially less storage. Data above the account allocation will still be stored and accessible to download, but it won’t be synchronised with local files.

For those expecting changes in regularly updated data to be reflected in the cloud – for example by sharing live datasets with collaborators via Dropbox – this could be an unwelcome surprise.

What comes next

It is possible – and Dropbox is encouraging users – to sign up for additional space with a paid monthly subscription.

But there are other considerations at play. In the UK the Engineering and Physical Sciences Research Council (EPSRC) and other funding councils have policies that the research data they fund must be managed and stored in the European Economic Area (EEA), and that “effective data curation is provided throughout the full data lifecycle”. UK data protection regulations also mean that some personal and other human data will need to be stored within the same area.

A Dropbox-type service is clearly beneficial to researchers at the ‘managing active data’ stage, as it is accessible from the researcher’s personal computer, can be changed easily and opened up for collaboration.
Research managers would therefore be advised to consider how they can implement a similar solution at an institutional level, and choose one that is fit for purpose – particularly as the growing number of individual paid accounts within an organisation could result in that institution approaching its procurement threshold.

Pick your service

There is a range of cloud storage solutions available for researchers that comply with funding council and data protection regulations. These cover managing, sharing and collaborating on research data at different stages in the research data lifecycle.

For example, with the ‘File Sync and Share’ service Jisc offers, we have worked with commercial providers to deliver a catalogue of file syncing and sharing products that can meet the varying requirements of institutions and their researchers. We are even working with Dropbox in this area.

For those who need additional file sharing capabilities at low cost, Jisc has negotiated an agreement with Microsoft for the Office 365 suite. This offers EEA-based file syncing and sharing capabilities through the OneDrive application and has the added bonus that it connects through the Janet network, providing enhanced security by avoiding the public internet.

Alternatively, if the active research data is very large, or if it needs to be next to cloud computing capabilities, a solution such as the one we have with Amazon Web Services might be more apt. It provides EEA-based storage and is peered with the Janet network for faster data transfer.

Publishing outputs

Once a research project has concluded, what happens with the final data outputs?

In some cases file storage services are also being used in the ‘manage, store and preserve’ and ‘share and publish’ stages of the research lifecycle, as they are seen as an easy way to back up research data in the cloud.
They may also be used to informally publish and share finished datasets using peer-to-peer communication methods such as emailing links to datasets hosted on the service.

This can be seen as bad practice, as this data is not openly available to all, carries little or no metadata to enable discovery or re-use and is outside the scholarly communications infrastructure. The researcher may also be missing out on gaining credit for some of their most valuable digital research outputs.

A home for data

The first home for this type of data should be a suitable disciplinary repository (an extensive listing can be found on re3data) or the institution’s own data repository.

If these options are not available or not suitable, or researchers would like to informally publish their data, then they would be advised to use the free specialist cloud publication services Figshare and Zenodo. Both offer researchers the space to publish final, citable datasets and other digital research objects, as best practice dictates.

Data published on these services are also given a DataCite Digital Object Identifier (DOI), which is permanently resolvable and integrates with scholarly communications systems by exposing the dataset metadata. This enables the final stages of the research data lifecycle, by providing routes for data discovery, re-use and citation.

The future

Managing the research data lifecycle is clearly a challenge. At Jisc, we are working with people who are affected by the changing landscape, to make sure that we are best supporting their needs. We would love to hear your examples or ideas for how to better manage this data. Please get in touch with me.
