This post is part of our Curators’ Corner series. Every so often we’ll feature a different DCN Curator. The series grew out of a community-building activity wherein curators at our partner organizations interview each other “chain-letter style” in order to get to know each other and their work outside of the DCN better. We hope you enjoy these posts!
Nicholas Wolf is a Research Data Management Librarian at New York University (NYU). Nicholas was interviewed by Wendy Kozlowski in December 2021.
How did you come to your current position?
I came to the position from a faculty position in the same university, where I was working as a digital humanist but also a historian. It’s kind of an odd entry point, but I had already been working with faculty in my own department, and even other departments, on digital humanities projects. In that job, I was already providing some support to teach them technical skills or to get the project in order.
The position that I currently have, which is Research Data Management Librarian, was started as a new position. They were looking at candidates who would have a rapport with faculty, and I knew that was important. I could make the case that I had already done this type of work, so that’s how I ended up starting out in research data management librarianship.
Tell me what you do as a Research Data Management Librarian at NYU.
I do a varied list of things. My day-to-day is teaching RDM to patrons; most of them tend to be graduate students, but every once in a while we get a faculty member or an undergrad coming in. These sessions focus on everything from logistics, like how to structure Python-based projects, how to organize files, and how to work with databases to keep research data in good order, to more policy-oriented questions, like how to write a data management plan and what agencies require around DMPs.
There’s that teaching component and there’s also a consultation component. We encourage people at our university to reach out to us about questions related to any of those topics, and we will meet with them and either do some hands-on project management type of stuff, or it could be reading a data management plan and offering feedback.
In the past three or four years I’ve done a lot more work with persistent identifiers (PIDs). I’m our representative for DataCite as well as ORCID, and I help out on CrossRef. We’re minting those, or we’re looking for ways to implement ORCID, and encouraging implementation among our faculty. So it’s either being an advocate for adoption of those or contributing, because DataCite, in particular, is a member-driven organization, so we can donate some time to help it do its work.
We’re part of the department called data services, although ours is jointly run by our campus IT, so it’s kind of an interesting mix of data licensing – like providing and facilitating licenses to certain software and supporting people as they learn to use it – in addition to data reference and other RDM work that we do. I help out on that side as well: I teach intro Python for those who want to learn it, and I help run our lab space in the library.
Is curation one of the RDM services?
Only informally, and very rarely. We will help you curate your data, either at the end of the project phase, or in the midst of it. I guess you could call it more “project management” in that case. We have an institutional repository, but it doesn’t have a long culture of data deposit. So without that, a lot of data goes elsewhere, to places that have their own curation workflows. We are developing a new IR, and looking to bring in more data. There are always going to be multiple options in this space, but we are hoping to make this a more direct option for researchers at NYU.
How much of your job involves Data Curation?
I came into the job thinking there would be more of it. I did a lot of training in it. I would say right now, about 2% of my time is spent actually working with researchers on this. But the ones that I do are fairly interesting; we have one coming up where it’s a very large data set for a machine learning publication, and the researcher has to take this data off the HPC and put it in a location where people can download it. We get these edge cases because you can’t pop that into ICPSR or Dryad because the files are too large. When we do curate at NYU, it’s often hard like that.
Why is curation important to you?
This is part of our values as research data management librarians: that curation is done well. We see our charge as facilitating curation and making it happen more easily. So we’re interested in having the researcher do that work beforehand and then ensuring the outcome of curation actually happens. We’re also data users ourselves. We have had people use our data sets and ask questions, and then we say “oh yeah, I should have done this curation better”! Our group here also acquires data files as library assets, and that space is really fascinating because some of that is highly structured and coming from a vendor, so we’re essentially curating a purchased data set as a library asset. We can see the problems there with the structure and the documentation, so anything that we can do to contribute to a community of curation, we think is good. It makes everybody’s life easier.
Why is the Data Curation Network important to you?
I do think that it is essential to share knowledge about how to curate, particularly as things change – we’re dealing with new sizes of data, new genres of data, more interdisciplinary use of data which has new documentation requirements – all these things require constant knowledge sharing among the community of people that are responsible for curation. I also think it’s a strong value to keep IRs going and make them a salient part of the conversation. We see the DCN as a great way to ensure that can happen at scale by pooling resources, for instance. It’s expensive to guarantee that data will be available. Caring for an object that long is a big resource investment and personal investment, so it does require a lot of protection and good planning to do it.
If you weren’t doing data curation, what would you be doing?
If I weren’t curating data, I’d be making data, which I really do enjoy! I do a bit of work researching historical language shifts and social linguistics, just for fun.
What’s your favorite cuisine?
Thai food, but something not too spicy!
What do you like to do outside of work?
I like to do two things at the moment. One is read for pleasure, which is hard to find time for, but I’ve rediscovered it during the pandemic. In particular autobiographies and biographies; I just read a historical nonfiction work on a 17th century shipwreck. And then the other thing I’ve been doing recently is learning about micro-computing. This is like standing up minimal computers – little devices you can run a lightweight Linux on, and for example, stand up a server for my personal website on. They’re often used in engineering for scanning devices or things like that because they’re mobile and small. It’s fun to work on something technical, but not data. I don’t know much about it, but it’s fun to try and learn!
To learn more about Nicholas, and the datasets he has curated for the DCN, see his curator page!