The Role of the Data Librarian

Yale has a long history of research data management and a dedicated data librarian role. This position has traditionally been functional, allowing the librarian to focus on data sources as a format regardless of discipline. The data librarian helps researchers find data sources, understand their structure and organization, and connect with statistical and methodological support. As data sources have become increasingly web-based, the role has evolved, creating complex workflows and legacies. It would be rare to encounter a data librarian who has not inherited or received a box or two of CD-ROMs, disks in various formats, and stacks of documentation that are kept “just in case.” As recipients of some of these treasures, we discovered that not only was Yale University Library “one of the first academic libraries to form a collection of machine-readable data, [and] began acquiring social science numeric data in 1972,” but the library has also had a Social Science Data Librarian position since at least 1977 (Green et al., 1999). The role’s early responsibilities focused on the Yale Roper Collection, highlighting preservation and data management as critical functions.

Over time, the position morphed with changing data formats and volumes. A turning point came in 1997, a period frequently cited as one of general growth in data services in the United States and Canada (Geraci et al., 2012). An internal memo from then Associate University Librarian Donald Waters referenced a newly created Data and Electronic Services Librarian position. The role was established to acknowledge the work of the library and the newly formed Social Science Statistical Laboratory (Statlab), which together would share the core responsibilities of developing and maintaining a collection of data and ensuring its practical use. Under this new arrangement, the library would lead the way in building and maintaining data collections, and Statlab would support data analysis. The library’s collection of data would become the Social Science Data Archive (SSDA), which would include government data, survey data from Yale’s Economic Growth Center, and a variety of acquired and licensed datasets. A key point in the 1997 memo addressing the relationship between Statlab and the library highlighted a necessary shift from a collections-based organization to a service-based organization to reduce research barriers. Consequently, transitioning to a service-based focus changed the data librarian’s job. In addition to managing data and data collection, the librarian took on a more significant role in the larger research data lifecycle, from discovery to disposition.

Today, the services provided by Statlab remain largely unchanged. The Statlab team assists with code debugging and data analysis and offers statistical advice to researchers from across the university. In contrast, the role of the data librarian has expanded significantly. With the growth of open science and the rise of data-sharing mandates, the data librarian’s responsibilities now include support for research data management. Meanwhile, the library has begun licensing more and larger data sets in response to researcher requests. This growth also creates new opportunities and work related to license review, data use agreements, and data governance and compliance. As research becomes increasingly multidisciplinary, researchers often seek unique and specialized data, which adds the responsibility of ensuring the requested data matches the researchers’ needs.

While the work of a data librarian is diverse and rewarding, it also presents many challenges and questions. Is it realistic to expect data librarians to be well-versed in the data collections they curate, maintain, and access? More importantly, are there potential collaborators within the library who are engaged in similar work and who should be part of the process? Emphasizing the importance of collaboration within the library can promote a team-based approach to data management, reducing the burden on individual data librarians.

The Role of the Business Librarian

The Research and Learning Division at Yale Library has over fifty subject specialist librarians with a wide range of expertise. The subject librarian role typically involves reference and instruction, outreach, and collection development responsibilities to support teaching, research, and learning for specific academic departments and schools. For example, business librarians primarily focus on supporting the School of Management. Historically, professional schools at universities had a dedicated library with specialized resources and staff to cater to their specific needs. However, many subject or departmental libraries have now closed or merged with other libraries on campus for various reasons. Younger professional schools may never have had a library. This is the case at Yale, where the School of Management did not open until 1976.

Today, a dedicated liaison librarian or small team provides specialized research support and services to the School of Management. What often surprises non-business librarian colleagues is that these liaisons spend most of their time helping researchers identify, find, and use quantitative and qualitative data. In addition to selecting journals and monographs, business librarians are involved in licensing data delivered through various mechanisms, ranging from platforms to direct feeds. Researchers often ask specific questions about the data, including collection methods and how variables were calculated. When this occurs, a business librarian will review documentation, make connections with the vendor, and work to answer such questions.

Where Data Librarian and Business Librarian Roles Intersect

When addressing data requests, a few foundational principles have emerged. First, domain knowledge is necessary to support the researcher effectively. Additionally, researchers benefit from working with librarians who can not only provide access but also have a certain level of knowledge about the data. Lastly, from the researcher’s perspective, it does not matter who the librarian is as long as they get access or have their question answered. It has been our experience that students and faculty generally contact a librarian with whom they have had a successful interaction. To address these needs, the data and business librarians started working together to share knowledge and information, ultimately enhancing research support and productivity. This approach combines the collection and service mindsets mentioned earlier by maximizing the knowledge and skills of all team members. For example, the data librarian may be familiar with the structure and organization of a dataset based on previous work. The business librarians may also be familiar with this data and can confirm if it was successfully combined with another data set using a specific common identifier. Everyone contributes and learns by working together, and we pass this insight on to our researchers.

Accomplishments and Growth

Through our team collaboration, we have been able to expand our services and help more researchers. Each team member has developed new skills, such as improved negotiation abilities and a better understanding of our licensing requirements, which led to faster license approvals. Our relationships with faculty, students, and staff have become instrumental in obtaining feedback on data products from different vendors. These relationships are like a shared Rolodex, making it easier to identify potential candidates to vet data and content in new products. We have strengthened our relationships with strategic partners, including campus IT offices, library technical services, and library electronic resources. This has helped open the lines of communication, allowing us to better advocate for our researchers when our colleagues make decisions about infrastructure and access. Perhaps most importantly, we share information through tracking data requests and regular discussions called “Data Matters.” These conversations can cover everything from new requests to changes to data products and new documentation from a vendor.

Our ongoing collaboration has been valuable when assessing new data requests and sources. We are heavily involved in data acquisition, including conducting due diligence to find data, building solid relationships with vendors, and budget planning. Our approach is customer-centric and hands-on, striving to accommodate as many requests as possible. However, obtaining licenses for datasets and other resources is complex and time-consuming because many products’ license clauses and delivery mechanisms do not typically work well with traditional acquisition workflows. One of the most time-consuming tasks is managing access to resources that are not standard for libraries. We develop processes for approving account requests, managing user credentials when existing IDs do not work with a new product, learning different download modes to assist researchers, and creating documentation for faculty and students.

Our team approach is invaluable because it helps extend our reach. We brainstorm, assess, and improve workflows and offer to tackle tasks when others cannot. This might involve filling out acquisition forms for internal library processes or consulting with a doctoral student who has a particularly challenging question. Each of us has other responsibilities related to our roles, and it is easier to prioritize our research community when we are not doing so as individuals. With this approach, we have found new solutions to old problems, identified gaps in our data holdings, and moved toward intentionally developing a collection of data products and resources.

Focus on the Future

In an ideal state, we would see increased staff to meet the rise in data requests, and each discipline would find the best approach to meet researchers’ data needs. Realistically, however, researcher needs will always outpace staffing growth. For example, the faculty in the School of Management increasingly have joint appointments with the Economics Department, the School of the Environment, and the School of Global Affairs. This essentially doubles the faculty and doctoral population the business librarians serve and sharply increases the number of data requests and consultations the data librarian receives. Additionally, data sources and technology will continue to change, and it will take time for new staff to gain the experience necessary to tackle these complex problems.

Despite these challenges, we have a path forward for the future. Our team is built on trust and united by a common goal that allows us to expand our services. We are stronger together, even with different views on approaching a problem or request. Our shared purpose keeps us moving forward. Over time, we must care for ourselves and create shared workflows and documentation to sustain this model and evolve as we continue to receive work.

References

Geraci, D., Jacobs, J., & Humphrey, C. (2012). Data Basics: An Introductory Text. https://doi.org/10.7939/R3251FK8F

Green, A., Dionne, J., & Dennis, M. (1999). Preserving the whole: A two-track approach to rescuing social science data and metadata. The Digital Library Federation.