It’s no secret that business data contracts are a peculiar beast, and one that librarians and researchers have long had to wrestle with.

On May 20th, 2025, I had the opportunity to speak about this very issue at Joint Meeting Vienna¹—a room full of smart, resourceful, mildly jet-lagged business librarians and vendors who are all too familiar with the contractual tug-of-war that plays out between researcher needs and business data supplier protections. My talk, Advocating for Change: Enhancing Business Data Contracts for Academic Research, focused on some of the challenges we face, and the changes we can advocate for, to make business data licensing more researcher-friendly without compromising vendors’ legitimate concerns. I’m pleased to say the presentation resonated with the experiences of many and was warmly received, with both librarians and vendors offering thoughtful engagement (and, in some cases, even cautious optimism).

This article is a chance to revisit and expand on that presentation and I hope it will provide some practical food for thought.

Setting the Scene

As Meg Trauner pointed out in her 2017 article, “Buying the Haystack: New Roles for Academic Business Libraries,”

With increasing frequency, faculty members are requesting a different kind of resource—stand-alone data sets that are not widely available to the library market and not available through WRDS. The seller often withholds university-wide use, and in many cases is not set up to offer it. … These stand-alone data sets are usually custom extracts, often one-time purchases of specific data that an individual or small group of faculty members needs for their research. These data sets involve a number of licensing, hosting, curating, and funding issues that librarians must resolve if they want to serve their institutional research agenda. (Introductory section, para. 3 & 4).

Business librarians are often caught between two worlds. On one hand: academic researchers, increasingly in need of robust, reliable, and accessible business data to push forward the conversation in their respective fields. Such data—often required as quickly as possible—is the raw material that fuels rigorous research, supports evidence-based analysis, and enables the kinds of collaborative projects that drive innovation. These projects frequently span disciplines, institutions, and territories, requiring shared access to consistent, high-quality datasets. With the right business data, researchers can test theories, uncover trends, and produce findings that are not only publishable but also impactful in real-world contexts. Without it, much of this collaborative, high-value scholarship simply couldn’t happen.

On the other hand, we have our vendors—the data suppliers whose products underpin much of that research. They are, quite understandably, focused on protecting the value of their intellectual property, ensuring their data isn’t misused, copied, or redistributed in ways that could undermine their business model, breach client agreements, or erode their competitive advantage. Vendors welcome academic use, but not unrestricted academic free-for-alls.

They’re also keen to protect the contractual rights of their own third-party data suppliers. Many vendors license or buy parts of their datasets from other companies, who live further upstream and also contractually limit how the data can be used. Vendors have a legal obligation to honor these contracts and make sure their own customers (such as universities) do the same.

Somewhere in the middle of these two groups stands the academic librarian: contract negotiator, policy interpreter, internal educator, and, often, reluctant middle manager of inter-party anxieties. It’s really not as glamorous as it sounds.

While there is much goodwill on both sides, the contracts that govern business data use in academia often lag behind the reality of modern research practice. They can be rigid where flexibility is needed, vague where clarity would help, and risk-averse in ways that create unnecessary friction. My argument is not that vendors must concede everything, but that we as a collective academic community need to advocate more confidently for researcher-friendly licensing terms that still respect the value and integrity of proprietary data.

In this piece, I want to focus on three key pressure points where meaningful improvements might make the biggest difference:

  1. Data retention: Striking the right balance between researchers’ need to preserve data for the completion of their research and vendors’ concerns about unauthorised long-term storage;

  2. Liability and institutional/individual risk: Revising indemnification and warranty clauses so academic institutions and individual scholars/budget holders aren’t left shouldering disproportionate legal burdens; and

  3. The AI conundrum: Crafting usage terms that accommodate machine learning and generative AI research without compromising the integrity of proprietary datasets.

In the sections that follow, I’ll dive into each of these friction points, unpack current stumbling blocks with real-world examples, and propose practical tweaks to contract language. Along the way, I’ll share insights from vendor conversations.

1. Data Retention: Subscription Terminations and Academic Realities

Let’s begin with data retention, an issue that lies at the heart of the disconnect between the lifecycle of business data subscriptions and the lifecycle of academic research.

Here’s a typical clause from one of the many agreements we at The University of Manchester Library have reviewed over the years:

Upon expiry or termination of this agreement… you agree to immediately delete all data downloaded from the platform…

It sounds tidy. Final. Sensible. In essence, many standard agreements require all researchers to delete all downloaded data, in all forms and from all places, immediately upon the end of a subscription. This includes any data that has been merged, modified, or amalgamated into wider datasets. The requirement applies regardless of whether the research is ongoing, under review, or due for publication.

This is where things start to fray. While subscriptions often operate on neat 12-month cycles, research timelines rarely do. In practice, the gap between starting a project and seeing it published (or even accepted) can stretch across 18 to 24 months, or longer. Even if the data was used entirely within a valid subscription period, the paper built upon it may be lost in the labyrinth of peer review, revisions, and resubmissions well beyond the subscription contract’s end.

Of course, there are also delays well outside of the researcher’s control—all the usual suspects: incomplete datasets or delays in receiving missing data points from vendors, the need for extensive cleaning and normalizing, changing research directions, sabbaticals, administrative overload, and, of course, teaching. To compound matters, funding very often does not allow for the renewal of subscriptions.

In such circumstances, deleting the data the moment a subscription lapses feels less like compliance and more like self-sabotage.

A More Nuanced Clause

In response to this persistent problem, we drafted the following clause at the University of Manchester Library. It’s not revolutionary, but it is practical and, crucially, it’s designed to balance the needs of researchers and rights holders:

The Authorized Subscriber may publish its research papers after the Term of this Agreement if the research carried out in connection with XYZ Data has been initiated during the Term. In connection with the aforementioned research, the Authorized Subscriber may retain a relevant amount of XYZ data to substantiate its research findings and for internal compliance purposes. Such XYZ data cannot be further used, and no other research can be based on the data retained. Subsequently, all retained data will be expunged in all forms and from all locations.

This clause achieves several things:

  • It gives researchers clarity: They know what they can retain, and under what conditions.

  • It allows for a time-limited, purpose-specific form of retention that supports transparency, compliance, and scholarly integrity.

  • It simultaneously reassures vendors: No further research can be done with the retained data, and it must ultimately be deleted.

We’ve been advocating for this clause to be included in our business data agreements during acquisition. The responses have been telling.

At one end of the spectrum, a small minority of vendors have simply declined—a firm “no,” sometimes with legal boilerplate attached. At the other end, an equally small number already include similar provisions in their standard terms, which either suffice as written or require only slight tweaking.

The majority, however, when asked, have agreed to either insert the above clause word-for-word into the main contract or append it as a supplemental document. It’s not always immediate, but with a bit of explanation, most vendors come to see that it’s in everyone’s interest.

That win rate suggests most vendors are open to a retention compromise, provided it’s phrased clearly and tied strictly to the original research purpose.

Pushing for Smarter Data Retention Terms

As a community, we recognize that adapting retention clauses to reflect real research lifecycles isn’t an academic luxury. It’s a practical necessity. From our position at the rockface (albeit with less rock and more data), we can collectively push for greater consistency across agreements, particularly in this area.

Judging from experience and from feedback I received at JMV 2025, I sense there is willingness on the part of business data vendors to work collaboratively with librarians and researchers to design data retention clauses that are:

  • Clear in their scope and expectations, avoiding ambiguity.

  • Timebound but fair, allowing data to be retained for the purpose of concluding or verifying a given research project, but with guardrails to prevent misuse.

  • Realistic about research workflows, which span months or years, and often involve multiple collaborators, institutions, and publication hurdles.

Next, let’s turn our attention to another thorny issue: liability and risk management.

2. Liability: Institutions are not Insurers of Last Resort

From the vendor’s perspective, the logic is straightforward: data is valuable, and they want to ensure it’s used correctly. If someone misuses it, there needs to be a mechanism for redress. Thus, the contract typically points liability at the institution, at “Authorized Users”, or sometimes at the contract signatory. It’s clean, it’s enforceable (in theory, at least), and it provides peace of mind to the vendor.

From the perspective of the institution, however, this approach is deeply problematic as it generally demands the institution take on a level of risk that is neither proportionate nor realistically manageable:

  • University procurement and legal offices are often reluctant to approve contracts that expose the institution to unlimited indemnity or to vaguely defined liabilities.

  • Researchers and budget holders balk at signing up to terms they can neither monitor nor enforce.

  • Libraries are often unfairly expected to police hundreds, or even thousands, of users across dozens of datasets—without the ability to do so.

There’s a kind of contractual cognitive dissonance at play: on the one hand, vendors are selling access to institution-wide platforms and data environments that are, by their very architecture, intended to be used by large, distributed groups of people. On the other hand, the liability clauses often assume a level of institutional control that simply doesn’t exist. This is especially the case where access to a dataset is offered at scale, such as through platforms like WRDS.

As a slight side point, it’s worth noting that all too often the design of a business data product does little to prevent the very misuse the vendor’s contract is designed to punish. This leaves institutions in an impossible position: they’re held to strict contractual expectations that are, in practice, undermined by the product’s own design.

In a recent conversation with a legal specialist at one of our vendors, I asked why liability in their business data contracts is framed the way it is. His answer was straightforward: they are used to dealing with the corporate world, where it’s normal—and legally viable—for a company to take action against employees who misuse resources. Within that context, the liability model feels balanced.

Universities, however, are a different world entirely. Thousands of students may have access to a resource, and the institution cannot monitor every action or prevent every possible breach. Yet the contracts we receive rarely account for this fundamental difference.

The vendor’s counterpoint was equally pragmatic: from their perspective, students pose an asymmetric risk. They have no contractual leverage over an individual student, so their only option is to direct liability toward the institution, researcher, or contract signatory.

The conversation concluded with me explaining that this is not just unhelpful; it’s unworkable.

A Fairer Approach

To illustrate a better path, here’s a sample clause drawn from a real-world agreement that offers a more balanced perspective:

Nothing in this License shall make the Institution liable for any act by any Authorized User which gives rise to a breach of the terms of this License, provided that the Institution did not cause or knowingly assist or condone the continuation of such breach after becoming aware of an actual breach having occurred.

The above recognizes that institutions do have a duty to educate users, to set expectations, and to respond appropriately if a breach occurs. But it also acknowledges a fundamental truth: Universities cannot—and should not—be expected to act as guarantors for every user’s behavior.

This clause offers a middle ground born of a partnership mentality. It doesn’t absolve institutions of all responsibility, but it protects them from being punished for things they didn’t cause and couldn’t have prevented.

In my experience, when vendors are willing to include language like this—or when they at least engage in a dialogue around liability—it fosters trust. It shows an understanding of institutional realities and helps the contract become what it ought to be: a mutual framework for responsible use, not a legal game of hot potato.

Practical Strategies for Shared Responsibility

Taken a step further, vendors could work with us to limit or even prevent misuse by providing:

  • Usage dashboards: Real-time monitoring tools that highlight suspicious activity patterns (e.g., bulk exports at odd hours).

  • Tiered access controls: Define roles—librarian, researcher, tutor, student—with differing privileges to limit high-risk operations.

  • Mandatory training modules: Short e-learning sessions; completion certificates helping to demonstrate “reasonable steps” in the event of a breach.

  • Audit trail logs: Retaining system logs for a defined period, so investigations can pinpoint where and when a misuse occurred.²

These controls not only reduce true risk but also signal to legal teams that the vendor is serious about helping prevent misuse, making liability negotiations far smoother.
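
To make the “usage dashboard” and “audit trail” suggestions above a little more concrete, here is a minimal sketch of the kind of check such a tool might run. It assumes a purely hypothetical CSV usage log (with user_id, timestamp, and rows_exported columns) and illustrative thresholds; it is not modelled on any vendor’s actual reporting format.

```python
# Minimal sketch: flag bulk exports at odd hours from a hypothetical usage log.
# The log format and thresholds are illustrative assumptions, not a vendor schema.
import csv
from datetime import datetime

BULK_THRESHOLD = 50_000       # rows in a single export treated as "bulk" (illustrative)
ODD_HOURS = set(range(0, 6))  # 00:00-05:59 local time (illustrative)

def flag_suspicious_exports(log_path: str) -> list[dict]:
    """Return export events that are both unusually large and outside normal hours."""
    flagged = []
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            exported = int(row["rows_exported"])
            ts = datetime.fromisoformat(row["timestamp"])
            if exported >= BULK_THRESHOLD and ts.hour in ODD_HOURS:
                flagged.append({"user": row["user_id"], "time": ts.isoformat(), "rows": exported})
    return flagged

if __name__ == "__main__":
    for event in flag_suspicious_exports("usage_log.csv"):
        print(f"Review needed: {event['user']} exported {event['rows']} rows at {event['time']}")
```

Whether checks like this run on the vendor’s side, the institution’s side, or both is itself a point for negotiation, and, as Note 2 observes, any such monitoring would need to respect data protection requirements.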

What We’re Asking For

So, what are we asking of vendors, really? Not much, in the grand scheme of things:

  • Trust us to manage access responsibly and provide us with the necessary product design and tools to do this.

  • Don’t expect institutions to carry unlimited liability for unauthorised actions we neither condone nor can control.

  • Recognise that partnerships go both ways. Contracts should protect the rights of vendors, but not by turning universities into underwriters for the unknowable.

To put it another way: If we’re all serious about enabling high-quality research, we need our business data contracts to stop treating institutions like potential offenders and start treating them like partners.

3. The AI Conundrum

And now to our third, and perhaps most volatile, area of friction: AI. It’s fair to say that artificial intelligence is the latest frontier where contracts (and, indeed, universities and vendors in general) are scrambling to catch up with the pace of technological change. Trying to regulate AI can feel like trying to regulate a tornado.

Here’s a clause from a recent business data license:

[The researcher is not allowed to] use the Subscribed Products in combination with an artificial intelligence tool (including to train an algorithm, test, process, analyse, generate output and/or develop any form of artificial intelligence tool) except where such artificial intelligence tool is used locally in a self-hosted environment and does not share the Subscribed Product or any part thereof with a third party.

On the surface, it appears to make sense. Vendors, after all, are rightly concerned about uncontrolled exposure of their data to large-scale AI systems, especially cloud-based ones whose inner workings are largely opaque, and where ownership of content is uncertain. But in practice, this kind of clause poses enormous challenges for research.

The Scope of Prohibition

Let’s break it down. According to the above clause, a user cannot:

  • Feed licensed data into a machine learning algorithm for training or testing.

  • Use AI tools, such as TensorFlow, PyTorch, or commercial platforms, to analyse or process the content.

  • Generate summaries, annotations, visualisations, or other AI-driven outputs based on the data.

  • Develop new AI tools or derivative models that use the licensed content as a foundational dataset.

In short: don’t even think about using AI in any meaningful way.

The only sanctioned scenario? A self-hosted AI tool running entirely within your secure network, never touching the cloud and never exposing the data to any external service.
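
In code terms, that sanctioned scenario boils down to something like the sketch below: licensed content is only ever sent to an endpoint the institution itself hosts, never to an external AI service. The endpoint URL and response shape are hypothetical placeholders, not a reference to any particular product.

```python
# Sketch of the only sanctioned pattern: licensed content stays inside the
# institution's own network. The URL and JSON shape are hypothetical placeholders.
import requests

SELF_HOSTED_ENDPOINT = "http://ai.internal.example.ac.uk/v1/summarise"  # campus-hosted model server

def summarise_locally(licensed_text: str) -> str:
    """Send licensed content to an institution-hosted model, never to a third-party AI service."""
    response = requests.post(SELF_HOSTED_ENDPOINT, json={"text": licensed_text}, timeout=60)
    response.raise_for_status()
    return response.json()["summary"]
```

Swap that internal URL for a public cloud AI API and, under a clause like the one above, you are very likely in breach.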

Why This Falls Short

Now, one can understand the intention here—to prevent proprietary content from being ingested by third-party AI tools, scraped by opaque algorithms, and redistributed without consent. But the implementation, frankly, leaves everyone in a bit of a muddle. Consider these examples:

  • A multilingual student translates licensed excerpts using MS Word’s built-in AI translation tool, with no indication that the content is handled by external AI systems.

  • A faculty member opens a dataset in Excel, unaware that Microsoft 365 Copilot is quietly suggesting trends or generating insights in the background.

  • A researcher opens a PDF, only for Adobe Reader to automatically scan the document and suggest annotations, powered by a cloud-based AI engine.

Are these breaches of contract? Under some current contracts, quite possibly. For the institution to determine that a breach had occurred would require a blend of clairvoyance, telemetry access, and a particularly alert IT department. And therein lies the problem: these clauses set a standard that is not only difficult to interpret but impossible to enforce.

More worryingly, such clauses risk stifling legitimate academic use. Many AI research environments are cloud-native. Insisting on local-only deployments stymies collaboration and scalability. Researchers don’t want to violate contracts; they want to produce world-class research. But if the terms are unclear—or so rigid as to make everyday workflows risky—the contract becomes ineffective.

It’s apparent that there’s an increasing divide between contractual restrictions and everyday use.

A Practical Middle Way

AI is evolving faster than most contracts can keep up, but instead of drafting increasingly defensive clauses, data suppliers need to accept that some degree of AI integration is now baked into how software operates. Trying to prohibit it outright is like trying to ban the internet—and about as enforceable.

A more practical response lies in reasonable use and achievable protections. Guidance from organizations like the Joint Information Systems Committee, a UK-based not-for-profit organization that provides digital solutions for education and research, offers a promising way forward. Their AI licensing principles encourage vendors to avoid:

  • Prohibitions that block legitimate educational or research activity.

  • Clauses that require impossible levels of monitoring.

  • Language that shifts new, AI-related liabilities onto institutions (Taplin, 2024).

This kind of thinking respects both the integrity of proprietary data and the dynamic nature of academic work. It acknowledges that a faculty member using AI to write an abstract is not the same as a startup trying to train a commercial LLM—and that contracts should reflect that distinction.

I think there is a recognition of this problem amongst some vendors who are making moves in the right direction. I noticed the following in a recent contract I reviewed:

Notwithstanding the foregoing, Licensee may use the Data Content for internal purposes within the parameters set forth herein with general purpose business productivity software tools which contain AI functionality incidental to such tools’ general purpose and intended use, e.g. word processing software, email software, and spreadsheet software, provided such business productivity software and the AI functionality are implemented as a private instance that is not accessible to third parties and is solely for Licensee’s use.

In other words, although researchers can’t use the data with AI technologies such as machine learning or neural networks, they are allowed to use the data internally with everyday software (like Word, Excel, or email) that might have built-in AI features, as long as the licensed content and resulting output are only available to authorized users, and the built-in AI features are incidental to the core purpose of the software in question.

Another side point: We are also seeing vendors develop and integrate their own AI systems within their products. While this is an important step in the right direction, and one I genuinely applaud, it comes with an inherent flaw: these tools, though naturally contractually compliant, are confined to the vendor’s own product or product suite. They cannot replicate the appeal of third-party AI platforms, which allow researchers and students to amalgamate business data from multiple vendors into a single analytical environment.

So, what would AI-allowing but risk-respecting clauses actually look like? Rather than erect an AI prohibition wall, contracts can embrace realistic guardrails:

Table 1. Suggested AI contract language

| Aspect | Standard response | Pragmatic Contract Language |
| --- | --- | --- |
| Training AI Models | Absolutely forbidden | Allowed if model weights are not redistributed; metadata logs retained |
| Analytical Queries via AI | Blocked unless self-hosted | Permitted on approved cloud platforms with access controls |
| Generative Summarisation | Disallowed | Permitted for internal research and teaching, with no external redistribution |
| Model Development | Void for AI projects | Allowed for non-commercial research; results must not be sold |

Progress will depend on proactive collaboration between libraries and vendors. Potential next steps could include:

  • Piloting sandbox environments where librarians and vendors can observe AI workflows in action.

  • Co-designing AI governance playbooks—determining ‘approved’ platforms, model-sharing thresholds, and audit protocols.

  • Regularly reviewing and updating AI clauses to reflect evolving machine learning best practices.

If done collaboratively, these measures could pave the way for vendor-approved and researcher-friendly AI clauses that provide the flexibility academics want, without undermining the protections vendors need.

Contractually Compliant AI Systems

Is there scope for libraries and vendors to co-develop a researcher-friendly AI environment—legally compliant, vendor-approved, and scalable across the academic sector? At least theoretically, yes.

At present, there is no single third-party AI platform that is both:

  • Widely accepted by multiple business data vendors, and

  • Explicitly designed for legal, contract-compliant integration of their datasets into one AI-powered environment.

That absence is the main reason vendors take such a cautious—sometimes outright prohibitive—stance toward tools like ChatGPT or other public cloud AI systems that:

  • Send data to external servers outside the institution’s control (raising issues of ownership),

  • May retain or train on uploaded data (depending on settings), or

  • Lack any explicit, vendor-approved compliance framework.

From a vendor’s perspective, the risk with these tools is uncontrolled dissemination. A single careless prompt could push significant amounts of licensed content into a third-party AI model, breaching intellectual property rights and upstream supplier agreements.

In contrast, a contractually compliant AI system would need to meet strict vendor requirements:

  • Operate in a closed, institution-controlled environment (self-hosted or private cloud).

  • Enforce strict data governance—no data leaves the system, and all usage is logged.

  • Apply per-vendor compliance controls to reflect each licence’s AI-use restrictions (if shared standards cannot be established); see the sketch after this list.

  • Train AI models only on allowed datasets (or synthetic derivatives).

  • Have formal legal agreements with each participating vendor (mirroring the approach taken by WRDS, for example).
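
To illustrate the per-vendor compliance controls mentioned above, here is a minimal, hypothetical sketch of how a closed, institution-controlled environment might gate AI operations against each licence’s terms and log the decision. The vendor names, policy fields, and operation labels are illustrative placeholders, not terms drawn from any real agreement.

```python
# Hypothetical per-vendor compliance gate. Names, fields, and policies are
# illustrative placeholders, not real licence terms or vendor APIs.
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_compliance")

@dataclass(frozen=True)
class LicencePolicy:
    vendor: str
    allow_model_training: bool
    allow_cloud_processing: bool
    allow_generative_summaries: bool

# Each entry would mirror the AI-use wording in that vendor's signed agreement.
POLICIES = {
    "vendor_a": LicencePolicy("vendor_a", allow_model_training=False,
                              allow_cloud_processing=False, allow_generative_summaries=True),
    "vendor_b": LicencePolicy("vendor_b", allow_model_training=True,
                              allow_cloud_processing=True, allow_generative_summaries=True),
}

def check_ai_use(vendor: str, operation: str) -> bool:
    """Allow or block an AI operation on a vendor's data, logging the decision for audit."""
    policy = POLICIES.get(vendor)
    if policy is None:
        log.warning("No policy on file for %s; blocking '%s' by default.", vendor, operation)
        return False
    allowed = {
        "train_model": policy.allow_model_training,
        "cloud_processing": policy.allow_cloud_processing,
        "generative_summary": policy.allow_generative_summaries,
    }.get(operation, False)
    log.info("vendor=%s operation=%s allowed=%s", vendor, operation, allowed)
    return allowed

# Example: a pipeline asks before summarising vendor_a data with a generative tool.
if check_ai_use("vendor_a", "generative_summary"):
    pass  # proceed, keeping all output within the closed environment
```

The point is not the code itself but the principle: the environment, rather than the individual researcher, carries the burden of remembering what each licence allows.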

There are small-scale efforts in this direction. Some researchers have begun developing “private ChatGPT” equivalents on university servers, fine-tuned on institutionally licensed datasets. However:

  • These are typically project-specific.

  • They are not multi-vendor, multi-institutional platforms with blanket approval.

  • They lack the scale and contractual framework that would reassure vendors.

In the absence of such a compliance infrastructure, vendors are unlikely to greenlight researchers to freely process their licensed data through a general-purpose AI tool that merges multiple datasets.

Conclusion: Contracts That Work for Research, Not Against It

If we’ve learned anything from the three areas covered here—retention, liability, and AI—it’s that business data contracts often default to a posture of fear: fear of misuse, fear of loss, fear of complexity. That fear manifests as rigid clauses, imbalanced risk, and rules that not only miss the working realities of researchers but can actively inhibit scholarship.

What I’m proposing is not an abdication of responsibility, nor a naively open-handed approach to data stewardship. It’s a recalibration. We can have rigorous, protectable, and commercially respectful agreements, aligned to the reality of research. The bridge between those goals is built on mutual understanding, not unilateral imposition.

Beyond the language of clauses, the real work is relational. Vendors who come to the table open to dialogue and willing to understand academic workflows, to negotiate in good faith, and to embed reasonable exceptions or clarifications are the ones building durable partnerships. Librarians, in turn, can do more to surface use cases, explain compliance issues, and act as translators between scholarly intent and commercial terms.

So here’s the ask: When the next acquisition or renewal comes around, don’t treat the contract as a static form to be signed under duress. Treat it as an opportunity to open a conversation and push for the changes we clearly need to see.

I’d like to end with a few practical tips:

Build relationships with your vendors. Take up offers to meet, make introductions, and stay in regular contact with your sales reps. Stop by their stands at conferences. Having a personal connection can make all the difference when you need to resolve tricky contractual issues.

Don’t fight every battle by email. Keep a record of key conversations, yes, but avoid letting difficult discussions turn into a war of attrition in your inbox. A short meeting with the right stakeholders (even online, but ideally in person) often achieves more, and faster, than weeks of back-and-forth messages.

Address concerns, not just positions. In recent talks with two vendors, we reached far more accommodating AI clauses by acknowledging that their concerns were valid in the commercial sector, but less applicable in academia—and, in fact, risked prohibiting legitimate research.

Watch for unusual stipulations. Some contracts require resulting publications to acknowledge the data source—sometimes with specific wording. Others demand that authors supply the vendor with abstracts or even copies of pre- or post-publication papers. Part of your review should be checking whether potential authors can (and are willing to) comply.

Query the incomprehensible. If (read: when) you find a clause that makes no sense, even after the fifth read, ask the vendor. You may discover they don’t understand it either. In one such case, I was asked by the vendor to ignore a clause, as it was outdated and shouldn’t have been in the contract at all!

Notes

  1. JMV 2025: A joint meeting of the American Business Library Directors (ABLD), Asia Pacific Business School Librarians’ Group (APBSLG), Consejo Latinoamericano de Escuelas de Administración (CLADEA), and the European Business School Librarians’ Group (EBSLG), as well as representatives from other parts of the world.
  2. However, such measures would need to be carefully balanced against data protection requirements. In some jurisdictions—particularly within the EU and UK—making access to a resource conditional on additional data collection or monitoring could conflict with privacy regulations such as GDPR. Any implementation would therefore require clear institutional consent pathways and proportional data handling practices.

References

Taplin, B. (2024, June 19). Guidance on resisting restrictive AI clauses in licences. National Centre for AI. https://nationalcentreforai.jiscinvolve.org/wp/2024/06/19/guidance-on-resisting-restrictive-ai-clauses-in-licences/

Trauner, M. (2017). Buying the haystack: New roles for academic business libraries. Ticker: The Academic Business Librarianship Review, 2(2). https://doi.org/10.3998/ticker.16481003.0002.201

Further Reading

Silver, B. (2019). Help! I’m new to licensing and don’t know where to start. Ticker: The Academic Business Librarianship Review, 4(1). https://doi.org/10.3998/ticker.16481003.0004.104

Author Bio:

Amar Nazir is the Engagement Librarian to Alliance Manchester Business School at the University of Manchester, United Kingdom, with a specialism in business data management.