This was one of our most popular webinars to date! To delve into this hot topic, we were joined by Andrew Pitts from PSI & IPREGISTRY, Stuart Maxwell from Scholarly IQ and Laura Wong from Jisc. As ever, ChronosHub’s Head of Publisher Relations, Romy Beard, chaired the discussion.
The expert panel lifted the lid on how open access usage is tracked so all stakeholders in the OA process have up-to-date information about who is looking at what, from where, and when. How do we know who reads open access? What metrics can we use to track open access usage? And why is this important?
If you are an author, library, funder, or institution, you naturally want to know who is reading your articles and how often! We put so much time, effort, and resources into research that we want to know whether it has an impact – it’s why we do the work we do. Authors want to see readers engaging with their work, and institutions want an overview of how well their research is received. Open access usage statistics help libraries check OA agreements they have negotiated with publishers, and funders can use the data to figure out if their OA policy is successful.
Open access usage data is, of course, valuable for publishers who want to reach out to libraries, institutions, and funders with meaningful information to negotiate open access agreements. And like the other stakeholders, it helps publishers check ongoing agreements, and supplies an overview of which services a library might require going forward.
Let’s jump right into how open access usage data is collected, and what the challenges are. All of our panelists spoke about COUNTER, a non-profit organization “that enables publishers and vendors to report usage of their electronic resources in a consistent way”, as a cornerstone of their data collection strategies. COUNTER advocates for consistent, credible, and comparable data. It provides standard metrics and methodologies, agreed between publishers, institutions, libraries, and vendors, for tracking open access usage. Everyone agreed that these standards allow them to pull the data together and make sense of it.
It sounds like having standards that are transparent and agreed upon across all parties is great! And that’s true - having this foundation is an undeniably fantastic place to start building upon, but this is also where things start to get interesting. Imagine you are reading the latest open access article in your field from your institutional library – what kinds of things are being tracked and why? What kind of story can this tell?
Firstly, it is important to note that tracking takes place within GDPR guidelines. Andrew was keen to stress: “no personal data is stored or shared as part of this process. It is all about tagging usage to organizations and never trying to associate usage with individuals”. PSI & IPREGISTRY can show which organizations are reading what content through the IP address - 97% of usage comes through IP address validation and the majority of use still comes from campus. This is important so that publishers can go to organizations that are using their open access content, show them how much they are reading, and ask the organizations to support their offering.
Some of the usage is open to abuse. One publisher gave an award for the most-read open access article of the month – and when they looked at the data, all the usage came from a single IP address. You really have to analyze your usage in depth, rather than just taking the numbers at face value. 12-15% of usage can come from robots, crawlers, or Sci-Hub. So, it is important to be thorough!
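The kind of sanity check described above can be sketched in a few lines. The event format, bot markers, and threshold here are all invented for illustration: the idea is simply to strip obvious robot traffic and then flag items where one IP address dominates the counts.

```python
from collections import Counter

# Hypothetical usage log: (item_id, ip_address, user_agent) tuples.
EVENTS = [
    ("article-1", "10.0.0.5", "Mozilla/5.0"),
    ("article-1", "10.0.0.5", "Mozilla/5.0"),
    ("article-1", "10.0.0.5", "Mozilla/5.0"),
    ("article-1", "10.0.0.5", "Mozilla/5.0"),
    ("article-1", "192.0.2.7", "Mozilla/5.0"),
    ("article-2", "10.0.0.5", "Googlebot/2.1"),
    ("article-2", "192.0.2.7", "Mozilla/5.0"),
    ("article-2", "198.51.100.3", "Mozilla/5.0"),
    ("article-2", "203.0.113.9", "Mozilla/5.0"),
]

BOT_MARKERS = ("bot", "crawler", "spider")  # crude illustrative list

def clean_events(events):
    """Drop hits whose user agent looks like a known robot or crawler."""
    return [e for e in events if not any(m in e[2].lower() for m in BOT_MARKERS)]

def suspicious_items(events, threshold=0.8):
    """Flag items where one IP address accounts for most of the usage."""
    by_item = {}
    for item, ip, _ in events:
        by_item.setdefault(item, Counter())[ip] += 1
    flagged = []
    for item, ips in by_item.items():
        top_share = max(ips.values()) / sum(ips.values())
        if top_share >= threshold:
            flagged.append(item)
    return flagged

cleaned = clean_events(EVENTS)
print(suspicious_items(cleaned))  # ['article-1'] – 4 of 5 hits from one IP
```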
We’ve learned that COUNTER agrees standards for what is collected and how it is reported. However, in reality the data can come in different formats! There is debate about what should be used as the unit of measurement – the article or the journal, the book chapter or the book?
It’s tricky to track OA usage in hybrid journals because they offer both open access and paywalled articles. Stuart explained that there are changes to the COUNTER standards being proposed that will show OA usage more accurately in hybrid journals. One way is switching to the article as the unit of reporting rather than the journal title. This would allow hybrid publications to accurately show the open access usage by splitting out open access and paywalled articles. In theory the open access articles are the ones that should have the highest usage. And this is exactly the kind of hypothesis that can be tested if reports are based on articles rather than at journal level!
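A small sketch shows why article-level records make this possible. The records below are hypothetical, but the roll-up demonstrates the point: once each article carries its own access status, the journal-level totals can preserve the open access / paywalled split that a single journal-level figure loses.

```python
from collections import defaultdict

# Hypothetical article-level usage records: (journal, article_id, is_oa, requests)
RECORDS = [
    ("Journal A", "a1", True,  120),
    ("Journal A", "a2", False, 40),
    ("Journal A", "a3", True,  80),
    ("Journal B", "b1", False, 55),
]

def usage_by_access_type(records):
    """Roll article-level counts up to journal level, keeping the
    open-access / paywalled split that journal-level reports lose."""
    totals = defaultdict(lambda: {"oa": 0, "paywalled": 0})
    for journal, _, is_oa, requests in records:
        totals[journal]["oa" if is_oa else "paywalled"] += requests
    return dict(totals)

print(usage_by_access_type(RECORDS))
# {'Journal A': {'oa': 200, 'paywalled': 40}, 'Journal B': {'oa': 0, 'paywalled': 55}}
```

With this split in hand, the hypothesis that the open access articles attract the highest usage becomes directly testable.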
Laura mentioned that Jisc’s partner service JUSP (Journal Usage Statistics Portal) is currently exploring item-level reporting for this reason. It helps them analyze their consortium deals managed through Jisc Collections. They noticed something unexpected when they began to analyze the usage data: “on occasion when things have been made open access, usage had dropped off, which would not be expected – even though actual usage has gone up.”
Surely not! How is this possible? Cleaning and auditing data means removing usage made by bots and spiders, filtering out double-clicks, and making sure all the formats of an article or chapter count as one item. Stuart pointed out - “What is the definition of a hit? We have metrics for the total number of investigations and requests but also unique items and requests. We can see if multiple clicks happen too often (the 30-second rule). An item is an item even if it is html and pdf, so these are counted as the same thing.” In the end, this supplies data for stakeholders on unique usage, which can be compared against the total usage.
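The two rules Stuart mentions can be sketched as follows. The event format and field names are invented for illustration; the logic shows the 30-second double-click filter and why one item read as both HTML and PDF still counts as a single unique item.

```python
# Hypothetical events: (timestamp_seconds, session_id, item_id, format)
EVENTS = [
    (0,  "s1", "chapter-3", "html"),
    (10, "s1", "chapter-3", "html"),   # double-click: within 30s, filtered out
    (60, "s1", "chapter-3", "pdf"),    # same item, different format
    (5,  "s2", "chapter-3", "pdf"),
]

def total_item_requests(events, window=30):
    """Count requests, collapsing repeat clicks on the same item/format
    within `window` seconds of each other (the 30-second rule)."""
    last_seen = {}
    total = 0
    for ts, session, item, fmt in sorted(events):
        key = (session, item, fmt)
        if key in last_seen and ts - last_seen[key] <= window:
            last_seen[key] = ts
            continue  # treated as a double-click, not a new request
        last_seen[key] = ts
        total += 1
    return total

def unique_item_requests(events):
    """Count each item once per session, regardless of format (html/pdf)."""
    return len({(session, item) for _, session, item, _ in events})

print(total_item_requests(EVENTS))   # 3 (one double-click removed)
print(unique_item_requests(EVENTS))  # 2 (chapter-3 once per session)
```

Comparing the two figures is exactly the total-versus-unique usage contrast the panel describes.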
One of you asked a great question that got the panel debating – thank you!
What prevents the emergence of non-publisher and non-institutional platforms that host Creative Commons licensed content, and have no interest in reporting to institutions or funders, making centralized OA reporting less meaningful?
This was a big elephant in the room. How do you get usage statistics out of ResearchGate, for example? Libraries must make decisions about how they negotiate read and publish agreements, or where to recommend that authors publish – this can only happen if usage is fed back and tracked. These services receive a lot of traffic, so their usage is important for a complete overview.
Romy agreed that different versions of usage need to be available to supply even better-quality data. The big question for another day was - how do you incentivize these services to share usage statistics? There should be an incentive to share their usage as part of the community so that authors and institutions can make informed decisions.
We’ve learned about the current challenges faced when tracking open access usage – what are the panel going to be focusing on next?
For Laura at Jisc, the next few months will be focused on engaging libraries, funders, and publishers so they can start understanding the challenges they face and finding practical ways to solve them. Jisc will be spending time considering the huge impact of the new requirements coming in with COUNTER 5.1 – how are all the different stakeholders going to implement them?
Stuart noted that Scholarly IQ has a similar approach to Jisc. They’ll be spending time asking: what do the stakeholders need? And what will help make open access usage tracking more meaningful for them? They are working on creating data bubbles that will match usage data with other information to give a richer view of the landscape.
To create a more streamlined experience Andrew shared that PSI & IPREGISTRY have been in discussions to have ORCIDs added to the data. If they can gather all the relevant information, this could mean authors would be able to login and see all their metrics in one place. The next step would be giving libraries and institutions access to this data too – the most important thing is making all the metrics available to the stakeholders.
From the discussion it is clear there is a real drive to make the metrics widely available to as many authors, libraries, funders, and institutions as possible. The overall ambition is to show how OA is adopted and growing – something everyone wants to be a success!