Name: Citations, Data
Can apply to: Research data sets
Metric definition: The number of times a journal article or book has referenced a data set.
Metric calculation: Data citations are sometimes collected only in the formal sense (i.e., with the data set being listed in the References section of a paper, alongside journal articles). They can also be counted in the informal sense (i.e., linked to from within the Methods section of a paper). Which approach is used varies from tool to tool.
Data sources: Data Citation Index, Google Scholar (rare)
Appropriate use cases: Data citations should be used to understand how often research data has been reused in others’ studies, thereby indicating advancement of the field. Some fields (e.g., crystallography and genomics) practice data citation at higher rates than others, so data citations may be better suited to evaluating research from those fields.
Limitations: Data citation is still relatively rarely practiced, with only half of journals providing instruction for how to cite data and more than 88% of all Data Citation Index records going uncited. Lack of formal referencing poses a challenge when using tools that count only formal references in their data citation metrics. Critics of data citation claim that data citations merely mimic existing metrics that do not “recognize all players involved in the life cycle of those data from collection to publication”. Disciplinary coverage in the Data Citation Index (as of 2017) is skewed, favoring the life sciences (48% of records) over the social sciences (20%), physical sciences (23%), arts & humanities (7%), and multidisciplinary research (2%). Note that the Data Citation Index tracks citations for data sets and also for related data studies (defined as “a description of studies or experiments held in repositories with the associated data which have been used in the data study”). The availability of data should be taken into account when comparing data citation rates across data sets, as in some disciplines, open access data is cited at higher rates (up to 69% higher for cancer research).
Inappropriate use cases: As with other citation-based metrics, data citations should not be interpreted as a direct measure of quality.
Available metric sources: Data Citation Index, Google Scholar (rare)
Transparency: Varies by provider; it is not always possible to see in-text citations to understand who has cited a data set or what they have said about it.
Timeframe: In theory, data sets from any year can be referenced in scholarly literature.
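Illustration: The difference between formal and informal counting described under Metric calculation can be sketched in a few lines of Python. This is a minimal sketch, not any provider's actual method: the `Mention` record, the `location` labels, the identifiers, and the choice to count each citing paper only once are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """One place where a paper points at a data set (hypothetical record)."""
    dataset_id: str
    paper_id: str
    location: str  # assumed labels: "references" (formal) or "methods" (informal)


def count_data_citations(mentions, dataset_id, include_informal=False):
    """Count distinct papers citing a data set.

    Formal citations (listed in the References section) are always counted;
    informal mentions (e.g., a link in the Methods section) are counted only
    when include_informal is True.
    """
    citing_papers = set()
    for mention in mentions:
        if mention.dataset_id != dataset_id:
            continue
        if mention.location == "references" or include_informal:
            citing_papers.add(mention.paper_id)
    return len(citing_papers)


# Made-up sample data: paper-A cites ds-1 formally and informally,
# paper-B only links to it from its Methods section.
mentions = [
    Mention("ds-1", "paper-A", "references"),
    Mention("ds-1", "paper-A", "methods"),
    Mention("ds-1", "paper-B", "methods"),
    Mention("ds-2", "paper-C", "references"),
]

formal_only = count_data_citations(mentions, "ds-1")
formal_and_informal = count_data_citations(mentions, "ds-1", include_informal=True)
print(formal_only, formal_and_informal)
```

With this sample, a tool that counts only formal references would report a lower figure for the same data set than one that also counts informal mentions, which is why counts from different tools are not directly comparable.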