The possibilities of Trove data

We all know how to use Trove. You just type your search terms into the box and click the button. You then work your way through the first few pages of results, saving anything that looks useful. If, for example, I’m interested in the development of radio in Australia, I could search for the term ‘radio’ in Trove’s digitised newspapers. It’s easy!


Fig. 1 Screenshot of Trove search for the term ‘radio’ – 3,661,933 results!

But have you ever wondered about the other three million results? The scale of Trove challenges our ability to understand. What does it mean when our search returns millions of possible matches? What does it tell us? What are we missing?


Fig. 2 Chart generated by QueryPic for the search term ‘radio’

Here’s another view of the same search. This chart displays the number of matching newspaper articles by their year of publication. Instead of a list of the first twenty matches, we see everything at once. By viewing our search results in this way we can start to explore changes over time – looking at shifts in language, or the impact of new technologies. We can also compare the trajectories of multiple search terms. How does a search for ‘radio’ compare to ‘wireless’ or ‘telegraph’?


Fig. 3 Chart generated by QueryPic for the search terms ‘radio’, ‘wireless’, and ‘telegraph’

These charts are created by a tool called QueryPic, which retrieves search data from Trove using its Application Programming Interface (API). Like a standard search box, an API accepts requests for information, but it delivers the results back in a structured form that computers can understand and use. Web sites are for humans; web APIs are for machines. This Guide provides many examples of how you can use the Trove API to retrieve and process data.
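To make the idea of an API request a little more concrete, here is a minimal sketch in Python of the kind of URL a tool like QueryPic might send to Trove. It is an illustration, not QueryPic's actual code. The endpoint and the parameter names (`category`, `facet`, `l-decade`, `n`, `encoding`) are assumptions based on version 3 of the Trove API, and `build_year_facet_url` is a hypothetical helper, so check them against the current API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint for version 3 of the Trove API; confirm against the API docs.
API_URL = "https://api.trove.nla.gov.au/v3/result"


def build_year_facet_url(query, decade):
    """Build a search URL asking for counts of matching newspaper articles by year.

    `decade` is the first three digits of a year, for example "192" for the 1920s.
    (Assumption: the year facet is requested one decade at a time.)
    """
    params = {
        "q": query,
        "category": "newspaper",
        "facet": "year",
        "l-decade": decade,
        "n": 0,  # ask for zero articles, because we only want the counts
        "encoding": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


url = build_year_facet_url("radio", "192")
print(url)
```

Actually sending this request also requires an API key, which (again, as an assumption about the current API) is supplied in an `X-API-KEY` header. Repeating the request for each decade and collecting the yearly counts would give you the raw numbers behind a chart like the ones above.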

By accessing Trove’s search data directly, we can start to grapple with the challenges of scale. We can examine those millions of results from a range of perspectives, looking for patterns and anomalies. We can see them differently.

But that’s just the beginning. If the resources we’re interested in are digitised, we can often access their contents as data – such as the text of a book, or a digital image of a photograph. We can use computational methods to analyse and compare these resources, looking in depth at their structure and form.

Search results and digitised content are two types of data available from Trove, but there are more. This Guide describes the different types of data Trove provides, explains how to access them, and gives some examples of what you can do with this data. What new types of questions can we ask?

But there are new dangers and challenges too. If we look again at the charts created by QueryPic, we might wonder what the peaks and troughs actually mean. How are the results influenced by the number of newspapers published or digitised? If we’re going to use Trove as a source of research data, we need to ask critical questions about where that data comes from, what it represents, and what it is missing. That’s why this Guide starts by examining the nature of Trove itself. As researchers we should always seek to understand the context of our sources, whether a single handwritten letter, or a massive digital collection.
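One simple way to start asking those questions is to express the number of matching articles as a proportion of all the articles available for each year, rather than as a raw count. The sketch below shows the arithmetic only. The function name `proportion_by_year` is hypothetical, and the numbers are placeholders invented for illustration, not real Trove counts.

```python
def proportion_by_year(matches, totals):
    """Return matching articles as a fraction of all articles for each year.

    Years with no total (or a total of zero) are skipped to avoid dividing by zero.
    """
    return {
        year: matches[year] / totals[year]
        for year in matches
        if totals.get(year)
    }


# Placeholder numbers for illustration only; these are NOT real Trove counts.
matches = {1925: 200, 1935: 600}
totals = {1925: 10000, 1935: 20000}

proportions = proportion_by_year(matches, totals)
print(proportions)
```

In this made-up example the raw count triples between the two years, but the proportion only rises from 2% to 3%, because the total number of articles doubles over the same period. A peak in a chart of raw counts might say as much about what has been digitised as it does about what was being written.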