13. Downloading data from the Trove web interface
13.1. Downloading images, PDFs, text, and audio
Items that have been digitised by the NLA and made available through one of Trove’s digitised item viewers can usually be downloaded in a variety of formats. This includes newspapers, books, journals, images, maps, manuscripts, and oral histories.
The official Trove Help includes a page on Downloading that describes the options available in the various Trove categories for downloading images, PDFs, text, and audio. Different formats have different viewers, but generally speaking you just need to find the download tab in the sidebar, select a format, and click the button.
Items that are arranged in hierarchical structures, such as some images, maps, and manuscripts, might have an option to download a ‘collection’. If so, a Download button will appear on the collection page. This option isn’t always available, and there can be limits on how many items in a collection you can download at once. To find the ‘collection’ page, try using the breadcrumb links to move up the record hierarchy.
While many of the same download options are available across different Trove categories, they don’t always mean the same thing! For example, the ‘text’ you get from newspapers is not the same as the ‘text’ you get from books. This table summarises what’s available and describes some of these oddities.
Trove category | Item type | Download option | Note |
---|---|---|---|
Newspapers & gazettes | article | image | The ‘image’ option actually delivers an HTML page with embedded images. Long articles will often be sliced up in unfortunate ways to ‘fit’ an A4 page. To get the images themselves you need to extract them from the HTML and try to reassemble them. |
Newspapers & gazettes | article | PDF | |
Newspapers & gazettes | article | text | The ‘text’ option actually delivers an HTML page that includes the publication details as well as the article text. If you just want the plain OCRd text you need to extract it from the HTML and remove the publication details. |
Newspapers & gazettes | page | PDF | |
Newspapers & gazettes | issue | PDF | |
Books & libraries | single page, range of pages, or complete book | image | Images (single or multiple) are packaged in a zip file with an additional page of copyright information. |
Books & libraries | single page, range of pages, or complete book | PDF | |
Books & libraries | single page, range of pages, or complete book | text | Unlike the newspapers, this is plain text with no formatting. |
Magazines & newsletters | single page, range of pages, or complete issue | image | Images (single or multiple) are packaged in a zip file with an additional page of copyright information. |
Magazines & newsletters | single page, range of pages, or complete issue | PDF | |
Magazines & newsletters | single page, range of pages, or complete issue | text | Unlike the newspapers, this is plain text with no formatting. |
Images, maps, & artefacts | single item, range of items | image | Images are packaged in a zip file with an additional page of copyright information. As well as the standard JPEG format, some maps include an option to download high-resolution TIFF files. |
Images, maps, & artefacts | single item, range of items | PDF | |
Images, maps, & artefacts | collection | image | A maximum of 20 images can be downloaded at one time. |
Images, maps, & artefacts | collection | PDF | A maximum of 20 images can be included in the PDF. |
Diaries, letters & archives | single page, range of pages | image | Images are packaged in a zip file with an additional page of copyright information. |
Diaries, letters & archives | single page, range of pages | PDF | |
Diaries, letters & archives | collection | image | Images are packaged in a zip file with an additional page of copyright information. Depending on where you are in the collection hierarchy, you might only get the top-level image. |
Diaries, letters & archives | collection | PDF | |
Music, audio & video | oral history interview transcript | text | |
Music, audio & video | oral history interview transcript | PDF | |
Music, audio & video | oral history interview | audio recording | MP3 files are available at a variety of bitrates (the higher the bitrate, the larger the file), e.g. 48kbps, 128kbps, and 256kbps. |
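The table notes that the newspaper ‘image’ and ‘text’ options both deliver HTML pages that need further processing. The following is a minimal sketch of that processing using only Python’s standard library. It makes no assumptions about how Trove structures these pages: it simply collects the `src` of every `<img>` tag and every run of visible text, so you would still need to inspect a real download to work out which text chunks are publication details and how the image slices fit together. The sample HTML, file names, and the `parse_download` helper are all hypothetical.

```python
from html.parser import HTMLParser


class DownloadPageParser(HTMLParser):
    """Collect image sources and visible text from a saved HTML download."""

    def __init__(self):
        super().__init__()
        self.image_sources = []
        self.text_chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not script or style content.
        if not self._skip_depth and data.strip():
            self.text_chunks.append(data.strip())


def parse_download(html):
    """Return (image_sources, text_chunks) for an HTML string."""
    parser = DownloadPageParser()
    parser.feed(html)
    parser.close()
    return parser.image_sources, parser.text_chunks


# Hypothetical stand-in for a downloaded article page.
sample = """<html><head><title>Sample</title><style>p {color: red}</style></head>
<body><p>Example Gazette, 1 January 1900, page 1</p>
<img src="part-1.jpg"><img src="part-2.jpg"/>
<p>Article text here.</p></body></html>"""

images, chunks = parse_download(sample)
print(images)
print(chunks)
```

To use this on a real download, read the saved file into a string and pass it to `parse_download`. The image sources are returned in document order, which is a reasonable starting point for reassembling a sliced article.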
Some download options you might expect to find are not actually available. These are listed in the table below.
Trove category | Item type | Download format | Note |
---|---|---|---|
Newspapers & gazettes | page | image | There’s no option to download a page as an image, just a page image embedded in a PDF. |
Magazines & newsletters | article | any | There’s no option to download individual articles as images, PDFs, or text. While you can search for individual articles, the viewer only presents pages. This is different to the newspapers, where the viewer presents individual articles. |
What about image resolutions?
One confusing, and often frustrating, aspect of image downloads is their resolution (or size). You can use the Trove image viewer to zoom in close to many photographs and manuscripts, enabling you to pick up fine details. But if you download the same image, you could find the resolution is much lower. This means you’re limited in how you can use the downloaded image. The available resolutions vary across categories and formats, and you really don’t know what you’ll get until you download it. Many manuscripts, in particular, seem to have low-resolution downloads, which doesn’t help you much when you’re trying to decipher someone’s handwriting! But never fear, there are a few hacks and workarounds you can try to get higher-resolution versions.
13.2. Download metadata using citations and BibTex
BibTex is a file format used to save structured information about references, and is used by many tools to manage citations and build bibliographies. You can download item metadata in BibTex format using Trove’s ‘Citation’ tab.
In the main search interface, the ‘Citation’ tab includes a BibTex option. You can copy or download the BibTex record.
In the digitised newspaper viewer, the ‘Citation’ tab includes a button to download a BibTex record.
The Trove viewers for digitised books, journals, images, and maps don’t include a BibTex option.
This is a simple way of capturing metadata in a structured format, but the BibTex records don’t always include the full range of metadata available in Trove.
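Once you have saved a BibTex record, you can load its fields into a program. The following is a minimal sketch of a parser for a single, flat BibTex entry, using only Python’s standard library. The sample record, its cite key, and its field values are hypothetical and are not copied from Trove. The parser handles values in braces (including nested braces), values in double quotes, and bare values such as numbers. It does not handle string macros, concatenation, or multiple entries, so a dedicated BibTex library is a better choice for anything more complex.

```python
import re

HEADER_RE = re.compile(r"\s*@(\w+)\s*\{\s*([^,]*),")
FIELD_NAME_RE = re.compile(r"\s*(\w+)\s*=\s*")
COMMA_RE = re.compile(r"\s*,")
BARE_VALUE_RE = re.compile(r"[^,}\s]+")


def _read_value(text, pos):
    """Read one field value starting at pos; return (value, next_pos)."""
    opener = text[pos:pos + 1]
    if opener == "{":
        depth = 0
        for i in range(pos, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    return text[pos + 1:i], i + 1
        raise ValueError("unbalanced braces in field value")
    if opener == '"':
        end = text.index('"', pos + 1)
        return text[pos + 1:end], end + 1
    match = BARE_VALUE_RE.match(text, pos)
    if not match:
        raise ValueError("missing field value")
    return match.group(0), match.end()


def parse_bibtex_entry(record):
    """Parse a single BibTex entry into a dictionary of its fields."""
    header = HEADER_RE.match(record)
    if not header:
        raise ValueError("not a BibTex entry")
    entry = {
        "entry_type": header.group(1).lower(),
        "cite_key": header.group(2).strip(),
    }
    pos = header.end()
    while True:
        name = FIELD_NAME_RE.match(record, pos)
        if not name:
            break  # no more 'name = value' pairs
        value, pos = _read_value(record, name.end())
        entry[name.group(1).lower()] = value
        comma = COMMA_RE.match(record, pos)
        if comma:
            pos = comma.end()
    return entry


# Hypothetical record for illustration only.
sample_record = """@article{example1900,
  title = {An {Example} Article},
  journal = "Example Gazette",
  year = 1900,
  url = {https://example.org/article/1}
}"""

entry = parse_bibtex_entry(sample_record)
print(entry)
```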
13.3. Downloading lists
Trove Lists include a button to ‘Download this List’. Once you click the button you can choose your desired output format: CSV, JSON, XML, or a list of citations.
The metadata provided by the List download option is quite limited. In particular, newspaper articles are missing information about titles, and the dates are not formatted according to the ISO standard. You can retrieve more and better metadata from Lists by using the Trove API.
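If you do work with a downloaded List, you will probably want to normalise the dates yourself. The following is a minimal sketch that adds an ISO 8601 date column to CSV data by trying a series of candidate formats. The column name `date`, the sample rows, and the candidate formats are all assumptions made for illustration. Check a real List download to see which column names and date formats it actually uses, then adjust `CANDIDATE_FORMATS` to suit.

```python
import csv
import io
from datetime import datetime

# Candidate input formats to try in order. These are guesses, not a
# description of what Trove's List download actually produces.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d %b %Y", "%d %B %Y", "%d/%m/%Y")


def to_iso_date(value, formats=CANDIDATE_FORMATS):
    """Return value as YYYY-MM-DD, or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalise_rows(csv_text, date_column="date"):
    """Read CSV text and add a 'date_iso' value to every row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["date_iso"] = to_iso_date(row.get(date_column) or "")
        rows.append(row)
    return rows


# Hypothetical rows for illustration only.
sample_csv = """title,date
Hypothetical article one,1 Jan 1900
Hypothetical article two,02/03/1901
Hypothetical article three,circa 1902
"""

rows = normalise_rows(sample_csv)
for row in rows:
    print(row["title"], row["date_iso"])
```

Values that match none of the candidate formats are returned as `None` rather than guessed at, so you can review them by hand.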
13.4. Bulk export
Trove’s new Bulk Export feature makes it easy to save the results of a search. But it has a number of limitations:
- The number of results is limited to one million
- Version information is not included with work records
- Text is not included with newspaper articles
For many research uses you’ll be better off using the Trove API or a tool like the Trove Newspaper Harvester.
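As a rough illustration of the API alternative, the following sketch builds a request for newspaper search results that asks for the article text Bulk Export leaves out. The endpoint, parameter names, and the `X-API-KEY` header are assumptions based on version 3 of the Trove API and should be checked against the official API documentation. The `build_search_request` helper, the query, and the key are hypothetical placeholders. The sketch only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint for version 3 of the Trove API.
API_URL = "https://api.trove.nla.gov.au/v3/result"


def build_search_request(query, api_key, start="*", page_size=100):
    """Build (but do not send) a request for a page of newspaper results."""
    params = {
        "q": query,
        "category": "newspaper",
        "include": "articleText",  # ask for the OCRd text of each article
        "bulkHarvest": "true",     # assumed to give a stable harvesting order
        "n": page_size,            # results per request
        "s": start,                # cursor; "*" is assumed to mean the first page
        "encoding": "json",
    }
    return Request(f"{API_URL}?{urlencode(params)}", headers={"X-API-KEY": api_key})


# Hypothetical query and placeholder key.
request = build_search_request("floods", "YOUR_API_KEY")
print(request.full_url)
```

To harvest a complete result set you would send each request with `urllib.request.urlopen`, read the cursor for the next page from the response, and repeat until no further cursor is returned. Tools like the Trove Newspaper Harvester handle this loop for you.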