Mapping Search Queries to GeoLibrary Metadata Fields

Geo4Lib Camp 2021

Phil White & Erik Radio

Feb. 10, 2021

outpw.github.io/slides/geometa.html

Phil White

Earth, Environment & Geospatial Librarian

Philip.White@Colorado.EDU

Erik Radio

Metadata Librarian

Erik.Radio@Colorado.EDU

Outline

Background
Review
Methods
Results
Discussion/Analysis

Background

Colorado GeoLibrary

Proposed way back in 2017
Launched in 2019 (geo.colorado.edu)
Provides access & discovery of Colorado GIS data

How are people using the GeoLibrary?

What sort of search terms are they using?

Do people search for subjects? Places? Other?

Do those terms match subjects & placenames?

Research Questions:

What query terms mapped most frequently to which metadata fields?

Is there a type of query that more generally appears to map to certain fields?

Answering these questions could improve search performance and UX.

Review

GeoBlacklight Repositories and Metadata Evaluation

GBL metadata lacks unified approach across institutions (Batista et al., 2017).

GBL UX study found disambiguation problems in both subjects & places (Blake et al., 2017).

Blake et al. also found users rely most on the description field.

Few query log studies exist; some have found organizational, data type, or publication queries to be most common (Schindler et al., 2019).

Query Log Analysis and Metadata

47% match between LC subjects and queries (Carlyle, 1989)

80% of all queries are short (Park and Lee, 2014)

Methods

Data

Search Queries: Google Analytics

Each query assigned a query type

Compiled a copy of the GeoLibrary catalog

Query Types

Type	Description	Example
Datatype	type of GIS data	Basemap, contours
Format	file type	Geotiff, shapefile
Locational	general type of place	Campsites, buildings
Place name	specific place	Continental Divide, Colorado Springs
Organization	corporate entity	Colorado DOT, Census Bureau
Person	human being	John Doe
Publication	issuance of a particular resource	Census tracts, Bureau of Land Management roads
Topical	subject of interest	Agriculture, aliens
Unknown	?	2000, trib

Methods

A Python script counted the number of times each query matched a metadata field

A second Python script tallied each time a query of a given type matched a field

Code repository is on GitHub
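
To make the two counting steps concrete, here is a minimal sketch. It is not the authors' script (that lives in their repository); the record layout, the sample data, and the function names `count_field_matches` and `tally_by_type` are all hypothetical, and matching is assumed to be a simple case-insensitive substring test.

```python
from collections import Counter, defaultdict

# Hypothetical catalog: each record maps a metadata field name to its text.
CATALOG = [
    {
        "dc.title": "Colorado Roads",
        "dc.description": "Road centerlines for Colorado counties",
        "dc.subject": "Transportation",
    },
    {
        "dc.title": "Boulder County Trails",
        "dc.description": "Hiking trails and roads in Boulder County",
        "dc.subject": "Recreation",
    },
]

# Hypothetical (query, query type) pairs, as if typed by hand from a query log.
QUERIES = [
    ("roads", "topical"),
    ("boulder", "place name"),
    ("trails", "locational"),
]


def count_field_matches(query, catalog):
    """Count, per field, how many records contain the query text.

    Assumption: a "match" is a case-insensitive substring hit.
    """
    needle = query.lower()
    counts = Counter()
    for record in catalog:
        for field, value in record.items():
            if needle in value.lower():
                counts[field] += 1
    return counts


def tally_by_type(queries, catalog):
    """Sum the per-field match counts for every query of each query type."""
    totals = defaultdict(Counter)
    for query, query_type in queries:
        totals[query_type].update(count_field_matches(query, catalog))
    return totals


if __name__ == "__main__":
    for query_type, field_counts in tally_by_type(QUERIES, CATALOG).items():
        print(query_type, dict(field_counts))
```

Substring matching is the simplest choice and will over-count short queries; a tokenized or stemmed comparison would be a reasonable alternative.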

Results

Most popular query types (descending order):

topical

place name

locational

#### Results

- Most matches found in *dc.description*
- **topical** queries had many matches in *dc.title*, far fewer in *dc.subject*
- **place name** queries matched frequently in *dc.publisher* and *dc.creator*
- **locational** queries matched frequently in *dc.title*

Discussion

#### Analysis

- Subjects not as useful as anticipated
- Synonyms have potential for increasing matches
- Best fields for matching remain *dc.description* and *dc.title*
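
As one illustration of how synonyms might increase matches, the sketch below expands a query with alternate terms before comparing it to a field value. The synonym table and the function names `expand_query` and `matches_any` are hypothetical; a production GeoBlacklight instance would more likely handle this in its search index configuration.

```python
# Hypothetical synonym table; real entries would come from a curated list.
SYNONYMS = {
    "roads": {"streets", "highways"},
}


def expand_query(query, synonyms):
    """Return the lowercased query plus any synonyms listed for it."""
    term = query.lower()
    return {term} | synonyms.get(term, set())


def matches_any(query, text, synonyms):
    """True if the query or any of its synonyms occurs in the text."""
    haystack = text.lower()
    return any(term in haystack for term in expand_query(query, synonyms))


if __name__ == "__main__":
    title = "State Highways of Colorado"
    print(matches_any("roads", title, {}))        # no synonym table
    print(matches_any("roads", title, SYNONYMS))  # with synonym expansion
```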

#### Challenges

- Datasets have unique descriptive requirements
- Large gap exists between metadata professionals and data creators

Implications

~20 institutions use GeoBlacklight, and most share metadata via the OpenGeoMetadata repository

We found that many "false positive" results occur because "Colorado" appears in so many metadata fields; state institutions should not index "provenance".
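
The effect of leaving a field out of matching can be sketched as follows. The field name `provenance`, the sample record, and the function `matching_fields` are hypothetical, and matching is again assumed to be a case-insensitive substring test rather than whatever the search index actually does.

```python
def matching_fields(query, record, skip_fields=frozenset()):
    """List the fields whose text contains the query, ignoring skipped fields."""
    needle = query.lower()
    return [
        field
        for field, value in record.items()
        if field not in skip_fields and needle in value.lower()
    ]


# Hypothetical record: "Colorado" appears only in the institution name.
RECORD = {
    "dc.title": "Denver Bike Lanes",
    "dc.description": "Bike lanes in Denver",
    "provenance": "University of Colorado Boulder",
}

if __name__ == "__main__":
    # With every field searched, the institution name alone produces a hit.
    print(matching_fields("colorado", RECORD))
    # Skipping the provenance field removes that false positive.
    print(matching_fields("colorado", RECORD, skip_fields={"provenance"}))
```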

Metadata creators should include rich description and title for improved discovery

This is easily replicable for other GeoBlacklight repos.

We would like to see if others have similar results. Would results be different based on different metadata practices?

Fin!

Thanks everyone!

Question time.