Research Questions:
- Which query terms mapped most frequently to which metadata fields?
- Do certain types of query tend to map to particular fields?
Answering these questions could improve search performance and UX.
GeoBlacklight Repositories and Metadata Evaluation
- GBL metadata lacks a unified approach across institutions (Batista et al., 2017).
- GBL UX study found disambiguation problems in both subjects & places (Blake et al., 2017).
- Blake et al. also found users rely most on the description field.
- Few query log studies exist; some have found organizational, data type, or publication queries to be the most common (Schindler et al., 2019).
Query Log Analysis and Metadata
- 47% match between LC subjects and queries (Carlyle, 1989)
- 80% of all queries are short (Park and Lee, 2014)
Query Types
| Type | Description | Example |
|---|---|---|
| Datatype | type of GIS data | Basemap, contours |
| Format | file type | Geotiff, shapefile |
| Locational | general type of place | Campsites, buildings |
| Place name | specific place | Continental Divide, Colorado Springs |
| Organization | corporate entity | Colorado DOT, Census Bureau |
| Person | human being | John Doe |
| Publication | issuance of a particular resource | Census tracts, Bureau of Land Management roads |
| Topical | subject of interest | Agriculture, aliens |
| Unknown | ? | 2000, trib |
Methods
- A Python script counted the number of times each query matched a metadata field
- A second Python script tallied each time a query of a given category type matched a field
- Code repository is on GitHub
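
The two counting steps above can be sketched roughly as follows. This is not the code from the project repository: the sample records, the sample queries, the function name `count_field_matches`, and the choice of case-insensitive substring matching are all assumptions made for illustration, and the two tallies are combined into one pass for brevity.

```python
from collections import Counter

# Hypothetical sample records; keys follow the dc.* fields named in the slides.
RECORDS = [
    {
        "dc.title": "Colorado Campsites",
        "dc.description": "Point locations of campsites in Colorado state parks.",
        "dc.subject": "Recreation",
    },
    {
        "dc.title": "Agriculture Census Tracts",
        "dc.description": "Census tracts with agriculture statistics.",
        "dc.subject": "Agriculture",
    },
]

# Hypothetical queries, each hand-labelled with a query type from the table.
QUERIES = [
    ("campsites", "locational"),
    ("agriculture", "topical"),
    ("colorado", "place name"),
]


def count_field_matches(queries, records):
    """Tally matches per field and per (query type, field) pair.

    A "match" here is an assumed case-insensitive substring test.
    """
    field_counts = Counter()
    type_field_counts = Counter()
    for term, query_type in queries:
        needle = term.lower()
        for record in records:
            for field, value in record.items():
                if needle in value.lower():
                    field_counts[field] += 1
                    type_field_counts[(query_type, field)] += 1
    return field_counts, type_field_counts


field_counts, type_field_counts = count_field_matches(QUERIES, RECORDS)
for field, count in field_counts.most_common():
    print(field, count)
```

Keeping the per-type tally keyed on `(query type, field)` pairs makes it straightforward to ask which fields a given category of query tends to hit.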
Most popular queries (descending order):
- topical
- place name
- locational
#### Results
- Most matches found in *dc.description*
- **topical** queries had many matches in *dc.title*, far fewer in *dc.subject*
- **place name** queries matched frequently in *dc.publisher* and *dc.creator*
- **locational** queries matched frequently in *dc.title*
#### Analysis
- Subjects not as useful as anticipated
- Synonyms have potential for increasing matches
- Best fields for matching remain *dc.description* and *dc.title*
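
As a rough illustration of the synonym point above, a query can be expanded with related terms before it is compared against a field. The synonym table, `expand_query`, and `matches_field` are hypothetical names invented for this sketch; the study did not necessarily test synonyms this way.

```python
# Hypothetical synonym table; a real one might come from a controlled vocabulary.
SYNONYMS = {
    "farming": ["agriculture"],
    "roads": ["streets", "highways"],
}


def expand_query(term, synonyms=SYNONYMS):
    """Return the lower-cased term followed by any synonyms listed for it."""
    term = term.lower()
    return [term] + synonyms.get(term, [])


def matches_field(term, value, synonyms=SYNONYMS):
    """True if the term or any of its synonyms occurs in the field value."""
    text = value.lower()
    return any(variant in text for variant in expand_query(term, synonyms))


# "farming" does not occur in the subject value, but its synonym does.
print(matches_field("farming", "Agriculture"))
```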
#### Challenges
- Datasets have unique descriptive requirements
- Large gap exists between metadata professionals and data creators
Implications
- ~20 institutions use GeoBlacklight, and most share metadata via the OpenGeoMetadata repository
- We found many "false positive" results happen because "Colorado" is in so many metadata fields; state institutions should not index "provenance".
- Metadata creators should include rich description and title for improved discovery
- This analysis is easily replicable for other GeoBlacklight repositories.
- We would like to see if others have similar results. Would results be different based on different metadata practices?
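
The "Colorado" false-positive issue noted above can be illustrated with a small sketch that searches the same records with and without the provenance field. The records, the plain `provenance` key, and the `search` function are assumptions for illustration only; an actual GeoBlacklight instance controls which fields are searched through its Solr configuration, not through code like this.

```python
# Hypothetical records held by a single Colorado institution.
RECORDS = [
    {
        "id": "a",
        "title": "Denver Bike Routes",
        "description": "Bike routes in Denver.",
        "provenance": "Example Colorado University",
    },
    {
        "id": "b",
        "title": "Colorado Rivers",
        "description": "Major rivers.",
        "provenance": "Example Colorado University",
    },
]


def search(term, records, skip_fields=()):
    """Return ids of records where the term occurs in any searched field."""
    needle = term.lower()
    hits = []
    for record in records:
        searched = (
            value
            for field, value in record.items()
            if field != "id" and field not in skip_fields
        )
        if any(needle in value.lower() for value in searched):
            hits.append(record["id"])
    return hits


# With provenance searched, every record matches "colorado".
print(search("colorado", RECORDS))
# With provenance skipped, only records that mention Colorado elsewhere match.
print(search("colorado", RECORDS, skip_fields=("provenance",)))
```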
Fin!
Thanks everyone!
Question time.