Information Science

TableScraper.py

2008
A Python program that follows the links on the collection list web pages of Ramsey Library Special Collections, extracts Dublin Core metadata from each linked page, and saves that metadata to an XML file for import into Microsoft Access. The program relies heavily on Python's regular expression module (re) and on lxml, an XML and HTML processing library that can parse invalid HTML.
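The following is a minimal sketch of that pipeline, not the original TableScraper.py. Several things in it are assumptions: the item pages are taken to hold their metadata in a two-column label/value HTML table, the label-to-element mapping (DC_FIELDS) and all function names are hypothetical, and the output layout (one <record> element per item, one child element per field) is an assumed shape intended to suit Access's XML import. The standard library's html.parser stands in for lxml so the sketch runs without third-party packages. It tolerates unclosed <td> and <tr> tags but is far less forgiving than lxml. Page fetching is injected as a callable, so a real run could pass a function built on urllib.request while the example uses a dictionary.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical mapping from on-page table labels to Dublin Core element names.
DC_FIELDS = {"title": "title", "creator": "creator", "date": "date",
             "description": "description", "subject": "subject"}

# Regex for href attributes on a collection list page (assumed markup).
LINK_RE = re.compile(r'<a\s+[^>]*href="([^"]+)"', re.IGNORECASE)


def find_item_links(list_html):
    """Return the href of every anchor on a collection list page."""
    return LINK_RE.findall(list_html)


class TableCellParser(HTMLParser):
    """Collect cell text grouped by row; tolerates unclosed <td>/<tr> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the currently open cell

    def handle_starttag(self, tag, attrs):
        if tag in ("tr", "td", "th"):
            self._close_cell()  # a new row or cell implicitly ends the open cell
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr"):
            self._close_cell()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._close_cell()  # flush a cell left open at end of input

    def _close_cell(self):
        if self._cell is not None and self.rows:
            # Collapse runs of whitespace inside the cell text.
            self.rows[-1].append(" ".join("".join(self._cell).split()))
        self._cell = None


def extract_dublin_core(item_html):
    """Map label/value table rows on an item page to Dublin Core elements."""
    parser = TableCellParser()
    parser.feed(item_html)
    parser.close()
    record = {}
    for row in parser.rows:
        if len(row) >= 2:
            label = re.sub(r"[^a-z]", "", row[0].lower())  # "Title:" -> "title"
            if label in DC_FIELDS:
                record[DC_FIELDS[label]] = row[1]
    return record


def scrape(list_html, fetch):
    """Follow each item link; `fetch` maps a URL to that page's HTML."""
    return [extract_dublin_core(fetch(url)) for url in find_item_links(list_html)]


def records_to_xml(records):
    """Serialize records as <records><record><title>...</title></record></records>."""
    root = ET.Element("records")
    for rec in records:
        item = ET.SubElement(root, "record")
        for name, value in rec.items():
            ET.SubElement(item, name).text = value
    return ET.tostring(root, encoding="unicode")


# Sample pages standing in for fetched HTML (hypothetical content).
LIST_PAGE = '<ul><li><a href="item1.html">Letters</a></ul>'
PAGES = {"item1.html": "<table><tr><td>Title:<td>Family letters"
                       "<tr><td>Date:<td>1890</table>"}

if __name__ == "__main__":
    print(records_to_xml(scrape(LIST_PAGE, PAGES.get)))
```

Run as a script, this prints <records><record><title>Family letters</title><date>1890</date></record></records>. Keeping the link regex, the table parsing, and the XML serialization in separate functions means any one of them (for example, swapping html.parser for lxml.html) can change without touching the others.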

Files Created: