About Us

Welcome to the Reading Time Machine Project! Our work focuses on digitizing the historical (published prior to 1997) holdings of the Astrophysics Data System using optical character recognition (OCR) and document layout analysis.

Highly-Localized Figures and Figure Captions

IJDT — 2023

We expand on our TPDL 2022 submission in a follow-up contribution to the IJDT special issue, "The Digitization of Historical Astrophysical Literature with Highly-Localized Figures and Figure Captions". The repository for this work is digitization_at_high_localization.

Generalizability in HathiTrust Documents

AEOLIAN — 2022

We discuss the issue of "Generalizability" in document layout analysis in an invited talk at the November 2022 AEOLIAN Workshop, "Making More Sense With Machines - AI/ML Methods for Interrogating and Understanding Our Textual Heritage in the Humanities, Natural Sciences, and Social Sciences". The accompanying conference proceedings paper is "Generalizability in Document Layout Analysis for Scientific Article Figure & Caption Extraction", and the associated repository is htrc_short_conf.

Figure Localization with OCR Features

Theory and Practice of Digital Libraries — 2022

Our first paper, "Figure and Figure Caption Extraction for Mixed Raster and Vector PDFs - Digitization of Astronomical Literature with OCR Features", was published at TPDL 2022. The associated repository is figure_and_caption_extraction.

Astronomy Image Explorer Updates

2018 — present

We updated the images in the Astronomy Image Explorer that have astrometric solutions so that these solutions can be visualized in the World Wide Telescope.