A recent fun hack project of mine is Library Voices, which streams the live search feed that powers Toronto Public Library’s search dashboard to a random available speech synthesis voice from the Web Speech API.
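
For the curious, here is a minimal sketch of the "random available voice" idea in TypeScript. It is not the actual Library Voices code: the names `pickRandom` and `speakSearchTerm` are hypothetical, and how search terms arrive from the feed is left out entirely because that depends on the dashboard's own endpoint. The only Web Speech API pieces used are `speechSynthesis.getVoices()`, `SpeechSynthesisUtterance` and `speechSynthesis.speak()`.

```typescript
// Pick one element uniformly at random from a list.
// `rand` is injectable so the choice can be made deterministic in tests.
function pickRandom<T>(
  items: readonly T[],
  rand: () => number = Math.random
): T | undefined {
  if (items.length === 0) return undefined;
  // Clamp the index in case an injected `rand` ever returns exactly 1.
  const index = Math.min(items.length - 1, Math.floor(rand() * items.length));
  return items[index];
}

// Speak a single search term with a randomly chosen voice.
// Browser-only: returns false when the Web Speech API is not available.
// (Hypothetical helper; the caller is assumed to supply terms from the feed.)
function speakSearchTerm(term: string): boolean {
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) return false;

  const utterance = new g.SpeechSynthesisUtterance(term);
  // getVoices() can return an empty list until the browser has loaded its
  // voices; in that case the utterance falls back to the default voice.
  const voice = pickRandom<any>(g.speechSynthesis.getVoices());
  if (voice) utterance.voice = voice;

  g.speechSynthesis.speak(utterance);
  return true;
}
```

In a browser, calling `speakSearchTerm("some search")` once per incoming search would queue each one to be read aloud in a different voice.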

When I posted the project on Facebook, a library friend of mine asked about potential privacy issues of the library’s feed and whether I had any thoughts about that. I figured I’d post my response here more publicly:

I’ve got a few thoughts along those lines without knowing the specific details of any critique:

  • I think the most practical danger of an open data feed of anonymized searches is that it makes it easy to trawl for what people in general are searching for. There are a variety of problems that can arise from that, including attempts to deanonymize searchers, attempts to monitor and capture personal data that might be accidentally entered, and requests from state agencies for more information about who’s behind a specific search or set of searches. This isn’t confined to library catalogue searches; it applies to public feeds of search activity in general.

  • In a more general sense, it could be argued that the mere presence of an open data feed such as this one undermines principles of privacy in the use of library resources. I’m not sure how I feel about this critique personally. It strikes me as a bit head-in-the-sand, because the reality is that the library is already collecting this data and has been for years, in an outside tool (Google Analytics) that is not under the library’s control.

  • If the critique is that libraries shouldn’t be using GA at all (and some have made this critique), that’s a broader critique. I’ve got various feelings about GA, but practically speaking GA is used by so many libraries because it’s a good, free, low-friction way of responding to the expressed need of funders, administrators and library staff for website metrics.

  • In the specific case of TPL (going on public systems information I could get as an outside user), the fact that TPL’s account system is not under HTTPS/SSL and that the entire website is not requestable under HTTPS/SSL is of significantly greater practical concern for me in terms of risk to individual user privacy.

This may be rather cynical, but given the asymmetry between the information security capabilities of libraries and the information security (and legal coercion) capabilities of state actors who might be interested in library data, I would personally assume that any state agency (and some non-state orgs) with a desire for library patron data has the ability to access it, and conduct myself as a library user accordingly.

The reality is that most libraries (some specific ones such as Watertown, where Alison Macrina works/worked, may be an exception to this) do not have the capability to protect their patron data against determined actors, and are unable/unwilling to invest the resources to do so vs. other priorities.

The obvious disclaimer applies that I worked at TPL for nearly a decade, and the specific one that while I didn’t write the code behind the search dashboard, I did work as part of the team that deployed it to the web. I wouldn’t have done so if I considered it to have substantial privacy concerns.