The Process — Pseudopeople

While a small amount of pseudopeople data is openly available as part of the Python package, access to the full dataset will require users to be both transparent and accountable to a committee of interested parties including civil society organizations, privacy experts and data subject representatives.

Principles

Our approach to data access, like our approach to the project overall, is based on several principles:

Trust. Building, maintaining and repairing relationships of trust – between data subjects, data reusers, and data creators – is essential to not only the success of this project interpreted narrowly, but also the consequences it might have more broadly for data subjects and their relationship with future data collection efforts.
Transparency. Given this, transparency is a necessary but insufficient principle: all parties involved should be transparent with each other about the process of developing and using the data.
Ongoing accountability. Transparency is insufficient because transparency is about information, not power. Real trust requires all parties to be responsive to each other's concerns – requires, in other words, that power be distributed not only between developers and reusers, but, centrally, in a way that involves data subjects.
Simplicity. Our intention with this process is not to make it deeply onerous; that would defeat the purpose of making data available in the first place, and put burdens not only on data reusers, but on interested parties, all of whom have many other responsibilities in their lives. In so far as it is possible, we will try to make the process (and the responsibilities involved in it) as simple as we can.

Access requests

Requests for access should be opened by a reuser on the project GitHub page, providing a public venue where they can be evaluated and discussed. This request does not require much information, and consists of the data reuser providing:

The project name and purpose;
Who is involved in the project (and who, of those people, will have direct access to the dataset);
What funding the project is under, and what expectations with respect to open access and access to data come with that funding.

It also requires the data reuser to commit to:

Being responsive to further questions from interested parties;
Replacing their version of the dataset when a new version is released.
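
As an illustration only, the information and commitments listed above could be collected into a small structure and rendered as the text of a request. This is a minimal sketch: the class name `AccessRequest`, its field names, and the output layout are assumptions made for this example, not a template the project prescribes.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical container for the items an access request provides."""

    project_name: str
    purpose: str
    people_involved: list[str]
    people_with_direct_access: list[str]
    funding: str
    funder_data_expectations: str
    # The two commitments the process asks reusers to make.
    will_respond_to_questions: bool = False
    will_replace_outdated_versions: bool = False

    def missing_items(self) -> list[str]:
        """Return a description of every item that is still incomplete."""
        problems = []
        for name in ("project_name", "purpose", "funding", "funder_data_expectations"):
            if not getattr(self, name).strip():
                problems.append(f"{name} is empty")
        if not self.people_involved:
            problems.append("people_involved is empty")
        # Anyone with direct access should also be listed as involved.
        unlisted = sorted(set(self.people_with_direct_access) - set(self.people_involved))
        if unlisted:
            problems.append("not listed as involved: " + ", ".join(unlisted))
        if not self.will_respond_to_questions:
            problems.append("commitment to answer further questions is missing")
        if not self.will_replace_outdated_versions:
            problems.append("commitment to replace outdated versions is missing")
        return problems

    def to_issue_body(self) -> str:
        """Render the request as plain text suitable for a public thread."""
        def yes_no(flag: bool) -> str:
            return "yes" if flag else "no"

        lines = [
            f"Project name: {self.project_name}",
            f"Purpose: {self.purpose}",
            "People involved: " + ", ".join(self.people_involved),
            "People with direct access: " + ", ".join(self.people_with_direct_access),
            f"Funding: {self.funding}",
            f"Funder expectations on open access and data: {self.funder_data_expectations}",
            f"Will answer further questions: {yes_no(self.will_respond_to_questions)}",
            f"Will replace outdated versions: {yes_no(self.will_replace_outdated_versions)}",
        ]
        return "\n".join(lines)


# Example values are placeholders, not a real project.
request = AccessRequest(
    project_name="Example study",
    purpose="Testing record linkage methods",
    people_involved=["A. Researcher", "B. Student"],
    people_with_direct_access=["A. Researcher"],
    funding="Example grant",
    funder_data_expectations="Results must be published open access",
    will_respond_to_questions=True,
    will_replace_outdated_versions=True,
)
print(request.missing_items())
print(request.to_issue_body())
```

Checking for missing items before posting keeps the request short while making sure the committee does not have to ask for the basics.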

Discussion

Once a request has been submitted, it will be discussed by the committee. This data is, at a very minimum, creepy-looking: although it has been designed to contain no actual private information, it certainly looks as though it does, and the look of data and the perception of its use can have real consequences. These are not only consequences for people in or relating to the dataset directly, but also the consequence of impacting the trust people have in this data, the decisions made with it, or the collection processes behind it.

Because of this obligation, and this pragmatism, it is essential to us that data access be reviewed by voices often left out of direct data conversations – voices representing the people in the data. To that end, an initial application will be followed by a discussion process with and between a committee of interested parties (see above). This discussion, and all other correspondence and work on the project, will occur in the same GitHub thread as the original application. This enables us to both provide people (data subjects) with transparency about who has access to data, and for what purpose, and have a relatively low technical bar to entry for interested parties (data subjects, and data reusers both).

The questions the committee asks, and their concerns, are likely to be project-specific in some respects and otherwise for the committee to decide: we cannot specify them right now. But when a project is approved by the committee, the next step is enabling access.

Access

When a project is approved, we log the access approval publicly, linking to the GitHub discussion, on a central page. This provides interested parties with easy access to information about what projects are being undertaken, and for what purpose, and makes it convenient to have further discussion (see “ongoing responsibilities” below). Once this is done, the developers (data creators) will give the applicants a unique URL providing access to the full dataset.
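
One way to picture the public log is as a list of entries, each pairing an approved project with a link to its discussion. The sketch below is an illustration under assumptions: the `ApprovalEntry` fields, the placeholder URL and date, and the one-line format are hypothetical and do not describe the project's actual log page. The download URL is deliberately left out of the entry, since it is shared only with the applicants.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ApprovalEntry:
    """Hypothetical public record of one approved access request."""

    project_name: str
    purpose: str
    discussion_url: str  # link back to the public discussion thread
    approved_on: date

    def to_log_line(self) -> str:
        """Format the entry as a single line for a public log page."""
        return " | ".join(
            [
                self.approved_on.isoformat(),
                self.project_name,
                self.purpose,
                self.discussion_url,
            ]
        )


# Placeholder values for illustration only.
entry = ApprovalEntry(
    project_name="Example study",
    purpose="Testing record linkage methods",
    discussion_url="https://example.org/discussion/1",
    approved_on=date(2024, 1, 15),
)
print(entry.to_log_line())
```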

Our expectation is that reusers commit to limiting their use of this data to the specific project they have requested it for, and the people on the application, rather than sharing it for reuse.

Ongoing responsibilities

Consent and agreement are not one-time things: projects, and concerns about projects, change over time. Correspondingly, access to this dataset comes with not only initial responsibilities around the access request but ongoing responsibilities through the course of the project. Some of these, mentioned above, include commitments to (1) replace a defunct version of the dataset when a new one is released and (2) maintain the confidentiality of the download URL.

A more general obligation is for the reusers to be reasonably responsive to further questions from the committee – ranging from questions about the status of the project, to new queries or concerns that only came to mind after initial approval. If reusers are not responsive and collaborative, they may lose their right to download the data.