On May 18, Kristin Tynski dropped a link into the Reddit community r/privacy: “I scraped court records to find dirty cops.” Tynski, who owns a marketing firm, had collected the public police records in Palm Beach County, where she lives, and wrote up her findings on data like traffic citations and race. She wondered if other Redditors might want to do the same in their counties. “If cops can watch us, we should watch them,” she wrote.

Exactly one week later, George Floyd was killed in the custody of Minneapolis police, his death captured on video by witnesses. As outrage began building in the streets of that city, Tynski once again took to Reddit. “I think I accidentally started a movement,” she wrote on May 26, describing how dozens of people had already joined her effort, which was now being organized in Slack. This time, there were more than just stirrings of interest. Tynski had no way to know, but the timing of her small data-mining experiment coincided with what some experts say is the biggest protest movement in US history. Thousands of Redditors upvoted her post and then migrated to a new subreddit, r/DataPolice, to coordinate an effort to collect public police records en masse. Their mission: “to enable a more transparent and empowered society by making law enforcement public records open source and easily accessible to the public.”

That kind of centralized, nationwide database doesn’t exist in the US right now. For years, researchers, journalists, and activists have turned to official records, from incident reports to misconduct complaints, as one window into police behavior in the United States. “The problem is that all of this data, although it’s public, is buried inside of these really crappy or antiquated public records portals,” says Tynski. Few states make it easy to mass-export law enforcement data, which can make the process tedious. Some states require a formal public records request to access the documents; sometimes people have had to sue for the data. And once the data has been downloaded, it has to be cleaned, combined, and standardized to create a national data set—the kind that might help researchers find patterns of racial bias, excessive use of force, or repeat complaints of misconduct. Tynski’s group, which calls itself the Police Data Accessibility Project, aims to do just that.

The Police Data Accessibility Project isn’t the first to try to amass public police data for analysis, but previous efforts have mostly fallen to universities and journalists. (The government has also made some effort: The FBI launched a new national use-of-force database in 2019, but participation by law enforcement agencies is voluntary.) The Police Data Accessibility Project, on the other hand, is a grassroots effort. More than 2,000 interested internet users have joined an associated Slack group, and over 6,000 have subscribed to r/DataPolice. (Advance Publications, which owns WIRED’s publisher, Condé Nast, is a Reddit shareholder.) Tynski’s project is also, in some ways, larger in scope. Unlike previous projects bound by geography or types of records, the Police Data Accessibility Project aims to aggregate all public police records nationwide into one easily searchable database. “The parameters are, what are local police forces publishing? We want all of that public data,” says Eddie Brown, a US Army veteran who has taken the role of chief operating officer for the group.

Doing so will be difficult, tedious, and technical work. So far, the members of the Police Data Accessibility Project have mostly spent their time building the custom scrapers needed to export files from data portals, rather than gathering the data itself. With so many volunteers chipping in, there have also been a number of debates about the ethics of the project: Should they include the names of police officers in their database? Should they use sources like BlueLeaks, a trove of stolen police documents released in June? The group has decided against both, citing privacy and the importance of data custody, or having a legal right to the data in the set.