Who Gets Stopped in Chicago and What the Records Leave Out

Abstract

A traffic stop record is built to answer a short list of questions. Who was stopped, where, when, for what, and what happened. This paper takes a 4,999-row sample of historical Chicago police stop records, released through the Stanford Open Policing Project from Chicago Police Department data, and treats it as an object to be examined rather than a ledger to be trusted. The work has two parts that stay separate throughout. One is a synthesis of published policing research. The other is our own descriptive tabulation of that public file, counts and shares only, with nothing modeled and nothing imputed. The decisive feature of the sample is what it withholds. Subject race is recorded for 100 percent of arrest outcomes but 1.8 percent of citation outcomes, and it is blank for 73.3 percent of all rows, so any race-by-outcome rate computed from this file describes who was arrested, not who was stopped ^[1]. We report a racial composition only for the near-complete arrest subset (565 Black, 415 Hispanic, 256 White, 15 Asian or Pacific Islander), label it as an arrest descriptive each time it appears, and decline to read a stop-level disparity out of it ^[1]. The outside literature, national and Chicago-specific, has already settled that disparity question on far stronger data, including a national analysis of roughly 100 million stops and a 2024 study that benchmarked Chicago police stops against the racial mix of drivers actually on the road ^[2]^[3]. Our point is the narrower and more methodological one. Using the city's own records, we show the Knox, Lowe, and Mummolo result in miniature, that administrative data can hide the bias it is asked to reveal, and we mark exactly where honest analysis of a file like this has to stop ^[4]. The paper's argument is that the silence in these records is not an obstacle to the analysis. It is the most informative thing the records contain.

What a Stop Record Is Built to Hold

Start with the thing itself. A single stop record is a small, rigid container with labeled slots, and each slot expects one kind of answer. When did the stop happen, and at what hour. Where, given as a latitude and a longitude. Who was stopped, flattened to an age, a race, and a sex. Who made the stop, kept as a hashed officer identifier and a few service attributes. Was it a vehicle or a pedestrian. What was the cited reason. Then the slot the whole form is built around, the outcome, an arrest or a citation, marked by a pair of yes-or-no flags.

The sample here arrives as a comma-separated file with twenty-one columns. Reading across one row you pass the date and time, the latitude and longitude, the subject's age, race, and sex, a cluster of officer fields, the stop type, the violation text, the two outcome flags, the resolved outcome, and a raw race field that keeps the original source coding ^[1]. Beneath those headers sit 4,999 rows. The dates run 2012 through 2016. Of the rows, 4,934 are vehicle stops and 65 are pedestrian stops, so this is overwhelmingly a record of cars being pulled over rather than people stopped on foot ^[1]. On its face it is a tidy, finished-looking object, the kind of file that tempts you to start computing rates on the first afternoon you open it.

Some of those columns hold their values reliably. The hour of the stop is present on every row, and across the whole sample the stops cluster in the evening. The hourly counts climb through the afternoon and peak between five and eleven at night, where each hour carries somewhere between 259 and 305 stops, then fall to their lowest in the pre-dawn stretch, with the four-o'clock hour holding 82 stops and the five-o'clock hour just 65 ^[1]. Subject sex is present on every row too, splitting 3,522 male against 1,477 female, which together account for all 4,999 rows ^[1]. Officer race is recorded on 4,626 of the rows ^[1]. So the file is not uniformly thin. It carries some fields completely or nearly so. That makes the one field it carries selectively, subject race, stand out more sharply, not less, because the gap is plainly not a matter of a department too overwhelmed to fill anything in.
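A completeness check of this kind is simple to sketch. The snippet below is a minimal illustration, not the code behind our counts. It assumes the rows have already been read into dictionaries (for instance with Python's csv.DictReader), that a blank is encoded as an empty string or the text "NA", and that the column names shown are placeholders. The four rows are invented.

```python
from collections import Counter

# Assumption: how a blank cell is encoded in the CSV.
MISSING = {"", "NA"}

def completeness(rows):
    """Return {column: share of rows holding a non-missing value}."""
    filled = Counter()
    for row in rows:
        for column, value in row.items():
            if value is not None and value.strip() not in MISSING:
                filled[column] += 1
    columns = rows[0].keys() if rows else []
    return {column: filled[column] / len(rows) for column in columns}

# Invented rows for illustration only; not drawn from the sample.
toy_rows = [
    {"time": "18:05:00", "subject_sex": "male", "subject_race": "NA"},
    {"time": "21:40:00", "subject_sex": "female", "subject_race": "white"},
    {"time": "04:15:00", "subject_sex": "male", "subject_race": ""},
    {"time": "19:30:00", "subject_sex": "male", "subject_race": "NA"},
]

shares = completeness(toy_rows)
for column, share in shares.items():
    print(f"{column}: {share:.0%} recorded")
```

On the toy rows the time and sex columns come back fully recorded and the race column one quarter recorded, the same shape of contrast the paragraph above describes, though the toy proportions carry no meaning of their own.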

The temptation to start computing rates on the first afternoon is the subject of this paper. A stop record is not a neutral observation of an encounter on a Chicago street. An officer produced it, for the department, under whatever reporting rules and incentives applied at the moment of the stop. The record exists because someone decided a stop had occurred and then chose, field by field, what to write into the container. So every column is a decision as much as a fact. The latitude is where the officer logged the stop. The violation is the reason entered into the system, which need not be the reason the encounter started. And the race field, as the rest of this paper shows, is filled in or left blank in a pattern that has nothing to do with the demographics of Chicago and everything to do with how the stop was resolved. To read the file as a ledger is to assume the container was filled honestly, completely, and without selection. None of those assumptions holds.

So the organizing claim is inverted on purpose. What this sample reports matters less than what it leaves blank. And the blanks are not scattered the way a few skipped fields in a hurried shift would scatter. They cluster, and they cluster on the outcome. They run dense in exactly the rows a disparity analysis would most want to read and sparse in exactly the rows where the analysis is least likely to be asked anything hard. A reader who pulls the rows that happen to carry race, computes a racial composition, and calls it the composition of people stopped in Chicago will have produced a number that is precise, reproducible, and wrong about the thing it claims to describe. The number is real even though the claim built on it is false.

This is why the silence is the subject. We will not pretend the file answers the question it appears to answer. We take the structure of its missingness as the central finding, set that finding next to a literature that does have the data to measure stop-level disparity, and then say where the line falls between what these records support and what they cannot. The voice is meant to be careful rather than conclusive. Each count comes tagged with the slice of the file it came from. Each borrowed figure comes attributed. Where the honest answer is that the file does not know, the paper says the file does not know.

One note on provenance, because it bears on how much trust the container has earned. The records originate with the Chicago Police Department and were released under the Illinois Traffic and Pedestrian Stop Statistical Study Act, the state law that requires Illinois agencies to report stop data. The Stanford Computational Policy Lab compiled that material into the Stanford Open Policing Project, a standardized national collection, and the file used here is the Chicago sample drawn from it ^[1]. The chain cuts two ways. It means the data is real, official, and legally mandated, not something assembled or estimated for the occasion. It also means the data carries forward whatever the department chose to record and whatever the reporting regime of those years captured, which, as the missingness will show, was uneven in ways standardization could tidy but not repair. A record built to hold five answers has, in this sample, reliably held only some of them, and it has dropped the one the disparity question most depends on, in a pattern locked to the very outcomes under study. That dropped field, and the pattern of where it goes missing, is what this paper sets out to read. The missingness is the analysis here, not a flaw to scrub away before the analysis can begin.

An Honest Account of This Work

Before any number does any work, the kind of paper this is needs stating plainly, so that nothing downstream gets mistaken for a claim it is not. This is a synthesis of published research paired with a descriptive analysis of one real public dataset. Two distinct activities, kept distinct. The synthesis rests on peer-reviewed and institutional sources, attributed where they appear. The analysis is ours, and it is nothing more than counts and shares pulled from the columns of a file that was already public before we touched it.

The list of things we did not do is long, and worth saying out loud. We ran no experiment. There was no intervention, no control group, no randomization, nothing assigned or withheld. We scraped no new data, since the file arrived as a finished, standardized sample. We interviewed no one, so there are no quotes, no oral histories, no accounts of individual stops anywhere in these pages. We filed no records request, so no FOIA number appears. We fit no model and estimated no parameter. We counted. When a value was missing we recorded it as missing and reported how often, rather than guessing a value to fill the cell, because imputing race onto rows that lack it would manufacture the exact population the file fails to record. And we make no causal claim. The numbers describe who got written down. They say nothing about why anyone was stopped, searched, cited, or arrested, and any sentence that drifted toward a cause would be reaching past what a tabulation can hold.

That restraint is not modesty for its own sake. It follows from the chain of custody behind the file, which is worth tracing with some care, because each link shaped what the data could later say. The first link is the law. The Illinois Traffic and Pedestrian Stop Statistical Study Act required police agencies across the state to record and report a defined set of details for every stop, and it exists because the state wanted a way to monitor exactly the kind of racial disparity this paper is circling. The mandate produced standardized stop records, including a field for the perceived race of the person stopped, precisely so that disparities could be tracked. That history carries an irony worth noticing. The race field whose emptiness is the subject of this paper was written into the reporting requirement on purpose, as the instrument of accountability, and in this sample it is the field most often left blank on the stops least likely to end in arrest.

The second link is the agency. The underlying records are Chicago Police Department stop data, generated by officers on duty and reported to the state under that Act. Whatever the law required in principle, what reached the file is what officers actually entered, under the reporting habits and pressures of the moment. The third link is the Stanford Computational Policy Lab, which gathered records from Chicago and many other jurisdictions into the Open Policing Project, standardizing field names and value codings so stops from different places could sit on common terms ^[1]. Standardization can align column headers and harmonize how race categories are spelled. It cannot fill in a value an officer never recorded. So the missingness we document traces back to the recording practice of the originating agency, preserved faithfully through the cleaning rather than introduced by Stanford. What we hold is a 4,999-row sample from the Chicago portion of that collection, 4,934 vehicle stops and 65 pedestrian stops ^[1]. Every figure reported here comes from those rows and no others.

The sample is a sample, and the word carries weight. The full Stanford Chicago file is far larger, 2,108,098 stops spanning 2012 through 2020, shipped with shapefiles and an open license that would permit a far more ambitious extract ^[1]. A fair reader will ask why, with that larger file sitting right there, the work runs on the thin sample instead. Because we chose to, and the choice should be visible rather than buried. The goal here was never to wring a citywide disparity estimate out of Chicago's records. The goal was to show, on a file small enough that anyone can open and check it, how the records behave, where they go quiet, and why the quiet matters for any disparity claim. The sample fits that purpose precisely because its limits are stark and easy to verify by hand. Pointing at the full file is part of the honesty. It tells the reader the door to a larger study stands open and that we did not walk through it.

It helps to be clear about what kind of knowledge a descriptive audit produces, because it is easy to mistake it for a weaker version of the disparity study and it is not. The disparity study asks whether police stop one group at a higher rate than the right benchmark would predict. That is a causal and comparative question, and answering it well takes a denominator, a benchmark, and methods like the outcome test and the veil of darkness built to handle confounding ^[2]^[5]. An audit asks something prior and humbler. Given these records, what can they actually support, and where do they fail. The two are not rivals. An audit cannot establish a disparity, and a disparity study that never audits its data can rest a confident conclusion on records that were quietly organized to mislead, which is the exact failure Knox, Lowe, and Mummolo formalize ^[4]. The descriptive work done here is the kind of groundwork the stronger studies either do themselves or assume someone has done. Putting it on the page, rather than burying it in a footnote, is the point of the paper. The reader sees the data get tested before any claim is hung on it.

There is a reason to be this exact about scope, and it is specific to the subject. Policing numbers travel fast. They get quoted in arguments and harden into claims that outrun their evidence. A descriptive count of who appears in a set of arrest records can be read, by someone in a hurry, as a statement about who police stop, or worse, about who breaks the law, and our tabulation establishes neither. The differential missingness this paper documents makes that misreading likely rather than merely possible, since the rows that carry race are overwhelmingly the rows that ended in arrest. So the guardrails go up front. The composition figures reported later describe the arrest subset of this sample. They are not population rates, not stop rates, not statements about behavior. That framing gets repeated where the numbers appear, not because the reader cannot hold it in mind but because the file's structure pushes hard toward the wrong conclusion, and the right framing has to be reasserted against that pressure.

What survives all the subtractions is still worth doing. A transparent audit of a real public file, reported down to its blanks, set beside a literature that does have the standing to measure disparity, has a clear shape. It says here is what the city's own records look like up close, here is the question they cannot answer alone and exactly why, and here is the body of work that can. That external scaffolding comes first, because the disparity question itself is not open. Only this sample's ability to address it is in doubt, and the size of that doubt is easier to judge once the strength of the outside evidence is on the table.

What the Outside Evidence Already Establishes

Whether American traffic and pedestrian stops carry racial disparity is not, in the research literature, an open question. It has been measured many times, at many scales, with methods chosen so that no single method's weakness could explain the result away. What our thin sample cannot show on its own, the outside work has already shown on firm ground. Laying that work out first is what lets the later sections be candid about the small thing our own numbers add.

Scale is where the national picture is hardest to dismiss. The Open Policing Project, in the analysis by Pierson and colleagues, assembled records on roughly 100 million traffic stops drawn from dozens of state patrols and city departments, then used them to test the stop and search decisions for bias ^[2]. Two of their methods carry the weight, and they attack different decisions. One is the outcome test, which looks at the rate at which searches actually turn up contraband. The logic is that if officers search one group on weaker evidence than another, the searches of that group will succeed less often, because the threshold for suspicion was set lower. Their central search finding is that Black and Hispanic drivers met a lower evidentiary bar for being searched than White drivers did, with their searches yielding contraband at lower rates, which means officers were searching them on thinner grounds ^[2]. The other tool, the veil-of-darkness design, goes after the stop decision rather than the search. The strength of the whole effort is not any one jurisdiction but the consistency of direction across an enormous and varied pool of stops. When a disparity points the same way across that many agencies, states, and local conditions, it gets very hard to pin on the paperwork habits of a single department or the crime profile of a single city.
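The logic of the outcome test can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the published analysis, which is considerably more elaborate. The function name, the group labels, and every count are invented for the example, and each search is reduced to a pair of group and whether contraband was found.

```python
from collections import defaultdict

def hit_rates(searches):
    """Share of searches that found contraband, computed per group.

    `searches` is an iterable of (group, contraband_found) pairs.
    """
    found = defaultdict(int)
    total = defaultdict(int)
    for group, contraband_found in searches:
        total[group] += 1
        found[group] += int(contraband_found)
    return {group: found[group] / total[group] for group in total}

# Invented data: ten searches per group, three hits in group_a, two in group_b.
toy_searches = (
    [("group_a", True)] * 3 + [("group_a", False)] * 7
    + [("group_b", True)] * 2 + [("group_b", False)] * 8
)

rates = hit_rates(toy_searches)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of searches found contraband")
```

Under the outcome-test reading, the lower hit rate for group_b in this toy data would suggest its members were searched on weaker grounds. Our own sample supports no such calculation, since nothing in the columns described above records whether a search turned up contraband.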

The method matters as much as the result, because the benchmarking problem the national study had to clear is the same one our sample cannot clear. To say a group is stopped too often, you need to know how often it should be stopped, which means a denominator, the population actually at risk of a stop on the road. That denominator is famously slippery. Census residence figures describe who lives in a place, not who drives through it. Crash data and licensed-driver counts each capture a different and partial slice. There is no clean register of who was on which road at which hour, which is the number a fair stop rate would divide by.

The veil-of-darkness approach, introduced to the racial-profiling literature by Grogger and Ridgeway, gets around the missing denominator with a clever substitution ^[5]. It compares the racial mix of stops made in daylight against the mix of stops made after dark, on the logic that an officer's ability to perceive a driver's race is sharply reduced at night, especially before the driver is pulled over. If the same officers, patrolling the same roads, stop a higher share of Black drivers when they can see who is driving than when they cannot, that difference points to race entering the stop decision, and it requires no knowledge of the true driving population at all, because the daylight stops and the nighttime stops are each drawn from roughly the same pool of road users ^[5]. Grogger and Ridgeway built the test precisely to escape the denominator trap, and it became one of the tools the national analysis leaned on ^[2]^[5].
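The core comparison can be sketched as follows. This is a deliberately simplified illustration rather than the Grogger and Ridgeway procedure, which takes more care over which stops are compared. It assumes each stop already carries a precomputed darkness flag, and the function name, group labels, and counts are all invented.

```python
def share_by_light(stops, group):
    """Share of stops involving `group`, separately for daylight and dark.

    `stops` is an iterable of (group, is_dark) pairs. Assumes at least one
    stop exists in each lighting condition.
    """
    # is_dark -> [stops of `group`, all stops]
    counts = {False: [0, 0], True: [0, 0]}
    for stop_group, is_dark in stops:
        counts[is_dark][1] += 1
        if stop_group == group:
            counts[is_dark][0] += 1
    return {
        "daylight": counts[False][0] / counts[False][1],
        "dark": counts[True][0] / counts[True][1],
    }

# Invented data: ten daylight stops and ten stops after dark.
toy_stops = (
    [("group_a", False)] * 6 + [("group_b", False)] * 4
    + [("group_a", True)] * 4 + [("group_b", True)] * 6
)

shares = share_by_light(toy_stops, "group_a")
gap_points = round(100 * (shares["daylight"] - shares["dark"]))
print(f"daylight {shares['daylight']:.0%}, dark {shares['dark']:.0%}, "
      f"gap {gap_points} percentage points")
```

Note that no driving-population denominator appears anywhere in the calculation. The only inputs are the stops themselves, a lighting flag, and the recorded race of the person stopped.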

Naming the method makes a pointed admission about our own file. The veil-of-darkness design needs the time of each stop and, ideally, the local time of dusk, and our sample does carry the hour of every stop ^[1]. The raw material for a darkness comparison is partly present. We did not build that comparison, and we should say why rather than let the omission pass. The test also needs the driver's race, and in this sample race is recorded almost only on the rows that ended in arrest, so a veil-of-darkness analysis here would run on the arrest subset alone and inherit every selection problem this paper is about. An hour field without a race field on the same rows cannot support the design. So we report composition descriptively and refuse to turn it into a rate, not because the idea did not occur to us, but because the file leaves blank, on nearly every citation row, the one column the honest version of the test would require.

For Chicago in particular, the sharpest recent evidence comes from a 2024 study in the Proceedings of the National Academy of Sciences, and it rewards a careful description because it meets the denominator problem with unusually good data ^[3]. The authors needed to know who was actually driving on a given stretch of Chicago road, the very number the veil-of-darkness method exists to work around. Rather than work around it, they estimated it directly, using GPS data on the movement of actual road users to build a picture of the racial composition of drivers on specific roads. They then set that road-user mix against two things. One was the racial mix of police traffic stops on those roads. The other was the racial mix of citations issued there by automated speed cameras. The comparison is the point, and it is elegant. Both the officer and the camera are enforcing traffic law on the same roads against the same flow of drivers, so if their outputs diverge by race, the divergence has to live in the difference between them.

The finding is clean. On roads where Black drivers were roughly 50 percent of the people actually using the road, they were roughly 70 percent of police stops ^[3]. The cameras, by contrast, issued citations that tracked the road-user mix far more closely. A camera cannot see race. It triggers on speed, photographs a plate, and mails a ticket, with no officer in the loop to decide whom to pull over. So the camera's output is something close to a race-blind benchmark for what enforcement looks like when discretion is removed. When the machine's citations line up with who is on the road and the officers' stops do not, the gap between them points to discretion in the human decision to stop, not to differences in who was driving or how fast ^[3]. This is the closest thing in the literature to a clean within-Chicago benchmark, built on the same city whose records our sample is drawn from, and it is the result our file sits beside without being able to reproduce.

The disparity literature runs older and deeper than those two studies, and a pair of foundational works anchors it. Gelman, Fagan, and Kiss analyzed roughly 175,000 stops recorded by the New York City Police Department over a fifteen-month period in the late 1990s, the data assembled in the course of the New York Attorney General's review of the department's stop-and-frisk practice ^[6]. Their question was the one any disparity claim runs into. Black and Hispanic pedestrians were stopped far more often than White pedestrians, but New York's neighborhoods differ by race and by reported crime, so a skeptic could answer that the police simply went where the crime was. The authors met that answer head on. They compared stops not to residential population alone but to the previous year's arrest rates by race and by crime category, treating prior arrests as a rough proxy for the rate of criminal activity an officer might be responding to, and they modeled the stops precinct by precinct so that local conditions could not hide inside a citywide average ^[6]. The disparity survived. Black and Hispanic residents were stopped more often than White residents even after controlling for precinct and for race-specific crime rates, and the gap was largest for the stops that produced no arrest at all ^[6]. The control is the part that counts. The standard objection to any raw stop disparity is that it merely tracks where crime concentrates, so adjusting for precinct-level and crime-specific rates is exactly the test that objection calls for. The study became a reference point for reasoning about stop data without letting crime-rate arguments do unexamined work, and it predates by more than a decade the federal ruling that found New York's stop-and-frisk regime unconstitutional in its racial application.

If the New York analysis supplies the statistical anchor, the Suspect Citizens project supplies the texture and the sheer scale of routine traffic enforcement. Working from roughly 20 million traffic stops recorded in North Carolina, Baumgartner, Epp, and Shoub draw a distinction that organizes their whole book ^[7]. On one side is the safety stop, made to address a genuine hazard, the driver running a light at speed, weaving across lanes, plainly endangering someone. On the other is what they call the investigatory stop, made not to address a hazard but as a pretext, a way to pull a driver over on some minor and easily found violation in order to investigate the person, run the plates, look in the car, ask where they are going. The two kinds of stop look identical on paper, since both can cite a real infraction, but they do different work and they fall on different people. The investigatory stop, they show, is aimed disproportionately at young men of color, who are pulled over more often, questioned and searched more often, and yet found with contraband no more often, which is the tell that the stops were never really about what was found ^[7]. The account reframes the ordinary traffic stop as a practice that can be racialized in its targeting even when each individual stop rests on a real, citable infraction. The discretion lies not in whether the law was broken but in which of the countless small infractions visible on the road an officer chooses to act on, and against whom. That reframing returns later in a specific way. The low-level, discretionary violations that dominate our sample are precisely the kind of infraction the investigatory-stop literature identifies as the mechanism, the broken headlight or the rolled stop sign that hands an officer a lawful reason to begin an encounter that is really about something else.

These works form a sturdy scaffold once they are set side by side. The national analysis establishes that the disparity is broad and that searches of minority drivers rest on weaker evidence ^[2]. The veil-of-darkness method shows how to detect bias without a clean denominator ^[5]. The Chicago road-user study supplies a local benchmark in which machines and officers diverge in a way that implicates discretion ^[3]. The New York analysis shows the disparity holds under crime and precinct controls ^[6]. And the Suspect Citizens work names the everyday mechanism and the people it falls on ^[7]. None of it depends on our file, which is why we can be relaxed about how little our file proves on its own. Given a literature this consistent, a reader might fairly expect a fresh sample of 5,000 Chicago stops to add one more confirming dot to the pile. It does not, and that is the finding. The records are built to withhold the very variable the disparity question needs, and they withhold it selectively. So the question our file can actually answer is not whether Chicago stops are racially disparate, since stronger data already answered that, but whether the city's own administrative records, read honestly, are even capable of showing it. The answer turns on missingness.

The Records Mask the Thing They Are Used to Measure

The finding the whole paper is organized around is a pattern in the blanks rather than a disparity in the counts. Subject race in this sample is recorded for 100 percent of arrest outcomes, all 1,267 of the 1,267 rows that ended in arrest, and for 1.8 percent of citation outcomes, just 68 of the 3,732 rows that ended in a citation. Across the whole file, race is missing for 73.3 percent of the 4,999 rows ^[1]. Read those three numbers together and slowly, because their relationship is the entire point. Race shows up almost only where the stop ended in arrest. It is almost entirely absent where the stop ended in a citation. And citations are the large majority of the file. The variable a disparity analysis most needs is the variable the records drop, dropped in lockstep with the outcome under study.

Share of Stops With Subject Race Recorded by Outcome

Figure 1. Chicago police stop sample, Stanford Open Policing Project (Chicago Police Department records). Race is recorded for 1,267 of 1,267 arrest rows and 68 of 3,732 citation rows, so administrative records mask the very population a disparity test would need.

Put the proportions the other way around to feel the full weight of the skew. Race is recorded on just 26.7 percent of the sample overall, 1,335 of the 4,999 rows ^[1]. And of those 1,335 rows that do carry a race, 1,267 are arrests and 68 are citations, so 94.9 percent of the race-bearing rows in the file are arrests ^[1]. Arrests are 1,267 of the 4,999 stops, about a quarter of the sample by outcome, yet they account for nearly all of the rows where race is known. The citation rows, three out of four stops in the file, contribute almost nothing to the pool of records that carry race. So when an analyst drops the blanks and works with what is left, the working dataset is essentially the arrest file with a thin scatter of citations mixed in, no matter how it is described. The disparity question is about who gets stopped, and the citation stops are most of the stopping, and the citation stops are exactly the records that arrive without the answer.
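Every percentage in this section follows from four counts, and the arithmetic is short enough to check by hand or by machine. The sketch below does the latter. The counts are the ones reported above, while the variable names and the rounding to one decimal place are our own choices for the illustration.

```python
# Counts as reported in the text for the 4,999-row sample.
ARRESTS, ARRESTS_WITH_RACE = 1267, 1267
CITATIONS, CITATIONS_WITH_RACE = 3732, 68
TOTAL = ARRESTS + CITATIONS

def pct(part, whole):
    """Percentage of `whole` made up by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

rows_with_race = ARRESTS_WITH_RACE + CITATIONS_WITH_RACE

summary = {
    "race recorded among arrests": pct(ARRESTS_WITH_RACE, ARRESTS),
    "race recorded among citations": pct(CITATIONS_WITH_RACE, CITATIONS),
    "race recorded overall": pct(rows_with_race, TOTAL),
    "race missing overall": pct(TOTAL - rows_with_race, TOTAL),
    "arrests among race-bearing rows": pct(ARRESTS_WITH_RACE, rows_with_race),
    "arrests among all rows": pct(ARRESTS, TOTAL),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

The last two lines of output carry the skew: arrests are 25.3 percent of all rows but 94.9 percent of the rows where race is known.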

Consider what that does to the most natural analysis a reader might try. Suppose you ignore the missingness, take every row that happens to carry a race value, and compute the racial composition of stops in Chicago. The result feels legitimate. It uses real recorded values, and nothing is invented. But look at where those values live. Of the 1,335 rows in the sample with race recorded, 1,267 are arrests and only 68 are citations ^[1]. The race-bearing rows are overwhelmingly arrest rows. So the composition you just computed is very nearly the composition of people arrested at a stop, dressed up as the composition of all stops, and it describes the people stopped only if you forget where the numbers came from. Any racial pattern in that number is fused with a second pattern, which is which encounters happened to get a full record written. You cannot pull the two apart from inside this file. The race signal and the documentation signal ride in on the same rows, and the rows that carry both are defined by their outcome.

State the mechanism without ornament. A rate is a numerator over a denominator, and both have to describe the same population for the rate to mean anything. When race is recorded on essentially every arrest and almost no citation, the rows that supply your racial denominator are not a sample of stops at all but very nearly a census of one outcome. A naive race-by-outcome rate built on this file answers a question nobody asked, the racial makeup of the documented-arrest slice, while looking like it answers the question everyone cares about, who gets stopped. The arithmetic is correct at every step, yet the object it describes is the wrong one. That gap, between a precise number and the thing it gets taken to mean, is the failure at the center of this paper.
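One way to see the mismatch between the two populations is to ask how tightly the recorded rows pin down each quantity. The sketch below is an illustration of that gap and nothing more. It computes purely logical limits, not estimates, and the worst-case framing is ours rather than a method drawn from the cited literature. It uses the count of 565 arrest rows recorded as Black, reported in the abstract, and treats the racial breakdown of the 68 race-bearing citation rows as unknown because we do not report it.

```python
# Counts as reported in the text; names are our own.
BLACK_ARRESTS = 565        # arrest rows recorded as Black
ARRESTS_WITH_RACE = 1267
CITATIONS_WITH_RACE = 68   # race recorded, breakdown not reported here
TOTAL_ROWS = 4999
BLANK_ROWS = TOTAL_ROWS - ARRESTS_WITH_RACE - CITATIONS_WITH_RACE

def pct(part, whole):
    """Percentage of `whole` made up by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

race_bearing = ARRESTS_WITH_RACE + CITATIONS_WITH_RACE

# Naive share among race-bearing rows: can move only as far as the
# 68 citation rows allow (none of them Black, or all of them).
naive_low = pct(BLACK_ARRESTS, race_bearing)
naive_high = pct(BLACK_ARRESTS + CITATIONS_WITH_RACE, race_bearing)

# Share among all stops in the sample: every blank row is unconstrained.
stop_low = pct(BLACK_ARRESTS, TOTAL_ROWS)
stop_high = pct(BLACK_ARRESTS + CITATIONS_WITH_RACE + BLANK_ROWS, TOTAL_ROWS)

print(f"naive share among race-bearing rows: {naive_low}% to {naive_high}%")
print(f"share among all sampled stops:       {stop_low}% to {stop_high}%")
```

The naive figure is confined to a band about five points wide because it is nearly the arrest composition. The stop-level share, on the same records, could logically fall almost anywhere, which is another way of saying the file does not know.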

It is also a textbook case of an argument made formally by Knox, Lowe, and Mummolo, whose work is the theoretical spine under our descriptive observation ^[4]. Their paper makes a point that sounds technical and turns out to undercut a whole genre of analysis. When researchers study racial bias in policing, they almost always start from records of stops, because that is the data that exists. But a record of a stop only comes into being after an officer has already decided to stop someone. Everyone the officer saw and chose not to stop leaves no record at all. So the population sitting inside a stop dataset is a sample of people the police already selected rather than a sample of people on the street, and if that selection is itself shaped by race, then the dataset has the bias baked in before the analyst computes anything ^[4]. In the language of their paper, conditioning on the stop is conditioning on an outcome of the very behavior under study, and that conditioning biases the estimate by an amount that cannot be recovered without information the records do not contain.
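A toy calculation makes the selection problem concrete. Every number below is invented, the group labels are placeholders, and the sketch is an illustration of the general mechanism rather than the Knox, Lowe, and Mummolo estimator. The key assumption is that we can see the count of people officers observed, which is exactly the quantity real stop records never contain.

```python
# Invented numbers. "observed" is what real stop records lack.
toy = {
    # 100 stops, all on strong grounds, 20 ending in an adverse outcome
    "group_w": {"observed": 1000, "stopped": 100, "adverse": 20},
    # the same 100 strong-grounds stops (20 adverse outcomes) plus
    # 100 extra stops on weak grounds (10 adverse outcomes)
    "group_b": {"observed": 1000, "stopped": 200, "adverse": 30},
}

# What an analyst can compute from stop records alone.
among_stopped = {
    group: counts["adverse"] / counts["stopped"] for group, counts in toy.items()
}

# What the analyst would compute if the unrecorded encounters were visible.
among_observed = {
    group: counts["adverse"] / counts["observed"] for group, counts in toy.items()
}

for group in toy:
    print(f"{group}: {among_stopped[group]:.1%} of those stopped, "
          f"{among_observed[group]:.1%} of those observed")
```

Conditioned on being stopped, group_b looks better treated, at 15 percent against 20 percent. Measured against everyone officers observed, group_b fares worse, at 3 percent against 2 percent. The extra stops made on weak grounds dilute the rate among the stopped, and the records alone give no way to see that this has happened.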

The encounters that never become records, and the records that come into being with key fields left blank, are not missing at random with respect to race and outcome. They go missing in patterns that distort the exact comparison researchers want to run. Knox, Lowe, and Mummolo work the consequence through formally and reach a conclusion that is uncomfortable and precise. Naive estimates of racial disparity drawn from administrative stop data are statistically biased, and the bias generally runs toward understating the disparity, so administrative records can mask the very bias they are used to measure ^[4]. An analyst who treats the records as a complete, neutral account will tend to understate or misstate the disparity rather than reveal it, and will do so while producing numbers that look perfectly rigorous ^[4].

We want to be careful about what our file does and does not do to that argument. It does not test, confirm, or refute it, because confirming it would require knowing the truth the records hide, which is the whole difficulty. What the sample offers is a firsthand look at the mechanism the argument identifies, on a single public file anyone can open and count. Knox, Lowe, and Mummolo describe records that selectively omit information in a way that biases disparity estimates ^[4]. We are holding such records, and the omission is not subtle. Race vanishes from 98.2 percent of citation outcomes and from none of the arrests ^[1]. The selection is clean enough to belong in their paper as an example. So the file enters the argument as evidence about Chicago's records rather than about Chicago's officers, a concrete case of administrative data behaving exactly as the theory warns, the missingness severe, outcome-dependent, and fatal to the naive estimate.

What makes this kind of selective recording so easy to miss earns a moment of its own. A spreadsheet with 4,999 rows and a subject-race column looks complete. The column is there. It has values in it. Open the file and the first screen of rows may show races filled in, because sorting or sampling can surface the arrest rows that carry them. Nothing in the structure warns you the column is present for one slice and absent for another. The missingness stays invisible until you cross-tabulate race against outcome, which is the one operation that springs the trap. Skip that step, run straight to a race-by-outcome rate, and you get a clean-looking number that means something other than what you think. The danger is not that the data lies. The danger is that it volunteers a confident answer to a question it was never equipped to settle. None of which means the underlying disparity is small or absent. The literature surveyed above is clear it is large and persistent ^[2]^[3]. The Knox, Lowe, and Mummolo result does not say there is no bias. It says records of this kind, taken at face value, will understate or distort it, because the recording process is itself shaped by the behavior under study ^[4].
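The cross-tabulation that exposes the pattern is small enough to sketch. The function below is a minimal illustration, not the code behind this paper's counts. The field names `subject_race` and `outcome` are assumptions about how a stop file might be keyed, and the rows are hypothetical.

```python
from collections import defaultdict


def race_coverage_by_outcome(rows, race_key="subject_race", outcome_key="outcome"):
    """Return {outcome: (rows with race recorded, total rows, share recorded)}."""
    recorded = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        outcome = row.get(outcome_key) or "unrecorded"
        total[outcome] += 1
        race = row.get(race_key)
        # Treat None and empty or whitespace-only strings as blank.
        if race is not None and str(race).strip() != "":
            recorded[outcome] += 1
    return {o: (recorded[o], total[o], recorded[o] / total[o]) for o in total}


# Hypothetical rows for demonstration, not drawn from the Chicago file.
toy_rows = [
    {"outcome": "arrest", "subject_race": "black"},
    {"outcome": "arrest", "subject_race": "white"},
    {"outcome": "citation", "subject_race": ""},
    {"outcome": "citation", "subject_race": None},
    {"outcome": "citation", "subject_race": "hispanic"},
    {"outcome": "citation", "subject_race": ""},
]

coverage = race_coverage_by_outcome(toy_rows)
for outcome, (n_recorded, n_total, share) in sorted(coverage.items()):
    print(f"{outcome:<10} race recorded on {n_recorded}/{n_total} rows ({share:.0%})")
```

Running a check of this kind before any race-by-outcome rate is the step that separates a column that is present from a column that is filled.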

The missingness inside the file has a companion problem outside it, and the two compound. Our analysis can speak only to records that exist, and within those records only to fields that were filled. But reporting by Injustice Watch, working with Bolts, found that a large share of Chicago traffic stops never enter the official record at all, nearly 200,000 stops in a single recent year, roughly a third of the total, concentrated on the South and West sides ^[11]. Sit with that next to our finding. Missingness here works at two levels at once. Within the records that exist, fields like race are blank in outcome-dependent patterns, which is what our sample shows. And at the level of whether a record exists at all, a substantial fraction of stops leave no trace ^[11]. An analyst working from official stop files is rebuilding a population from records filtered twice over. The stops that produced no record are gone entirely. Among the stops that did produce a record, the field that matters most for disparity is usually empty unless the stop ended in arrest. Each filter is selective. Neither is random with respect to race or place.

This is why the paper refuses to convert its counts into a disparity claim, and the refusal deserves to be blunt, because the pull runs the other way. We have real Chicago police records. We have a clear numerical skew in the rows that carry race, reported in full further on. It would be easy, and it would look rigorous, to present that skew as a measured disparity in Chicago stops. It would also be wrong, for the precise reason laid out here. The skew lives in the arrest subset because that is the only subset the records describe, and an arrest subset is not a stop population. To dress the arrest composition as a stop-level rate would be to commit, in public, the exact error Knox, Lowe, and Mummolo warn against, on a file that demonstrates the error with unusual clarity ^[4]. The honest move is the opposite. Treat the missingness as the result, report the one slice the file fills in as nothing more than what it is, and lean on the external literature for any claim about disparity itself.

There is a larger lesson under the particular file, and it is why missingness is the heart of this paper rather than a methods footnote. Administrative data carries an aura of objectivity. An official body collected it, in the ordinary course of business, in standardized fields, and it feels like a record of what happened. But a record of what happened is only as complete as the practice that produced it, and recording practices are themselves shaped by discretion, incentive, and rule. When the habit is to log race fully on an arrest and rarely on a citation, the file quietly bakes that habit into every rate anyone computes from it, and the rate misleads anyone who forgets where the numbers came from. The file does not flag its own selection. It looks complete. The 73.3 percent of rows missing race do not arrive marked as a warning. They arrive as blank cells a careless analysis will simply drop, taking the disparity question's whole population out the door with them ^[1]^[4]. Reading the silence, instead of dropping it, is the only way to keep the file honest. There is one part of the file that is not silent, the arrest subset, and it can be read aloud as long as it is labeled for what it is.

Reading the One Slice the File Fills In

Race is near-complete only for arrests, so a racial composition can finally be reported, but only for that slice and only with the label carried in front of the number rather than buried beneath it. In the named-race arrest subset, the 1,251 rows that ended in arrest and carry a determinate race after dropping the 14 coded unknown and the 2 coded other, the composition is 565 Black, 415 Hispanic, 256 White, and 15 Asian or Pacific Islander ^[1]. As shares of that subset, that is 45.2 percent Black, 33.2 percent Hispanic, 20.5 percent White, and 1.2 percent Asian or Pacific Islander ^[1]. Those are the numbers. Everything that matters about them is in how they are read.
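The shares follow directly from the four counts. The short sketch below recomputes them from the figures quoted in the text. The variable names are ours, and nothing here reads the underlying file.

```python
# Counts for the named-race arrest subset, as quoted in the text.
arrest_counts = {
    "Black": 565,
    "Hispanic": 415,
    "White": 256,
    "Asian or Pacific Islander": 15,
}

subset_total = sum(arrest_counts.values())

# Percent of the arrest subset, rounded to one decimal place.
shares = {
    group: round(100 * count / subset_total, 1)
    for group, count in arrest_counts.items()
}

print(f"named-race arrest rows: {subset_total}")
for group, share in shares.items():
    print(f"{group}: {share} percent of the arrest subset")
```

Because each share is rounded separately, the four rounded figures add to 100.1 rather than exactly 100. The denominator throughout is the 1,251 arrest rows, not the 4,999 stops.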

Racial Composition of the Arrest Subset of Chicago Stops

Figure 2. Chicago police stop sample, Stanford Open Policing Project. The 1,251 named-race arrest outcomes, the only race-complete slice. This describes who was arrested at a stop, not who was stopped, against roughly even-thirds Black, Hispanic, and White city shares.

So read them as exactly one thing. This is the racial composition of the people arrested at a stop in this sample, and not the racial composition of the people stopped. The distinction is no quibble. It separates a claim the file supports from a claim it does not, and the gap exists for the reason already established. Race is recorded on essentially every arrest and almost no citation, so the only population the file describes by race is the arrest population ^[1]. When the file shows 565 of these arrestees recorded as Black, it is describing who filled the arrest records, and nothing more. The stop population that produced these arrests is invisible here, because the citation rows that would round it out arrive almost entirely without race.

For context, and only as context, Chicago's resident population breaks into rough thirds, about a third Black, about a third Hispanic or Latino, and about a third White, with Black residents at about 29 percent of the city in the 2020 Census figures used here for reference ^[1]. Set the arrest subset against that backdrop and it skews Black, since Black residents are about 29 percent of the city but about 45 percent of this arrest subset ^[1]. We report the comparison because withholding it would be its own evasion, but it has to be read with the same discipline as the composition itself. A resident population is not a driving population, a driving population is not a stop population, and this is an arrest subset rather than any of those. The resident shares are external reference numbers, not something derived from the file, doing nothing here beyond giving the arrest composition a scale ^[1]. The skew is a feature of who got arrested and recorded, against the rough shape of the city. It is not a measured stop-level rate, and it is not a population rate.

There is a further reason the arrest subset cannot stand in for the stop population, and it compounds the recording problem rather than sitting beside it. An arrest is no neutral readout of what happened at a stop. It sits at the end of a chain of decisions. An officer decides to stop a car, then decides how to handle the encounter, then decides whether it ends in a citation, a warning, or an arrest. Each of those steps carries discretion, and the disparity literature finds discretion operating at several of them, not only at the first ^[2]^[7]. So the arrests in this file are the product of at least two layers of selection stacked on each other. Who got stopped, which the records cannot show by race, and then who, among those stopped, got carried all the way to an arrest. Reading the racial composition of arrests as if it described stops collapses both layers into one and silently treats the most heavily filtered outcome in the file as a window onto the least filtered. The arrest subset is more than a small slice of stops. It is a slice selected twice, by processes the disparity literature says are themselves shaped by race ^[2]^[6]^[7]. That is one more reason the honest label on these numbers is narrow.

Say the central caution once more, plainly, because the structure of the file pushes against it and a single statement will not hold the line. These figures describe arrests, not stops. The 4,999-row sample is too thin to carry a citywide claim and too selectively recorded to carry even a stop-level one, since the rows that would tell us who was stopped without being arrested are precisely the rows missing race ^[1]. Anyone who lifts the 45.2 percent figure out of this paragraph and reports it as the Black share of Chicago traffic stops will have stated something the data does not show. As an arrest descriptive the number is real, but it is no disparity, no rate, and no evidence about whom Chicago police choose to stop. The strong external studies already cited carry the disparity claim ^[2]^[3]. This slice describes only the arrests it is drawn from.

With the guardrails set, the subset has texture worth adding, since it sharpens the picture of who occupies these arrest records and connects to the literature without overstating anything. The records are heavily male. Across all 4,999 rows, subjects are 3,522 male and 1,477 female ^[1]. Inside the named-race arrest subset, men dominate every racial group, with 407 Black men against 158 Black women, 341 Hispanic men against 74 Hispanic women, 201 White men against 55 White women, and 10 Asian or Pacific Islander men against 5 women ^[1]. The male skew is the basic shape of the arrest records, not a marginal feature of one group or another. Most people arrested at a stop in this sample are men, and within the largest groups the male-to-female ratio runs better than two to one, and for Hispanic arrestees better than four to one ^[1].
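The ratios quoted above can be checked against the counts. The sketch below uses only the figures in the text. The dictionary layout and names are ours, chosen for illustration.

```python
# Arrest-subset counts by race and sex, as quoted in the text.
arrests_by_race_and_sex = {
    "Black": {"male": 407, "female": 158},
    "Hispanic": {"male": 341, "female": 74},
    "White": {"male": 201, "female": 55},
    "Asian or Pacific Islander": {"male": 10, "female": 5},
}

# Whole-sample sex counts, which cover all rows rather than arrests only.
whole_sample_by_sex = {"male": 3522, "female": 1477}

# Male-to-female ratio within each racial group of the arrest subset.
male_to_female = {
    group: counts["male"] / counts["female"]
    for group, counts in arrests_by_race_and_sex.items()
}

# Group totals, which should match the racial composition of the subset.
group_totals = {
    group: counts["male"] + counts["female"]
    for group, counts in arrests_by_race_and_sex.items()
}

for group, ratio in male_to_female.items():
    print(f"{group}: {ratio:.2f} men per woman among {group_totals[group]} arrests")
print(f"whole sample: {sum(whole_sample_by_sex.values())} rows with sex recorded")
```

The two dictionaries are kept separate on purpose. One describes 4,999 rows and the other describes 1,251, so they should not be divided into one another.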

A note on the sex field before it gets overread, since it carries the same hazard as the race field, only quieter. Subject sex is recorded on every row in the sample, arrests and citations alike, which is exactly what makes it safe to report across the whole file in a way race is not ^[1]. So the 3,522-to-1,477 male-to-female split, which accounts for all 4,999 rows, is a genuine whole-sample figure, not an arrest artifact ^[1]. But the per-group male and female counts above, the 407 Black men and the rest, live entirely inside the arrest subset, because they require race, and race only the arrests reliably carry ^[1]. The lesson the file keeps teaching is that two fields recorded at different completenesses cannot be crossed and read as if they described the same population. A whole-sample sex split and a race-by-sex breakdown drawn from arrests only are not the same kind of number, and the paper keeps them apart on purpose.

That pattern rhymes with the investigatory-stop literature, and the rhyme has to be named carefully so it does not become a claim it is not. Baumgartner, Epp, and Shoub describe young men of color as the population disproportionately subjected to aggressive, low-yield investigatory stops ^[7]. The arrest subset here is overwhelmingly male, and its two largest racial groups are Black and Hispanic subjects ^[1]. The shape of who appears in these arrest records is consistent with the population the investigatory-stop research identifies as most heavily policed ^[7]. We put it no harder than that. Consistency with a pattern documented elsewhere is not proof of that pattern in our file, and the missingness already detailed means our subset cannot count as independent evidence of investigatory targeting. What we can honestly say is that the people filling these arrest records are mostly men, disproportionately Black and Hispanic men, and that this is the same demographic the broader literature finds at the sharp end of discretionary stops ^[7]. The connection is offered as resonance, not as a finding.

A Window Onto a Program That Was About to Explode

Time is the one dimension the file records cleanly, and it gets the same handling as race. Report what the sample holds, label it as the sample, reach to outside sources for the citywide picture the sample cannot supply. Recorded stops per year in this file run 986 in 2012, 917 in 2013, 863 in 2014, 755 in 2015, and then 1,478 in 2016 ^[1]. The first four years drift gently down. The fifth nearly doubles the fourth. That late jump is the feature worth dwelling on, once the obvious caveat is in place.

Recorded Chicago Stops per Year in the Sample

Figure 3. Chicago police stop sample, Stanford Open Policing Project. A window onto the early part of the citywide surge the ACLU documented from about 86,000 stops in 2015 to about 490,000 in 2018, not a count of all stops.

These are sample volumes, not citywide totals. The file is a 4,999-row sample of the historical Chicago stop record, so a count of 1,478 stops in 2016 is a count of how many 2016 stops landed in this particular sample, not how many stops Chicago police actually made that year ^[1]. Nothing in the per-year series should be read as a measurement of police activity across the city. The shape of the series within the sample is real, but its absolute level is an artifact of sampling, and the year-to-year moves are moves in the sample's composition by year, which may or may not track the real citywide trend. The flag matters, because a line that climbs steeply invites a reader to narrate a surge, and the surge, if there was one, has to come from sources built to measure the whole city.
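The within-sample series is short enough to check by hand. The sketch below does so with the per-year counts quoted in the text. The names are ours, and the ratios describe the sample's composition by year, not citywide activity.

```python
# Recorded stops per year in the sample, as quoted in the text.
stops_by_year = {2012: 986, 2013: 917, 2014: 863, 2015: 755, 2016: 1478}

sample_total = sum(stops_by_year.values())

# Ratio of each year's sample count to the previous year's.
years = sorted(stops_by_year)
year_over_year = {
    year: stops_by_year[year] / stops_by_year[year - 1]
    for year in years[1:]
}

print(f"rows across all years: {sample_total}")
for year, ratio in year_over_year.items():
    print(f"{year} vs {year - 1}: {ratio:.2f}x")
```

The five counts add to the full 4,999 rows, the ratios for 2013 through 2015 all fall below one, and the 2016 ratio is about 1.96, which is the near-doubling described in the text.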

There was a surge, and the sources that measured it are external to this file. To read it, a little institutional history helps, all of it on the public record. In the years our sample covers, Chicago's most scrutinized stop practice was pedestrian stop-and-frisk. In 2015 the American Civil Liberties Union of Illinois reported that the Chicago Police Department had made more than 250,000 pedestrian stops that did not lead to an arrest in a single summer, a per-capita rate that outran New York City at the peak of its own stop-and-frisk program, and that those stops fell disproportionately on the Black community ^[8]. That report led to a settlement under which the department agreed to record far more detail about each stop, on a form known as the Investigatory Stop Report, with an outside evaluator reviewing the data. One documented consequence of that agreement was that pedestrian stops dropped steeply while traffic stops climbed, as enforcement shifted from stops on foot toward stops of cars, where the new reporting requirements bit less hard.

The traffic figures capture that shift. Chicago traffic stops rose from about 86,000 in 2015 to about 490,000 in 2018, a nearly sixfold increase in three years ^[8]^[10]. On the traffic ramp specifically, a later ACLU analysis found Chicago stops more than tripled between 2015 and 2017 ^[9]. And the 2018 total came with a racial breakdown the city's own scale makes stark, the large majority of those roughly 490,000 stops falling on minority drivers and more than 300,000 on Black drivers in particular ^[10]. The later ACLU work also looked at what the searches during these stops actually turned up, and found that officers recovered contraband in a smaller share of their searches of vehicles driven by Black and Latinx drivers than of their searches of vehicles driven by White drivers, the same outcome-test pattern the national analysis found, now inside Chicago ^[9]^[10]. These are the numbers that describe the program. They belong to the ACLU, not to us, and not to this file ^[8]^[9]^[10].

Now the late jump in the sample can be placed against that backdrop, carefully. The within-sample rise from 755 recorded stops in 2015 to 1,478 in 2016 sits at the leading edge of the documented citywide growth ^[1]^[8]. It is the earliest part of the curve the ACLU traced from about 86,000 stops in 2015 toward about 490,000 in 2018 ^[8]^[10]. We do not claim the sample measures that growth, because a sample cannot, and the magnitudes are not comparable. Our 2016 count is in the low thousands while the citywide figures run into the hundreds of thousands. The narrower claim is defensible. The sample is a window that happens to open at the moment the program began to expand, and the near-doubling visible inside it is consistent with the front of a citywide surge that external sources measured properly ^[1]^[8]^[10]. That window covers only a few years, and it closes before the surge crested.

The early close is itself a limit worth stating directly. The sample years span 2012 through 2016 only ^[1]. The ACLU surge ran past the sample's last year, peaking around 2018 near 490,000 stops ^[10]. So the sample cannot show the full ramp even in miniature, because the file ends before the steepest part of the climb. Picture the per-year line as the first segment of a much longer curve whose later, dramatic rise is documented elsewhere and is not present in this file at all ^[1]^[8]^[9]^[10]. To narrate the whole surge from the sample would be to project a trajectory the file never recorded. We can point at where the sample sits on the curve. We cannot draw the rest of the curve from it. A reader who had only this sample, and who never reached for the external accounting, might conclude that Chicago stops were flat to declining with one odd uptick at the end. The fuller record shows the uptick was the opening of a sustained escalation. The sample reports its own years accurately enough. It is simply too short and too thin to see what came next, which is why the honest reading pairs the file's last data point with the documented trajectory it was only starting to trace.

The temporal framing feeds back into the missingness that anchors everything. A program that expands nearly sixfold in three years generates an enormous volume of new records, and the question of what those records capture grows correspondingly more consequential ^[8]^[10]. If race is recorded on arrests and dropped on citations, as the sample shows, then a surge dominated by non-arrest stops, by citations and stops that end without an arrest, is a surge in records mostly silent on race ^[1]. The bigger the program, the larger the body of stop records that cannot, on their own, answer the disparity question, and the more the external benchmarking studies have to carry the weight ^[2]^[3]. The sample's small window onto the start of the surge is, in that sense, a window onto the start of a documentation problem at scale. The number of stops was about to climb steeply ^[8]^[10]. The share of those stops whose race was being recorded, if our sample is any guide to the practice, was not ^[1]. The per-year counts are sample volumes, the surge figures are the ACLU's, the sample ends in 2016 and cannot show the full expansion, and no causal claim is made about why stops rose, since that would take evidence about policy, deployment, and enforcement priorities the file does not hold. What stays is a placement in time and a connection, that the small sample opens at the leading edge of a documented, fast-growing program, and that the growth makes the file's silences matter more, not less. Whether those silences fall hardest on the stops that look minor is a question the violation field can answer, and the violations are where the discretionary character of the whole program comes into view.

The Small Infractions That Open the Door

What were these stops for? The file answers in its own administrative vocabulary, and the answer paints the encounters as overwhelmingly low-level and officer-initiated. The most common recorded violations in the sample are stop at a stop sign with 521, driving on a suspended license with 506, the two-headlight equipment requirement with 332, driving while using a cell phone with 215, disobeying a red signal with 179, no valid registration with 169, driving while never issued a license with 135, and driving 21 to 25 miles per hour over the limit with 121 ^[1]. Read down that list and a character emerges. For the most part these are minor matters rather than the high-speed, high-danger violations a reader pictures at the words traffic stop. The rolled stop. The burned-out headlight. The lapsed paperwork. The phone in the hand.

Most Common Recorded Stop Violations in the Sample

Figure 4. Chicago police stop sample, Stanford Open Policing Project. The top reasons are low-level and discretionary, matching the investigatory-stop pattern in the Suspect Citizens literature.

Look at what sits high and what sits low, and group the entries by what kind of thing they are. A rolled stop sign and a suspended-license check lead the file, with more than a thousand stops between them ^[1]. Several of the top reasons are equipment and document problems rather than driving behavior at all. The two-headlight requirement at 332, no valid registration at 169, and driving while never issued a license at 135 describe the condition of the car or the status of the paperwork, not how the vehicle was being driven in the moment ^[1]. A burned-out headlight is a state a car can sit in for weeks. An expired registration is a date on a sticker. These are the kinds of violation an officer can find on a parked car, which is to say they require almost nothing of the driver's conduct and a great deal of the officer's choice about which cars to scrutinize. The two driving-conduct entries near the top, the rolled stop sign at 521 and the disobeyed red signal at 179, are real moving violations, but they are also among the most common and most discretionary calls an officer makes, since how complete a stop has to be before it counts as a stop is exactly the kind of judgment that leaves room for choice ^[1].

Meanwhile the one entry that plainly marks hazardous driving, traveling 21 to 25 miles per hour over the posted limit, sits at the bottom of the top group with 121 stops, below the rolled stop sign, the suspended license, the headlight, the cell phone, the red signal, the registration check, and the never-issued license ^[1]. The pattern in the sample is that the most common reasons to be stopped are minor and discretionary, while substantial speeding, the conduct most directly tied to crash risk, is recorded less often than equipment and paperwork problems ^[1]. We describe this as the shape of what got written down, since that is exactly what a violation tabulation reports, the reasons entered into the records, not the universe of conduct on the road. The file cannot tell us how often each of these infractions actually occurred on Chicago streets, only how often each was the reason an officer wrote down for a stop, and the distance between those two things is itself a measure of discretion.
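The ordering and grouping described here can be laid out explicitly. The sketch below uses the eight counts quoted in the text. The short labels and the four-way grouping are our own illustrative assumptions, not fields in the file, and other groupings are defensible, especially for cell-phone use.

```python
# Top recorded violations in the sample, as quoted in the text.
violation_counts = {
    "stop at stop sign": 521,
    "driving on suspended license": 506,
    "two-headlight requirement": 332,
    "using a cell phone": 215,
    "disobeyed red signal": 179,
    "no valid registration": 169,
    "never issued a license": 135,
    "speeding 21-25 mph over": 121,
}

# Assumed grouping, for illustration only.
violation_kind = {
    "stop at stop sign": "sign or signal",
    "disobeyed red signal": "sign or signal",
    "driving on suspended license": "status, document, or equipment",
    "two-headlight requirement": "status, document, or equipment",
    "no valid registration": "status, document, or equipment",
    "never issued a license": "status, document, or equipment",
    "using a cell phone": "phone use",
    "speeding 21-25 mph over": "substantial speeding",
}

# Rank the recorded reasons from most to least common.
ranked = sorted(violation_counts.items(), key=lambda item: item[1], reverse=True)

# Total the counts within each assumed group.
counts_by_kind = {}
for violation, count in violation_counts.items():
    kind = violation_kind[violation]
    counts_by_kind[kind] = counts_by_kind.get(kind, 0) + count

for violation, count in ranked:
    print(f"{count:>4}  {violation}  [{violation_kind[violation]}]")
print(counts_by_kind)
```

Under this grouping, status, document, and equipment reasons account for 1,142 of the 2,178 stops in the top eight, against 121 for substantial speeding. The totals shift if the grouping changes, which is why it is labeled an assumption.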

This texture connects directly to the investigatory-stop literature, and the connection is the analytical reason the section exists. Baumgartner, Epp, and Shoub draw a line between safety stops, which address a real hazard, and investigatory stops, which use a minor, citable infraction as a lawful pretext to start an encounter aimed at investigating the driver rather than at traffic safety ^[7]. The investigatory stop runs on exactly the kind of low-level violation that dominates our file. A broken headlight, an expired registration, a stop sign taken a hair too lightly, each hands an officer a defensible reason to pull a car over, and each can be found on a great many vehicles by an officer who is looking for a reason ^[7]. The Suspect Citizens account is that these pretext-eligible infractions are the everyday machinery through which traffic enforcement becomes racialized in its targeting, because the discretion to pick which of the countless minor violations on the road to act on is where bias can enter ^[7]. Our violation tabulation shows a file made largely of precisely those pretext-eligible infractions ^[1]^[7].

The framing gets attributed fully, and the conclusion stays out. The distinction between safety and investigatory stops, and the finding that investigatory stops fall disproportionately on young men of color, belong to Baumgartner, Epp, and Shoub and rest on their roughly 20 million stops, not on ours ^[7]. What our file contributes is descriptive corroboration of one premise of that framework, that the stops in question are dominated by low-level, discretionary violations, which is what makes them pretext-eligible in the first place ^[1]. We are not claiming any particular stop in the sample was a pretext, that officers used these violations as cover, or that the violation pattern proves racialized targeting. Establishing intent or pretext would take evidence the file does not hold, and the missingness already discussed means we could not even pair these violations with race across most of the file ^[1]. The honest statement is that the recorded reasons are the kind of minor infraction the investigatory-stop literature identifies as the mechanism of discretionary policing, and that the file's composition is consistent with a high-discretion enforcement pattern ^[1]^[7].

The suspended-license entry deserves a second look, since it sits second on the list with 506 stops and is not a driving behavior at all ^[1]. A driver's license status is not visible from outside a moving car. An officer generally learns it by running a plate or a name, which means the stop frequently comes first and the suspended-license discovery second, or the stop is initiated against a vehicle whose registered owner is already flagged. Either way, the violation describes the driver's standing with the state rather than anything dangerous happening on the road in that moment. The investigatory-stop literature treats exactly this kind of status-and-paperwork enforcement as characteristic of stops aimed at contact and investigation rather than at safety ^[7]. We do not claim the suspended-license stops in our sample were investigatory, since the file cannot show an officer's reason for initiating any particular stop. The point is narrower and still worth making. A file whose second most common stop reason is a license-status check, rather than a moving hazard, is a file describing a program tilted toward enforcement that depends heavily on officer-initiated checks ^[1]^[7].

Where speeding sits in the list tells its own small story. Driving 21 to 25 miles per hour over the limit, a genuinely hazardous act, appears 121 times, near the bottom of the top reasons and well under the stop-sign and suspended-license counts ^[1]. The high-volume reasons are the ones most available to an officer's discretion rather than the ones most tightly tied to crash risk. That ordering fits a regime in which the stop is often a tool for contact rather than a response to danger, the regime the literature describes ^[7], though the file alone cannot establish intent and we do not read intent into it.

The discretionary character of these stops ties the section back to the heart of the paper, and it is the right place to end. The high-discretion stop is exactly the category where the missing-race problem bites hardest. A stop premised on a serious, unambiguous hazard leaves little room for an officer's choice about whom to stop. A stop premised on one of the countless minor infractions visible on ordinary cars leaves a great deal of room ^[7]. When the stops are dominated by these low-level reasons, the question of which drivers an officer chose to act on becomes central, and that question can only be answered with complete, stop-level race data ^[4]. But complete race data is exactly what the file lacks, since race is recorded on arrests and dropped on citations, and citations are where most of these minor-violation stops would land ^[1]. So the file presents the worst combination for honest analysis, a body of highly discretionary stops whose racial pattern is the thing that matters and the thing the records leave blank ^[1]^[4]^[7]. The violations show a high-discretion program. The missingness ensures the program's most important variable goes largely unrecorded. The two findings are one finding seen from two sides, and together they are why this paper claims so little from the file directly and leans so heavily on the literature that has the data to claim more.

Where the File Goes Quiet

Honesty about a dataset means gathering its limits in one place and stating them without softening. This file has several, and they compound.

Begin with geography, since a map was the original intention. Only 3,753 of the 4,999 rows carry usable coordinates, 75.1 percent, which leaves 1,246 rows, 24.9 percent of the sample, with no latitude and longitude at all ^[1]. A quarter of the sample is geographically invisible. That gap is its own small instance of the paper's theme. If the rows without coordinates are not a random quarter of the file, and there is no reason inside the file to assume they are, then any map drawn from the located three-quarters carries an unknown selection just as the race analysis does.
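The coverage statistic is a single division, shown here so the rounding is visible. The sketch uses only the two counts quoted in the text, and the variable names are ours.

```python
# Row counts as quoted in the text.
total_rows = 4999
rows_with_coordinates = 3753

rows_without_coordinates = total_rows - rows_with_coordinates

# Percent of the sample that can and cannot be placed on a map.
located_percent = round(100 * rows_with_coordinates / total_rows, 1)
unlocated_percent = round(100 * rows_without_coordinates / total_rows, 1)

print(f"located:   {rows_with_coordinates} rows ({located_percent} percent)")
print(f"unlocated: {rows_without_coordinates} rows ({unlocated_percent} percent)")
```

The arithmetic says how much of the sample is unlocated. It says nothing about which rows those are, which is the question that decides whether a map could be trusted.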

Even the rows that do carry coordinates resist the analysis first planned. The file ships raw latitude and longitude with no community-area field, so turning the dots into a statement about Chicago's seventy-seven community areas would require an external spatial join, overlaying the points on a community-area boundary file and assigning each stop to the area it falls in, which is a real analytic step we did not perform and did not want to perform silently. The planned community-area analysis was therefore reduced to a coverage statistic, and the paper declines to draw a map. The external literature is clear that Chicago stops concentrate on the South and West sides, both in the 2024 road-user study, which located the disparity on specific roads ^[3], and in the reporting on unlogged stops, which found the unreported stops clustered in those same parts of the city ^[11]. We could almost certainly produce a map from our located points that looked consistent with that finding. We will not, because a map made from three-quarters of a sample, un-joined and unweighted, would carry the visual authority of a settled spatial result while resting on none of the work that would justify it. Showing the coverage number instead of the map is the honest version of the same information. It tells the reader exactly how much of the sample can be placed and refuses to pretend the located portion stands in for the whole.
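For readers who want to see what the omitted step involves, the sketch below shows the core of a point-in-polygon assignment using a standard ray-casting test. It is an illustration of the technique, not an analysis we ran. The two rectangular areas, their names, the coordinates, and the `lng` and `lat` keys are all hypothetical, and the test treats longitude and latitude as flat plane coordinates. A real join would need an actual community-area boundary file and would more likely use an established GIS library.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside `polygon`.

    `polygon` is a list of (lon, lat) vertices in order. Points lying
    exactly on an edge are not handled specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_area(lon, lat, areas):
    """Return the name of the first area containing the point, else None."""
    for name, polygon in areas.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical rectangular areas, not real community-area boundaries.
toy_areas = {
    "area_west": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "area_east": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

# Hypothetical stops, including one outside every area and one with no coordinates.
toy_stops = [
    {"lng": 0.5, "lat": 0.5},
    {"lng": 1.5, "lat": 0.25},
    {"lng": 3.0, "lat": 0.5},
    {"lng": None, "lat": None},
]

assigned = []
for stop in toy_stops:
    if stop["lng"] is None or stop["lat"] is None:
        assigned.append(None)  # unlocated rows cannot be assigned at all
    else:
        assigned.append(assign_area(stop["lng"], stop["lat"], toy_areas))

print(assigned)
```

Even in the toy, two of the four stops end up with no area, one because it falls outside every boundary and one because it has no coordinates. Any real version of this join would have to report that unassigned share alongside the map.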

The race missingness, already the spine of the analysis, belongs in this reckoning restated plainly. Subject race is absent from 73.3 percent of rows and present for 100 percent of arrests against 1.8 percent of citations, so the absence is not random but tied to outcome ^[1]. That single fact is why no stop-level racial rate appears anywhere in this paper, and why the one composition we do report is labeled an arrest descriptive every time it is mentioned.

The rest of the limits follow from the kind of object this is. It is a 4,999-row sample drawn from a much larger file, so every count here is a sample volume, not a citywide total, and the per-year and per-violation tabulations carry the sampling's fingerprints. The arrest composition of 565 Black, 415 Hispanic, 256 White, and 15 Asian or Pacific Islander people describes who was arrested in this slice, not who was stopped and not the city's population ^[1]. The surge figures that frame the year counts, the rise from about 86,000 stops in 2015 to about 490,000 in 2018, are the ACLU's, drawn from the full record, not anything computed from this file ^[8]^[10]. The rough even-thirds resident shares used as a reference point come from Census context, not from these rows ^[1]. And nowhere in this paper is there a causal claim. We report patterns in what was recorded. We do not assert that race caused any stop, any citation, or any arrest, because the file cannot support that and was never built to.

All of it sits inside an institutional backdrop the file gestures at but does not itself prove. In January 2017, after a thirteen-month pattern-or-practice investigation opened in the wake of the release of the Laquan McDonald video, the United States Department of Justice issued its findings on the Chicago Police Department ^[12]. The report concluded that the department had engaged in a pattern or practice of using force, including deadly force, in violation of the Constitution, and it tied that pattern to systemic deficiencies in training, supervision, and accountability. On the question this paper circles, the Justice Department found that the burden of these practices fell disproportionately on Black and Latino residents and neighborhoods, and it documented data systems and review processes too weak to catch or correct the disparities, which is the institutional cousin of the recording problem our sample shows ^[12]. Those findings became part of the legal foundation for the federal consent decree that now governs the department and for the stop reforms that followed. The finding rests on a far broader evidentiary record than one sampled spreadsheet, including interviews, ride-alongs, and the department's own files. We note it as the context these records live inside, not as something our analysis demonstrates. The file is one small artifact produced by the institution the Department of Justice examined. It is evidence of how that institution recorded its stops. It is not, by itself, proof of the pattern the federal investigation established.

What Honest Records Would Have to Show

The silence in this file is not a dead end. Read correctly, it is an argument about transparency, and that is where this paper lands rather than on a verdict about Chicago's drivers.

The thesis has held steady. This sample is too thin and too selectively recorded to prove a stop-level disparity on its own, so the contribution was never to extract one. It was to pair the strong external literature with a transparent audit of how the records mask the bias they are used to measure. Each piece does work the others cannot. The national Open Policing analysis and the Chicago road-user study establish that the disparity is real, large, and tied to officer discretion rather than to who is actually on the road ^[2]^[3]. Knox, Lowe, and Mummolo explain, in formal terms, why a file like this one cannot establish that disparity by itself, because the recording is conditioned on the behavior under study ^[4]. The reporting on unlogged stops shows the gap is structural rather than a quirk of our particular extract, since roughly a third of stops never enter the official record at all ^[11]. Our own audit then exhibits the masking firsthand, in the city's public data, with race recorded on every arrest and on almost no citations. Together, the outside evidence supplies the conclusion, and our analysis supplies the demonstration of why the conclusion could not be drawn safely from records built like these.

The remedy is implied by the exact shape of what is missing, and naming it precisely is more useful than a general call for better data. The gap that does the most damage is the race field, so the first fix is the obvious one. A record that could support an honest disparity test would carry subject race on every outcome, citations as fully as arrests, instead of letting the column vanish for the most common result. Coordinates are the next gap. They would need to be present on every stop, so that no quarter of the data falls off the map and a spatial analysis does not inherit a hidden selection. The hardest gap to close is the one the records cannot see at all, the stops that produce neither a citation nor an arrest and, under the unlogged-stop reporting, currently leave no trace, the very encounters a fair denominator would have to count ^[11]. None of this is a technical leap. Police already record race on arrests, already capture coordinates on most stops, and already, under the Statistical Study Act and the consent decree, operate under a legal duty to document stops. The fix is completeness, applied evenly to the fields bias would move through, rather than completeness that switches on only when a stop ends in arrest.

Completeness alone would not settle the disparity question, to be clear. It would only make the question answerable from the records. Even a file with race on every row would still need a denominator, the road-user population the 2024 PNAS study had to estimate from GPS data ^[3], and would still face the deeper Knox, Lowe, and Mummolo problem that a stop record begins only after a stop has been chosen ^[4]. Better recording removes the particular obstacle this paper documents. It does not remove the structural one. What it does is shift the work from arguing about whether the data can be read at all to doing the harder and more interesting analysis the better data would permit.

For a reader who wants to go further than our sample allows, the road is open. The full Stanford file holds 2,108,098 Chicago stops spanning 2012 to 2020, and it, along with the reproducible code behind every figure here, is available from the data page for this paper ^[1]. A race-complete extract, or a larger and spatially joined analysis, can be built from that fuller record by anyone willing to do the work. We chose not to, deliberately, so that this paper could do one thing well, which is to show what a thin and selectively recorded sample can and cannot honestly say. The file in front of us answers a narrower question than it appears to, and the most useful thing we can do is say so, mark the slice we can describe, and decline the rest. The disparity is documented elsewhere, on stronger evidence than we hold. Our task was the smaller and more exacting one, to read the silence in the records without filling it in, and to claim less rather than more.

Citations

12 sources cited.
