Modern Crime Analysis: For @#$&'s Sake, Just Give Me the Data

With a full set of RMS data tables linked to my SQL-based querying application, there's no question I can't answer. Without it, there are few questions that I can answer.

Imagine being hired as a head chef for a new restaurant. You show up for your first day of work and find the kitchen full of pots and pans but no ingredients. When you ask the facilities manager where the food is, he tells you that it's all locked in a separate room, but if you tell him what you need, one ingredient at a time, he'll bring them out to you when he gets around to it. Or imagine you're hired to play the piano in a hotel lobby, only when you arrive for your first shift, you find that you're not actually allowed to touch the piano's 88 keys. Instead, you have to push six buttons, each of which presses a predetermined combination of keys. With this, you somehow have to create music and respond to visitors' requests for songs.

These precise scenarios affect crime analysts everywhere, daily, and they have for about as long as I can remember. I have consulted with police agencies for over 20 years, and by far the top issue that I have encountered when evaluating crime analysis units is the simple inability for analysts to access the agency's data at an appropriate level. Everywhere I go, I find analysts fighting with their IT staffs and records management system (RMS) vendors, making do with half-measures, and hand-entering data that already exists in databases into their own, separate, data systems because they can't get at the official data. These analysts often find no support from their agency's executives, who simply accept pathetic excuses from IT directors and RMS representatives for why what the analyst wants cannot be done. The advent of "cloud-based" systems has added a new level of difficulty, as the nature of such systems precludes some of the traditional mechanisms for connecting to data, and many cloud-based vendors haven't bothered to create alternative solutions.

This situation is absurd. It needs to stop.

Let me be clear: For crime analysts to do their jobs, they need direct, timely access to the agency's call-for-service (CAD) and crime (RMS) data in its original relational format. The only asterisk I would put after this statement is that if the CAD or RMS needlessly stores data as a bunch of indecipherable codes, then analysts could use access to replications of those original tables with the codes translated. (But if such views don't exist, just give analysts the raw data. They'll figure it out.) Otherwise, any attempt by the IT staff or RMS vendor to provide special reporting tools, views, or custom reports that only access some of the data, and only in certain formats, will retard rather than enhance analysts' abilities to do their jobs.

ODBC has been around for 30 years. Stop saying you don't support it.

Any of the following solutions are acceptable:

A direct connection to the RMS and CAD databases.
A direct connection to a replication of the RMS and CAD databases, updated at least once a day, with all the substantive data tables.
An export of all substantive data tables to a series of delimited or fixed-width text files, updated at least once a day.

Solutions that are not acceptable including anything that exists wholly within the RMS application. I need to be able to write complex queries across tables, synthesize data from multiple sources, and extract data to a variety of technologies. No RMS system, no matter how advanced, will ever do everything I need from within the system.

The number one question that any agency with a crime analyst needs to ask of its current or potential RMS vendor is: which of these three solutions do you support? If the answer is "none," leave them sputtering about their special customized crime analysis dashboard and walk away.

Frankly, this should be all I have to say. It should be all any analyst has to say. Analysts are presumably hired for their expertise, and nothing is more annoying than being told, "No, you don't really need that" by people who don't have to do this work. But since that approach clearly doesn't work, I'll spend the rest of this article explaining why analysts know what we're talking about.

I need to be able to answer complex questions.

It's great that your records management system's reporting tools let me count how many crimes we had last quarter, or find a list of all incidents involving a particular person or address. That allows the chief and everyone else in my agency to answer simple questions without having to bother me. But as an analyst, I need to answer far more complex questions--questions that involve both filtering and aggregation, and that involve linking multiple tables, sometimes across multiple databases.

For instance, an agency I recently worked with had a shoplifting problem involving teenagers. To help analyze it, I wanted to know what the teens were stealing. Specifically, I wanted a count of each type of item stolen in the last three years for all incidents in which the offense was coded as shoplifting, the person's involvement was coded as "suspect" or "arrestee," and the person's age was less than 19.

Any SQL-based querying tool capable of handling relational data can answer questions across multiple tables, but few records management systems offer such capabilities in their "reporting" screens.

Such a query is a cinch using SQL or any number of applications that write SQL for you, but most records management systems simply would not support a query that crosses multiple tables this way, and even if they did, they would not be able to present the result in aggregated form.

Let me throw another wrench into it: Technically, the query I designed in the screenshot above is going to show a bit of misinformation. If more than one teenager was suspected or arrested in the same incident, the stolen property type will be duplicated for as many teens were involved, regardless of how many items of property were actually stolen. To control for this, I need to first design a query that shows me which incidents of shoplifting involved teens, but (in SQL parlance) GROUP BY the case number so that regardless of how many teens were involved, each case number shows up only once. I then need to link this query into a second query asking what items of property were stolen in those incidents. Have you ever seen a records management system that allows you to ask one question, save the answer, and then use it as a filter in a second question? (Or, in database parlance, build a query on top of a query?) If so, please leave me a comment and let me know what system it is. It still won't obviate the need for the analyst to have direct access to the raw data, but I'll be impressed.

And even when RMSes allow for complex querying, they often do not offer flexible means of reporting the results, including aggregation.

Records systems vary significantly in their querying capabilities, but in my experience, few of them allow the user to:

Query multiple tables at once; for instance, I need a list of all white males currently in their 40s, over 6 feet tall, who we've ever suspected or arrested for residential burglary.

Allow the user to select what fields appear in the results windows; for instance, in the results screen for the search above, I want to see the burglary modus operandi factors that we track, such as the point and means of entry.

Allow the user to aggregate or crosstabulate the results; for instance, I need a crosstabulation by sex and age of everyone we've ever arrested for drunk driving.

Allow the user to create data ("expressions") in the process of querying (or at all); for instance, I need to create a field called "quarter" that extracts the quarter of the year from the date, then count the number of offenses by that field.

This is why the best vendors don't even try. They might offer a few basic internal searches, but for anything complex, they suggest the user use an external SQL-based reporting tool like Crystal Reports. That's fine with me, as long as you don't mind if I use the same connection to access the data with other technologies.

But if you still want to insist that your RMS allows searches of this complexity internally, that's still not enough because:

I need to be able to synthesize data.

Not all the questions I need to ask can be answered by the CAD and RMS data. Very often, I need to be able to ask questions that cross a variety of tables obtained from different places. Let's look at some examples:

I've obtained a list of all property pawned in my jurisdiction. I want to see if any items of stolen property in the RMS match the serial numbers of the pawned property. I also want to know if any of the pawners are known burglary or theft offenders.

The state offers a website where I can download all the individuals with active warrants. To help prioritize the service of those warrants, I want to identify which of those people on that list are on my list of "Top 50 Offenders," as determined by a separate query of the RMS data.

My local prosecutor's office sends me a monthly text file containing adjudications for the month. I want to link this data with the original crime data to run a variety of reports.

Here's one I had to do just recently: To help evaluate an agency's drunk driving prevention efforts, I needed to know the number of drunk driving crashes the agency had experienced each year for the past 10 years. Unfortunately, the "alcohol-involved" field in the crash database was unreliable. To get a rough estimate, I had to determine the number of incidents that started as CAD calls for traffic accidents and resulted in arrests or summonses for drunk driving. This required me to join tables from the CAD system to tables in the RMS system, something rarely possible even when the two systems are provided by the same vendor.

Even more common, I need to be able to synthesize CAD and RMS data with data that I track myself. It's rare that the CAD and RMS provide all the data that an analyst needs, even if the analyst's question is limited to calls for service and crimes. Many analysts, for instance, code their own modus operandi variables in their own databases. To perform the type of "Top Offender" query described above, I need to be able to assign weights to various offense types. I may have lists of addresses for which I need to run regular queries for administrative reports (all crimes at hotels, all crimes at public housing). If your RMS doesn't automatically assign coordinates to addresses, I need to be able to join addresses to an external "coordinate library." Does your RMS let me create and manage additional tables that I can then join to the standard tables? If so, wow. But it still isn't enough.

I need to be able to use data with a variety of technologies.

To answer the types of questions I've already offered, I want to be able to connect to the CAD and RMS data with some kind of SQL-based querying tool (I'm a big fan of Microsoft Access). But even if the RMS offered full SQL capabilities in its internal reporting tools, I'd still need to occasionally get the data out of the system. Every once in a while, someone comes along with a mapping, reporting, or statistical tool that does some unique and amazing stuff with data, and I need to be able to feed my data into it. Some examples:

CrimeStat, a free application that calculates a variety of spatial statistics, many of which I've never seen in any commercial application
Jerry Ratcliffe's Near-Repeat Calculator
Joel Caplan's Risk Terrain Modeling software
My own threshold database and top offender modeling database (links not offered because I'm in the process of updating them)

If you want to use this database to identify and manage intelligence on repeat offenders, you have to be able to get your data into it.

Of course, the most common example will be a simple GIS application for mapping. To identify hot spots, determine spatial correlations, and perform other GIS tasks, I have to be able to get the data out of the RMS and into the GIS. No, your internal crime mapping system isn't sufficient.

Keep in mind that being able to extract the data for use in these technologies isn't as simple as an "export" button that sends some of the most common fields to a flat file. I need to use the external applications with all of the querying complexity described above. I may need to map all juvenile burglaries with televisions stolen. I may need to map all crimes committed by individuals on my "Top Offender" list. Simple exports don't do the job--only access to the full dataset in its original relational format does the job.

And speaking of the need to copy data out of the original database . . .

I need to change data.

I hope this doesn't come as any surprise, but your data sucks (another topic to be explored in more detail another day). Addresses are wrong. Time fields aren't filled in. Crimes are mis-coded. The same people appear multiple times in the master name index. Out of laziness or impatience or insufficient choices in the libraries, officers have coded "Other" or "Miscellaneous" for substantial percentages of your location types and property types. I can fix a lot of these errors with update queries, but I suspect you're not keen on my running those queries on the production database. This is another reason that I need to be able to extract the data. All of it.

I hope I've made my case, but let's deal with some . . .

Common Objections

"Our data structures are proprietary." Oh, get over yourself. Tables, fields, and the relationships between them are boring. There are ways to make them more efficient, sure, but nothing that isn't taught in any Databases 101 course. If your RMS has any value, it's in the application that works with the data, not in the structure of the tables. I've inspected the structures of dozens of records management systems and have never said, "Ooh, they put an index on the 'DOB' field. I shall steal this trade secret and make millions."

"Analysts could change or delete records through the ODBC connection." Sure, if you set it up as a two-way connection. Don't do it that way. And if you're really concerned about it, set up a replication database and have the analysts connect to that.

"Analysts need to be restricted in what data they can access." There are some legitimate concerns here, such as if the RMS is regional and the participating agencies don't want analysts from other agencies to have open access to all their data. But this is easily overcome with user permissions or by setting up a replication database that only replicates what the analyst is allowed to access.

"Open access puts a strain on the performance of the system." This flies in the face of the experience of many large agencies where analysts are running frequent queries with millions of records. But again, a replication database solves this problem.

"There are hundreds of tables in the database. It's too complex for the analysts to figure out." Give them a chance. They're not stupid. (If they are, fix your hiring practices.) Sure, it will take time, but it's time worth spending. Analysts have access to peer support and a variety of free technical support resources to help them.

Let me offer a universal counter to all objections: This is done all the time. Yes, I have couched this issue (analysts not having access to their data) as a quite common and age-old problem. It probably affects at least 50% of full-time crime analysts. But the other 50% are working for agencies in which these objections have been overcome, or were never raised in the first place. There's nothing so unique about your agency or your RMS that you can't figure it out with a little effort.

If anyone has additional objections or new arguments for any of the above, please offer them in the comments, and I'll respond to them individually.

While I'm at it . . .

IT directors, it's your job to support the industry standard software that analysts need to do their jobs, not dictate what software they're allowed to use. I'll offer a separate entry on why Microsoft Access is an ideal solution for a lot of crime analysis tasks, but for now it would just be nice if you'd stop deleting it from your analysts' computers. It's practically free, it has more capabilities than you think, analysts use it successfully even at very large agencies, and, no, Microsoft is not "moving away from it." Your prejudice against it is, and always has been, irrational.

In Summary:

Records management vendors: Have a solution ready for your clients. Don't make them beg and threaten. Public safety is at stake. Whatever solutions you think you offer short of direct access to the full dataset (or a replica thereof), I promise you, it's insufficient. If I didn't provide enough examples here, I can provide more. It's not your fault. No vendor could possibly anticipate all the ways an analyst might need to work with data.

IT staff: Try to remember that your role is to support your agency's needs, not dictate those needs. Stop blocking analysts from the material they need to do their jobs. Public safety is at stake.

Police executives: Support your analysts' efforts to get satisfaction from your RMS vendor and your IT staff. Demand that any prospective RMS vendor tell you how analysts will be able to access the data. Public safety is at stake.

Analysts: Never stop fighting. And make sure you're working on your data skills so that when you do have access to your full dataset, you actually know what to do with it. Public safety is at stake.

Feel free to leave comments with your experiences getting direct access to data from specific systems and vendors.

6 comments:

AnonymousSeptember 15, 2022 at 1:18 PM
Excellent points. I fought some of these battles as a commander. Proprietary software, IIT and RMS managers putting up Road blocks to data.
DPSeptember 15, 2022 at 4:46 PM
I'm curious, Chris, have you yourself ever faced this problem, or is the "I" in your entry a hypothetical generic analyst?
Brett RaileySeptember 15, 2022 at 8:35 PM
Christopher, thank you for this posting and for thoroughly walking through each point. Thanks to working with you, Deb and Jim over the years I actually understood the post!

You may remember that your comments regarding CAD/RMS vendors not allowing direct access to the data is exactly what led me to change vendors after being "held hostage" by the same vendor for years.

Keep spreading the word and I promise to keep doing my part to convince other executives to open their eyes/minds and assist their analyst.

I welcome all comments on the subjects I discuss and do not moderate them for content.

Pages

Thursday, September 15, 2022

For @#$&'s Sake, Just Give Me the Data

6 comments: