Data Mining: Safety in numbers

Some smart fleet owners are mining data on drivers and their operations to identify risks and prevent accidents.

In the early 1990s, a largely unknown insurance company made some insightful and profitable discoveries. Looking for patterns in data it obtained about customers, Progressive Insurance found that the size of a customer’s motorcycle engine, his occupation and his credit score all were reasonable predictors of accident claims.

The insurer used these insights in establishing premiums, and it was innovations like this that helped Progressive grow to become one of the largest insurers today, says Richard Valeminsky, chief technology officer of Valen Technologies, a firm with expertise in predictive analytics for the insurance industry.

Progressive was a pioneer in the commercial application of data mining – the term encompassing various methods for unearthing patterns and valuable information that otherwise aren’t obvious. Data mining gained some notoriety recently when it was reported that the National Security Agency had used the phone records of tens of millions of Americans to root out potential terrorists. This controversy aside, data mining in the private and public sector is a great tool for empowering continual improvement.

A highly active enterprise like a trucking fleet generates oodles of data, but few trucking executives seem to leverage all the information they gather in the normal course of business.

“There is a lack of data mining in the industry,” says Jeff St. Pierre, vice president of risk management and driver services for Panther Expedited Services, a 1,500-truck carrier based in Seville, Ohio. “Most people focus on the end result and try to manage the end result, but do not focus on the indicators. They react to a speeding ticket, an accident, or not being on time to a delivery, yet they never seem to address the behaviors inherent in failures.”

St. Pierre uses data mining routinely to identify behavioral indicators among drivers, such as time management problems. A problem with time management may first show up as a failure to perform a pre-trip inspection; it later may lead to an on-road failure, then a speeding or “near miss” incident, and eventually a rear-end collision. These incidents are related, St. Pierre says, but many fleets treat them as isolated events.

St. Pierre and other fleet safety managers now are using data mining technology – with various applications and levels of complexity – to identify new patterns and risk indicators, predict specific kinds of accidents before they happen, and make educated decisions based on these assessments.

Gathering all the data
The first stage of data mining, experts say, is building descriptive models to identify patterns and correlations among different data sets. Using various statistical methods such as scatter plots and regression analysis, a fleet may find correlations between the frequency of accidents and the different geographic areas and lanes where it operates, for example. Or accidents might correlate to certain types of customers.
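A descriptive pass of this kind can begin with nothing more elaborate than grouping. The Python sketch below is a minimal illustration, not any carrier's actual method: the lane labels, mileage and accident counts are invented, and the function name is hypothetical. It totals miles and accidents by lane and reports accidents per million miles, so that lanes with different traffic volumes can be compared on equal terms.

```python
from collections import defaultdict

# Hypothetical records: (lane, miles driven, accidents).
# All labels and numbers are invented for illustration only.
RECORDS = [
    ("Lane A", 520_000, 2),
    ("Lane A", 480_000, 1),
    ("Lane B", 300_000, 3),
    ("Lane B", 200_000, 2),
    ("Lane C", 750_000, 1),
]


def accident_rate_by_lane(records):
    """Return {lane: accidents per million miles}, aggregated over all records."""
    miles = defaultdict(int)
    accidents = defaultdict(int)
    for lane, lane_miles, lane_accidents in records:
        miles[lane] += lane_miles
        accidents[lane] += lane_accidents
    # Skip lanes with no recorded miles to avoid dividing by zero.
    return {
        lane: accidents[lane] * 1_000_000 / miles[lane]
        for lane in miles
        if miles[lane] > 0
    }


rates = accident_rate_by_lane(RECORDS)

# List lanes from the highest accident rate to the lowest.
for lane, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{lane}: {rate:.1f} accidents per million miles")
```

Normalizing by miles matters here: a lane with many accidents may simply be a lane with many trucks, and a raw count would hide that.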

One of the challenges is that many fleet owners don’t store all their safety-related data in centralized databases. Schneider National recently built a “safety data mart,” a large Oracle database for all safety-related information. The data mart includes information such as injuries and accidents, driver demographics, loads and wireless location data from the carrier’s mobile communications system.

Schneider also is pilot-testing new vehicle-based equipment to report hard braking incidents, following distances and other driver behavior captured by the engine’s electronic control module (ECM). Over the next year or two, this data will be added to the data mart and correlated with other data to identify new patterns, says Ted Gifford, vice president of engineering and research for the Green Bay, Wis.-based carrier.

But you don’t have to be an industry giant like Schneider to be a data miner. Standard, PC-based data tools can give you big insights.

“We are currently discussing and designing an accident database,” says Brett Vitrano, operations research analyst for Accelerated Freight Group, a 94-truck carrier based in Theodore, Ala. To create the database, Vitrano is using Microsoft Access to input driver demographics, motor vehicle reports (MVRs) and text from accident reports – such as the description and cost – and to index scanned documents.

AFG gives drivers safety training based on speeding and other behaviors that management detects using its GeoLogic mobile communications system, which captures data from the vehicle’s ECM. Vitrano downloads data weekly into an Excel spreadsheet application to review driver performance. Each driver also receives a personalized report on his miles, idle time, miles per gallon, percent over speed and RPM in seconds and percentages, he says.

In the past year, since AFG began providing drivers with performance reports, speeding has gone down and fuel economy has gone up. “This all translates into savings to the bottom line,” Vitrano says. “I can also say, unofficially, that I haven’t heard of any accidents or violations.”

Once the Access database is complete, management will be better able to identify risky drivers and the correlation between speeding and accidents, for example, by tying speeding problems identified in a weekly report to drivers’ prior incidents, Vitrano says. AFG plans to use such correlations to better target safety training for each individual.

“Each driver will have a profile in this database so we can begin to assemble a picture of what the driver who causes the most accidents or violations ‘typically’ resembles,” he says.

One rich source of data that carriers may overlook is their motor carrier safety profiles, which reflect the data used by the Federal Motor Carrier Safety Administration in SafeStat. Some of this data – details on crashes, vehicle and driver inspections and traffic violations – is publicly available on the Web, but motor carriers can register for a PIN to obtain much more detailed information on their own operations, including driver names and license numbers. And the data is available in delimited format, allowing for easy manipulation in spreadsheets and other software.

In addition to feeding other data mining efforts, SafeStat data can be a research gold mine of its own. For example, using SafeStat data on his own company and six other carriers, Jeff Davis analyzed the impact of speeding on a carrier’s SafeStat score. Davis, vice president of safety and human resources for Dayton, Ohio-based Jet Express, estimates that at least a third – and possibly a majority – of roadside inspections are triggered by a moving violation. Inspections tend to produce violations, and violations lead to higher ISS-2 numbers, which inspectors at weigh stations use to prioritize trucks for routine inspections. These inspections produce still more violations that drive up SafeStat scores and increase the chances for compliance reviews. “With each roadside inspection, you have to determine the trigger event,” Davis says. “If not, you never correct the problem.”

Grappling with complexity
Schneider National began focusing on safety data mining about three years ago, Gifford says. The company uses several standard statistics packages such as Microsoft Excel and MiniTab and advanced packages such as Enterprise Miner, an SAS product, along with CART from Salford Systems. Using these tools, analysts have built descriptive models based on statistical methods such as clustering and regression and have identified patterns and correlations among accident frequencies and geographic regions, operating centers and traffic lanes.

Regression is used to determine associations between a dependent variable and one or more independent variables. For example, Schneider now has a better understanding of the values of driver experience and tenure (the independent variables) and how these relate to expected accident cost (the dependent variable). “It showed us that the value of retention of experienced drivers is more pronounced than we thought,” Gifford says.
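For readers who want to see the mechanics, the following Python sketch fits the simplest possible regression: one explanatory variable and a straight line, estimated by ordinary least squares. It is an illustration under stated assumptions, not Schneider's model. The experience and cost figures are invented and deliberately tidy, the names are hypothetical, and a real analysis would use many more variables and would not assume the relationship is linear.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: years of driving experience and annual accident
# cost per driver in dollars. Invented for illustration only.
experience_years = [1, 2, 3, 4, 5]
accident_cost = [9000, 8000, 7000, 6000, 5000]

intercept, slope = fit_line(experience_years, accident_cost)
print(f"expected cost = {intercept:.0f} + ({slope:.0f} x years of experience)")
```

A negative slope in a fit like this is what would put a dollar figure on each additional year a driver stays with the fleet.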

“We learned when in a driver’s career that the risk of crash and injury lines flattens with driving experience,” says Don Osterberg, Schneider National’s vice president of safety and driver training. “Among other things, this enabled us to attribute the true costs of driver turnover, which in turn led to the development of a ‘build the core’ strategy. Additionally, we developed a decision support model to assess the expected costs and benefits of anticipated initiatives to improve driver retention.”

Schneider National also has researched DOT preventable accidents and lost-time injuries extensively. Its data mining efforts have been most successful in determining statistically reliable patterns from analyzing accident frequencies and the type of injuries that occur.

“Only about a quarter of lost-time injuries are related to traffic accidents,” Gifford says. “Most of them are associated with loading and unloading, or slipping and falling.”

Schneider National’s information technology sophistication is considerable, of course. Most fleets lack the resources to build complex models that require expertise in disciplines such as statistics and computational learning theory, Valeminsky says. That’s one reason why Dupre’ Transport turned to an outside firm for safety data mining.

In recent years, the Lafayette, La.-based carrier has implemented several new safety initiatives, training programs and technologies, such as circadian driver fatigue monitoring, satellite communications, a collision warning system and electronic logs.

Last year, Dupre’ Transport began working with FleetRisk Advisors, a firm that provides technology-based risk management services for the transportation industry. FleetRisk Advisors uses pattern recognition and predictive technology developed by Valen Technologies for the insurance industry and adapted for commercial transportation, Valeminsky says.

“When you look at a dataset, most of the information is not predictable,” Valeminsky says. “You take what is predictable and put it in a model to make it not only accurate but as simple as possible.”

Dupre’ Transport and FleetRisk Advisors are currently in the research phase of building a predictive model, says Doug Place, chief financial officer of Dupre’ Transport. They are evaluating about 300 data sets now and are continually adding and deleting data elements to find those that correlate to the actual accidents, he says. Based on early results, driver performance – as measured in engine idling time, fuel mileage, shift start and end time, and customer service failures – is among the many elements that correlate with a driver’s safety skills.

“When FleetRisk first came to us, our objective was to determine risk factors that correlate to accidents,” says Al LaCombe, director of safety for Dupre’ Transport. “We are finding that there are a lot of data elements that are tied into safety.” Monthly reports from FleetRisk Advisors will help Dupre’ managers work with drivers individually on performance and safety.

“We already have a lot of measurements in our company,” LaCombe says. “But this has given us more diversity and ways to look at information so that in the future, we can get the drivers more involved and more proactive.”

Getting ahead of the curve
Data mining identifies relationships and trends hidden deep within data, and fleets can use it to identify driver profiles and even specific drivers that statistically are the most likely to have an accident. This particular type of data mining, called predictive modeling, helps describe mathematically the likelihood of an outcome, such as a crash, given a set of values for the input variables, such as a driver’s age or experience.
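One common way to express "the likelihood of an outcome given a set of input values" is logistic regression, which turns inputs into a probability between 0 and 1. The Python sketch below is a bare-bones version with a single input, fitted by gradient descent. It is offered only as an illustration of the idea: the driver records are invented, the function names are hypothetical, and nothing here describes how any vendor's product works.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # predicted minus observed
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


def crash_probability(b0, b1, years):
    """Modeled probability of a crash for a driver with the given experience."""
    return sigmoid(b0 + b1 * years)


# Hypothetical records: years of experience, and whether the driver had
# a crash in the following year (1) or not (0). Invented for illustration.
years = [0.5, 1, 1, 1.5, 2, 3, 4, 5, 6, 8, 10, 12]
crashed = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

b0, b1 = fit_logistic(years, crashed)
for y in (1, 5, 10):
    print(f"{y} years of experience: modeled crash probability {crash_probability(b0, b1, y):.2f}")
```

With only a dozen invented records the numbers mean nothing in themselves; the point is the shape of the calculation, in which each driver's inputs produce a probability that can be ranked and acted on.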

At Quality Carriers, the largest chemical bulk tank carrier in the nation, predictive modeling is an integral part of its safety program, says Bob Bonnett, vice president of safety and security of the Tampa, Fla.-based carrier.

Quality Carriers’ claims management system interfaces with a system called the Virtual Fleet Risk Manager (VFRM) developed by Zurich Services’ Risk Engineering, a firm that specializes in risk management. Quality Carriers uses VFRM to identify drivers who statistically are more likely to have specific kinds of accidents, such as rollovers or spills, and then intervene with targeted safety training beforehand to reduce that risk.

VFRM uses a points system to create a risk index for each driver. Quality Carriers assigns a predefined point value to each of 190 different violations depending on severity, from preventable accidents to spills, injuries and regulatory compliance. For each violation, the company also assigns a time spread of one to five years, again depending on severity. The driver’s point value decreases, day by day, over the time spread. For example, a 3-point speeding violation with a one-year time spread decreases to 0 points after one year.
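The arithmetic of a decaying point value is easy to sketch. The Python below assumes the decay is a straight line, with the same amount coming off each day until the points reach zero at the end of the time spread. That is an assumption made for illustration: the article says only that points decrease day by day. The function names and sample violations are hypothetical, and this is not VFRM's actual code.

```python
from datetime import date


def remaining_points(points, spread_years, violation_date, as_of):
    """Points still counted on `as_of`, assuming straight-line daily decay."""
    spread_days = spread_years * 365  # assumption: 365-day years, leap days ignored
    age_days = (as_of - violation_date).days
    if age_days <= 0:
        return float(points)  # violation has not started to age yet
    if age_days >= spread_days:
        return 0.0  # fully decayed
    return points * (1 - age_days / spread_days)


def risk_index(violations, as_of):
    """Sum the remaining points over a driver's violations.

    `violations` is a list of (points, spread_years, violation_date) tuples.
    """
    return sum(
        remaining_points(points, spread, when, as_of)
        for points, spread, when in violations
    )


# Hypothetical driver record: a 3-point speeding violation with a one-year
# spread and a 5-point violation with a five-year spread, both on the same day.
violations = [
    (3, 1, date(2006, 1, 1)),
    (5, 5, date(2006, 1, 1)),
]

print(risk_index(violations, as_of=date(2006, 1, 1)))
print(risk_index(violations, as_of=date(2007, 1, 1)))
```

On the day of the violations the driver carries all 8 points; a year later the speeding points are gone and only the slower-decaying violation still counts.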

“By using the Virtual Fleet Risk Manager, we have been able to identify the at-risk drivers,” Bonnett says.

Using this ranking system, Bonnett says he specifically targets drivers with point totals at or above the 90th percentile. The distribution of safety violations is exponential, meaning that the vast majority of drivers are close to the average of the risk index; drivers in the 90th percentile statistically are the most likely to have the next accident. After identifying these high-risk drivers, Quality Carriers uses VFRM to track the completion of training and intervention programs tailored for each driver.
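Picking out drivers at or above the 90th percentile is a small calculation once each driver has a point total. The Python sketch below uses the nearest-rank definition of a percentile, one of several in common use, on ten invented drivers. The driver IDs, point totals and function names are hypothetical.

```python
import math


def percentile_threshold(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct percent
    of all values at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def high_risk_drivers(points_by_driver, pct=90):
    """Return the IDs of drivers whose point totals are at or above the cutoff."""
    cutoff = percentile_threshold(list(points_by_driver.values()), pct)
    return sorted(
        driver for driver, points in points_by_driver.items() if points >= cutoff
    )


# Hypothetical point totals for ten drivers. Invented for illustration.
points_by_driver = {
    "D01": 0, "D02": 0, "D03": 1, "D04": 1, "D05": 2,
    "D06": 2, "D07": 3, "D08": 4, "D09": 9, "D10": 15,
}

flagged = high_risk_drivers(points_by_driver)
print(flagged)
```

In this invented fleet most drivers carry few points and two carry far more than the rest, which is the kind of lopsided distribution that makes a percentile cutoff a practical way to decide where to spend training time.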

The company also uses VFRM to identify driver profiles that correlate with the most risk; for example, Quality Carriers’ management has found that drivers with less than two years of experience are 80 percent more likely to have a rollover than drivers with more than two years of experience. As a result, the carrier recently created a rollover prevention campaign in which less-experienced drivers who fit the VFRM profile are put through an aggressive training and monitoring program. And that’s probably just the beginning, as a number of problems, including spills and out-of-service violations, also are correlated with the risk factor of experience under two years.
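A statement such as "80 percent more likely" is a relative risk: the rollover rate in one group divided by the rate in another. The short Python sketch below shows the calculation with invented counts chosen to produce a ratio near 1.8. They are not Quality Carriers' figures, and the function name is hypothetical.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Rate of an event in group A divided by the rate in group B."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("group sizes must be positive")
    if events_b == 0:
        raise ValueError("comparison group has no events; ratio is undefined")
    return (events_a / total_a) / (events_b / total_b)


# Hypothetical counts: rollovers among drivers with less than two years of
# experience versus drivers with more. Invented for illustration only.
rr = relative_risk(events_a=9, total_a=500, events_b=15, total_b=1500)
print(f"relative risk: {rr:.2f}")
print(f"newer drivers are about {(rr - 1) * 100:.0f} percent more likely to roll over")
```

Comparing rates rather than raw counts matters: in this example the experienced group has more rollovers in total, yet the newer group is the riskier one per driver.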

Data-driven risk management ensures a safety program grounded in reality. “The facts are the facts,” Bonnett says. “Guessing hurts you every time.”

Bonnett credits Quality Carriers’ data mining efforts – and the safety campaigns and training they have spawned – for much of the carrier’s recent safety success. Through June, preventable accidents are down 32 percent. Driver terminations, due to safety problems, are down 23 percent. Spills are down 5 percent, and injuries are down 24 percent, Bonnett says.

At Panther Expedited Services, St. Pierre uses VFRM to plot violations such as missed pre-trip reports, roadside inspections, MVRs and log violations. These data points may not yield a significant result individually, but together they can denote a behavioral problem such as aggressive tendencies, time management problems or anger management issues.

Ultimately, by knowing the behavior, you can look at data to identify when a driver might have the next speeding ticket, missed pre-trip inspection, lane change accident or near-miss incident, St. Pierre says.

Tracking and predicting areas of risk and undesirable habits and even intervening with safety awareness programs and training doesn’t maximize the potential of data mining. Carriers also should target data mining at tracking the effectiveness of their response to ensure it actually led to an improvement.

“Identifying a trend is only half as important as using it,” says St. Pierre, who ties a specific intervention strategy to each behavior – such as requiring the driver to watch a training video, or building a relationship with them and training them personally on patience and time management.

“So many senior drivers believe they must never fail a customer, but they are failing themselves,” St. Pierre says. “They think they are supposed to make everyone else a success.”

After drivers complete a behavior-based training program, St. Pierre uses VFRM to track the success of the intervention to determine if the trend changes. “If it didn’t, you didn’t predict the end behavior correctly,” he says. “Or the driver could be incapable of intervention. If it does change, you add positive feedback to reinforce the change.”

A strategic tool
In the final analysis, no computer model can predict incidents or outcomes with certainty. But the technology at least can give executives a better understanding of the risk and costs associated with various activities in their operations.

“Being able to understand how to allocate costs is where we have had more success in data mining,” Gifford says. “In the predictive area, most of our time has been focused on trying to predict what the costs are going to be.”

When looking at a new bid for a dedicated fleet, a significant piece of completing the bid is to determine the expected safety costs, Gifford says. Schneider National uses data mining to predict the expected injury rates based on the characteristics of the freight, such as the pickup and delivery terminals it uses.

“The data-mining-enabled analysis provided insights into the risk associated with specific loads, lanes and freight types, which led to more granular pricing tools based upon expected risk exposure,” Osterberg says.

“Safety is at the forefront in making the investment in this kind of research and analysis to uncover anything we can to lower our safety risk exposure,” Gifford adds. “We are making very good progress.”

Road-tested research
Jet Express hits the highway to gather data

Data mining generally involves analyzing information the company generates in the normal course of business, but Jeff Davis has staged actual on-highway tests of his hypothesis that speeding is the leading safety risk factor.

Seventy percent of drivers for Dayton, Ohio-based Jet Express have clean motor vehicle records (MVRs), says Davis, the carrier’s vice president of safety and human resources. There isn’t enough historical data to unearth statistically reliable findings regarding accidents and speed, Davis says. So three years ago, Davis conducted an actual demonstration.

For a five-day period, a fleet manager from Jet Express rode with a driver over the same 34.9-mile section of Interstate 75 at the same time each day, in both directions. The company tracked the number of lane changes and brake applications for the truck moving at the posted speed limit and at 10 mph over. At 10 mph over, brake applications – which indicate speeding that could lead to a rear-end collision – averaged three per trip. At the posted speed limit, this dropped to zero. At 10 mph over, the number of lane changes – risky due to a truck’s blind spots – doubled from the average of 12 while driving at the speed limit, he says.

Jet Express tracks speed in two ways: First, it downloads speed data from engines’ electronic control modules weekly; second, managers conduct roadside observations in the field and file forms. The road observation program involves the whole management team; Jet Express’ Kevin Burch consistently completes the most reports each quarter. Using drivers’ motor vehicle records, SafeStat scores, speeds and warnings, Jet Express develops a profile for each driver.

“Without spending thousands of dollars, that’s what we’ve come up with as the basic indicators,” Davis says. “I’m a believer, and have data to show it in our fleet, that this whole process of monitoring speed will eventually clean up driving and reduce accident rates.”

Case Study 1
Dupre’ Transport, Lafayette, La.

Over the past year, Dupre’ Transport has worked with risk management firm FleetRisk Advisors to determine the relevant factors associated with accidents and to calculate the “risk signature” and risk profile for each driver. The initiative models relevant driver data such as tenure, fatigue index, marital status, etc., against a database compiled using various sources of information from Dupre’ Transport.

“We are using over 250 data elements and are trying to narrow down to those which correlate the greatest to accidents,” says Doug Place, chief financial officer of Dupre’ Transport. “We are making progress and improving our accuracy with each month’s run of the engine.”

Case Study 2
Panther Expedited Services, Seville, Ohio

While working for a major tank carrier four years ago, Jeff St. Pierre helped Zurich Services develop the Virtual Fleet Risk Manager. St. Pierre – now vice president of risk management and driver services for Panther Expedited Services – uses VFRM’s driver indexing and other features to identify behavioral problems, addresses them through targeted training programs and tracks results.

“Identifying a trend is only half as important as using it,” says Jeff St. Pierre, vice president of risk management and driver services for Panther Expedited Services. Through its data mining efforts and targeted training programs, in the past year the company saw the “index values” of frequency improve by 32 percent and severity by 17 percent.

Case Study 3
Quality Carriers, Tampa, Fla.

Using a custom-built claims management system, safety managers at Quality Carriers sort information in multiple ways to identify areas of risk based on several dimensions, including frequency, geographic region, terminal and driver experience. And the carrier uses Virtual Fleet Risk Manager to identify driver profiles that correlate with the most risk and responds with targeted safety training and awareness.

An example of information gleaned from Quality Carriers’ claims management system is that 27 percent of incidents occur in or around Texas and New Jersey, says Bob Bonnett, vice president of safety and security. The carrier developed a program specifically to reduce incidents in those two areas. “As we drive those down, we will address other states,” Bonnett says.

Case Study 4
Schneider National, Green Bay, Wis.

Schneider National has built a “safety data mart” – a large Oracle database that includes such information as injuries and accidents, driver demographics and loads – and is pilot-testing new equipment to report hard braking incidents, following distances and other driver behavior. Schneider’s safety data mining initiative helps in driver hiring and training and in financial analysis, says Ted Gifford, vice president of engineering and research.

Schneider is trying to use data as part of its overall financial analysis. “We are able to better attribute and allocate [risks and costs] so we understand what the cost is of doing business in certain areas,” says Don Osterberg, vice president of safety for Schneider National.

Case Study 5
Shaw Industries Group Inc., Dalton, Ga.

Large carpet and flooring manufacturer Shaw Industries Group Inc. found a data mining treasure trove in a recent study by the American Transportation Research Institute that correlated the likelihood of a crash with various types of moving violations and convictions. For example, the study found that a “following too close” violation increases the driver’s likelihood of a future crash by 40 percent. Shaw uses an application from PeopleNet called the Onboard Event Recorder to track hard braking incidents to identify drivers who may show signs of violating proper following distances. Shaw is building a report using PeopleNet data, driver moving violations and past accidents to identify drivers in the top 10 percent for incident frequency. Those drivers will be assigned extra defensive driver training to change their behaviors.

Shaw Industries Group Inc. is building its data capturing gradually, says Greg Whisenant, transportation safety manager. “We want to make sure we have someone who can manage it first,” Whisenant says. “If you are not looking at data and the driver has an accident, but you have not done anything with that data, you could have a liability.”

Data analysis tools
A variety of software tools can be used by both beginners and experts for safety data mining. To look for trends or correlations between different data sets, businesses must first have the right technology to store and search large volumes of safety data. This requires having a centralized database, often referred to as a data warehouse or data mart. This technology is widely available in many varieties from large corporations such as Microsoft, Oracle, Sybase, Informix, and IBM.

Among the simplest and most widely used databases for small businesses are Microsoft Access and Microsoft SQL Server. For example, Accelerated Freight Group, a 94-truck carrier based in Theodore, Ala., is in the process of developing a safety database using Microsoft Access, says Brett Vitrano, operations research analyst.

Once a central database is in place, the next step for data mining is data analysis. Microsoft Excel is the standard data analysis tool in any industry. As a Microsoft product, it easily imports data from any database, is easy to use, and has many robust statistical analysis tools. A host of companies, such as Palisade, also provide more accurate and robust statistical analysis than comes standard in Excel. These packages are offered as Excel add-ins for added flexibility and ease of use.

Some software providers also offer statistical packages as stand-alone applications. One of the best known is MiniTab, an analysis package widely used on college campuses and in many industries.

For complete data mining solutions – which are well beyond the scope of most fleets – large organizations such as Schneider National use packages such as Enterprise Miner, an SAS product, and CART from Salford Systems, says Ted Gifford, the company’s vice president of engineering and research. These packages have advanced predictive and descriptive modeling tools and algorithms, which include decision trees, neural networks, autoneural networks, memory-based reasoning, linear and logistic regression, clustering, associations, time series and more.