StreetLight-Data_Methodology-and-Data-Sources
As the mobile data supply landscape has evolved and matured, we have determined that a
combination of navigation-GPS data and LBS (location-based services) data is best suited to
meet the needs of transportation planners. Our team phased out cellular tower data because its
low spatial precision and infrequent pings did not meet our standards for corridor studies,
routing analyses, and many other Metrics. LBS data is suitable for these studies and offers a
sample size comparable to that of cellular tower data.
As of July 2019, StreetLight’s data repositories process analytics for about 79M devices, or
~28% of the adult U.S. and Canadian population, and about 13% of commercial truck trips. As
detailed later in this report, sample size varies by region, over time, and by the type of
analysis conducted.
Our data supply grows each month as suppliers provide updated data sets. We currently use one
major navigation-GPS data supplier, INRIX, and one LBS data supplier, Cuebiq. See Table 1,
below, for more details on the locational data sources StreetLight Data has recently evaluated.
© StreetLight Data AADT 2018 V2 Methodology and Validation June 2019 │ Page 1
Table 1 – Locational data sources recently evaluated by StreetLight Data (columns: Type, Pros, Cons, Notes)
Our Navigation-GPS and LBS Data Sources
In this section, we will explain why access to two different Big Data sources is uniquely
beneficial for transportation professionals. First, it is important to note that StreetLight InSight is:
• The first and only on-demand platform for planners to process Big Data into customized
transportation analytics to their unique specifications, including the type of Big Data they
would like to use.
• The first and only online platform that automatically provides comprehensive sample size
information for analyses. (See more information on sample size on page 8 of this report.)
We selected navigation-GPS and LBS data because they are complementary resources that
provide unique and valuable travel pattern information for transportation planning. See Figure 1
below for a visualization of these data sources.
Figure 1 – Filtered visualization of a subset of unprocessed navigation-GPS and LBS data near a
mall in Fremont, California.
Cuebiq, our LBS data supplier, provides software development kits (SDKs) to developers of
mobile apps to facilitate LBS. These smartphone apps include couponing, dating, weather,
tourism, productivity, and apps for locating nearby services (e.g., finding the closest
restaurants, banks, or gas stations), among many others, all of which use their users’ location
in the physical world as part of their value. The apps collect anonymous user locations when
they are operating in the foreground. They may also collect anonymous user locations when
operating in the background; this “background” data collection occurs when the device is
moving. LBS software collects data using WiFi proximity, assisted GPS (A-GPS), and several
other technologies. In fact, locations may be collected even when devices have no cell
coverage or are in airplane mode. Additionally, all the data that StreetLight uses has better
than 20-meter spatial precision. (Similarly, our partner INRIX collects some LBS data from
navigation-oriented smartphone apps.)
Navigation-GPS Data
Navigation-GPS data has a smaller sample size than LBS data, but it does differentiate
commercial truck trips from personal vehicle trips. This makes navigation-GPS data ideal for
commercial travel pattern analyses. Navigation-GPS data is also suitable for very fine-resolution
personal vehicle travel analyses (e.g., speed along a very short road segment) because of its
extremely high spatial precision and very frequent ping rate.
INRIX, our navigation-GPS data supplier, provides data that comes from commercial fleet
navigation systems, navigation-GPS devices in personal vehicles, and turn-by-turn navigation
smartphone apps. (These apps produce data similar to the LBS data described above.)
Segmented analytics for medium-duty and heavy-duty commercial trucks are available. For
commercial trucks, if the vehicle’s on-board fleet management system is within INRIX’s partner
system, INRIX (and thus StreetLight) will collect a ping every one to three minutes whenever the
vehicle is on, even if the driver is not actively using navigation.
For personal vehicles, if the vehicle is in INRIX’s partner system and has a navigation console,
INRIX (and thus StreetLight) will collect a “ping” every few seconds whenever the vehicle is on,
even if the driver is not actively using the navigation system. This provides a very complete
picture of vehicles’ travel patterns and certainty that the trips were made in vehicles.
The extract, transform, load (ETL) process not only moves the data securely from one
environment to another, but also eliminates corrupted or spurious points, reorganizes the data,
and indexes it for faster retrieval and more efficient storage.
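The report does not list the specific checks used to eliminate corrupted or spurious points. As an illustration only, the sketch below shows one common style of filtering and indexing, assuming each ping carries a device ID, a timestamp, coordinates, and a reported accuracy in meters. The 20-meter accuracy cutoff echoes the precision figure quoted earlier; the 70 m/s speed ceiling, the `Ping` fields, and the function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Ping:
    device_id: str
    t: float           # seconds since epoch
    lat: float
    lon: float
    accuracy_m: float  # reported horizontal accuracy (hypothetical field)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))


def clean_and_index(pings, max_accuracy_m=20.0, max_speed_mps=70.0):
    """Drop corrupted or spurious pings and index the rest by device, in time order."""
    by_device = defaultdict(list)
    for p in sorted(pings, key=lambda p: (p.device_id, p.t)):
        if not (-90 <= p.lat <= 90 and -180 <= p.lon <= 180):
            continue  # corrupted coordinates
        if p.accuracy_m > max_accuracy_m:
            continue  # not precise enough
        kept = by_device[p.device_id]
        if kept:
            prev = kept[-1]  # compare against the last ping we kept
            dt = p.t - prev.t
            if dt <= 0:
                continue  # duplicate timestamp
            if haversine_m(prev.lat, prev.lon, p.lat, p.lon) / dt > max_speed_mps:
                continue  # implausible jump, treated as spurious
        kept.append(p)
    return dict(by_device)
```

Comparing each ping with the last *kept* ping, rather than the last raw ping, means one spurious jump does not cause the valid pings after it to be discarded too.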
Step 2 – Data Cleaning and Quality Assurance
After the ETL process, we run several automated, rigorous quality assurance tests to establish
key parameters of the data. To give a few examples, we conduct tests to:
In addition, StreetLight staff visually and manually review key statistics about each data set. If
anomalies or flaws are found, StreetLight reviews the data in detail and escalates any concerns
to our suppliers for further discussion.
Step 4 – Contextualize
Next, StreetLight integrates other “contextual” data sets to add richness and improve the
accuracy of the mobile data. These include road networks with attributes such as speed limits
and directionality, land use data, parcel data, census data, and more.
For example, a “trip” from a navigation-GPS or LBS device is a series of connected dots. If the
traveler turns a corner but the device pings only every ten seconds, that intersection might be
“missed” when all of the device’s pings are connected to form a complete trip. StreetLight uses
road network information, including speed limits and directionality, to “lock” the trip to the
road network. This “locking” process ensures that the vehicle’s complete route is represented
even when there are gaps between pings. Figure 2, below, illustrates this process.
Figure 2 – “Unlocked” trips becoming locked trips.
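StreetLight does not describe its locking algorithm in detail. The toy sketch below illustrates the general technique, usually called map matching, under simplifying assumptions: each ping is snapped to the nearest node of a small hypothetical road network, and the gap between consecutive snapped nodes is filled with the fastest path. Directionality is modeled by listing a one-way street in only one direction, and speed limits turn edge lengths into travel times. The network, node names, and nearest-node snapping are all hypothetical; production map matchers typically snap to road segments and weigh many candidate routes.

```python
import heapq
from math import hypot

# Hypothetical toy network: node -> (x, y) in meters on a local planar grid.
NODES = {"A": (0, 0), "B": (100, 0), "C": (100, 100), "D": (200, 100)}

# Directed edges with speed limits in m/s; C -> D is one-way.
EDGES = {
    ("A", "B"): 13.4, ("B", "A"): 13.4,
    ("B", "C"): 8.9, ("C", "B"): 8.9,
    ("C", "D"): 13.4,
}


def build_graph(nodes, edges):
    """Adjacency list whose edge cost is travel time at the speed limit, in seconds."""
    graph = {n: [] for n in nodes}
    for (u, v), speed in edges.items():
        length = hypot(nodes[v][0] - nodes[u][0], nodes[v][1] - nodes[u][1])
        graph[u].append((v, length / speed))
    return graph


def nearest_node(nodes, x, y):
    return min(nodes, key=lambda n: hypot(nodes[n][0] - x, nodes[n][1] - y))


def fastest_path(graph, src, dst):
    """Dijkstra's algorithm; raises ValueError if dst is unreachable."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        raise ValueError(f"no path from {src} to {dst}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def lock_trip(pings, nodes=NODES, edges=EDGES):
    """Snap (x, y) pings to nodes and fill gaps so the route follows the network."""
    if not pings:
        return []
    graph = build_graph(nodes, edges)
    snapped = [nearest_node(nodes, x, y) for x, y in pings]
    route = [snapped[0]]
    for a, b in zip(snapped, snapped[1:]):
        if a != b:
            route.extend(fastest_path(graph, a, b)[1:])
    return route
```

For example, `lock_trip([(5, 2), (195, 98)])` snaps the two pings to A and D and returns the route A, B, C, D: the turns at B and C that fell between pings are recovered. Reversing the pings raises `ValueError`, because the one-way edge C to D cannot be traveled backward.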
As another example, if a device that creates LBS data regularly pings on a block with residential
land use, and those pings often occur overnight, there is a high probability that the device owner
lives on that block or block group. This allows us to associate “home-based” trips and a “likely
home location” with that device. In addition, we can append the income distribution and other
demographics for residents of that census block to that device. That device can then “carry” that
distribution everywhere else it goes. (Our demographic data sources for the U.S. are the Census
and the American Community Survey. In Canada, our source is Manifold Data.) This allows us to
normalize the LBS sample to the population and to enrich traveler analytics with attributes such
as trip purpose and demographics.
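A minimal sketch of the home-location heuristic described above, assuming each ping has already been tagged with a census block ID and that a set of residential block IDs is available from land use data. The overnight window (22:00 to 06:00), the three-night minimum, and the function names are hypothetical parameters chosen for illustration, not StreetLight’s.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def likely_home_block(pings, residential_blocks, night_start=22, night_end=6, min_nights=3):
    """Return the residential block where a device most often pings overnight, or None.

    pings: iterable of (datetime, block_id) pairs for a single device.
    """
    nights = defaultdict(set)  # block_id -> nights with at least one ping there
    for ts, block in pings:
        if block not in residential_blocks:
            continue
        if ts.hour >= night_start:
            nights[block].add(ts.date())
        elif ts.hour < night_end:
            # An early-morning ping belongs to the night that began the previous evening.
            nights[block].add((ts - timedelta(days=1)).date())
    if not nights:
        return None
    block, seen = max(nights.items(), key=lambda kv: len(kv[1]))
    return block if len(seen) >= min_nights else None


def attach_demographics(home_block, block_demographics):
    """Copy the home block's demographic distribution onto the device (empty if unknown)."""
    return dict(block_demographics.get(home_block, {}))


if __name__ == "__main__":
    sample = [(datetime(2019, 6, d, 23), "blk-1") for d in (1, 2, 3)]
    print(likely_home_block(sample, {"blk-1"}))  # blk-1
```

Counting distinct nights rather than raw pings keeps a single long overnight stay, which can produce many pings, from outweighing a regular pattern across several nights.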
Step 6 – Normalize
Next, the data is normalized along several parameters to create the StreetLight Index. Because
all data suppliers change their sample sizes regularly (usually increasing them), normalization is
performed monthly.
For LBS devices, we perform a population-level normalization for each month of data. For each
census block, StreetLight counts the sampled devices that appear to live there and computes the
ratio of that count to the block’s reported resident population. A device from a census block with
1,000 residents and 200 StreetLight devices is therefore scaled differently, everywhere it travels,
than a device from a census block with 1,000 residents and 500 StreetLight devices. Thus, the
StreetLight Index for LBS data is normalized to adjust for population sampling bias. It is not yet
“expanded” to estimate the actual flow of travel.
For navigation-GPS trips, StreetLight uses a set of public loop counters at certain highway
locations to measure the month-to-month change in trip activity. It then compares this ratio to
the corresponding change in sampled trips at those locations and normalizes accordingly. In
addition, StreetLight systematically adjusts the data to best estimate total normalized trips based
on external calibration points. These calibration points include public, high-quality vehicle count
sensors (for example, those in PeMS or the TMAS repository) as well as surveys and other
externally validated sources. Thus, the StreetLight Index for GPS data is normalized to adjust for
changes in our sample size. It is not normalized for population sampling bias, because we cannot
infer home blocks for GPS data. This is one reason we recommend LBS data for all personal
travel analytics. The StreetLight Index for GPS data is not yet “expanded” to estimate the actual
flow of travel.
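The report describes this step only in outline. A minimal sketch of the monthly ratio comparison follows, assuming trip counts keyed by reference location for a base month and the current month; pooling all reference locations into a single ratio is our simplification, and the function name is hypothetical. If real traffic at the counters grew 2% while sampled trips at the same locations grew 20%, the factor is 1.02 / 1.2 = 0.85, so growth in the sample is not mistaken for growth in travel.

```python
def monthly_sample_factor(counter_base, counter_month, sample_base, sample_month):
    """Scale factor that removes sample-size change from navigation-GPS trip counts.

    Each argument maps a reference location to a trip count. The factor is the
    change in real traffic (loop counters) divided by the change in sampled trips
    at the same locations.
    """
    sites = set(counter_base) & set(counter_month) & set(sample_base) & set(sample_month)
    if not sites:
        raise ValueError("no reference locations with complete data")
    counter_ratio = sum(counter_month[s] for s in sites) / sum(counter_base[s] for s in sites)
    sample_ratio = sum(sample_month[s] for s in sites) / sum(sample_base[s] for s in sites)
    return counter_ratio / sample_ratio
```

Multiplying a location’s raw sampled trips for the month by this factor yields a count that is comparable across months despite the change in sample size.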
statistical/privacy perspective. The support team then personally discusses the best next steps
with the user.
In general, StreetLight InSight response time varies with the size and complexity of the user’s
query: some runs take two seconds, some take two minutes, and some take several hours. Users
receive email notifications when longer projects are complete, and they can also monitor
progress within StreetLight InSight. Results can be viewed as interactive maps and charts within
the platform or downloaded as CSV files and shapefiles for use in other tools.
As is the case with any Big Data provider, sample size and penetration rate for a given analysis
depend on the specific parameters used in the study, because some data are useful for certain
analyses but not for others. For example, a device may deliver high-quality, clean location data
for one study, but messy, unusable location data (or no data at all) for another. Efficiently
identifying the data that are “useful” for a particular analysis is a critical component of the data
science value that differentiates StreetLight Data. Because penetration rates vary, sample sizes
are automatically provided for almost all StreetLight InSight analyses.¹ This allows users to
calculate penetration rates and better evaluate the representativeness of the sample. Sample size
values are also useful to clients who wish to normalize StreetLight InSight results through
additional statistical analysis.
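A minimal sketch of the penetration-rate calculation a user might run on StreetLight InSight sample sizes. It assumes the user supplies a reference count for the same locations and period (for example, volumes from a permanent counter) and compares like with like, such as trips with trips; the function names and numbers are hypothetical.

```python
def penetration_rate(sample_count, reference_count):
    """Share of total activity captured by the sample (e.g., sampled trips / counted trips)."""
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    return sample_count / reference_count


def penetration_by_location(samples, reference_counts):
    """Penetration rate for each location that has both a sample size and a reference count."""
    return {
        loc: penetration_rate(samples[loc], reference_counts[loc])
        for loc in samples.keys() & reference_counts.keys()
    }
```

For instance, 1,200 sampled trips at a location where a counter recorded 40,000 trips over the same period gives a penetration rate of 3%.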
For LBS analyses, sample size is currently provided as the number of unique devices, the number
of trips, or both, depending on the type of analysis. These values should be thought of as most
similar to “person trips.” Including both the number of devices and the number of trips for all
LBS analyses is on our product roadmap. For navigation-GPS analyses, sample size is provided
as the number of trips; these should be thought of as “vehicle trips.”
¹ Sample sizes are not automatically provided for AADT or Traffic Diagnostics Projects; they are
available by request. These analyses use a very large volume of location data, so providing sample
sizes automatically via StreetLight InSight would negatively impact data processing speeds.
In general, though not always, the trip sample size for commercial navigation-GPS data will be
higher than the device (truck) sample size. Commercial trucks in active use typically take many
trips per week, often on set routes, so each truck contributes many trips. Actively used trucks are
also more likely to have up-to-date fleet management tools, which makes them more likely to be
included in StreetLight’s navigation-GPS data set. Trucks that are used more rarely are less likely
to be included in the data set.
In general, though not always, the trip sample size for LBS data will be lower than the device
(person) sample size. The reason is that not all devices in StreetLight’s database capture every
single trip perfectly. To illustrate, consider this hypothetical example:
This device has created useful information for analyzing the home locations of visitors to the
arena. However, because the device did not create any location data on the trip to the arena
(perhaps because it was off), the route taken and the travel time cannot be calculated with
certainty. As a result, it cannot be used in an analysis of road activity on an arterial near the arena.
As another example, consider a device that generates regular pings for each trip taken over 10
days. Then the user deletes the smartphone app that created that data, and the device stops
pinging, disappearing for the last 20 days of the month. The device’s data can still be used, but
the trip penetration for the month is only 33% of this person’s trips (10 of 30 days), not 100%.
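The 33% figure follows from 10 active days out of a 30-day month, under the implicit assumption that the person travels about equally on every day. A small sketch of that arithmetic; the function name is hypothetical and the even-spread assumption is ours, not a stated StreetLight rule.

```python
import calendar


def monthly_trip_share(active_days, year, month):
    """Fraction of a device's trips captured in a month.

    Assumes (hypothetically) that the person's trips are spread evenly across days.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    if not 0 <= active_days <= days_in_month:
        raise ValueError("active_days must be between 0 and the number of days in the month")
    return active_days / days_in_month
```

For a 30-day month such as June 2019, `monthly_trip_share(10, 2019, 6)` returns one third, matching the example; a device active for all 31 days of July returns 1.0.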
Typical daily trip penetration rates are between 1% and 5% of all trips on any one specific day.
StreetLight’s pricing and data structure encourage looking at many days of data: analyzing an
average day across three months costs the same as analyzing a single day. Thus, we encourage
clients to evaluate the total sample across the entire study period instead of focusing on per-day
penetration rates.
© StreetLight Data 2019. All rights reserved.