ATC Pedestrian Tracking: Yan Lu, Mengzhuo Lu, Dingyu Yao
Abstract—In this project, we analyzed the 3-D range sensor dataset of Asian Pacific Trade Center, a shopping center in Osaka, Japan. By processing the dataset, we analyzed the pedestrian flow rate in the shopping center using Pig and visualized the data to compare the flow rate under different circumstances. We also clustered the data with Spark, visualized each property in the 3-D range sensor dataset, and explored the underlying relations between the properties, which led to many interesting findings. We also trained a Mahout classifier to distinguish whether a pedestrian walks in a group or individually. Based on these results, we uncovered facts hidden in otherwise unremarkable pedestrian flow data and came up with useful suggestions and strategies.

Keywords- Big Data, Pedestrian Tracking, Mall, Sale, Flyer

I. INTRODUCTION

Asian Pacific Trade Center (ATC), located in Osaka, Japan, is the largest international mall complex in Kansai. In 2012, a Japanese research team set up a tracking environment in the shopping center. The system consists of multiple 3D range sensors covering an area of about 900 m². The top view of the shopping mall is illustrated below.

Figure 1 Top view of the shopping mall

The data recorded by these 3D range sensors was made public by the team for research purposes. Our project is based on these datasets and aims to identify target customers, propose a store deployment strategy for the ATC shopping mall, and propose strategies for robot flyer senders and security. We analyzed popular times and areas in the shopping mall, clustered pedestrians based on the distribution of their coordinates, and classified them based on their group behavior.

II. RELATED WORKS

In 2012, a Japanese research team set up 3-D range sensors in the ATC shopping mall. They recorded data from 9:40 to 20:20 throughout the year and collected 92 days of data in total [1]. This dataset has been explored in many ways. Zanlungo, Brscic and Kanda studied the characteristics of pedestrian groups, and we apply their results to classify pedestrians into group shoppers and individual shoppers [2]. Their study found that group shoppers and individual shoppers differ in their facing angle and angle of motion, which we use to classify shoppers.

Kanda and Mascot proposed using robots to hand out flyers in the shopping mall, which our project can support [3]. Kidokoro and Shiomi studied, by simulation, how people behave when they encounter a robot in a mall area [4]. A similar topic was introduced by Hagita, who discussed the potential influence of using robots for public tasks in a city environment [5]. In our project, we propose suggestions and strategies to make the robot flyer sending process more effective.

III. SYSTEM OVERVIEW

The datasets we used in this study are the "ATC pedestrian tracking dataset" and "Pedestrian tracking with group annotations", which were obtained by JST/CREST in Japan [1]. The primary purpose of that project was to enable mobile social robots to work in public spaces.

The dataset was collected between October 24, 2012 and November 29, 2013, on Wednesdays and Sundays, from 9:40 to 20:20, and consists of 92 days in total. The data for each day is provided as CSV files, in which each row corresponds to a single tracked person at a single instant and contains the following fields:

time [ms] (unixtime + milliseconds/1000), person id, position x [mm], position y [mm], position z (height) [mm], velocity [mm/s], angle of motion [deg], facing angle [deg]
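The row format listed in the System Overview (time, person id, x, y, z, velocity, angle of motion, facing angle) can be read into typed records with a few lines of Python. This is only a minimal sketch: the class name `TrackRow`, the function `parse_rows`, the assumption that the files have no header line, and the sample row (whose values are made up for illustration) are not taken from the dataset documentation.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TrackRow:
    """One tracked person at one instant (hypothetical record type)."""
    time_s: float         # unix time in seconds, with fractional milliseconds
    person_id: int
    x_mm: float
    y_mm: float
    z_mm: float           # height of the person
    velocity_mm_s: float
    motion_angle: float   # unit as given in the field list
    facing_angle: float   # unit as given in the field list


def parse_rows(text_stream):
    """Yield a TrackRow for every well-formed 8-column CSV line."""
    for rec in csv.reader(text_stream):
        if len(rec) != 8:
            continue  # skip blank or malformed lines
        yield TrackRow(float(rec[0]), int(rec[1]), *map(float, rec[2:]))


# Made-up sample row, only to show the column order.
sample = "1351038000.123,9315400,-22910,-5842,1612.3,1135.5,-35.2,-30.1\n"
rows = list(parse_rows(io.StringIO(sample)))
print(rows[0].person_id, rows[0].velocity_mm_s / 1000.0, "m/s")
```

Keeping the units in the field names (mm, mm/s) makes the later thresholds, which the text states in m/s, easier to apply without unit mistakes.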
proportion is much larger on weekends than it is on weekdays, the restaurants in the ATC shopping center can set up more single tables on weekdays and more tables for couples or larger groups on weekends.

Moreover, promotions of products aimed at couples and families (for example flowers, household appliances and children's toys) could also be run more on weekends than on weekdays to increase profits.

(2) Applications of group classifier

The behavior of individual shoppers and group shoppers differs in many ways, and identifying groups helps in taking advantage of these differences. For instance, a Japanese research group is trying to use robots to send flyers [10]. Group shoppers tend to accept flyers from robots more readily, while individual shoppers tend to avoid them [4]. Furthermore, by sending flyers to a shopper group we can make the flyer sending process more effective. Therefore, by automatically knowing which people are in groups, we can make the robot flyer sending more effective.

(2) The classifier can also be used for crime detection. For example, if a policeman is investigating a group crime case, he can use this classifier to quickly filter out all of the individual pedestrians, which makes it easier for him to track the walking paths of suspects.

ii. Distribution Visualization

1. Pedestrian distribution analysis

We know from the SYSTEM OVERVIEW that the datasets we used contain the x, y coordinates of each pedestrian. Using this information, we visualized each day in the dataset and plotted the pedestrian distribution. We found many interesting patterns in these plots.

Firstly, it is obvious that the people intensity on Sundays is much higher than on Wednesdays, as shown in the figure below (in which red denotes high intensity and blue means low intensity).

Figure 12 People intensity on Wednesday (left) and Sunday

Moreover, we found that on Wednesdays more people tend to walk along an invisible path: three paths with high intensity can clearly be found in the figure. These paths are marked in black in Figure 5. This is probably because most people on weekdays are just walking by, and they tend to choose the shortest route (the invisible path). Therefore, the robots giving out flyers, as introduced in the last chapter, can stay on these paths and hand out flyers not directly related to in-mall information (because these people are not very likely to shop). However, on Sundays, as shown in the bottom picture of Figure 5, people are more randomly distributed than on Wednesday, so more people are actually shopping rather than just walking by. Sending flyers about in-mall information (e.g. sale news) will then be more effective. An ideal place for the robots to stay, in this section, is the black-circled area in the figure below. This area has high people intensity and is not likely to cause congestion (because there are spaces on the sides).

Figure 13 Invisible Path & Sale Area

We also found another interesting pattern: in the narrow corridor in the lower right of the plot, there tend to be more people on the upper side than on the lower side on Wednesdays, while on Sundays the intensity is basically the same on both sides. This is because the upper side mainly has shops, while the lower side mainly has shop goods displays. This suggests that shops are more attractive than goods displays. Therefore, to balance pedestrians on both sides, we suggest that the shopping mall move some of the shops from the upper side to the lower side, or set up some new shops on the lower side.

2. Clustering on pedestrian intensity using Spark

To analyze popular areas in the mall, we clustered each day's dataset using Spark. We used the K-means algorithm, and each clustering ran for 10 iterations. We divided all the pedestrians in the mall into 10 clusters and assigned a color to each cluster based on its intensity. Some examples are shown in Figure 14. It can be clearly seen that Sunday has a larger pedestrian intensity than Wednesday, and that most of the clusters lie in the east corridor.

There is one interesting thing worth noticing. Although the clusters keep changing, some clusters stay in
place and continue to have low intensity unless a special event happens (details will come up in later chapters). We suggest that the shopping mall put effort into these areas, for example by moving some popular restaurants or shops there to balance the pedestrian intensity.

Knowing the locations of the cluster centers, we can derive many applications. For instance, security staff can stay near each cluster center, so that their distance to all of the people in the cluster remains manageable. Moreover, this information is also useful for promotions: placing important promotions (including robot flyer senders) at the centers of high-intensity clusters will help improve the promotion effect. Of course, this strategy is subject to the principle of not causing congestion.

3. Pedestrians' speeds analysis and clustering

Based on pedestrians' speeds and their x, y coordinates, we plotted a comparison of pedestrians' speeds between randomly selected Wednesdays and Sundays, as shown in the figure below.

Figure 15 Pedestrians' speeds plotting

However, as we can see from Figure 15, the speed is linearly related to the color, so it is difficult to distinguish patterns in the figures. Therefore, we clustered the speeds using Spark and plotted the clustering results by people's x, y coordinates. The results are shown in Figure 16 below, where the upper subfigure refers to Wednesday's dataset and the lower subfigure refers to Sunday's.

Based on the clustering results, we split the Wednesday shopping mall into 3 areas: a passing-by area, a shopping area and a square area.

From the figure we can see that the average speed on Wednesdays is much higher than on Sundays. Besides, the high-speed area on Wednesdays is larger than that on Sundays. This could be because there are fewer people on Wednesdays, so pedestrians can walk faster and have more space to walk. On the other hand, on Sundays the shopping mall is rather crowded and people cannot walk as fast as they intend. The paths spotted earlier for pedestrians just passing by also have higher speeds, so trying to stop people on these paths is not a wise choice for promotions. If we want to send flyers to people on a path (which is tempting because there are more people there), we should stand in the shopping areas beside the walking path so that we do not slow the pedestrian flow.

4. Pedestrian Walking Angle Analysis

The last two parts gave us distribution information from location alone. In the next two parts, we take velocity, angle of motion and facing angle into consideration and do more advanced data mining.

In this part, we select the people who are walking based on their velocity: if the velocity is larger than 0.5 m/s, we mark the person as walking. Among all walking people, we consider their angle of motion and draw the average angle of motion for the people in every grid cell, as shown in Figure 17.
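The walking filter and the per-grid average angle of motion described in part 4 can be illustrated with a short Python sketch. Several things here are assumptions rather than details from the report: the grid cell size (`CELL_MM`), the function name, and in particular the use of a circular mean, which is swapped in because a plain arithmetic mean misbehaves near the ±180° wrap-around; the report does not say how the average was computed. Angles are taken in degrees, as in the field list of the System Overview.

```python
import math
from collections import defaultdict

WALKING_MM_S = 500.0  # 0.5 m/s threshold from the text, in mm/s
CELL_MM = 1000.0      # assumed grid cell size; not stated in the report


def mean_motion_angle_per_cell(rows, cell_mm=CELL_MM):
    """Circular mean of the angle of motion (degrees) per grid cell.

    rows: iterable of (x_mm, y_mm, velocity_mm_s, motion_angle_deg).
    Only rows faster than the walking threshold are used.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # cell -> [sum of cos, sum of sin]
    for x_mm, y_mm, velocity_mm_s, angle_deg in rows:
        if velocity_mm_s <= WALKING_MM_S:
            continue  # not walking
        cell = (math.floor(x_mm / cell_mm), math.floor(y_mm / cell_mm))
        rad = math.radians(angle_deg)
        sums[cell][0] += math.cos(rad)
        sums[cell][1] += math.sin(rad)
    return {cell: math.degrees(math.atan2(s, c)) for cell, (c, s) in sums.items()}


# Made-up rows: two walkers heading roughly "west" in cell (0, 0),
# one still person (ignored), and one walker heading "north" in cell (1, 0).
demo = [
    (100.0, 200.0, 800.0, 170.0),
    (300.0, 400.0, 900.0, -170.0),
    (150.0, 250.0, 100.0, 0.0),
    (1500.0, 200.0, 600.0, 90.0),
]
angles = mean_motion_angle_per_cell(demo)
```

With the circular mean, the two walkers at 170° and -170° average to a heading of about 180° instead of 0°, which is what one would expect for two people walking in nearly the same direction.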
because people are not so busy on Sunday, and they can just follow the crowd in a relaxed way. This high consistency of the angle of motion in each area not only reduces the possibility of walking collisions (which may cause safety issues), but also helps us understand the common habits of pedestrians and further improve our sales strategy. Thus, we can make several suggestions from this phenomenon:

1. People may think that the pedestrian traffic becomes congested and that more security staff are needed, but this is unnecessary. Because on Sunday people behave more politely, the possibility of walking collision reduces a

2. For the robot flyer sending project, on weekends, fixing the robot near the south wall of the east corridor can be an even better strategy than the one we came up with in Figure 13. On the one hand, Figure 13 shows that there are still some less busy areas near the south wall of the east corridor where we can deploy the robot without affecting pedestrians' walking comfort too much. On the other hand, the high consistency on weekends means that almost all people keep to the left in the east corridor, so the people walking in the south part of the corridor are the ones entering the shopping mall. By deploying a fixed robot there, we can make sure that almost all people entering the shopping mall from the east corridor receive the flyer, which helps arouse their interest in shopping.

5. Still Customer Behavior Analysis

Compared to walking pedestrians, still customers tend to spend more time in the shopping mall and have a larger chance of spending money in the mall. By analyzing their behavior, we can further identify the popular areas and detect their interests.

In this part, we also select the still customers by velocity: if the walking velocity is less than 0.15 m/s, we mark the person as still. After selecting the still customers, we plot the still customer distribution using the method from part 2 of this section, as shown in Figure 12. In this plot, the bar shows the number of person-times observed in the grid for the whole day.

Compared with Figure 12, the number of still customers decreases even more significantly on Wednesday relative to Sunday. On Wednesday, almost nobody stands still in the shopping mall, which supports the conclusion we drew from Figure 13 that on Wednesday most people are just walking through the shopping mall.

For the Sunday still customer distribution, we found some relatively popular areas (colored red or orange). We tried to figure out why these regions tend to be more popular, and where the shopping mall should add functions to make better use of every area.

As shown in Figure 19 (the still customer distribution on Sunday, 6th January, 2013), there are 8 popular regions in total in the area (marked with black circles).

We analyzed the layout of the shopping mall on that day and matched each popular area with a possible reason, as shown in Figure 19. To verify our matches, we took the facing angle into consideration. As shown in Figure 20, we calculated the average facing angle of the still customers in each grid cell for the whole day. The result in Figure 20 helps us verify some of the matches. The pamphlets and TV area is indeed popular because of the pamphlets and TV, because the still people there are facing the wall that has the pamphlets and TV on it. Similarly, the information boards area, shop front area, information desk area and bench area are indeed popular because of these functions: the facing angles point toward the information boards, shop front and information desk, and match the sitting direction of the bench. For the shopping event match, although we cannot verify it by facing angle, the analysis in the next section will help us verify its correctness.

6. Children Distribution Analysis
Now, let's take the z axis into consideration. We treat people under 1.2 meters tall (in the z axis) as children and select them to study their distribution, similar to what we did in Figure 12. As shown in Figure 21, the change in the distribution of children tends to be larger than that of all pedestrians. In other words, on Wednesday there are nearly no children in the shopping mall, while on weekends the number of children increases significantly. This matches common sense. We will further analyze the detailed difference in the number of children in the next section.

Figure 20 Still Customer Facing

Figure 21 Children Distribution

From Figure 21, apart from the larger fluctuation between weekdays and weekends, the children distribution data also shows high concentration on weekends. Many children spend a lot of time in the bench area and the shop goods display area (circled in red in Figure 21). We suggest that the shopping mall assign more security guards to protect these children and run children-themed promotions in these two areas.

iii. Pedestrian Flow Rate Analysis

1. Analysis of Pedestrian tracking data

A. Weekdays and Holidays flow rate comparison

The raw data obtained from the original mobile social robots research contains columns of information including time, id numbers, x_position, y_position, height, velocity, motion angle, and facing angle. Note that an id number is assigned to each person when they first enter the region sensed by the 3D sensors, and tracking continues until the person exits the area [1]. Also, the time was measured in milliseconds. To simplify the plotting load, the data files were processed with Python to add an extra column identifying the time slot (Figure 22); this divides each day into forty time slots.

Then, we used Apache Pig to process the data and obtain the pedestrian flow rate of a single day with respect to time. Apache Pig is a platform for analyzing large data sets. It produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist, and this gives us high efficiency in processing the data [11]. The procedure is as follows:

1) Load the data into Apache Pig
2) Group the data by time-slot identifier
3) Count the ids in each time-slot group
4) Generate the data for time-slot identifier vs. number of ids

This yields the time vs. number of persons for the whole space in the mall. We repeated the Pig procedure for a set of Wednesday datasets, and then for Sundays for comparison.

Figure 23 Comparison of pedestrian flow rate on Wednesday and Sunday; Data includes: Wednesdays (20121024; 20121031; 20121107; 20121114; 20121121); Sundays (20121028; 20121104; 20121111; 20121118; 20121125)

The flow density of the whole space (Figure 23) on Wednesday increases during the morning and remains at a lower level for most of the daytime. In comparison, on Sundays the trend follows a parabolic pattern: the number of persons in the mall increases more rapidly during the morning and peaks in the afternoon from 14:00 to 16:00. The result shows that in general more pedestrians were tracked on Sundays than on Wednesdays, which is consistent with our own experience. In addition, we observed that the difference in pedestrian density between Sunday and Wednesday is more apparent in the afternoon period, roughly 12:00-18:00. This gives the shops in the mall a clue to expect more customers during this period. For example, we suggest that the shops schedule more service staff on afternoon shifts to meet the expected increase in demand.
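The flow rate procedure in this section (add a time-slot identifier, group by it, count ids) can be sketched in plain Python. This is not the Pig script used in the project. It assumes that the forty slots evenly divide the 9:40-20:20 recording window (16 minutes each), that timestamps are unix seconds interpreted in Japan Standard Time, and that "count ids" means counting distinct person ids per slot; none of these details is stated in the report, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))   # assumed local time zone of the mall
DAY_START_MIN = 9 * 60 + 40          # 9:40
DAY_END_MIN = 20 * 60 + 20           # 20:20
NUM_SLOTS = 40
SLOT_MIN = (DAY_END_MIN - DAY_START_MIN) / NUM_SLOTS  # 16 minutes per slot


def time_slot(unix_time_s):
    """Return the slot index 0..39, or None outside the recording window."""
    t = datetime.fromtimestamp(unix_time_s, tz=JST)
    minutes = t.hour * 60 + t.minute + t.second / 60
    if not DAY_START_MIN <= minutes < DAY_END_MIN:
        return None
    return int((minutes - DAY_START_MIN) // SLOT_MIN)


def persons_per_slot(rows):
    """rows: iterable of (unix_time_s, person_id) -> {slot: distinct id count}."""
    ids = defaultdict(set)
    for unix_time_s, person_id in rows:
        slot = time_slot(unix_time_s)
        if slot is not None:
            ids[slot].add(person_id)
    return {slot: len(members) for slot, members in sorted(ids.items())}


# Made-up rows around 9:40 on 2012-10-24 (one of the Wednesdays in Figure 23).
base = datetime(2012, 10, 24, 9, 40, tzinfo=JST).timestamp()
demo = [(base, 1), (base + 60, 1), (base + 120, 2), (base + 16 * 60, 3), (base - 60, 4)]
flow = persons_per_slot(demo)
```

Counting distinct ids rather than rows keeps a person who lingers in the sensed area from being counted once per sensor reading; if row counts were intended instead, the sets would be replaced by simple counters.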
Based on the analysis we did in Section V, we found some interesting phenomena:
1. There are more people on weekends but mainly in the
2. Some areas are popular only when an event happens (which is a waste of space when there is no event)
Columbia University E6893 Big Data Analytics Fall 2015 Final Report
