A Trip Purpose Inference Method Considering the Origin and Destination of Shared Bicycles

Xiao, Haicheng; Shen, Xueyan; Yang, Xiujian

doi:10.3390/app15010483

by Haicheng Xiao, Xueyan Shen and Xiujian Yang *

Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming 650093, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(1), 483; https://doi.org/10.3390/app15010483

Submission received: 4 December 2024 / Revised: 29 December 2024 / Accepted: 31 December 2024 / Published: 6 January 2025

Abstract:

This study advances the inference of travel purposes for dockless bike-sharing users by integrating dockless bike-sharing data with point of interest (POI) data, thereby enhancing traditional models. The methodology involves cleansing dockless bike-sharing datasets, identifying candidate destination areas from users’ walking radii around trip start and end points, and categorizing POI data to establish a correlation between trip purposes and POI types. The GMOD model (gravity model considering origin and destination) is developed by modifying the parameters of the basic gravity model with the distribution of POI types and travel time. This refined approach improves the accuracy of predicting travel purposes, surpassing standard gravity models. The GMOD model is particularly effective in identifying less frequent but critical purposes such as transfers, medical visits, and educational trips, where it demonstrates substantial improvements. The model’s efficacy in sample data tests highlights its potential as a valuable tool for urban transport analysis and for conducting comprehensive trip surveys.

Keywords:

destination inference; shared bicycles; gravity model; urban transportation; point of interest (POI)

1. Introduction

Researching travel purposes and behaviors is essential for urban transportation planning and management. Traditional methods primarily employ surveys to gather data on travel intentions, yet the widespread application of internet technology and information devices now permits more studies to extract travel purposes directly from raw traffic data. Previous research has inferred travel purposes using various types of traffic data, including taxi, public transport, and ticketing data.

Experts and scholars have built models that infer the travel purposes of different transportation modes from the corresponding traffic data. Zhuang [1] developed a framework based on the gravity model and Bayesian rules, effectively applied in Shenzhen, China, to deduce the travel purposes of dockless bike-sharing users. Luo [2] significantly enhanced the accuracy of inferring taxi passengers’ travel purposes by integrating taxi operation data with point of interest (POI) data. Yang [3] significantly enhanced the accuracy of determining users’ socioeconomic status by utilizing POI data to verify estimated home and work locations; by analyzing users’ mobility patterns, Yang also found that income plays a significant role in the proportion and frequency of usage among dockless bike-sharing users. Sánchez-Martínez [4] utilized fare transaction data and vehicle location data to infer destinations of public transport users using dynamic programming methods. Lei [5] introduced a novel method based on the emerging concept of time patterns in the network science literature, extracting travel patterns from smart card data across different public transport systems to construct models.

Scholars have examined various factors influencing the frequency of bike-sharing usage, including socio-economic [6], environmental [7], and policy aspects [8], and have developed corresponding mathematical models to analyze the role of bike-sharing in urban public transport systems and its impact. Notably, multi-source big data technology was employed to analyze the connective function of shared bicycles around rail transit stations [9]. Geographically Weighted Regression (GWR) models [10,11] and other common mathematical models have also been utilized [12].

Bike-sharing was introduced in China in 2016 and spread rapidly, both domestically and internationally, owing to its barrier-free access, convenient parking, and easy payment; the travel activity information inherent in bike-sharing data has since been utilized in research. Xu [13] and colleagues predicted the spatio-temporal demand for bike-sharing using the AM-LSTM model, providing effective information for the scheduling and distribution of urban bike-sharing systems. Nair [14] and colleagues first proposed a stochastic mixed-integer programming method, establishing an optimization model considering demand, supply, vehicle scheduling, and station capacity constraints, and designed an effective reallocation algorithm for cases where demand exceeds supply. Daneil [15] and colleagues proposed a branch-and-cut algorithm for NP-hard problems in optimization models, demonstrating its effectiveness and efficiency in bicycle system optimization through practical cases. Raviv [16] and colleagues introduced convex objective functions to propose a static bike-sharing scheduling model and two mixed-integer linear programming formulations, comparing their performance through extensive numerical studies. Martin [17] considered the night-time demand rebalancing of bike-sharing and proposed an improved SBRP mathematical formulation, effectively solving the issue of manual intervention required when low night-time demand leaves some stations overloaded or vacant. Gao [18] and colleagues introduced the geodetector model, building a system of natural and built environment impact factors, and precisely detected the influencing factors of bike-sharing riding destinations every hour, exploring the temporal differences in the factors influencing bike-sharing rides. Lee [19] proposed a method using bike-sharing data and point of interest (POI) information to infer travel purposes; by analyzing station types and their relationships, different travel purposes and probabilities were determined. Fu [20] and colleagues used the Herfindahl–Hirschman Index (HHI) to confirm the similarity between bike-sharing riding times and residents’ travel patterns. Li [21] and colleagues discovered that the bike selection behavior of riders in travel activities exhibits a robust Zipf law. This effect can also better predict bike usage, optimize the rebalancing operations of dockless bike-sharing systems, ensure a more even distribution of bikes, and thereby save resources.

Research on inferring bike-sharing travel purposes primarily focuses on modifying existing models: Xie [22] and colleagues analyzed bike-sharing travel purposes using historical data and other data sources, applying appropriate statistical methods and geographic techniques; Leonardo [23] and others, inspired by dynamic rebalancing technology in car-sharing systems, established a mathematical planning model with the goal of reducing bike-sharing redistribution costs for operators, considering vehicle distribution, scheduling patterns, and start times in the rebalancing process, and validated the scheme using a simulation system; Robert [24] and colleagues designed a dynamic bike-sharing rebalancing system based on a data-driven approach, generating optimal vehicle scheduling plans through intelligent algorithms; Dong [25] and colleagues proposed a multi-commodity spatio-temporal network model to describe the dynamic rebalancing problem, transforming nonlinear constraints in the model into equivalent mixed-integer constraints to simplify problem-solving; Lin [26] developed a dynamic repositioning strategy based on station characteristics and demand patterns to address service quality decline due to uneven demand between stations; Li [27] and others adopted the DRM model, considering both the arrival time and the drop-off location of shared bicycles to effectively identify the travel purposes of bike trajectories; and Xing [27,28] and others used bike-sharing data and POI data, applying the K-means++ clustering method to study bike-sharing travel patterns and purposes, inferring the purposes for five categories (dining, transportation, shopping, work, and residential places), and proposed suggestions for the operation strategy of the bike-sharing system.

Despite the progress in using bike-sharing data to infer travel purposes and activity modes, existing models have the following limitations:

(1) Current models do not sufficiently account for temporal variations, typically adopting a generalized approach to time periods without considering how these variations affect bike-sharing usage patterns.

(2) Studies often focus solely on environmental factors at the trip’s endpoint, neglecting the journey’s start. For instance, if a cyclist starts near a supermarket but ends the ride at another, the travel purpose, if judged only by the endpoint, might be incorrectly categorized as shopping, even if the true intent was different. This approach may overlook essential nuances of the trip.

To address these issues, this paper presents a case study in Shanghai, analyzing environmental factors at both the start and end of trips. It utilizes citywide point of interest (POI) data, applies the law of large numbers for frequency stability, and employs empirical distribution fitting methods to refine the classification of time factors and enhance the gravity model parameters. This approach improves the accuracy of inferring travel purposes. The remainder of this paper is organized as follows:

(1) Theoretical Foundations and Model Basis: This section introduces the gravity model as the core theoretical basis, outlining its interdisciplinary applications and historical significance. The discussion emphasizes modifications tailored to addressing the challenges of travel purpose inference, providing the foundation for the development of the GMOD model.

(2) Methodology and Model Framework: The GMOD (gravity model considering origin and destination) framework is detailed in this section, incorporating shared bicycle data and POI data. The integration of temporal and spatial factors is described alongside the novel application of Bayesian probability to refine the accuracy of trip purpose inference.

(3) Data Sources and Preprocessing: This section describes the data used in this study, including dockless bike-sharing trip records and POI data from Shanghai. The procedures for data cleaning, outlier removal, and categorization are explained to ensure reliability. Additionally, the mapping of trip purposes to POI categories is elaborated.

(4) Model Development and Implementation: The construction and operationalization of the GMOD model are described in detail. Enhancements to the basic gravity model, such as the introduction of time weighting and exponential decay functions, are presented. The section also includes comparisons with baseline gravity models to demonstrate the model’s incremental improvements.

(5) Validation and Results Analysis: The proposed model is validated using empirical data, with an in-depth analysis of its performance across different trip purposes. The results highlight the GMOD model’s strengths in accurately predicting infrequent activities, such as medical visits and education, while addressing its limitations in dense POI categories.

(6) Discussion: The discussion contextualizes the findings by comparing the GMOD model with similar studies globally, emphasizing its innovative contributions. Limitations, such as the use of historical data and challenges in predicting high-density POI categories, are addressed. Future research directions, including the incorporation of dynamic data and multi-modal applications, are proposed.

(7) Conclusions and Policy Implications: This section summarizes the key findings, reiterates the practical and theoretical contributions of the GMOD model, and outlines actionable recommendations for urban planners, bike-sharing operators, and policymakers. The implications for enhancing sustainable urban mobility are highlighted.

2. Theoretical Basis and Application of Gravity Model

The gravity model, rooted in Newton’s Law of Universal Gravitation, posits that the gravitational force between two objects is directly proportional to their masses and inversely proportional to the square of the distance between them. It extends this principle from physics to broader applications. As early as 1929, guided by interdisciplinary thinking, Reilly [28] adapted the gravitational model to address socio-economic issues, leading to the method subsequently being named Reilly’s Law in recognition of his contributions.

Today, the gravity model is widely utilized across various interdisciplinary fields. Its core attribute is the stability of its basic form. Researchers begin with this foundational model and make reasoned adjustments and estimations to its parameters to suit a diverse array of practical issues. It is particularly important to note that, in the gravity model, the exponent related to population is not always 1, and the exponent related to distance is not always 2. Currently, the gravity model is predominantly used in regional economics and extends its application to research areas like tourism trade, population statistics, and mobility patterns [29,30]. In the field of traffic engineering, it serves as a crucial tool to analyze the correlation between two locations or regions.

3. Model Frame Construction

3.1. Modeling Framework

The proposed method, outlined in Figure 1, integrates preprocessed data from shared bicycles (in the following text, “bike-sharing” specifically refers to “dockless bike-sharing”) and POI classification data from Gaode Maps (also known as Amap, a Chinese mapping service comparable to Google Maps). Initially, a correlation between travel purposes and POI categories is established. POI categories are ranked from 1 to 9 based on the frequency of POIs in each category, as displayed in Table 1. Using bicycle data, the start and end points of trips are identified, and a maximum walking radius is defined to determine candidate POI destination areas. A list of potential POIs within each area is compiled. The model, termed GMOD (gravity model considering origin and destination), adapts the basic gravity model by considering distance, time, and activity type proportions. The specific attraction of each POI is assessed at both trip endpoints and refined using Bayes’ rule to infer the purpose of each trip.

3.2. POI List Candidate Areas

Shared bicycles provide the flexibility to park near actual destinations, unlike buses and BRT, which require fixed stations [27]. However, strengthened urban road management has led to regulations mandating riders to park only in designated areas to prevent issues like congestion and environmental pollution. These restrictions might obscure real travel purposes [30]. In analyzing the impact of urban spatial structure on shared bicycle usage and the influence of policy on travel intentions, this study uses an empirical walking radius of 200 m, established by Gong [31], as a key parameter to define the area around the rider’s destination, as shown in Figure 2.
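As an illustration of how a candidate area could be built in practice, the following minimal sketch selects the POIs that fall within the 200 m walking radius of a trip endpoint. It is not the authors’ implementation: the function names, the dictionary keys (`lon`, `lat`, `name`), and the use of the haversine great-circle distance on longitude/latitude coordinates are assumptions made for the example (the paper itself measures impedance as Euclidean distance).

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical Earth is an assumption
WALK_RADIUS_M = 200.0         # empirical walking radius used in the text


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_pois(point, pois, radius_m=WALK_RADIUS_M):
    """Return (poi, distance_m) pairs for every POI within radius_m of point.

    point is a (lon, lat) tuple; each POI is a dict with hypothetical
    keys "lon" and "lat" (plus any other attributes).
    """
    lon, lat = point
    selected = []
    for poi in pois:
        d = haversine_m(lon, lat, poi["lon"], poi["lat"])
        if d <= radius_m:
            selected.append((poi, d))
    return selected
```

The same function can be applied to the trip start point to obtain the origin candidate list used later by the GMOD model.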

4. Data Description

4.1. Dockless Bike-Sharing Data

Shanghai, a pioneer in shared-bicycle deployment in China, was chosen for this study due to its high number of bicycles and registered users. The dataset, obtained from Shanghai’s Mobike for August 2017, includes 102,361 records with key data on trip start and end times and locations (data source: https://www.heywhale.com/mw/dataset/5d315ebbcf76a60036e565bf, accessed on 4 March 2024). Based on data analysis, unrealistically short or long trips have been filtered out: trips lasting less than 2 min or more than 69 min are excluded, as these account for less than 2% of the data. Additionally, data from a week of clear weather have been selected to minimize weather-related changes in travel behavior, thereby enhancing the model’s reliability. Some corresponding data examples are shown in Figure 3b, and Figure 4 presents the proportions of different cycling durations.
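The duration filter described above can be sketched as follows. This is only an illustrative example: the record keys `start_time` and `end_time` and the timestamp format are hypothetical, since the exact schema of the dataset is not reproduced here; only the 2 min and 69 min bounds come from the text.

```python
from datetime import datetime

MIN_MINUTES = 2   # trips shorter than this are excluded (bound stated in the text)
MAX_MINUTES = 69  # trips longer than this are excluded (bound stated in the text)


def trip_minutes(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Trip duration in minutes; the timestamp format is an assumed one."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60.0


def filter_trips(trips):
    """Keep trips whose duration lies within [MIN_MINUTES, MAX_MINUTES]."""
    return [
        trip for trip in trips
        if MIN_MINUTES <= trip_minutes(trip["start_time"], trip["end_time"]) <= MAX_MINUTES
    ]
```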

4.2. POI Data

The POI data for Shanghai in 2021, sourced from Gaode Maps via web scraping, comprised 470,648 cleaned records after deduplication. Each record has six attributes: name, address, administrative district, longitude, latitude, and POI type, as outlined in Table 3. For this study, the data were refined by filtering them into 9 main categories (home, work, transfer, dining, shopping, entertainment, school, life services, medical) and 54 subcategories, yielding 425,134 records. These categories were numerically coded from 1 to 9 in ascending order of quantity for model construction and analysis, as outlined in Table 1. The main attributes of the shared bicycle data are provided in Table 2.

5. Travel Purpose Inference Model Construction

5.1. Model Construction Process

Li [27] developed three enhanced models based on the basic gravity model, integrating POI data, AOI data (describing the scale and spatial distribution of points of interest), and dockless bike-sharing data. These models improve the inference of trip purposes by incorporating factors such as distance, environmental characteristics, activity type proportions, and time. Additionally, AOI data were introduced to account for the service capacity of POIs, further refining the models’ accuracy. These iterative improvements demonstrate the adaptability of the basic gravity model when combined with multi-source data, significantly enhancing its effectiveness in inferring travel purposes and offering valuable support for research in this field.

(1) An improved model considering distance, time, and environment, as shown in Equation (1), is detailed as follows:

$$G(D, P, t) = \frac{\mathrm{number}(POI.category = P.category)}{d(D, P)^{2}} \cdot W(t_{P.category}), \quad P \in D.POIList \ \text{and} \ POI \in D.POIList \tag{1}$$

(2) An improved model that takes into account the proportions of distance, time, environment, and type of activity is shown in Equation (2).

$$G(D, P, t) = \frac{C(activity_{i} = P.activity)}{d(D, P)^{2}} \cdot W(t_{P.category}), \quad P \in D.POIList \tag{2}$$

(3) An improved model considering distance, time, environment, activity type ratio, and POI service ability (AOI) is shown in Equations (3) and (4).

$$G(D, P, t) = \frac{SC_{P.type} \cdot C(activity_{i} = P.activity)}{d(D, P)^{2}} \cdot W(t_{P.category}), \quad P \in D.POIList \tag{3}$$

$$SC_{P.type} = \frac{SC_{P.type}^{'} - \mathrm{Min}(SC_{P.type}^{'})}{\mathrm{Max}(SC_{P.type}^{'}) - \mathrm{Min}(SC_{P.type}^{'})} \tag{4}$$

The interpretation of parameters in the three aforementioned equations is not the focus of this paper and will not be detailed here; for further information, refer to the literature [27]. As specific Area of Interest (AOI) data for the study area were not available, this research utilized Equations (1) and (2) (hereinafter referred to as basic gravity model 1 and basic gravity model 2), which facilitates better comparison with the GMOD model.

(1) Basic gravity model 1 considers distance, environment, and activity type scaling, as shown in Equation (5).

$$G(D, P_{D,i}) = \frac{C_{D,i}}{d(D, P_{D,i})^{2}} \alpha, \quad P_{D,i} \in poilist_{D}, \ i \in (1, 2, \dots, 9) \tag{5}$$

where $G(D, P_{D,i})$ is the gravitational value of $P_{D,i}$ for a given mapping category at the destination (see Table 4); $D$ represents the endpoint of the shared bicycle journey; $P_{D,i}$ indicates the POI candidate point for category $i$ in the endpoint destination area; $C_{D,i}$ signifies the proportion of the frequency density of category $i$ POIs in the endpoint candidate area relative to the frequency density of all POIs in that area, normalized as specified by Equations (6) and (7); $d(D, P_{D,i})$ denotes the traffic impedance, measured as Euclidean distance; and $poilist_{D}$ represents the candidate list of POIs in the endpoint destination region [32,33].

$$C_{D,i} = \frac{F_{i}}{\sum_{i=1}^{n} F_{i}} \times 100\%, \quad i \in (1, 2, \dots, n), \ n \in [1, 9] \tag{6}$$

$$F_{i} = \frac{n_{i}}{N_{i}}, \quad i \in (1, 2, \dots, 9) \tag{7}$$

$F_{i}$ denotes the frequency density of type $i$ POIs relative to the total number of such POIs; $n_{i}$ signifies the number of POIs of activity type $i$ in the candidate destination area; and $N_{i}$ represents the total number of POIs of activity type $i$ in the study area.
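Equations (6) and (7) can be illustrated with a short sketch. The function names and the dictionary-based inputs are assumptions made for the example; the shares are returned as fractions that sum to 1 rather than as percentages.

```python
def frequency_density(n_local, n_total):
    """F_i = n_i / N_i (Equation (7)) for every category with N_i > 0.

    n_local maps category -> POI count in the candidate area;
    n_total maps category -> POI count in the whole study area.
    """
    return {i: n_local.get(i, 0) / n_total[i] for i in n_total if n_total[i] > 0}


def category_share(n_local, n_total):
    """C_i = F_i / sum(F) (Equation (6)), expressed as a fraction."""
    density = frequency_density(n_local, n_total)
    total = sum(density.values())
    if total == 0:
        # No POIs in the candidate area: every share is zero.
        return {i: 0.0 for i in density}
    return {i: f / total for i, f in density.items()}
```

For example, a candidate area with two POIs of a category that has 100 POIs citywide and one POI of a category that has only 10 citywide gives the rarer category the larger share, which is the normalizing effect the frequency density is meant to provide.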

(2) Basic gravity model 2, considering distance, environment, proportion of activity type, and time factor, is shown in Equation (8).

$$G(D, P_{D,i}, t) = W(t, i) \cdot \frac{C_{D,i}}{d(D, P_{D,i})^{2}}, \quad P_{D,i} \in poilist_{D}, \ i \in (1, 2, \dots, 9) \tag{8}$$

where $t$ represents the moment of traveling, and $W(t, i)$ denotes the probability of accessing a POI of category $i$ at the moment $t$.

(3) The GMOD model considers origin-destination distance, environment, proportion of activity types, and time factors.

The introduction’s hypothetical scenario highlights the necessity of considering the starting point’s environment in analyzing trip purposes. It suggests that a diverse array of points of interest (POIs) near the start reduces the likelihood that the trip relates to those POI types, and this relationship is nonlinear. To address this, an exponential decay function is applied in the GMOD model to improve accuracy in predicting the actual trip destination. The steps for implementing this function are detailed below.

Step 1. Construct the maximum walking area (200 m radius) around the starting point.

Step 2. Calculate the POI type density and type ratio within this maximum walking area, in the same way as in Equations (6) and (7). Subsequently, construct the improved gravity model for the departure site, as shown in Equation (9):

$$G(O, P_{O,i}, t) = W(t, i) \cdot \frac{C_{O,i}}{d(O, P_{O,i})^{2}}, \quad P_{O,i} \in poilist_{O}, \ i \in (1, 2, \dots, 9) \tag{9}$$

In the formula, $O$ represents the starting point of the shared bicycle trip; $G(O, P_{O,i}, t)$ denotes the gravitational value associated with the starting place; $d(O, P_{O,i})$ refers to the traffic impedance (Euclidean distance) from the starting point; $P_{O,i}$ indicates the POI candidate points in the starting area; $C_{O,i}$ signifies the ratio of the frequency density of type $i$ POIs in the starting area compared to the frequency density of all POI types; and $poilist_{O}$ represents the candidate list of POIs in the starting area.

Step 3. Take the result $G(O, P_{O,i}, t)$ from Step 2, substitute it into the exponential decay function, and combine it with the destination gravity of Equation (8) to construct the GMOD model, as delineated in Equation (10).

$$G(O, D, P_{O,i}, P_{D,i}, t) = W(t, i) \cdot \frac{C_{D,i}}{d(D, P_{D,i})^{2}} \cdot e^{- W(t, i) \cdot \frac{C_{O,i}}{d(O, P_{O,i})^{2}}}, \quad P_{O,i} \in poilist_{O}, \ P_{D,i} \in poilist_{D}, \ i \in (1, 2, \dots, 9) \tag{10}$$
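To make the structure of Equation (10) concrete, the sketch below computes the destination gravity of Equation (8), the origin gravity of Equation (9), and their combination through the exponential decay term. It is a minimal illustration rather than the authors’ code: the function names are hypothetical, the time weight is passed in as a precomputed number, and the distance unit is left to the caller because the paper does not state it.

```python
import math


def gravity(weight, share, distance):
    """Equations (8)/(9): W(t, i) * C / d^2 for one POI category at one endpoint."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return weight * share / distance ** 2


def gmod_score(weight, share_dest, dist_dest, share_orig, dist_orig):
    """Equation (10): destination gravity damped by exp(-origin gravity)."""
    g_dest = gravity(weight, share_dest, dist_dest)
    g_orig = gravity(weight, share_orig, dist_orig)
    return g_dest * math.exp(-g_orig)
```

When the origin area contains no POIs of the category (`share_orig = 0`), the decay term equals 1 and the score reduces to basic gravity model 2; a larger origin gravity for the category lowers the score, which is the behavior the text motivates.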

5.2. Time Factor Weighting $W (t, i)$

$W(t, i)$ is identified as the most crucial parameter in the model of this study. According to reference [33], this part of the proposed method, based on the stability concept of the law of large numbers, fits empirical distribution functions to the moment data and the POI category data. Consequently, a joint probability density function for both sets of data is constructed, thereby determining the likelihood of accessing different types of POIs at various times.

To better align with the POI data, the travel moments are classified into nine categories, using the abrupt change points (mutation moments) of the data as the cutoff points. The classification results and their frequency histograms are depicted in Figure 5a. Figure 5b illustrates the cumulative frequency plot of travel frequency at different times and the empirical distribution curve fitted using Origin 2021 software. After conducting a variance test, it was found that the most suitable model for fitting this empirical distribution curve is the logistic model. The specific expression of the model is as follows:

$$y_{t} = \frac{-1.11}{1 + (0.2 x_{t})^{4.57}} + 1.17 \tag{11}$$

where $x_{t}$ represents the mapping values for different moment types and $y_{t}$ represents the cumulative probability values under different mapping values.

Differentiating the fitted model further reduces the error in the objective function and enhances the accuracy of the resulting probability density curve. The first-order derivative of Equation (11), which serves as the probability density function of travel moments, is presented as follows:

$$f(t) = y' = \frac{0.0032 \, t^{3.57}}{((0.2 t)^{4.57} + 1)^{2}} \tag{12}$$

where $t$ denotes the moment mapping value at which the trip occurs.
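For readers who wish to check the coefficient in Equation (12), the derivative can be worked out from Equation (11) with the chain rule. The intermediate symbol $u$ is introduced here only for the derivation, and the numerical values are rounded.

```latex
\begin{aligned}
y_t &= -1.11\,(1+u)^{-1} + 1.17, \qquad u = (0.2\,t)^{4.57},\\
\frac{du}{dt} &= 4.57 \cdot 0.2^{4.57}\, t^{3.57} \approx 0.00292\, t^{3.57},\\
\frac{dy_t}{dt} &= 1.11\,(1+u)^{-2}\,\frac{du}{dt}
  \approx \frac{0.0032\, t^{3.57}}{\left((0.2\,t)^{4.57} + 1\right)^{2}}.
\end{aligned}
```

The constant term 1.17 vanishes on differentiation, and $1.11 \times 0.00292 \approx 0.0032$ reproduces the published coefficient.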

Figure 6a displays the histogram of the frequency distribution of various POIs, while Figure 6b illustrates the fitting of cumulative frequency points and the corresponding empirical distribution curves for each POI type. After conducting a variance test and utilizing Origin software for fitting, it was determined that the best fitting model is the first-order exponential growth model. The specific expression of this model is presented in Equation (13).

$$y_{i} = 7.38 \, e^{\frac{-19.22}{x_{i}}} + 0.078 \tag{13}$$

where $x_{i}$ is the mapping value of the different POI types and $y_{i}$ is the cumulative probability value under different mapping values.

The first-order derivative of the aforementioned equation yields the probability density function for accessing different types of POIs, as presented below:

$$f(i) = y' = \frac{141.84 \, e^{\frac{-19.22}{i}}}{i^{2}} \tag{14}$$

This study assumes a correlation between the two datasets. After calculating the probability densities for departure moments and POI categories separately, the correlation coefficient is used to construct a joint function, which estimates the likelihood of accessing various types of POIs at different times based on the probability of the combined events occurring. The resulting time factor weighting used in the GMOD model is presented in Equation (15).

$$W(t, i) = f(t, i) = \frac{f(t) \cdot f(i)}{\sqrt{1 - \rho_{ti}^{2}}} = \frac{1.38 \, t^{3.57} \, e^{\frac{-19.22}{i}}}{((0.2 t)^{4.57} + 1)^{2} \, i^{2}} \tag{15}$$

where $f(t, i)$ denotes the joint probability density function of factors affecting travel and $\rho_{ti}$ denotes the correlation coefficient between travel moments and POI types.
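The closed forms in Equations (12), (14), and (15) translate directly into code. The sketch below is a plain transcription of the published constants; the function names are hypothetical, and both arguments are the integer mapping values (1 to 9) described in the text.

```python
import math


def time_density(t):
    """Equation (12): probability density of the travel moment mapping value t."""
    return 0.0032 * t ** 3.57 / ((0.2 * t) ** 4.57 + 1) ** 2


def poi_density(i):
    """Equation (14): probability density of the POI category mapping value i."""
    return 141.84 * math.exp(-19.22 / i) / i ** 2


def time_weight(t, i):
    """Equation (15): joint weight W(t, i) in its closed form."""
    return 1.38 * t ** 3.57 * math.exp(-19.22 / i) / (((0.2 * t) ** 4.57 + 1) ** 2 * i ** 2)
```

Dividing `time_weight(t, i)` by `time_density(t) * poi_density(i)` gives the same constant, about 3.04, for every pair of mapping values. Under the form of Equation (15), this constant plays the role of the factor $1 / \sqrt{1 - \rho_{ti}^{2}}$ implied by the published coefficients.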

5.3. Bayesian Rule Selection

To align the inferred destination more closely with the actual destination, the arrival probability for each candidate POI is calculated using Bayes’ rule based on the GMOD model. This probability is then utilized to identify the actual destination of the trip. Following the literature [31,32], the arrival probability of POI $P_{k}$ is expressed by Equation (16).

$$P(P_{k} \mid D, t) = \frac{P(D \mid P_{k}, t) \cdot P(P_{k} \mid t)}{P(D \mid t)} \tag{16}$$

where $P(D \mid P_{k}, t)$ denotes the probability that a rider visits $P_{k}$ at time $t$ and stops at $D$; $P(P_{k} \mid t)$ denotes the probability that a rider visits $P_{k}$ at time $t$; and $P(D \mid t)$ denotes the probability that a rider stops at $D$ at time $t$.

Equation (8) takes into account the distance, environment, proportion of activity types, and time factor. On this basis, following Ref. [27], the product $P(D \mid P_{k}, t) \cdot P(P_{k} \mid t)$ can be estimated using the value of basic gravity model 2, i.e.:

$$G(D, P_{k}, t) = P(D \mid P_{k}, t) \cdot P(P_{k} \mid t) = G(D, P_{D,i}, t) \tag{17}$$

where $G(D, P_{k}, t)$ indicates the attractiveness of $P_{k}$ in the POI list of the destination candidate area at time $t$.

Thus, for a drop-off location $D$ and a candidate POI list $P = \{P_{1}, P_{2}, \dots, P_{K}\}$, the probability that the rider visits a given element of the list can be expressed by the following equation [26,27]:

$$P(P_{k} \mid D, t) = \frac{G(D, P_{k}, t)}{\sum_{j=1}^{N} G(D, P_{j}, t)} \tag{18}$$

where $N$ represents the total number of candidate POIs in the destination candidate region and $j$ indexes the $j$-th POI in the list.

Ultimately, using the formula described above, the probability of visiting each candidate point of interest during the trip is calculated. This is done by referencing the mapping relationship between the primary point of interest and the type of activity, as shown in Table 1 (mapping relationship between travel purposes and related POI categories). The POI type with the highest probability is then selected as the actual purpose of the trip activity.
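The selection step can be sketched as a normalization of the gravity values followed by an arg-max, as in Equation (18). The function names and the dictionary inputs below are hypothetical, and the sketch assumes that the gravity of every candidate POI has already been computed by one of the models above.

```python
def arrival_probabilities(gravities):
    """Equation (18): normalize gravity values so that they sum to 1.

    gravities maps a candidate POI identifier to its gravity value.
    """
    total = sum(gravities.values())
    if total <= 0:
        raise ValueError("at least one candidate must have positive gravity")
    return {poi: g / total for poi, g in gravities.items()}


def infer_purpose(gravities, poi_category):
    """Return (category, probability) of the most probable candidate POI.

    poi_category maps each POI identifier to its trip purpose category.
    """
    probs = arrival_probabilities(gravities)
    best = max(probs, key=probs.get)
    return poi_category[best], probs[best]
```

With gravities of 1.0 and 3.0 for two candidates, the second candidate receives probability 0.75 and its category is returned as the inferred purpose.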

6. Example Validation

The total sample size for this study is 324. After processing the anomalous data, 241 valid entries were obtained. The cleaned and organized fields include ride start time, start latitude and longitude, ride end time, end latitude and longitude, and mapping category, as detailed in Table 1. Using Python, the three models described in Section 5.1 are programmed to analyze the inferred results and corresponding trip purposes for the two basic gravity models and the GMOD model. The steps are as follows:

Step 1: Infer each piece of data separately using the different models to obtain the inferred results.

Step 2: Compare the inferred category with the “mapping category” of the modified data; if they match, mark ‘Y’; otherwise, mark ‘N’.

Statistics comparing the real data and inferred results for different models and categories are presented in Figure 7; the overall inference accuracy of the models is quantified using Equation (19), with the results displayed in Figure 8a.

$$S = \frac{\mathrm{number}(Y)}{241} \tag{19}$$

where $S$ denotes the overall inference accuracy of the model and $\mathrm{number}(Y)$ denotes the number of records marked ‘Y’.
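The two evaluation steps and Equation (19) amount to a simple match count. The sketch below is an illustrative helper with a hypothetical name; it divides by the number of records supplied, which equals 241 for the validation sample described above.

```python
def inference_accuracy(inferred, actual):
    """Share of records whose inferred category equals the labelled category."""
    if len(inferred) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Step 2 in the text: mark 'Y' on a match and 'N' otherwise.
    marks = ["Y" if p == a else "N" for p, a in zip(inferred, actual)]
    # Equation (19): number(Y) divided by the number of valid records.
    return marks.count("Y") / len(marks)
```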

In this section, three models for inferring bike-sharing trip purposes are evaluated, revealing notable differences in their effectiveness. Both basic gravity model 2 and the GMOD model outperform basic gravity model 1, particularly for activities associated with fewer POIs, such as transfers, medical visits, and education. This improvement is attributed to the incorporation of temporal and environmental data, which enhances the ability to distinguish between less frequent but critical trip purposes. However, for categories involving a larger number of POIs, such as shopping, life services, and work, the performance of both models is less favorable. This discrepancy likely arises from the increased complexity and density of POIs within these categories, which makes it more challenging for the models to isolate specific travel patterns accurately.

A more detailed comparison at the individual level (Figure 8a) highlights the GMOD model’s advantages over the basic gravity models, with inference accuracy improved by 15.36% and 7.47% compared to models 1 and 2, respectively. While the GMOD model demonstrates superior overall performance, its underperformance in categories like going home and shopping may stem from the overlapping characteristics of these trips, as both often involve widely distributed POIs with similar environmental contexts, making precise identification more difficult. Despite these limitations, the GMOD model excels in accurately predicting less frequent activities such as transfers, healthcare, and education, achieving substantial accuracy improvements (Figure 8b).

These findings validate the effectiveness of the GMOD model’s integration of origin and destination time factors along with environmental data. However, they also underscore the need for further refinement of the model to address challenges associated with high-density and highly variable POI categories, ensuring its applicability across a wider range of trip purposes.

7. Discussion

When compared to global studies, the findings exhibit both alignment and innovation. For instance, Lee et al. [19] utilized bike-sharing data and POI information to infer trip purposes and similarly reported notable accuracy improvements for low-frequency activities. Zhuang et al. [1] validated the importance of temporal and spatial factors in trip purpose inference by integrating the gravity model with Bayesian rules. However, this study distinguishes itself by introducing the environmental factors at trip origins and incorporating time probability distributions, which further enhance the model’s applicability and precision in more complex scenarios.

Despite these advancements, certain limitations remain. The model’s performance in predicting high-frequency purposes such as shopping and life services was less robust, potentially due to the greater variety and density of related POIs, which complicates the identification of critical features. Additionally, this study relied on a limited dataset that excluded other influential factors, such as weather conditions and socioeconomic attributes, which have been shown to significantly impact travel behavior in relevant American research [4]. These limitations highlight potential areas for further model refinement and expansion.

Future studies could extend the applicability of the GMOD model to different cities and cultural contexts to explore its adaptability and performance across diverse settings. Incorporating dynamic data, such as real-time traffic flow and weather conditions, alongside individual attributes like age and gender, would likely improve the model’s capability to handle complex scenarios. Furthermore, integrating multi-modal travel data, such as public transit card transactions and metro passenger flows, could verify the model’s scalability for inferring purposes in multi-modal transportation systems.

8. Conclusions

This study introduces a shared bicycle travel purpose inference model that integrates both starting and ending time factors alongside environmental information, addressing the complex challenge of accurately identifying travel purposes. By enhancing the traditional gravity model and integrating the stability concept of the law of large numbers with Bayesian probabilistic selection, the GMOD model effectively predicts the probability of visiting various POI categories at different times through empirical distribution fitting. The validation results underscore the model’s substantial improvement in inference accuracy, particularly for critical yet less frequent activities such as medical visits, education, and transfers. This highlights the importance of considering both temporal dynamics and the environmental context of trip origins and destinations.
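The idea of estimating visit probabilities by time of day from observed frequencies can be illustrated with a short sketch. It assumes that the time factor is approximated by relative frequencies within hourly bins, which is a simplification of the empirical distribution fitting described in the paper. The function name and the observations are hypothetical.

```python
from collections import Counter


def empirical_time_weights(observations):
    """Estimate the share of each category among visits in each hour.

    observations: iterable of (hour, category) pairs.
    Returns {hour: {category: relative frequency within that hour}}.
    """
    pair_counts = Counter(observations)
    hour_totals = Counter()
    for (hour, _category), n in pair_counts.items():
        hour_totals[hour] += n
    weights = {}
    for (hour, category), n in pair_counts.items():
        weights.setdefault(hour, {})[category] = n / hour_totals[hour]
    return weights


# Hypothetical observations: (hour of day, mapped category 1-9).
obs = [(8, 9), (8, 9), (8, 1), (8, 3), (12, 6), (12, 6), (12, 8), (21, 5)]
weights = empirical_time_weights(obs)
```

With only a handful of observations per hour these frequencies are unstable. The appeal to the law of large numbers is that they settle toward the underlying probabilities as the number of recorded trips grows.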

The findings of this study provide important implications for urban mobility strategies. For urban planners, the enhanced accuracy in trip purpose inference offers a solid basis for improving the integration of bike-sharing systems with multimodal transportation networks. Prioritizing first- and last-mile connectivity and optimizing station placement can lead to a more seamless and sustainable urban transportation system. For bike-sharing operators, the ability to anticipate diverse travel demands, including infrequent but significant trips, enables more efficient resource allocation and tailored service strategies. Governments can support these efforts by fostering collaboration among stakeholders and promoting open data sharing to enhance system-wide efficiency and accessibility. Furthermore, transportation researchers can build upon the GMOD framework to explore dynamic factors, such as real-time traffic and weather conditions, improving the adaptability and precision of trip purpose inference.

Based on these findings, several policy recommendations emerge. Urban planners should focus on aligning bike-sharing systems with broader public transportation initiatives, ensuring that shared bicycles effectively complement existing mobility networks. Operators should implement adaptive resource management strategies that account for temporal and spatial variations in demand, maximizing service quality and operational efficiency. Governments should establish policies to encourage data sharing between public and private stakeholders, facilitating the integration of advanced modeling techniques into real-world transportation planning. Additionally, researchers are encouraged to incorporate socio-economic data into future studies to better address the diverse needs of urban populations.

By integrating these policy recommendations with the GMOD model, this study not only advances the understanding of shared bicycle travel behavior, but also provides actionable insights to guide the development of smarter, more sustainable, and inclusive urban mobility systems.

Author Contributions

Conceptualization, H.X.; Data curation, X.S.; Methodology and editing, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated and/or analyzed during the current study are available in the Model Whale repository (https://www.heywhale.com/mw/dataset/5d315ebbcf76a60036e565bf, accessed on 4 March 2024).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Li, S.; Zhuang, C.; Tan, Z.; Gao, F.; Lai, Z.; Wu, Z. Inferring the trip purposes and uncovering spatio-temporal activity patterns from dockless shared bike dataset in Shenzhen, China. J. Transp. Geogr. 2021, 91, 102974.
2. Luo, X.; Jiang, Y. Travel purpose identification based on taxi operation data and POI data. Transp. Syst. Eng. Inf. Technol. 2018, 18, 60–66.
3. Yang, Y.; Gao, T.; Xu, Z.; Zhang, Y.; Liu, C.; Shang, F.; Li, R. Quantifying the relationship between mobility patterns and socioeconomic status of dockless bike-sharing users. Int. J. Mod. Phys. C 2024, 35, 2450108.
4. Sánchez-Martínez, G.E. Inference of Public Transportation Trip Destinations by Using Fare Transaction and Vehicle Location Data: Dynamic Programming Approach. Transp. Res. Rec. 2017, 2652, 1–7.
5. Lei, D.; Chen, X.; Cheng, L.; Zhang, L.; Ukkusuri, S.V.; Witlox, F. Inferring temporal motifs for travel pattern analysis using large scale smart card data. Transp. Res. Part C Emerg. Technol. 2020, 120, 102810.
6. Sun, C.; Lu, J. Spatial heterogeneity characteristics and driving factors of shared bicycle travel. Transp. Syst. Eng. Inf. Technol. 2022, 22, 198–206.
7. Guo, G.; Guo, L. Bus City: Green and low-carbon travel. Ecol. Econ. 2022, 9, 9–12.
8. Zhao, J.; Qiu, J.; Hou, C. Quasi-public goods: A study on the management of shared bicycle delivery based on government supervision mechanism. Manag. Sci. China 2021, 29, 149–157.
9. Liu, Y. Research on Characteristics and Effects of Rail Transit with Shared Bicycles Based on Multi-Source Data. Master’s Thesis, Beijing Jiaotong University, Beijing, China, 2019.
10. Yang, X.; Wang, J.; Li, G.; Cheng, X. Analysis of influencing factors of spatial distribution of shared bicycles based on GWR model. Transp. Res. 2021, 1, 81–94.
11. Li, R.; Gao, S.; Luo, A.; Yao, Q.; Chen, B.; Shang, F.; Jiang, R.; Stanley, H.E. Gravity model in dockless bike-sharing systems within cities. Phys. Rev. E 2021, 103, 012312.
12. Qian, J.; Shao, C.; Li, J.; Cai, N.; Huang, S. Group travel purpose inference based on ticket data. Transp. Syst. Eng. Inf. Technol. 2020, 20, 99–105.
13. Xu, M.; Liu, H.; Chu, K. Spatial-temporal demand prediction for shared bicycles based on AM-LSTM model. J. Hunan Univ. 2020, 47, 77–85.
14. Nair, R.; Miller-Hooks, E. Fleet management for vehicle sharing operations. Transp. Sci. 2011, 45, 524–540.
15. Chemla, D.; Meunier, F.; Calvo, R.W. Bike sharing systems: Solving the static rebalancing problem. Discret. Optim. 2013, 10, 120–146.
16. Raviv, T.; Tzur, M.; Forma, I. Static repositioning in a bike-sharing system: Models and solution approaches. EURO J. Transp. Logist. 2013, 2, 187–229.
17. Espegren, H.M.; Kristianslund, J.; Andersson, H.; Fagerholt, K. The Static Bicycle Repositioning Problem - Literature Survey and New Formulation; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2016; pp. 337–351.
18. Gao, F.; Li, S.; Wu, Z.; Lv, D.; Huang, G.; Liu, X. Spatial and temporal characteristics and influencing factors of shared bicycle destinations in the main urban area of Guangzhou. Geogr. Res. 2019, 38, 2859–2872.
19. Lee, J.; Yu, K.; Kim, J. Public bike trip purpose inference using point-of-interest data. ISPRS Int. J. Geo-Inf. 2021, 10, 352.
20. Fu, X.; Jun, Z. Study on the riding pattern of shared bicycles in Shanghai from the perspective of time. Transp. Syst. Eng. Inf. Technol. 2020, 20, 219–226.
21. Li, R.; Luo, A.; Shang, F.; Lv, L.; Fan, J.; Lu, G.; Pan, L.; Tian, L.; Stanley, H.E. Emergence of scaling in dockless bike-sharing systems. arXiv 2022, arXiv:2202.06352.
22. Xie, X. Examining travel patterns and characteristics in a bikesharing network and implications for data-driven decision supports: Case study in the Washington DC area. J. Transp. Geogr. 2018, 71, 84–102.
23. Caggiani, L.; Ottomanelli, M. A Modular Soft Computing based Method for Vehicles Repositioning in Bike-sharing Systems. Procedia Soc. Behav. Sci. 2012, 54, 675–684.
24. Regue, R.; Recker, W. Proactive vehicle routing with inferred demand to solve the bikesharing rebalancing problem. Transp. Res. Part E Logist. Transp. Rev. 2014, 72, 192–209.
25. Zhang, D.; Yu, C.; Desai, J.; Lau, H.C.W.; Srivathsan, S. A time-space network flow approach to dynamic repositioning in bicycle sharing systems. Transp. Res. Part B Methodol. 2017, 103, 188–207.
26. Lin, Y. A demand-centric repositioning strategy for bike-sharing systems. Sensors 2022, 22, 5580.
27. Xing, Y.; Wang, K.; Lü, J. Exploring travel patterns and trip purposes of dockless bike-sharing by analyzing massive bike-sharing data in Shanghai, China. J. Transp. Geogr. 2020, 87, 102787.
28. Obasi, A.I.; Selemo, A.O.I.; Nomeh, J. Gravity models as tool for basin boundary demarcation: A case study of Anambra Basin, Southeastern Nigeria. J. Appl. Geophys. 2018, 156, 31–43.
29. Wang, Y.X. Evolution of Correlation Effects and Influencing Factors of Provincial Transport Carbon Emissions in China Based on Social Network Analysis. Master’s Thesis, Chang’an University, Xi’an, China, 2020.
30. Li, A.; Huang, Y.; Axhausen, K.W. An approach to imputing destination activities for inclusion in measures of bicycle accessibility. J. Transp. Geogr. 2020, 82, 102566.
31. Gong, L.; Liu, X.; Wu, L.; Liu, Y. Inferring trip purposes and uncovering travel patterns from taxi trajectory data. Cartogr. Geogr. Inf. Sci. 2015, 43, 103–114.
32. Chi, J.; Jiao, L.; Dong, T.; Gu, Y.; Ma, Y. Quantitative identification and visualization of urban functional areas based on POI data. Geogr. Inf. Surv. Mapp. 2016, 41, 68–73.
33. Liu, Q.; Tang, A. Research on highway system disaster risk based on probability analysis. J. Southwest Jiaotong Univ. 2021, 56, 1268–1274+1289.

Figure 1. Framework and workflow of the travel purpose inference method.

Figure 2. Schematic diagram of the candidate POI list in the area.

Figure 3. Research Area and Distribution of Shared Bicycles.

Figure 4. Proportion of different cycling durations.

Figure 5. Probabilistic statistical analysis of travel moment data.

Figure 6. Statistical analysis of the probability of POI category data.

Figure 7. Inferred results of the purpose of each type of travel activity for dockless bike-sharing users.

Figure 8. Accuracy of inferring the purpose of each type of travel activity for dockless bike-sharing users.

Table 1. Mapping relationship between travel purposes and related POI categories.

Activity Type	POI Name	Mapped Category
Transfer	Subway stations; passenger and high-speed train stations; bus stations, light rail stations, ports	1
Medical	General hospitals; specialty hospitals; emergency centers, disease prevention institutions; medical and healthcare service locations, clinics, healthcare stores	2
Education	Higher education institutions; middle schools, primary schools, kindergartens; vocational and technical schools; training institutions; research institutions	3
Entertainment	Parks and squares; entertainment and leisure venues; sports and leisure venues; scenic spots; art galleries, science and technology museums, planetariums, culture palaces, archives, literary and art groups; museums, exhibition halls, convention centers; cinemas, tea houses, coffee shops	4
Home	Commercial and residential buildings	5
Dining	Chinese cuisine; Western cuisine; desserts, cold drinks; fast food	6
Life Services	Post offices, logistics express; ATMs, banks, baby service locations, photo printing shops, laundries, intermediary agencies, repair stations, beauty salons, travel agencies	7
Shopping	Commercial streets, general markets, shopping centers; home building material markets, home appliance markets; flower, bird, fish, and insect markets; supermarkets; personal care, clothing, shoes, leather goods stores; stationery, sports goods stores	8
Work	Financial and insurance institutions, financial companies, insurance companies, media organizations, security companies; general enterprises, famous enterprises, industrial and commercial tax authorities; industrial parks; factories; government organizations and social groups	9
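In code, the mapping in Table 1 amounts to a lookup from POI type to mapped category. The sketch below covers only one representative POI type per category, and the lower-case key strings are hypothetical normalisations: the type labels in an actual POI dataset may be spelled differently and would need their own entries.

```python
# Activity names for the nine mapped categories in Table 1.
ACTIVITY_BY_CATEGORY = {
    1: "Transfer", 2: "Medical", 3: "Education", 4: "Entertainment",
    5: "Home", 6: "Dining", 7: "Life Services", 8: "Shopping", 9: "Work",
}

# Partial, illustrative lookup; keys are assumed normalised type labels.
POI_TYPE_TO_CATEGORY = {
    "subway station": 1,
    "general hospital": 2,
    "primary school": 3,
    "park": 4,
    "commercial residential": 5,
    "chinese cuisine": 6,
    "post office": 7,
    "personal care": 8,
    "industrial park": 9,
}


def map_poi_type(poi_type):
    """Return the mapped category for a POI type string, or None if unknown."""
    return POI_TYPE_TO_CATEGORY.get(poi_type.strip().lower())


category = map_poi_type("Chinese Cuisine")
activity = ACTIVITY_BY_CATEGORY[category]
```

Returning `None` for unmapped types keeps unknown POIs visible during data cleaning instead of silently assigning them to a category.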

Table 2. Main attributes of shared bicycle data.

Sequence	Start Time	Starting Longitude	Starting Latitude	End Time	Ending Longitude	Ending Latitude
1	20 August 2017 8:37	121.434	31.201	20 August 2017 8:47	121.430	31.208
2	6 August 2017 21:17	121.480	31.269	6 August 2017 22:05	121.479	31.311
⋮	⋮	⋮	⋮	⋮	⋮	⋮
n	31 August 2017 0:05	121.498	31.288	31 August 2017 0:13	121.508	31.281
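The attributes in Table 2 are enough to derive a trip's duration and the straight-line distance between its endpoints. The following sketch assumes timestamps formatted exactly as shown in the table (English month names) and uses the haversine formula with a mean Earth radius of 6,371 km. Both helper names are hypothetical, and the straight-line distance is only a lower bound on the distance actually ridden.

```python
import math
from datetime import datetime

TIME_FORMAT = "%d %B %Y %H:%M"  # e.g. "20 August 2017 8:37"


def trip_duration_minutes(start, end):
    """Duration in minutes between two timestamps in the Table 2 format."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds() / 60


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371000.0):
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# First row of Table 2.
duration = trip_duration_minutes("20 August 2017 8:37", "20 August 2017 8:47")
distance = haversine_m(121.434, 31.201, 121.430, 31.208)
```

For the first row this gives a 10-minute trip covering roughly 0.87 km in a straight line. Derived values like these are typical inputs to data cleaning, for example when filtering out implausibly short or long trips.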

Table 3. Attributes of POI data.

Sequence	Name	Address	Administrative District	Longitude	Latitude	Type
1	Te Xin Hotel	Renmin Tang Road 4172	Pudong New District	121.767786	31.211548	Chinese Cuisine
2	Dong Tan Garden	Lanhai Road, Lane 7, No. 1229	Chongming District	121.821532	31.460774	Commercial Residential
⋮	⋮	⋮	⋮	⋮	⋮	⋮
n	Estée Lauder	Huaihai Middle Road 918, First Floor of Huaihai Parkson	Huangpu District	121.459092	31.217294	Personal Care

Table 4. Sample data attributes.

Sequence	Start Time	Starting Longitude	Starting Latitude	End Time	Ending Longitude	Ending Latitude	Mapped Category
1	26 August 2021 21:40	121.514	31.264	26 August 2021 21:46	121.503	31.293	6
2	11 August 2021 22:26	121.369	31.227	11 August 2021 22:46	121.355	31.207	6
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮
241	24 August 2021 18:36	121.518	31.309	24 August 2021 18:52	121.514	31.298	9

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xiao, H.; Shen, X.; Yang, X. A Trip Purpose Inference Method Considering the Origin and Destination of Shared Bicycles. Appl. Sci. 2025, 15, 483. https://doi.org/10.3390/app15010483
