
Documentation Part by Pranay Kashyap

This document outlines the process for generating synthetic Point of Sale (POS) transaction data using the Faker library in Python. It includes steps for data generation, handling null values, detecting and managing outliers, and saving the data to a CSV file. The documentation also provides code snippets for each step to facilitate understanding and implementation.

Uploaded by

pranaykashyap693
Documentation of generating synthetic POS transaction data

The task is to generate synthetic Point of Sale (POS) transactions data and to
document the process. This documentation provides an overview of the
code, including its purpose, functionality, and usage instructions.

1. We have to generate the synthetic POS transaction data

To do that, we use a Python library called Faker, which generates realistic-looking fake values. It is used here to build the synthetic POS transactions data.

As shown below:

Import Libraries

import pandas as pd
from faker import Faker
import random
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

The line "from faker import Faker" (highlighted in the original) is the syntax for importing the Faker library in Python.

The others are also necessary Python libraries, used for data analysis and preprocessing (pandas, NumPy), random selection (random), and data visualization and plotting (Matplotlib, seaborn).

We also have to create an instance of the Faker class:

fake = Faker()

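2. Generate Synthetic POS Transactions Data

We create a function for this. Only part of its body is reproduced in this document; the surviving lines are shown here in the order in which they run.

def generate_pos_transactions(num_transactions):
    """
    Generate synthetic Point of Sale (POS) transactions data.

    Parameters:
    - num_transactions (int): Number of transactions to generate.

    Returns:
    - DataFrame: DataFrame containing synthetic POS transactions data.
    """
    # Generate transaction details (the per-field lines are not reproduced here)

    # Pick the tax value at random from 0.1 and 0.3
    tax = round(random.choice([0.1, 0.3]), 2)

    # Calculate total amount (tax is divided by 100, so it is applied as a percentage)
    transaction_AMOUNT = round(unit_price + (unit_price * (tax / 100)), 2)

    # Convert list of dictionaries to DataFrame
    transaction_df = pd.DataFrame(transactions)

    # Save DataFrame to CSV
    transaction_df.to_csv('pos_synth_data_genr.csv', index=False)
    return transaction_df

The function is then called to generate 10000 transaction records, and the first few rows of the resulting DataFrame are displayed.

Three points about the surviving lines. The tax must be computed before the total amount, because the amount formula uses it. The function must return transaction_df, the name the DataFrame is actually assigned to (the original returned df_transactions, which is never defined). The original comment describes a 10% tax rate, but the code draws 0.1 or 0.3 and then divides by 100, so the effective rate is 0.1% or 0.3%; if 10% or 30% is intended, the division by 100 should be dropped.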

3. Read the CSV and display some rows:

After generating the POS data and saving it to a CSV file, we read that data back.

Syntax:

df = pd.read_csv('pos_synth_data_genr.csv')

To display the first few rows of our POS transaction data:

Syntax: df.head()

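4. Check for null values:

Syntax: df.info()

The relevant lines of the output (marked with an arrow in the original) show that the Item Purchase column has some null values, with 9990 non-null entries out of 10000:

RangeIndex: 10000 entries, 0 to 9999

Item Purchase 9990 non-null object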
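The document does not say why a freshly generated column already contains nulls. One possible cause, offered here only as an assumption, is that pandas.read_csv treats certain strings such as "NA" or "null" as missing values by default, so a generated item with such a name becomes NaN after the CSV round trip. The sketch below uses a small made-up CSV to show the effect and how keep_default_na=False turns it off.

```python
import io

import pandas as pd

# Made-up CSV text in which one item is literally named "NA".
csv_text = "Item Purchase,Unit Price\nMilk,2.5\nNA,4.0\nBread,1.2\n"

# Default behaviour: the string "NA" is parsed as a missing value.
default_df = pd.read_csv(io.StringIO(csv_text))

# With keep_default_na=False the string is kept as ordinary text.
literal_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

default_nulls = int(default_df["Item Purchase"].isnull().sum())
literal_nulls = int(literal_df["Item Purchase"].isnull().sum())
print(default_nulls, literal_nulls)
```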

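5. Add null values:

Set every 10th row of the 'Unit Price' column to the null value NaN. The NumPy constant is np.nan (lowercase); np.Nan does not exist. Assigning through df.loc writes into the DataFrame directly instead of through a chained selection, and the slice ::10 relies on the default 0 to 9999 row index.

Syntax: df.loc[::10, 'Unit Price'] = np.nan

Then display the column summary again:

df.info()

Item Purchase 9989 non-null object

Unit Price 9000 non-null float64

Display the number of null values in each column:

Syntax: df.isnull().sum()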
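As a small self-contained check of this step, the sketch below applies the same assignment to a toy DataFrame of 30 rows (the toy data is an assumption made for illustration), so the expected number of nulls is easy to verify by hand: rows 0, 10 and 20.

```python
import numpy as np
import pandas as pd

# Toy frame with 30 rows and the default 0..29 index.
df = pd.DataFrame({"Unit Price": np.arange(1.0, 31.0)})

# Every 10th row (0, 10, 20) becomes NaN.
df.loc[::10, "Unit Price"] = np.nan

# Count nulls per column, as df.isnull().sum() does in the text.
null_counts = df.isnull().sum()
print(null_counts)
```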

6. Filling missing or null values:

This step fills the missing values in the Unit Price and Item Purchase columns of the DataFrame. Unit Price is numeric, so it is filled with the column mean; Item Purchase is text, so it is filled with the column mode (the most frequent value). This keeps the data complete for further analysis. The result is assigned back to the column rather than using inplace=True on a selected column, which recent pandas versions warn about.

Syntax: df['Unit Price'] = df['Unit Price'].fillna(df['Unit Price'].mean())

and

df['Item Purchase'] = df['Item Purchase'].fillna(df['Item Purchase'].mode()[0])

Now check for null values in the data set again:

Syntax: df.info()

Check the sum of null values:

df.isnull().sum()

Now our data is clean: there is no null value in any column.
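The two fills can be tried on a tiny example where the mean and mode are easy to work out by hand. The four rows below are made up for illustration; only the column names come from this document.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Unit Price": [10.0, np.nan, 30.0, 20.0],
    "Item Purchase": ["Milk", "Bread", None, "Milk"],
})

# Numeric column: replace NaN with the mean of the known values (20.0 here).
df["Unit Price"] = df["Unit Price"].fillna(df["Unit Price"].mean())

# Text column: replace missing entries with the most frequent value ("Milk" here).
df["Item Purchase"] = df["Item Purchase"].fillna(df["Item Purchase"].mode()[0])

remaining_nulls = int(df.isnull().sum().sum())
print(df)
```

Mean imputation keeps the column average unchanged but reduces its variance, which is worth keeping in mind if the filled column is later used for outlier detection.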

7. Detect outliers:

Syntax: sns.boxplot(x=df['Transaction Amount'])

As the box plot shows, there isn't any outlier.


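8. Introducing some outliers into the data set:

Rows 1 to 4 of the Transaction Amount column are overwritten with large values. With df.loc the end label is included, so 1:4 covers the same four rows as the positional slice 1:5.

Syntax: df.loc[1:4, 'Transaction Amount'] = [4000, 5000, 6000, 7000]

To see them:

Syntax: df.head(5)

Now detect outliers again:

Syntax: sns.boxplot(x=df['Transaction Amount'])

The box plot now shows some outliers in the Transaction Amount column.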


9. Handle outliers

First, calculate the interquartile range (IQR):

Q1 = df['Transaction Amount'].quantile(0.25)
Q3 = df['Transaction Amount'].quantile(0.75)
IQR = Q3 - Q1
print(IQR)

Output: 504.72999

Define the lower bound and upper bound for the outliers:

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
print(lower_bound, upper_bound)

Output: lower_bound = -510.13499999999976, upper_bound = 1508.7849999999999

Select the outliers, that is, the rows whose Transaction Amount falls outside the two bounds:

Syntax:

outliers = df[(df['Transaction Amount'] < lower_bound) | (df['Transaction Amount'] > upper_bound)]

Print the number of outliers:

Syntax:

outliers_count = len(outliers)
print("Number of outliers:", outliers_count)
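The IQR steps above can be collected into one small helper. The sketch below is an illustration: the function name iqr_bounds, its k parameter and the six toy amounts are assumptions, chosen so that the quartiles can be checked by hand (Q1 = 225, Q3 = 475, IQR = 250).

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return (lower, upper) outlier bounds using the k * IQR rule."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy amounts: five ordinary values and one injected extreme value.
amounts = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0, 7000.0])

lower, upper = iqr_bounds(amounts)
outliers = amounts[(amounts < lower) | (amounts > upper)]
print(lower, upper, len(outliers))
```

For amounts that cannot be negative, as in this data set, a negative lower bound simply means that only the upper bound can flag anything.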


Remove outliers:

To remove them, calculate the Z-score for the Transaction Amount column and keep only the rows whose absolute Z-score is below the threshold of 3.

Syntax:

from scipy.stats import zscore

z_score = zscore(df['Transaction Amount'])
abs_z_score = abs(z_score)
filtered_entries = (abs_z_score < 3)  # 3 is the threshold value
df = df[filtered_entries]

Now the outliers are removed.

Plot a box plot again to check whether any outliers remain:

sns.boxplot(x=df['Transaction Amount'])

The plot shows no outliers.
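Note that the removal step uses a Z-score threshold rather than the IQR bounds computed earlier, so the two rules will not always flag the same rows. The sketch below reproduces the Z-score filter with pandas alone on made-up data (50 ordinary amounts plus one extreme value). It uses the population standard deviation (ddof=0), which is also the default of scipy.stats.zscore.

```python
import pandas as pd

# Toy data: amounts 100..149 plus one injected extreme value.
values = [float(v) for v in range(100, 150)] + [7000.0]
df = pd.DataFrame({"Transaction Amount": values})

amounts = df["Transaction Amount"]

# Z-score: distance from the mean in units of the standard deviation.
z_scores = (amounts - amounts.mean()) / amounts.std(ddof=0)

# Keep only rows within 3 standard deviations of the mean.
filtered_df = df[z_scores.abs() < 3]

removed = len(df) - len(filtered_df)
print("Rows removed:", removed)
```

Because extreme values inflate both the mean and the standard deviation, a Z-score filter may need to be applied with care when many outliers are present; the IQR rule is less affected by them.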
