
This repository is a reference for building a custom ETL pipeline that creates TF-Records using the Apache Beam Python SDK on Google Cloud Dataflow.


swapnil3597/dataflow-tfrecord


dataflow-tfrecord

This repository is a reference ETL pipeline for creating TF-Records using the Apache Beam Python SDK on Google Cloud Dataflow. You can find the blog post for this repo here.
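For orientation, a TF-Record file is a sequence of length-prefixed records, each usually holding one serialized `tf.train.Example`. The sketch below writes that framing with only the standard library, following the format described in TensorFlow's TFRecord documentation: the payload length as a little-endian uint64, a masked CRC-32C of the length, the payload bytes, then a masked CRC-32C of the payload. It is illustrative only. `crc32c`, `masked_crc`, `frame_record` and `write_tfrecord_file` are hypothetical helpers, not code from this repository, and a real pipeline would normally rely on `beam.io.WriteToTFRecord` or `tf.io.TFRecordWriter` instead of hand-rolled framing.

```python
import struct

# CRC-32C (Castagnoli), reflected polynomial; TFRecord uses this, not zlib's CRC-32.
_CRC32C_POLY = 0x82F63B78
_MASK_DELTA = 0xA282EAD8


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C: slow, but needs no third-party packages."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Rotate the CRC right by 15 bits and add a constant, as TFRecord does."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + _MASK_DELTA) & 0xFFFFFFFF


def frame_record(payload: bytes) -> bytes:
    """Wrap one serialized record (e.g. a tf.train.Example) in TFRecord framing."""
    length = struct.pack("<Q", len(payload))
    return (
        length
        + struct.pack("<I", masked_crc(length))
        + payload
        + struct.pack("<I", masked_crc(payload))
    )


def write_tfrecord_file(path: str, payloads) -> None:
    """Write each payload as one framed record (hypothetical helper)."""
    with open(path, "wb") as f:
        for payload in payloads:
            f.write(frame_record(payload))
```

Each record therefore costs 16 bytes of overhead (8 for the length, 4 for each checksum) on top of its payload.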

To run this pipeline:

Step 1:

First, place a CSV file in the following format in your GCS bucket:

gs://path/img.png,label1
gs://path/img.png,label2
...

Each row pairs a GCS image path with its label. The corresponding dummy square images, all of the same size, must also be stored in the GCS bucket at the paths listed in the CSV.
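A minimal sketch of the per-row checks a pipeline like this needs, assuming the two-column `path,label` layout shown above with no header row. `parse_row`, `png_size` and `check_image` are hypothetical helpers, not functions from the repository; `LABEL_DICT` and `IMG_SIZE` mirror the variables set in Step 2. `png_size` reads the width and height from the PNG `IHDR` chunk, which is one cheap way to confirm that every image is square and of the expected size before launching a job.

```python
import csv
import io
import struct

# Assumed to mirror the values configured in create_tfrecords.py (Step 2).
LABEL_DICT = {'label1': 0, 'label2': 1, 'label3': 2}
IMG_SIZE = 28

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def parse_row(line: str):
    """Split one CSV line into (gcs_path, class_index)."""
    path, label = next(csv.reader(io.StringIO(line)))
    path, label = path.strip(), label.strip()
    if not path.startswith('gs://'):
        raise ValueError(f'expected a gs:// path, got {path!r}')
    if label not in LABEL_DICT:
        raise KeyError(f'unknown label {label!r}')
    return path, LABEL_DICT[label]


def png_size(header: bytes):
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if header[:8] != PNG_SIGNATURE or header[12:16] != b'IHDR':
        raise ValueError('not a PNG file')
    return struct.unpack('>II', header[16:24])


def check_image(header: bytes, size: int = IMG_SIZE) -> None:
    """Raise ValueError unless the image is size x size pixels."""
    width, height = png_size(header)
    if (width, height) != (size, size):
        raise ValueError(f'expected {size}x{size}, got {width}x{height}')
```

Inside a Beam pipeline, `parse_row` could be applied with `beam.Map` to the lines produced by `beam.io.ReadFromText(CSV_PATH)`. Reading the image headers from GCS would need a GCS-aware reader such as `apache_beam.io.filesystems.FileSystems.open`; only the first 24 bytes are required for the size check.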

Step 2:

Before running the pipeline, make sure you initialize the following variables in create_tfrecords/create_tfrecords.py:

# TODO: Initialize the variables below
# Map each label string used in the CSV to an integer class index
LABEL_DICT = {
    'label1': 0,
    'label2': 1,
    'label3': 2,
}
NUM_CLASS = len(LABEL_DICT)
IMG_SIZE = 28  # TODO: Enter your own int value (side length in pixels) for the square images

PROJ_NAME = 'Your Project Name'  # Google Cloud project to run the job in

CSV_PATH = 'gs://<bucket-name>/path-to.csv'  # CSV file from Step 1
RUNNER = 'DataflowRunner'
STAGING_LOCATION = 'gs://<bucket-name>/staging/'
TEMP_LOCATION = 'gs://<bucket-name>/temp/'
TEMPLATE_LOCATION = 'gs://<bucket-name>/path/to/template_location/template_name'
JOB_NAME = 'random-job-name'
OUTPUT_PATH = 'gs://<bucket-name>/output_path/'  # where the TF-Records are written
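The Dataflow-related variables above correspond to standard Apache Beam pipeline options. The sketch below shows one way they could be turned into command-line style flags for `apache_beam.options.pipeline_options.PipelineOptions`; this is an assumption about the wiring, not necessarily how `create_tfrecords.py` does it. The flag names (`--runner`, `--project`, `--job_name`, `--staging_location`, `--temp_location`, `--template_location`) are real Beam/Dataflow options, while `build_pipeline_args` is a hypothetical helper and the values are the placeholders from the config.

```python
# Values copied from the Step 2 configuration (placeholders, not real resources).
PROJ_NAME = 'Your Project Name'
RUNNER = 'DataflowRunner'
STAGING_LOCATION = 'gs://<bucket-name>/staging/'
TEMP_LOCATION = 'gs://<bucket-name>/temp/'
TEMPLATE_LOCATION = 'gs://<bucket-name>/path/to/template_location/template_name'
JOB_NAME = 'random-job-name'


def build_pipeline_args(include_template: bool = False) -> list:
    """Return Beam option flags built from the config (hypothetical helper)."""
    args = [
        f'--runner={RUNNER}',
        f'--project={PROJ_NAME}',
        f'--job_name={JOB_NAME}',
        f'--staging_location={STAGING_LOCATION}',
        f'--temp_location={TEMP_LOCATION}',
    ]
    # With DataflowRunner, --template_location stages a classic template
    # instead of launching the job, so it is opt-in here.
    if include_template:
        args.append(f'--template_location={TEMPLATE_LOCATION}')
    return args
```

With Apache Beam installed, the list would be passed as `PipelineOptions(build_pipeline_args())` and the resulting options handed to `beam.Pipeline(options=...)`. Keeping the template flag optional makes it explicit whether a run creates a template or starts a job directly.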

Step 3:

Finally, to run the pipeline from a Google Cloud VM instance, run:

bash run.sh
