azure-code-repo/etl-toolkit
Introduction

TODO: Give a short introduction to your project. This section should explain the objectives or the motivation behind the project.

Getting Started

TODO: Guide users through getting your code up and running on their own system. In this section you can talk about:

  1. Installation process
  2. Software dependencies
  3. Latest releases
  4. API references

Build and Test

TODO: Describe and show how to build your code and run the tests.

Documentation

sphinx-apidoc -f -o docs src/ananta -l -d 5 -M
make html

  • Output: the generated API docs
  • Open docs/index.html to view them

Contribute

VS Code Tasks

{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "build wheel",
            "type": "shell",
            "command": "python -m build"
        },
        {
            "label": "build docs",
            "type": "shell",
            "command": "sphinx-apidoc -f -o docs src/ananta -l -d 5 -M"
        }
    ]
}
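
In VS Code these tasks are available under Terminal > Run Task…; the "build docs" task runs the same sphinx-apidoc command shown in the Documentation section above.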

Local installation for development

Install Java

version = jre1.8.0_341

Install Spark

version = spark-3.2.1-bin-hadoop3.2-scala2.13.tgz (available from the Apache Spark download archive)

Install Python

version = 3.8.13

Get winutils for Hadoop on Windows

https://github.com/cdarlint/winutils

Copy the binaries for the matching Hadoop version into the folder that HADOOP_HOME (below) will point at.

Setup PATH / environment variables (local / system) on Windows

Set the following user variables under Windows > Environment Variables:

  • HADOOP_HOME
    • winutils location (the Hadoop 3.2.1 folder)
  • LOCAL_DEV
    • TRUE
  • PYSPARK_PYTHON
    • python location
    • if you are using a conda/virtual env, use its Python path
  • PYTHONPATH
    • python location
    • if you are using a conda/virtual env, use its Python path
  • SPARK_HOME
    • location where Spark was downloaded and extracted
  • SPARK_LOCAL_DIRS (optional)
  • SPARK_VERSION
    • 3.2.1

Then add the following entries to the PATH variable (a Python sketch of this whole setup follows the list):

  • %SPARK_HOME%\bin
  • %SPARK_HOME%\python
  • %SPARK_HOME%\python\lib\py4j-0.10.9.5-src.zip
  • %HADOOP_HOME%\bin
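
If you prefer not to touch the system settings, here is a minimal sketch of the same setup done from Python before Spark starts. All paths below are placeholders; point them at your own Java/Spark/winutils/Python install locations:

import os

# Placeholder paths - replace with your own install locations.
os.environ["HADOOP_HOME"] = r"C:\tools\winutils\hadoop-3.2.1"
os.environ["SPARK_HOME"] = r"C:\tools\spark-3.2.1-bin-hadoop3.2-scala2.13"
os.environ["SPARK_VERSION"] = "3.2.1"
os.environ["LOCAL_DEV"] = "TRUE"
os.environ["PYSPARK_PYTHON"] = r"C:\Python38\python.exe"
os.environ["PYTHONPATH"] = r"C:\Python38\python.exe"

# Mirror the PATH additions listed above.
os.environ["PATH"] = os.pathsep.join([
    os.path.join(os.environ["SPARK_HOME"], "bin"),
    os.path.join(os.environ["SPARK_HOME"], "python"),
    os.path.join(os.environ["SPARK_HOME"], "python", "lib", "py4j-0.10.9.5-src.zip"),
    os.path.join(os.environ["HADOOP_HOME"], "bin"),
    os.environ.get("PATH", ""),
])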

Setup local Spark with Delta Lake

pyspark --packages io.delta:delta-core_2.12:1.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
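
The same session can also be created from Python using the delta-spark helper referenced in the Fix Bugs section below. A minimal sketch, assuming delta-spark is pip-installed:

from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("local-delta")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)

# configure_spark_with_delta_pip adds the matching io.delta:delta-core
# package to spark.jars.packages before the session is created.
spark = configure_spark_with_delta_pip(builder).getOrCreate()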

Fix Bugs

PySpark delta-core for Scala 2.13

The delta-spark pip helper targets the Scala 2.12 artifact by default, while the Spark build above is the Scala 2.13 distribution, so the helper needs a small patch:

  • configure_spark_with_delta_pip(config).getOrCreate()
  • Go to configure_spark_with_delta_pip, which lives in pip_utils.py
  • Scroll down to row 77
  • Change 12 => 13 (so delta-core_2.12 becomes delta-core_2.13)

Ananta

The generated site is titled "Ananta - Data Onboarder 0.0.1 documentation" and is built with Sphinx using a Read the Docs theme.
