A Node.js script that scrapes data from public GitHub organization and user profiles.

ranbot-ai/github-scraper

GitHub Scraper

This project is a GitHub scraper that uses Puppeteer to extract information about GitHub organizations/users and their repositories.

It collects data such as organization/user details, top languages, and repository information.
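As a rough sketch of what that collected data could look like in TypeScript, the interfaces below mirror the field names printed in the sample output under Usage. The interface names (`OrganizationInfo`, `UserInfo`, `OrgRef`, `ProfileInfo`) and the `isUserInfo` helper are hypothetical and are not taken from the project's source.

```typescript
// Hypothetical types: field names come from the sample output in Usage,
// but the type names themselves are assumptions, not the project's own.
interface OrganizationInfo {
  name: string;
  picImageURL: string;
  description: string;
  topLanguages: string[];
  followers: number;
  peopleCount: number;
  website: string;
  location: string;
  socialLinks: string[];
}

interface OrgRef {
  name: string;
  link: string;
  orgImageURL: string;
}

interface UserInfo {
  name: string;
  nickname: string;
  picImageURL: string;
  followers: number;
  following: number;
  website: string;
  location: string;
  currentCompany: string;
  position: string;
  organizations: OrgRef[];
}

type ProfileInfo = OrganizationInfo | UserInfo;

// Narrow a scraped profile: in the sample output only user profiles
// carry a `nickname` field.
function isUserInfo(profile: ProfileInfo): profile is UserInfo {
  return "nickname" in profile;
}
```

A type guard like this lets downstream code branch on the profile kind without casting.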

Installation

  1. Clone the repository:

    git clone https://github.com/ranbot-ai/github-scraper.git
    cd github-scraper
  2. Install the dependencies:

    npm install

Usage

  1. Set the PERMALINK environment variable to the GitHub organization or user you want to scrape and WITH_REPOS to toggle repository scraping, then run the script:

    ➜  github-scraper git:(main) env PERMALINK=ranbot-ai WITH_REPOS=false npx ts-node src/index.ts
    // Organization Info with Repos: {
    name: 'RanBOT Lab',
    picImageURL: 'https://avatars.githubusercontent.com/u/85019129?s=200&v=4',
    description: 'RanBOT uses AI/ML to transform web content into structured data.',
    topLanguages: [ 'TypeScript', 'JavaScript', 'CSS' ],
    followers: 4,
    peopleCount: 1,
    website: 'https://ranbot.online',
    location: 'China',
    socialLinks: [
     'https://linkedin.com/company/ranbot-ai',
     'https://x.com/ranbotai',
     'https://www.tiktok.com/@ranbotai'
    ]
    }
    ➜  github-scraper git:(main) env PERMALINK=encoreshao WITH_REPOS=false npx ts-node src/index.ts
    // User Info with Repos: {
    name: 'Encore Shao',
    nickname: 'encoreshao',
    picImageURL: 'https://avatars.githubusercontent.com/u/745929?v=4',
    followers: 26,
    following: 35,
    website: 'https://icmoc.com',
    location: 'Shanghai, China',
    currentCompany: 'Ekohe',
    position: 'Engineer Manager | Researcher',
    organizations: [
     {
       name: 'ekohe',
       link: '/ekohe',
       orgImageURL: 'https://avatars.githubusercontent.com/u/1390403?s=64&v=4'
     },
     {
       name: 'ranbot-ai',
       link: '/ranbot-ai',
       orgImageURL: 'https://avatars.githubusercontent.com/u/85019129?s=64&v=4'
     },
     {
       name: '',
       link: '/encoreshao?tab=overview&org=ranbot-ai',
       orgImageURL: 'https://avatars.githubusercontent.com/u/85019129?s=40&v=4'
     },
     {
       name: '',
       link: '/encoreshao?tab=overview&org=ekohe',
       orgImageURL: 'https://avatars.githubusercontent.com/u/1390403?s=40&v=4'
     },
     {
       name: '',
       link: '/encoreshao?tab=overview&org=linktr-ai',
       orgImageURL: 'https://avatars.githubusercontent.com/u/178542156?s=40&v=4'
     }
    ]
    }
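One way the two environment variables from the commands above could be read and validated is sketched below. `readConfig` and `ScraperConfig` are hypothetical names, and the rule that only the string "true" enables repository scraping is an assumption; the actual src/index.ts may handle these values differently.

```typescript
// Hypothetical helper: the real src/index.ts may read these variables differently.
interface ScraperConfig {
  permalink: string;  // GitHub organization or user handle, e.g. "ranbot-ai"
  withRepos: boolean; // whether repository data should be scraped as well
}

function readConfig(env: Record<string, string | undefined>): ScraperConfig {
  const permalink = (env.PERMALINK ?? "").trim();
  if (permalink === "") {
    throw new Error("PERMALINK must be set to a GitHub organization or user name");
  }
  // Assumption: only the string "true" (case-insensitive) enables repo scraping;
  // a missing or different value is treated as false.
  const withRepos = (env.WITH_REPOS ?? "false").trim().toLowerCase() === "true";
  return { permalink, withRepos };
}
```

In a Node.js entry point this would typically be called as `readConfig(process.env)`, so a missing PERMALINK fails early instead of launching a browser against an empty profile URL.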

Features

  • Extracts organization/user information including name, description, top languages, people count, website, and social links.
  • Scrapes repository data such as name, link, description, stars, forks, and pull requests.
  • Handles pagination to scrape multiple pages of repositories.
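The pagination described above can be sketched as a loop that keeps requesting pages until no next page is reported. This is a generic illustration, not the project's implementation: `RepoInfo`, `RepoPage`, `collectAllRepos`, and the `fetchPage` callback are hypothetical, and in the real scraper the callback's role would be played by Puppeteer page navigation.

```typescript
// Field names follow the Features list; the exact shape in the project may differ.
interface RepoInfo {
  name: string;
  link: string;
  description: string;
  stars: number;
  forks: number;
  pullRequests: number;
}

interface RepoPage {
  repos: RepoInfo[];
  hasNext: boolean; // true when another page of repositories exists
}

// fetchPage is a hypothetical callback that returns one page of results.
async function collectAllRepos(
  fetchPage: (page: number) => Promise<RepoPage>,
  maxPages = 50, // safety cap so a faulty "next" signal cannot loop forever
): Promise<RepoInfo[]> {
  const all: RepoInfo[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const { repos, hasNext } = await fetchPage(page);
    all.push(...repos);
    if (!hasNext) break;
  }
  return all;
}
```

Keeping the page-fetching step behind a callback makes the loop easy to exercise with fake data, without starting a browser.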

Contributing

Contributions are welcome! Please fork the repository and submit a pull request for any improvements or bug fixes.

License

This project is licensed under the MIT License.
