platonai/PulsarRPA

🤖 PulsarRPA

Docker Pulls License: APACHE2 Spring Boot


English | Simplified Chinese | China Mirror

🌟 Introduction

💖 PulsarRPA: The AI-Powered, Lightning-Fast Browser Automation Solution! 💖

✨ Key Capabilities:

  • 🤖 AI Integration with LLMs: Smarter automation powered by large language models.
  • ⚡ Ultra-Fast Automation: Coroutine-safe concurrent browser automation with spider-level crawling performance.
  • 🧠 Web Understanding: Deep comprehension of dynamic web content.
  • 📊 Data Extraction APIs: Powerful tools for extracting structured data effortlessly.

Automate the browser and extract data at scale with simple text.

Go to https://www.amazon.com/dp/B0C1H26C46

After browser launch: clear browser cookies.
After page load: scroll to the middle.

Summarize the product.
Extract: product name, price, ratings.
Find all links containing /dp/.

🎥 Demo Videos

🎬 YouTube: Watch the video

📺 Bilibili: https://www.bilibili.com/video/BV1kM2rYrEFC


🚀 Quick Start Guide

▶️ Run PulsarRPA

📦 Run the Executable JAR - Best Experience

🧩 Download

curl -L -o PulsarRPA.jar https://github.com/platonai/PulsarRPA/releases/download/v3.0.14/PulsarRPA.jar

🚀 Run

# Make sure an LLM API key is set. VOLCENGINE_API_KEY and OPENAI_API_KEY are also supported.
echo $DEEPSEEK_API_KEY
java -D"DEEPSEEK_API_KEY=${DEEPSEEK_API_KEY}" -jar PulsarRPA.jar

🔍 Tip: Make sure DEEPSEEK_API_KEY or another LLM API key is set in your environment; otherwise AI features will not be available.

🔍 Tip: In Windows PowerShell, $env:DEEPSEEK_API_KEY refers to an environment variable, while $DEEPSEEK_API_KEY refers to a script variable.
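
Before launching, it can help to confirm which key the shell actually has. The following is a minimal sketch, not part of PulsarRPA: the function name `first_llm_key_name` is hypothetical, and the lookup order is an assumption. Only the three variable names come from the instructions above.

```shell
# Print the name of the first LLM API key variable that is set and non-empty.
# The variable names come from the run instructions; the order is an assumption.
first_llm_key_name() {
  for name in DEEPSEEK_API_KEY VOLCENGINE_API_KEY OPENAI_API_KEY; do
    # Indirect lookup of the variable named by $name (safe: names are fixed above).
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1
}

if key_name="$(first_llm_key_name)"; then
  echo "Using ${key_name}"
else
  echo "No LLM API key found; AI features will not be available." >&2
fi
```

If the check reports no key, export one of the three variables before running the JAR.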


▶ Run with IDE

  • Open the project in your IDE
  • Run the ai.platon.pulsar.app.PulsarApplicationKt main class

🐳 Docker Users

# Make sure an LLM API key is set. VOLCENGINE_API_KEY and OPENAI_API_KEY are also supported.
echo $DEEPSEEK_API_KEY
docker run -d -p 8182:8182 -e DEEPSEEK_API_KEY="${DEEPSEEK_API_KEY}" galaxyeye88/pulsar-rpa:latest

🌟 For Beginners - Just Text, No Code!

Use the commands API to perform browser operations, extract web data, analyze websites, and more.

📥 Example Request (Text-based):

WebUI: http://localhost:8182/command.html

REST API

📄 Plain-Text-Based Version:

curl -X POST "http://localhost:8182/api/commands/plain" -H "Content-Type: text/plain" -d '
    Go to https://www.amazon.com/dp/B0C1H26C46
    
    After browser launch: clear browser cookies.
    After page load: scroll to the middle.
    
    Summarize the product.
    Extract: product name, price, ratings.
    Find all links containing /dp/.
  '
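
For longer command scripts, the text can live in a file instead of an inline string. The sketch below assumes a hypothetical file name, `commands.txt`; the endpoint and the command lines are taken from the example above. Note that curl's `-d @file` strips newlines when reading from a file, whereas `--data-binary @file` sends the content unchanged.

```shell
# Write the command text to a file (commands.txt is a hypothetical name).
cat > commands.txt <<'EOF'
Go to https://www.amazon.com/dp/B0C1H26C46
Summarize the product.
Extract: product name, price, ratings.
EOF

# --data-binary keeps the line breaks; -d @commands.txt would strip them.
# Uncomment once the server is listening on port 8182:
# curl -X POST "http://localhost:8182/api/commands/plain" -H "Content-Type: text/plain" --data-binary @commands.txt
```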

📄 JSON-Based Version:

curl -X POST "http://localhost:8182/api/commands" -H "Content-Type: application/json" -d '{
    "url": "https://www.amazon.com/dp/B0C1H26C46",
    "onBrowserLaunchedActions": ["clear browser cookies"],
    "onPageReadyActions": ["scroll to the middle"],
    "pageSummaryPrompt": "Provide a brief introduction of this product.",
    "dataExtractionRules": "product name, price, and ratings",
    "uriExtractionRules": "all links containing `/dp/` on the page"
  }'

💡 Tip: You don't need to fill in every field; include only the ones you need.
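
As an illustration of sending only some fields, here is a minimal sketch of a helper that builds a request body from a URL and optional extraction rules. The helper name `build_command_json` is hypothetical. It assumes the values contain no double quotes, backslashes, or newlines, since it does no JSON escaping, and it assumes the server accepts a body carrying only these fields.

```shell
# Build a JSON body for /api/commands from a URL plus optional extraction rules.
# Assumption: values need no JSON escaping (no quotes, backslashes, or newlines).
build_command_json() {
  url="$1"
  rules="${2:-}"
  if [ -n "$rules" ]; then
    printf '{"url": "%s", "dataExtractionRules": "%s"}\n' "$url" "$rules"
  else
    printf '{"url": "%s"}\n' "$url"
  fi
}

build_command_json "https://www.amazon.com/dp/B0C1H26C46" "product name, price"
```

The output can be piped to the endpoint with `curl -X POST "http://localhost:8182/api/commands" -H "Content-Type: application/json" -d @-`.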

🎓 For Advanced Users - LLM + X-SQL: Precise, Flexible, Powerful

Harness the power of the x/e API for highly precise, flexible, and intelligent data extraction.

curl -X POST "http://localhost:8182/api/x/e" -H "Content-Type: text/plain" -d "
select
  llm_extract(dom, 'product name, price, ratings') as llm_extracted_data,
  dom_base_uri(dom) as url,
  dom_first_text(dom, '#productTitle') as title,
  dom_first_slim_html(dom, 'img:expr(width > 400)') as img
from load_and_select('https://www.amazon.com/dp/B0C1H26C46', 'body');
"

Example of the extracted data:

{
  "llm_extracted_data": {
    "product name": "Apple iPhone 15 Pro Max",
    "price": "$1,199.00",
    "ratings": "4.5 out of 5 stars"
  },
  "url": "https://www.amazon.com/dp/B0C1H26C46",
  "title": "Apple iPhone 15 Pro Max",
  "img": "<img src=\"https://example.com/image.jpg\" />"
}
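
To reuse the x/e query for other pages, the X-SQL text can be generated from parameters. This is a sketch under stated assumptions: `build_xsql` is a hypothetical helper, and the URL and field list must not contain single quotes because they are inserted into the query verbatim. The X-SQL functions are the ones used in the request above.

```shell
# Print an X-SQL query for a given page URL and a list of fields to extract.
# Assumption: neither argument contains a single quote.
build_xsql() {
  url="$1"
  fields="$2"
  cat <<EOF
select
  llm_extract(dom, '$fields') as llm_extracted_data,
  dom_base_uri(dom) as url
from load_and_select('$url', 'body');
EOF
}

build_xsql "https://www.amazon.com/dp/B0C1H26C46" "product name, price, ratings"
```

The generated text can be posted with `build_xsql ... | curl -X POST "http://localhost:8182/api/x/e" -H "Content-Type: text/plain" --data-binary @-`.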

👨‍💻 For Experts - Native API: Powerful!

🚀 Superfast Page Visiting and Data Extraction:

PulsarRPA enables high-speed parallel web scraping with coroutine-based concurrency, delivering efficient data extraction while minimizing resource overhead.

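// Load options passed along with every link.
val args = "-refresh -dropContent -interactLevel fastest"
// Resource file that lists the seed URLs.
val resource = "seeds/amazon/best-sellers/leaf-categories.txt"
// Wrap each seed URL in a ListenableHyperlink and block unwanted URLs before navigation.
// `blockingUrls` and `session` are assumed to be defined elsewhere.
val links =
    LinkExtractors.fromResource(resource).asSequence().map { ListenableHyperlink(it, "", args = args) }.onEach {
        it.eventHandlers.browseEventHandlers.onWillNavigate.addLast { page, driver ->
            driver.addBlockedURLs(blockingUrls)
        }
    }.toList()

// Submit all links to the session for loading.
session.submitAll(links)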

📝 Example: View Kotlin Code

🎮 Browser Control:

PulsarRPA implements coroutine-safe browser control.

val prompts = """
move cursor to the element with id 'title' and click it
scroll to middle
scroll to top
get the text of the element with id 'title'
"""

val eventHandlers = DefaultPageEventHandlers()
eventHandlers.browseEventHandlers.onDocumentActuallyReady.addLast { page, driver ->
    val result = session.instruct(prompts, driver)
}
session.open(url, eventHandlers)

📝 Example: View Kotlin Code

🤖 Robotic Process Automation Capabilities:

PulsarRPA provides flexible robotic process automation capabilities.

val options = session.options(args)
val event = options.eventHandlers.browseEventHandlers
event.onBrowserLaunched.addLast { page, driver ->
    warnUpBrowser(page, driver)
}
event.onWillFetch.addLast { page, driver ->
    waitForReferrer(page, driver)
    waitForPreviousPage(page, driver)
}
event.onWillCheckDocumentState.addLast { page, driver ->
    driver.waitForSelector("body h1[itemprop=name]")
    driver.click(".mask-layer-close-button")
}
session.load(url, options)

📝 Example: View Kotlin Code

🔍 Complex Data Extraction with X-SQL:

PulsarRPA provides X-SQL for complex data extraction.

select
    llm_extract(dom, 'product name, price, ratings, score') as llm_extracted_data,
    dom_first_text(dom, '#productTitle') as title,
    dom_first_text(dom, '#bylineInfo') as brand,
    dom_first_text(dom, '#price tr td:matches(^Price) ~ td') as price,
    dom_first_text(dom, '#acrCustomerReviewText') as ratings,
    str_first_float(dom_first_text(dom, '#reviewsMedley .AverageCustomerReviews span:contains(out of)'), 0.0) as score
from load_and_select('https://www.amazon.com/dp/B0C1H26C46 -i 1s -njr 3', 'body');

📚 Example Code:

📜 Documents

🔧 Proxies - Unblock Websites

Set the environment variable PROXY_ROTATION_URL to the URL provided by your proxy service:

export PROXY_ROTATION_URL=https://your-proxy-provider.com/rotation-endpoint

Each time the rotation URL is accessed, it should return a response containing one or more fresh proxy IPs. Ask your proxy provider for such a URL.
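
Before handing the URL to PulsarRPA, you may want to check what it returns. The sketch below is only a sanity check and makes an assumption the text above does not confirm: that the provider answers in plain text containing IPv4 `address:port` entries. The helper name `extract_proxies` is hypothetical, and the sample addresses are made up.

```shell
# Extract ip:port pairs from a rotation response read on stdin.
# Assumption: the response is plain text with IPv4 address:port entries.
extract_proxies() {
  grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}'
}

# Sample response (made-up addresses) to show the behaviour offline.
printf '10.0.0.1:8080\n10.0.0.2:3128\n' | extract_proxies
```

Against a real provider, the same filter can be applied with `curl -s "$PROXY_ROTATION_URL" | extract_proxies`.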


✨ Features

🕷️ Web Spider

  • Scalable crawling
  • Browser rendering
  • AJAX data extraction

🤖 AI-Powered

  • Automatic field extraction
  • Pattern recognition
  • Accurate data capture

🧠 LLM Integration

  • Natural language web content analysis
  • Intuitive content description

🎯 Text-to-Action

  • Simple language commands
  • Intuitive browser control

🤖 RPA Capabilities

  • Human-like task automation
  • SPA crawling support
  • Advanced workflow automation

🛠️ Developer-Friendly

  • One-line data extraction
  • SQL-like query interface
  • Simple API integration

📊 X-SQL Power

  • Extended SQL for web data
  • Content mining capabilities
  • Web business intelligence

🛡️ Bot Protection

  • Advanced stealth techniques
  • IP rotation
  • Privacy context management

⚡ Performance

  • Parallel page rendering
  • High-efficiency processing
  • Block-resistant design

💰 Cost-Effective

  • 100,000+ pages/day
  • Minimal hardware requirements
  • Resource-efficient operation

✅ Quality Assurance

  • Smart retry mechanisms
  • Precise scheduling
  • Complete lifecycle management

🌐 Scalability

  • Fully distributed architecture
  • Massive-scale capability
  • Enterprise-ready

📦 Storage Options

  • Local File System
  • MongoDB
  • HBase
  • Gora support

📊 Monitoring

  • Comprehensive logging
  • Detailed metrics
  • Full transparency

📞 Contact Us

WeChat QR Code