orottier/rust-warc

Rust-Warc

crates.io

A high performance and easy to use Web Archive (WARC) file reader

```rust
use rust_warc::WarcReader;

use std::io;

fn main() {
    // we're taking input from stdin here, but any BufRead will do
    let stdin = io::stdin();
    let handle = stdin.lock();

    let warc = WarcReader::new(handle);

    let mut response_counter = 0;
    let mut response_size = 0;

    for item in warc {
        let record = item.unwrap(); // could be an IO error or a malformed record

        // header names are case insensitive
        if record.header.get(&"WARC-Type".into()) == Some(&"response".into()) {
            response_counter += 1;
            response_size += record.content.len(); // content size in bytes
        }
    }

    println!("response records: {}", response_counter);
    // `>> 20` divides by 2^20 (1,048,576): whole MiB, rounded down
    println!("response size: {} MiB", response_size >> 20);
}
```
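
To make the record model behind `record.header` and `record.content` more concrete, here is a small std-only sketch of reading WARC records from any `BufRead`. It assumes the WARC 1.0/1.1 framing: a `WARC/x.y` version line, `Name: value` header lines, a blank line, exactly `Content-Length` bytes of content, then two trailing CRLFs. `Record`, `read_record` and `invalid` are hypothetical names made up for this illustration; this is not rust-warc's implementation or API. Lowercasing header names on insert is just one simple way to get the case-insensitive lookup mentioned in the example above, and the sketch skips details a real reader would handle (for example non-UTF-8 header bytes).

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, Read};

/// Hypothetical record type for this sketch (not rust-warc's record type).
#[derive(Debug)]
pub struct Record {
    pub version: String,
    /// Header names are stored lowercased so lookups can ignore case.
    pub header: HashMap<String, String>,
    pub content: Vec<u8>,
}

impl Record {
    /// Case-insensitive header lookup: "WARC-Type" and "warc-type" match.
    pub fn get(&self, name: &str) -> Option<&str> {
        self.header.get(&name.to_ascii_lowercase()).map(String::as_str)
    }
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Reads one record; returns Ok(None) at a clean end of input.
pub fn read_record<R: BufRead>(r: &mut R) -> io::Result<Option<Record>> {
    let mut line = String::new();
    if r.read_line(&mut line)? == 0 {
        return Ok(None); // EOF before a new record starts
    }
    let version = line.trim_end().to_string();
    if !version.starts_with("WARC/") {
        return Err(invalid("missing WARC/ version line"));
    }

    let mut header = HashMap::new();
    loop {
        line.clear();
        if r.read_line(&mut line)? == 0 {
            return Err(invalid("unexpected EOF in header"));
        }
        let l = line.trim_end(); // drop the trailing CRLF
        if l.is_empty() {
            break; // a blank line ends the header block
        }
        // Split at the first ':' only, so values such as URIs keep their colons.
        let (name, value) = l
            .split_once(':')
            .ok_or_else(|| invalid("malformed header line"))?;
        header.insert(name.trim().to_ascii_lowercase(), value.trim().to_string());
    }

    let len = header
        .get("content-length")
        .ok_or_else(|| invalid("missing Content-Length"))?
        .parse::<usize>()
        .map_err(|_| invalid("bad Content-Length"))?;
    let mut content = vec![0u8; len];
    r.read_exact(&mut content)?;

    // Each record is followed by two CRLFs.
    let mut trailer = [0u8; 4];
    r.read_exact(&mut trailer)?;
    if &trailer != b"\r\n\r\n" {
        return Err(invalid("missing record trailer"));
    }
    Ok(Some(Record { version, header, content }))
}

fn main() -> io::Result<()> {
    let data = concat!(
        "WARC/1.0\r\nWARC-Type: response\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n",
        "WARC/1.0\r\nwarc-type: request\r\nContent-Length: 0\r\n\r\n\r\n\r\n",
    );
    let mut reader = data.as_bytes(); // &[u8] implements BufRead
    let mut responses = 0;
    while let Some(rec) = read_record(&mut reader)? {
        if rec.get("WARC-Type") == Some("response") {
            responses += 1;
        }
    }
    println!("response records: {}", responses); // prints "response records: 1"
    Ok(())
}
```

Because `read_record` is generic over `BufRead`, the same loop works on a locked stdin, a `BufReader<File>`, or an in-memory byte slice, which mirrors the "any BufRead will do" comment in the example above.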
