
sheikhomar/coresets-bench


Test Bench for Coreset Algorithms

The BICO source code was downloaded from the BICO website.

Getting Started

Remember to install the prerequisite libraries and tools:

./install_prerequisites.sh

The BICO project can be built using the supplied Makefile in the bico/build directory:

make -C bico/build

The MT project can be built with Make:

make -C mt

The k-means++ tool can be built with Make:

make -C kmeans
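The internals of the k-means++ tool are not described here. As background, k-means++ seeding picks the first center uniformly at random and every further center with probability proportional to its squared distance to the nearest center already chosen (D² sampling). The following is a minimal pure-Python sketch of that seeding step only; the names `squared_distance` and `kmeanspp_seed` are hypothetical, and this is not the code in the `kmeans` directory.

```python
import random


def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` with k-means++ (D^2) sampling.

    Hypothetical helper for illustration, not the repository's kmeans tool.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    if rng is None:
        rng = random.Random()
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample index i with probability d2[i] / total.
            r = rng.random() * total
            acc = 0.0
            idx = max(range(len(points)), key=d2.__getitem__)  # rounding fallback
            for i, w in enumerate(d2):
                acc += w
                if acc > r:
                    idx = i
                    break
        centers.append(points[idx])
        # Refresh nearest-center distances using the new center.
        d2 = [min(old, squared_distance(p, points[idx])) for old, p in zip(d2, points)]
    return centers
```

For example, `kmeanspp_seed([(0.0, 0.0), (0.0, 0.0), (5.0, 5.0)], 2, random.Random(0))` returns `(0.0, 0.0)` and `(5.0, 5.0)` in some order, because a point that coincides with an already chosen center has zero sampling weight.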

The GS project can be built with CMake:

sudo apt-get update
sudo apt-get install -y ninja-build
cmake -S gs -B gs/build -G "Ninja"
cmake --build gs/build

Datasets

Generate the nytimes100d dataset:

# Download file
wget https://archive.ics.uci.edu/ml/machine-learning-databases/bag-of-words/docword.nytimes.txt.gz \
    -O data/input/docword.nytimes.txt.gz
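# Perform dimensionality reduction via random projection.
# The rp tool is built against Boost; point CPATH and LIBRARY_PATH at your local Boost 1.76.0 build.
export CPATH=/home/omar/apps/boost_1_76_0
export LIBRARY_PATH=/home/omar/apps/boost_1_76_0/stage/lib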
make -C rp && rp/bin/rp.exe \
    reduce-dim \
    data/input/docword.nytimes.txt.gz \
    8192,100 \
    0 \
    1704100552 \
    data/input/docword.nytimes.rp8192-100.txt.gz
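The downloaded file uses the UCI bag-of-words format: three header lines giving the number of documents D, the vocabulary size W and the number of non-zero entries NNZ, followed by one `docID wordID count` triple per line with 1-based IDs. Assuming that format, the sketch below reads such a file and applies a plain Gaussian random projection to a chosen number of dimensions. It only illustrates the idea behind the `reduce-dim` step: the helpers `read_docword` and `random_projection` are hypothetical, and the exact scheme used by `rp.exe` (including the meaning of its `8192,100`, `0` and `1704100552` arguments) is not assumed here.

```python
import gzip
import math
import random


def read_docword(path):
    """Read a UCI bag-of-words docword file (plain or gzipped).

    Returns (D, W, docs) where docs maps docID -> {wordID: count}.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        num_docs = int(f.readline())
        num_words = int(f.readline())
        int(f.readline())  # NNZ header, not needed for this sketch
        docs = {}
        for line in f:
            if not line.strip():
                continue
            doc_id, word_id, count = map(int, line.split())
            docs.setdefault(doc_id, {})[word_id] = count
    return num_docs, num_words, docs


def random_projection(docs, target_dim, seed):
    """Project sparse count vectors to target_dim dims with a Gaussian matrix.

    Entry R[w][j] ~ N(0, 1/target_dim). Each word's row is generated from a
    seed derived from the word ID, so results are reproducible for a fixed
    seed. Generic sketch, not necessarily what rp.exe does.
    """
    scale = 1.0 / math.sqrt(target_dim)
    rows = {}

    def row(word_id):
        # Lazily build and cache the projection row for one word.
        if word_id not in rows:
            rng = random.Random(seed * 1_000_003 + word_id)
            rows[word_id] = [rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
        return rows[word_id]

    projected = {}
    for doc_id, vec in docs.items():
        out = [0.0] * target_dim
        for word_id, count in vec.items():
            r = row(word_id)
            for j in range(target_dim):
                out[j] += count * r[j]
        projected[doc_id] = out
    return projected
```

Generating rows lazily keeps memory proportional to the distinct words actually seen. Pure Python is far too slow for the full NYTimes corpus, so this is meant only for small test inputs.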

Generate the nytimespcalowd dataset:

poetry run python -m xrun.data.tsvd -i data/input/docword.nytimes.txt.gz -d 10,20,30,40,50
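The `xrun.data.tsvd` module is not shown here; judging by its name it applies a truncated SVD, and the `-d 10,20,30,40,50` argument presumably lists target dimensions. As a hedged illustration of truncated SVD itself, the sketch below finds the top-d right singular vectors of a small dense matrix X by power iteration on X^T X with deflation, then projects every row onto them. All function names are hypothetical; real code would call an optimized library routine instead.

```python
import math
import random


def _matvec(rows, v):
    # X v, with X stored as a list of dense rows.
    return [sum(a * b for a, b in zip(r, v)) for r in rows]


def _rmatvec(rows, u, n_cols):
    # X^T u.
    out = [0.0] * n_cols
    for coeff, r in zip(u, rows):
        for j, a in enumerate(r):
            out[j] += coeff * a
    return out


def _orthonormalize(v, basis):
    # Gram-Schmidt against the singular vectors found so far, then normalize.
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 1e-12 else None


def truncated_svd_project(rows, d, iters=200, seed=0):
    """Project each row of X onto the top-d right singular vectors of X.

    Power iteration on X^T X with deflation. X is not centered.
    """
    n_cols = len(rows[0])
    if not 0 < d <= n_cols:
        raise ValueError("d must be between 1 and the number of columns")
    rng = random.Random(seed)
    basis = []
    for _ in range(d):
        v = _orthonormalize([rng.gauss(0.0, 1.0) for _ in range(n_cols)], basis)
        for _ in range(iters):
            w = _orthonormalize(_rmatvec(rows, _matvec(rows, v), n_cols), basis)
            if w is None:
                # v lies in the null space of X; keep it as the direction.
                break
            v = w
        basis.append(v)
    # Coordinates of every row in the d-dimensional singular basis.
    return [[sum(a * b for a, b in zip(r, b_vec)) for b_vec in basis] for r in rows]
```

Unlike PCA, this sketch does not subtract the column means before decomposing, which is how truncated SVD is commonly applied to sparse term-count data.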

Debugging

Segmentation fault

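Use AddressSanitizer (ASan) to debug segmentation faults. ASan instruments the binary so that memory errors such as out-of-bounds accesses and use-after-free are detected at runtime and reported with the location where they occur.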

sudo apt install libgcc-9-dev
g++ -ggdb -std=c++17 -fsanitize=address -o bin/rp.exe main.cpp

Running Experiments

pyenv install
poetry install
poetry run python -m xrun.go

Create the conda environment:

conda env create -f environment.yml