adding data description

athms · athms · commit 5e177780b2d4 · 2021-10-20T14:41:48.000-07:00
diff --git a/README.md b/README.md
@@ -5,9 +5,9 @@
 
 This example was created for the [2021 fall lecture series](https://datascience.stanford.edu/news/center-open-and-reproducible-science-cores-fall-lecture-series) of [Stanford's Center for Open and REproducible Science (CORES)](https://datascience.stanford.edu/cores).
 
-The goal of this analysis is to study the effect of varying different hyper-parameters of the training of a simple classification model on its performance in sklearn's handwritten digit dataset. 
+The goal of this analysis is to study how varying individual hyper-parameters used to train a simple classification model affects its performance on scikit-learn's handwritten digit dataset.
 
-Specifically, we will study the effect of varying the learning rate, regularisation strength, number of gradient descent iterations, and random shuffling of the data on the cross-validated performance of [sklearn's default linear one-vs-rest SVM classifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html).
+Specifically, we will study the effect of varying the learning rate, regularisation strength, number of gradient descent iterations, and random shuffling of the data on the cross-validated performance of [scikit-learn's default linear one-vs-rest SVM classifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html).
 
 Each hyper-parameter is varied individually, while all other hyper-parameters are set to default values (see [scripts/evaluate_hyper_params_effect.py](scripts/evaluate_hyper_params_effect.py))
 
@@ -21,7 +21,7 @@ Each hyper-parameter is varied individually, while all other hyper-parameters ar
 ├── pyproject.toml     <- Lists all dependencies
 ├── README.md          <- This README file.
 ├── data/
-|    └──               <- A copy of the handwritten digit dataset provided by sklearn
+|    └──               <- A copy of the handwritten digit dataset provided by scikit-learn
 |
 ├── results/
 |    ├── estimates/
@@ -46,6 +46,12 @@ Each hyper-parameter is varied individually, while all other hyper-parameters ar
 └── setup.py           <- makes project pip-installable (pip install -e .) so that 'src' can be imported
 ```
 
+## Data description
+
+We use the handwritten digits dataset provided by [scikit-learn](https://scikit-learn.org/stable/). For details on this dataset, see scikit-learn's documentation:
+
+https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html
+
 
 ## Installation
 
@@ -82,7 +88,7 @@ Our Makefile provides the following analysis targets:
 | Analysis target | Description |
 | --- | ----------- |
 | all | Runs the entire analysis pipeline |
-| load | Downloads sklearn's handwritten digit dataset |
+| load | Loads scikit-learn's handwritten digit dataset |
 | evaluate | Runs our cross-validated hyper-parameter evaluation |
 | plot | Summarizes results of evaluation in a figure |
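The `load` and `evaluate` steps described above can be sketched in a few lines of Python. This is a minimal, hypothetical example that assumes scikit-learn and NumPy are installed. It is not the code in `scripts/evaluate_hyper_params_effect.py`; the `evaluate_alpha` helper, the alpha grid, and the fold count are illustrative choices. The sketch loads the digits dataset and reports the mean cross-validated accuracy of `SGDClassifier` while varying only the regularisation strength `alpha`, leaving every other hyper-parameter at its default.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score


def evaluate_alpha(alphas, cv=5, random_state=0):
    """Return the mean cross-validated accuracy for each regularisation strength.

    Hypothetical helper: only `alpha` is varied; all other SGDClassifier
    hyper-parameters keep their defaults (loss="hinge", i.e. a linear SVM,
    fitted one-vs-rest for the 10 digit classes). The grid and `cv` are
    illustrative assumptions, not the project's settings.
    """
    # Load the copy of the digits dataset bundled with scikit-learn
    # (8x8 grey-scale images flattened to 64 features).
    X, y = load_digits(return_X_y=True)
    scores = {}
    for alpha in alphas:
        clf = SGDClassifier(alpha=alpha, random_state=random_state)
        # For classifiers, an integer `cv` uses stratified K-fold splits;
        # the default score is accuracy, returned once per fold.
        fold_scores = cross_val_score(clf, X, y, cv=cv)
        scores[alpha] = float(np.mean(fold_scores))
    return scores


if __name__ == "__main__":
    for alpha, acc in evaluate_alpha([1e-5, 1e-4, 1e-3]).items():
        print(f"alpha={alpha:g}: mean accuracy={acc:.3f}")
```

Pixel values are not rescaled in this sketch, so SGD may emit `ConvergenceWarning`s for some settings. The same pattern would apply to the other hyper-parameters by varying a different `SGDClassifier` keyword, such as `eta0` (with a non-default `learning_rate` schedule), `max_iter`, or `shuffle`.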