postgresml
diff --git a/pgml-cms/docs/api/sql-extension/pgml.decompose.md b/pgml-cms/docs/api/sql-extension/pgml.decompose.md
Lines changed: 50 additions & 0 deletions
diff --git a/pgml-cms/docs/api/sql-extension/pgml.train/clustering.md b/pgml-cms/docs/api/sql-extension/pgml.train/clustering.md
Lines changed: 3 additions & 3 deletions
diff --git a/pgml-cms/docs/api/sql-extension/pgml.train/decomposition.md b/pgml-cms/docs/api/sql-extension/pgml.train/decomposition.md
Lines changed: 42 additions & 0 deletions
diff --git a/pgml-dashboard/src/models.rs b/pgml-dashboard/src/models.rs
Lines changed: 4 additions & 2 deletions
diff --git a/pgml-extension/.cargo/config b/pgml-extension/.cargo/config.toml
rename from pgml-extension/.cargo/config
rename to pgml-extension/.cargo/config.toml
diff --git a/pgml-extension/Cargo.lock b/pgml-extension/Cargo.lock
Lines changed: 1 addition & 1 deletion
diff --git a/pgml-extension/Cargo.toml b/pgml-extension/Cargo.toml
Lines changed: 1 addition & 1 deletion
diff --git a/pgml-extension/examples/cluster.sql b/pgml-extension/examples/clustering.sql
rename from pgml-extension/examples/cluster.sql
rename to pgml-extension/examples/clustering.sql
Lines changed: 1 addition & 1 deletion
diff --git a/pgml-extension/examples/decomposition.sql b/pgml-extension/examples/decomposition.sql
Lines changed: 60 additions & 0 deletions
diff --git a/pgml-extension/examples/image_classification.sql b/pgml-extension/examples/image_classification.sql
Lines changed: 2 additions & 3 deletions
@@ -0,0 +1,50 @@
+---
+description: Decompose an input vector into its principal components
+---
+
+# pgml.decompose()
+
+
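+Decomposes a feature vector into its principal components, using a decomposition model trained with `pgml.train()` under the same project name. The output has one value per component, as set by the `n_components` hyperparameter during training.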
+
+## API
+
+```sql
+pgml.decompose(
+    project_name TEXT, -- project name
+    vector REAL[]      -- features to decompose
+)
+```
+
+### Parameters
+
+| Parameter      | Example                         | Description                                              |
+|----------------|---------------------------------|----------------------------------------------------------|
+| `project_name` | `'My First PostgresML Project'` | The project name used to train models in `pgml.train()`. |
+| `vector`       | `ARRAY[0.1, 0.45, 1.0]`         | The feature vector that needs decomposition.             |
+
+## Example
+
+```sql
+SELECT pgml.decompose('My PCA', ARRAY[0.1, 2.0, 5.0]);
+```
+
+!!! example
+
+```sql
+SELECT *,
+    pgml.decompose(
+        'Buy it Again',
+        ARRAY[
+            users.location_id,
+            EXTRACT(EPOCH FROM NOW() - users.created_at),
+            users.total_purchases_in_dollars
+        ]::REAL[]
+    ) AS components
+FROM users
+WHERE tenant_id = 5
+LIMIT 25;
+```
+
+!!!
@@ -16,8 +16,8 @@ SELECT image FROM pgml.digits;
 -- view the dataset
 SELECT left(image::text, 40) || ',...}' FROM pgml.digit_vectors LIMIT 10;
 
--- train a simple model to classify the data
-SELECT * FROM pgml.train('Handwritten Digit Clusters', 'cluster', 'pgml.digit_vectors', hyperparams => '{"n_clusters": 10}');
+-- train a simple model to cluster the data
+SELECT * FROM pgml.train('Handwritten Digit Clusters', 'clustering', 'pgml.digit_vectors', hyperparams => '{"n_clusters": 10}');
 
 -- check out the predictions
 SELECT target, pgml.predict('Handwritten Digit Clusters', image) AS prediction
@@ -27,7 +27,7 @@ LIMIT 10;
 
 ## Algorithms
 
-All clustering algorithms implemented by PostgresML are online versions. You may use the [pgml.predict](../../../api/sql-extension/pgml.predict/ "mention")function to cluster novel datapoints after the clustering model has been trained.
+All clustering algorithms implemented by PostgresML are online versions. You may use the [pgml.predict](../../../api/sql-extension/pgml.predict/ "mention") function to cluster novel data points after the clustering model has been trained.
 
 | Algorithm              | Reference                                                                                                         |
 | ---------------------- | ----------------------------------------------------------------------------------------------------------------- |
 
@@ -0,0 +1,42 @@
+# Decomposition
+
+Models can be trained using `pgml.train` on unlabeled data to identify the important features within it. To decompose a dataset into its principal components, we can use a table or a view. Since decomposition is an unsupervised algorithm, we don't need a column that represents a label as one of the inputs to `pgml.train`.
+
+## Example
+
+This example trains models on the sklearn digits dataset -- which is a copy of the test set of the [UCI ML hand-written digits datasets](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits). This demonstrates using a table with a single array feature column for principal component analysis. You could do something similar with a vector column.
+
+```sql
+SELECT pgml.load_dataset('digits');
+
+-- create an unlabeled table of the images for unsupervised learning
+CREATE VIEW pgml.digit_vectors AS
+SELECT image FROM pgml.digits;
+
+-- view the dataset
+SELECT left(image::text, 40) || ',...}' FROM pgml.digit_vectors LIMIT 10;
+
+-- train a simple model to reduce the dimensionality of the data
+SELECT * FROM pgml.train('Handwritten Digit Components', 'decomposition', 'pgml.digit_vectors', hyperparams => '{"n_components": 3}');
+
+-- check out the components
+SELECT target, pgml.decompose('Handwritten Digit Components', image) AS pca
+FROM pgml.digits
+LIMIT 10;
+```
+
+Note that the input vectors have been reduced from 64 dimensions to 3 components, which together explain nearly half of the variance across all samples.
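+
+The `cumulative_explained_variance` metric reported for decomposition models can be read, assuming it follows the usual PCA definition (the sum of scikit-learn's `explained_variance_ratio_` over the kept components), as the share of total variance captured by the first $$k$$ components. Here $$\lambda_i$$ is the variance along the $$i$$-th principal component of a $$d$$-dimensional input:
+
+$$
+\text{CEV}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{d} \lambda_j}
+$$
+
+For the digits example above, $$d = 64$$ and $$k = 3$$, so a value close to 0.5 corresponds to "nearly half of the variance".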
+
+## Algorithms
+
+All decomposition algorithms implemented by PostgresML are online versions. You may use the [pgml.decompose](../../../api/sql-extension/pgml.decompose "mention") function to decompose novel data points after the model has been trained.
+
+| Algorithm                 | Reference                                                                                                           |
+|---------------------------|---------------------------------------------------------------------------------------------------------------------|
+| `pca` | [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) |
+
+### Examples
+
+```sql
+SELECT * FROM pgml.train('Handwritten Digit Components', algorithm => 'pca', hyperparams => '{"n_components": 10}');
+```
@@ -55,10 +55,11 @@ impl Project {
         match self.task.as_ref().unwrap().as_str() {
             "classification" | "text_classification" | "question_answering" => Ok("f1"),
             "regression" => Ok("r2"),
+            "clustering" => Ok("silhouette"),
+            "decomposition" => Ok("cumulative_explained_variance"),
             "summarization" => Ok("rouge_ngram_f1"),
             "translation" => Ok("bleu"),
             "text_generation" | "text2text" => Ok("perplexity"),
-            "cluster" => Ok("silhouette"),
             task => Err(anyhow::anyhow!("Unhandled task: {}", task)),
         }
     }
@@ -67,10 +68,11 @@ impl Project {
         match self.task.as_ref().unwrap().as_str() {
             "classification" | "text_classification" | "question_answering" => Ok("F<sup>1</sup>"),
             "regression" => Ok("R<sup>2</sup>"),
+            "clustering" => Ok("Silhouette"),
+            "decomposition" => Ok("Cumulative Explained Variance"),
             "summarization" => Ok("Rouge Ngram F<sup>1</sup>"),
             "translation" => Ok("Bleu"),
             "text_generation" | "text2text" => Ok("Perplexity"),
-            "cluster" => Ok("silhouette"),
             task => Err(anyhow::anyhow!("Unhandled task: {}", task)),
         }
     }
 
@@ -1,6 +1,6 @@
 [package]
 name = "pgml"
-version = "2.8.3"
+version = "2.8.4"
 edition = "2021"
 
 [lib]
 
@@ -20,7 +20,7 @@ SELECT image FROM pgml.digits;
 SELECT left(image::text, 40) || ',...}' FROM pgml.digit_vectors LIMIT 10;
 
 -- train a simple model to classify the data
-SELECT * FROM pgml.train('Handwritten Digit Clusters', 'cluster', 'pgml.digit_vectors', hyperparams => '{"n_clusters": 10}');
+SELECT * FROM pgml.train('Handwritten Digit Clusters', 'clustering', 'pgml.digit_vectors', hyperparams => '{"n_clusters": 10}');
 
 -- check out the predictions
 SELECT target, pgml.predict('Handwritten Digit Clusters', image) AS prediction
 
@@ -0,0 +1,60 @@
+-- This example reduces the dimensionality of images in the sklearn digits dataset
+-- which is a copy of the test set of the UCI ML hand-written digits datasets
+-- https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
+--
+-- This demonstrates using a table with a single array feature column
+-- for decomposition to reduce dimensionality.
+--
+-- Exit on error (psql)
+-- \set ON_ERROR_STOP true
+\timing on
+
+SELECT pgml.load_dataset('digits');
+
+-- view the dataset
+SELECT left(image::text, 40) || ',...}', target FROM pgml.digits LIMIT 10;
+
+-- create a view of just the vectors for decomposition, without any labels
+CREATE VIEW digit_vectors AS
+SELECT image FROM pgml.digits;
+
+SELECT * FROM pgml.train('Handwritten Digits Reduction', 'decomposition', 'digit_vectors');
+
+-- check out the decomposed vectors
+SELECT target, pgml.decompose('Handwritten Digits Reduction', image) AS pca
+FROM pgml.digits
+LIMIT 10;
+
+--
+-- After a project has been trained, omitted parameters will be reused from previous training runs
+-- In these examples we'll reuse the training data snapshots from the initial call.
+--
+
+-- We can reduce the image vectors from 64 dimensions to 3 components
+SELECT * FROM pgml.train('Handwritten Digits Reduction', hyperparams => '{"n_components": 3}');
+
+-- check out the reduced vectors
+SELECT target, pgml.decompose('Handwritten Digits Reduction', image) AS pca
+FROM pgml.digits
+LIMIT 10;
+
+-- check out all that hard work
+SELECT trained_models.* FROM pgml.trained_models
+JOIN pgml.models ON models.id = trained_models.id
+ORDER BY (models.metrics->>'cumulative_explained_variance')::FLOAT DESC LIMIT 5;
+
+-- deploy the PCA model for prediction use
+SELECT * FROM pgml.deploy('Handwritten Digits Reduction', 'most_recent', 'pca');
+-- check out the deployments
+SELECT * FROM pgml.deployed_models ORDER BY deployed_at DESC LIMIT 5;
+
+-- deploy the "best" model for prediction use
+SELECT * FROM pgml.deploy('Handwritten Digits Reduction', 'best_score');
+SELECT * FROM pgml.deploy('Handwritten Digits Reduction', 'most_recent');
+SELECT * FROM pgml.deploy('Handwritten Digits Reduction', 'rollback');
+SELECT * FROM pgml.deploy('Handwritten Digits Reduction', 'best_score', 'pca');
+
+-- check out the decomposed vectors from the deployed model
+SELECT target, pgml.decompose('Handwritten Digits Reduction', image) AS pca
+FROM pgml.digits
+LIMIT 10;
@@ -5,9 +5,8 @@
 -- This demonstrates using a table with a single array feature column
 -- for classification.
 --
--- The final result after a few seconds of training is not terrible. Maybe not perfect
--- enough for mission critical applications, but it's telling how quickly "off the shelf" 
--- solutions can solve problems these days.
+-- Some algorithms converge on this trivial dataset in under a second, demonstrating the
+-- speed with which modern machines can "learn" from example data.
 
 -- Exit on error (psql)
 -- \set ON_ERROR_STOP true