Enhance FAQ section with detailed explanations on conditioning in Turing.jl models and parallelism usage
TuringLang · AoifeHughes · Jun 3, 2025 · Jun 17, 2025 · Jun 17, 2025 · Jun 17, 2025
commit 6ec9c7e99093eda1aa07eb87c0925c9689ac0771
diff --git a/faq/index.qmd b/faq/index.qmd
@@ -14,52 +14,118 @@ x ~ filldist(Normal(), 2)
 
 You cannot directly condition on `x[2]` using `condition(model, @varname(x[2]) => 1.0)` because `x[2]` never appears on the LHS of a `~` statement. Only `x` as a whole appears there.
 
+However, there is an important exception: when you use the broadcasting operator `.~` with a univariate distribution, each element is treated as being separately drawn from that distribution, allowing you to condition on individual elements:
+
+```julia
+@model function f1()
+    x = Vector{Float64}(undef, 3)
+    x .~ Normal()  # Each element is a separate draw
+end
+
+m1 = f1() | (@varname(x[1]) => 1.0)
+sample(m1, NUTS(), 100) # This works!
+```
+
+In contrast, you cannot condition on parts of a multivariate distribution because it represents a single distribution over the entire vector:
+
+```julia
+using LinearAlgebra  # Provides the identity `I` used below
+
+@model function f2()
+    x = Vector{Float64}(undef, 3)
+    x ~ MvNormal(zeros(3), I)  # Single multivariate distribution
+end
+
+m2 = f2() | (@varname(x[1]) => 1.0)
+sample(m2, NUTS(), 100) # This doesn't work!
+```
+
+The key insight is that `filldist` creates a single distribution (not N independent distributions), which is why you cannot condition on individual elements. The distinction is not just about what appears on the LHS of `~`, but whether you're dealing with separate distributions (`.~` with univariate) or a single distribution over multiple values (`~` with multivariate or `filldist`).
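+
+As a minimal sketch of the alternative (the model name `f3` and the latent mean `m` are made up for illustration), conditioning on the whole of `x` should still work with `filldist`, because `x` itself is the LHS of `~`:
+
+```julia
+using Turing
+
+@model function f3()
+    m ~ Normal()
+    x ~ filldist(Normal(m, 1), 2)  # Single distribution over both elements
+end
+
+# Condition on all of `x` at once rather than on `x[2]` alone
+m3 = f3() | (x = [1.0, 2.0],)
+chain3 = sample(m3, NUTS(), 100)  # Only `m` remains to be sampled
+```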
+
 To understand more about how Turing determines whether a variable is treated as random or observed, see:
-- [Compiler Design Overview](../developers/compiler/design-overview/) - explains the heuristics Turing uses
 - [Core Functionality](../core-functionality/) - basic explanation of the `~` notation and conditioning
 
-## How do I implement a sampler for a Turing.jl model?
-
-We have comprehensive guides on implementing custom samplers:
-- [Implementing Samplers Tutorial](../developers/inference/implementing-samplers/) - step-by-step guide on implementing samplers in the AbstractMCMC framework
-- [AbstractMCMC-Turing Interface](../developers/inference/abstractmcmc-turing/) - how to integrate your sampler with Turing
-- [AbstractMCMC Interface](../developers/inference/abstractmcmc-interface/) - the underlying interface documentation
 
 ## Can I use parallelism / threads in my model?
 
-Yes! Turing.jl supports both multithreaded and distributed sampling. See the [Core Functionality guide](../core-functionality/#sampling-multiple-chains) for detailed examples showing:
-- Multithreaded sampling using `MCMCThreads()`
-- Distributed sampling using `MCMCDistributed()`
+Yes, but with important caveats! There are two types of parallelism to consider:
+
+### 1. Parallel Sampling (Multiple Chains)
+Turing.jl fully supports sampling multiple chains in parallel:
+- **Multithreaded sampling**: Use `MCMCThreads()` to run one chain per thread
+- **Distributed sampling**: Use `MCMCDistributed()` for distributed computing
+
+See the [Core Functionality guide](../core-functionality/#sampling-multiple-chains) for examples.
-See the [Core Functionality guide](../core-functionality/#sampling-multiple-chains) for examples.
+See the [Core Functionality guide]({{< meta core-functionality >}}/#sampling-multiple-chains) for examples.
+
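+A rough sketch of multithreaded sampling follows. The `coinflip` model and its data are hypothetical, and the example assumes Julia was started with several threads (for example `julia --threads 4`):
+
+```julia
+using Turing
+
+@model function coinflip(y)
+    p ~ Beta(1, 1)
+    for i in eachindex(y)
+        y[i] ~ Bernoulli(p)
+    end
+end
+
+model = coinflip([1, 0, 1, 1])
+
+# Four chains of 500 draws each, distributed over the available threads
+chains = sample(model, NUTS(), MCMCThreads(), 500, 4)
+```
+
+Swapping `MCMCThreads()` for `MCMCDistributed()` follows the same call pattern, but the worker processes must be added and the model defined on each of them first.
+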
+### 2. Threading Within Models
+Using threads inside your model (e.g., `Threads.@threads`) requires more care:
+
+```julia
+@model function f(x)
+    Threads.@threads for i in eachindex(x)
+        x[i] ~ Normal()  # UNSAFE: Assume statements in threads can crash!
+    end
+end
+```
+
+**Important limitations:**
+- **Observe statements**: Generally safe to use in threaded loops
+- **Assume statements** (sampling statements): Often crash unpredictably or produce incorrect results
+- **AD backend compatibility**: Many AD backends don't support threading. Check the [multithreaded column in ADTests](https://turinglang.org/ADTests/) for compatibility
+
+For safe parallelism within models, consider vectorized operations instead of explicit threading.
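+
+One possible vectorized formulation is sketched below. The model `f_vectorized` and its observation likelihood are assumptions made for illustration, not a drop-in replacement for the model above:
+
+```julia
+using Turing
+using LinearAlgebra  # Provides the identity `I`
+
+@model function f_vectorized(y)
+    # One vectorized assume statement instead of a threaded loop
+    x ~ filldist(Normal(), length(y))
+    # Vectorized observation with unit variance
+    y ~ MvNormal(x, I)
+end
+
+chain = sample(f_vectorized(randn(5)), NUTS(), 100)
+```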
 
 ## How do I check the type stability of my Turing model?
 
 Type stability is crucial for performance. Check out:
-- [Performance Tips](../usage/performance-tips/) - includes specific advice on type stability
-- [Automatic Differentiation](../usage/automatic-differentiation/) - contains benchmarking utilities using `DynamicPPL.TestUtils.AD`
+- [Performance Tips]({{< meta usage-performance-tips >}}) - includes specific advice on type stability
+- Use `DynamicPPL.DebugUtils.model_warntype` to check type stability of your model
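+
+A minimal usage sketch, assuming DynamicPPL is available in the current environment (the `demo` model is hypothetical):
+
+```julia
+using Turing
+using DynamicPPL
+
+@model function demo(y)
+    mu ~ Normal()
+    y ~ Normal(mu, 1)
+end
+
+model = demo(0.5)
+
+# Prints `@code_warntype`-style output for the model's evaluator;
+# look for `Any` or other abstract types flagged in the output
+DynamicPPL.DebugUtils.model_warntype(model)
+```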
 
 ## How do I debug my Turing model?
 
 For debugging both statistical and syntactical issues:
-- [Troubleshooting Guide](../usage/troubleshooting/) - common errors and their solutions
+- [Troubleshooting Guide]({{< meta usage-troubleshooting >}}) - common errors and their solutions
 - For more advanced debugging, DynamicPPL provides `DynamicPPL.DebugUtils` for inspecting model internals
 
-## What are the main differences between Turing, BUGS, and Stan syntax?
+## What are the main differences between Turing and Stan syntax?
+
+Key syntactic differences include:
+
+- **Parameter blocks**: Stan requires explicit `data`, `parameters`, `transformed parameters`, and `model` blocks. In Turing, everything is defined within the `@model` macro
+- **Variable declarations**: Stan requires upfront type declarations in parameter blocks. Turing infers types from the sampling statements
+- **Transformed data**: Stan has a `transformed data` block for preprocessing. In Turing, data transformations should be done before defining the model
+- **Generated quantities**: Stan has a `generated quantities` block. In Turing, use the approach described in [Tracking Extra Quantities]({{< meta usage-tracking-extra-quantities >}})
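+
+As a rough illustration of the last point, the sketch below assumes a Turing version that supports the `:=` operator; the model `with_extra` and the quantity `mu_sq` are hypothetical:
+
+```julia
+using Turing
+
+@model function with_extra(y)
+    mu ~ Normal(0, 1)
+    y ~ Normal(mu, 1)
+    # Deterministic quantity stored in the chain alongside `mu`,
+    # playing the role of Stan's `generated quantities`
+    mu_sq := mu^2
+end
+
+chain = sample(with_extra(0.3), NUTS(), 100)
+```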
+
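+Example comparison (the Stan program leaves `mu` and `sigma` with implicit flat priors, whereas Turing needs an explicit prior for every parameter, so the two models are similar but not identical):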
+```stan
+// Stan
+data {
+  int<lower=0> N;
+  vector[N] y;
+}
+parameters {
+  real mu;
+  real<lower=0> sigma;
+}
+model {
+  y ~ normal(mu, sigma);
+}
+```
 
-While there are many syntactic differences, key advantages of Turing include:
-- **Julia ecosystem**: Full access to Julia's profiling and debugging tools
-- **Parallel computing**: Much easier to use distributed and parallel computing inside models
-- **Flexibility**: Can use arbitrary Julia code within models
-- **Extensibility**: Easy to implement custom distributions and samplers
+```julia
+# Turing
+@model function my_model(y)
+    mu ~ Normal(0, 1)
+    sigma ~ truncated(Normal(0, 1); lower=0)
+    y .~ Normal(mu, sigma)  # `y` is a vector, so broadcast the univariate likelihood
+end
+```
 
 ## Which automatic differentiation backend should I use?
 
 The choice of AD backend can significantly impact performance. See:
-- [Automatic Differentiation Guide](../usage/automatic-differentiation/) - comprehensive comparison of ForwardDiff, Mooncake, ReverseDiff, and other backends
-- [Performance Tips](../usage/performance-tips/#choose-your-ad-backend) - quick guide on choosing backends
+- [Automatic Differentiation Guide]({{< meta usage-automatic-differentiation >}}) - comprehensive comparison of ForwardDiff, Mooncake, ReverseDiff, and other backends
+- [Performance Tips]({{< meta usage-performance-tips >}}#choose-your-ad-backend) - quick guide on choosing backends
 - [AD Backend Benchmarks](https://turinglang.org/ADTests/) - performance comparisons across various models
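+
+For orientation, here is a minimal sketch of selecting a backend through the sampler's `adtype` keyword. The `demo_ad` model is hypothetical, and which backend is fastest depends on the model:
+
+```julia
+using Turing
+
+@model function demo_ad(y)
+    mu ~ Normal()
+    y ~ Normal(mu, 1)
+end
+
+model = demo_ad(0.5)
+
+# ForwardDiff is often a reasonable choice for models with few parameters
+chain_fd = sample(model, NUTS(; adtype=AutoForwardDiff()), 100)
+
+# Other backends need their package loaded first, for example:
+# import ReverseDiff
+# chain_rd = sample(model, NUTS(; adtype=AutoReverseDiff()), 100)
+```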
 
-For more specific recommendations, check out the [DifferentiationInterface.jl tutorial](https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterfaceTest/stable/tutorial/).
-
 ## I changed one line of my model and now it's so much slower; why?
 
 Small changes can have big performance impacts. Common culprits include:
@@ -68,4 +134,4 @@ Small changes can have big performance impacts. Common culprits include:
 - Inadvertently causing AD backend incompatibilities
 - Breaking assumptions that allowed compiler optimizations
 
-See our [Performance Tips](../usage/performance-tips/) and [Troubleshooting Guide](../usage/troubleshooting/) for debugging performance regressions.
+See our [Performance Tips]({{< meta usage-performance-tips >}}) and [Troubleshooting Guide]({{< meta usage-troubleshooting >}}) for debugging performance regressions.