{ "cells": [ { "cell_type": "markdown", "id": "a1b2c3d4e5f60001", "metadata": {}, "source": [ "# Model-free estimation of mean age" ] }, { "cell_type": "markdown", "id": "a1b2c3d4e5f60002", "metadata": {}, "source": [ "The mean age of a molecular population equals the area under the **labeling survival curve** `f(t)` — the fraction of unlabeled molecules remaining at time `t`:\n", "\n", "$$\\bar{\\mathcal{A}} = \\int_0^\\infty f(t)\\, dt$$\n", "\n", "`estimate_mean_age_trapezoid()` approximates this integral directly from measured data points, without fitting a parametric compartmental model. This makes it a fast, model-free baseline that can be applied to any labeling experiment.\n", "\n", "The function supports two interpolation strategies (controlled by `semilogy`):\n", "- **Linear** (`semilogy=False`): linear interpolation between data points — appropriate when the curve is nearly linear between measurements.\n", "- **Log-linear** (`semilogy=True`, the default): exponential interpolation — more accurate for the typical decay shape of labeling curves.\n", "\n", "An optional **extrapolation** step (`extrapolate=True`, the default) extends the last measured interval to `f = 0`, capturing the tail area that would otherwise be missed." ] }, { "cell_type": "code", "execution_count": null, "id": "a1b2c3d4e5f60003", "metadata": {}, "outputs": [], "source": [ "from symbolic_compartmental_model import SymbolicCompartmentalModel\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from pathlib import Path" ] }, { "cell_type": "markdown", "id": "a1b2c3d4e5f60004", "metadata": {}, "source": [ "## Generating synthetic labeling data\n", "\n", "We start by constructing a two-state CM with known parameters so we can compare the trapezoid estimate against the exact analytical mean age." ] }, { "cell_type": "code", "execution_count": null, "id": "a1b2c3d4e5f60005", "metadata": {}, "outputs": [], "source": "cm_true = SymbolicCompartmentalModel(n_states=2)\ncm_true.contributed_turnovers = [[-0.2, 0.0], [0.0, -2.0]]\ncm_true.observed_pool_weights = [0.6, 0.4]\n\ntrue_mean_age = cm_true.mean_age()\nprint(f\"True mean age: {true_mean_age:.4f}\")\n\nrng = np.random.default_rng(seed=42)\ntdata = np.array([0.0, 0.3, 1.0, 3.0, 7.0])\nydata_clean = cm_true.f()(tdata)\nnoise = rng.normal(0, 0.02, size=tdata.shape)\nydata = np.clip(ydata_clean + noise, 1e-6, 1.0)\n\nfig, ax = plt.subplots(1, 1, figsize=(5, 3), dpi=150)\nt_fine = np.linspace(0, 10, 300)\nax.plot(t_fine, cm_true.f()(t_fine), color=\"steelblue\", label=\"true f(t)\")\nax.scatter(tdata, ydata, color=\"black\", zorder=5, label=\"noisy observations\")\nax.set_xlabel(\"time, t\")\nax.set_ylabel(\"unlabeled fraction, f(t)\")\nax.legend()\nax.set_title(f\"True mean age = {true_mean_age:.3f}\")\nfig.tight_layout()\nif Path(\"../results\").exists():\n fig.savefig(\"../results/example_trapezoid_data.svg\")\ndisplay(fig)\nplt.close(fig)" }, { "cell_type": "markdown", "id": "a1b2c3d4e5f60006", "metadata": {}, "source": [ "## Applying the trapezoid estimator\n", "\n", "Passing an `ax` argument to `estimate_mean_age_trapezoid()` produces a diagnostic plot: it draws each interpolation segment as a filled region, making it easy to see exactly how the area is being approximated. The shaded area is the estimated mean age." ] }, { "cell_type": "code", "execution_count": null, "id": "a1b2c3d4e5f60007", "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 1, figsize=(6, 3.5), dpi=150)\n", "\n", "estimated_mean_age = SymbolicCompartmentalModel.estimate_mean_age_trapezoid(\n", " tdata, ydata, semilogy=True, extrapolate=True, ax=ax\n", ")\n", "\n", "ax.set_xlabel(\"time, t\")\n", "ax.set_ylabel(\"unlabeled fraction, f(t)\")\n", "ax.set_title(\n", " f\"Trapezoid estimate = {estimated_mean_age:.3f} | True mean age = {true_mean_age:.3f}\"\n", ")\n", "fig.tight_layout()\n", "if Path(\"../results\").exists():\n", " fig.savefig(\"../results/example_trapezoid_estimate.svg\")\n", "display(fig)\n", "plt.close(fig)\n", "\n", "print(f\"Trapezoid estimate : {estimated_mean_age:.4f}\")\n", "print(f\"True mean age : {true_mean_age:.4f}\")\n", "print(f\"Relative error : {abs(estimated_mean_age - true_mean_age) / true_mean_age * 100:.1f} %\")" ] }, { "cell_type": "markdown", "id": "a1b2c3d4e5f60008", "metadata": {}, "source": [ "The orange shaded regions show the individual trapezoids, with the final segment extrapolated to `f = 0` using an exponential tail. The estimate is close to the analytical mean age despite using only 7 noisy data points." ] }, { "cell_type": "markdown", "id": "a1b2c3d4e5f60009", "metadata": {}, "source": [ "## Effect of the `extrapolate` and `semilogy` flags\n", "\n", "The two optional flags control accuracy:\n", "\n", "- **`extrapolate=False`**: the integral is truncated at the last observed time point, which systematically underestimates the mean age whenever the curve has not yet reached zero.\n", "- **`semilogy=False`**: linear rather than exponential interpolation between consecutive points — less appropriate for the curved shape of typical labeling data but included for comparison.\n", "\n", "Here we compare all four combinations on the same dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "a1b2c3d4e5f6000a", "metadata": {}, "outputs": [], "source": [ "fig, axs = plt.subplots(2, 2, figsize=(10, 6), dpi=120)\n", "\n", "configs = [\n", " (True, True, axs[0, 0]),\n", " (True, False, axs[0, 1]),\n", " (False, True, axs[1, 0]),\n", " (False, False, axs[1, 1]),\n", "]\n", "\n", "for semilogy, extrapolate, ax in configs:\n", " est = SymbolicCompartmentalModel.estimate_mean_age_trapezoid(\n", " tdata, ydata, semilogy=semilogy, extrapolate=extrapolate, ax=ax\n", " )\n", " mode = (\"log\" if semilogy else \"linear\") + \", \" + (\"extrap.\" if extrapolate else \"no extrap.\")\n", " error_pct = (est - true_mean_age) / true_mean_age * 100\n", " ax.set_xlabel(\"time, t\")\n", " ax.set_ylabel(\"f(t)\")\n", " ax.set_title(f\"{mode}\\nestimate = {est:.3f} (error {error_pct:+.1f} %)\")\n", "\n", "fig.suptitle(f\"True mean age = {true_mean_age:.3f}\", fontsize=11)\n", "fig.tight_layout()\n", "if Path(\"../results\").exists():\n", " fig.savefig(\"../results/example_trapezoid_flags.svg\")\n", "display(fig)\n", "plt.close(fig)" ] }, { "cell_type": "markdown", "id": "a1b2c3d4e5f6000b", "metadata": {}, "source": [ "Without extrapolation the estimate is truncated and always underestimates the true mean age. Log-linear interpolation (`semilogy=True`) is more accurate for exponentially decaying data: it accounts for the curvature between measurement points that linear interpolation misses." ] }, { "cell_type": "markdown", "id": "a1b2c3d4e5f6000c", "metadata": {}, "source": "## Comparing the trapezoid estimate with model-based estimates\n\nThe trapezoid method requires no model assumptions — it only needs the raw `(t, f)` data. A model-based mean age (obtained by fitting a CM and calling `mean_age()`) is more accurate when the model is correctly specified, but requires choosing the right model structure. A misspecified model can be worse than the model-free baseline.\n\nBelow we fit both a 1-state and a 2-state CM to the same data and compare all three estimates." }, { "cell_type": "code", "execution_count": null, "id": "a1b2c3d4e5f6000d", "metadata": {}, "outputs": [], "source": "cm1 = SymbolicCompartmentalModel(n_states=1)\nk = cm1.add_parameter(symbol=\"k\", lb=0.01, ub=10.0)\ncm1.contributed_turnovers = [[-k]]\nfit1 = cm1.fit(tdata[1:], ydata[1:])\n\ncm2 = SymbolicCompartmentalModel(n_states=2)\nk1 = cm2.add_parameter(symbol=\"k1\", lb=0.01, ub=5.0)\nk2 = cm2.add_parameter(symbol=\"k2\", lb=0.01, ub=5.0)\nw = cm2.add_parameter(symbol=\"w\", lb=0.0, ub=1.0)\ncm2.contributed_turnovers = [[-k1, 0.0], [0.0, -k2]]\ncm2.observed_pool_weights = [w, 1 - w]\nfit2 = cm2.fit(tdata[1:], ydata[1:])\n\ntrap_est = SymbolicCompartmentalModel.estimate_mean_age_trapezoid(tdata, ydata)\nage_1state = fit1.cm.mean_age()\nage_2state = fit2.cm.mean_age()\n\nprint(f\"True mean age : {true_mean_age:.4f}\")\nprint(f\"1-state model estimate : {age_1state:.4f} (error {(age_1state - true_mean_age)/true_mean_age*100:+.1f} %)\")\nprint(f\"Trapezoid estimate : {trap_est:.4f} (error {(trap_est - true_mean_age)/true_mean_age*100:+.1f} %)\")\nprint(f\"2-state model estimate : {age_2state:.4f} (error {(age_2state - true_mean_age)/true_mean_age*100:+.1f} %)\")\n\nfig, ax = plt.subplots(1, 1, figsize=(6, 3.5), dpi=150)\nt_fine = np.linspace(0, 10, 300)\nax.plot(t_fine, cm_true.f()(t_fine), color=\"steelblue\", lw=2, label=f\"true f(t) (age={true_mean_age:.2f})\")\nfit1.cm.plot(\"f\", ax=ax, t_range=t_fine, color=\"tomato\", ls=\"--\", label=f\"1-state fit (age={age_1state:.2f})\")\nfit2.cm.plot(\"f\", ax=ax, t_range=t_fine, color=\"seagreen\", ls=\"-.\", label=f\"2-state fit (age={age_2state:.2f})\")\nax.scatter(tdata, ydata, color=\"black\", zorder=5, s=20, label=\"data\")\nax.set_xlabel(\"time, t\")\nax.set_ylabel(\"unlabeled fraction, f(t)\")\nax.legend(fontsize=8)\nax.set_title(f\"Trapezoid estimate = {trap_est:.2f}\")\nfig.tight_layout()\nif Path(\"../results\").exists():\n fig.savefig(\"../results/example_trapezoid_comparison.svg\")\ndisplay(fig)\nplt.close(fig)" }, { "cell_type": "markdown", "id": "a1b2c3d4e5f6000e", "metadata": {}, "source": "The 1-state model cannot capture the bi-exponential shape, so it is systematically biased and gives the worst mean-age estimate. The trapezoid method, being model-free, does better despite using only a handful of sparse data points. The 2-state model — correctly specified — gives the most accurate estimate, illustrating that a good parametric model outperforms the non-parametric baseline when the model structure is known." } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }