Retrieving "Returns" from the archives

Cross-reference notes under review

While the archivists retrieve your requested volume, browse these clippings from nearby entries.

Debt Obligations

Linked via "returns"

The process of bundling numerous individual debt obligations (e.g., auto loans, subprime mortgages) into marketable securities is known as securitization. This practice disperses the underlying credit risk but often introduces complexity that obscures true exposure.
Securitization relies on tranching, where the pooled assets are carved into segments—tranches—with differing [priori…
Proximal Policy Optimization

Linked via "returns"

Advantage Estimation and Value Function Learning
Like many policy gradient methods, PPO requires an estimate of the advantage function $\hat{A}_t$. This is typically accomplished by combining the value function estimate $V(s_t; \phi)$ with the empirical returns $R_t$.
The generalized advantage estimator (GAE) is commonly employed:
Proximal Policy Optimization

Linked via "returns"

Where $\delta_t = r_t + \gamma V(s_{t+1}; \phi) - V(s_t; \phi)$ is the temporal difference (TD) error, $\gamma$ is the discount factor, and $\lambda$ controls the bias-variance trade-off in the advantage estimate.
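As an illustration of how the TD errors combine into an advantage estimate, here is a minimal sketch in Python. The clipping does not show the estimator's equation, so this assumes the commonly used GAE form $\hat{A}_t = \sum_{l \ge 0} (\gamma\lambda)^l \delta_{t+l}$, computed backward over a single trajectory with no mid-trajectory terminations. The function name `compute_gae`, its parameters, and the default values of `gamma` and `lam` are hypothetical choices for this example.

```python
import numpy as np


def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Sketch of GAE over one trajectory (assumed standard form).

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), i.e. one more entry than rewards,
             where the last entry bootstraps the final state.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    # Work backward so each step reuses the accumulated future sum.
    for t in reversed(range(T)):
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + (gamma * lambda) * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Under this form, setting `lam=0.0` reduces each advantage to the one-step TD error, while `lam=1.0` sums all discounted TD errors to the end of the trajectory, which is the bias-variance trade-off that $\lambda$ controls.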
The value function $V(s_t; \phi)$ is learned concurrently with the policy by minimizing a squared error loss against the returns collected from the trajectories:
$$…
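The squared-error objective described above can be sketched as follows. The clipping's own equation is truncated, so the details here are assumptions: the targets are taken to be discounted Monte Carlo returns, and the loss is averaged over time steps with no extra scaling factor. The helper names `discounted_returns` and `value_loss` are hypothetical.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99, bootstrap_value=0.0):
    """Empirical returns R_t = r_t + gamma * R_{t+1} for one trajectory."""
    rewards = np.asarray(rewards, dtype=np.float64)
    returns = np.zeros_like(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def value_loss(value_preds, returns):
    """Mean squared error between V(s_t; phi) and the returns R_t."""
    value_preds = np.asarray(value_preds, dtype=np.float64)
    returns = np.asarray(returns, dtype=np.float64)
    return float(np.mean((value_preds - returns) ** 2))
```

In a full training loop this scalar would be minimized with respect to the value parameters $\phi$ by a gradient-based optimizer; the sketch only evaluates the loss for given predictions.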
