Modern AI platforms sound new, but many of their failure modes are familiar. Data has to be partitioned. Metadata has to be consistent enough. Work has to be admitted without stampeding. Storage and compute have to disagree gracefully.
Dynamo-style systems are useful teachers because they make tradeoffs explicit. Availability, latency, consistency, and operability are not slogans to choose between. They are design decisions, and customers experience the consequences of each one when the platform is partially failing.
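One way to see what "explicit" means here is the replica-count arithmetic that Dynamo-style stores expose as configuration. The sketch below is illustrative only: `QuorumConfig` and its method names are hypothetical, not taken from any particular system, and it assumes strict quorums, where reads and writes go only to a key's N designated replicas. Under that assumption, a read quorum R and a write quorum W always share at least one replica when R + W > N.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    """Hypothetical holder for Dynamo-style replication settings."""

    n: int  # replicas that store each key
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def __post_init__(self) -> None:
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("require 1 <= r <= n and 1 <= w <= n")

    def quorums_overlap(self) -> bool:
        # With strict quorums, R + W > N means any read set and any
        # write set have at least one replica in common.
        return self.r + self.w > self.n

    def read_failures_tolerated(self) -> int:
        # Replicas that can be unreachable while reads still succeed.
        return self.n - self.r

    def write_failures_tolerated(self) -> int:
        # Replicas that can be unreachable while writes still succeed.
        return self.n - self.w


if __name__ == "__main__":
    for cfg in (QuorumConfig(3, 2, 2), QuorumConfig(3, 1, 1), QuorumConfig(3, 1, 3)):
        print(
            cfg,
            "overlap:", cfg.quorums_overlap(),
            "read failures tolerated:", cfg.read_failures_tolerated(),
            "write failures tolerated:", cfg.write_failures_tolerated(),
        )
```

The three example configurations make the trade visible: lowering R and W tolerates more unreachable replicas but gives up the overlap guarantee, while raising W to N keeps overlap with cheap reads at the cost of writes failing when any single replica is down. Systems that relax quorum membership (for example, by accepting writes on substitute nodes) weaken the overlap property further, which this sketch does not model.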
For AI infrastructure teams, the practical question is not whether the system is distributed. It is whether the team has named the failure modes and built the controls to operate through them.