There was a good deal of discussion about model persistence at Eclipse Summit Europe 2010. It seemed to me that there are two main camps: one side prefers files, while the other wants a scalable repository. Three questions are related to this:
How are model elements identified?
- Each model element has a unique identifier
- Selected model elements have a unique name
How are models edited?
- With a text editor, which means that there are transient states in which the model is syntactically invalid
- With a specialized model editor that does not allow the user to create syntactically invalid states
- With a projectional editor that combines the two approaches
How are models stored?
- Textual modeling languages are naturally file based, because a file can hold the transient, syntactically invalid states. They identify elements by name, as required by the language being implemented. Model elements without a name have no identifier.
- The ‘traditional’ modeling approach is more flexible, as every element has an implicit or explicit id. It can use both files and repositories for persistence and always has some form of model element identity.
Some of the differences between these approaches become important when selecting the appropriate one for a specific use case.
The first essential difference is the guarantee of traceability over the life cycle of model elements.
- The identity based approach is more robust – traceability is only lost if an element is deleted and then re-created again (which results in a new element).
- The name based approach is more brittle. A name change breaks the traceability (unless there is some magic done behind the scenes)
The level of traceability is important when dealing with model evolution. The conventional approach makes it possible to track every change in the model, whereas the textual approach exposes only those states in which the model representation is syntactically correct.
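To make the difference concrete, here is a minimal sketch in Python. The `Element` and `Model` classes are hypothetical and exist only for illustration; they are not taken from any particular modeling framework. The sketch records two trace links to the same element, one by id and one by name, and then renames the element.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    # Assumption: identity is a generated id that never changes after creation.
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Model:
    """Hypothetical container that can resolve elements by id or by name."""

    def __init__(self):
        self._elements = []

    def add(self, name):
        element = Element(name)
        self._elements.append(element)
        return element

    def by_id(self, element_id):
        return next((e for e in self._elements if e.id == element_id), None)

    def by_name(self, name):
        return next((e for e in self._elements if e.name == name), None)


model = Model()
order = model.add("Order")

# Two trace links to the same element, recorded before the rename.
link_by_id = order.id
link_by_name = order.name

order.name = "PurchaseOrder"  # a plain rename

# The id-based link still resolves; the name-based link no longer finds anything.
id_link_resolves = model.by_id(link_by_id) is order
name_link_resolves = model.by_name(link_by_name) is order
```

After the rename, `id_link_resolves` is true and `name_link_resolves` is false. A name-based tool would need extra bookkeeping, such as a rename refactoring that updates all links, to keep the trace intact.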
The next difference is directly influenced by the storage mechanism and becomes relevant for large models.
- Repository based approaches scale better as model elements can be retrieved using the identifier when required
- The file based approach requires partitioning of large models. The partitioning may be straightforward in certain use cases, but it can also be artificial when the model has no natural partitioning. It is also problematic when different views on the same model are possible, as they often do not share the same natural partitioning.
What does this mean? The selection of the persistence mechanism depends on the actual use case: it will be influenced by the required level of traceability and by the natural structure of the domain being modeled. The user of the modeling environment should have the flexibility to choose one approach or the other, or even to combine them, and the modeling platform must allow the user to make that choice. And as we learned in the keynote by Jeff Norris, we should have the freedom to stay uncommitted and make the decision late. Sounds to me like a standard interface into which the best suited storage provider can be injected.
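Such an interface could look roughly like the following Python sketch. All names here (`ModelStore`, `FileStore`, `InMemoryRepository`, `ModelingSession`) are made up for illustration, and the in-memory class is only a stand-in for a real repository. The point is that the editing code depends on the interface alone, so the storage decision can be made late.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class ModelStore(ABC):
    """Hypothetical storage interface the modeling platform would program against."""

    @abstractmethod
    def save(self, element_id: str, data: dict) -> None: ...

    @abstractmethod
    def load(self, element_id: str) -> dict: ...


class FileStore(ModelStore):
    """Keeps the whole model in one JSON file; every access reads the full file."""

    def __init__(self, path):
        self._path = path

    def _read_all(self):
        if not os.path.exists(self._path):
            return {}
        with open(self._path, encoding="utf-8") as f:
            return json.load(f)

    def save(self, element_id, data):
        elements = self._read_all()
        elements[element_id] = data
        with open(self._path, "w", encoding="utf-8") as f:
            json.dump(elements, f)

    def load(self, element_id):
        return self._read_all()[element_id]


class InMemoryRepository(ModelStore):
    """Stand-in for a repository: elements are fetched individually by id."""

    def __init__(self):
        self._elements = {}

    def save(self, element_id, data):
        self._elements[element_id] = dict(data)

    def load(self, element_id):
        return dict(self._elements[element_id])


class ModelingSession:
    """Editing logic that only knows the interface; the store is injected."""

    def __init__(self, store: ModelStore):
        self._store = store

    def rename(self, element_id, new_name):
        data = self._store.load(element_id)
        data["name"] = new_name
        self._store.save(element_id, data)
        return self._store.load(element_id)


def demo():
    # Run the same editing code against both providers.
    with tempfile.TemporaryDirectory() as tmp:
        stores = [FileStore(os.path.join(tmp, "model.json")), InMemoryRepository()]
        results = []
        for store in stores:
            store.save("e1", {"name": "Order"})
            results.append(ModelingSession(store).rename("e1", "PurchaseOrder"))
        return results
```

Calling `demo()` runs the same rename against both providers and gets the same result from each. The sketch also hints at the scalability trade-off discussed above: the file provider reads its whole file on every access, while the repository stand-in fetches a single element by id.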