Saturday, 6 November 2010

Enterprise Modeling - Model Persistence

There was quite a lot of discussion about the persistence of models at the Eclipse Summit Europe 2010. It seemed to me that there are two main camps: one prefers files, while the other is looking for a scalable repository. Three questions are related to this:
How are model elements identified?
  • Each model element has a unique identifier
  • Selected model elements have a unique name
How are models edited?
  • With a text editor, which means that there are transient states in which the model is syntactically invalid
  • With a specialized model editor, which does not allow the user to create syntactically inconsistent states
  • With a projectional editor combining the two approaches
How are models stored?
  • Textual modeling languages are naturally file based, so that they can handle the transient states, and use the element names for identification as required by the implemented language. Other model elements have no identifier.
  • The ‘traditional’ modeling approach is more flexible, as the elements have an implicit or explicit id. It can use both files and repositories for persistence and always has some form of model element identity.
Some of the differences become important when selecting the appropriate approach for a specific use case.
The first essential difference is the guarantee of traceability over the life cycle of model elements.
  • The identity based approach is more robust: traceability is lost only if an element is deleted and then re-created (which results in a new element).
  • The name based approach is more brittle. A name change breaks the traceability (unless some magic is done behind the scenes).
The level of traceability is important when dealing with model evolution. The conventional approach enables tracking of every change in the model, whereas the textual approach makes visible only those states in which the model representation is syntactically correct.
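To make the traceability difference concrete, here is a minimal sketch in Python. It is not taken from any particular modeling framework: the `Element` and `Model` classes and their methods are hypothetical names invented for this illustration, and a random UUID stands in for whatever id scheme a real tool would use.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical model element: a mutable name plus a generated id."""
    name: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class Model:
    """Hypothetical model holding its elements indexed by id."""

    def __init__(self):
        self._by_id = {}

    def add(self, element):
        self._by_id[element.id] = element
        return element

    def remove(self, element):
        del self._by_id[element.id]

    def resolve_by_id(self, element_id):
        return self._by_id.get(element_id)

    def resolve_by_name(self, name):
        # Name based lookup: first element whose current name matches.
        return next((e for e in self._by_id.values() if e.name == name), None)


model = Model()
customer = model.add(Element("Customer"))

# Two references to the same element, taken before any change.
id_ref = customer.id
name_ref = customer.name

# A rename keeps the id based reference intact but breaks the name based one.
customer.name = "Client"
found_by_id_after_rename = model.resolve_by_id(id_ref)
found_by_name_after_rename = model.resolve_by_name(name_ref)

# Deleting and re-creating yields a new element, so the old id no longer resolves.
model.remove(customer)
recreated = model.add(Element("Client"))
found_by_id_after_recreate = model.resolve_by_id(id_ref)
```

After the rename the id based reference still finds the element while the name based one finds nothing; after the delete and re-create, the old id is gone as well, which is the one case where the identity based approach also loses the trace.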
The next difference is directly influenced by the storage mechanism and becomes relevant for large models.
  • Repository based approaches scale better as model elements can be retrieved using the identifier when required
  • The file based approach requires partitioning of large models. The partitioning may be straightforward in certain use cases, but it can also be artificial when the model has no natural partitioning. It is also problematic when different views on the same model are possible, as they often do not have the same natural partitioning needs.
What does this mean? The selection of the persistence mechanism depends on the actual use case: it is influenced by the required level of traceability and by the natural structure of the domain being modeled. The user of the modeling environment should have the flexibility to choose one approach or the other, or even to combine them, and the modeling platform must allow the user to make that choice. And as we learned in the keynote by Jeff Norris, we should have the freedom to stay uncommitted and make the decision late. Sounds to me like a standard interface into which the best suited storage provider can be injected.
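One way such an interface could look is sketched below in Python. This is only an illustration under my own assumptions: `ModelStore`, `FileStore`, `InMemoryRepository` and `ModelingSession` are hypothetical names, the JSON file layout is made up, and the in-memory dictionary merely stands in for a real repository.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ModelStore(ABC):
    """Hypothetical storage interface that the modeling platform codes against."""

    @abstractmethod
    def save(self, element_id, data): ...

    @abstractmethod
    def load(self, element_id): ...


class FileStore(ModelStore):
    """File based provider: one JSON file, read in full on every access."""

    def __init__(self, path):
        self._path = Path(path)

    def _read_all(self):
        if self._path.exists():
            return json.loads(self._path.read_text(encoding="utf-8"))
        return {}

    def save(self, element_id, data):
        elements = self._read_all()
        elements[element_id] = data
        self._path.write_text(json.dumps(elements), encoding="utf-8")

    def load(self, element_id):
        return self._read_all()[element_id]


class InMemoryRepository(ModelStore):
    """Stand-in for a repository: elements are fetched individually by id."""

    def __init__(self):
        self._elements = {}

    def save(self, element_id, data):
        self._elements[element_id] = dict(data)

    def load(self, element_id):
        return dict(self._elements[element_id])


class ModelingSession:
    """Editing logic that only knows the interface; the provider is injected."""

    def __init__(self, store):
        self._store = store

    def rename(self, element_id, new_name):
        data = self._store.load(element_id)
        data["name"] = new_name
        self._store.save(element_id, data)

    def name_of(self, element_id):
        return self._store.load(element_id)["name"]


def demo(store):
    # The same editing steps run unchanged against any provider.
    store.save("e1", {"name": "Customer"})
    session = ModelingSession(store)
    session.rename("e1", "Client")
    return session.name_of("e1")


with tempfile.TemporaryDirectory() as tmp:
    file_result = demo(FileStore(Path(tmp) / "model.json"))
repo_result = demo(InMemoryRepository())
```

Because `ModelingSession` never names a concrete provider, the choice between files and a repository can be deferred or changed later without touching the editing code.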


  1. (not sure what happened, my first comment disappeared immediately after I posted it)

    I have a comment to make regarding your statement "The name based approach is more brittle". I found the opposite to be true.

    We have a model based application using globally unique IDs to identify model elements. The model itself is split into several different files, with references from one file to elements stored in a different file. The model itself represents an application, just like, say, an Eclipse project.
    Now, if the user copies one file into a different project, we have to update all references stored therein so that their identifiers point to the new elements in the new project (the one the file was just copied to). This would not be necessary with a name based approach; you could just copy the file into the new context and it would work as expected.

    While there might be use cases where the expected behavior of copied models would be to still point into the original context, I think of this as an exception. Humans often organize their work in partitions, following that pattern in a model seems just natural. When choosing an identity based approach, keep in mind what use case you have and that this has long term side effects. We failed to put that much thought into it back then, and still weep bitterly because of it.

  2. Kaliph, I think a lot depends on how we see the relation between a model, its elements and files.
    I think that the model should be partitioned using constructs on the model level. I see files as a way to persist models not as a feature of a model.
    We have a large model where each model has a unique id - this allows us to replicate and consolidate models (or parts thereof) as required. Files or databases are just a way to persist the information - not more.