Moss, Larissa T. “Enterprise Data Modeling: Lost Art or Essential Science?” Business Intelligence Journal 2015. 27-32.
- EDM: not “electronic dance music” but “enterprise data modeling.”
- The E-R approach developed by Peter Chen in the mid-1970s was revolutionary because it broke with the typical process-oriented approach to information systems and replaced it with an entity-relationship model, or logical data model. This was important because it allowed databases to abstract entities from process and recombine them with other entities to draw out important insights from data-made-information (28). [1. A literature on the relationship between single-sourcing and databases exists in Writing Studies. See Johnson-Eilola & Manovich, among others.] This also allowed businesses to treat individual, discrete pieces of data/information as assets, which in turn demanded someone to manage those assets. Hence the development of Data Administration (DA), BI, etc.
- Data integration (not consolidation) requires several actions to produce an EDM (enterprise data model): 1) examining the definition, semantic intent, and content of each entity to find duplicates; 2) ensuring that each entity has only one unique business identifier; 3) ensuring that each attribute or data element belongs to just one owning entity, so that each attribute is captured only once, providing non-redundant data for reliable information; 4) capturing the business actions and transactions that connect the business to the real world, from a “logical business perspective” rather than from anecdotal reporting or data access (28).
- Different models of building an EDM:
  - Top-down EDM technique: SMEs identify major business objects, relationships, business rules, and the most significant attributes of each object. This approach is sometimes difficult because it is time-consuming to identify all of the different parts of a complex organization.
  - Bottom-up EDM technique: Normalize pre-existing data structures into best-guess logical data models. The testing and validation phase here is immense and requires SMEs to participate at length.
  - Middle-out EDM technique: Both top-down and bottom-up approaches are used iteratively. This is faster, but there is a very real risk that data administrators/information architects go in a different direction than BI (29).
- How do we name data? [2. This is important for your project as the method described here may or may not function effectively with big data, distant reading analyses via Python+NLTK.] The author describes the Prime Words/Class Words/Qualifiers approach as “every data element must have one prime word, one or more qualifiers, and end in one class word. Class words are predetermined and documented on a published list (e.g., date, text, name, code, number, etc.)” (29). All data naming and defining must undergo normalization (see this blog post) to ensure uniqueness.
- The author emphasizes the role of consensus in EDM, noting that “As described in previous sections, one of the DA (data administration) principles applied during EDM is creating precise, consensus-driven definitions, along with consensus-driven names, domains, and business rules” (32).
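
Two of the data integration rules noted above (one unique business identifier per entity, and one owning entity per attribute) lend themselves to a mechanical check. The following is only a rough sketch, not anything Moss provides: the `MODEL` dictionary, its entity and attribute names, and both function names are hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical model (an assumption for illustration): each entity lists
# its business identifiers and the attributes it claims to own.
MODEL = {
    "Customer": {
        "identifiers": ["customer_number"],
        "attributes": ["customer_name", "customer_birth_date"],
    },
    "Order": {
        "identifiers": ["order_number"],
        "attributes": ["order_date", "customer_name"],
    },
    "Product": {
        "identifiers": [],
        "attributes": ["product_name"],
    },
}


def identifier_violations(model):
    """Return entities that do not have exactly one business identifier."""
    return sorted(
        name for name, spec in model.items() if len(spec["identifiers"]) != 1
    )


def shared_attributes(model):
    """Map each attribute claimed by more than one entity to its owners."""
    owners = defaultdict(list)
    for name, spec in model.items():
        for attr in spec["attributes"]:
            owners[attr].append(name)
    return {attr: sorted(ents) for attr, ents in owners.items() if len(ents) > 1}


print(identifier_violations(MODEL))
print(shared_attributes(MODEL))
```

In this toy model, `Product` is flagged for lacking a business identifier and `customer_name` is flagged because both `Customer` and `Order` claim it. Deciding which entity should actually own the attribute remains a consensus question for SMEs, not something the script can settle.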
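
The Prime Words/Class Words/Qualifiers naming rule quoted above can also be sketched as a small validator. This is a minimal illustration under stated assumptions: names are taken to be underscore-separated, `CLASS_WORDS` reuses only the examples quoted from the article, and `PRIME_WORDS` and `check_name` are hypothetical, since a real prime-word list would come from the organization's own documented standards.

```python
# Class words taken from the examples quoted in the notes above.
CLASS_WORDS = {"date", "text", "name", "code", "number"}
# Hypothetical prime words (assumption): the business objects being described.
PRIME_WORDS = {"customer", "order", "product"}


def check_name(name):
    """Return a list of naming problems; an empty list means the name passes."""
    tokens = name.lower().split("_")
    problems = []

    # Rule: the name must end in a class word.
    if tokens[-1] in CLASS_WORDS:
        body = tokens[:-1]
    else:
        problems.append("does not end in a class word")
        body = tokens

    # Rule: exactly one prime word.
    primes = [t for t in body if t in PRIME_WORDS]
    if len(primes) != 1:
        problems.append(f"expected one prime word, found {len(primes)}")

    # Rule: one or more qualifiers (anything that is not the prime word).
    qualifiers = [t for t in body if t not in PRIME_WORDS]
    if not qualifiers:
        problems.append("no qualifier")

    return problems


for candidate in ["customer_birth_date", "customer_name", "birth_date"]:
    print(candidate, check_name(candidate))
```

Here `customer_birth_date` passes, `customer_name` is flagged for having no qualifier, and `birth_date` is flagged for having no prime word. Whether a check like this scales to element names extracted from large text corpora is exactly the open question raised in note 2.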