Studi Informatika

Panduan Pembelajaran Ilmu Teknologi & Komputer

Wednesday, September 26, 2018


Fundamental Relational Concepts
Every data-modeling technique has its own set of terms, definitions, and techniques. This vernacular permits us to understand complex and difficult concepts and to use them to design complex databases. This chapter applies relational data-modeling techniques for developing the data warehouse data model.
Before we get into the various types of data models we use in creating a data warehouse, it is necessary to first understand why a data model is important and the various types of data models you will create in developing your BI environment.

Why Do You Need a Data Model?
A model is an abstraction or representation of a subject that looks or behaves like all or part of the original. Examples include a concept car and a model of a building. All models have a common set of objectives.
Let’s look more closely at these benefits:
=> A model reduces overall risk by ensuring that the requirements of the final product will be satisfactorily met
=> A model helps the developers envision how the final product will inter- face with other systems or functions. The level of effort needed to create the interfaces and their feasibility can be reasonably estimated if a detailed model is created
=> A model helps all the people involved understand how to relate to the ultimate product and how it will pertain to their work function
=> Finally a model ensures that the people building the product and those requesting it have the same expectations about the ultimate outcome of the effort.

Relational Data-Modeling Objects
Now that we understand the need for a model, let’s turn our attention to a specific type of model—the data model. Before describing the various levels of models, we need to come up with a common set of terms for use in describing these models.
=> Subject
The first term to describe is a subject. You will see us refer to a subject-oriented data warehouse and a subject area model. In both cases, the term subject refers to a data subject or a major category of data relevant to the business. A subject area is the subset of the enterprise’s data and consists of related entities and relationships. Customers, Sales, and Products are examples of subject areas.
=> Entity
An entity is generally defined as a person, place, thing, concept, or event in which the enterprise has both the interest and the capability to capture and store information. For the third nor- mal form data model, there is one and only one entry representing that entity. In entity-relationship diagrams (ERD) or logical data modeling in the classical Code and Date sense, there are four types of entities from which to build logical or business data models and data warehouse models
=> A Primary or Fundamental Entity is defined as an entity that does not depend on any other entity for its existence.
=> A Subtype Entity is a logical division or category of a parent (super type) entity
=> An Attributive or Characteristic Entity is an entity whose existence depends on another entity
=> An Associative or Intersection Entity is an entity that is dependent upon two or more entities for its existence, and that records data at the point of intersection.
With these four types of entities, we have all we will need in terms of components to create the business and data warehouse data models.
=> Element or Attribute
An element or attribute is the lowest level of information relating to any entity. It models a specific piece of information or a property of a specific entity. Elements or attributes serve several purposes within an entity.

=> A primary key serves to uniquely identify the entity and is used in the physical database to locate a record for storage or access. Examples include Customer ID for the Customer entity and Item ID for the Item entity.

=> A foreign key is a key that exists because of a parent-child relationship between a pair of entities. The foreign key in the child entity is the primary key in the parent entity and links the two entities together. For example, the Customer ID of the Customer entity is also found in the Order entity, relating the two
=> A non key element or attribute is not needed to uniquely identify the entity but is used to further describe or characterize information about the entity. Examples of non key elements or attributes are Customer Name, Customer Type, Item Color, and Item Quantity.
=> Relationships
A relationship documents the business rule associating two entities together. The relationship is used to describe how the two entities are naturally linked to each other.
There are different characteristics of relationships used in documenting the business rules of the enterprise:
=> Cardinality denotes the maximum number of occurrences of one entity that can be related to another entity. Usually these are expressed as “one” or “many.” In Figure, a Customer has many addresses (Bill-to, Ship-to) and every address belongs to one customer.
=> Optionality or modality indicates whether an entity occurrence must participate in a relationship. This characteristic tells you the minimum number (zero or optional) of occurrences in the relationship.
There are also different types of relationships:
=> An identifying relationship is one in which the primary key of the parent entity becomes a part of the primary key of the child entity.
=> A non identifying relationship is one in which the primary key of the parent entity becomes a non key attribute of the child entity. An example of this type of relationship is a recursive relationship, that is, a situation in which an entity is related to itself.
The components of a relationship in a data model consist of a verb phrase denoting the business rule (places, has, contains), the cardinality, and the modality or optionality of the relationship.

 Types of Data Models
A data model is an abstraction or representation of the data in a given environment. It is a collection and a subsequent verification and communication method to fully document the data requirements used in the creation of accurate, effective, and efficient physical databases. The data model consists of entities, attributes, and relationships. Within the complete data model, appropriate meta data, such as definitions and physical characteristics, is defined for each of these.
If the data model is so important, why isn’t it always developed? There are a number of reasons for this :
=> It’s not easy
Creating the data model takes significant effort from the IT technical staff and business community. Data modelers must be either hired or internal resources trained in the disciplines of data modeling.
=> It requires discipline and tools
Once the techniques for data modeling are learned, they must be applied with conformity and compliance. The enterprise must create a set of documents detailing the standards it will use in the creation of its data models.
=> It requires significant business involvement
A company’s data model must—repeat—must have business community involvement. We are, after all, designing the critical component of the business community’s ultimate competitive weapon.
=> It postpones the visible work
Data modeling does not create tangible products that can be used by the business community. The models pro- vide the technical staff creating the environment with information about the business environment and some requirements.
=> It requires a broad view
The data model for the BI environment must encompass the entire enterprise. It will be used to create the ultimate decision-making components—the data marts—for all strategic analysis.
=> The benefits of a data model are often not realized with the first project
The real productivity comes in its reuse and its enterprise perspective.

Having said all this, what is the impact of not developing a data model?
=> It becomes very difficult to extract desired data. It is easy to implement something that either misses the users’ expectations or only partially satisfies them.
=> Significant effort is spent on interfaces that generally provide little or no business value.
=> The environment’s complexity increases significantly. When there is no data model to serve as a roadmap, it becomes difficult, if not impossible, to know what you already have in your data warehouse and what needs to be added.
=> It virtually guarantees lack of data integration because you cannot visualize how things fit together. Data warehouse development will not be effective and efficient, and may not even be feasible.
=>  One of the most significant drawbacks is that, without a data model, data will not be effectively managed as an asset.

Subject Area Model  
Subject areas are major groupings of things1 of interest to the enterprise. These things of interest are eventually depicted in entities. The typical enterprise has between 15 and 20 subject areas. One of the beauties of a subject area model is that it can be developed very quickly (typically within a few days). The initial model serves as a blueprint for the business data model, and refinements in the subject area model should be expected. One of the reasons that the subject area model can be developed quickly is that there are some subjects that are common to many organizations, and a company embarking on the development of a subject area model can begin with these.

These subject areas conform to standards governing the subject area model:
=> Subject area names are plural nouns.
=> Definitions apply implicitly to the past, present, and future.
=> Subject areas are at approximately the same level of abstraction.
=> Definitions are structured so that the subject areas are mutually exclusive.
Subject Area Model Benefits
Regardless of how quickly the subject area model can be developed, the effort should only be undertaken if there are benefits to be gained. Following are some of the major benefits provided by the subject area model.
=> Guide the Business Data Model Development
=> Guide Data Warehouse Project Selection
=> Guide Data Warehouse Development Projects

Business Data Model  
The business data model is another type of model. It is an abstraction or representation of the data in a given business environment, and it provides the benefits cited for any model. It helps people envision how the information in the business relates to other information in the business (“how the parts fit together”).
 The business data model reduces the development risk by ensuring that all the systems implemented correctly reflect the business environment. Finally, when it is used to guide development efforts, it provides a basis to confirm the developers’ interpretation of the business information relationships to ensure that the key stake- holders share a common set of expectations.

Business Data Model Benefits
The business data model provides a consistent and stable view of the business information and business information relationships. It can be used as a basis for recognizing, evaluating, and responding to business changes. Specific benefits of the data model for data warehousing efforts follow.
=> Scope Definition
Every project should include a scope definition as one of its first steps, and data warehouse projects are no exception. If a business data model already exists, it can be used to convey the information that will be addressed by the resultant data warehouse.
=> Integration Foundation
In designing any enterprise’s data model, the designer will immediately run into situations where homonyms (entities or attributes that have the same name but mean very different things) and synonyms (entities or attributes that have different names but mean exactly the same thing) are encountered.
=> Integration Foundation
In designing any enterprise’s data model, the designer will immediately run into situations where homonyms (entities or attributes that have the same name but mean very different things) and synonyms (entities or attributes that have different names but mean exactly the same thing) are encountered.
=> Multiple Project Coordination
A data warehouse program consists of multiple data warehouse implementation projects, and sometimes several of these are managed simultaneously. The business data model is then used to establish where the projects overlap to fine-tune what data each project will use. Where the same entity is used by more than one project, its design, definition, and implementation should be assigned to only one team.
=> Dependency Identification
The data model helps to identify dependencies between various entities and attributes. In this fashion, it can be used to help assess the impact of change. When you change or create a process, you must be able to answer the question of whether it will have any impact on sets of data used by other processes.
=> Redundancy Management
The business data model strives to remove all redundancies. Entities, attributes, and relationships appear only once in this model unless they are used as foreign keys into other entities. By creating this model, you can immediately see overlaps and conflicts that must be resolved, as well as redundancies that must be removed, before going forward. The normalization rules specified in the “Relational Modeling Guidelines” section are designed to ensure a non- redundant data model.
=> Change Management
Data models also serve as your best way to document changes to entities, attributes, and relationships. As systems are created, we may discover new business rules in effect and the need for additional entities and attributes.

System Model  
A system model is a collection of the information being addressed by a specific system or function such as a billing system, data warehouse, or data mart. The system model is an electronic representation of the information needed by that system. It is independent of any specific technology or DBMS environment. For example, the billing system and data warehouse system models will most likely not have every scrap of data of interest to the enterprise found in them. Because the system model is developed from the business data model, it must, by default, be consistent with that model.
It is also important to note that there will be more than one system model. Each system or database that we construct will have its own unique system model denoting the specific data requirements for that system or the function it sup- ports. Alternatively, there typically is only one system model per system.

Technology Model  
The last model to be developed is a technology model. This model is a collection of the specific information being addressed by a particular system and implemented on a specific platform.
=> Hardware
Your choice of platform means that you must consider the sizes of the individual data files according to your platform technology and notate these specifications in the technology model.
=> Database management system (DBMS)
The DBMS chosen for your data warehouse will have a great impact upon the ultimate design of your database. You must make the following determinations :
You must make the following determinations :
=> Amount of de-normalization
=> Materialized views
=> Partitioning strategy
=> Indexing strategy
=> Referential integrity
=> Data delivery technology
=> Security
The technology model must be consistent with the governing system model. That is, it inherits its basic requirements from its system model. Likewise, any changes in the fundamental entities, attributes, and relationships discovered as the technology model is implemented must be reflected back up the chain of models

Relational Data-Modeling Guidelines
Data modeling is a very abstract process, and not all IT professionals have the qualifications to create a solid model. Data modelers require the ability to conceptualize intangible notions about what the business requires to perform its business and what its rules are in doing business. Also, data modeling is non- deterministic there is one right way to create a data model. There are many wrong ways. A common concern in data modeling is the amount of change that occurs.
Data modelers must not see this aspect as a threat but rather be prepared for change and embrace it as a good sign— a sign that the model is, in fact, more insightful and that it more closely resembles the enterprise as a whole.
It is the goal of this book to ensure that you have the strong foundation and footing you need to deal with these issues before you begin your data ware- house design. Let’s start with a set of guidelines garnered from the many years of data modeling we have performed.

Guidelines and Best Practices  
The goal of any data model is to completely and accurately reflect the data requirements and business rules for handling that data so that the business can perform its functions effectively. To that end, we believe that there are three guidelines that should be followed when designing your data models :
=> Communication tool
The data models should be used as a communication tool between the business community and the IT staff and within the IT staff.
=> Level of granularity
The data models should reflect the “lowest common denominator” of information that the enterprise uses. Aggregated, derived, or summarized data elements should be decomposed to their basic parts, and unnecessary redundancy or duplication of data elements should be removed.
=> Business orientation
It is paramount that the models represent the enterprise’s view of itself without physical constraints.

These guidelines should always be at the forefront of the modeler’s mind when he or she commences the modeling process. With these in mind, let’s look at some of the best practices in data modeling :
=> Business users involvement
It must be understood up front that the business community must set aside time and resources to help create the various data models; data modeling is not just a technical exercise for IT people.
=> Interviews and facilitated sessions
One of the most common ways to get a lot of information in a short amount of time is to perform interviews and use facilitated sessions.
=> Validation
The proposed data model is then verified by either immediate feedback from the interviews or facilitated sessions, or by formal through.
=> Data model maintenance
Because change becomes a common feature in any modeling effort, you should be prepared to handle these occurrences.
=> Know when “enough is enough.”
Perhaps the most important practice any data modeler should learn is when to say the model is good enough.

Normalization is a method for ensuring that the data model meets the objectives of accuracy, consistency, simplicity, non redundancy, and stability. Fundamentally this means that normalization is a way of ensuring that the attributes are in the proper entity and that the design is efficient and effective for a relational DBMS. We will walk through the steps to get to this data model design in the next sections of this chapter. Normalization has these characteristics as well :
=> Verification of the structural correctness and consistency of the data model
=> Independence from any physical constraints
=> Minimization of storage space requirement by eliminating the storage of data in multiple places
Finally, normalization
=> Removes data inconsistencies since data is stored only once, thus eliminating the possibility of conflicting data
=> Diminishes insertion, updating, and deletion anomalies because data is stored only once
=> Increases the data structure stability because attributes are positioned in entities based on their intrinsic properties rather than on specific application requirements

Normalization of the Relational Data Model
Normalization is very useful for the business data model because :
=> It does not instruct any physical processing direction, thus making the business model a good starting place for all applications and databases.
=> It reduces aggregated, summarized, or derived elements to their basic components, ensuring that no hidden processes are contained in the data model.
=> It prevents all duplicated or redundant occurrences of attributes and entities.

First Normal Form
First normal form (1NF) takes the data model to the first step described in our mantra—the attribute is dependent on the key. This requires two conditions— that every entity have a primary key that uniquely identifies it and that the entity contain no repeating or multivalued groups. Each attribute should be at its lowest level of detail and have a unique meaning and name. 1NF is the basis for all other normalization techniques.

In Figure, we see that the Course entity contains the attributes that deal with a specific offering of the course rather than the generic course itself (Course Offering, Period, Professor Identifier, and Professor Name). These attributes are not dependent on the Course entity key for their existence, and therefore should be put into their own entity (Course Offering).

Second Normal Form
Second normal form (2NF) takes the model to the next level of refinement according to our mantra—the attributes must be dependent on the whole key. To attain 2NF, the entity must be in 1NF and every non primary attribute must be dependent on the entire primary key for its existence.

2NF further reduces possible redundancy in the data model by removing attributes that are dependent on part of the key and placing them in their own entity. If this remains in the model, then Discipline Identifier and Name must be repeated for every course. By placing these in their own entity, they are stored only once. Figure 2.7 shows the conversion of our model to 2NF.

Third Normal Form
Third normal form (3NF) takes the data model to the last level of improvement referred to in our mantra—the attribute must be dependent on nothing but the key. To attain 3NF, the entity must be in 2NF, and the non key fields must be dependent on only the primary key, and not on any other attribute in the entity, for their existence.

In Figure, notice that Course Offering Professor and Course Offering Professor Name are recurring attributes. Neither the Professor Name or the Professor Identifier depend on the Course Offering.

Other Normalization Levels
We usually stop with 3NF when we design the business model for organizations. However, you should be aware of other levels of normalization and their benefits to determine if 3NF will be sufficient for your organization. There are several good data-modeling books that discuss the merits of Boyce/ Codd and fourth and fifth normal forms. We will not discuss these further in this book.

Wednesday, September 26, 2018 September 26, 2018 - tanpa komentar

No comments:

Post a Comment