The final reason I can think of for surrogate keys is one that I strongly suspect but have never proven. Replacing big, ugly natural keys and composite keys with beautiful, tight integer surrogate keys is bound to improve join performance. The storage requirements are reduced, and the index lookups would seem to be simpler.
The Multidimensional Warehouse is the third data structure in EPM.
Image: Multidimensional Warehouse (MDW)
The following graphic illustrates the MDW component of the EPM architecture and the target tables that are present in the MDW.
The MDW stores dimensionalized data that is grouped into one or more business processes, better known as a dimensional schema, used for business intelligence and ad hoc reporting. The data is stored in a star schema (a fact table associated with a series of dimension tables) and generally contains data loaded from the OWS.
The star schema arrangement depends entirely on primary key and foreign key relationships. A primary key is a column (or columns) in a dimension table whose values uniquely identify each row in the table. Primary keys enforce entity integrity by uniquely identifying entity instances. A foreign key is a column or columns in a fact table whose values match the primary key values of a given dimension table. This way references can be made between a fact and dimension table. Foreign keys enforce referential integrity by completing an association between two entities.
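As a rough illustration of these relationships, the sketch below builds a one-dimension star schema in an in-memory SQLite database. The table and column names (D_CUSTOMER, F_SALES, CUSTOMER_SID, SALES_AMT) are hypothetical and only follow the naming conventions described in this document; SQLite stands in for whatever database the warehouse actually runs on.

```python
import sqlite3

# Hypothetical tables; names are illustrative, not taken from the EPM data model.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE D_CUSTOMER (
        CUSTOMER_SID INTEGER PRIMARY KEY,  -- primary key of the dimension
        CUST_ID      TEXT NOT NULL         -- business key kept as a plain attribute
    );
    CREATE TABLE F_SALES (
        CUSTOMER_SID INTEGER NOT NULL REFERENCES D_CUSTOMER (CUSTOMER_SID),
        SALES_AMT    REAL NOT NULL
    );
""")
conn.execute("INSERT INTO D_CUSTOMER VALUES (1, 'C-1001')")
conn.execute("INSERT INTO F_SALES VALUES (1, 250.0)")

# A fact row that points at a missing dimension row breaks referential integrity.
try:
    conn.execute("INSERT INTO F_SALES VALUES (99, 10.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# The join between fact and dimension goes through the key pair.
rows = conn.execute("""
    SELECT d.CUST_ID, f.SALES_AMT
    FROM F_SALES f
    JOIN D_CUSTOMER d ON d.CUSTOMER_SID = f.CUSTOMER_SID
""").fetchall()
```

The primary key guarantees one row per customer in the dimension, and the foreign key guarantees that every fact row can be resolved to exactly one of those rows.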
Note: MDW dimensions use a surrogate key, a unique key generated from production keys by the ETL process. The surrogate key is not derived from any data in the EPM database and acts as the primary key in an MDW dimension. See the next topic for more information on surrogate keys in the MDW.
Image: Dimensional Model Example
The following graphic provides an example of a star schema and its primary and foreign key relationships:
Although data loaded into the MDW is primarily derived from the OWS, there are exceptions to this rule. Profitability and Global Consolidations data for the Financial Management Solutions (FMS) Warehouse is loaded into the MDW from the OWE.
External survey data for the HCM Warehouse is loaded into the MDW from the OWE.
Online Marketing data is loaded into the MDW directly from the source system, and bypasses the Operational Warehouse entirely.
Surrogate Keys
Surrogate keys provide a means of defining unique keys whose values, with the exception of the Time and Calendar dimensions, are anonymous; that is, the value of a surrogate key has no significance to the application using it and is strictly an artificial value. The system uses surrogate keys specifically as a means of joining structures. To speed up query access, the MDW resolves PeopleSoft-specific programming constructs, such as SetIDs and effective dates, and replaces them with surrogate IDs as key columns. Surrogate keys have no relationship to the business or production key. Surrogate keys are present in dimension tables as the primary key and in fact tables as foreign keys to dimensions. However, the dimension record retains the business key as an alternate-key attribute. Surrogate keys are four-byte integers, and their size does not change even when the production key changes in size.
Although surrogate keys usually do not have any 'intelligence,' that is, their value has no meaning, in certain situations, such as the Gregorian Calendar and Time dimensions, intelligent surrogate keys are used. These intelligent keys enable the ETL process to run more quickly by providing the option of avoiding a lookup on corresponding dimensions.
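To make the idea of an intelligent key concrete, here is a minimal sketch. It assumes the calendar key is the date encoded as a YYYYMMDD integer; this encoding is an assumption made for illustration, and the text above does not state the format the MDW actually uses. The function name calendar_sid is hypothetical.

```python
from datetime import date


def calendar_sid(d: date) -> int:
    """Hypothetical intelligent key: the date written as the integer YYYYMMDD."""
    return d.year * 10000 + d.month * 100 + d.day


# The key is computed from the date itself, so the ETL job does not have to
# look the date up in the calendar dimension to find its key.
sid = calendar_sid(date(2020, 12, 31))
```

Because the key can be derived directly from the incoming value, the dimension lookup becomes optional, which is the speed-up the paragraph above describes. Such a value still fits in a four-byte integer.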
Surrogate key fields usually have the suffix _SID (Surrogate ID).
Surrogate Keys and the ETL Process
Surrogate keys are generated from production keys using the DataStage routine KeyMgtNextValueConcurent(), which receives an input parameter and a name identifying the sequence. The surrogate key can be unique per single dimension target (D) or unique across the whole (W) multidimensional warehouse. This process is enabled by the environment parameter named SID_UNIQUENESS. The value for this parameter is provided at run time. If the value is D, then this routine is called with a dimension job name for which a surrogate key must be assigned and it returns the next available number. If not, the routine is called with EPM as the sequence identifier.
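The choice between the two uniqueness scopes can be sketched as follows. This is not the DataStage routine itself: KeySequences and next_surrogate_key are hypothetical stand-ins, the job names are invented, and starting each sequence at 1 is an assumption.

```python
from collections import defaultdict
from itertools import count


class KeySequences:
    """Hypothetical stand-in for a named-sequence key generator."""

    def __init__(self):
        # One independent counter per sequence name; starting at 1 is an assumption.
        self._sequences = defaultdict(lambda: count(1))

    def next_value(self, sequence_name: str) -> int:
        return next(self._sequences[sequence_name])


def next_surrogate_key(seqs: KeySequences, sid_uniqueness: str, dimension_job_name: str) -> int:
    # "D": one sequence per dimension job; otherwise a single warehouse-wide "EPM" sequence.
    sequence_name = dimension_job_name if sid_uniqueness == "D" else "EPM"
    return seqs.next_value(sequence_name)


per_dimension = KeySequences()
d_keys = [
    next_surrogate_key(per_dimension, "D", "J_Dim_CUSTOMER"),
    next_surrogate_key(per_dimension, "D", "J_Dim_PRODUCT"),
    next_surrogate_key(per_dimension, "D", "J_Dim_CUSTOMER"),
]

warehouse_wide = KeySequences()
w_keys = [
    next_surrogate_key(warehouse_wide, "W", "J_Dim_CUSTOMER"),
    next_surrogate_key(warehouse_wide, "W", "J_Dim_PRODUCT"),
]
```

With D, two dimensions can hold the same key value because each draws from its own sequence; with W, every key comes from the shared sequence, so no value repeats anywhere in the warehouse.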
You do not have to take any action to create surrogate keys; they are generated during the ETL process within the aforementioned DataStage routine. The DataStage routine retrieves the next surrogate key value and assigns it to the surrogate key that it is currently creating. When the ETL process copies a dimension row from the source system into the MDW, the ETL process performs a lookup on the dimension table. If the dimension row (with the same business keys) does not exist in the dimension table, the process inserts a row with a new surrogate key value. If the dimension row already exists in the dimension table, the process updates the existing row with the incoming row value. When the ETL process copies a fact row from the source system into the MDW, for each dimension key in the fact row, the system performs a lookup on the dimension table and retrieves the corresponding surrogate key value. This surrogate key is the foreign key value in the fact row in the MDW. If the system does not locate a dimension value in the fact row in the dimension table, that is a data exception and an error results.
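The dimension and fact load rules described above can be sketched in a few lines. This is an in-memory illustration only, not the delivered ETL jobs: DimensionTable, load_row, lookup_sid, and load_fact_row are hypothetical names, and a Python dictionary stands in for the dimension table.

```python
from itertools import count


class DimensionTable:
    """Hypothetical in-memory dimension keyed by business key."""

    def __init__(self):
        self._next_sid = count(1)  # assumed starting value
        self.rows = {}             # business key -> {"sid": int, "attrs": dict}

    def load_row(self, business_key, attrs):
        existing = self.rows.get(business_key)
        if existing is None:
            # New business key: insert with a freshly generated surrogate key.
            sid = next(self._next_sid)
            self.rows[business_key] = {"sid": sid, "attrs": dict(attrs)}
            return sid
        # Known business key: update attributes, keep the surrogate key.
        existing["attrs"] = dict(attrs)
        return existing["sid"]

    def lookup_sid(self, business_key):
        try:
            return self.rows[business_key]["sid"]
        except KeyError:
            # A fact that references an unknown dimension value is a data exception.
            raise LookupError(f"no dimension row for business key {business_key!r}") from None


def load_fact_row(source_row, dimensions):
    """Replace each business key in a fact row with the matching surrogate key.

    dimensions maps a source key column to (DimensionTable, target _SID column).
    """
    fact = {}
    for column, value in source_row.items():
        if column in dimensions:
            dimension, sid_column = dimensions[column]
            fact[sid_column] = dimension.lookup_sid(value)
        else:
            fact[column] = value
    return fact


customer = DimensionTable()
first_sid = customer.load_row("C-1001", {"NAME": "Acme"})
second_sid = customer.load_row("C-1001", {"NAME": "Acme Corp"})  # update, same key
fact = load_fact_row(
    {"CUST_ID": "C-1001", "SALES_AMT": 250.0},
    {"CUST_ID": (customer, "CUSTOMER_SID")},
)
```

Loading the same business key twice returns the same surrogate key, so fact rows loaded earlier remain valid after the dimension row is updated.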
Surrogate Key Benefits
Surrogate keys provide benefits such as:
Audit Fields
Audit fields track extract, transform, and load (ETL) loading information, such as when the row was loaded or last modified or the batch in which the row was loaded. This information is included in a subrecord. The subrecord added to MDW tables is called LOAD_MDW_SBR. Subrecords are always added at the end of a record; no fields exist after this subrecord in any table.
Image: LOAD_MDW_SBR record example
The following example shows a typical LOAD_MDW_SBR subrecord.
Data Aggregation
Tables in the MDW contain source data at the same granularity as the source system. Required data aggregation is carried out at run time by the business intelligence tool. This allows for better control of aggregation strategies by the business intelligence tool, because aggregation requirements vary from customer to customer.
MDW Dimension Tables
Dimensions are sets of related attributes that you use to group or constrain detailed information that you measure in your data mart. Dimensions are usually text (in character data type), relatively static, and often hierarchical.
Dimension tables use the surrogate key as the primary key, which is a single-column key containing only the surrogate key column. Surrogate keys usually have _SID (surrogate ID) appended to the field name. Dimension tables retain source system business key fields as non-key attribute columns in the dimension table. However, these are not used for joins with fact tables. For example, in the Customer dimension, the original business key field CUST_ID is retained, if it exists in the source table, but is no longer included in the key. The SetID is also retained, if it exists in the source table, as a non-key attribute; the value contained in the SetID is the same as in the source system.
If a dimension is SetID-based, the MDW table contains the source SetID and the performance (PF) SetID, which is named SETID.
If a dimension contains a description text, a related language table is often defined for this dimension. The ETL process populates this table if a customer requires multilanguage processing. The key for this table is the surrogate key ID, plus the language code field, LANGUAGE_CD, which contains the code for the additional language.
Note: You can find more information about multilanguage processing for the multidimensional warehouse in your EPM Warehouse specific documentation (for example, the PeopleSoft EPM: Campus Solutions Warehouse).
Shared Dimensions
Dimensions such as Account, Customer, Department, or Person are examples of shared dimensions. Shared dimensions are either exactly the same (including key structure) or an exact subset of another dimension; that is, shared dimensions are structurally identical every place in which they are used. Shared dimensions are used across all EPM warehouse products, such as the Campus Solutions Warehouse and the Financial Management Solutions Warehouse.
When using a shared dimension, the system consistently interprets attributes; hence rollups across data marts are possible and consistent. When a warehouse is provided data from multiple sources, a shared dimension is typically (but not always) built from multiple source structures.
Image: EPM conformed dimension
The following is a sample MDW shared dimension shown in Application Designer.
MDW Dimension Table Naming Convention
MDW dimension tables use the following naming convention: D_[table name].
MDW Fact Tables
MDW fact tables (F_*) contain numeric performance measurement data, such as quantity, sales, and revenue, that is used to build a data warehouse and its related reports. Facts help to quantify a company's activities. A fact is typically an additive business performance measurement. That is, you can usually perform arithmetic functions on facts.
In a star schema, a fact table is the central table, each element of which is a foreign key derived from a dimension table. Dimension tables have a surrogate ID column that is the primary key of that dimension. A fact table may use these dimension surrogate IDs as foreign keys to the dimension table. In the dimensional model example graphic presented previously, the Sales fact table contains six foreign keys, each one matching a dimension surrounding the fact table.
Periodic Snapshot Fact Tables
Periodic snapshots provide a view of the cumulative performance of the business at regular, predictable time intervals. Unlike a transaction fact table that loads a row of data for each event occurrence, the periodic snapshot fact table captures the event at the interval of a day, week, or month, and another capture at the interval of the next period, and so on. These periodic snapshots are stacked consecutively into the fact table. The periodic snapshot fact table often is the only place to easily retrieve a regular, predictable, trend view of the key business performance metrics.
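The stacking behaviour can be pictured with a small sketch. The account balances, the column names (SNAPSHOT_DT, ACCOUNT, BALANCE), and the month-end interval are all invented for illustration, and a Python list stands in for the fact table.

```python
from datetime import date


def take_snapshot(fact_rows, snapshot_date, balances):
    """Append one row per account describing its state on snapshot_date."""
    for account, balance in sorted(balances.items()):
        fact_rows.append(
            {"SNAPSHOT_DT": snapshot_date, "ACCOUNT": account, "BALANCE": balance}
        )


# Each period adds a full set of rows on top of the previous ones.
f_balance_snapshot = []
take_snapshot(f_balance_snapshot, date(2020, 11, 30), {"A-1": 100.0, "A-2": 40.0})
take_snapshot(f_balance_snapshot, date(2020, 12, 31), {"A-1": 130.0, "A-2": 40.0})

# The trend for one account can be read straight from the stacked snapshots.
trend_a1 = [
    (row["SNAPSHOT_DT"], row["BALANCE"])
    for row in f_balance_snapshot
    if row["ACCOUNT"] == "A-1"
]
```

Account A-2 gets a row in each period even though its balance did not change, which is what distinguishes a snapshot from a transaction-grained fact table.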
Accumulating Fact Tables
Accumulating snapshots represent an indeterminate time span, covering the complete life of a transaction or discrete product. Accumulating snapshots almost always have multiple date stamps, representing the predictable major events or phases that take place during the course of a lifetime. Since many of these dates are not known when the fact row is first loaded, we must use surrogate date keys to handle undefined dates.
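One common way to handle undefined dates is to reserve a surrogate key that means "not yet known". The sketch below assumes a reserved key of 0 and a YYYYMMDD encoding for known dates; both choices, and all of the names (UNKNOWN_DATE_SID, date_sid, the _DT_SID columns), are assumptions made for illustration.

```python
from datetime import date
from typing import Optional

UNKNOWN_DATE_SID = 0  # hypothetical reserved key for "date not yet known"


def date_sid(d: Optional[date]) -> int:
    """Return a surrogate date key, using the reserved key when the date is undefined."""
    if d is None:
        return UNKNOWN_DATE_SID
    return d.year * 10000 + d.month * 100 + d.day


# When the row is first loaded, only the order date is known.
order_fact = {
    "ORDER_DT_SID": date_sid(date(2020, 12, 1)),
    "SHIP_DT_SID": date_sid(None),
    "DELIVERY_DT_SID": date_sid(None),
}
initial_ship_sid = order_fact["SHIP_DT_SID"]

# The same row is revisited as the transaction moves through its phases.
order_fact["SHIP_DT_SID"] = date_sid(date(2020, 12, 4))
```

Because every date column always holds a valid key, joins to the date dimension keep working even for milestones that have not happened yet.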
MDW Fact Table Naming Convention
MDW fact tables use the following naming convention: F_[table name].
If you are working on a data warehouse project, then you have probably heard a lot about surrogate keys. Surrogate keys are a widely accepted data warehouse design standard. In this article, we will look at data warehouse surrogate key design and its advantages and disadvantages.
What are surrogate keys in Data warehouse?
If you are a data warehouse developer, then you might be wondering what a surrogate key is, and how and where it is used. You will get answers to all your questions here.
Data warehouse surrogate keys are sequentially generated meaningless numbers associated with each and every record in the data warehouse. These surrogate keys are used to join dimension and fact tables.
Why surrogate keys are used in Data warehouse?
Basically, a surrogate key is an artificial key that is used as a substitute for the natural key (NK) defined in data warehouse tables. We can use natural keys or business keys as the primary key for tables. However, this is not recommended for the following reasons:
For example, product codes can be revised and reused after a few years. It then becomes difficult to differentiate current products from historic products. To avoid such a situation, surrogate keys are used.
Data Warehouse Surrogate Key examples
Surrogate keys are integers that are assigned sequentially in the dimension table and can be used as the primary key. The surrogate key column can be an identity column, or a database sequence can be used to populate it.
Below is a sample example of a surrogate key:
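As a minimal sketch, assuming a hypothetical D_PRODUCT dimension in SQLite, an AUTOINCREMENT column can play the role of the identity column. The table, its columns, and the sample products are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE D_PRODUCT (
        PRODUCT_SID  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        PRODUCT_CODE TEXT NOT NULL,                      -- natural (business) key
        PRODUCT_NAME TEXT NOT NULL
    )
""")

# The product code P100 is reused for a different product; the database
# still assigns each row its own sequential surrogate key.
conn.executemany(
    "INSERT INTO D_PRODUCT (PRODUCT_CODE, PRODUCT_NAME) VALUES (?, ?)",
    [("P100", "Desk lamp"), ("P200", "Office chair"), ("P100", "LED desk lamp")],
)

products = conn.execute(
    "SELECT PRODUCT_SID, PRODUCT_CODE, PRODUCT_NAME FROM D_PRODUCT ORDER BY PRODUCT_SID"
).fetchall()
```

The two P100 rows share a natural key but have different surrogate keys, so historic and current products stay distinguishable in fact table joins.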
Advantages of Surrogate Key
Below are some of the advantages of using surrogate keys in a data warehouse:
Disadvantages of Surrogate Key
Below are some of the disadvantages of using surrogate keys in a data warehouse: