1500字范文,内容丰富有趣,写作好帮手!
1500字范文 > 数据湖 数据仓库 数据集市_还原数据仓库/数据集市环境的两种方法

数据湖 数据仓库 数据集市_还原数据仓库/数据集市环境的两种方法

时间:2023-09-25 04:08:44

相关推荐

数据湖 数据仓库 数据集市_还原数据仓库/数据集市环境的两种方法

数据湖 数据仓库 数据集市

Implementing best data warehouse designs and practices such as data lineage reduces the need to ever have to restore an entire relational data warehouse. However, sometimes there are instances whereby you have inherited poorly designed data warehouse environments that leaves you with no other options but to perform an entire database restore in an event of a sudden disaster. I recently found myself in a similar situation of having to recover one of my data mart following a data integrity issue wherein all data of a type 1 dimension was updated/overwritten using an incorrect source file. In this article I take a look at how different approaches can be utilised to restore the compromised SQL Server-based data mart back to its “good state”.

实施最佳的数据仓库设计和实践(例如数据沿袭)可以减少恢复整个关系数据仓库的需要。 但是,有时在某些情况下您继承了设计不佳的数据仓库环境,使您别无选择,只能在发生突发灾难时执行整个数据库还原。 最近,我发现自己处于类似的情况,因为数据完整性问题(其中使用错误的源文件更新/覆盖了所有类型1维的所有数据),不得不恢复我的一个数据集市。 在本文中,我将探讨如何利用不同的方法将已损坏的基于SQL Server的数据集市恢复到“良好状态”。

背景 (Background)

Likewise, such a topic is better discussed through practical examples. Thus, below are the details of a fictitious data warehouse that I will refer to throughout this article.Figure 1indicates my fictitious data warehouse,selectSIFISOBlogs,as well as a recently restored backup of the same data warehouse labelledselectSIFISOBlogs_BAK.

同样,可以通过实际示例更好地讨论这一主题。 因此,下面是我将在本文中引用的虚拟数据仓库的详细信息。图1指示了我的虚拟数据仓库selectSIFISOBlogs,以及最近恢复的相同数据仓库的备份,该备份标签为selectSIFISOBlogs_BAK。

BothselectSIFISOBlogsandselectSIFISOBlogs_BAKcontains dimensions and fact objects that relate to each other in a multidimensional star schema, as shown inFigure 2.

selectSIFISOBlogs和selectSIFISOBlogs_BAK都包含多维星型架构中彼此相关的维和事实对象,如图2所示。

The customer dimension is the compromised object that has recently been updated with an incorrect source. For instance,Figure 3-4compares the two versions of customer dimension, as it can be seen that whilst the version in theselectSIFISOBlogs_BAKcontains correct customer name values the version inselectSIFISOBlogshas been updated using method of payment (MOP) source file in such a way that customer name values are now set to eitherCASHorELECTRONIC.

客户维度是最近使用不正确来源更新的受害对象。 例如,图3-4客户维度的两个版本进行比较,因为这可以看出,虽然在selectSIFISOBlogs_BAK版本包含正确的客户名称值selectSIFISOBlogs的版本已经使用支付(MOP)源文件的方法,在这样的更新现在可以将客户名称值设置为CASH或ELECTRONIC。

Another significant change between the data in the two versions of the data warehouse is the amount of data they each possess.Script 1makes use of an EXCEPT T-SQL clause to identify differences in the fruit dimension between the two data warehouse environments.

两个版本的数据仓库中的数据之间的另一个重要变化是它们各自拥有的数据量。脚本1使用EXCEPT T-SQL子句来识别两个数据仓库环境之间的水果维度上的差异。

SELECT [FruitKey],[FruitName],[InsertDate]FROM [selectSIFISOBlogs].[DIM].[Fruit]EXCEPTSELECT [FruitKey],[FruitName],[InsertDate]FROM [selectSIFISOBlogs_BAK].[DIM].[Fruit]

The results ofScript 1execution are shown inFigure 5and they indicate that the fruit dimension in the backup database is missing an entry that contains an apricot fruit.

执行脚本1的结果如图5所示,它们表明备份数据库中的水果维度缺少包含杏果实的条目。

Not surprisingly, as shown inFigure 6the results of comparing theFruitSalesfact table indicates that theselectSIFISOBlogsdatabase has one more transaction relating to the sale of apricots than its counterpart in the backup database.

毫不奇怪,如图6所示,FruitSales事实表的比较结果表明,selectSIFISOBlogs数据库与杏销售相关的交易比备份数据库中的对应交易多。

方法1:完整数据仓库还原 (Approach 1: Full Data Warehouse Restore)

Given the discrepancies observed in the two versions of customer dimensions, the simplest way of restoring this dimension to a previous “good state” is by conducting a full database restore of.Figure 7shows a restore ofselectSIFISOBlogsdata warehouse environment using the Restore Database wizard in SQL Server Management Studio (SSMS).

鉴于在两个版本的客户维度中都观察到差异,将这个维度还原到先前的“良好状态”的最简单方法是对它进行完整的数据库还原。图7显示了使用SQL Server Management Studio(SSMS)中的“还原数据库”向导还原selectSIFISOBlogs数据仓库环境。

Pros and Cons of a Full Data Warehouse Restore

完整数据仓库还原的利与弊

An obvious benefit of this approach is that it is so simple that you don’t need to depend on a data warehouse team to run with the restore operation as even your typical SQL Server DBA can easily perform this operation. Unfortunately, simplicity of this approach can be its very own disadvantage. For instance, with this approach you are likely to lose any changes (i.e. data changes, table structure changes, etc.) that you would have made after the backup was created. This means that in the case of our fictitious data warehouse we are going to lose the information relating to the apricot transaction.

这种方法的一个明显好处是它是如此简单,以至于您无需依赖数据仓库团队来执行还原操作,因为即使您的典型SQL Server DBA都可以轻松地执行此操作。 不幸的是,这种方法的简单性可能是其自身的缺点。 例如,使用这种方法,您很可能会丢失创建备份后所做的任何更改(即,数据更改,表结构更改等)。 这意味着,对于我们的虚拟数据仓库,我们将丢失与杏交易有关的信息。

Therefore, although this approach is easy to implement, it may result into some undesirable outcomes (i.e. loss of data). It is therefore recommended that prior to its implementation that you consult with consumers of your data warehouse environment on the potential consequences of this approach.

因此,尽管此方法易于实施,但可能会导致某些不良后果(即数据丢失)。 因此,建议在实施此方法之前,先就此方法的潜在后果咨询数据仓库环境的使用者。

方法2:部分数据仓库还原 (Approach 2: Partial Data Warehouse Restore)

Instead of restoring the entire database, we can also just restore only the affected customer dimension which will ensure that we avoid data loss in the fruit dimension and fruit sales fact table. Unlike its counterpart, the partial data restore approach involves several steps:

除了还原整个数据库,我们还可以仅还原受影响的客户维度,这将确保我们避免水果维度和水果销售事实表中的数据丢失。 与相应的方法不同,部分数据恢复方法涉及以下步骤:

Ensure that both the current version and backup version of databases are online

确保数据库的当前版本和备份版本都处于联机状态

In our case we need to ensure thatselectSIFISOBlogsandselectSIFISOBlogs_BAKdatabases are online.

在我们的案例中,我们需要确保selectSIFISOBlogs和selectSIFISOBlogs_BAK数据库处于联机状态。

Copy data to replace compromised object(s)

复制数据以替换受损的对象

In this step you have to restore the selected table from the backup database. This can be done using SQL Server Integration Services (SSIS) packages, or T-SQL. Another method would be to drop the current version of the customer dimension and recreate it using the SSMS Import and Export wizard as shown inFigure 8-9.

在此步骤中,您必须从备份数据库中还原选定的表。 可以使用SQL Server Integration Services(SSIS)程序包或T-SQL来完成。 另一种方法是删除客户维度的当前版本,并使用SSMS导入和导出向导重新创建它,如图8-9所示。

Figure 8: Import Data Wizard图8:导入数据向导

Figure 9: Choosing to copy Customer dimension图9:选择复制客户维度

Pros and Cons of a Partial Data Warehouse Restore

部分数据仓库还原的利与弊

The benefit of this approach is that the restore will only affect selected data warehouse objects and as a result it will not overwrite the entire environment which could be costly. Unfortunately, this approach has several drawbacks too:

这种方法的好处是还原将只影响选定的数据仓库对象,因此不会覆盖可能会造成高昂成本的整个环境。 不幸的是,这种方法也有几个缺点:

Import data wizard creates heaps

导入数据向导创建堆

To illustrate this point let’s look at the original definition of the customer dimension inScript 2, as it can be seen, it has several constraints that are crucial to the data consistency and integrity of this dimension.

为了说明这一点,让我们看一下脚本2中客户维度的原始定义,可以看出,它具有几个约束,这些约束对于该维度的数据一致性和完整性至关重要。

CREATE TABLE [DIM].[Customer]([CustomerKey] [int] IDENTITY(1,1) NOT NULL,[CustomerCode] [varchar](50) NOT NULL,[CustomerName] [varchar](250) NOT NULL,[InsertDate] [datetime] NOT NULL DEFAULT (getdate()),CONSTRAINT [PK_CustomerKey] PRIMARY KEY CLUSTERED ([CustomerKey] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]

Script 2: Customer dimension script with constraints脚本2:具有约束的客户维度脚本

However, the import wizard led to a creation of a dimension that is missing default constraints and primary key, as shown inScript 3.

但是,导入向导导致创建缺少默认约束和主键的维,如脚本3所示。

CREATE TABLE [DIM].[Customer]([CustomerKey] [int] NOT NULL,[CustomerCode] [varchar](50) NOT NULL,[CustomerName] [varchar](250) NOT NULL,[InsertDate] [datetime] NOT NULL) ON [PRIMARY]

Script 3: Customer dimension script without constraints脚本3:无限制的客户维度脚本

An obvious work around to this would be to modify the table after the restore and add those missing constraints back to the dimension or fact table. Another workaround, instead of using the import wizard, would be to restore theselectSIFISOBlogs_BAKdatabase on the same/linked server asselectSIFISOBlogs; drop and recreate the customer dimensionand then use T-SQL (which will include temporarily disabling identity insert) to populate the compromised dimension as shown inScript 4.

解决此问题的一个明显方法是在还原后修改表,并将缺少的约束添加回维表或事实表。 另一种解决方法,而不是使用导入向导,将还原相同/链接服务器作为selectSIFISOBlogs上selectSIFISOBlogs_BAK数据库; 删除并重新创建客户维度然后使用T-SQL(将包括暂时禁用身份插入)来填充受损的维度,如脚本4所示。

SET IDENTITY_INSERT [selectSIFISOBlogs].[DIM].[Customer] ONINSERT INTO [selectSIFISOBlogs].[DIM].[Customer]([CustomerKey],[CustomerCode],[CustomerName],[InsertDate])SELECT ([CustomerKey],[CustomerCode],[CustomerName],[InsertDate]FROM [selectSIFISOBlogs_BAK].[DIM].[Customer]SET IDENTITY_INSERT [selectSIFISOBlogs].[DIM].[Customer] OFF

Script 4: Insert statement with identity insert脚本4:带有标识插入的Insert语句

Different resources required

需要不同的资源

As mentioned earlier this approach is complex and unlike the full restore approach you cannot just rely on a DBA – instead a data warehouse developer will have to be involved to write T-SQL statements or SSIS packages.

如前所述,这种方法很复杂,与完全还原方法不同,您不能仅依靠DBA,而是必须由数据仓库开发人员来编写T-SQL语句或SSIS包。

结论 (Conclusion)

In this article we have demonstrated, by using a fictitious data warehouse, the different approaches available to successfully restore a data warehouse environment. The benefits and disadvantages of such approaches were discussed with a recommendation that whenever possible you should inform consumers of your data warehouse environment about any planned changes to the environment.

在本文中,我们通过使用虚拟数据仓库演示了可用于成功还原数据仓库环境的不同方法。 讨论了这种方法的优点和缺点,并建议您尽可能将有关计划中的环境更改告知数据仓库环境的使用者。

参考 (Reference)

Temporarily disable identity column of a table临时禁用表的标识列 Top 10 Best Practices for Building a Large Scale Relational Data Warehouse建立大型关系数据仓库的十大最佳实践 EXCEPT T-SQL ClauseT-SQL子句除外

翻译自: /two-methods-restoring-data-warehousedata-mart-environment/

数据湖 数据仓库 数据集市

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。