Data Cleaning – The Right Data Delivery at the Right Time

Author: spidergoose

How much money do you think is lost each year by companies that send mail to the wrong people at the wrong addresses? I am sure we would all be horrified to discover the actual figures. Perhaps even worse than mail going to the wrong address is the fact that some of the data contained in this mail may be of an extremely sensitive and private nature.

Unfortunately, these practices have been going on for years, with millions upon millions of pounds being wasted. The sad thing is that this really can be avoided. Data cleaning, also commonly known as data cleansing or data scrubbing, involves employing systems that tidy up data, sort it by relevance, and eliminate errors and inaccuracies.

Almost all modern businesses rely on computerised data systems to deal with their customers and clients. Consider how many stores we spend our money in that now operate some kind of membership or reward card scheme, all of them with our personal information just a button click away. It is absolutely crucial for consumer confidence that this data be kept up to date and accurate.

It goes without saying that any company that holds people's personal data has a great responsibility and should do its utmost to ensure that the data is accurate. The types of data that need to be maintained are varied and vast: home addresses, email addresses, telephone numbers, banking information, financial history, medical records, dates of birth, family history, security records, and credit history.

Today, any company holding data should be employing an efficient data cleaning system. A good system will be able to detect, sort and clean any incorrect or incomplete data in a database. Data errors both large and small need to be eradicated, from huge chunks of missing information down to the tiniest typing error, any of which may result in people receiving the wrong data or, even worse, someone else's data.
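As a rough illustration of the detect, sort and clean steps described above, here is a minimal sketch in Python. The field names, the deliberately loose email pattern, and the `clean_records` function are all assumptions made for this example; a real data cleaning system would apply far richer rules, such as postal address validation.

```python
import re

REQUIRED_FIELDS = ("name", "address", "email")  # hypothetical schema for this sketch
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose sanity check only


def clean_records(records):
    """Split records into (clean, rejected) lists.

    Values are trimmed and emails lower-cased. Records with missing fields,
    a malformed email, or an email already seen are rejected with a reason.
    """
    clean, rejected, seen_emails = [], [], set()
    for record in records:
        # Normalise: treat None as empty, strip edges, collapse runs of spaces.
        row = {
            key: " ".join(("" if value is None else str(value)).split())
            for key, value in record.items()
        }
        row["email"] = row.get("email", "").lower()

        missing = [field for field in REQUIRED_FIELDS if not row.get(field)]
        if missing:
            rejected.append((record, "missing: " + ", ".join(missing)))
        elif not EMAIL_PATTERN.match(row["email"]):
            rejected.append((record, "malformed email"))
        elif row["email"] in seen_emails:
            rejected.append((record, "duplicate email"))
        else:
            seen_emails.add(row["email"])
            clean.append(row)
    return clean, rejected
```

Rejected records are returned with a reason rather than silently dropped, so that someone can correct them at the source.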

The importance of data cleaning cannot be stressed enough, and both those who hold data and those to whom the data belongs have a role to play. Anyone who receives wrong or incomplete data should make those responsible aware of the errors, and those who hold data should regularly sort and update it with their data cleansing systems.

About the Author

For much more information about data cleaning and data cleaning systems, please visit the website at

7 Tiers of Data Recovery – Software Aspects
By Eugene Mayevski

Disaster recovery planning is one of the key components of a sound business security strategy. While the hardware component of such planning is well covered in the manuals and white papers of hardware providers, the software component, which is no less important, is often overlooked.

Solid File System (SolFS) is a software component for programmers working on data storage and data integrity solutions. Integrating SolFS into data recovery solutions will reduce recovery time, minimize data loss, ensure data integrity, prevent malicious tampering or destruction, and reduce the need for a highly skilled IT workforce. This white paper analyzes the advantages of using SolFS for data recovery following a disastrous event of any nature.

Recovery Planning

Statistics show (Jim Hoffer, Health Management Technology) that only 6% of enterprises fully recover after a serious software or hardware disaster, whether malicious or due to negligence, while 43% never reopen and the remaining 51% close within two years.

Planning for data recovery has become a ubiquitous and necessary process for any company that cannot afford significant downtime due to data loss, which in practice means every company. The losses resulting from interrupted company activity can include:


  1. Direct revenue loss
  2. Loss of "face" — customer trust, damage to company image, etc.
  3. Brand damage
  4. Loss of know-how, leaks of insider information, public exposure of privileged data, etc.
  5. Legal costs


The key element needed to prevent these severe consequences of a disaster and to ensure business continuity is careful, proactive planning of a disaster recovery strategy. For every business process, such a strategy must define a Recovery Point Objective (RPO), the maximum amount of recent data, measured in time, that the business can afford to lose, and a Recovery Time Objective (RTO), the maximum acceptable time to restore the process. As always, the right trade-off between cost and speed/effectiveness of recovery should be chosen. Obviously, zero-data-loss, zero-recovery-time solutions are the most expensive.
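To make the two objectives concrete, the following minimal Python sketch checks a backup plan against an RPO and an RTO. The `RecoveryPlan` class, its fields, and the figures in the usage lines are hypothetical and chosen only for illustration; the sketch assumes that the worst-case data loss of a periodic backup equals the interval between backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryPlan:
    backup_interval: timedelta    # time between consecutive backups
    estimated_restore: timedelta  # time needed to bring the process back


def meets_objectives(plan, rpo, rto):
    """Return (rpo_ok, rto_ok) for a plan.

    A disaster just before the next backup loses everything written since
    the previous one, so worst-case data loss is the backup interval.
    """
    return plan.backup_interval <= rpo, plan.estimated_restore <= rto


# Hypothetical example: nightly tape backup, eight-hour restore.
nightly_tape = RecoveryPlan(
    backup_interval=timedelta(hours=24),
    estimated_restore=timedelta(hours=8),
)
print(meets_objectives(nightly_tape, rpo=timedelta(hours=4), rto=timedelta(hours=12)))
# (False, True)
```

In this made-up case the plan restores quickly enough but can lose up to a day of data, so meeting a four-hour RPO would require more frequent backups or a higher tier.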

Besides well-known hardware-based precautions, one way to reduce the cost of disaster recovery is to use a custom file system, such as Solid File System. Solid File System allows the creation of huge, encrypted, compressed single-file storages that can hold any type of data. This paper analyzes possible applications of Solid File System (SolFS) at each of the seven traditionally identified tiers of business continuity solutions.

Tier 1: Data backup with no hot site

Businesses with a Tier 1 continuity solution rely on tape backups made at specific time intervals. These tapes are then shipped off site for storage.

For backup purposes, it is very convenient to place data into a SolFS-based storage. All documents are stored in one file, so there is no need to wind through the tape searching for a specific document: the whole storage can be restored quickly.

Moreover, because SolFS has built-in cryptographic protection, the company can entrust tape storage to almost any third-party service provider without risk of information leaks. In this case, the keys or passwords used for encryption should be safeguarded and kept separately from the backups. Losing such a key will not affect the feasibility of restoring the storage, but it will make access to the stored data impossible.

SolFS also allows the use of incremental backup systems that work on a sector-by-sector basis: there is no need to update the whole storage file when only minimal changes have been made to the data. The practicality of this approach depends on how often the stored files change, i.e. on the specific application. The advantage of backing up whole storages is that the backup system does not need to know the internal structure, encapsulation level, or directory tree of the storage. The whole storage is copied with no possibility of losing a single file attribute.
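The general idea behind a sector-by-sector incremental backup of a single storage file can be sketched as follows. This is a generic Python illustration, not the SolFS API: the 4096-byte block size, the use of SHA-256 digests to detect changes, and all function names are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block ("sector") size in bytes


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(previous_digests, data, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that differ from the last backup."""
    delta = {}
    for index, digest in enumerate(block_digests(data, block_size)):
        if index >= len(previous_digests) or previous_digests[index] != digest:
            delta[index] = data[index * block_size:(index + 1) * block_size]
    return delta


def apply_delta(backup, delta, new_length, block_size=BLOCK_SIZE):
    """Patch the previous backup copy with the changed blocks."""
    # Resize the old copy to the new storage length before patching.
    patched = bytearray(backup[:new_length].ljust(new_length, b"\0"))
    for index, block in delta.items():
        start = index * block_size
        patched[start:start + len(block)] = block
    return bytes(patched)
```

The backup tool only compares and copies opaque blocks of the storage file, which matches the point above that it needs no knowledge of the directory tree inside the storage.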

In addition, SolFS supports native data compression. If a SolFS storage contains compressible data, using SolFS to compress the whole storage is much more time- and cost-effective than applying regular compression tools to separate files or folders. SolFS-based storages use journaling for self-integrity checks. If part of a tape or a sector on disk becomes physically damaged and unreadable, the rest of the storage, apart from the damaged file(s), remains intact and functional.

It is also possible to back up separate files from a SolFS storage, if necessary. SolFS Driver Edition exposes the storage as regular files and folders to the backup application or any other application. This also makes it possible to develop monitoring tools that watch the changes made to files inside a SolFS storage and export them in any convenient format for backup or any other manipulation.
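One reason compressing a whole storage can beat compressing files separately is that a single compression stream can reuse patterns shared between files. The Python sketch below illustrates this with the standard `zlib` module; it is a generic comparison under assumed sample data, not a measurement of SolFS itself, and the `compressed_sizes` helper is hypothetical.

```python
import zlib


def compressed_sizes(files):
    """Return (per_file_total, single_stream) compressed sizes in bytes."""
    # Each file compressed on its own: no sharing of patterns between files.
    per_file_total = sum(len(zlib.compress(content)) for content in files)
    # All files compressed as one stream: repeated patterns are shared.
    single_stream = len(zlib.compress(b"".join(files)))
    return per_file_total, single_stream


# Hypothetical sample data: many small, similar documents.
files = [
    ("invoice %d: customer=ACME Ltd; status=paid\n" % number).encode()
    for number in range(200)
]
per_file_total, single_stream = compressed_sizes(files)
print(per_file_total, single_stream)
```

The gain depends heavily on the data: many small, similar files benefit most, while already compressed content such as images gains little either way.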

Naturally, restoring a whole SolFS storage takes more time than restoring a single file, but the result is a complete working storage with all file interdependencies and directory contents preserved. Such a restore operation can be carried out by less-qualified personnel than a full manual reassembly of the storage structure would require.

In addition, using SolFS-based storage makes it easy to separate storage backups from operating system backup procedures: the storage can be restored quickly, independently of the software operating environment.

Tier 2: Data backup with a hot site

This tier has the same provisions for disaster recovery as Tier 1, and additionally provides a reserve computer system (a so-called hot site) at a remote physical location. The hot site is capable of handling the same data processes as the main system. After a disaster, the data saved on tapes is restored on this reserve system. This approach allows faster restoration, as only the data, not the system itself, has to be restored.

The use of SolFS-based storages provides significant advantages over traditional backups. Since the data is backed up separately from the system, it can be deployed at the new location faster and by less-qualified personnel. The remote site will be able to start work in less time, thus significantly reducing RTO.

Tier 3: Electronic vaulting

Tier 3 adds a provision for some mission-critical data to be copied constantly to a remote server (an electronic vault) through a dedicated channel. Since the bandwidth of such a constantly open channel is limited, only predefined data of the utmost importance can be backed up under these provisions.

SolFS allows the critical data to be partitioned into a separate storage, which significantly simplifies its transfer and later recovery. SolFS functionality can be extended so that a change to data deemed critical automatically triggers a transfer to the electronic vault through the dedicated channel. Moreover, SolFS allows multi-stream access to the storage: a separate subsystem can monitor the state of the critical data and transfer it to the electronic vault. The integrity of the storage is not violated, and encryption and access authentication are also supported.

Tier 4: Point-in-time copies

This tier differs from the previous three in that hard disks are used in place of tape. Disks have faster access times, but still need to be shipped to a remote storage location through the same channels as tape.

The advantage of SolFS in this case is that SolFS-based storages are single files, and writing a single file takes much less time than writing all the files and the directory tree one by one. The same applies to recovery. As in the previous case, the remote facility receives encrypted disks, making data tampering impossible. Native compression further increases the speed of writing to disk and of recovery.

Tier 5: Transaction integrity

Retail and service organizations are often centered on transactions: rounds of communication between the company and its customers, vendors, suppliers, etc. The applications used by these enterprises are also centered on the transaction, and preserving transaction integrity between initiation and completion is very important.

SolFS-based storages support transaction integrity by default. All transactional files remain in their original context and preserve their links and interdependencies. Recovery from such a storage returns all transactions to the point in time immediately preceding the disaster. The application that generates or manages the transactions can be restarted exactly where it left off, with almost no data lost. To increase security and efficiency, all files changed during a transaction may be placed in a separate storage for immediate, real-time backup to a remote electronic vault. The feasibility and practicality of this approach depend on the logic and design of the transaction-generating application.

Tier 6: Near-zero data loss

This tier presumes the existence of an application that constantly mirrors data, synchronously or asynchronously, to a geographically remote server. This solution is independent of the software used for everyday business operations.

SolFS storages are fully compatible with such applications and give the additional advantages of faster compression, encryption, and full control over data access and authentication.

Tier 7: Highly automated, business integrated solution

The seventh and highest tier differs from the previous one in that the disaster event is detected automatically by one or more devices separate from the computer system. The disaster event triggers system restoration and activation of the mirror reserve site without any human input.

The advantages of SolFS storages in this scenario are similar to those described above.

Regardless of which data recovery tier a company chooses, the use of SolFS storages gives the advantages of faster recovery, integrity preservation, and protection of data from inadvertent or malicious destruction and tampering while in storage, and generally reduces the need for highly skilled specialists on the customer side.

Eugene Mayevski is Chief Technical Officer of EldoS Corporation, a company that specializes in the development of security and low-level system components for software developers.

Solid File System is a product of EldoS Corporation that provides a virtual file system for software and hardware development.

Article Source:—Software-Aspects&id=1741155