


Question

Please answer the following questions

4. What is the problem of "dirty data" and what are two general approaches to dealing with this problem?

5. Why can data exist at different levels of an enterprise, and what are two broad schemes for dealing with the different levels of data?

6. What is meant by application independence and why is it important?

8. Describe a distributed information system you have dealt with in everyday life.

9. Why do you think groupware is becoming a more important enterprise system these days?

1. What is meant by a column-store database?

2. What was the Y2K problem and what effect did it have on the development of ERP systems?

Explanation / Answer

1. Columnar database:

A columnar database is a database management system (DBMS) that stores data in columns instead of rows. The goal of a columnar database is to read and write data efficiently to and from hard disk storage in order to reduce the time it takes to return a query result.

In a columnar database, all the column 1 values are physically together, followed by all the column 2 values, etc. The data is stored in record order, so the 100th entry for column 1 and the 100th entry for column 2 belong to the same input record. This allows individual data elements, such as customer name for instance, to be accessed in columns as a group, rather than individually row-by-row.
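The layout described above can be illustrated with a minimal Python sketch. It is only an in-memory analogy, not how any particular DBMS stores data on disk, and the records, the field names, and the helpers `to_columns` and `get_record` are hypothetical.

```python
# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"customer_name": "Ada", "city": "London", "balance": 120.0},
    {"customer_name": "Ben", "city": "Paris", "balance": 75.5},
    {"customer_name": "Cy", "city": "Oslo", "balance": 310.25},
]


def to_columns(records):
    """Pivot row dicts into one list per column, preserving record order."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def get_record(columns, index):
    """Rebuild one record by reading the same position from every column."""
    return {field: values[index] for field, values in columns.items()}


columns = to_columns(rows)

# A query that needs one field touches only that column's values.
names = columns["customer_name"]
total_balance = sum(columns["balance"])

# The entries at the same position in each column belong to the same record.
second_record = get_record(columns, 1)
```

Reading `columns["balance"]` touches only the balance values, which is the access pattern a column store is designed to make cheap, while rebuilding a full record requires one lookup per column.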

4. In a data warehouse, dirty data is a database record that contains errors. Dirty data can be caused by a number of factors including duplicate records, incomplete or outdated data, and the improper parsing of record fields from disparate systems.

Dirty data can also be described as any data that is inconsistent with the data already residing in a data warehouse. Examples include misspellings such as "green" entered as "rgeen", the digit "1" replaced by the letter "l", and other typographical or phonetic errors. Some fields also carry numerical constraints: a weight cannot be negative, a person cannot have more than two parents, and a human age cannot exceed roughly 100 to 120 years.
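Constraints of this kind can be checked mechanically. The sketch below is one possible way to do it in Python; the field names (`weight_kg`, `parents`, `age_years`), the function `find_violations`, and the upper age bound of 120 are assumptions chosen for illustration.

```python
def find_violations(record):
    """Return a list of constraint violations found in one record.

    Field names are hypothetical; missing fields are simply skipped.
    """
    problems = []

    weight = record.get("weight_kg")
    if weight is not None and weight < 0:
        problems.append("weight is negative")

    if len(record.get("parents", [])) > 2:
        problems.append("more than two parents")

    age = record.get("age_years")
    if age is not None and not (0 <= age <= 120):
        problems.append("age outside 0-120")

    return problems


clean = {"weight_kg": 70, "parents": ["p1", "p2"], "age_years": 40}
dirty = {"weight_kg": -5, "parents": ["p1", "p2", "p3"], "age_years": 150}

clean_report = find_violations(clean)
dirty_report = find_violations(dirty)
```

Here `clean_report` is empty, while `dirty_report` lists all three problems, so records can be routed for cleaning based on whether the list is empty.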

The two general approaches to data cleaning are:

a) Identification of errors: find records that could have incomplete or corrupted data.

b) Error verification: check whether a flagged value is truly an error or not. This situation arises in organizations that use their own organizational jargon.
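The two steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than a standard cleaning procedure: the reference vocabulary `KNOWN_COLOURS`, the jargon set `JARGON`, and the helpers `identify_suspects` and `verify` are hypothetical, and the use of `difflib.get_close_matches` to suggest corrections is my own choice.

```python
import difflib

# Assumed reference vocabulary for one field, plus a hypothetical in-house
# abbreviation that looks wrong but is accepted inside the organization.
KNOWN_COLOURS = ["green", "red", "blue"]
JARGON = {"grn"}


def identify_suspects(values, vocabulary):
    """Step (a): flag values that do not appear in the reference vocabulary."""
    return [value for value in values if value not in vocabulary]


def verify(value, vocabulary, jargon):
    """Step (b): decide whether a flagged value is truly an error."""
    if value in jargon:
        return ("valid jargon", None)
    # Suggest the closest known value, if any is similar enough.
    matches = difflib.get_close_matches(value, vocabulary, n=1, cutoff=0.6)
    if matches:
        return ("likely error", matches[0])
    return ("needs review", None)


observed = ["green", "rgeen", "grn"]
suspects = identify_suspects(observed, KNOWN_COLOURS)
results = {value: verify(value, KNOWN_COLOURS, JARGON) for value in suspects}
```

The jargon check runs before the similarity check so that an accepted abbreviation is never "corrected" into a dictionary word.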