Question
Assume that the United Nations has conducted a global salary survey and included the data in the adult.data.simplified.csv data set posted here. You are the junior data analyst at its headquarters, and your job now is to find all the rows related to a certain country and determine the following for a numeric attribute:
1) minimum, maximum, range (1 point)
2) mean, median, mode, standard deviation (1 point)
3) Q1, Q3 (1 point)
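The three parts map directly onto standard library functions. The sketch below assumes Python's csv and statistics modules; the column names "native-country" and "age" are hypothetical defaults borrowed from the original UCI adult data and must be matched to the real header of adult.data.simplified.csv. Quartiles have several competing conventions, so Q1 and Q3 here follow the default "exclusive" method of statistics.quantiles and may differ slightly from a spreadsheet's QUARTILE function.

```python
import csv
import statistics


def load_rows(path):
    """Read the CSV into a list of dicts keyed by the header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def describe(rows, country, column, country_col="native-country"):
    """Summary statistics for one numeric column, restricted to one country.

    column and country_col are hypothetical names; adjust them to the
    actual header of adult.data.simplified.csv.
    """
    # Keep only rows for the requested country; strip() guards against the
    # leading spaces found in fields of the UCI adult.data file.
    values = [float(r[column]) for r in rows
              if r[country_col].strip() == country and r[column].strip()]
    lo, hi = min(values), max(values)
    # quantiles(n=4) returns the three cut points [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        # Part 1: minimum, maximum, range
        "min": lo, "max": hi, "range": hi - lo,
        # Part 2: central tendency and spread
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),    # first mode on ties (Python 3.8+)
        "stdev": statistics.stdev(values),  # sample standard deviation
        # Part 3: quartiles
        "Q1": q1, "Q3": q3,
    }
```

For example, describe(load_rows("adult.data.simplified.csv"), "United-States", "age") would summarize the age column for that country, provided those names exist in the file. Both stdev and quantiles need at least two matching rows.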
How would you approach such a question?
What kinds of functions do these three questions call for?
Explanation / Answer
CSV files are a common exchange format between software packages that support tabular data. They are also easily produced by hand with a text editor, or by scripts and programs written by end users.
Although in theory .csv files could have any extension, OGR (the vector data component of the GDAL library) only auto-recognizes CSV files ending with the extension ".csv". The datasource name may be either a single CSV file or a directory. For a directory to be recognized as a .csv datasource, at least half of the files in the directory must have the .csv extension.
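The directory rule stated for OGR can be illustrated in a few lines. This is only a sketch of the rule as described, not GDAL/OGR's own code; the function name looks_like_csv_directory is hypothetical, and it matches the ".csv" ending exactly as written in the text.

```python
from pathlib import Path


def looks_like_csv_directory(directory):
    """Illustrative check: True if at least half of the regular files in
    the directory end with ".csv". Mirrors the stated rule only; it is
    not OGR's implementation."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    if not files:
        return False
    csv_count = sum(1 for p in files if p.name.endswith(".csv"))
    # "At least half" expressed with integer arithmetic.
    return 2 * csv_count >= len(files)
```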
When working with a large data set, the code should import the data as character columns, create numeric versions of the columns that hold numbers, and then merge the numeric and character columns into a new, final data set.
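Those three steps (import as character, convert to numeric, merge) can be sketched with Python's csv module, which already reads every field as a string. The function name split_and_merge and its numeric_cols parameter are hypothetical: the caller decides which columns hold numbers, and empty numeric fields become None by assumption.

```python
import csv
import io


def split_and_merge(text, numeric_cols):
    """Import CSV text as character data, build numeric versions of the
    listed columns, and merge both parts into one new list of rows."""
    # Step 1: every field arrives as a character string.
    rows = list(csv.DictReader(io.StringIO(text)))
    # Character part: all columns not declared numeric, left as str.
    char_part = [{k: v for k, v in r.items() if k not in numeric_cols}
                 for r in rows]
    # Step 2: numeric part, converted to float (empty field -> None).
    num_part = [{k: float(r[k]) if r[k].strip() else None
                 for k in numeric_cols} for r in rows]
    # Step 3: merge row by row into the final data set.
    return [{**c, **n} for c, n in zip(char_part, num_part)]
```

To use it on a file, pass the file's contents, e.g. split_and_merge(open(path, newline="").read(), ["age"]).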
To declare numeric and character attributes, an ARFF-style header (the format used by Weka) would contain, for example:
@attribute data numeric
@attribute minimum numeric
@attribute maximum numeric
@attribute range numeric
These declarations let the .csv file be imported while preserving the data type of each original column, character or numeric.
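Writing such declarations by hand is tedious, so one option is to infer them from the data. The sketch below is a swapped-in technique, not something the text prescribes: it treats a column as numeric only if every non-empty value parses as a float, and otherwise declares it with ARFF's string type. The function name arff_header and the default relation name "salary" are hypothetical placeholders.

```python
import csv
import io


def _is_number(value):
    """True if the value parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def arff_header(text, relation="salary"):
    """Infer numeric vs. character columns from CSV text and return
    ARFF-style header lines (@relation plus one @attribute per column)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    lines = [f"@relation {relation}"]
    for i, name in enumerate(header):
        # Ignore empty fields when deciding the type.
        values = [r[i].strip() for r in rows if i < len(r) and r[i].strip()]
        numeric = bool(values) and all(_is_number(v) for v in values)
        lines.append(f"@attribute {name} {'numeric' if numeric else 'string'}")
    return lines
```

Column names containing spaces would need quoting in a real ARFF file; this sketch does not handle that case.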