
I'm searching for a software where I can import a large amount of data (5000 co


Question

I'm searching for software into which I can import a large amount of data (5000 columns, unknown number of rows, about 20 GB) and which gives me an estimated correlation (it does not have to be very accurate). Google Translate says this has to be the "correlation coefficient" (German "Korrelationskoeffizient").

Use case: I work as a student employee in a (rough translation) "waste heat power plant" and want to optimize some processes. The 5000 columns are measurements taken from sensors. In the end I want to know which value(s) will rise if I change another. Of course, the results will then be checked by people who understand this topic better than I do.

Edit: The operating system does not matter. Windows or Mac (workplace) is preferred, but Linux (private) is OK, too.

Explanation / Answer

If your computer has enough RAM, you could completely read and process the file with R, optionally using the data.table package.

From your description, you have a long time series of sensor readings and would like to model the system so that you can predict its behaviour. This is a very complex topic that I'm not familiar with, but R packages such as "forecast" exist for building such models.

For a start, it would probably be a good idea to cut a small part out of the big 20 GB file and to analyse just that manageable portion. Doing such a cut is trivial with Unix/Linux tools in a terminal:

head -n 5000 bigfile.txt > first5000lines.txt
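The head command above only covers the earliest stretch of the time series. If a sample spread over the whole recording period is more useful, every Nth line can be kept instead. A minimal sketch, assuming the first line of the file is a header; the function name every_nth_line, the step of 1000 and the output file name are made up for illustration:

```shell
# Keep the header (line 1) plus every STEP-th data line of FILE.
# Usage: every_nth_line STEP FILE
every_nth_line() {
    awk -v step="$1" 'NR == 1 || (NR - 1) % step == 0' "$2"
}

# Example call (hypothetical file names):
# every_nth_line 1000 bigfile.txt > every1000thline.txt
```

awk reads the file line by line, so this should work on the full 20 GB file without needing much RAM, although it still has to read the whole file once.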

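It is always a good idea to actually look at a fraction of the data in a spreadsheet program. You can reduce the number of columns with cut, which assumes tab-separated fields unless you pass a delimiter with -d (for example -d ',' for comma-separated data):

cut -f 1-50 first5000lines.txt > 50colsfirst5000lines.txt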
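For a first rough look at the correlation coefficient the question asks about, two columns of the reduced file can be compared directly in the terminal. This is only a sketch under assumptions: the file is tab-separated, line 1 is a header, both columns are purely numeric, and the function name column_correlation is invented here. It uses the one-pass sums formula for the Pearson coefficient, which is simple but can lose precision when values are very large, so treat the result as an estimate:

```shell
# Pearson correlation coefficient between two columns of a tab-separated file.
# Assumes line 1 is a header and both columns contain only numbers.
# Usage: column_correlation COL_A COL_B FILE
column_correlation() {
    awk -F '\t' -v a="$1" -v b="$2" '
        NR > 1 {
            x = $a; y = $b
            n++
            sx += x; sy += y                              # running sums
            sxx += x * x; syy += y * y; sxy += x * y      # sums of squares and products
        }
        END {
            den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
            if (den == 0) print "NA"                      # constant column or no data
            else printf "%.3f\n", (n * sxy - sx * sy) / den
        }
    ' "$3"
}

# Example call (columns 2 and 7 are arbitrary picks):
# column_correlation 2 7 50colsfirst5000lines.txt
```

A value near 1 or -1 suggests the two sensors move together or in opposite directions, while a value near 0 suggests no linear relationship. It says nothing about which one causes the other, so the check by domain experts mentioned in the question is still needed.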

The task you describe will require a lot of research and effort to understand data analysis, and I don't think there is any software that can automagically do the work.

R is cross-platform and versions for Windows, Mac and Linux exist.