In which aspect is RStudio better than Microsoft Excel?
As Francis points out, the real comparison should be between R and Excel.
R doesn't automatically and irreversibly convert gene names (like SEPT7 or MARCH5) into dates when you read them in from a file (and then store those dates as the integer number of days from the start of some computer universe). R doesn't let you accidentally sort just one column in a multicolumn dataset, thus breaking important connections between data points. R doesn't automatically try to convert a simple text-based data format (CSV or TSV) into a proprietary binary format every time you try to save data.
R works with many data sets at once, with a wide variety of structures. Excel mostly works with single rectangular spreadsheet tables. There are at least 15 different R packages, all free and open source, implementing a tremendous variety of methods and algorithms. R can produce and customize a wide variety of different plots and charts.
Everything in R can be completely scripted in a way that makes the computations completely reproducible. RStudio makes it easy to use the knitr and rmarkdown packages to start with a single source file and reproducibly create a report (as a PDF, HTML, or Word document) that combines arbitrary statistical computations, figures, tables, explanations, and interpretations.
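The points about gene names and sorting apply to scripted analysis in general, not only to R. As a rough illustration, here is a minimal sketch in Python (chosen only because the next answer mentions Python scripts) using the standard `csv` module. The sample data and the helper names `read_rows` and `sort_rows` are made up for this example.

```python
import csv
import io

# Hypothetical tab-separated data; the counts are invented for illustration.
RAW = "gene\tcount\nSEPT7\t12\nMARCH5\t7\nTP53\t30\n"

def read_rows(text):
    """Parse TSV text into a list of dicts; every field stays a plain string."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def sort_rows(rows, key):
    """Sort whole rows by one numeric column, so the other columns stay attached."""
    return sorted(rows, key=lambda row: int(row[key]))

rows = read_rows(RAW)
by_count = sort_rows(rows, "count")
for row in by_count:
    print(row["gene"], row["count"])
```

Because the reader never guesses at types, SEPT7 and MARCH5 remain text, and because each row is sorted as a unit, a gene can never be separated from its count.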
How do data analysts turn millions of raw data records into a readable format?
Cleaning data, which is more than just making it readable, takes a lot of intuition and resourcefulness. I wish there were a prescriptive list I could talk about here, but consider these questions.
Start by asking: what does your team want? Just because some data is available does not make it useful. In truth, one of the important success factors for a data architect is ruthless triage of what data is necessary. Once you know what you want, focus on those fields only. It will save you a lot of heartburn. As you gain more experience, it will be your job to figure out exactly what the team wants. This is iterative with the next step.
To execute on this strategy, you will also have to understand: what tools are available? Is Excel sufficient (don't jeer at it; it's a life saver in data science)? Do you need Python scripts? Do you know how to write Python scripts? Perhaps the need is for something more like SAS? The more sophisticated your skill set and tools, the more sophisticated your strategy can be.
In most data sets there are bound to be rows that do not conform to the pattern. Then your task is to figure out: how do you make data rows consistent and conforming? This takes a lot of intuition, resourcefulness, and judgement. The solutions you come up with in this step will affect your mapping strategy in step 3 and the patterns used in step 4. You will learn to live with some acceptable loss, and will realize that there could be multiple patterns in the data.
Hope this helps.
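To make the triage and conformance ideas concrete, here is a minimal Python sketch, not a prescriptive recipe. It assumes a made-up CSV export, a hypothetical list of wanted fields (`WANTED`), and two date patterns that were invented for illustration. Rows matching neither pattern are set aside as the "acceptable loss" rather than silently dropped.

```python
import csv
import io
import re

# Hypothetical raw export; column names and values are invented for illustration.
RAW = """id,signup_date,notes
1,2021-03-04,ok
2,03/06/2021,different date pattern
3,unknown,does not conform to any known pattern
4,2021-03-07,ok
"""

WANTED = ["id", "signup_date"]  # assumed to be the only fields the team asked for
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
US_DATE = re.compile(r"^(\d{2})/(\d{2})/(\d{4})$")  # assumed to mean MM/DD/YYYY

def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if it matches no known pattern."""
    value = value.strip()
    if ISO_DATE.match(value):
        return value
    match = US_DATE.match(value)
    if match:
        month, day, year = match.groups()
        return f"{year}-{month}-{day}"
    return None

def clean(text):
    """Keep only the wanted fields and split rows into conforming and rejected."""
    kept, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        slim = {name: row[name] for name in WANTED}
        date = normalize_date(slim["signup_date"])
        if date is None:
            rejected.append(slim)  # set aside for manual review
        else:
            slim["signup_date"] = date
            kept.append(slim)
    return kept, rejected

kept, rejected = clean(RAW)
print(len(kept), "rows kept;", len(rejected), "rows set aside")
```

Keeping the rejected rows in their own list makes the loss visible, so you can decide whether it is acceptable or whether a further pattern is worth handling.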