Course : Introduction to Data Science
Course code : ECON1362
Data processing with spreadsheets
Data processing techniques using MS Excel spreadsheet software.
1.1 Basic Worksheet Functions
The basic concepts of the workbook and worksheet, cells and regions.
Types and functions.
Reading: Basic Calculations
Study the following from Chapter 2 of the Open Textbook Beginning Excel?
https://openoregon.pressbooks.pub/beginningexcel/part/chapter-2/
For studing Chapter 2 download data file: CH2 Data
The workbook here includes the students' grade in a course. The file contains 2 columns: the semester of the student and the grade. In the grade column, the NS indicates Not-Shown in the exam. Answer the following questions:
Reading: Formulas and Functions
Study the following from Chapter 3 of the Open Textbook Beginning Excel?
https://openoregon.pressbooks.pub/beginningexcel/part/chapter-3/
The file here includes two worksheets. The worksheet with name "Grades of Exercises" presents the grades of each student in 5 exercises and the weight of each exercise in the final grade. Use spreadsheet functions to answer the following questions:
The file here (xlsx file) shows the staff of a project in a university. For each employee participating in the project, the following items are given in the columns: Name, Project Team to which she/he belongs, Category of job relationship (Staff with contract, Permanent staff or Teaching staff), Initial Contract amount and Final Contract amount (in Euro). On the basis of these data, fill in the necessary columns and answer the questions.
1.2 Visualize data with charts
Import data and present them with charts
Reading: Present data with Charts
Study the following from Chapter 4 of the Open Textbook Beginning Excel?
https://openoregon.pressbooks.pub/beginningexcel/part/chapter-4/
Practice: Create different chart types
Download the file here and create the charts in each worksheet according to the instructions.
The dataset includes the grades of students in an examination. The dataset includes 3 columns: a numeric index (#), the student semester of study and the grade. In column "Grade", the text NS means that the specific student did not participate in the exam (not-shown). Process the dataset and answer to the following questions:
1.3. Import and Organize data in Tables
Import text data in Excel and manage them with tables
What is CSV?
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. A CSV file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas (or other delimiters such as tab [TSV], semicolon, etc). (Definition from Wikipedia, https://en.wikipedia.org/wiki/Comma-separated_values)
CSV files are widely used in business applications as an open format, compatible to different programs.
Practice: Import CSV data in MS Excel
Instructions:
https://www.hesa.ac.uk/support/user-guides/import-csv
Attention in importing data in MS Excel 2016!
- First review your data with a text editor.
- If your Excel uses commas for decimal numbers (e.g 72,5) there will be a problem with CSV data having demical numbers with dot (e.g. 72.5).
- For solutions see here.
Data for practice:
Reading: Organize data in tables
Study the following from Chapter 5 of the Open Textbook Beginning Excel?
https://openoregon.pressbooks.pub/beginningexcel/part/chapter-5/
1.4 Summarizing Data
Learn how to summarize data using descriptive statistics, histograms and pivot tables.
Reading: Summarizing Data: Descriptive Statistics and Histograms with Excel Analysis Toolpak
- Step 1: Load and activate the Analysis Toolpak
- Step 2: Study the Chapter 3 with reference Herkenhoff L., Fogli J. (2013) Summarizing Data: Descriptive Statistics and Histograms. In: Applied Statistics for Business and Management using Microsoft Excel. Springer, New York, NY (available only with University IP address)
Practice: Descriptive statistics and histogram of student grades
- Download this file
- Use the Analysis Toolpak and produce Descriptive statistics of column "Grades". You should first short the data to avoid NS.
- Use the Analysis Toolpak and create a Histogram. Use as Bin grades from 0..10 with granularity 0,5
- Insert a Histogram from Chart (Excel 2016 and newer versions). Help can be found in Mictosoft Support Center here.
Reading: Summarizing Data using Pivot tables
What is a pivot table?
A pivot table is a table of statistics that summarize the data of a more extensive table. This summary might include sums, averages, or other statistics, which the pivot table groups together in a meaningful way. (Definition from Wikipedia)
Study Chapter 8: PivotTables and Pivot Charts from the following book:
Practice: Agreegate payment data with pivot tables
- Download the file that contains payment data and understand the meaning of each column
- Review the section about pivot tables at the Microsoft Support Center
- Create different pivot tables and pivot charts to summarize the payment data:
-
Summary of cost per product category
-
Summary of cost per product category and cost center (short code)
-
Summary of cost per month and per product category
-
Summary of number of payments, invoices and percentage of cost per supplier. Find top 10 suppliers
-
Optional Reading: Descriptive Statistics
Review the concepts of descriptive statistics in Chapter 2 of the Open Textbook Introductory Statistics
First part of a Lecture on Descriptive Statistics for the Introduction to Business Statistics Course.