Option 2: Academy of Py

Randi Howell
Assignment 4 - Pandas

Well done! Having spent years analyzing financial records for big banks, you have finally scratched your idealistic itch and joined the education sector. In your latest role, you've become the Chief Data Scientist for your city's school district. In this capacity, you'll be helping the school board and mayor make strategic decisions regarding future school budgets and priorities.

As a first task, you haveve been asked to analyze the district-wide standardized test results. You will be given access to every student's math and reading scores, as well as various information on the schools they attend. Your responsibility is to aggregate the data to and showcase obvious trends in school performance.

Your final report should include each of the following:

District Summary

Create a high level snapshot (in table form) of the district's key metrics, including:

  • Total Schools
  • Total Students
  • Total Budget
  • Average Math Score
  • Average Reading Score
  • % Passing Math
  • % Passing Reading
  • Overall Passing Rate (Average of the above two)
  • School Summary

Create an overview table that summarizes key metrics about each school, including:

  • School Name
  • School Type
  • Total Students
  • Total School Budget
  • Per Student Budget
  • Average Math Score
  • Average Reading Score
  • % Passing Math
  • % Passing Reading
  • Overall Passing Rate (Average of the above two)
  • Top Performing Schools (By Passing Rate)

Create a table that highlights the top 5 performing schools based on Overall Passing Rate. Include:

  • School Name
  • School Type
  • Total Students
  • Total School Budget
  • Per Student Budget
  • Average Math Score
  • Average Reading Score
  • % Passing Math
  • % Passing Reading
  • Overall Passing Rate (Average of the above two)
  • Top Performing Schools (By Passing Rate)

Create a table that highlights the bottom 5 performing schools based on Overall Passing Rate. Include all of the same metrics as above.

  • Math Scores by Grade

Create a table that lists the average Math Score for students of each grade level (9th, 10th, 11th, 12th) at each school.

  • Reading Scores by Grade

Create a table that lists the average Reading Score for students of each grade level (9th, 10th, 11th, 12th) at each school.

  • Scores by School Spending

Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following:

  • Average Math Score
  • Average Reading Score
  • % Passing Math
  • % Passing Reading
  • Overall Passing Rate (Average of the above two)
  • Scores by School Size

Repeat the above breakdown, but this time group schools based on a reasonable approximation of school size (Small, Medium, Large).

  • Scores by School Type

Repeat the above breakdown, but this time group schools based on school type (Charter vs. District).

As final considerations:

  • Scores by School Type
  • Your script must work for both data-sets given.
  • You must use the Pandas Library and the Jupyter Notebook.
  • You must submit a link to your Jupyter Notebook with the viewable Data Frames.
  • You must include an exported markdown version of your Notebook called README.md in your GitHub repository.
  • You must include a written description of three observable trends based on the data.
  • See Example Solution for a reference on the expected format.

Hints and Considerations

  • These are challenging activities for a number of reasons. For one, these activities will require you to analyze thousands of records. Hacking through the data to look for obvious trends in Excel is just not a feasible option. The size of the data may seem daunting, but Python Pandas will allow you to efficiently parse through it.
  • Second, these activities will also challenge you by requiring you to learn on your feet. Don't fool yourself into thinking: "I need to study Pandas more closely before diving in." Get the basic gist of the library and then immediately get to work. When facing a daunting task, it's easy to think: "I'm just not ready to tackle it yet." But that's the surest way to never succeed. Learning to program requires one to constantly tinker, experiment, and learn on the fly. You are doing exactly the right thing, if you find yourself constantly practicing Google-Fu and diving into documentation. There is just no way (or reason) to try and memorize it all. Online references are available for you to use when you need them. So use them!
  • Take each of these tasks one at a time. Begin your work, answering the basic questions: "How do I import the data?" "How do I convert the data into a DataFrame?" "How do I build the first table?" Don't get intimidated by the number of asks. Many of them are repetitive in nature with just a few tweaks. Be persistent and creative!'
  • Expect these exercises to take time! Don't get discouraged if you find yourself spending hours initially with little progress. Force yourself to deal with the discomfort of not knowing and forge ahead. This exercise is likely to take between 15-30 hours of your time. Consider these hours an investment in your future!'
  • As always, feel encouraged to work in groups and get help from your TAs and Instructor. Just remember, true success comes from mastery and not a completed homework assignment. So challenge yourself to truly succeed!

Initial Set-Up:

In [3]:
# Import Dependencies

import csv 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [4]:
# Read in file 
school_file = "PyCitySchools/raw_data/schools_complete.csv"
student_file = "PyCitySchools/raw_data/students_complete.csv"

# Store imported files as new DFs
school_df = pd.read_csv(school_file)
student_df = pd.read_csv(student_file)

## Print statements are included for checking purposes - can be deleted if neccessary 
## Print rows of each DF:
# school_df
# student_df
In [5]:
# Rename col "name" in school_df to "school name"
school_df.rename_axis({"name" : "school name"}, axis=1, inplace=True)

# Make a copy of school_df to manipulate and reuse later throughout the code
# Store renamed school_df as new df
copy_school_sum = school_df.copy()


## Print statements are included for checking purposes - can be deleted if neccessary 
## Print both DFs to verify that the columns are the same
#school_df.columns
#copy_school_sum.columns
In [6]:
# Rename col "school" in student_df to "school name"
# Store renamed student_df as new df
renamed_student_df = student_df.rename(columns={"school" : "school name"})


## Print statements are included for checking purposes - can be deleted if neccessary 
## Print both DFs to verify that the columns are the same
#renamed_student_df.columns

Part One: District Summary

  • Total Schools
  • Total Students
  • Total Budget
  • Average Math Score
  • Average Reading Score
  • % Passing Math
  • % Passing Reading
  • Overall Passing Rate (Average of the above two)
  • School Summary
In [7]:
## Total Schools
total_schools = school_df["school name"].count()
#print(total_schools)


## Total Students
total_students = renamed_student_df["school name"].count()
#print(total_students)

## Total Budget
total_budget = school_df["budget"].sum()

## Average Math Score
avg_math_score = renamed_student_df["math_score"].mean()

## Average Reading Score
avg_read_score = renamed_student_df["reading_score"].mean()
In [8]:
## % Passing Math based on 70
math_pass = renamed_student_df.loc[(student_df["math_score"] >= 70)]

count_pass_math = math_pass["math_score"].count()
#print(count_pass_math)

per_math_pass = (count_pass_math/total_students)*100
#print(per_math_pass)
In [9]:
## % Passing Reading based on 70
read_pass = renamed_student_df.loc[(student_df["reading_score"] >= 70)]

count_pass_read = read_pass["reading_score"].count()
#print(count_pass_read)

per_read_pass = (count_pass_read/total_students)*100
#print(per_read_pass)
In [10]:
## Overall Passing Rate (Average of the above two)
overall_pass = (per_math_pass + per_read_pass)/2
#print(overall_pass)
In [11]:
district_summary = {"Total Schools" : total_schools,
                   "Total Students" : total_students,
                   "Total Budget" : total_budget,
                   "Average Math Score" : avg_math_score,
                   "Average Reading Score" : avg_read_score,
                    "% Passing Math" : per_math_pass,
                   "% Passing Reading" : per_read_pass,
                   "% Overall Passing" : overall_pass
                  }


district_summary_df = pd.DataFrame([district_summary])                                              
                                   
district_summary_df = district_summary_df[["Total Schools",
                   "Total Students","Total Budget","Average Math Score", "Average Reading Score",
                   "% Passing Math", "% Passing Reading", "% Overall Passing" ]]

District Summary Table

In [12]:
district_summary_df
Out[12]:
Total Schools Total Students Total Budget Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing
0 15 39170 24649428 78.985371 81.87784 74.980853 85.805463 80.393158

Part Two: School Summary


Create an overview table that summarizes key metrics about each school, including:

  • School Name
  • School Type
  • Total Students
  • Total School Budget
  • Per Student Budget
  • Average Math Score
  • Average Reading Score
  • % Passing Math
  • % Passing Reading
  • Overall Passing Rate (Average of the above two)
  • Top Performing Schools (By Passing Rate)

We will use copy_school_sum for this portion

In [13]:
# First, we will delete the School ID col from copy_school_sum because we will not need it for future use
del copy_school_sum['School ID']

# We can print copy_school_sum to verify the del
# copy_school_sum
In [14]:
## Calculate the Per Student Budget
## We will pull the school budget and divide it by the school size to determine the budget per student
## Then, we will create a new col named 'Per Student Budget'

copy_school_sum['Per Student Budget'] = copy_school_sum['budget']/copy_school_sum['size']

# copy_school_sum
In [15]:
## Average Math Score & Average Reading Score
## We will use a groupby function to group on school name and display both reading and math scores from above
## then reset the index

avg_math_read_tbl = renamed_student_df.groupby(['school name'])['reading_score', 'math_score'].mean().reset_index()

# avg_math_read_tbl
In [16]:
## Now, we merge avg_math_read_tbl with copy_school_sum df
## We want to merge on School Name and by 'outer' to include everything

copy_school_sum = copy_school_sum.merge(avg_math_read_tbl, on='school name', how="outer")

#copy_school_sum
In [17]:
## We will use conditionals to find the % Passing Math and % Passing Reading
## Then, we will store them as new variabels
## We are using a passrate of 70% or higher

# % Passing Reading
summary_passing_read = renamed_student_df[renamed_student_df['reading_score']>=70]

#% Passing Math
summary_passing_math = renamed_student_df[renamed_student_df['math_score']>=70]

#summary_passing_read
#summary_passing_math
In [18]:
## Count the number of students passing in reading 
pass_read_count_sum = summary_passing_read.groupby(["school name"])['reading_score'].count().reset_index()

## Then, rename the column the 'reading_score' to 'Reading Count'
pass_read_count_sum.rename_axis({'reading_score' : 'Reading Count'}, axis=1, inplace=True)

#pass_read_count_sum
In [19]:
## Count the number of students passing in math 
pass_math_count_sum = summary_passing_math.groupby(["school name"])['math_score'].count().reset_index()

## Then, rename the column the 'math_score' to 'Math Count'
pass_math_count_sum.rename_axis({'math_score' : 'Math Count'}, axis=1, inplace=True)

#pass_math_count_sum
In [20]:
## Merge pass_math_count_sum with pass_read_count_sum
## We want to merge on School Name and by 'inner' to include only the contents found in both columns

pass_count = pass_math_count_sum.merge(pass_read_count_sum, on="school name", how='inner')

#pass_count
In [21]:
## Merge copy_school_sum with pass_count we just created
## We want to merge on School Name and by 'outer' to include everything

copy_school_sum = copy_school_sum.merge(pass_count, on="school name", how='outer')

#copy_school_sum
In [22]:
## Calc % passing math and reading
## Take the subject count and divide by the school size, then pmultiply by 100 to get percentage

# % Passing Math
copy_school_sum['% Passing Math'] = (copy_school_sum['Math Count']/copy_school_sum['size'])*100

# % Passing Reading
copy_school_sum['% Passing Reading'] = (copy_school_sum['Reading Count']/copy_school_sum['size'])*100

#copy_school_sum
In [23]:
# Now, we will delete the Math Count and Reading Count cols from copy_school_sum 
# because we will not need the counts for future use

del copy_school_sum['Math Count']
del copy_school_sum['Reading Count']

#copy_school_sum
In [24]:
## Calc % overall passing 
## Overall Passing Rate (Average of the above two)

copy_school_sum['% Overall Passing'] = (copy_school_sum['% Passing Math'] + copy_school_sum['% Passing Reading'])/2

#copy_school_sum
In [25]:
# now, rename axis for reading and math scores to Avg. Reading Score and Avg. Math Score in copy_school_sum
copy_school_sum.rename_axis({'reading_score':'Avg. Reading Score',
                             'math_score':'Avg. Math Score'}, axis=1, inplace=True)

School Summary Table

In [26]:
copy_school_sum
Out[26]:
school name type size budget Per Student Budget Avg. Reading Score Avg. Math Score % Passing Math % Passing Reading % Overall Passing
0 Huang High School District 2917 1910635 655.0 81.182722 76.629414 65.683922 81.316421 73.500171
1 Figueroa High School District 2949 1884411 639.0 81.158020 76.711767 65.988471 80.739234 73.363852
2 Shelton High School Charter 1761 1056600 600.0 83.725724 83.359455 93.867121 95.854628 94.860875
3 Hernandez High School District 4635 3022020 652.0 80.934412 77.289752 66.752967 80.862999 73.807983
4 Griffin High School Charter 1468 917500 625.0 83.816757 83.351499 93.392371 97.138965 95.265668
5 Wilson High School Charter 2283 1319574 578.0 83.989488 83.274201 93.867718 96.539641 95.203679
6 Cabrera High School Charter 1858 1081356 582.0 83.975780 83.061895 94.133477 97.039828 95.586652
7 Bailey High School District 4976 3124928 628.0 81.033963 77.048432 66.680064 81.933280 74.306672
8 Holden High School Charter 427 248087 581.0 83.814988 83.803279 92.505855 96.252927 94.379391
9 Pena High School Charter 962 585858 609.0 84.044699 83.839917 94.594595 95.945946 95.270270
10 Wright High School Charter 1800 1049400 583.0 83.955000 83.682222 93.333333 96.611111 94.972222
11 Rodriguez High School District 3999 2547363 637.0 80.744686 76.842711 66.366592 80.220055 73.293323
12 Johnson High School District 4761 3094650 650.0 80.966394 77.072464 66.057551 81.222432 73.639992
13 Ford High School District 2739 1763916 644.0 80.746258 77.102592 68.309602 79.299014 73.804308
14 Thomas High School Charter 1635 1043130 638.0 83.848930 83.418349 93.272171 97.308869 95.290520

Part Three: Top 5 performing schools

Create a table that highlights the top 5 performing schools based on Overall Passing Rate. Include:

  • School Name
  • School Type
  • Total Students
  • Total School Budget
  • Per Student Budget
  • Average Math Score
  • Average Reading Score
  • % Passing Math
  • % Passing Reading
  • Overall Passing Rate (Average of the above two)
  • Top Performing Schools (By Passing Rate)
In [27]:
## Create a table that highlights the top 5 performing schools based on Overall Passing Rate. 
## Found the top 5 performing by sorting copy_school_sum on the '% Overall Passing' col in copy_school_sum df 
## By sorting, we can find the five top performing

top_performing_by_pr_df = copy_school_sum.sort_values(by=['% Overall Passing'], ascending=False).head(5)

Top 5 performing schools based on Overall Passing Rate

In [28]:
top_performing_by_pr_df.head(5)
Out[28]:
school name type size budget Per Student Budget Avg. Reading Score Avg. Math Score % Passing Math % Passing Reading % Overall Passing
6 Cabrera High School Charter 1858 1081356 582.0 83.975780 83.061895 94.133477 97.039828 95.586652
14 Thomas High School Charter 1635 1043130 638.0 83.848930 83.418349 93.272171 97.308869 95.290520
9 Pena High School Charter 962 585858 609.0 84.044699 83.839917 94.594595 95.945946 95.270270
4 Griffin High School Charter 1468 917500 625.0 83.816757 83.351499 93.392371 97.138965 95.265668
5 Wilson High School Charter 2283 1319574 578.0 83.989488 83.274201 93.867718 96.539641 95.203679

Part Four: Bottom 5 performing schools

Include:

  • School Name
  • School Type
  • Total Students
  • Total School Budget
  • Per Student Budget
  • Average Math Score
  • Average Reading Score
  • % Passing Math
  • % Passing Reading
  • Overall Passing Rate (Average of the above two)
In [29]:
## Create a table that highlights the 5 worst performing schools based on Overall Passing Rate. 
## Include all of the same metrics as above.
## Using the copy_school_sum df found the bottom 5 performing schools based on Overall Passing Rate col
worst_performing_by_pr_df = copy_school_sum.sort_values(by=['% Overall Passing']).head(5)

Bottom 5 performing schools

In [30]:
worst_performing_by_pr_df
Out[30]:
school name type size budget Per Student Budget Avg. Reading Score Avg. Math Score % Passing Math % Passing Reading % Overall Passing
11 Rodriguez High School District 3999 2547363 637.0 80.744686 76.842711 66.366592 80.220055 73.293323
1 Figueroa High School District 2949 1884411 639.0 81.158020 76.711767 65.988471 80.739234 73.363852
0 Huang High School District 2917 1910635 655.0 81.182722 76.629414 65.683922 81.316421 73.500171
12 Johnson High School District 4761 3094650 650.0 80.966394 77.072464 66.057551 81.222432 73.639992
13 Ford High School District 2739 1763916 644.0 80.746258 77.102592 68.309602 79.299014 73.804308

Part 5: Math Scores by Grade

Create a table that lists the average Math Score for students of each grade level (9th, 10th, 11th, 12th) at each school.

In [31]:
# We will use the pivot table group to display the requested information
math_scores_by_grade_df = pd.pivot_table(student_df, values=['math_score'], index=['school'], 
                                         columns=['grade'])
math_scores_by_grade_df = math_scores_by_grade_df.reindex_axis(labels=['9th',
                                                                    '10th',
                                                                    '11th',
                                                                    '12th'], axis=1, level=1)

Math Scores by Grade

In [32]:
math_scores_by_grade_df
Out[32]:
math_score
grade 9th 10th 11th 12th
school
Bailey High School 77.083676 76.996772 77.515588 76.492218
Cabrera High School 83.094697 83.154506 82.765560 83.277487
Figueroa High School 76.403037 76.539974 76.884344 77.151369
Ford High School 77.361345 77.672316 76.918058 76.179963
Griffin High School 82.044010 84.229064 83.842105 83.356164
Hernandez High School 77.438495 77.337408 77.136029 77.186567
Holden High School 83.787402 83.429825 85.000000 82.855422
Huang High School 77.027251 75.908735 76.446602 77.225641
Johnson High School 77.187857 76.691117 77.491653 76.863248
Pena High School 83.625455 83.372000 84.328125 84.121547
Rodriguez High School 76.859966 76.612500 76.395626 77.690748
Shelton High School 83.420755 82.917411 83.383495 83.778976
Thomas High School 83.590022 83.087886 83.498795 83.497041
Wilson High School 83.085578 83.724422 83.195326 83.035794
Wright High School 83.264706 84.010288 83.836782 83.644986

Part 6: Reading Scores by Grade

Create a table that lists the average Reading Score for students of each grade level (9th, 10th, 11th, 12th) at each school.

In [33]:
# will use the pivot table group to display
read_scores_by_grade_df = pd.pivot_table(student_df, values=['reading_score'], index=['school'], 
                                         columns=['grade'])
read_scores_by_grade_df = read_scores_by_grade_df.reindex_axis(labels=['9th',
                                                                    '10th',
                                                                    '11th',
                                                                    '12th'], axis=1, level=1)

Reading Scores by Grade

In [34]:
read_scores_by_grade_df
Out[34]:
reading_score
grade 9th 10th 11th 12th
school
Bailey High School 81.303155 80.907183 80.945643 80.912451
Cabrera High School 83.676136 84.253219 83.788382 84.287958
Figueroa High School 81.198598 81.408912 80.640339 81.384863
Ford High School 80.632653 81.262712 80.403642 80.662338
Griffin High School 83.369193 83.706897 84.288089 84.013699
Hernandez High School 80.866860 80.660147 81.396140 80.857143
Holden High School 83.677165 83.324561 83.815534 84.698795
Huang High School 81.290284 81.512386 81.417476 80.305983
Johnson High School 81.260714 80.773431 80.616027 81.227564
Pena High School 83.807273 83.612000 84.335938 84.591160
Rodriguez High School 80.993127 80.629808 80.864811 80.376426
Shelton High School 84.122642 83.441964 84.373786 82.781671
Thomas High School 83.728850 84.254157 83.585542 83.831361
Wilson High School 83.939778 84.021452 83.764608 84.317673
Wright High School 83.833333 83.812757 84.156322 84.073171

Part 7: Scores by School Spending

Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following:

  • Average Math Score
  • Average Reading Score
  • % Passing Math
  • % Passing Reading
  • Overall Passing Rate (Average of the above two)
In [53]:
# Copy the copy_school_sum and save as scores_by_school_spending
scores_by_school_spending = copy_school_sum.copy()

# Print columns of each to verify they're the same
# copy_school_sum
# scores_by_school_spending.columns
In [54]:
# Create bins - we will need labels and bins
bins = [0, 585, 615, 645, 675]
spending_labels = ['$0-585', '$586-615', '$616-645', '$646-675']
In [55]:
# Use bins and labels to sort through data and divide it up appropriately 
# save bined data as bins_school_spending variable 
bins_school_spending = pd.cut(scores_by_school_spending['Per Student Budget'], bins, labels=spending_labels)

# Convert bins_school_spending to df
bins_school_spending = pd.DataFrame(bins_school_spending)

# add Spending Level col 
copy_school_sum['Spending Level'] = bins_school_spending
In [56]:
# Show cols for bins_school_spending to verify 
#bins_school_spending.columns
In [57]:
# Do a groupby on Spending Level and school name
scores_by_school_spending = copy_school_sum.groupby(['Spending Level'])['Avg. Reading Score',
                                                         'Avg. Math Score',
                                                         '% Passing Reading',
                                                         '% Passing Math',
                                                         '% Overall Passing'
                                                         ].mean()

Scores by School Spending

In [58]:
scores_by_school_spending
Out[58]:
Avg. Reading Score Avg. Math Score % Passing Reading % Passing Math % Overall Passing
Spending Level
$0-585 83.933814 83.455399 96.610877 93.460096 95.035486
$586-615 83.885211 83.599686 95.900287 94.230858 95.065572
$616-645 81.891436 79.079225 86.106569 75.668212 80.887391
$646-675 81.027843 76.997210 81.133951 66.164813 73.649382

Scores by School Size

This time group schools based on a reasonable approximation of school size (Small, Medium, Large).

  • Average Math Score
  • Average Reading Score
  • % Passing Math
  • % Passing Reading
  • Overall Passing Rate (Average of the above two)
In [41]:
# Create a copy of copy_school_sum and save as scores_by_school_size
scores_by_school_size = copy_school_sum.copy()

# Print scores_by_school_size to verify
# scores_by_school_size
In [42]:
# Create bins - we will need labels and bins
bins = [0, 1000, 2000, 5000]
size_labels = ['Small', 'Medium', 'Large']
In [43]:
# Use bins and labels to sort through data and divide it up appropriately 
# save bin data as bins_school_size variable 
bins_school_size = pd.cut(scores_by_school_size['size'], bins, labels = size_labels)

# Convert bins_school_spending to df
bins_school_size = pd.DataFrame(bins_school_size)

# add 'School Population' col 
copy_school_sum['School Population'] = bins_school_size
In [44]:
# Show cols for bins_school_size to verify 
# bins_school_size.columns
In [45]:
# Do a groupby on School Population and school name
scores_by_school_size = copy_school_sum.groupby(['School Population','school name'])['Avg. Reading Score',
                                                         'Avg. Math Score',
                                                         '% Passing Reading',
                                                         '% Passing Math',
                                                         '% Overall Passing'
                                                         ].mean()

Scores by School Size

In [46]:
scores_by_school_size
Out[46]:
Avg. Reading Score Avg. Math Score % Passing Reading % Passing Math % Overall Passing
School Population school name
Large Bailey High School 81.033963 77.048432 81.933280 66.680064 74.306672
Figueroa High School 81.158020 76.711767 80.739234 65.988471 73.363852
Ford High School 80.746258 77.102592 79.299014 68.309602 73.804308
Hernandez High School 80.934412 77.289752 80.862999 66.752967 73.807983
Huang High School 81.182722 76.629414 81.316421 65.683922 73.500171
Johnson High School 80.966394 77.072464 81.222432 66.057551 73.639992
Rodriguez High School 80.744686 76.842711 80.220055 66.366592 73.293323
Wilson High School 83.989488 83.274201 96.539641 93.867718 95.203679
Medium Cabrera High School 83.975780 83.061895 97.039828 94.133477 95.586652
Griffin High School 83.816757 83.351499 97.138965 93.392371 95.265668
Shelton High School 83.725724 83.359455 95.854628 93.867121 94.860875
Thomas High School 83.848930 83.418349 97.308869 93.272171 95.290520
Wright High School 83.955000 83.682222 96.611111 93.333333 94.972222
Small Holden High School 83.814988 83.803279 96.252927 92.505855 94.379391
Pena High School 84.044699 83.839917 95.945946 94.594595 95.270270

Scores by School Type

This time group schools based on school type (Charter vs. District).

  • Average Math Score
  • Average Reading Score
  • % Passing Math
  • % Passing Reading
  • Overall Passing Rate (Average of the above two)
In [47]:
# Create a copy of copy_school_sum and save as scores_by_school_type
scores_by_school_type = copy_school_sum.copy()

# Convert scores_by_school_type to df
scores_by_school_type = pd.DataFrame(scores_by_school_type)

# Print scores_by_school_type cols to verify
# scores_by_school_type.columns
In [48]:
# Do a groupby on School Type
scores_by_school_type = copy_school_sum.groupby(['type'])['Avg. Reading Score',
                                                         'Avg. Math Score',
                                                         '% Passing Reading',
                                                         '% Passing Math',
                                                         '% Overall Passing'
                                                         ].mean()

Scores by School Type

In [49]:
scores_by_school_type.head()
Out[49]:
Avg. Reading Score Avg. Math Score % Passing Reading % Passing Math % Overall Passing
type
Charter 83.896421 83.473852 96.586489 93.620830 95.103660
District 80.966636 76.956733 80.799062 66.548453 73.673757

Final considerations:

  • Your script must work for both data-sets given.
  • You must use the Pandas Library and the Jupyter Notebook.
  • You must submit a link to your Jupyter Notebook with the viewable Data Frames.
  • You must include an exported markdown version of your Notebook called README.md in your GitHub repository.
  • You must include a written description of three observable trends based on the data.
  • See Example Solution for a reference on the expected format.

Observations:

  • In looking a Scores by School Type, we can see that the charter type schools have higher passrates across readin and math than the district typ schools. This is further verified by noting that the top five performing schools were all charter schools.
  • Looking at Math Scores by Grade and Reading Scores by Grade, we see that, in general, students have lower math scores compared to their reading scores.
  • With the Scores by School Spending, we can note that there is not neccessarily a correlation between school spending and average math or reading scores.