Python Pandas Library for Beginners

A Practical Guide to Getting Started and Ditching Spreadsheets. Pandas makes it easy to get descriptive statistics with built-in methods such as the mean and the count.

A quick reference for these methods can be found in the documentation.

Use .count() to find the number of rows (non-null values) in each column

Use .mean() to find the mean sodium value

Use .median() to find the median sodium value

Use .mode() to find the mode sodium value

Use .min() to find the minimum sodium value

Use .max() to find the maximum sodium value

# Use .count() to find the number of rows
df[['brand', 'flavor']].count()

# Use .mean() to find the mean sodium value
df['sodium'].mean()

# Use .median() to find the median sodium value
df['sodium'].median()
# 470.0

# Use .mode() to find the mode sodium value
df['sodium'].mode()
# 520

# Use .min() to find the minimum sodium value
df['sodium'].min()
# 140

# Use .max() to find the maximum sodium value
df['sodium'].max()
# 650
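
To try these methods without the original dataset, here is a minimal sketch. The dataframe is a made-up stand-in (the brands, flavors, and sodium values are assumptions, not the article's data). It also shows two details that are easy to miss: .count() skips missing values, and .mode() returns a Series rather than a single number.

```python
import pandas as pd

# Hypothetical sample data; not the dataset used in the article.
df = pd.DataFrame({
    "brand": ["A", "B", "C", "D", "E"],
    "flavor": ["chicken", "beef", "shrimp", "chicken", None],
    "sodium": [140, 470, 520, 520, 650],
})

# .count() counts non-null values per column, so the missing flavor is skipped.
counts = df[["brand", "flavor"]].count()

sodium_mean = df["sodium"].mean()      # arithmetic mean
sodium_median = df["sodium"].median()  # middle value of the sorted column
sodium_mode = df["sodium"].mode()      # a Series, because ties are possible
sodium_min = df["sodium"].min()
sodium_max = df["sodium"].max()

print(counts)
print(sodium_mean, sodium_median, sodium_mode.tolist(), sodium_min, sodium_max)
```

If I only need a single mode value, I can take the first entry with sodium_mode.iloc[0].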

Adding a new column is easy. This is the basic syntax I use:


DataFrame['column name'] = something


Before modifying the data frame, I think it is a good idea to copy it. That way I can always refer to the original if I make a mistake. Create a copy using DataFrame.copy()


Create a new copy of the dataframe named df_copy.

Create a column and set each row to a value of 1.

Display the top 3 rows - brand, flavor, and the new column.

#make a copy 
df_copy = df.copy()

#add a new column 
df_copy['My_New_Column'] = 1

#display the new column
df_copy[['brand','flavor','My_New_Column']].head(3)
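
As a quick check that the copy really protects the original, here is a small self-contained sketch. The dataframe contents are hypothetical placeholders; only the column names come from the example above.

```python
import pandas as pd

# Hypothetical stand-in for the article's dataframe.
df = pd.DataFrame({
    "brand": ["A", "B", "C", "D"],
    "flavor": ["chicken", "beef", "shrimp", "pork"],
})

df_copy = df.copy()           # .copy() makes a deep copy by default
df_copy["My_New_Column"] = 1  # the scalar 1 is broadcast to every row

preview = df_copy[["brand", "flavor", "My_New_Column"]].head(3)
print(preview)

# The original dataframe is unchanged: it has no new column.
print("My_New_Column" in df.columns)  # False
```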

Adding new columns can be very useful. For example, suppose I wanted to compare each sodium value with the mean sodium amount.


#create a new column that displays the mean value of sodium.
df_copy['sodium_mean'] = df['sodium'].mean()
df_copy[['brand','flavor','sodium','sodium_mean']].head(3)
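
Once the mean sits next to each value, the comparison itself is one more column. The sketch below uses made-up sodium values, and the sodium_vs_mean column name is my own choice for illustration rather than something from the original dataset.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"brand": ["A", "B", "C"], "sodium": [400, 500, 600]})
df_copy = df.copy()

# The mean is a single number, so pandas repeats it on every row.
df_copy["sodium_mean"] = df["sodium"].mean()

# Hypothetical extra column: how far each row is from the mean.
df_copy["sodium_vs_mean"] = df_copy["sodium"] - df_copy["sodium_mean"]

print(df_copy)
```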

Here are a couple of more advanced examples that show other ways to add a column or engineer new features.

Create an above_average_calories column that is 1 if the calories are greater than the average, else 0.

Create a word_count column that outputs the number of words in the manufactureDescription column.

There are probably multiple ways to approach the problems, but I’ll show you two. First, I will show how it can be done using a simple for loop.



#use a for loop to create a new column
average_calories = df_copy['calories'].mean()
above_average = []
for calories in df_copy['calories']:
    if calories > average_calories:
        above_average.append(1)
    else:
        above_average.append(0)
df_copy['above_average_calories'] = above_average

The for loop isn't optimal; check out all that code!

I would consider using a loop if readability is very important, but generally I'll take the Pythonic route and use a list comprehension.


#Use a list comprehension to create new columns

#Create a new column that is 1 if the calories are higher than the average, else 0
df_copy['above_average_calories'] = [1 if n > average_calories else 0 for n in df_copy['calories']]

#Create a word_count column that outputs the number of words in the manufactureDescription column
df_copy['word_count'] = [len(str(words).split(" ")) for words in df_copy['manufactureDescription']]

#Display the top 3 rows with the new columns
df_copy[['brand','flavor','above_average_calories','word_count']].head(3)
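
To convince myself that the loop and the list comprehension produce the same column, I can run both on a tiny dataframe. This is a sketch with hypothetical data; the calorie values and descriptions are made up, and one description is deliberately missing.

```python
import pandas as pd

# Hypothetical sample data with one missing description.
df_copy = pd.DataFrame({
    "calories": [100, 200, 300, 400],
    "manufactureDescription": [
        "Spicy noodles",
        "Rich beef broth with herbs",
        None,
        "Plain",
    ],
})

average_calories = df_copy["calories"].mean()  # 250.0

# Version 1: explicit for loop.
above_average_loop = []
for calories in df_copy["calories"]:
    if calories > average_calories:
        above_average_loop.append(1)
    else:
        above_average_loop.append(0)

# Version 2: list comprehension with the same logic.
above_average_comp = [1 if n > average_calories else 0 for n in df_copy["calories"]]

# str() turns a missing value into text, so it is counted as one word.
word_count = [len(str(words).split(" ")) for words in df_copy["manufactureDescription"]]

df_copy["above_average_calories"] = above_average_comp
df_copy["word_count"] = word_count
print(df_copy)
```

One thing to watch: because of the str() call, a missing description is counted as one word instead of zero, so it may be worth handling missing values separately if that matters.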

Another way to create the word_count column is to use pandas:

DataFrame['column'].apply()

It might feel a little tricky at first, but apply lets me apply a function along the axis of a dataframe. What that means is I can apply a function to each column or row. 

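In the example below, I call .apply() on a single column, so the lambda function runs on each value in that column. (When .apply() is called on a whole dataframe, it works column by column, since axis=0 is the default.)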

#Create a new column that outputs the number of words in the manufactureDescription column

df_copy['word_count']= df_copy['manufactureDescription'].apply(lambda x: len(str(x).split(" ")))
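
The difference between applying a function to one column and applying it along an axis of the whole dataframe is easier to see side by side. The sketch below uses hypothetical data; the col_max and row_sum names are just for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df_copy = pd.DataFrame({
    "calories": [100, 200, 300],
    "sodium": [400, 500, 600],
    "manufactureDescription": ["Spicy noodles", "Rich beef broth", "Plain"],
})

# On a single column (a Series), the function receives one value at a time.
word_count = df_copy["manufactureDescription"].apply(lambda x: len(str(x).split(" ")))

numeric = df_copy[["calories", "sodium"]]

# On a dataframe with the default axis=0, the function receives each column.
col_max = numeric.apply(lambda col: col.max())

# With axis=1, the function receives each row instead.
row_sum = numeric.apply(lambda row: row.sum(), axis=1)

print(word_count.tolist())  # [2, 3, 1]
print(col_max.to_dict())    # {'calories': 300, 'sodium': 600}
print(row_sum.tolist())     # [500, 700, 900]
```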