Pandas Tutorial: Pandas Interview Questions

Python Pandas Interview Questions

Pandas is a powerful Python library for data analysis and manipulation. It provides easy-to-use data structures, chiefly the Series and the DataFrame, along with tools for loading, cleaning, reshaping, and summarizing tabular data.

Here are some common pandas interview questions and examples:

How do you read a CSV file into a pandas DataFrame?


import pandas as pd
df = pd.read_csv('file.csv')
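
To try read_csv without a file on disk, you can pass it an in-memory buffer instead of a path. This is only a minimal sketch: the CSV text and its column names are made up for illustration and stand in for 'file.csv'.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'
csv_text = "name,age,gender\nAlice,34,F\nBob,28,M\nCara,41,F\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types inferred from the data
```

Checking shape and dtypes right after loading is a quick way to confirm that the file was parsed the way you expected.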

How do you select a column from a DataFrame?


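# Select the "age" column
df['age']
# Dot notation also works, but only when the column name is a valid Python
# identifier and does not clash with a DataFrame attribute or method
df.age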

How do you select multiple columns from a DataFrame?


# Select the "age" and "name" columns 
df[['age', 'name']]

How do you select rows from a DataFrame based on a condition?


# Select rows where the age is greater than 30 
df[df.age > 30]
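
Interviewers often follow up by asking how to combine several conditions. The sketch below uses a small made-up DataFrame (the names and values are assumptions) to show the & operator, isin, and query.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Cara", "Dan"],
    "age": [34, 28, 41, 30],
    "gender": ["F", "M", "F", "M"],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
older_women = df[(df["age"] > 30) & (df["gender"] == "F")]

# isin() keeps rows whose value appears in a list
selected = df[df["name"].isin(["Bob", "Dan"])]

# query() expresses the same kind of filter as a string
over_30 = df.query("age > 30")
```

Python's plain and/or keywords do not work on a Series, which is why the element-wise & and | operators are used here.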

How do you group a DataFrame by a column and calculate the mean of each group?


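# numeric_only=True skips non-numeric columns, which would otherwise
# raise a TypeError in pandas 2.0 and later
df.groupby('gender').mean(numeric_only=True)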
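
If you need more than one statistic per group, agg with named aggregations is one option. The data and the output column names (mean_age, max_salary, n_rows) below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "age": [34, 28, 42, 30],
    "salary": [50000, 42000, 61000, 48000],
})

# Mean of a single column per group
means = df.groupby("gender")["age"].mean()

# Several statistics at once: output_name=(source_column, function)
summary = df.groupby("gender").agg(
    mean_age=("age", "mean"),
    max_salary=("salary", "max"),
    n_rows=("age", "size"),
)
print(summary)
```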

How do you handle missing values in a DataFrame?


# Drop rows with any missing values
df.dropna()
# Fill missing values with 0
df.fillna(0)
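
Before dropping or filling, it usually helps to count the missing values first. Here is a small sketch on made-up data; the column names and fill values are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({
    "age": [34, np.nan, 41],
    "city": ["Pune", "Delhi", None],
})

# Count missing values per column
missing_per_column = df.isna().sum()

# Drop rows only when "age" is missing
age_known = df.dropna(subset=["age"])

# Fill each column with its own default value
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})
```

Note that dropna and fillna return new DataFrames here, so df itself is left unchanged.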

How do you pivot a DataFrame?


# Pivot the DataFrame with index "id", columns "group", and values "value"
df.pivot(index='id', columns='group', values='value')
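
To see what pivot actually does, here is a minimal worked example on made-up data that turns a long table into a wide one.

```python
import pandas as pd

# Hypothetical long-format data: one row per (id, group) pair
long_df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "group": ["x", "y", "x", "y"],
    "value": [10, 20, 30, 40],
})

# Each unique "group" becomes a column, each unique "id" becomes a row
wide = long_df.pivot(index="id", columns="group", values="value")
print(wide)
# group   x   y
# id
# 1      10  20
# 2      30  40
```

pivot only reshapes the data. If the same (id, group) pair appears more than once, it raises a ValueError, and pivot_table is the tool to use instead.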

How do you merge two DataFrames on a common column?


df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'value': [1, 2, 3]}) 
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'value': [4, 5, 6]})
# Inner join on "key" column 
pd.merge(df1, df2, on='key') 
# Outer join on "key" column 
pd.merge(df1, df2, on='key', how='outer')
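
Because both frames have a "value" column, pandas renames the overlapping columns in the result (value_x and value_y by default). The sketch below reuses the two frames from above and shows the suffixes and indicator parameters; the suffix names are chosen just for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'value': [4, 5, 6]})

# Inner join: only keys present in both frames ('a' and 'b')
inner = pd.merge(df1, df2, on='key', suffixes=('_left', '_right'))

# Outer join: all keys; indicator=True adds a "_merge" column showing each row's origin
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)

# Left join: every row of df1, with NaN where df2 has no matching key
left = pd.merge(df1, df2, on='key', how='left')
```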

How do you concatenate two DataFrames vertically or horizontally?


df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'value': [1, 2, 3]}) 
df2 = pd.DataFrame({'key': ['d', 'e', 'f'], 'value': [4, 5, 6]}) 
# Concatenate vertically 
pd.concat([df1, df2]) 
# Concatenate horizontally 
pd.concat([df1, df2], axis=1)
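
One detail that comes up in interviews is what happens to the index. The sketch below reuses the same two frames to show it.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['d', 'e', 'f'], 'value': [4, 5, 6]})

# Vertical concat keeps the original index labels, so they repeat: 0, 1, 2, 0, 1, 2
stacked = pd.concat([df1, df2])

# ignore_index=True renumbers the rows 0..5
stacked_reset = pd.concat([df1, df2], ignore_index=True)

# Horizontal concat aligns rows on the index and keeps duplicate column names
side_by_side = pd.concat([df1, df2], axis=1)
```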

How do you apply a function to a column of a DataFrame?


import numpy as np 
# Calculate the absolute value of each element in the "value" column
df['value'].apply(np.abs) 
# Calculate the length of each name in the "name" column 
df['name'].apply(len) 
# You can also define your own function 
def add_one(x):
    return x + 1

df['value'].apply(add_one)
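
For simple operations like these, pandas also has built-in vectorized equivalents, which are usually faster than apply on large data. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["Alice", "Bob"], "value": [-1, 2]})

# Same result as df['value'].apply(np.abs)
abs_values = df["value"].abs()

# Same result as df['name'].apply(len)
name_lengths = df["name"].str.len()

# Same result as df['value'].apply(add_one)
plus_one = df["value"] + 1
```

apply is still the right choice when the logic cannot be expressed with built-in methods or arithmetic.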

How do you sort a DataFrame by a column?


# Sort the DataFrame by the "age" column in ascending order 
df.sort_values('age') 
# Sort the DataFrame by the "age" column in descending order
df.sort_values('age', ascending=False)
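
Sorting by more than one column is a common follow-up. The sketch below uses made-up data and passes a list of columns with a matching list of sort directions.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Cara", "Dan"],
    "gender": ["F", "M", "F", "M"],
    "age": [34, 28, 41, 30],
})

# Sort by gender ascending, then by age descending within each gender
by_gender_then_age = df.sort_values(["gender", "age"], ascending=[True, False])

# sort_values keeps the old index labels; reset_index(drop=True) renumbers the rows
renumbered = df.sort_values("age").reset_index(drop=True)
```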

How do you rename columns, drop columns, and replace values?

Rename columns:


df.rename(columns={'old_name': 'new_name'}, inplace=True)

Drop columns:


df.drop(columns=['column_1', 'column_2'], inplace=True)

Replace values:


# old_value and new_value are placeholders for the value to find and its replacement
df.replace(to_replace=old_value, value=new_value, inplace=True)
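
The same three operations can also be written without inplace=True, since each method returns a new DataFrame by default. Here is a minimal sketch on made-up data; the "status" column and its values are assumptions for the example.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "old_name": [1, 2],
    "column_1": ["x", "y"],
    "status": ["N/A", "ok"],
})

# Each call returns a new DataFrame and leaves df untouched
renamed = df.rename(columns={"old_name": "new_name"})
trimmed = renamed.drop(columns=["column_1"])
cleaned = trimmed.replace("N/A", "unknown")
```

Assigning the result to a new name keeps the original data available, which can make a cleaning script easier to debug.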

How do you use loc and iloc?

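loc is used to index and slice data using label-based indexing, while iloc is used to index and slice data using integer positions. A loc slice includes its end label, whereas an iloc slice excludes its end position, like ordinary Python slicing.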

Here are a few examples to illustrate the use of loc and iloc in pandas:


import pandas as pd 
# create a sample dataframe 
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'c']) 
#    a  b  c
# 0  1  2  3
# 1  4  5  6
# 2  7  8  9
# index the second column using label-based indexing with loc 
df.loc[:, 'b']
# 0    2 
# 1    5 
# 2    8 
# Name: b, dtype: int64


# index the second column using integer-based indexing with iloc 
df.iloc[:, 1] 
# 0    2 
# 1    5 
# 2    8 
# Name: b, dtype: int64


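# slice the first two rows and first two columns using label-based indexing with loc
# (loc slices include the end label, so 0:1 covers rows 0 and 1)
df.loc[0:1, 'a':'b']
#    a  b
# 0  1  2
# 1  4  5
# slice the first two rows and first two columns using integer-based indexing with iloc
# (iloc slices exclude the end position, so 0:2 covers positions 0 and 1)
df.iloc[0:2, 0:2]
#    a  b
# 0  1  2
# 1  4  5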
# index a single value using label-based indexing with loc
df.loc[1, 'c']
# 6
# index a single value using integer-based indexing with iloc
df.iloc[1, 2]
# 6
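
With the default index 0, 1, 2 the labels and the positions happen to be the same, which hides the difference between the two. The sketch below uses a made-up frame with index labels 10, 20, 30 to make it visible.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions
df = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8]}, index=[10, 20, 30])

# loc uses the label 20; iloc uses position 1; both point at the second row
by_label = df.loc[20, "a"]
by_position = df.iloc[1, 0]

# Label slice 10:20 includes both ends, so it returns two rows
label_slice = df.loc[10:20]

# Position slice 0:1 excludes the end, so it returns one row
pos_slice = df.iloc[0:1]
```

Here df.loc[1, "a"] would raise a KeyError, because there is no row labeled 1.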

How do you create a pivot table in pandas?


import pandas as pd
# Create a pivot table with index "city", columns "gender", and values "value"
# (values that share a cell are averaged by default)
pd.pivot_table(df, index='city', columns='gender', values='value')
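
Unlike pivot, pivot_table aggregates rows that fall into the same cell. The sketch below uses made-up data to show the default mean alongside the aggfunc and fill_value parameters.

```python
import pandas as pd

# Hypothetical data: ("Pune", "F") appears twice and ("Delhi", "M") never appears
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Pune", "Delhi"],
    "gender": ["F", "F", "M", "F"],
    "value": [10, 20, 30, 40],
})

# Default aggregation is the mean; cells with no data become NaN
mean_table = pd.pivot_table(df, index="city", columns="gender", values="value")

# Sum instead of mean, and show 0 in empty cells
sum_table = pd.pivot_table(
    df, index="city", columns="gender", values="value",
    aggfunc="sum", fill_value=0,
)
```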

How do you create a bar plot of a pivot table in pandas?


import matplotlib.pyplot as plt
# Create a pivot table
table = pd.pivot_table(df, index='city', columns='gender', values='value')
# Plot the pivot table as a bar plot 
table.plot(kind='bar', stacked=True)
plt.show()

What are the common plot types in pandas?

Line plot: A line plot connects data points with line segments and is typically used to show a trend over an ordered variable such as time. To create a line plot with pandas, you can use the plot method and specify the kind parameter as 'line' (this is also the default). For example:


import pandas as pd
import matplotlib.pyplot as plt
# Read in the data 
df = pd.read_csv('data.csv') 
# Create a line plot
df.plot(x='date', y='sales', kind='line') 
# Show the plot
plt.show()

Bar plot: A bar plot compares values across categories, drawing one bar per category with a height proportional to its value. To create a bar plot with pandas, you can use the plot method and specify the kind parameter as 'bar'. For example:


# Create a bar plot
df.plot(x='country', y='sales', kind='bar') 
# Show the plot 
plt.show()

Scatter plot: A scatter plot draws one dot per row to show the relationship between two numeric variables. To create a scatter plot with pandas, you can use the plot method and specify the kind parameter as 'scatter'. For example:


# Create a scatter plot 
df.plot(x='x_col', y='y_col', kind='scatter') 
# Show the plot 
plt.show()

Histogram: A histogram displays the distribution of a numeric variable by counting how many values fall into each bin. To create a histogram with pandas, you can use the hist method. For example:


# Create a histogram 
df['column_name'].hist()
# Show the plot 
plt.show()
