Pandas Tutorial
Python Pandas Interview Questions
Pandas is a Python library for data analysis and manipulation. It provides two core data structures, Series and DataFrame, along with tools for loading, cleaning, reshaping, and summarizing tabular data.
Here are some common pandas interview questions and examples:
- How do you read a CSV file into a pandas DataFrame?
import pandas as pd
df = pd.read_csv('file.csv')
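To try read_csv without a file on disk, a minimal sketch can pass it an in-memory buffer instead of a path. The CSV text and column names below are made up for illustration, and usecols and dtype are optional parameters shown only as examples of what can be tuned.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'
csv_text = "name,age,city\nAna,34,Lima\nBo,27,Oslo\nCy,45,Rome\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Optional: load only some columns and set a column dtype explicitly
subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["name", "age"],
    dtype={"age": "int64"},
)

print(df.shape)
print(list(subset.columns))
```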
- How do you select a column from a DataFrame?
# Select the "age" column
df['age']
# You can also use the dot notation
df.age
- How do you select multiple columns from a DataFrame?
# Select the "age" and "name" columns
df[['age', 'name']]
- How do you select rows from a DataFrame based on a condition?
# Select rows where the age is greater than 30
df[df.age > 30]
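The selection snippets above assume a df that already exists. As a runnable sketch, the following builds a small made-up DataFrame (the names and values are illustrative only) and applies the same column and row selections, plus a combined condition.

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({
    "name": ["Ana", "Bo", "Cy", "Di"],
    "age": [34, 27, 45, 31],
    "gender": ["F", "M", "M", "F"],
})

ages = df["age"]               # single column -> Series
sub = df[["age", "name"]]      # list of columns -> DataFrame
over_30 = df[df["age"] > 30]   # boolean mask keeps the matching rows

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
women_over_30 = df[(df["age"] > 30) & (df["gender"] == "F")]

print(over_30["name"].tolist())
print(women_over_30["name"].tolist())
```

Dot notation such as df.age only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so bracket notation is the safer default.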
- How do you group a DataFrame by a column and calculate the mean of each group?
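# numeric_only=True skips non-numeric columns, which otherwise raise a TypeError in pandas 2.0+
df.groupby('gender').mean(numeric_only=True)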
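A small worked example may make the grouping step concrete. The data below is invented, and the output column names mean_age and max_score are arbitrary labels chosen for this sketch.

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({
    "gender": ["F", "M", "M", "F"],
    "age": [34, 27, 45, 30],
    "score": [80.0, 70.0, 90.0, 60.0],
})

# Mean of every numeric column per group
means = df.groupby("gender").mean(numeric_only=True)

# Mean of a single column per group
age_mean = df.groupby("gender")["age"].mean()

# Several aggregations at once, using named aggregation
summary = df.groupby("gender").agg(
    mean_age=("age", "mean"),
    max_score=("score", "max"),
)

print(summary)
```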
- How do you handle missing values in a DataFrame?
# Drop rows with any missing values
df.dropna()
# Fill missing values with 0
df.fillna(0)
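In practice it often helps to count the missing values first and to fill each column with its own default. The sketch below uses invented data; filling "age" with the column mean and "city" with the string "unknown" are illustrative choices, not recommendations.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing value per column
df = pd.DataFrame({
    "age": [34.0, np.nan, 45.0],
    "city": ["Lima", "Oslo", None],
})

missing_per_column = df.isna().sum()      # number of missing values per column
dropped = df.dropna()                     # keep only fully populated rows
dropped_age = df.dropna(subset=["age"])   # only require "age" to be present

# Fill each column with its own default value
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})

print(missing_per_column.to_dict())
print(filled)
```

Both dropna and fillna return a new DataFrame by default, so df itself is unchanged unless the result is assigned back.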
- How do you pivot a DataFrame?
# Pivot the DataFrame with index "id", columns "group", and values "value"
df.pivot(index='id', columns='group', values='value')
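As a runnable illustration of the pivot call above, the sketch below reshapes a small long-format table into wide format. The data is invented; the column names match the ones used in the answer.

```python
import pandas as pd

# Long format: one row per (id, group) pair
long_df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "group": ["x", "y", "x", "y"],
    "value": [10, 20, 30, 40],
})

# Wide format: one row per id, one column per group
wide = long_df.pivot(index="id", columns="group", values="value")
print(wide)

# pivot cannot reshape when an (index, columns) pair appears more than once
dup = pd.concat([long_df, long_df.iloc[[0]]])
try:
    dup.pivot(index="id", columns="group", values="value")
    raised = False
except ValueError:
    raised = True
print(raised)
```

When duplicate pairs are expected, pivot_table is the usual alternative because it aggregates them instead of raising.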
- How do you merge two DataFrames on a common column?
df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'value': [4, 5, 6]})
# Inner join on "key" column
pd.merge(df1, df2, on='key')
# Outer join on "key" column
pd.merge(df1, df2, on='key', how='outer')
- How do you concatenate two DataFrames vertically or horizontally?
df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['d', 'e', 'f'], 'value': [4, 5, 6]})
# Concatenate vertically
pd.concat([df1, df2])
# Concatenate horizontally
pd.concat([df1, df2], axis=1)
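The merge and concat calls above can be explored a little further with a sketch like the one below. The suffixes, indicator, and ignore_index parameters are optional extras that the answers above do not use; the suffix strings are arbitrary.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "value": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["a", "b", "d"], "value": [4, 5, 6]})

# Both frames have a "value" column, so suffixes tell the two apart
inner = pd.merge(df1, df2, on="key", suffixes=("_left", "_right"))

# indicator=True adds a "_merge" column recording where each row came from
outer = pd.merge(df1, df2, on="key", how="outer", indicator=True)

# Vertical concatenation; ignore_index=True renumbers the rows 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)

# Horizontal concatenation aligns rows on the index and keeps all columns
side_by_side = pd.concat([df1, df2], axis=1)

print(inner)
print(outer)
```

Without ignore_index=True, the vertically concatenated result keeps the original row labels, so labels 0, 1, and 2 each appear twice.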
- How do you apply a function to a column of a DataFrame?
import numpy as np
# Calculate the absolute value of each element in the "value" column
df['value'].apply(np.abs)
# Calculate the length of each name in the "name" column
df['name'].apply(len)
# You can also define your own function
def add_one(x):
    return x + 1
df['value'].apply(add_one)
- How do you sort a DataFrame by a column?
# Sort the DataFrame by the "age" column in ascending order
df.sort_values('age')
# Sort the DataFrame by the "age" column in descending order
df.sort_values('age', ascending=False)
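The apply and sort_values answers above can be tried together on a small made-up DataFrame. This sketch also shows vectorized equivalents of the apply calls, which are generally preferred when they exist, and sorting by more than one column.

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({
    "name": ["Ana", "Bo", "Cyril"],
    "age": [34, 27, 34],
    "value": [-1.5, 2.0, -3.0],
})

def add_one(x):
    return x + 1

via_apply = df["value"].apply(add_one)   # calls add_one once per element
vectorized = df["value"] + 1             # same result, computed on the whole column

abs_values = df["value"].abs()           # vectorized alternative to apply(np.abs)
name_lengths = df["name"].str.len()      # vectorized alternative to apply(len)

# Sort by age descending, then by name ascending to break ties
by_age_then_name = df.sort_values(["age", "name"], ascending=[False, True])

print(by_age_then_name)
```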
- How do you rename columns, drop columns, and replace values?
- Rename columns:
df.rename(columns={'old_name': 'new_name'}, inplace=True)
- Drop columns:
df.drop(columns=['column_1', 'column_2'], inplace=True)
- Replace values:
df.replace(to_replace=old_value, value=new_value, inplace=True)
- How do you use loc and iloc?
loc indexes and slices data by label, while iloc indexes and slices data by integer position. Note that a loc slice includes its end label, whereas an iloc slice excludes its end position.
Here are a few examples to illustrate the use of loc and iloc in pandas:
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'c'])
#    a  b  c
# 0  1  2  3
# 1  4  5  6
# 2  7  8  9
# Index the second column using label-based indexing with loc
df.loc[:, 'b']
# 0    2
# 1    5
# 2    8
# Name: b, dtype: int64
# Index the second column using integer-based indexing with iloc
df.iloc[:, 1]
# 0    2
# 1    5
# 2    8
# Name: b, dtype: int64
# Slice the first two rows and first two columns using label-based indexing with loc
df.loc[0:1, 'a':'b']
#    a  b
# 0  1  2
# 1  4  5
# Slice the first two rows and first two columns using integer-based indexing with iloc
df.iloc[0:2, 0:2]
#    a  b
# 0  1  2
# 1  4  5
# Index a single value using label-based indexing with loc
df.loc[1, 'c']
# 6
# Index a single value using integer-based indexing with iloc
df.iloc[1, 2]
# 6
- How do you create a pivot table in pandas?
import pandas as pd
# Create a pivot table with index "city", columns "gender", and values "value"
pd.pivot_table(df, index='city', columns='gender', values='value')
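To see what pivot_table does with repeated (city, gender) pairs, here is a minimal sketch on invented data. By default the values in each cell are averaged; aggfunc and fill_value are optional parameters added here for illustration.

```python
import pandas as pd

# Made-up data; ("Lima", "F") appears twice, so it must be aggregated
df = pd.DataFrame({
    "city": ["Lima", "Lima", "Lima", "Oslo", "Oslo"],
    "gender": ["F", "F", "M", "F", "M"],
    "value": [10, 20, 30, 40, 50],
})

# Default aggregation is the mean of each (city, gender) cell
mean_table = pd.pivot_table(df, index="city", columns="gender", values="value")

# Sum instead of mean, with 0 in place of any empty cell
sum_table = pd.pivot_table(
    df,
    index="city",
    columns="gender",
    values="value",
    aggfunc="sum",
    fill_value=0,
)

print(mean_table)
print(sum_table)
```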
- How do you create a bar plot of a pivot table in pandas?
import matplotlib.pyplot as plt
# Create a pivot table
table = pd.pivot_table(df, index='city', columns='gender', values='value')
# Plot the pivot table as a stacked bar plot
table.plot(kind='bar', stacked=True)
plt.show()
- Explain the common types of plots.
Line plot: A line plot connects data points with line segments, which makes it well suited to showing a trend over an ordered variable such as time. To create a line plot with pandas, call the plot method and set the kind parameter to 'line'. For example:
import pandas as pd
import matplotlib.pyplot as plt
# Read in the data
df = pd.read_csv('data.csv')
# Create a line plot
df.plot(x='date', y='sales', kind='line')
# Show the plot
plt.show()
Bar plot: A bar plot draws one bar per category, with the bar height giving the value. To create a bar plot with pandas, call the plot method and set the kind parameter to 'bar'. For example:
# Create a bar plot
df.plot(x='country', y='sales', kind='bar')
# Show the plot
plt.show()
Scatter plot: A scatter plot draws one dot per row at the position given by two numeric columns, which helps reveal the relationship between them. To create a scatter plot with pandas, call the plot method and set the kind parameter to 'scatter'. For example:
# Create a scatter plot
df.plot(x='x_col', y='y_col', kind='scatter')
# Show the plot
plt.show()
Histogram: A histogram shows the distribution of a numeric variable by counting how many values fall into each bin. To create a histogram with pandas, you can use the hist method. For example:
# Create a histogram
df['column_name'].hist()
# Show the plot
plt.show()
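The four plot types above can be drawn side by side in one runnable sketch. The data is invented, the 2x2 grid layout is just one convenient arrangement, and the Agg backend is an assumption made so the script runs without a display. For interactive use, drop the matplotlib.use line and call plt.show() instead of saving.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({
    "month": [1, 2, 3, 4],
    "sales": [100, 120, 90, 150],
    "country": ["PE", "NO", "IT", "JP"],
})

# One axes per plot type; passing ax= controls where each plot is drawn
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

df.plot(x="month", y="sales", kind="line", ax=axes[0, 0], title="Line")
df.plot(x="country", y="sales", kind="bar", ax=axes[0, 1], title="Bar")
df.plot(x="month", y="sales", kind="scatter", ax=axes[1, 0], title="Scatter")
df["sales"].hist(bins=4, ax=axes[1, 1])
axes[1, 1].set_title("Histogram")

fig.tight_layout()

# Render to an in-memory PNG instead of opening a window
buf = io.BytesIO()
fig.savefig(buf, format="png")
print(buf.getbuffer().nbytes > 0)
```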