Returning a dataframe in python function
When you call create_df(), Python runs the function but doesn't save the result in any variable. That is why you got the error.
Assign the result of create_df() to a new variable, df, like this:
df = create_df()
df
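To see the difference end to end, here is a minimal sketch. It assumes the body of create_df matches the version quoted elsewhere in this thread, and that the function ends with a return statement.

```python
import pandas as pd

def create_df():
    # Build a small DataFrame and hand it back to the caller.
    data = {
        "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
        "year": [2000, 2001, 2002, 2001, 2002],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
    }
    return pd.DataFrame(data)

create_df()       # the returned DataFrame is discarded here
df = create_df()  # the returned DataFrame is bound to the name df
print(df.shape)   # (5, 3)
```

Any name works on the left-hand side; df is just the one used in this answer.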
I'm kind of late here, but what about creating a global variable within the function? It should save a step for you.
import pandas as pd

def create_df():
    global df  # bind df at module level instead of locally
    data = {
        'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]
    }
    df = pd.DataFrame(data)
Then, once you have run create_df(), you'll be able to just use df.
Of course, be careful with your naming strategy if you have a large program, so that the value of df doesn't change unexpectedly as various functions execute.
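The naming risk is easy to reproduce. The sketch below is only an illustration: create_other_df is a hypothetical second function, invented here, that happens to reuse the same global name.

```python
import pandas as pd

def create_df():
    global df  # rebinds the module-level name df
    df = pd.DataFrame({"state": ["Ohio", "Nevada"], "pop": [1.5, 2.4]})

def create_other_df():
    # Hypothetical second function that reuses the same global name.
    global df
    df = pd.DataFrame({"year": [2000, 2001, 2002]})

create_df()
columns_before = list(df.columns)  # ['state', 'pop']
create_other_df()
columns_after = list(df.columns)   # ['year']
print(columns_before, columns_after)
```

After the second call the original data is no longer reachable through df, and nothing at the call site hints that it was replaced.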
EDIT: I noticed I got some points for this. Here's another (probably worse) way to do this using exec. This also allows for multiple dataframes to be created, if desired.
import pandas as pd

def create_df():
    data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
            'year': [2000, 2001, 2002, 2001, 2002],
            'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
    df = pd.DataFrame(data)
    return df

# We'll create three dataframes as an example
for i in range(3):
    # Build and run the statements df_0 = ..., df_1 = ..., df_2 = ...
    exec(f'df_{i} = create_df()')
Then, you can test them out:
Input: df_0
Output:
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
Input: df_1
Output:
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
Etc.
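A common alternative to exec, which is not part of the answer above, is to keep the frames in a dictionary keyed by name. This is a minimal sketch under that assumption; frames is an illustrative name.

```python
import pandas as pd

def create_df():
    data = {"state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
            "year": [2000, 2001, 2002, 2001, 2002],
            "pop": [1.5, 1.7, 3.6, 2.4, 2.9]}
    return pd.DataFrame(data)

# The keys play the role of the generated variable names df_0, df_1, df_2.
frames = {f"df_{i}": create_df() for i in range(3)}

print(frames["df_0"])
```

Each call to create_df() builds a separate DataFrame, so changing frames["df_0"] leaves the others untouched, and the whole set can be looped over with frames.items().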
A function can explicitly return two DataFrames:
import pandas as pd
import numpy as np

def return_2DF():
    date = pd.date_range('today', periods=20)
    # A 20 x 2 array needs exactly two column labels
    DF1 = pd.DataFrame(np.random.rand(20, 2), index=date, columns=list('xy'))
    DF2 = pd.DataFrame(np.random.rand(20, 4), index=date, columns='A B C D'.split())
    return DF1, DF2
Calling the function and unpacking the two DataFrames it returns:
one, two = return_2DF()