Python Pandas groupby apply lambda arguments

The apply method itself passes each group of the groupby object as the first argument to the function. The extra arguments Weight and Quantity are then matched to a and b by position (that is, they are the 2nd and 3rd arguments if you count the implicit group argument as the first).

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 11, (10, 3)), columns=['num1', 'num2', 'num3'])
df['category'] = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
df = df[['category', 'num1', 'num2', 'num3']]
df

  category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4
3        b    10     9     1
4        b     4     7     6
5        b     0     5     2
6        b     7     7     5
7        c     2     2     1
8        c     4     3     2
9        c     1     4     6

gb = df.groupby('category')

The implicit argument is each group, or in this case each category:

gb.apply(lambda grp: grp.sum()) 

Here grp is the first argument to the lambda function. Notice that I don't have to pass anything for it: apply automatically fills it in with each group of the groupby object.

         category  num1  num2  num3
category                           
a             aaa    14    13     8
b            bbbb    21    28    14
c             ccc     7     9     9

So apply goes through each of these and performs a sum operation

print(gb.groups)
{'a': Int64Index([0, 1, 2], dtype='int64'), 'b': Int64Index([3, 4, 5, 6], dtype='int64'), 'c': Int64Index([7, 8, 9], dtype='int64')}

print('1st GROUP:\n', df.loc[gb.groups['a']])
1st GROUP:
  category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4    


print('SUM of 1st group:\n', df.loc[gb.groups['a']].sum())

SUM of 1st group:
category    aaa
num1         14
num2         13
num3          8
dtype: object

Notice how this is the same as the first row of our previous operation

So apply is implicitly passing each group to the function argument as the first argument.
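To make the implicit first argument concrete, here is a rough sketch of what apply does, written as an explicit loop. The helper name manual_apply is hypothetical (it is not part of pandas), the frame reuses the values from the table above, and the real apply also combines the per-group results into a single object, which this sketch skips.

```python
import pandas as pd

# Same values as the example table above, hard-coded so the result is reproducible.
df = pd.DataFrame({
    "category": list("aaabbbbccc"),
    "num1": [2, 5, 7, 10, 4, 0, 7, 2, 4, 1],
    "num2": [5, 5, 3, 9, 7, 5, 7, 2, 3, 4],
    "num3": [2, 2, 4, 1, 6, 2, 5, 1, 2, 6],
})

def manual_apply(frame, key, func, *args, **kwargs):
    """Hypothetical stand-in for frame.groupby(key).apply(func, *args, **kwargs)."""
    results = {}
    for name, grp in frame.groupby(key):
        # The group is always passed first; any extra arguments follow it.
        results[name] = func(grp, *args, **kwargs)
    return results

# Sum only the numeric columns of each group.
sums = manual_apply(df, "category", lambda grp: grp[["num1", "num2", "num3"]].sum())
print(sums["a"])
```

The sums for group a should match the num1, num2 and num3 values in the first row of the apply output above.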

From the docs

GroupBy.apply(func, *args, **kwargs)

args, kwargs : tuple and dict

Optional positional and keyword arguments to pass to func

Additional arguments passed in *args are passed after the implicit group argument.

So, using your code:

gb.apply(lambda df, a, b: sum(df[a] * df[b]), 'num1', 'num2')

category
a     56
b    167
c     20
dtype: int64

Here 'num1' and 'num2' are passed as additional arguments to each call of the lambda function.

So apply goes through each group and performs your lambda operation:

# copy and paste your lambda function
fun = lambda df, a, b: sum(df[a] * df[b])

print(gb.groups)
{'a': Int64Index([0, 1, 2], dtype='int64'), 'b': Int64Index([3, 4, 5, 6], dtype='int64'), 'c': Int64Index([7, 8, 9], dtype='int64')}

print('1st GROUP:\n', df.loc[gb.groups['a']])

1st GROUP:
   category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4

print('Output of 1st group for function fun:\n',
      fun(df.loc[gb.groups['a']], 'num1', 'num2'))

Output of 1st group for function fun:
56
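Since the docs say apply forwards keyword arguments as well as positional ones, the same column names can also be passed by keyword. The sketch below assumes the values from the example table and uses a named function, weighted_total (a hypothetical name), in place of the lambda. It also selects only the two numeric columns before calling apply, which is a choice made here so that the grouping column is not handed to the function.

```python
import pandas as pd

# Same values as the example table above, hard-coded so the result is reproducible.
df = pd.DataFrame({
    "category": list("aaabbbbccc"),
    "num1": [2, 5, 7, 10, 4, 0, 7, 2, 4, 1],
    "num2": [5, 5, 3, 9, 7, 5, 7, 2, 3, 4],
    "num3": [2, 2, 4, 1, 6, 2, 5, 1, 2, 6],
})

def weighted_total(grp, a, b):
    # grp is supplied by apply; a and b come from the extra arguments.
    return (grp[a] * grp[b]).sum()

cols = df.groupby("category")[["num1", "num2"]]

# Extra positional arguments land after the group: a='num1', b='num2'.
by_position = cols.apply(weighted_total, "num1", "num2")

# Keyword arguments are matched by name, so their order does not matter.
by_keyword = cols.apply(weighted_total, b="num2", a="num1")

print(by_position)
print(by_keyword.equals(by_position))
```

Passing by keyword can be easier to read when the function takes several column names, because the call site shows which name goes to which parameter.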
