How to calculate the mean value of a variable on a specific day of the year over several years


I have an xarray DataArray with daily data from 1970 to 2023, restricted to the months October through March, and I want to calculate the mean of the variable for each calendar day across all years (e.g. the mean over all January 1sts).
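A tempting shortcut is to group by `valid_time.dayofyear`, but that is not the same as "all January 1sts" for dates after February. In leap years every date from March 1st onward gets a day-of-year number one higher, and removing February 29th does not change that. A quick pandas check (the two dates are arbitrary examples):

```python
import pandas as pd

# Two March 1sts: one in a leap year (2020), one in a common year (2021).
leap = pd.Timestamp("2020-03-01")
common = pd.Timestamp("2021-03-01")

# Day-of-year numbering differs by one for the same calendar day,
# even if February 29 itself has been dropped from the data.
print(leap.dayofyear, common.dayofyear)  # 61 60

# A month-day label, by contrast, is identical for the same calendar day.
print(leap.strftime("%m-%d"), common.strftime("%m-%d"))  # 03-01 03-01
```

This is why a "MM-DD" label is a safer grouping key than day of year for this kind of calendar-day mean.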

My data is like this:

<xarray.DataArray 'u' (valid_time: 9990, latitude: 321, longitude: 361)> Size: 5GB
dask.array<getitem, shape=(9990, 321, 361), dtype=float32, chunksize=(260, 161, 181), chunktype=numpy.ndarray>
Coordinates:
    number          int64 8B 0
    pressure_level  float64 8B 500.0
  * latitude        (latitude) float64 3kB 0.0 -0.25 -0.5 ... -79.5 -79.75 -80.0
  * longitude       (longitude) float64 3kB -100.0 -99.75 -99.5 ... -10.25 -10.0
  * valid_time      (valid_time) datetime64[ns] 80kB 1970-01-01 ... 2023-12-31
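To experiment without the 5 GB file, a small synthetic array with the same dimension names and an October-to-March time axis (February 29ths dropped) can be built as below. Everything here is a hypothetical stand-in: the grid is tiny, the values are random, and this idealized axis (182 days per year) is not meant to reproduce the exact length shown in the repr above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Daily time axis 1970-2023, then keep October-March and drop February 29ths.
times = pd.date_range("1970-01-01", "2023-12-31", freq="D")
keep = times.month.isin([10, 11, 12, 1, 2, 3]) & ~((times.month == 2) & (times.day == 29))
times = times[keep]

# Tiny hypothetical grid spanning the same lat/lon ranges as the real data.
lat = np.linspace(0.0, -80.0, 3)
lon = np.linspace(-100.0, -10.0, 4)
rng = np.random.default_rng(0)

da = xr.DataArray(
    rng.standard_normal((times.size, lat.size, lon.size)).astype("float32"),
    coords={"valid_time": times, "latitude": lat, "longitude": lon},
    dims=("valid_time", "latitude", "longitude"),
    name="u",
)
print(da.sizes)
```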

I've tried using groupby, but I only managed to group the data either by month or by day of the month (i.e. all 1sts regardless of month), not by both. I can group each of these groups again, but I suspect there's a more elegant solution. Grouping twice looks like this:

da.groupby('valid_time.month')[1].groupby('valid_time.day')[1]

That gives all the January 1sts, and I could technically repeat it for every day of the season.
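Instead of nesting groupbys, one option is to build a single label that encodes both month and day and group on that. A minimal sketch on synthetic data; the values, the date range, and the coordinate name `month_day` are illustrative assumptions, not part of the original data:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in: 4 years of daily values at 2 latitudes, Feb 29 removed.
times = pd.date_range("2000-01-01", "2003-12-31", freq="D")
times = times[~((times.month == 2) & (times.day == 29))]
da = xr.DataArray(
    np.arange(times.size * 2, dtype="float64").reshape(times.size, 2),
    coords={"valid_time": times, "latitude": [0.0, -0.25]},
    dims=("valid_time", "latitude"),
)

# Attach a "MM-DD" label to every time step, e.g. "01-01" for each January 1st.
da = da.assign_coords(month_day=da.valid_time.dt.strftime("%m-%d"))

# One groupby over the combined label replaces the nested month/day groupbys:
# each group holds the same calendar day from every year, averaged over valid_time.
clim = da.groupby("month_day").mean("valid_time")
print(clim.sizes["month_day"])  # 365
```

On the real dask-backed array the reduction stays lazy until `.compute()` (or `.load()`) is called. The labels sort as strings, so January to March come before October to December in the result.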

I also tried something like the approach in this post:

da.groupby(da.index.strftime('%m-%d')).mean()

but I get the message:

AttributeError: 'DataArray' object has no attribute 'index'
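The error arises because `.index` belongs to pandas objects: an xarray DataArray keeps one index per dimension instead. `da.get_index("valid_time")` returns that dimension's pandas `DatetimeIndex`, so the pandas-style attempt can be adapted roughly as follows. This is a sketch on hypothetical data (two October-to-March seasons at a single point); the name `month_day` is an illustrative choice.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: two Oct-Mar seasons (2001-02 and 2002-03), values 0, 1, 2, ...
times = pd.date_range("2001-10-01", "2003-03-31", freq="D")
times = times[times.month.isin([10, 11, 12, 1, 2, 3])]
da = xr.DataArray(np.arange(times.size, dtype="float64"),
                  coords={"valid_time": times}, dims="valid_time")

# xarray has no DataArray.index; fetch the pandas DatetimeIndex of one dimension.
idx = da.get_index("valid_time")
labels = idx.strftime("%m-%d")  # pandas Index of "MM-DD" strings

# Wrap the labels as a DataArray along valid_time so groupby knows what they label.
key = xr.DataArray(np.asarray(labels), coords={"valid_time": idx},
                   dims="valid_time", name="month_day")
result = da.groupby(key).mean("valid_time")
print(result.sizes)
```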

The desired result would be an array of shape (D, latitude, longitude), where D is the number of calendar days from October 1st to March 31st: 182, since all February 29ths were removed.
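Because "MM-DD" labels sort as strings, a grouped result starts at January 1st. A sketch of one way to get the (D, latitude, longitude) layout with the season running October 1st to March 31st, assuming the month-day grouping approach; the synthetic data and the reference season 2001-10-01 to 2002-03-31 (any season without a February 29th would do) are hypothetical choices:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in: three Oct-Mar seasons on a 2x3 grid, Feb 29 removed.
times = pd.date_range("1999-10-01", "2002-03-31", freq="D")
times = times[times.month.isin([10, 11, 12, 1, 2, 3])
              & ~((times.month == 2) & (times.day == 29))]
rng = np.random.default_rng(1)
da = xr.DataArray(
    rng.standard_normal((times.size, 2, 3)),
    coords={"valid_time": times, "latitude": [0.0, -0.25],
            "longitude": [-100.0, -99.75, -99.5]},
    dims=("valid_time", "latitude", "longitude"),
)

# Calendar-day means keyed by "MM-DD" labels (sorted as strings: "01-01" ... "12-31").
da = da.assign_coords(month_day=da.valid_time.dt.strftime("%m-%d"))
clim = da.groupby("month_day").mean("valid_time")

# Reorder to October 1 -> March 31 using labels from a season without Feb 29,
# and put month_day first to get (D, latitude, longitude).
season_order = pd.date_range("2001-10-01", "2002-03-31", freq="D").strftime("%m-%d")
clim = clim.sel(month_day=list(season_order)).transpose("month_day", "latitude", "longitude")
print(clim.shape)  # (182, 2, 3)
```

The explicit `transpose` is there so the output layout does not depend on where the groupby reduction happens to place the new dimension.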
