Why doesn't copy_on_write work when modifying a copy of an entire DataFrame?

18 hours ago 1

copy_on_write only takes effect when pandas creates a view of another object, e.g. by slicing or filtering.

df.copy() is a good way to fix this.
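To illustrate the slicing and filtering case, here is a minimal sketch. The column names and values are made up for the example, and it assumes a pandas version that supports the copy_on_write option:

```python
import pandas as pd

pd.options.mode.copy_on_write = True

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Filtering and slicing each produce a new object derived from df.
filtered = df[df["a"] > 1]
sliced = df.iloc[:2]

# Under CoW, writes to the derived objects do not propagate back to df.
filtered.loc[:, "b"] = 0
sliced.loc[0, "a"] = 100

print(df["a"].tolist())  # [1, 2, 3]
print(df["b"].tolist())  # [4, 5, 6]
```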

Tinyfold's user avatar

From the copy-on-write docs (emphasis mine):

CoW means that any DataFrame or Series derived from another in any way always behaves as a copy. As a consequence, we can only change the values of an object through modifying the object itself. CoW disallows updating a DataFrame or a Series that shares data with another DataFrame or Series object inplace.

You're getting this behavior because df_temp isn't "another" DataFrame that's "derived from" df, nor is it a "copy" of df. Because of the semantics of Python's assignment operator, it's literally df itself:

    import pandas as pd

    pd.options.mode.copy_on_write = True

    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
    df_temp = df
    assert df is df_temp

As such, CoW doesn't kick in; it wouldn't even be possible for Pandas to add support for this since assignment can't be overloaded.
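A short sketch of the practical consequence, contrasting plain assignment with an explicit copy. The names alias and independent are hypothetical and used only for this example:

```python
import pandas as pd

pd.options.mode.copy_on_write = True

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

alias = df               # a second name for the same object, no new DataFrame
independent = df.copy()  # a separate DataFrame

alias.loc[0, "a"] = 99        # modifies the single shared object
independent.loc[1, "b"] = -1  # modifies only the copy

print(alias is df)     # True
print(df.loc[0, "a"])  # 99
print(df.loc[1, "b"])  # 4
```

The write through alias is visible in df because there is only one object involved; CoW has nothing to separate.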

df_temp = df.copy() is fine and (to me) clearer than relying on an implicit CoW anyways. If you don't actually need the entirety of df in df_temp, you could also just index the column(s) you'll be using; indexing does create a new DataFrame, so CoW will trigger as you expect.
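As a rough sketch of the indexing alternative, assuming only column a is needed in df_temp:

```python
import pandas as pd

pd.options.mode.copy_on_write = True

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Selecting columns returns a new DataFrame object, unlike plain assignment.
df_temp = df[["a"]]
assert df_temp is not df

# Under CoW, this write stays local to df_temp.
df_temp.loc[0, "a"] = 99

print(df["a"].tolist())       # [1, 2]
print(df_temp["a"].tolist())  # [99, 2]
```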

Anerdw's user avatar
