LossySetitemError when converting a column's dtype from 'int64' to 'float64' with astype()

I've been trying to normalize some data in Python/pandas, but I've encountered a bit of a conundrum. Normalizing the data requires me to divide each value in a column by that column's maximum, yielding decimal values between 0 and 1. pandas has read every column of my DataFrame except the first (time) column as the "int64" dtype, so the decimal results cannot be stored back in those columns. To get around this issue, I attempted to convert the columns to "float64" using the following code:

    import pandas as pd
    import numpy as np

    data = pd.read_excel("file_name.xlsx", header = [10, 11]) # read in data; values are int

    for i in range(0, len(data.columns), 1):
        if i == 0: # the first column contains time values that are already dtype float64
            pass
        else:
            data.iloc[:, i] = data.iloc[:, i].astype(np.float64) # this line throws an error

However, this throws a LossySetitemError, which is re-raised as TypeError: Invalid value '<values> Name: <column_name>, Length: <number>, dtype: float64' for dtype 'int64'. My first approach to this problem was to force pandas to read all numerical values in my Excel file as 'float64' using dtype = float64, but this threw an error as well: Unable to convert column ('First_multi_index', 'second_multi_index') to type float64. I believe this is due to my having multi-indexed column names, which I need for ease of handling my data.
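The traceback further down shows the iloc assignment attempting an in-place write (inplace_only=True) into the existing int64 column, which is where the float values get rejected. One possible workaround, sketched here under assumptions, is DataFrame.isetitem (available since pandas 1.5), which is documented to insert a new array at a position instead of writing into the old one. The small frame, the second-level labels, and the "Sample X2" column below are hypothetical stand-ins for the Excel data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel data: one float time column
# followed by two integer sample columns.
data = pd.DataFrame(
    {"t": [0.00, 0.01, 0.02], "a": [1000, 2000, 3000], "b": [10, 20, 40]}
)
# Hypothetical two-level column names, mimicking header=[10, 11].
data.columns = pd.MultiIndex.from_tuples(
    [("Time", "s"), ("Sample X1", "raw"), ("Sample X2", "raw")]
)

# Skip position 0 (time) and convert every other column by position.
for i in range(1, data.shape[1]):
    # isetitem swaps in a new float64 column at position i rather than
    # trying to store floats inside the existing int64 column.
    data.isetitem(i, data.iloc[:, i].astype(np.float64))

print(data.dtypes)
```

Because isetitem is positional, the loop does not need to know anything about the multi-indexed column names.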

Some Google searching suggests that using astype() is the proper way to perform this conversion, so I'm at a loss as to why it isn't working. I must admit that I am not too terribly familiar with programming. I'm probably missing something obvious, if I had to wager a guess. Any help would be greatly appreciated. I've also provided the versions of the libraries I'm using below:

Python = 3.13.12
Pandas = 3.0.0
Numpy = 2.4.2

The exact error output is as follows (I've scrubbed some personal information out of it):

    LossySetitemError                         Traceback (most recent call last)
    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\indexing.py:2144, in _iLocIndexer._setitem_single_column(self, loc, value, plane_indexer)
       2143 try:
    -> 2144     self.obj._mgr.column_setitem(
       2145         loc, plane_indexer, value, inplace_only=True
       2146     )
       2147 except (ValueError, TypeError, LossySetitemError) as exc:
       2148     # If we're setting an entire column and we can't do it inplace,
       2149     # then we can use value's dtype (or inferred dtype)
       2150     # instead of object

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\internals\managers.py:1518, in BlockManager.column_setitem(self, loc, idx, value, inplace_only)
       1517 if inplace_only:
    -> 1518     col_mgr.setitem_inplace(idx, value)
       1519 else:

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\internals\managers.py:2220, in SingleBlockManager.setitem_inplace(self, indexer, value)
       2217 if isinstance(arr, np.ndarray):
       2218     # Note: checking for ndarray instead of np.dtype means we exclude
       2219     # dt64/td64, which do their own validation.
    -> 2220     value = np_can_hold_element(arr.dtype, value)
       2222 if isinstance(value, np.ndarray) and value.ndim == 1 and len(value) == 1:
       2223     # NumPy 1.25 deprecation: https://github.com/numpy/numpy/pull/10615

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\dtypes\cast.py:1725, in np_can_hold_element(dtype, element)
       1724     # Anything other than integer we cannot hold
    -> 1725     raise LossySetitemError
       1726 if (
       1727     dtype.kind == "u"
       1728     and isinstance(element, np.ndarray)
       1729     and element.dtype.kind == "i"
       1730 ):
       1731     # see test_where_uint64

    LossySetitemError:

    The above exception was the direct cause of the following exception:

    TypeError                                 Traceback (most recent call last)
    File c:\users\user\onedrive - library\projects\data analysis\general data analysis program\easy_data.py:58
         56     tht_norm = dc.Tht_norm(directory)
         57 case 3: # non functional
    ---> 58     replicate = dc.ThT_rep(directory)
         59 case 4:
         60     cd = dc.cd_plot(directory)

    File ~\OneDrive - library\Projects\Data Analysis\General Data Analysis Program\data_classes.py:189, in ThT_rep.__init__(self, directory)
        187 self.data, self.headers, self.concentrations = super().determine_dataset(self.data)
        188 self.replicate = self.select_rep(self.headers)
    --> 189 self.data = self.norm_data(self.data)
        190 self.figure = super().initialize_plot()
        191 self.plot_data(self.data, self.headers, self.concentrations, self.figure, self.replicate)

    File ~\OneDrive - library\Projects\Data Analysis\General Data Analysis Program\data_classes.py:217, in ThT_rep.norm_data(self, data)
        215     data.iloc[:,i] = data.iloc[:,i] - minimum
        216     maximum = data.iloc[:,i].max()
    --> 217     data.iloc[:, i] = data.iloc[:,i] / maximum
        218     break
        219 elif response == 'n' or response == 'N':

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\indexing.py:938, in _LocationIndexer.__setitem__(self, key, value)
        933     self._has_valid_setitem_indexer(key)
        935 iloc: _iLocIndexer = (
        936     cast("_iLocIndexer", self) if self.name == "iloc" else self.obj.iloc
        937 )
    --> 938 iloc._setitem_with_indexer(indexer, value, self.name)

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\indexing.py:1953, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
       1950 # align and set the values
       1951 if take_split_path:
       1952     # We have to operate column-wise
    -> 1953     self._setitem_with_indexer_split_path(indexer, value, name)
       1954 else:
       1955     self._setitem_single_block(indexer, value, name)

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\indexing.py:1997, in _iLocIndexer._setitem_with_indexer_split_path(self, indexer, value, name)
       1993     self._setitem_with_indexer_2d_value(indexer, value)
       1995 elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
       1996     # We are setting multiple rows in a single column.
    -> 1997     self._setitem_single_column(ilocs[0], value, pi)
       1999 elif len(ilocs) == 1 and 0 != lplane_indexer != len(value):
       2000     # We are trying to set N values into M entries of a single
       2001     # column, which is invalid for N != M
       2002     # Exclude zero-len for e.g. boolean masking that is all-false
       2004     if len(value) == 1 and not is_integer(info_axis):
       2005         # This is a case like df.iloc[:3, [1]] = [0]
       2006         # where we treat as df.iloc[:3, 1] = 0

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\indexing.py:2163, in _iLocIndexer._setitem_single_column(self, loc, value, plane_indexer)
       2151     dtype = self.obj.dtypes.iloc[loc]
       2152     if dtype not in (np.void, object) and not self.obj.empty:
       2153         # - Exclude np.void, as that is a special case for expansion.
       2154         # We want to raise for
       (...)
       2161         # - Exclude empty initial object with enlargement,
       2162         # as then there's nothing to be inconsistent with.
    -> 2163         raise TypeError(
       2164             f"Invalid value '{value}' for dtype '{dtype}'"
       2165         ) from exc
       2166     self.obj.isetitem(loc, value)
       2167 else:
       2168     # set value into the column (first attempting to operate inplace, then
       2169     # falling back to casting if necessary)

    TypeError: Invalid value '0       0.014300
    1       0.008976
    2       0.005148
    3       0.009020
    4       0.005676
    1238    0.866068
    1239    0.863472
    1240    0.876716
    1241    0.888596
    1242    0.875308
    Name: Sample X1, Length: 1243, dtype: float64' for dtype 'int64'

Also, here is a snippet of the data I'm working with. It should reliably generate the error:

TIME SIGNAL
0.00 1000.0
0.01 2000.0
0.02 3000.0
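
For reference, here is a minimal sketch of the normalization described at the top, applied to the three rows above. It assumes the SIGNAL values are stored as integers (as they are after reading the Excel file) and uses plain, single-level column names for brevity. It assigns each result by column label, which replaces the column with a new float64 one instead of writing into the int64 column through iloc.

```python
import pandas as pd

# The three sample rows from above. SIGNAL is entered as integers here
# (an assumption) so that the column starts out with an integer dtype.
df = pd.DataFrame({"TIME": [0.00, 0.01, 0.02], "SIGNAL": [1000, 2000, 3000]})

# Skip the first (time) column and scale every other column by its maximum.
for name in df.columns[1:]:
    # Assigning by label swaps in the new float64 column, so nothing has
    # to fit into the old integer column.
    df[name] = df[name] / df[name].max()

print(df.dtypes)
print(df)
```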