LossySetitemError when converting column type of 'int64' to 'float64' with astype()


I've been trying to normalize some data in Python/Pandas, but I've encountered a bit of a conundrum. Normalizing data requires me to divide each column value by the maximum value of said column, thus yielding a range of decimal values from 0 to 1. Pandas has inferred the "int64" dtype for the columns of my DataFrame, so it refuses to store the resulting decimal values in them. To get around this issue, I attempted to convert the column values to "float64" using the following code:

    import pandas as pd
    import numpy as np

    data = pd.read_excel("file_name.xlsx", header=[10, 11])  # read in data; values are int

    for i in range(0, len(data.columns), 1):
        if i == 0:  # the first column contains time values that are already dtype float64
            pass
        else:
            data.iloc[:, i] = data.iloc[:, i].astype(np.float64)  # this line throws an error

However, this throws a LossySetitemError: Invalid value 'value' Name: 'column_name', Length: 'number', dtype: 'float64' for dtype 'int64'. My first approach to this problem was to force Pandas to read all numerical values in my Excel file as 'float64' using dtype = float64, but this threw an error as well: Unable to convert column ('First_multi_index', 'second_multi_index') to type float64. I believe this is due to my having multi-indexed column names, which I need for ease of handling my data.
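One possible way around this, sketched under assumptions: assigning through `data.iloc[:, i] = ...` tries to write into the existing int64 column in place rather than replacing it, which is what the error complains about. Converting with `DataFrame.astype` and rebinding the result avoids the in-place write. The small frame and its column labels below are hypothetical stand-ins for the Excel data.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: a float time column plus
# integer signal columns under two-level (MultiIndex) column headers.
data = pd.DataFrame(
    {
        "t": [0.00, 0.01, 0.02],
        "a": [1000, 2000, 3000],
        "b": [10, 40, 20],
    }
)
data.columns = pd.MultiIndex.from_tuples(
    [("Run 1", "TIME"), ("Run 1", "Sample X1"), ("Run 1", "Sample X2")]
)

# astype returns a new DataFrame, so rebinding the name replaces the
# int64 columns instead of writing floats into them in place.
data = data.astype("float64")

# Divide every signal column by its own maximum; the time column is left alone.
signals = data.iloc[:, 1:]
normalized = pd.concat([data.iloc[:, :1], signals / signals.max()], axis=1)

print(normalized.dtypes)
print(normalized)
```

Because `/` already returns floats, building `normalized` as a new frame should work even without the `astype` step; the cast is kept here only because the question is about it.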

Some Google searching suggests that using astype() is the proper way to perform this conversion, so I'm at a loss as to why it isn't working. I must admit that I am not too terribly familiar with programming. I'm probably missing something obvious, if I had to wager a guess. Any help would be greatly appreciated. I've also provided the versions of the libraries I'm using below:

Python = 3.13.12
Pandas = 3.0.0
NumPy = 2.4.2

The exact error output is as follows (I've scrubbed some personal information out of it):

    LossySetitemError                         Traceback (most recent call last)
    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\indexing.py:2144, in _iLocIndexer._setitem_single_column(self, loc, value, plane_indexer)
       2143 try:
    -> 2144     self.obj._mgr.column_setitem(
       2145         loc, plane_indexer, value, inplace_only=True
       2146     )
       2147 except (ValueError, TypeError, LossySetitemError) as exc:
       2148     # If we're setting an entire column and we can't do it inplace,
       2149     # then we can use value's dtype (or inferred dtype)
       2150     # instead of object

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\internals\managers.py:1518, in BlockManager.column_setitem(self, loc, idx, value, inplace_only)
       1517 if inplace_only:
    -> 1518     col_mgr.setitem_inplace(idx, value)
       1519 else:

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\internals\managers.py:2220, in SingleBlockManager.setitem_inplace(self, indexer, value)
       2217 if isinstance(arr, np.ndarray):
       2218     # Note: checking for ndarray instead of np.dtype means we exclude
       2219     # dt64/td64, which do their own validation.
    -> 2220     value = np_can_hold_element(arr.dtype, value)
       2222 if isinstance(value, np.ndarray) and value.ndim == 1 and len(value) == 1:
       2223     # NumPy 1.25 deprecation: https://github.com/numpy/numpy/pull/10615

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\dtypes\cast.py:1725, in np_can_hold_element(dtype, element)
       1724     # Anything other than integer we cannot hold
    -> 1725     raise LossySetitemError
       1726 if (
       1727     dtype.kind == "u"
       1728     and isinstance(element, np.ndarray)
       1729     and element.dtype.kind == "i"
       1730 ):
       1731     # see test_where_uint64

    LossySetitemError:

    The above exception was the direct cause of the following exception:

    TypeError                                 Traceback (most recent call last)
    File c:\users\user\onedrive - library\projects\data analysis\general data analysis program\easy_data.py:58
         56     tht_norm = dc.Tht_norm(directory)
         57 case 3: # non functional
    ---> 58     replicate = dc.ThT_rep(directory)
         59 case 4:
         60     cd = dc.cd_plot(directory)

    File ~\OneDrive - library\Projects\Data Analysis\General Data Analysis Program\data_classes.py:189, in ThT_rep.__init__(self, directory)
        187 self.data, self.headers, self.concentrations = super().determine_dataset(self.data)
        188 self.replicate = self.select_rep(self.headers)
    --> 189 self.data = self.norm_data(self.data)
        190 self.figure = super().initialize_plot()
        191 self.plot_data(self.data, self.headers, self.concentrations, self.figure, self.replicate)

    File ~\OneDrive - library\Projects\Data Analysis\General Data Analysis Program\data_classes.py:217, in ThT_rep.norm_data(self, data)
        215     data.iloc[:,i] = data.iloc[:,i] - minimum
        216     maximum = data.iloc[:,i].max()
    --> 217     data.iloc[:, i] = data.iloc[:,i] / maximum
        218     break
        219 elif response == 'n' or response == 'N':

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\indexing.py:938, in _LocationIndexer.__setitem__(self, key, value)
        933     self._has_valid_setitem_indexer(key)
        935 iloc: _iLocIndexer = (
        936     cast("_iLocIndexer", self) if self.name == "iloc" else self.obj.iloc
        937 )
    --> 938 iloc._setitem_with_indexer(indexer, value, self.name)

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\indexing.py:1953, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
       1950 # align and set the values
       1951 if take_split_path:
       1952     # We have to operate column-wise
    -> 1953     self._setitem_with_indexer_split_path(indexer, value, name)
       1954 else:
       1955     self._setitem_single_block(indexer, value, name)

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\indexing.py:1997, in _iLocIndexer._setitem_with_indexer_split_path(self, indexer, value, name)
       1993     self._setitem_with_indexer_2d_value(indexer, value)
       1995 elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
       1996     # We are setting multiple rows in a single column.
    -> 1997     self._setitem_single_column(ilocs[0], value, pi)
       1999 elif len(ilocs) == 1 and 0 != lplane_indexer != len(value):
       2000     # We are trying to set N values into M entries of a single
       2001     # column, which is invalid for N != M
       2002     # Exclude zero-len for e.g. boolean masking that is all-false
       2004     if len(value) == 1 and not is_integer(info_axis):
       2005         # This is a case like df.iloc[:3, [1]] = [0]
       2006         # where we treat as df.iloc[:3, 1] = 0

    File ~\.conda\envs\stats\Lib\site-packages\pandas\core\indexing.py:2163, in _iLocIndexer._setitem_single_column(self, loc, value, plane_indexer)
       2151     dtype = self.obj.dtypes.iloc[loc]
       2152     if dtype not in (np.void, object) and not self.obj.empty:
       2153         # - Exclude np.void, as that is a special case for expansion.
       2154         # We want to raise for
       (...)
       2161         # - Exclude empty initial object with enlargement,
       2162         # as then there's nothing to be inconsistent with.
    -> 2163         raise TypeError(
       2164             f"Invalid value '{value}' for dtype '{dtype}'"
       2165         ) from exc
       2166     self.obj.isetitem(loc, value)
       2167 else:
       2168     # set value into the column (first attempting to operate inplace, then
       2169     # falling back to casting if necessary)

    TypeError: Invalid value '0       0.014300
    1       0.008976
    2       0.005148
    3       0.009020
    4       0.005676
    1238    0.866068
    1239    0.863472
    1240    0.876716
    1241    0.888596
    1242    0.875308
    Name: Sample X1, Length: 1243, dtype: float64' for dtype 'int64'
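The failing line in `norm_data` assigns a float result back into an int64 column through `iloc`. A hedged sketch of an alternative that keeps the per-column loop: `DataFrame.isetitem` replaces the column at a given position instead of writing into it in place (it is the same method that appears in the pandas source shown in the traceback). The helper name `normalize_column` and the sample values are hypothetical.

```python
import pandas as pd


def normalize_column(data: pd.DataFrame, i: int) -> None:
    """Min-max scale the column at position i, replacing it in the frame."""
    column = data.iloc[:, i].astype("float64")
    column = column - column.min()
    # Assumes the column is not constant; otherwise the maximum is 0 here.
    data.isetitem(i, column / column.max())


# Hypothetical sample: float time column, integer signal column.
data = pd.DataFrame({"TIME": [0.00, 0.01, 0.02], "SIGNAL": [1000, 2000, 3000]})

# Skip position 0, which holds the time values.
for i in range(1, len(data.columns)):
    normalize_column(data, i)

print(data)
```

Since `isetitem` works by position, it should behave the same with multi-indexed column labels, though that is not exercised in this sketch.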

Also, here is a snippet of the data I'm working with. It should reliably generate the error:

TIME SIGNAL

0.00	1000.0
0.01	2000.0
0.02	3000.0
