[FIXED] bei Verwendung von group_by: TypeError: inkompatibler Index der eingefügten Spalte mit Frame-Index

Ausgabe

Ich habe ein df, das ich aus SQL gelesen habe:

      id  stock_id symbol        date   open   high      low  close  volume
0      1        35   ABSI  2022-09-28   3.06   3.33   3.0400   3.27  217040
1      2        35   ABSI  2022-09-29   3.19   3.19   3.0300   3.12  187309
2      3        35   ABSI  2022-09-30   3.11   3.27   3.0700   3.13  196566
3      4        35   ABSI  2022-10-03   3.16   3.16   2.8600   2.97  310441
4      5        35   ABSI  2022-10-04   3.04   3.37   2.9600   3.27  361082
..   ...       ...    ...         ...    ...    ...      ...    ...     ...
383  384        16    VVI  2022-10-03  31.93  33.85  31.3050  33.60  151357
384  385        16    VVI  2022-10-04  34.41  35.46  34.1900  35.39  105773
385  386        16    VVI  2022-10-05  34.67  35.30  34.5000  34.86   59605
386  387        16    VVI  2022-10-06  34.80  35.14  34.3850  34.50   55323
387  388        16    VVI  2022-10-07  33.99  33.99  33.3409  33.70   45187

[388 rows x 9 columns]

Ich versuche dann, den Durchschnitt der letzten 5 Tage zu erhalten und füge ihn einer neuen Spalte hinzu:

df['volume_5_day'] = df.groupby('stock_id')['volume'].rolling(5).mean()

Was mir folgenden Fehler gibt:

Traceback (most recent call last):
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 11003, in _reindex_for_setitem
    reindexed_value = value.reindex(index)._values
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/series.py", line 4672, in reindex
    return super().reindex(**kwargs)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 4966, in reindex
    return self._reindex_axes(
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 4981, in _reindex_axes
    new_index, indexer = ax.reindex(
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 4237, in reindex
    target = self._wrap_reindex_result(target, indexer, preserve_names)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 2520, in _wrap_reindex_result
    target = MultiIndex.from_tuples(target)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 204, in new_meth
    return meth(self_or_cls, *args, **kwargs)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 559, in from_tuples
    arrays = list(lib.tuples_to_object_array(tuples).T)
  File "pandas/_libs/lib.pyx", line 2930, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dan/Documents/code/wolfhound/add_indicators_daily.py", line 10, in <module>
    df['volume_10_day'] = df.groupby('stock_id')['volume'].rolling(1).mean()
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3655, in __setitem__
    self._set_item(key, value)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3832, in _set_item
    value = self._sanitize_column(value)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 4535, in _sanitize_column
    return _reindex_for_setitem(value, self.index)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 11010, in _reindex_for_setitem
    raise TypeError(
TypeError: incompatible index of inserted column with frame index

Irgendwelche Ideen, was hier falsch läuft? Früher hat das funktioniert und jetzt wirft es einen Fehler – und ich kann anscheinend nicht herausfinden, warum

Lösung

Kette Series.to_numpyzum Hinzufügen der Werte als np.arrayund stellen Sie sicher, dass Sie Folgendes sort=Falsehinzufügen df.groupby:

df['volume_5_day'] = df.groupby('stock_id', sort=False)['volume']\
    .rolling(5).mean().to_numpy()

print(df)

      id  stock_id symbol        date  ...      low  close  volume  volume_5_day
0      1        35   ABSI  2022-09-28  ...   3.0400   3.27  217040           NaN
1      2        35   ABSI  2022-09-29  ...   3.0300   3.12  187309           NaN
2      3        35   ABSI  2022-09-30  ...   3.0700   3.13  196566           NaN
3      4        35   ABSI  2022-10-03  ...   2.8600   2.97  310441           NaN
4      5        35   ABSI  2022-10-04  ...   2.9600   3.27  361082      254487.6
383  384        16    VVI  2022-10-03  ...  31.3050  33.60  151357           NaN
384  385        16    VVI  2022-10-04  ...  34.1900  35.39  105773           NaN
385  386        16    VVI  2022-10-05  ...  34.5000  34.86   59605           NaN
386  387        16    VVI  2022-10-06  ...  34.3850  34.50   55323           NaN
387  388        16    VVI  2022-10-07  ...  33.3409  33.70   45187       83449.0

Ihr anfänglicher Ansatz schlägt fehl, da die df.groupbyvon Ihnen verwendete Methode eine pd.Seriesmit einem anderen Index als Ihre zurückgibt df. Z.B:

print(df.groupby('stock_id')['volume'].rolling(5).mean().index)
MultiIndex([(16, 383),
            (16, 384),
            (16, 385),
            (16, 386),
            (16, 387),
            (35,   0),
            (35,   1),
            (35,   2),
            (35,   3),
            (35,   4)],
           names=['stock_id', None])

Es heißt also, dass es dies nicht abbilden kann auf:

print(df.index)
Int64Index([0, 1, 2, 3, 4, 383, 384, 385, 386, 387], dtype='int64')

Mit a np.arrayhast du dieses Problem nicht. Du hättest auch verwenden können:

df['volume_5_day'] = df.groupby('stock_id', as_index=False)['volume']\
    .rolling(5).mean()['volume']

In diesem Fall müssen Sie nicht hinzufügen sort=False, da es den Indexwerten korrekt entspricht.


Beantwortet von –
ouroboros1


Antwort geprüft von –
Dawn Plyler (FixError Volunteer)

0 Shares:
Leave a Reply

Your email address will not be published. Required fields are marked *

You May Also Like