2021-07-27 17:27:34
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(7, 4),
                  index=pd.date_range('1/1/2020', periods=7),
                  columns=['a', 'b', 'c', 'd'])
df

                   a         b         c         d
2020-01-01 -0.103252 -0.378633 -0.689324 -1.150870
2020-01-02 -0.838289  0.036139 -0.481754 -0.006116
2020-01-03 -0.832013 -0.770184 -1.818931  0.253601
2020-01-04 -1.696006 -0.021195  0.772365  0.332447
2020-01-05 -2.136677  1.088825  1.166188  0.140585
2020-01-06 -0.705095  0.709978  1.077941  0.055677
2020-01-07  0.990198  0.764884  0.858504 -0.903039

# Rolling mean over a window of 3 rows; the first two rows have fewer than 3 observations, so they are NaN
df.rolling(window=3).mean()

                   a         b         c         d
2020-01-01       NaN       NaN       NaN       NaN
2020-01-02       NaN       NaN       NaN       NaN
2020-01-03 -0.591185 -0.370893 -0.996670 -0.301128
2020-01-04 -1.122103 -0.251747 -0.509440  0.193311
2020-01-05 -1.554899  0.099149  0.039874  0.242211
2020-01-06 -1.512593  0.592536  1.005498  0.176237
2020-01-07 -0.617191  0.854562  1.034211 -0.235592

# Set the minimum number of observations to 1
df.rolling(window=3, min_periods=1).mean()

                   a         b         c         d
2020-01-01 -0.103252 -0.378633 -0.689324 -1.150870
2020-01-02 -0.470771 -0.171247 -0.585539 -0.578493
2020-01-03 -0.591185 -0.370893 -0.996670 -0.301128
2020-01-04 -1.122103 -0.251747 -0.509440  0.193311
2020-01-05 -1.554899  0.099149  0.039874  0.242211
2020-01-06 -1.512593  0.592536  1.005498  0.176237
2020-01-07 -0.617191  0.854562  1.034211 -0.235592
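Because the example above uses random data, the effect of each parameter is easier to check by hand on a small fixed series. The following is a minimal sketch, not part of the original post: the series `s` and the variable names are made up for illustration, and it also shows the `center` parameter from the signature, which the example above does not exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series, chosen so the means are easy to verify by hand
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Trailing window of 3: min_periods defaults to the window size,
# so the first two positions are NaN
trailing = s.rolling(window=3).mean()
# NaN, NaN, 2.0, 3.0, 4.0

# min_periods=1: a result is produced as soon as one observation is available
partial = s.rolling(window=3, min_periods=1).mean()
# 1.0, 1.5, 2.0, 3.0, 4.0

# center=True: the label sits in the middle of the window,
# so the NaN values move to both ends
centered = s.rolling(window=3, center=True).mean()
# NaN, 2.0, 3.0, 4.0, NaN

print(pd.DataFrame({"trailing": trailing, "partial": partial, "centered": centered}))
```

From the third position onward, `trailing` and `partial` agree, since every window is already full there and `min_periods` no longer matters.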
df2 = pd.DataFrame({
    "date": pd.date_range("2018-07-01", periods=7),
    "amount": [12000, 18000, np.nan, 12000, 9000, 16000, 18000]})
df2

        date   amount
0 2018-07-01  12000.0
1 2018-07-02  18000.0
2 2018-07-03      NaN
3 2018-07-04  12000.0
4 2018-07-05   9000.0
5 2018-07-06  16000.0
6 2018-07-07  18000.0

# Window size of 2
df2.rolling(window=2, on="date").sum()

        date   amount
0 2018-07-01      NaN
1 2018-07-02  30000.0
2 2018-07-03      NaN
3 2018-07-04      NaN
4 2018-07-05  21000.0
5 2018-07-06  25000.0
6 2018-07-07  34000.0

# Window size of 2, minimum number of observations 1
df2.rolling(window=2, on="date", min_periods=1).sum()

        date   amount
0 2018-07-01  12000.0
1 2018-07-02  30000.0
2 2018-07-03  18000.0
3 2018-07-04  12000.0
4 2018-07-05  21000.0
5 2018-07-06  25000.0
6 2018-07-07  34000.0

# Return several aggregates at once, e.g. sum() and mean()
df2.rolling(window=2, min_periods=1)["amount"].agg(["sum", "mean"])

       sum     mean
0  12000.0  12000.0
1  30000.0  15000.0
2  18000.0  18000.0
3  12000.0  12000.0
4  21000.0  10500.0
5  25000.0  12500.0
6  34000.0  17000.0

# Return several aggregates and rename the result columns
df2.rolling(window=2, min_periods=1)["amount"].agg({"amt_sum": "sum", "amt_mean": "mean"})

   amt_sum  amt_mean
0  12000.0   12000.0
1  30000.0   15000.0
2  18000.0   18000.0
3  12000.0   12000.0
4  21000.0   10500.0
5  25000.0   12500.0
6  34000.0   17000.0
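When `on` points at a datetime column, the window can also be given as a time offset instead of a row count. The sketch below is an addition for illustration, not from the original post: it rebuilds the same `df2` and compares a `"2D"` (two-day) window with the two-row window used above. The names `by_offset` and `by_count` are made up here. With an offset window, `min_periods` defaults to 1, so no extra argument is needed.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "date": pd.date_range("2018-07-01", periods=7),
    "amount": [12000, 18000, np.nan, 12000, 9000, 16000, 18000],
})

# Offset window: each row aggregates the observations of the last two days.
# min_periods defaults to 1 for offset windows.
by_offset = df2.rolling("2D", on="date").sum()

# Row-count window of 2 with min_periods=1, as in the example above
by_count = df2.rolling(window=2, on="date", min_periods=1).sum()

print(by_offset)
# On this evenly spaced daily data both approaches give the same sums
print(np.allclose(by_offset["amount"], by_count["amount"]))
```

The two only coincide because the dates are consecutive. With gaps in the dates, a two-day window and a two-row window would cover different observations.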
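# Custom function: sum the window, then divide by 100
# Python's built-in sum() propagates NaN, so every window that contains a NaN yields NaN
df2.rolling(2, min_periods=1)["amount"].apply(lambda x: sum(x) / 100, raw=False)

0    120.0
1    300.0
2      NaN
3      NaN
4    210.0
5    250.0
6    340.0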
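Rows 2 and 3 are NaN in the custom example even though `min_periods=1` is satisfied, because the window handed to the lambda still contains the NaN and the built-in `sum()` does not skip it. A minimal sketch of one way around this, assuming NaN should simply be ignored: use `np.nansum` inside the custom function. The variable names are illustrative, and `raw=True` is used so the function receives a plain NumPy array.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "date": pd.date_range("2018-07-01", periods=7),
    "amount": [12000, 18000, np.nan, 12000, 9000, 16000, 18000],
})

roll = df2["amount"].rolling(2, min_periods=1)

# Built-in sum(): NaN inside the window propagates to the result
builtin = roll.apply(lambda x: sum(x) / 100, raw=False)
# 120.0, 300.0, NaN, NaN, 210.0, 250.0, 340.0

# np.nansum(): NaN inside the window is skipped
nan_aware = roll.apply(lambda x: np.nansum(x) / 100, raw=True)
# 120.0, 300.0, 180.0, 120.0, 210.0, 250.0, 340.0

# For this particular function the built-in aggregation gives the same result
vectorized = roll.sum() / 100

print(pd.DataFrame({"builtin": builtin, "nan_aware": nan_aware, "vectorized": vectorized}))
```

When a built-in aggregation such as `sum()` already expresses the computation, it is usually preferable to `apply`, which calls a Python function once per window.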
DataFrame.expanding(min_periods=1, center=False, axis=0)
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2018', periods=10),
                  columns=['a', 'b', 'c', 'd'])
df

                   a         b         c         d
2018-01-01 -0.349086 -0.225357 -0.108829  1.662773
2018-01-02  1.056407 -0.159644  0.042278  0.298922
2018-01-03 -1.376891  0.112999 -0.719286  0.254892
2018-01-04  0.741323  1.510449  0.615251 -1.896209
2018-01-05  1.305841  0.380900 -0.961663 -0.654108
2018-01-06 -1.079804 -0.883547  0.149659 -0.065931
2018-01-07  0.240168 -0.409613 -0.543655  0.797564
2018-01-08  0.716836 -0.329991  0.271236 -2.138515
2018-01-09 -1.448734  1.261487  0.795663 -1.492216
2018-01-10 -1.212092 -1.039160  1.581169  1.156089

# Expanding mean that requires at least 2 observations
df.expanding(min_periods=2).mean()

                   a         b         c         d
2018-01-01       NaN       NaN       NaN       NaN
2018-01-02  0.353660 -0.192500 -0.033276  0.980848
2018-01-03 -0.223190 -0.090667 -0.261946  0.738863
2018-01-04  0.017938  0.309612 -0.042647  0.080095
2018-01-05  0.275519  0.323869 -0.226450 -0.066746
2018-01-06  0.049632  0.122633 -0.163765 -0.066610
2018-01-07  0.076851  0.046598 -0.218035  0.056843
2018-01-08  0.156849 -0.000475 -0.156876 -0.217576
2018-01-09 -0.021549  0.139743 -0.051038 -0.359203
2018-01-10 -0.140603  0.021852  0.112182 -0.207674

# Check that the expanding() sum matches the cumsum() result
result1 = df.expanding(min_periods=1).sum()
result2 = df.cumsum()
np.allclose(result1, result2)

True
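The same kind of check can be extended to the mean. The sketch below is an illustrative addition under stated assumptions: the data is seeded so the run is reproducible, it contains no NaN, and the names `exp_mean`, `manual` and `roll_mean` are made up here. It verifies that the expanding mean equals the cumulative sum divided by the number of rows seen so far, and that `expanding()` behaves like a rolling window as long as the whole frame with `min_periods=1`.

```python
import numpy as np
import pandas as pd

# Seeded random data (assumption: no missing values)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)),
                  index=pd.date_range("2018-01-01", periods=10),
                  columns=["a", "b", "c", "d"])

# Expanding mean: each row averages everything from the first row up to that row
exp_mean = df.expanding(min_periods=1).mean()

# Manual equivalent: cumulative sum divided by the running row count 1..n
counts = np.arange(1, len(df) + 1)
manual = df.cumsum().div(counts, axis=0)

# A rolling window that spans the whole frame covers the same rows
roll_mean = df.rolling(window=len(df), min_periods=1).mean()

print(np.allclose(exp_mean, manual))     # True
print(np.allclose(exp_mean, roll_mean))  # True
```

The manual version relies on there being no NaN in the data. With missing values, the running row count would no longer match the number of valid observations in each window.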