Question 1000324: What do the three parameters (p, d, q) in an ARIMA model mean?

Statistics/Machine Learning  Time Series

What exactly do these three numbers mean? An example would be even better! Thanks!


Answer

Answer 1:

In an ARIMA(p, d, q) model:

p is the order of the autoregressive (AR) part: the number of lagged values used as predictors;

d is the order of differencing (the I part), applied to make the series stationary;

q is the order of the moving-average (MA) part, used to smooth the series.


For example, suppose the original series is:

T   Y
1   20
2   22
3   25
4   30
5   40
6   50
7   65
8   88
9   112
10  120
11  115
12


First we add the autoregressive terms. Autoregression means using the previous p values as predictors. With p = 2:

T   Y    X_1  X_2
1   20   \    \
2   22   20   \
3   25   22   20
4   30   25   22
5   40   30   25
6   50   40   30
7   65   50   40
8   88   65   50
9   112  88   65
10  120  112  88
11  115  120  112
12       115  120


Next we difference both the predictors and the response. Differencing means subtracting each row from the row below it; d = 1 means we difference once. (The starred value is kept so the differencing can be inverted later.)

T   Y    X_1  X_2
1   20*  \    \
2   2    \    \
3   3    2    \
4   5    3    2
5   10   5    3
6   10   10   5
7   15   10   10
8   23   15   10
9   24   23   15
10  8    24   23
11  -5   8    24
12       -5   8

d = 2 means we difference once more on top of the result above:

T   Y    X_1  X_2
1   \    \    \
2   2*   \    \
3   1    \    \
4   2    1    \
5   5    2    1
6   0    5    2
7   5    0    5
8   8    5    0
9   1    8    5
10  -16  1    8
11  -13  -16  1
12       -13  -16
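
The two differencing steps can be verified with numpy (a minimal sketch, not part of the original answer):

```python
import numpy as np

# The 11 observed Y values from the first table (T = 1 .. 11).
y = np.array([20, 22, 25, 30, 40, 50, 65, 88, 112, 120, 115], dtype=float)

d1 = np.diff(y)        # first difference: the d = 1 table's Y column
d2 = np.diff(y, n=2)   # second difference: the d = 2 table's Y column
print(d1)  # values 2, 3, 5, 10, 10, 15, 23, 24, 8, -5
print(d2)  # values 1, 2, 5, 0, 5, 8, 1, -16, -13

# Differencing is invertible: a cumulative sum plus the starred
# anchor value (20) restores the original series.
restored = np.concatenate(([y[0]], y[0] + np.cumsum(d1)))
```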


Next we take a moving average of each column. q is the number of moving-average terms: each row is replaced by the average of itself and the q − 1 rows before it. For example, with q = 2:

T   Y      X_1    X_2
1   \      \      \
2   \      \      \
3   \      \      \
4   1.5    \      \
5   3.5    1.5    \
6   2.5    3.5    1.5
7   2.5    2.5    3.5
8   6.5    2.5    2.5
9   4.5    6.5    2.5
10  -7.5   4.5    6.5
11  -14.5  -7.5   4.5
12         -14.5  -7.5
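
The q = 2 column can likewise be reproduced in numpy (again a sketch added here, not part of the original answer):

```python
import numpy as np

# Second-differenced Y values for T = 3 .. 11, from the d = 2 table above.
d2 = np.array([1, 2, 5, 0, 5, 8, 1, -16, -13], dtype=float)

# q = 2: replace each value by the mean of itself and the previous value.
ma = (d2[1:] + d2[:-1]) / 2
print(ma)  # values 1.5, 3.5, 2.5, 2.5, 6.5, 4.5, -7.5, -14.5
```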


Finally, take rows 6 through 11, which have complete data; treat X_1 and X_2 as the predictors and Y as the response, and fit a linear regression to predict row 12. Remember to invert the differencing at the end to map the prediction back to the original scale.


Question 1000346: How can I tell whether a time series is stationary?

Statistics/Machine Learning  Time Series

I read the question "What do the three parameters (p, d, q) in an ARIMA model mean?" The choice of d depends on whether the series is stationary. So how do I judge whether a time series is stationary?


Answer

Answer 2:

The ADF (Augmented Dickey–Fuller) unit-root test.


Question 1000750: How can I detect periodicity in a time series?

Statistics/Machine Learning  Time Series

How can I detect periodicity in a time series?


Answer

Answer 1:

You can visualize the data and look for repeating patterns. When a problem like this offers no obvious way forward, exploratory data analysis (EDA) can often show you one.


Answer 2:

You can use frequency-domain methods.
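
For example (a numpy-only sketch with a synthetic signal, not from the original answer), the peak of the power spectrum recovers a hidden period:

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(1000)
# A noisy sinusoid with a period of 25 samples.
signal = np.sin(2 * np.pi * t / 25) + 0.5 * rng.normal(size=t.size)

# Power spectrum via the FFT; the dominant frequency reveals the period.
power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
freqs = np.fft.rfftfreq(t.size)          # in cycles per sample
peak = freqs[np.argmax(power[1:]) + 1]   # skip the zero-frequency bin
print(1 / peak)  # estimated period, close to 25 samples
```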


Question 1007442: Why do my training outputs and predicted outputs never change?

Statistics/Machine Learning  Time Series  TensorFlow

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn import metrics
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score
from keras.utils import to_categorical

fileName1 = 'data1.csv'
fileName2 = 'data2.csv'
batchSize = 1
ratio = 0.8

def readFile(fileName):
    data = pd.read_csv(fileName)
    allData = np.array(data.loc[:2000, :])
    xData = np.array(allData)
    return xData

# Shuffle the data and build batches
def batch_data(data):
    # Shuffle the order
    data_size = data.shape[0]   # number of samples
    arr = np.arange(data_size)  # indices 0 .. data_size - 1
    np.random.shuffle(arr)      # shuffle the index array
    data = data[arr]            # reorder data by the shuffled indices
    # label = label[arr]
    # reorder the labels by the same indices
    num = np.int(data.shape[0] * ratio)
    return data

def Transpose(stringArray):
    list = []
    for i in range(stringArray.shape[0]):
        list.append(np.empty(shape=(2 * len(stringArray[0][0]))))
    for t in range(stringArray.shape[0]):
        index = 0
        for i in range(len(stringArray[t][0])):
            list[t][index] = ord(stringArray[t][0][i])
            index = index + 1
        for j in range(len(stringArray[t][1])):
            list[t][index] = ord(stringArray[t][1][j])
            index = index + 1
    strarray = np.array(list)
    return strarray

def one_hotEncoder(totaldata, len):
    list1 = []
    for i in range(totaldata.shape[0]):
        list1.append(np.zeros(shape=(60, 91)))
    for t in range(totaldata.shape[0]):
        data = totaldata[t][0] + totaldata[t][1]
        alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789.,/?:;|]}[{=+-_)(*&^%$#@!`~ZXCVBNM<>ASDFGHJKLPOIUYTREWQ'
        char_to_int = dict((c, i) for i, c in enumerate(alphabet))
        integer_encoded = [char_to_int[char] for char in data]
        onehot_encoded = list()
        for value in integer_encoded:
            letter = [0 for _ in range(len(alphabet))]
            letter[value] = 1
            onehot_encoded.append(letter)
        # print('np.array(onehot_encoded).shape')  # number of rows = number of characters
        list1.append(np.array(onehot_encoded))
    onehot_array = np.array(list1)
    return onehot_array

def dataSplit(transarray, stringarray):
    num = np.int(stringarray.shape[0] * ratio)
    x_train = transarray[:num, :, ]
    y_train = stringarray[:num, 2:]
    x_test = transarray[num:, :, ]
    y_test = stringarray[num:, 2:]
    return x_train, x_test, y_train, y_test

def active(n):
    if n > 0.5:
        return 1
    else:
        return 0

roundT = 1
learnRateT = 0.001
unitCount = 128
rowCount = 660
element_size = 28
time_steps = 28
num_classes = 2
batch_size = 200
hidden_layer_size = 128

def builNetWork():
    # time_steps = cellCounts
    x = tf.placeholder(shape=[1, 60, 91], dtype=tf.float32)
    yTrain = tf.placeholder(shape=[1], dtype=tf.float32)
    rnn_cell = tf.contrib.rnn.GRUCell(hidden_layer_size)  # hidden_layer_size is the width of the final output
    outputs, finalState = tf.nn.dynamic_rnn(cell=rnn_cell, inputs=x, dtype=tf.float32, time_major=False)  # outputs has size (batch_size, hidden_layer_size)
    outputs = tf.nn.dropout(outputs, 0.01)
    w2 = tf.Variable(tf.constant(0.1, shape=[hidden_layer_size, 1]), dtype=tf.float32)
    b2 = tf.Variable(tf.constant(0.1, shape=[1]), dtype=tf.float32)
    y = tf.sigmoid(tf.reduce_sum(tf.matmul(outputs[-1], w2)) + b2)
    loss = tf.square(y - yTrain)
    train = tf.train.AdagradOptimizer(learnRateT).minimize(loss)
    return x, y, train, time_steps, yTrain, loss, outputs, finalState

# Read the data
data1 = readFile(fileName1)
data2 = readFile(fileName2)
# Concatenate the data
totaldata = np.vstack((data1, data2))
# Shuffle the data
randomdata = batch_data(totaldata)
transarray = one_hotEncoder(randomdata, len)
# Numeric encoding
# transarray = Transpose(randomdata)
# Split the data
x_train, x_test, y_train, y_test = dataSplit(transarray, randomdata)
# Define the network
x, y, train, time_steps, yTrain, loss, outputs, finalState = builNetWork()
# Run the session
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(roundT):
    total_loss = 0.0
    for j in range(y_train.shape[0]):
        x_trains = np.reshape(x_train[j], (1, 60, 91))
        result = sess.run([loss, y],
                          feed_dict={x: x_trains, yTrain: y_train[j]})
        if j % 20 == 1:
            print("i: %d, loss: %s, y: %s, ytrain: %d\n"
                  % (i, result[0], result[1], y_train[j]))
# Test the model
result = np.empty(shape=(y_test.shape[0], 1), dtype=np.int)
for i in range(y_test.shape[0]):
    x_trains = np.reshape(x_test[i], (1, 60, 91))
    # print(x_trains)
    results = sess.run([y], feed_dict={x: x_trains})
    print(results)
    result[i][0] = results[0]
y1 = []
for i in range(result.shape[0]):
    y1.append(result[i][0])
y2 = []
for i in range(y_test.shape[0]):
    y2.append(y_test[i][0])
y_pred = np.array(y1)
y_true = np.array(y2)
print("accuracy")
print(accuracy_score(y_true, y_pred))
print("F1")
print(metrics.f1_score(y_true, y_pred, average='weighted'))
print("ROC")
print(roc_auc_score(y_true, y_pred))
print("recall")
print(metrics.recall_score(y_true, y_pred, average='macro'))
print("precision")
print(precision_score(y_true, y_pred, average='weighted'))


Answer

Answer 1:

Try tuning the learning rate. More importantly, look at your training loop: you call sess.run([loss, y], ...) but never run the train op, so the Adagrad optimizer never applies any gradient updates and the weights stay at their initial values; that alone keeps the outputs constant. Change it to sess.run([train, loss, y], ...). Also, in TF 1.x the second argument of tf.nn.dropout is the keep probability, so tf.nn.dropout(outputs, 0.01) keeps only 1% of the activations; you probably want something closer to 0.9.


From SofaSofa (a machine-learning community). These posts are recorded here only as a backup, in case the site ever goes offline; reading them on the SofaSofa site itself is recommended.