收录日期:2019/10/18 22:49:05 时间:2016-03-25 09:17:48 标签:python,graph,tail,matplotlib

What is the best way to read a 1 GB file that gets time series data logged in it and generate a real time graph with two of its columns (one time and other a number)? I see that you have different ways of tailign the file.

Sounds like a good job for RRDTool.

But if you want to stick with Python, I would use tail to stream the data into my program (this is assuming the file is continuously written to, otherwise a straight open() in Python will work).

tail -F data.log | python myprogram.py

myprogram.py could look something like:

import sys

p = ... # create a pylab plot instance 
for line in sys.stdin:
    elements = line.split(',') # or whatever separator your file has in it
    p.add(element[0], element[1]) # add data to the pylab plot instance

As John mentioned, you can input the tail output into your file, but if you due to some reason wants to handle everything in your file and also want an example of somewhat dynamic graph, here it is( no work today;)

import math
import time
import pylab  

def getDataTest(filePath):
    s = 0
    inc = .05
    x_list=pylab.arange(0, 5.0, 0.01)
    while 1:
        s += inc
        if abs(s) > 1:
            inc=-inc

        y_list = []
        for x in x_list:
            x += s
            y = math.cos(2*math.pi*x) * math.exp(-x)
            y_list.append(y)

        yield x_list, y_list

def tailGen(filePath):
    f = open(filePath)
    #f.seek(0, 2) # go to end
    for line in f: yield line
    while 1:
        where = f.tell()
        line = f.readline()
        if line:
            yield line
        else:
            time.sleep(.1)
            f.seek(where)

def getData(filePath):
    x_list = []
    y_list = []
    maxCount = 10
    for line in tailGen(filePath):
        # get required columns
        tokens = line.split(",")
        if len(tokens) != 2:
            continue
        x, y = tokens
        x_list.append(x)
        y_list.append(y)
        if len(x_list) > maxCount:
            x_list = x_list[-maxCount:]
            y_list = x_list[-maxCount:]
            yield x_list, y_list

pylab.ion()
pylab.xlabel("X")
pylab.ylabel("Y")

dataGen = getData("plot.txt") # getDataTest("plot.txt") #
x_list, y_list = dataGen.next()
plotData, = pylab.plot(x_list, y_list, 'b')
#pylab.show()
pylab.draw()
for (x_list, y_list) in dataGen:
    time.sleep(.1)
    plotData, = pylab.plot(x_list, y_list, 'b')
    pylab.draw()

You can pickup elements from it and I think it will solve your problem.

Here's the unix pipe which has 3 parts: the tail'er, the filter (gawk), and the plotter (python).

tail -f yourfile.log | gawk '/PCM1/{print $21; fflush();}' | python -u tailplot.py

and here is the python script. You can feed it 1 (y) or 2 (x y) columns of data. If you don't use gawk, be sure to figure out how to disable buffering. sed -u for example.

pa-poca$ cat ~/tailplot.py

import math
import time
import sys
import pylab

pylab.ion()
pylab.xlabel("X")
pylab.ylabel("Y")

x = []
y = []
counter = 1
while True :
    line = sys.stdin.readline()
    a = line.split()
    if len(a) == 2:
      x.append(a[0])
      y.append(a[1])
    elif len(a) == 1:
      x.append(counter)
      y.append(a[0])
      counter = counter + 1
    pylab.plot(x, y, 'b')
    pylab.draw()