Python pandas is very powerful data processing library. It can take large data from variety of sources (CSVs, databases, excel etc) and process it in memory.
After processing the data you may have to dump it in influxdb. In influxdb each measurement has tags and fields. While writing pandas dataframe to influxdb we can define these tags.
We can use specific columns of data frame as tags along with some default tags. Let us understand this with a simple implementation.
Consider following CSV, sample.csv, having schools data of a City, State, country. Which means city, state and country are fixed tag and additionally we will make “School”, “Class” as dataframe row tags.
TimeStamp,School,Class,Students,Teachers,Absent
5/9/20 5:45:00 PM GMT,USSMS,UKG,30,2,3
5/9/20 5:45:00 PM GMT,USSMS,10,50,3,2
5/9/20 5:45:00 PM GMT,USSMS,8,40,2,0
5/9/20 6:45:00 PM GMT,St Thomas,Nur,30,2,4
5/9/20 6:45:00 PM GMT,St Thomas,6,55,3,4
5/9/20 7:45:00 PM GMT,St Thomas,9,34,2,2
Now let us look at the code, pyinflx.py.
#!/usr/bin/python3
import pandas as pd
import json
from influxdb import DataFrameClient
dbhost = 'localhost'
dbport = 8086
dbuser = 'admin'
dbpasswd = 'xxxxx'
dbname = 'schooldb'
protocol = 'line'
# Use only following fields from CSV.
Fields = ['TimeStamp','School','Class','Students','Absent']
# Define tag fields
datatags = ['School','Class']
# Define fixed tags
fixtags = {"Country": "India", "State": "Haryana", "City": "Kurukshetra"}
# Read data from CSV without index and parse 'TimeStamp' as date.
df = pd.read_csv("sample.csv", sep=',', index_col=False, parse_dates=['TimeStamp'], usecols=Fields)
# Set 'TimeStamp' field as index of dataframe
df.set_index('TimeStamp', inplace = True)
print(df.head())
client = DataFrameClient(dbhost, dbport, dbuser, dbpasswd, dbname)
# Write data to "SchoolData" measurement of "schooldb" database.
client.write_points(df,"SchoolData",tags=fixtags,tag_columns=datatags,protocol=protocol)
Fire the code !
./pyinflx.py
School Class Students Absent
TimeStamp
2020-05-09 17:45:00+00:00 USSMS UKG 30 3
2020-05-09 17:45:00+00:00 USSMS 10 50 2
2020-05-09 17:45:00+00:00 USSMS 8 40 0
2020-05-09 18:45:00+00:00 St Thomas Nur 30 4
2020-05-09 18:45:00+00:00 St Thomas 6 55 4
Check data in influxdb
> select * from SchoolData
name: SchoolData
time Absent City Class Country School State Students
---- ------ ---- ----- ------- ------ ----- --------
2020-05-09T17:45:00Z 2 Kurukshetra 10 India USSMS Haryana 50
2020-05-09T17:45:00Z 3 Kurukshetra UKG India USSMS Haryana 30
2020-05-09T17:45:00Z 0 Kurukshetra 8 India USSMS Haryana 40
2020-05-09T18:45:00Z 4 Kurukshetra 6 India St Thomas Haryana 55
2020-05-09T18:45:00Z 4 Kurukshetra Nur India St Thomas Haryana 30
2020-05-09T19:45:00Z 2 Kurukshetra 9 India St Thomas Haryana 34
Check tags in influxdb
> show tag keys
name: SchoolData
tagKey
------
City
Class
Country
School
State