InfluxDB is the top-ranked time series database on DB-Engines, so it seemed like a good choice for learning about this type of system.
Time Series Databases:
A Guide To Time Series Databases
The guide linked above gives a good introduction to time series databases. These systems are optimized for ingesting large volumes of data points, each stored with a timestamp; for example, weather readings from different stations recorded at a fixed interval. Many of them are NoSQL systems and are schemaless, so new attributes can be added without changing a database schema. The data is typically treated as immutable: once it is written, there is rarely a need to go back and update or delete it.
InfluxDB Terminology:
Measurement: Similar to a table in a relational database.
Bucket: Contains one or more Measurements.
Tag: One or more attributes to identify data. For example, if we were recording weather data, we would have tags to identify where the measurements are from, like a city or a specific weather station.
Fields: Attributes for the data that we’re storing.
Point: Similar to a row of data from a relational table.
Series: All of the points that share the same measurement and the same tag values (InfluxDB also counts the field key as part of what identifies a series).
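To make the terminology concrete, here is a minimal Python sketch that groups points into series. The dictionary layout and the helper names (series_key, group_into_series) are my own for illustration and are not part of any InfluxDB library. As a simplification, the key uses only the measurement and the tags.

```python
from collections import defaultdict

# Hypothetical points: each has a measurement, tags, fields, and a timestamp.
points = [
    {"measurement": "weather", "tags": {"city": "Atlanta", "state": "GA"},
     "fields": {"temp": 80.0}, "time": 1684269089},
    {"measurement": "weather", "tags": {"city": "Atlanta", "state": "GA"},
     "fields": {"temp": 81.0}, "time": 1684269149},
    {"measurement": "weather", "tags": {"city": "Miami", "state": "FL"},
     "fields": {"temp": 85.0}, "time": 1684269089},
]


def series_key(point):
    """Identify a series by its measurement plus its sorted tag pairs."""
    return (point["measurement"], tuple(sorted(point["tags"].items())))


def group_into_series(points):
    """Collect points that share a series key into one list per series."""
    series = defaultdict(list)
    for point in points:
        series[series_key(point)].append(point)
    return dict(series)


series = group_into_series(points)
for key, members in series.items():
    print(key, len(members))
```

The two Atlanta points land in one series and the Miami point in another, because a series is defined by the tag values and not by the timestamp or the field values.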
InfluxDB Setup:
There is a Community edition that can be run on-premises. There’s also a free tier for the cloud edition, which doesn’t require a credit card to sign up for.
For accessing and working with InfluxDB, there are three options: the Influx CLI, an HTTP API, or the Influx UI (for a Cloud installation).
I’ve opted to use the cloud version and the UI to run through a basic setup. I’m using InfluxDB Cloud Serverless, Version 2.0.0.
Data:
InfluxDB – Get Started
Once I sign in, I see the Resource Center. For the first step, I’ll create a Bucket (database) by going to ‘Manage Databases & Security’, then ‘Database Manager’, ‘Go To Buckets’. Clicking ‘+ Create Bucket’ will give us a few options. We can set the bucket name (I’ll use Test as the bucket name), then set a retention period for the data (or opt to never delete the data).
Once the Bucket has been created, data can be added from the ‘+ Add Data’ button. There are a few different options, like using the API, the CLI, or uploading a CSV file. I’m selecting the ‘Line Protocol’ option, where I’ll copy data into the UI.
There is a specific format for the data, outlined on this page.
Test Data:
weather,city=Atlanta,state=GA temp=80,humidity=50 1684269089
weather,city=Miami,state=FL temp=85,humidity=59 1684269089
weather,city=Buffalo,state=NY temp=61,humidity=48 1684269089
weather,city=Seattle,state=WA temp=71,humidity=55 1684269089
For this dataset, the measurement (table) is ‘weather’, the tags (keys) are city and state, and the fields are temp (temperature) and humidity, recorded at the specified time. The last value is a Unix timestamp. The whitespace is significant: it separates the tags, the fields, and the timestamp.
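To show how the whitespace and commas divide a record, here is a small Python sketch that splits one line of the test data into its parts. The parse_line function is a hypothetical helper, not an InfluxDB API, and it is deliberately simplified: it assumes there are no escaped spaces or commas and no quoted string fields, which holds for the test data but not for line protocol in general.

```python
def parse_line(line):
    """Split one line-protocol record into measurement, tags, fields, time.

    Simplified sketch: no escaped characters, no quoted string fields,
    and every field value is treated as a number.
    """
    # The two spaces separate the three sections of the record.
    head, field_part, timestamp = line.split(" ")

    # The first comma-separated item is the measurement; the rest are tags.
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)

    # Fields are comma-separated key=value pairs.
    fields = {}
    for pair in field_part.split(","):
        key, value = pair.split("=", 1)
        fields[key] = float(value)

    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "time": int(timestamp),
    }


point = parse_line("weather,city=Atlanta,state=GA temp=80,humidity=50 1684269089")
print(point)
```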
On the Line Protocol page, I’ll select the ‘Enter Manually’ option and paste in the test data. Since I’m using Unix timestamps in seconds, I’ll need to set the Precision to Seconds before uploading (I didn’t do this at first, and got an error). Clicking ‘Write Data’ will store our test data.
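The precision setting matters because line protocol timestamps default to nanoseconds. The Python sketch below reads the same number both ways. It only illustrates the arithmetic, and I’m not claiming this is exactly what caused the error I saw. The to_nanoseconds helper is a hypothetical name of my own.

```python
from datetime import datetime, timezone

ts = 1684269089  # timestamp from the test data, in seconds

# Read as seconds, the value is a date in 2023.
as_seconds = datetime.fromtimestamp(ts, tz=timezone.utc)

# Read as nanoseconds, the same number is under two seconds after the epoch.
as_nanoseconds = datetime.fromtimestamp(ts / 1_000_000_000, tz=timezone.utc)

print(as_seconds.isoformat())
print(as_nanoseconds.isoformat())


def to_nanoseconds(seconds):
    """Convert a second-precision timestamp to nanosecond precision."""
    return seconds * 1_000_000_000
```

So the alternative to changing the Precision setting would be to convert each timestamp to nanoseconds before pasting the data in.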
There’s also an ‘Add Data’ option from the main Resource Center page.
Now I want to view the data I just added. Back on the main Resource Center page, I select ‘Query Data’ to see the available options. I’ll click ‘Go To Data Explorer’ to get to the query console.
The Data Explorer will allow us to write SQL queries to see our data. First select the correct Bucket (Test) and Measurement (weather) under ‘Schema Browser’. We can write a simple select to see our data:
SELECT * FROM weather;
We can add a WHERE clause to only see specific records. The default for string comparisons seems to be case-sensitive.
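As a sketch of what such a filter looks like, the Python below builds the query text for one city and then mimics a case-sensitive comparison on the test cities. The weather_query helper is hypothetical, and the output is only meant to be pasted into the Data Explorer by hand, not used as a safe way to build queries from untrusted input.

```python
def weather_query(city):
    """Build a SELECT that filters the weather measurement on the city tag."""
    # Double any single quote so it stays inside the SQL string literal.
    escaped = city.replace("'", "''")
    return f"SELECT * FROM weather WHERE city = '{escaped}'"


print(weather_query("Miami"))

# Case-sensitive matching, mimicked in plain Python on the test cities.
cities = ["Atlanta", "Miami", "Buffalo", "Seattle"]
matches_exact = [c for c in cities if c == "Miami"]
matches_lower = [c for c in cities if c == "miami"]
print(matches_exact, matches_lower)
```

With a case-sensitive comparison, filtering on 'miami' returns nothing even though 'Miami' is in the data.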
Next Steps and Conclusion:
I’ve worked through a simple scenario for a basic setup. Obviously, a real use case would be more complex.
Influx provides an agent called Telegraf for writing data to the database. It has a long list of plug-ins that can connect a data source to Influx.
There’s also an API that can be used to insert and extract data. An API Token can be generated in the UI in order to use the APIs.
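Here is a rough Python sketch of what a write through the HTTP API could look like, using only the standard library. It assumes the InfluxDB v2 write endpoint (/api/v2/write) with token authentication. The host, organization name, and token are placeholders, and build_write_request is a hypothetical helper. The request is only built here, not sent, so check the API documentation for your own instance before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(host, org, bucket, token, lines, precision="s"):
    """Build (but do not send) a POST request that writes line protocol."""
    query = urlencode({"org": org, "bucket": bucket, "precision": precision})
    return Request(
        url=f"{host}/api/v2/write?{query}",
        data="\n".join(lines).encode("utf-8"),  # one record per line
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


request = build_write_request(
    host="https://example.invalid",  # placeholder, not a real InfluxDB host
    org="my-org",                    # placeholder organization name
    bucket="Test",
    token="MY_API_TOKEN",            # placeholder for a token from the UI
    lines=["weather,city=Atlanta,state=GA temp=80,humidity=50 1684269089"],
)
print(request.get_method(), request.full_url)

# To actually send it, pass the request to urllib.request.urlopen(request).
```

Passing precision="s" here plays the same role as setting the Precision to Seconds in the UI.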
Influx is schemaless, but it does use data types for each attribute. When I created the Test bucket, I didn’t define a schema. The default is an implicit schema, where the data type for each attribute is set based on the initial set of data. It is also possible to create an explicit, user-defined schema.
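The idea of an implicit schema can be sketched in Python: the first write fixes each field’s type, and a later write with a different type conflicts. The type rules follow line protocol as I understand it (a plain number is a float, an i suffix means integer, a u suffix means unsigned integer, double quotes mean string). The functions infer_field_type and check_against_schema are hypothetical and only illustrate the behavior, not how InfluxDB implements it.

```python
import re

BOOLEANS = {"t", "T", "true", "True", "TRUE", "f", "F", "false", "False", "FALSE"}


def infer_field_type(raw):
    """Guess the type of a raw line-protocol field value."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return "string"
    if raw in BOOLEANS:
        return "boolean"
    if re.fullmatch(r"-?\d+i", raw):
        return "integer"
    if re.fullmatch(r"\d+u", raw):
        return "unsigned integer"
    float(raw)  # raises ValueError if the value is not numeric
    return "float"


def check_against_schema(schema, fields):
    """Record types for new fields and return the keys that conflict."""
    conflicts = []
    for key, raw in fields.items():
        inferred = infer_field_type(raw)
        known = schema.setdefault(key, inferred)  # first write sets the type
        if known != inferred:
            conflicts.append(key)
    return conflicts


schema = {}
first = check_against_schema(schema, {"temp": "80", "humidity": "50"})
second = check_against_schema(schema, {"temp": '"warm"'})
print(schema, first, second)
```

In this sketch the test data makes temp and humidity floats, so a later attempt to write temp as a string is flagged as a conflict.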