2.2. Complex history database

Databases do not only store data, they are also storing relationships. We will use this feature to store the change history of our csv files into a database.

This time we will modify our CsvDocumentPlugin to store its history data into a database instead of a dictionary, which would get destroyed with each application exit.

2.2.1. Configuration preparation

Add the following config parameters to the configuration.py file of your application:

HISTORY_DATABASE_NAME = "HISTORY_DB"
HISTORY_DATABASE_DESCRIPTION = "DB for CSV history"
HISTORY_DATABASE_LOCATION = "%s/history_db.db" % APP_PATH
HISTORY_DATABASE_CONNECTION = "sqlite:///%s" % HISTORY_DATABASE_LOCATION

Nothing new happened here. We only use a different file for our database.

2.2.2. Model relationships

This time we need 4 database model classes. And it’s a good idea to store the class definitions inside a new python file.

So create models.py in the folder csv_manager/plugins/csv_document_plugin and add the following content:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from sqlalchemy import Column, ForeignKey, Integer, String, PickleType
from sqlalchemy.orm import relationship


def get_models(db):

    Base = db.Base

    class CsvFile(Base):
        __tablename__ = 'csv_file'

        id = Column(Integer, primary_key=True)
        name = Column(String(2048), nullable=False)
        created = Column(String(255))
        current_version = Column(Integer)

        # version_id = Column(ForeignKey(u'version.id'))
        # version = relationship(u'Version')

        def __str__(self):
            return str(self.name)

    class Version(Base):
        __tablename__ = 'version'

        id = Column(Integer, primary_key=True)
        version = Column(Integer, nullable=False)
        created = Column(String(255))
        csv_file_id = Column(Integer, ForeignKey('csv_file.id'))
        csv_file = relationship("CsvFile", backref="version")
        new_row = relationship("NewRow", back_populates="version", cascade="all, delete-orphan")
        missing_row = relationship("MissingRow", back_populates="version", cascade="all, delete-orphan")

        def __str__(self):
            return str(self.version)

    class MissingRow(Base):
        __tablename__ = 'missing_row'

        id = Column(Integer, primary_key=True)
        row = Column(PickleType, nullable=False)
        version_id = Column(Integer, ForeignKey('version.id'))
        version = relationship("Version", back_populates="missing_row")

        def __str__(self):
            return str(self.row)

    class NewRow(Base):
        __tablename__ = 'new_row'

        id = Column(Integer, primary_key=True)
        row = Column(PickleType, nullable=False)
        version_id = Column(Integer, ForeignKey('version.id'))
        version = relationship("Version", back_populates="new_row")

        def __str__(self):
            return str(self.row)

    return CsvFile, Version, MissingRow, NewRow

We put our classes into a function, because we need to provide the db-object [5]. This object will be created during application runtime and therefore it is not available during pythons instantiation phase.

Let’s take a look to the relationship between a csv_file and its versions [29+30]. As you can see, the relationship is defined only on the Version class.

First we a have created a value called csv_file_id, which shall store the unique ID of the linked csv_file. Then we have defined the relationship for the value csv_file [30]. If we request this value later, we will directly get the linked csv_file instance.

But we want also be able to access all linked versions from a csv_file instance, therefore we set the option backref. This options sets a new value into the csv_file, so that all versions can be accessed later via csv_file.version.

The final database model relationship layout is:

@startuml
skinparam backgroundColor transparent
skinparam shadowing false

class "CsvFile" as cf
class "Version"  as v
class "MissingRow" as m
class "NewRow" as n

cf <-- v
v <-- m
v <-- n

@enduml

Note

For a deeper look into SQLAlchemy relationships, please visit the related documentation at http://docs.sqlalchemy.org/en/latest/orm/basic_relationships.html

2.2.3. Storing the history

Let’s take a look into the final plugin code of your csv_document_plugin.py:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
from _datetime import datetime

from groundwork.patterns import GwDocumentsPattern
from groundwork_database.patterns import GwSqlPattern
from csv_manager.patterns import CsvWatcherPattern

from .models import get_models


class CsvDocumentPlugin(CsvWatcherPattern, GwDocumentsPattern, GwSqlPattern):
    def __init__(self, app, **kwargs):
        self.name = self.__class__.__name__
        super().__init__(app, **kwargs)
        self.archive = {}
        self.db = None
        self.CsvFile = None
        self.Version = None
        self.MissingRow = None
        self.NewRow = None

    def activate(self):
        this_dir = os.path.dirname(__file__)
        content_path = os.path.join(this_dir, 'csv_document_content.rst')
        with open(content_path, 'r') as doc_content:
            self.documents.register(name="CsvDocument",
                                    content=doc_content.read(),
                                    description="Stores pass csv watcher activities")

        self.signals.connect("csv_archive_receiver", "csv_watcher_change",
                             self._archive_csv_change, "listen to changes to archive them.")

        self.db = self.databases.register(self.app.config.get("HISTORY_DATABASE_NAME", "csv_history"),
                                          self.app.config.get("HISTORY_DATABASE_CONNECTION", "sqlite://"),
                                          self.app.config.get("HISTORY_DATABASE_DESCRIPTION", "Stores csv history"))
        self.CsvFile, self.Version, self.MissingRow, self.NewRow = get_models(self.db)
        self.db.classes.register(self.CsvFile)
        self.db.classes.register(self.Version)
        self.db.classes.register(self.MissingRow)
        self.db.classes.register(self.NewRow)
        self.db.create_all()

    def _archive_csv_change(self, plugin, **kwargs):
        csv_file = kwargs.get("csv_file", None)
        new_rows = kwargs.get("new_rows", None)
        missing_rows = kwargs.get("missing_rows", None)

        if csv_file is not None:

            # Csc file
            csv_file_object = self.CsvFile.query.filter_by(name=csv_file).first()
            if csv_file_object is None:
                csv_file_object = self.CsvFile(name=csv_file,
                                               created=datetime.now(),
                                               current_version=0)

            csv_file_object.current_version += 1
            self.db.add(csv_file_object)

            # Version
            version_object = self.Version(version=csv_file_object.current_version,
                                          created=datetime.now(),
                                          csv_file=csv_file_object)
            self.db.add(version_object)

            # Missing rows
            for missing_row in missing_rows:
                missing_row_object = self.MissingRow(row=missing_row, version=version_object)
                self.db.add(missing_row_object)

            # New rows
            for new_row in new_rows:
                new_row_object = self.NewRow(row=new_row, version=version_object)
                self.db.add(new_row_object)

            self.db.commit()

            self.log.debug("Change %s archived for %s" % (csv_file_object.current_version, csv_file_object.name))
            self.db.session.remove()

    def get_csv_history(self):
        self.db.session.remove()
        return self.db.query(self.CsvFile).all()

    def deactivate(self):
        pass

Inside the activate() function you will find the needed database setup and the database model registration [33-41].

The history handling is now completely done inside our database [43-79].

In line 61 you can see an example of how we can set up a relationship between a version and a csv_file. All we have to do is to use the queried csv_file instance as input for the parameter csv_file of the Version class [63].

2.2.4. Retrieving history data

We use our registered document to present stored history data. For this we must be able to retrieve all csv_file objects from our database inside jinja executions. For this case we have created the function get_csv_history(), which will be available inside jinja, because all plugin variables und functions are available inside documents.

Update the file csv_document_content.rst to get an idea of how it works:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
CSV Watcher Documentation
=========================

Registered csv watchers: {{ plugin.app.csv_watcher.get()|length}}

Files and versions
------------------
{% for csv_file in plugin.get_csv_history() %}
    {{csv_file.name}}
    {{"~"*csv_file.name|length}}
    {% for version in csv_file.version %}
        **{{version.version}}**

        Missing rows
        ++++++++++++
        {% for missing_row in version.missing_row -%}
            {{missing_row.row}}
        {% endfor -%}

        New rows
        ++++++++
        {% for new_row in version.new_row -%}
            {{new_row.row}}
        {% endfor -%}

    {% endfor %}
{% endfor %}

At that’s the end of our groundwork database related chapter.

On the next chapter groundwork and the web we will use the possibility to create, read, update and delete (CRUD) our database tables inside a web application with one additional line only. And we will design our own web view to present our just collected history data inside a browser.