Data Journal is the latest iteration of my long-running quantified self project (the whole Life Tracker/Personal Data Warehouse). It’s not technically a total rewrite of my previous project, but it’s pretty darn close.

TL;DR

  • Data Journal is a system for capturing, maintaining, & using event-based data
  • Data Journal is a set of data shapes (i.e. regular objects) & tools for operating on them
  • A DataJournal is comprised of arrays of Entrys and Defs, where:
    • A Def is a definition of a known key/value pair that may exist on Entrys
    • An Entry is an event that happened in a given time period, containing zero-to-many key/value pairs described by Defs
  • DataJournals can be:
    • merged together without duplication
    • modified (via transactions)
    • queried (via queries)
  • The PDW (Personal Data Warehouse) is a system for managing multiple DataJournals, stored across disparate databases and/or files

A few more details:

  • The Data Journal code is not class-oriented.
    • All data shapes can be serialized to and parsed from JSON without data loss
    • No “instances of class” are required; everything is based on regular objects
    • Classes are used as namespaces for related functions
  • A DataJournal is comprised of an array of Entrys and an array of Defs
    • Metadata properties of elements (i.e. Entrys and Defs) start with an underscore
  • A Def defines known key/value pairs that may exist on Entrys
    • A Def must contain _id, _updated, and _type keys
    • Defs may have other keys as well
    • Def._id values cannot start with an underscore
  • An Entry is a record of something that happened at some point in time
    • An Entry must contain _id, _updated, and _period keys
    • An Entry may have other keys as well
    • An Entry typically contains one or many entry “points”, which have an associated Def
      • An entry point is a key/value pair on an Entry whose key is a Def._id
  • Merging two or more DataJournals keeps only one copy of each element, based on its _id. Where multiple copies exist, only the one with the largest _updated value is kept (i.e. the newest one)
  • Data Journals may be written to (via transaction) or read from (via query) using regular objects
    • A Transaction may update elements via create, replace, modify, or delete
      • create
        • will always create & not look for existing data with the same .
      • replace
        • if the _id is not in the DataJournal, will create it
        • if the existing _id in the journal is older, will fully replace it
        • if the existing _id in the journal is newer, will not affect it at all
      • modify
        • if the _id is not in the DataJournal, will create it
        • if the existing _id is older, it will retain any keys not explicitly overwritten by the modification
        • if the existing _id is newer, will not affect it at all
      • delete
        • will always mark the element with the matching _id as _deleted = true
    • A Query is an object full of Entry-filtering parameters
  • Other utility classes that operate on DataJournals do exist, but they are decoupled from each other, and the Data Journal code does not depend on them.
    • Examples: Summarizer, Validator, Overviewer, Aliaser, and a host of Translators and Connectors which allow for reading/writing from static files and databases, respectively
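
To make the shapes above concrete, here is a minimal TypeScript sketch. It is not the project's actual code: the type names, the defs and entries property names, the timestamp and period formats, and the helpers isValidDefId and entryPoints are all assumptions made for illustration. Only the underscore-prefixed keys and the rules about them come from the list above.

```typescript
// Type names, the defs/entries property names, and the timestamp and period
// formats are assumptions for illustration. Only the underscore-prefixed
// keys (_id, _updated, _type, _period) come from the list above.
type Def = {
  _id: string; // must not start with an underscore
  _updated: string; // assumed: ISO 8601 timestamp
  _type: string; // assumed: a label for the kind of value
  [key: string]: unknown; // Defs may have other keys as well
};

type Entry = {
  _id: string;
  _updated: string;
  _period: string; // assumed: a string naming the time period
  [key: string]: unknown; // entry points and any other keys
};

type DataJournal = { defs: Def[]; entries: Entry[] };

// Def._id values cannot start with an underscore.
function isValidDefId(id: string): boolean {
  return id.charAt(0) !== "_";
}

// Collect the entry points: key/value pairs whose key is a known Def._id.
function entryPoints(journal: DataJournal, entry: Entry): Record<string, unknown> {
  const defIds = journal.defs.map((def) => def._id);
  const points: Record<string, unknown> = {};
  for (const key of Object.keys(entry)) {
    if (defIds.indexOf(key) !== -1) points[key] = entry[key];
  }
  return points;
}

// Made-up example data: one Def and one Entry that uses it.
const coffeeEntry: Entry = {
  _id: "entry-1",
  _updated: "2024-01-02T08:00:00Z",
  _period: "2024-01-02",
  coffee_cups: 2, // an entry point, because "coffee_cups" is a Def._id
  note: "no Def describes this key",
};

const exampleJournal: DataJournal = {
  defs: [{ _id: "coffee_cups", _updated: "2024-01-01T00:00:00Z", _type: "number" }],
  entries: [coffeeEntry],
};

// Plain objects survive a JSON round trip without any class instances.
const roundTripped: DataJournal = JSON.parse(JSON.stringify(exampleJournal));
```

Because everything here is a regular object, the round-tripped journal can be passed straight back into entryPoints or any other function with no rehydration step.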

Main Concepts via Picture

Data Shapes

Main Processes
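
The merge and transaction rules described earlier can be sketched in TypeScript. This is a rough illustration under stated assumptions, not the project's real implementation: JournalElement, Transaction, mergeElements, and applyTransaction are hypothetical names, _updated is assumed to be a timestamp string that sorts chronologically, and two copies with equal _updated values are treated as "not older" (the source does not say what happens in a tie).

```typescript
// Hypothetical minimal element shape shared by Entrys and Defs.
// Assumes _updated is a string that sorts chronologically (e.g. ISO 8601 UTC).
type JournalElement = {
  _id: string;
  _updated: string;
  _deleted?: boolean;
  [key: string]: unknown;
};

// Hypothetical transaction shape: one optional array per operation.
type Transaction = {
  create?: JournalElement[];
  replace?: JournalElement[];
  modify?: JournalElement[];
  delete?: string[]; // _id values to mark as deleted
};

// Merge any number of element lists, keeping only the newest copy of each _id.
function mergeElements(...lists: JournalElement[][]): JournalElement[] {
  const newest = new Map<string, JournalElement>();
  for (const list of lists) {
    for (const element of list) {
      const existing = newest.get(element._id);
      if (existing === undefined || element._updated > existing._updated) {
        newest.set(element._id, element);
      }
    }
  }
  return Array.from(newest.values());
}

// Apply a transaction and return a new list; the input list is not mutated.
function applyTransaction(elements: JournalElement[], trx: Transaction): JournalElement[] {
  const result = elements.map((element) => ({ ...element }));
  const indexOf = (id: string) => result.findIndex((element) => element._id === id);

  // create: always adds, without looking for existing data.
  for (const element of trx.create ?? []) result.push({ ...element });

  // replace: add if missing, fully overwrite if the existing copy is older.
  for (const element of trx.replace ?? []) {
    const i = indexOf(element._id);
    const existing = i === -1 ? undefined : result[i];
    if (existing === undefined) result.push({ ...element });
    else if (existing._updated < element._updated) result[i] = { ...element };
  }

  // modify: add if missing, otherwise overlay keys onto an older copy.
  for (const element of trx.modify ?? []) {
    const i = indexOf(element._id);
    const existing = i === -1 ? undefined : result[i];
    if (existing === undefined) result.push({ ...element });
    else if (existing._updated < element._updated) result[i] = { ...existing, ...element };
  }

  // delete: mark the first element with a matching _id as deleted (no removal).
  for (const id of trx.delete ?? []) {
    const i = indexOf(id);
    const existing = i === -1 ? undefined : result[i];
    if (existing !== undefined) result[i] = { ...existing, _deleted: true };
  }
  return result;
}
```

Returning a new array keeps the sketch easy to reason about and test. In this sketch a delete only sets the _deleted flag and leaves _updated alone, which is a choice made here rather than something the rules above specify.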

Connectors, Translators, & Utilities