Storing device parameter data for a measurement system

In summary, the new software will let us measure more devices simultaneously, keep track of what we are doing, and store data for each device for long-term use. I can't work out the best way to store the data, and I am worried about how to structure it and how to preserve its history.
  • #1
f95toli
TL;DR Summary
How to best organise data?
We are working on new software (written in Python) for one of our measurement systems. The new system can measure more devices simultaneously than before, so keeping track of what we are doing is important; it will also be more automated.
One of the things we want to implement is better handling of device parameter data.
This can mean parameters for one of the devices-under-test (DUTs) as well as for some of the components used in the measurement chain (which need to be re-calibrated once a day or so).

My problem is that I can't figure out what the best way to handle the data is.
  • The number of devices is relatively small (maybe 20-40) at any given time but we will of course change DUTs between runs and we need a way to keep track of data for each DUT
  • Each device will probably have 5-10 parameters. Mostly floating point numbers but it would be nice to have the option to also save arrays.
  • The values of the parameters for each device will change over time and will then need to be updated, either because we measure them for the first time or because they are re-calibrated. We always need to know the "active" parameter, but being able to also store the history would be a big plus. For the components in the measurement chain it would also be nice to be able to look at trends over long times (months)
  • Long-term storage is very important; we will need to be able to find the data that belongs to a specific device in a few years.
  • The data must be in a format that is easy to read and understand
  • The software is in reality a number of Python scripts, we will never run more than one script at a time but it is important that reading/writing data is straightforward.
Does anyone have any experience of implementing something similar? The three options I have considered are
  • Just saving the data for each device in a unique file in a hierarchical file structure. The file format would be e.g. JSON or XML. Drawback: There is no obvious way to store history, the structure could end up being very messy.
  • The same as above but save data in HDF5 files (a format we already use); by saving parameters in arrays (updated vs time) to give us the ability to track history.
  • Use a database. We probably don't need a fully featured database but could use say sqlite. This looks quite good on paper, but saving all data in a single file makes me nervous. It would also make it harder for users to find and extract data unless they were very familiar with the database
Any other suggestions?
 
  • #2
First off, if these devices do not already have serial numbers, you need to assign serial numbers to them. And if the procedure is to be automated, you need an automated way of reading those serial numbers from the device. If the device includes a data processing unit, or you can gain read access to a non-volatile memory component, then those would be good places to keep the serial number. Otherwise, you should physically label the devices - perhaps with a bar code - so that the serial number can be read automatically.

In the simplest case, it sounds like your basic data base structure will be one to three data sets.
I will describe the three-data-set version - except for an array it is fully normalized:

The first data set will use the serial number as the key. Each record will include all invariant data describing a specific DUT.
The second data set will use a "calibration ID" as the key. Each record will include all data specific to a calibration event. This will include who performed the calibration, the date and time of the calibration, and the serial numbers of the units tested/calibrated (this is the array I mentioned above).
The third data set will use a combined serial number and calibration ID as key. Each record will include the calibration data and test data that resulted from that calibration run against that DUT.

For efficiency, in the first data set, I would also include calibration ID of the last calibration event run against this unit (another de-normalization).

Since you mentioned JSON, I will provide a JSON example.
The following code would be exported from your data base into a *.js file.
Say the file name is "DUTData.js".
Code:
DeviceTestData=JSON.parse( '{\
  "DB":{"name":"DUT_Database","time":"202110011300","lastCalibID":"C004"},\
  "DUTs":{\
     "Dev0023":{"color":"brown","weight":7.3,"lastCalibID":"C004"},\
     "Dev0027":{"color":"green","weight":7.3,"lastCalibID":"C004"},\
     "Dev0028":{"color":"yellow","weight":7.3,"lastCalibID":"C004"},\
     "Dev0033":{"color":"red","weight":7.3,"lastCalibID":"C002"},\
     "Dev0045":{"color":"gray","weight":7.3,"lastCalibID":"C002"},\
     "Dev0088":{"color":"brown","weight":7.3,"lastCalibID":"C002"},\
     "Dev0133":{"color":"red","weight":7.3,"lastCalibID":"C004"},\
     "Dev0138":{"color":"silver","weight":7.3,"lastCalibID":"C004"} },\
   "Calibs":{\
     "C001":{"Op":"Gary","time":"202012132359","DUTCount":3,"DUTs":\
       ["Dev0023","Dev0027","Dev0028"]},\
     "C002":{"Op":"Lisa","time":"202012171200","DUTCount":3,"DUTs":\
       ["Dev0033","Dev0045","Dev0088"]},\
     "C003":{"Op":"Lisa","time":"202101211400","DUTCount":2,"DUTs":\
       ["Dev0133","Dev0138"]},\
     "C004":{"Op":"Jane","time":"202109251500","DUTCount":5,"DUTs":\
       ["Dev0023","Dev0027","Dev0028","Dev0133","Dev0138"]}\
   },\
   "TestResults":{\
     "Dev0023-C001":{"bounce":12.3,"squash":-7.7,"shift":3.4},\
     "Dev0023-C004":{"bounce":12.2,"squash":-1.1,"shift":3.4},\
     "Dev0027-C001":{"bounce":11.1,"squash":2.9,"shift":4.5},\
     "Dev0027-C004":{"bounce":11.2,"squash":2.9,"shift":4.4}}\
}')
 
  • #3
My thoughts:
f95toli said:
  • The number of devices is relatively small (maybe 20-40) at any given time but we will of course change DUTs between runs and we need a way to keep track of data for each DUT
  • Each device will probably have 5-10 parameters. Mostly floating point numbers but it would be nice to have the option to also save arrays.
It doesn't sound like a lot of data - 40 devices with 10 parameters calibrated once a day for 10 years is 1.5 million numbers - so reading and writing the whole dataset to disk each run (serialised as JSON) is probably feasible, although probably not the best solution.
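
The estimate above is easy to check. A quick sketch using the upper figures from the thread (40 devices, 10 parameters, one calibration a day for 10 years); the 40 bytes per stored value is purely an assumption for illustration, roughly a timestamp plus a number written as JSON text:

```python
# Back-of-envelope check of the dataset size quoted above.
DEVICES = 40
PARAMS_PER_DEVICE = 10
CALIBRATIONS_PER_DAY = 1
DAYS_PER_YEAR = 365
YEARS = 10

# Assumption (not from the thread): about 40 bytes per
# [timestamp, value] pair once serialised as JSON text.
BYTES_PER_VALUE = 40


def total_values() -> int:
    """Number of stored parameter values over the whole period."""
    return DEVICES * PARAMS_PER_DEVICE * CALIBRATIONS_PER_DAY * DAYS_PER_YEAR * YEARS


def approx_megabytes() -> float:
    """Rough size of the serialised dataset under the assumption above."""
    return total_values() * BYTES_PER_VALUE / 1e6


print(total_values())      # 1460000, i.e. about 1.5 million
print(approx_megabytes())  # 58.4
```

Even with a generous per-value size the whole history stays in the tens of megabytes, which is consistent with the "load the lot into memory" approach.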

f95toli said:
  • The data must be in a format that is easy to read and understand
Be careful with this requirement: with human-readable files there is a conflict between the ease of reading any individual file and the ease of understanding the relationships between the files that form the whole dataset. A .sqlite file, by contrast, is almost indecipherable on its own, but armed with a SQLite client the structure, as well as the detail, becomes immediately apparent (and a .sql data dump is easily available as a human-readable backup file).
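
As an illustration of the human-readable dump mentioned above, Python's standard sqlite3 module can produce one without any external client. A minimal sketch; the table and column names are made up for the example:

```python
import sqlite3


def dump_sql(conn: sqlite3.Connection) -> str:
    """Return the whole database as human-readable SQL statements."""
    return "\n".join(conn.iterdump())


# An in-memory database keeps the example self-contained;
# a real script would open a file on disk instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device (serial TEXT PRIMARY KEY, color TEXT)")
conn.execute("INSERT INTO device VALUES ('Dev0023', 'brown')")
conn.commit()

sql_text = dump_sql(conn)
print(sql_text)
```

The resulting text contains the CREATE TABLE and INSERT statements, so it doubles as documentation of the structure and as a plain-text backup.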

So to your options:

f95toli said:
  • Just saving the data for each device in a unique file in a hierarchical file structure. The file format would be e.g. JSON or XML. Drawback: There is no obvious way to store history, the structure could end up being very messy.
Yes, and messy means difficult to back up, as well as to understand. I would recommend structuring this as a single JSON file for each device, storing the whole history for each parameter (as lists of (ISO timestamp, value) tuples).
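
A minimal sketch of that layout in Python, assuming one dictionary per device and a list of [ISO timestamp, value] pairs per parameter (JSON has no tuple type, so the pairs become two-element lists). The function and field names are hypothetical:

```python
import json
from datetime import datetime, timezone
from typing import Optional


def new_device(name: str) -> dict:
    """Empty record for one device (field names are illustrative)."""
    return {"name": name, "parameters": {}}


def record_value(device: dict, param: str, value,
                 when: Optional[datetime] = None) -> None:
    """Append a [timestamp, value] pair; entries are assumed to arrive in time order."""
    when = when or datetime.now(timezone.utc)
    device["parameters"].setdefault(param, []).append([when.isoformat(), value])


def active_value(device: dict, param: str):
    """The most recently recorded value is the active one."""
    return device["parameters"][param][-1][1]


dev = new_device("Dev0023")
record_value(dev, "bounce", 12.3, datetime(2020, 12, 13, 23, 59, tzinfo=timezone.utc))
record_value(dev, "bounce", 12.2, datetime(2021, 9, 25, 15, 0, tzinfo=timezone.utc))

text = json.dumps(dev, indent=2)  # what would be written to the device's file
restored = json.loads(text)       # what a later script would read back
print(active_value(restored, "bounce"))
```

Because a value can be any JSON-serialisable object, the same list can hold arrays as well as single floating point numbers.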

f95toli said:
  • The same as above but save data in HDF5 files (a format we already use); by saving parameters in arrays (updated vs time) to give us the ability to track history.
I cannot see any advantage in this over a universal standard such as JSON.

f95toli said:
  • Use a database. We probably don't need a fully featured database but could use say sqlite. This looks quite good on paper, but saving all data in a single file makes me nervous. It would also make it harder for users to find and extract data unless they were very familiar with the database
A single file should not make you nervous - it is much easier to back up than a hierarchical structure. And as already mentioned, a well structured and commented SQL database is much easier to understand than a haphazard collection of flat files with undocumented (and possibly broken) references.

I would probably go for a pretty-printed JSON file for each device, rewritten in its entirety after each change. For backup you could store these in a Git repository and push changes to a remote backup (e.g. a free private repo on GitHub and/or Bitbucket and/or GitLab). Edit: you can document the data structure in a couple of Markdown files in the same repo.
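
One way to do the "rewritten in its entirety" step safely is to write a temporary file and swap it into place, so a crash mid-write cannot leave a truncated file. This is a sketch of that general technique, not something proposed in the thread; the "id" field and the function names are assumptions:

```python
import json
import os
import tempfile
from pathlib import Path


def save_device(device: dict, directory: Path) -> Path:
    """Rewrite the device's JSON file in full, replacing the old one atomically."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{device['id']}.json"
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # indent and sort_keys keep the file readable and the Git diffs small.
        json.dump(device, handle, indent=2, sort_keys=True)
        handle.write("\n")
    os.replace(tmp_name, target)
    return target


def load_device(path: Path) -> dict:
    """Read one device file back into a dictionary."""
    return json.loads(path.read_text(encoding="utf-8"))
```

A script would call save_device(dev, Path("devices")) after every change and commit the directory to the repository whenever convenient.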

I don't see the benefit in an RDBMS including a fully normalized "calibrations" table of deviceId, measurementId, timestamp, value, unless you wanted to compare calibrations across subsets of a large number of devices; with only 40, or even 400, devices you can just load the lot into memory as I said before.
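
For comparison, here is roughly what that normalized table could look like in SQLite. This is only a sketch: it reads measurementId as the name of the parameter, which is an assumption, and the column and function names are invented for the example:

```python
import sqlite3

SCHEMA = """
CREATE TABLE calibration (
    device_id      TEXT NOT NULL,
    measurement_id TEXT NOT NULL,  -- assumed to name the parameter, e.g. 'bounce'
    timestamp      TEXT NOT NULL,  -- ISO 8601 in UTC, so text order is time order
    value          REAL NOT NULL,
    PRIMARY KEY (device_id, measurement_id, timestamp)
);
"""


def active_values(conn: sqlite3.Connection, device_id: str) -> dict:
    """Latest value of every parameter for one device."""
    rows = conn.execute(
        """
        SELECT c.measurement_id, c.value
        FROM calibration AS c
        WHERE c.device_id = ?
          AND c.timestamp = (
              SELECT MAX(timestamp) FROM calibration
              WHERE device_id = c.device_id
                AND measurement_id = c.measurement_id
          )
        """,
        (device_id,),
    )
    return dict(rows.fetchall())


def history(conn: sqlite3.Connection, device_id: str, measurement_id: str) -> list:
    """All (timestamp, value) rows for one parameter, oldest first."""
    rows = conn.execute(
        "SELECT timestamp, value FROM calibration "
        "WHERE device_id = ? AND measurement_id = ? ORDER BY timestamp",
        (device_id, measurement_id),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")  # a real script would open a file on disk
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO calibration VALUES (?, ?, ?, ?)",
    [
        ("Dev0023", "bounce", "2020-12-13T23:59Z", 12.3),
        ("Dev0023", "bounce", "2021-09-25T15:00Z", 12.2),
        ("Dev0023", "squash", "2020-12-13T23:59Z", -7.7),
        ("Dev0023", "squash", "2021-09-25T15:00Z", -1.1),
    ],
)
print(active_values(conn, "Dev0023"))
```

The "active" value and the history come from the same rows, so nothing is stored twice; the price is that every read goes through a query.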

Edit: I forgot to say that the other reason to go down the RDBMS route would be if there is a requirement for writing the data from more than one location: you don't want to be managing version conflicts between JSON files.
 
  • #4
Note that the sample data I provided in the earlier post is not complete. I only included sample test data for four of the 13 device/test combinations.

I'm using JSON just to demonstrate how you would access the data in that form (see below).
If you do serialize the data (for example, using a JSON file), I would keep all data in a single file. There will be enough maintenance just dealing with a multitude of serialized files from different dates. All the worse if some are for different DUTs.

From my example above, the "DUTData.js" file would be included in an HTML file with:
<script src="DUTData.js"></script>

That would define the structure for DeviceTestData - with all of the test data.
Then (JavaScript examples):

DeviceTestData.DUTs["Dev0023"].color
would give you the color "brown" (an invariant attribute) of the device with serial number "Dev0023"

DeviceTestData.DB.time
would give you the time "202110011300" of this snapshot of your DUT database.

DUT="Dev0027"
DeviceTestData.TestResults[DUT+"-"+DeviceTestData.DUTs[DUT].lastCalibID]
would give you the most recent calibration record for device "Dev0027".
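
Since the measurement scripts are in Python, the same lookups can be written there too. A sketch, assuming the snapshot is stored as plain JSON rather than wrapped in a .js file; only a few records from the sample data are reproduced, and the helper name is hypothetical:

```python
import json

# A trimmed copy of the sample snapshot, as plain JSON text.
SNAPSHOT = json.loads("""
{
  "DB": {"name": "DUT_Database", "time": "202110011300", "lastCalibID": "C004"},
  "DUTs": {
    "Dev0023": {"color": "brown", "weight": 7.3, "lastCalibID": "C004"},
    "Dev0027": {"color": "green", "weight": 7.3, "lastCalibID": "C004"}
  },
  "TestResults": {
    "Dev0023-C004": {"bounce": 12.2, "squash": -1.1, "shift": 3.4},
    "Dev0027-C001": {"bounce": 11.1, "squash": 2.9, "shift": 4.5},
    "Dev0027-C004": {"bounce": 11.2, "squash": 2.9, "shift": 4.4}
  }
}
""")


def latest_results(data: dict, serial: str) -> dict:
    """Most recent calibration record for one device, via its lastCalibID."""
    calib_id = data["DUTs"][serial]["lastCalibID"]
    return data["TestResults"][f"{serial}-{calib_id}"]


print(SNAPSHOT["DUTs"]["Dev0023"]["color"])  # invariant attribute
print(SNAPSHOT["DB"]["time"])                # time of this snapshot
print(latest_results(SNAPSHOT, "Dev0027"))   # most recent calibration record
```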
 
  • #5
@.Scott data stored in JSON usually works better with a little less normalization. I was thinking of something like:
File: ./devices/393cfa90.json:
{
  // We won't have millions of devices so just use
  // the first part of a V4 UUID as a key.
  "id": "393cfa90",
  "name": "Dev0023",
  "manufacturer": "Acme Inc.",
  // Always best to store serial numbers as text because
  // they are not always numeric.
  "serialNo": "12334565",
  "attributes": {
    "color": "brown",
    "weight": 7.3
  },
  "calibrations": {
    "bounce": {
      "id": "87545579-d7dc-4547-af32-e7b0991272ef",
      "isotime": "2021-09-25T15:00Z",
      "value": 12.2,
      "history": [
        {
          "id": "9aad1170-1dff-4314-8c14-56f30925c080",
          "isotime": "2020-12-13T23:59Z",
          "value": 12.3
        },
        {
          "id": "87545579-d7dc-4547-af32-e7b0991272ef",
          "isotime": "2021-09-25T15:00Z",
          "value": 12.2
        }
      ]
    },
    "squash": {
      "id": "87545579-d7dc-4547-af32-e7b0991272ef",
      "isotime": "2021-09-25T15:00Z",
      "value": -1.1,
      "history": [
        {
          "id": "9aad1170-1dff-4314-8c14-56f30925c080",
          "isotime": "2020-12-13T23:59Z",
          "value": -7.7
        },
        {
          "id": "87545579-d7dc-4547-af32-e7b0991272ef",
          "isotime": "2021-09-25T15:00Z",
          "value": -1.1
        }
      ]
    },
    "shift": {
      "id": "87545579-d7dc-4547-af32-e7b0991272ef",
      "isotime": "2021-09-25T15:00Z",
      "value": 3.4,
      "history": [
        {
          "id": "9aad1170-1dff-4314-8c14-56f30925c080",
          "isotime": "2020-12-13T23:59Z",
          "value": 3.4
        },
        {
          "id": "87545579-d7dc-4547-af32-e7b0991272ef",
          "isotime": "2021-09-25T15:00Z",
          "value": 3.4
        }
      ]
    }
  }
}

File: ./calibrations.json:
{
  // I am using full V4 UUIDs here to avoid collisions
  // but your lab may have something unique already.
  "87545579-d7dc-4547-af32-e7b0991272ef": {
    // Always store time values in a format compatible
    // with ISO8601, preferably in UTC.
    "isotime": "2021-09-25T15:00Z",
    // We could use staff IDs here or whatever. Using
    // an array makes it more flexible.
    "performedBy": [
      "Jane Doe"
    ]
  },
  "9aad1170-1dff-4314-8c14-56f30925c080": {
    "isotime": "2020-12-13T23:59Z",
    "performedBy": [
      "Gary Einstein"
    ]
  }
}
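
Because this layout stores the active value alongside its history, the two can drift apart. A small consistency check is one way to guard against that. The sketch below assumes the files have been loaded into Python dictionaries (the standard json module rejects // comments and trailing commas, so those would need to be absent from the stored files); the function name is hypothetical:

```python
def check_device(device: dict, calibrations: dict) -> list:
    """Return a list of human-readable problems found in one device record."""
    problems = []
    for name, param in device["calibrations"].items():
        entries = param.get("history", [])
        if not entries:
            problems.append(f"{name}: empty history")
            continue
        # ISO 8601 strings in one consistent format sort in time order.
        newest = max(entries, key=lambda entry: entry["isotime"])
        if (newest["id"], newest["value"]) != (param["id"], param["value"]):
            problems.append(f"{name}: active value is not the newest history entry")
        for entry in entries:
            if entry["id"] not in calibrations:
                problems.append(f"{name}: unknown calibration {entry['id']}")
    return problems


OLD = "9aad1170-1dff-4314-8c14-56f30925c080"
NEW = "87545579-d7dc-4547-af32-e7b0991272ef"

device = {
    "calibrations": {
        "bounce": {
            "id": NEW, "isotime": "2021-09-25T15:00Z", "value": 12.2,
            "history": [
                {"id": OLD, "isotime": "2020-12-13T23:59Z", "value": 12.3},
                {"id": NEW, "isotime": "2021-09-25T15:00Z", "value": 12.2},
            ],
        }
    }
}
calibrations = {
    NEW: {"isotime": "2021-09-25T15:00Z", "performedBy": ["Jane Doe"]},
    OLD: {"isotime": "2020-12-13T23:59Z", "performedBy": ["Gary Einstein"]},
}

print(check_device(device, calibrations))  # an empty list means no problems
```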
 
  • #6
pbuk said:
@.Scott data stored in JSON usually works better with a little less normalization.
I was addressing his issue with "looking messy".
So I was showing him a structure that isn't messy and will address his other requirements.
I showed the structure using a JSON snapshot because he indicated he was familiar with that.

That said, I'm not a fan of using GUIDs when there is already a good key available.
 
  • #7
.Scott said:
I was addressing his issue with "looking messy".
So I was showing him a structure that isn't messy and will address his other requirements.
I showed the structure using a JSON snapshot because he indicated he was familiar with that.
The problem for me is that working with normalised data in JSON is messy; it is just not designed for that. If you want to promote the advantages of normalisation here I think there is a better way.

.Scott said:
That said, I'm not a fan of using GUIDs when there is already a good key available.
Is there a good key available? If you are talking about serial numbers then let's look at some essential and desirable attributes of a primary key:
  • Unique - specialist test equipment is often produced in short runs: a big lab is likely to have a few 'SN004's.
  • Homogenous - some serial numbers are numeric, some alphanumeric, and many include -, /, # or other characters.
  • Partitionable - a key that includes 'ABCD00123400013', 'ABCD00123400105', 'ABCD00123400115' is inefficient.
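
One way to get a key with those properties, following the "first part of a V4 UUID" idea from post #5, is sketched below. The function name and the explicit collision check are assumptions added for illustration:

```python
import uuid


def new_short_key(existing: set, length: int = 8) -> str:
    """Return the first `length` hex digits of a fresh V4 UUID, retrying on collision."""
    while True:
        key = uuid.uuid4().hex[:length]
        if key not in existing:
            existing.add(key)
            return key


keys = set()
first = new_short_key(keys)
second = new_short_key(keys)
print(first, second)
```

The serial number is then stored as an ordinary text attribute of the device rather than being used as the key.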
 
  • #8
f95toli said:
  • The data must be in a format that is easy to read and understand
  • The software is in reality a number of Python scripts, we will never run more than one script at a time but it is important that reading/writing data is straightforward.

I don't know the script/language discussed so this might be already addressed, but keep in mind that there is always a guy somewhere in every factory/lab who will try to read (and search) any database at hand with a text editor (or some other common, but absolutely inappropriate tool, like Excel).
 
  • #9
Rive said:
I don't know the script/language discussed so this might be already addressed, but keep in mind that there is always a guy somewhere in every factory/lab who will try to read (and search) any database at hand with a text editor (or some other common, but absolutely inappropriate tool, like Excel).
Excel via ODBC can be a perfectly appropriate tool for searching a database, and this functionality comes free with the RDBMS solution.
 
  • #10
Thanks for the suggestions.
One potential issue with using JSON is that it is not (yet) a widely used format for scientific data, whereas e.g. HDF5 is very widely used (and is supported by most of the other software we use).
That said, the fact that JSON is reasonably easy to read is a bonus, and I guess it would be possible to convert data to HDF5 when needed.

Are there any (preferably free) data browsers for JSON that work well with scientific data (ideally with e.g. built-in plotting)?
 
  • #11
f95toli said:
One potential issue with using JSON is that it is not (yet) a widely used format for scientific data
I don't agree with that: there is a vast amount of NASA data (and other US and UK government agency data) published in JSON, e.g. https://api.nasa.gov/neo/rest/v1/feed?start_date=2015-09-07&end_date=2015-09-08&api_key=DEMO_KEY.

f95toli said:
HDF5 is very widely used (and is supported by most of the other software we use).
That is a valid point in favour of HDF5.

f95toli said:
That said, the fact that JSON is reasonably easy to read is a bonus;
Well it was a requirement that "the data must be [stored in a] format that is easy to read and understand" so isn't this rather more than a bonus?

f95toli said:
and I guess it would be possible to convert data to HDF5 when needed.
Yes of course.

f95toli said:
Are there any (preferably free) data browsers for JSON that work well with scientific data (ideally with e.g. built-in plotting)?
There is a JSON data browser built into Firefox, and extensions are available for Chrome and Edge. I do not know of any generic browser that does plotting, though it would be very easy to build an HTML front end to your dataset using a JavaScript plotting library such as Plotly.js. This could be much more flexible than the HDFView plotter (for instance, as far as I know you cannot compare time series data across different HDF databases).
 
  • #12
f95toli said:
Summary:: How to best organise data?

  • The number of devices is relatively small (maybe 20-40) at any given time but we will of course change DUTs between runs and we need a way to keep track of data for each DUT
  • Each device will probably have 5-10 parameters. Mostly floating point numbers but it would be nice to have the option to also save arrays.
  • The values of the parameters for each device will change over time and will then need to be updated, either because we measure them for the first time or because they are re-calibrated. We always need to know the "active" parameter, but being able to also store the history would be a big plus. For the components in the measurement chain it would also be nice to be able to look at trends over long times (months)
Apologies if you mentioned it already, but how are you going to serialize each device, or alternatively track it some other way (numbered trays?)? How complex are these devices (single transistor or uC level complexity)? Do they contain any OTP memory or some R/W non-volatile memory?

Are some of these devices from corner lots? How are you envisioning keeping track of that information? Are you storing wafer coordinates for each device as well?
 

