Distribution element: other type than String?

added A1.2.3 label

The following comment was added by NPL to the discussion:

It would be interesting to see statistical distributions implemented as an enumerated list, say {dist1(para1, para2), dist2(para1), …} = {Gaussian(mu, sigma), Poisson(lambda), …}. The implementation could share some similarity with a C-language pointer to function. However, it would be difficult to exhaust all possible statistical distributions. For distributions that are not listed, would LaTeX-style syntax be useful? It is well known to most users in scientific fields, and the syntax is well developed.

I would like to add the idea of allowing distribution names similar to those used in the R software for computational statistics.

@ISmith @YLuo

What do you think: can we draw up a proposal for an implementation unit by next Monday?

Enumerations have the disadvantage that we cannot define individual parameter values like "dist1(par1,par2,par3)".

I started to elaborate a different idea:

The enumeration-like structure may be realized with a regular expression instead: define keywords for the distributions (e.g. normal, rectangular, ...) and append a variable-length list of parameters with a separator character, like normal,0.14,0.02
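To make the idea concrete, here is a minimal sketch of how such a string could be checked and split with a regular expression. The keyword set, the comma separator, and the function name `parse_distribution` are assumptions for illustration only, not part of any agreed D-SI format.

```python
import re

# Hypothetical keyword set; the thread proposes taking the names from the GUM.
KEYWORDS = ("normal", "rectangular", "triangular", "trapezoidal", "arcsine")

# One floating-point number, e.g. 0.14, -3, 1.5e-3.
NUMBER = r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?"

# A keyword followed by zero or more ",number" groups.
PATTERN = re.compile(
    r"^(?P<name>" + "|".join(KEYWORDS) + r")(?P<params>(?:," + NUMBER + r")*)$"
)


def parse_distribution(text):
    """Return (keyword, [parameters]) or raise ValueError for invalid input."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid distribution string: {text!r}")
    raw = match.group("params")
    # raw looks like ",0.14,0.02"; the first split item is the empty string.
    params = [float(p) for p in raw.split(",")[1:]] if raw else []
    return match.group("name"), params
```

With this sketch, `parse_distribution("normal,0.14,0.02")` yields the keyword together with a list of two floats, while an unknown keyword or a non-numeric parameter is rejected. Note that the pattern only checks the syntax; it does not check that the number of parameters fits the distribution.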

The key words for distributions can be taken from GUM documents:

GUM S1 - JCGM 101:2008 - Table 1
- rectangular R(a, b)
- curvilinear-trapezoid CTrap(a, b, d)
- trapezoidal Trap(a, b, β)
- triangular T(a, b)
- arcsine U(a, b) (sinusoidal, U-shaped)
- normal N(x, u²(x)) (Gaussian)
- normal-multivariate N(x, U_x)
- normal-bivariate N(x, U_x)
- gamma G(q + 1, 1), q = number of objects counted
- exponential Ex(1/x)
- t-distribution-1 t_{n−1}(x̄, s²/n)
- t-distribution-2 t_{ν_eff}(x, (U_p/k_p)²)

GUM S2 - JCGM 102:201
- t-distribution-multivariate t_ν(x̄, S/n)

Parameters with degrees of freedom (DOF) in the examples above: ν, n-1, q

Parameters describing the geometrical shape: d, β

All other parameters are already provided with the D-SI data "value", "uncertainty", "covarianceMatrix",...
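The split between parameters the D-SI already carries and those it does not could be sketched as a small lookup table. This is only an illustration under assumptions: the keyword spellings and the parameter names (`d`, `beta`, `q`, `dof`) are hypothetical, and the grouping simply follows the two lists above.

```python
# Extra parameters per keyword, i.e. those NOT already carried by the
# D-SI "value", "uncertainty" or "covarianceMatrix" elements.
# Keyword spellings and parameter names are hypothetical.
EXTRA_PARAMETERS = {
    "rectangular": (),
    "curvilinear-trapezoid": ("d",),   # shape parameter
    "trapezoidal": ("beta",),          # shape parameter
    "triangular": (),
    "arcsine": (),
    "normal": (),
    "gamma": ("q",),                   # number of objects counted
    "exponential": (),
    "t-distribution": ("dof",),        # degrees of freedom
}


def check_extra_parameters(keyword, given):
    """Return the extra parameters as floats, or raise ValueError.

    `given` maps parameter names to values and must contain exactly the
    names expected for `keyword`.
    """
    if keyword not in EXTRA_PARAMETERS:
        raise ValueError(f"unknown distribution keyword: {keyword!r}")
    expected = EXTRA_PARAMETERS[keyword]
    missing = [name for name in expected if name not in given]
    unexpected = [name for name in given if name not in expected]
    if missing or unexpected:
        raise ValueError(
            f"{keyword}: missing {missing}, unexpected {unexpected}"
        )
    return {name: float(given[name]) for name in expected}
```

A check of this kind is one thing a validation service could do beyond the pure syntax check: a trapezoidal distribution without `beta`, or a normal distribution with a stray `dof`, would be rejected.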

The validation service from WP 3 could validate this statement.

Decision made:

keep as xs:string
append proposal on distribution identifiers later

closed

reopened

closed

reopened

@all

The issue was reopened to continue defining the semantics for stating distributions in the D-SI.

First, please note the outline of the distributions listed in the GUM in one of the previous comments. This could be a starting point for the specification of the distributions.

What needs to be discussed is how to provide the additional parameters of distribution elements (e.g. DOFs). Basic parameters such as the expectation value and the variance are already provided by the D-SI elements unit and uncertainty.

A basic question is whether the distribution should contain all information, so that it could be used as a standalone element, or whether it is acceptable for it to refer to data provided by real, complex, etc.

In the first case, we may simply use existing formats, e.g. from the R language. In the latter case, we would most probably have to come up with a new format.
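The difference between the two cases can be illustrated with a short sketch. In the referring case the distribution element would hold only a keyword such as "normal", and a reader has to combine it with the surrounding data. In the standalone case the same information is folded into one self-contained descriptor. The sketch below builds an R-style descriptor from a keyword plus a value and a standard uncertainty; the function name, the keyword spellings, and the assumption that the uncertainty is a standard (not expanded) uncertainty are all hypothetical.

```python
import math


def to_standalone(keyword, value, std_uncertainty):
    """Build a self-contained, R-style descriptor string.

    Assumes `std_uncertainty` is a standard uncertainty. The family and
    argument names (norm: mean, sd; unif: min, max) follow R conventions.
    """
    if keyword == "normal":
        return f"norm(mean={value!r}, sd={std_uncertainty!r})"
    if keyword == "rectangular":
        # For a rectangular distribution the half-width is sqrt(3) times
        # the standard uncertainty.
        half_width = std_uncertainty * math.sqrt(3)
        return f"unif(min={value - half_width!r}, max={value + half_width!r})"
    raise ValueError(f"no standalone form defined for {keyword!r}")
```

For example, `to_standalone("normal", 0.14, 0.02)` gives `norm(mean=0.14, sd=0.02)`. The standalone form duplicates numbers that already appear elsewhere in the D-SI element, which is the price for being usable on its own.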

From @MBrennecke in issue https://gitlab1.ptb.de/d-ptb/d-si/xsd-d-si/issues/20:

The element "expandedUnc" in the SI_Format contains the element "distribution", which is currently a string. In practice, the Gaussian or the rectangular distribution is used here almost always.

My suggestion is to insert a selection list / choice with the following items:

Normal
Rectangular
Triangular
U-Shaped
Step
other

This way, there is no misunderstanding when someone writes "gaussian" and someone else writes "normal", etc.

(Source for the names: Distribution for Uncertainty Estimation using the ISO Guide to the Expression of Uncertainty in Measurements)

Additionally, a string element can be added for "otherDistribution".
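A minimal sketch of how a reader or converter could map free-text entries onto such a selection list is given below. The enumeration values are the items listed above; the synonym table and the function name `classify` are assumptions for illustration.

```python
from enum import Enum


class Distribution(Enum):
    NORMAL = "Normal"
    RECTANGULAR = "Rectangular"
    TRIANGULAR = "Triangular"
    U_SHAPED = "U-Shaped"
    STEP = "Step"
    OTHER = "other"


# Hypothetical synonym table; extend as needed.
SYNONYMS = {
    "gaussian": Distribution.NORMAL,
    "uniform": Distribution.RECTANGULAR,
    "arcsine": Distribution.U_SHAPED,
}


def classify(text):
    """Return (Distribution, other_text).

    other_text carries the original string only when the entry falls
    under OTHER, mirroring a separate "otherDistribution" element.
    """
    cleaned = text.strip()
    key = cleaned.lower()
    for member in Distribution:
        if member is not Distribution.OTHER and member.value.lower() == key:
            return member, None
    if key in SYNONYMS:
        return SYNONYMS[key], None
    return Distribution.OTHER, cleaned
```

Here "gaussian" and "Normal" both end up as `Distribution.NORMAL`, while an unlisted name such as "log-normal" is kept verbatim alongside `Distribution.OTHER`.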

made the issue confidential

mentioned in issue #22 (closed)

Update of discussion from DCC development: https://gitlab1.ptb.de/d-ptb/dcc/xsd-dcc/-/issues/168

@CMueller-Schoell & @JHaller: Can we continue the discussion here? We had the impression that the issue of degrees of freedom is really a question of how to provide information on numerical distributions.

Please also see #33 regarding discussions about uncertainty dependencies and standard uncertainty.

added Product Backlog label

added Version 2.3.0-Beta label

removed A1.2.3 label
