The bag model in language statistics (Q1857036)
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | The bag model in language statistics | scientific article | |
Statements
The bag model in language statistics (English)
11 February 2003
Fuzzy quantitative models of language statistics are constructed. All of the proposed models rest on the assumption of a superposition of two kinds of uncertainty: probabilistic and possibilistic. The new approach represents fuzzy sets as the result of a splitting procedure applied to ordinary subsets of some universal set, which is convenient for describing possibilistic and probabilistic superpositions. Let \(\Omega\) be a finite set and \(A\) be any subset, \(A \subseteq \Omega\). Consider a correspondence \(I_{A} \rightarrow (I_{\widetilde{A}}, I_{\widetilde{A^{D}}})\), where \(I_{A}\) is the indicator of the subset \(A\), \(I_{\widetilde{A}}, I_{\widetilde{A^{D}}} \in [0;1]^{\Omega}\), and \(I_{A}(\omega) = I_{\widetilde{A}}(\omega) + I_{\widetilde{A^{D}}}(\omega)\) \(\forall \omega \in \Omega\). \(A\) is the support of the mappings \(I_{\widetilde{A}}\) and \(I_{\widetilde{A^{D}}}\). According to Zadeh, the splitting components \(I_{\widetilde{A}}\) and \(I_{\widetilde{A^{D}}}\) are fuzzy subsets of \(\Omega\). Call \(I_{\widetilde{A^{D}}}\) the dual subset with respect to \(I_{\widetilde{A}}\). The procedure in which the indicator \(I_{A}\) is put in correspondence with the pair \((I_{\widetilde{A}}, I_{\widetilde{A^{D}}})\) is called "splitting of the indicator \(I_{A}\) (of the subset \(A\))". Splitting two subsets \(A, B \subseteq \Omega\) induces a corresponding splitting of their union and intersection.
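The splitting of an indicator described above can be illustrated with a minimal sketch. The function name `split_indicator` and the `share` argument (the fraction of each element of \(A\) assigned to the first component) are hypothetical and are not taken from the paper. The sketch assumes that both components must be strictly positive on \(A\), which is one reading of the statement that \(A\) is the support of both mappings.

```python
from fractions import Fraction


def split_indicator(omega, subset, share):
    """Split the crisp indicator I_A into a pair (I_A~, I_A~D).

    `share[w]` is a hypothetical input: the fraction of the indicator value
    at w that goes to the first component. It must lie strictly between
    0 and 1 for w in A so that A is the support of both components.
    """
    first, dual = {}, {}
    for w in omega:
        if w in subset:
            s = Fraction(share[w])
            if not 0 < s < 1:
                raise ValueError(f"share for {w!r} must lie strictly in (0, 1)")
            first[w] = s
            dual[w] = 1 - s  # the components sum to I_A(w) = 1
        else:
            # Outside A the indicator is 0, so both components vanish.
            first[w] = Fraction(0)
            dual[w] = Fraction(0)
    return first, dual


omega = ["a", "b", "c", "d"]
A = {"a", "b", "c"}
share = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(3, 4)}

I_A_tilde, I_A_dual = split_indicator(omega, A, share)

# Reconstruct the crisp indicator pointwise from the two fuzzy components.
reconstructed = {w: I_A_tilde[w] + I_A_dual[w] for w in omega}
```

Exact fractions are used so that the identity \(I_{A} = I_{\widetilde{A}} + I_{\widetilde{A^{D}}}\) can be checked without floating-point rounding.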
For the split indicators \(I_{\widetilde{A \cap B}}\) and \(I_{\widetilde{A \cup B}}\) it is essential that the natural conditions hold (as they do for non-split indicators): \(I_{\widetilde{A}}(\omega), I_{\widetilde{B}}(\omega) \geq I_{\widetilde{A \cap B}}(\omega)\) and \(I_{\widetilde{A}}(\omega), I_{\widetilde{B}}(\omega) \leq I_{\widetilde{A \cup B}}(\omega)\), \(\omega \in \Omega\). Then, as is easily seen, the following expressions are obtained for the intersection and union indicators: \(I_{\widetilde{A \cap B}}(\omega) = I_{\widetilde{A}}(\omega) \wedge I_{\widetilde{B}}(\omega)\) \(\forall \omega \in \Omega\) \((\wedge = \min)\) (simultaneous splitting); \(I_{\widetilde{\widetilde{A \cap B}}}(\omega) = I_{\widetilde{A}}(\omega) \cdot I_{\widetilde{B}}(\omega)\) \(\forall \omega \in \Omega\) (sequential splitting); \(I_{\widetilde{\widetilde{A \cup B}}}(\omega) = I_{\widetilde{A}}(\omega) + I_{\widetilde{B}}(\omega) - I_{\widetilde{\widetilde{A \cap B}}}(\omega)\) \(\forall \omega \in \Omega\) (sequential splitting); \(I_{\widetilde{A \cup B}}(\omega) = I_{\widetilde{A}}(\omega) \vee I_{\widetilde{B}}(\omega)\) \(\forall \omega \in \Omega\) \((\vee = \max)\) (simultaneous splitting). The set splitting procedure is a new tool for defining and calculating the probabilities of random fuzzy events. On this basis the following distributions have been obtained: the Zipf-Mandelbrot distribution, a binomial distribution with fuzzy elementary events, a fuzzy upper binomial distribution, and a fuzzy Fucks distribution. All these distributions describe the possibilistic-probabilistic organization of structures created by different language elements. A general linear mathematical model of language structure is presented in the paper. The main characteristics of the model are the components of the linguistic spectrum.
Their determination reduces to solving the system of equations \[ \left. \frac{\partial^{k} G(y, \alpha)}{\partial y^{k}} \right|_{y \to 1} = \overline{i(i-1) \cdots (i-k+1)}^{\exp}, \quad k = 1, 2, \dots, \] where \(G(y, \alpha) = \sum_{\ell} P(\ell) y^{\ell}\), \(P(\ell)\) is the probability distribution of the chosen model, \(\alpha\) is a known function of the linguistic spectrum components, and \(\overline{(\cdots)}^{\exp}\) are the measured moments of the gap distribution. A special method is developed for solving the system. Once the linguistic spectrum is determined, the informational content of any given structure can be calculated. The final part of the paper presents applications of these models to language structures.
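Two computational ingredients of the review can be sketched in a few lines: the pointwise rules for combining split indicators (minimum and maximum for simultaneous splitting, product and probabilistic sum for sequential splitting), and the left-hand side of the moment equations, where the \(k\)-th derivative of \(G(y) = \sum_{\ell} P(\ell) y^{\ell}\) at \(y = 1\) equals the factorial moment \(\sum_{\ell} \ell(\ell-1) \cdots (\ell-k+1) P(\ell)\). All function names are hypothetical, and the ordinary (crisp) binomial distribution is used only as a stand-in for \(P(\ell)\). It is not one of the fuzzy distributions of the paper, and the paper's own solution method is not reproduced here.

```python
from fractions import Fraction
from math import comb


def intersect_simultaneous(a, b):
    """Simultaneous splitting: pointwise minimum."""
    return {w: min(a[w], b[w]) for w in a}


def union_simultaneous(a, b):
    """Simultaneous splitting: pointwise maximum."""
    return {w: max(a[w], b[w]) for w in a}


def intersect_sequential(a, b):
    """Sequential splitting: pointwise product."""
    return {w: a[w] * b[w] for w in a}


def union_sequential(a, b):
    """Sequential splitting: sum minus the sequential intersection."""
    return {w: a[w] + b[w] - a[w] * b[w] for w in a}


def factorial_moments(pmf, k_max):
    """Return [G'(1), G''(1), ...] for G(y) = sum_l pmf[l] * y**l.

    The k-th derivative at y = 1 is sum_l l(l-1)...(l-k+1) * pmf[l].
    """
    moments = []
    for k in range(1, k_max + 1):
        total = Fraction(0)
        for l, p in enumerate(pmf):
            falling = 1
            for j in range(k):
                falling *= l - j  # falling factorial l(l-1)...(l-k+1)
            total += falling * p
        moments.append(total)
    return moments


# Hypothetical membership values of two split components on {x, y}.
a = {"x": Fraction(1, 4), "y": Fraction(1, 2)}
b = {"x": Fraction(1, 2), "y": Fraction(1, 2)}

# Stand-in model distribution: crisp binomial with n = 4, p = 1/2.
n, p = 4, Fraction(1, 2)
pmf = [comb(n, l) * p**l * (1 - p) ** (n - l) for l in range(n + 1)]
model_moments = factorial_moments(pmf, 3)
```

In a fitting procedure, `model_moments` would be compared with the measured factorial moments for \(k = 1, 2, \dots\), and the model parameters adjusted until the two sides agree. Both intersection rules stay below each component and both union rules stay above, so the natural conditions hold for either kind of splitting.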
fuzzy sets
membership functions
probability theory
linguistic modeling