Bernstein polynomials and learning theory (Q596812)

From MaRDI portal
scientific article
    Statements

    10 August 2004
    Let \(f(x)= -x\log x- (1- x)\log(1 - x)\) for \(0\leq x\leq 1\); it is referred to as the entropy function. Let \(B_n[f]\) denote the \(n\)th Bernstein polynomial of \(f\). One half of the paper establishes the estimates \[ f(x)- B_n[f](x)\geq {1\over 2n}+ {1\over 20n^2 x(1-x)}- {1\over 12n^2}\quad\text{for }{15\over n}\leq x\leq 1-{15\over n}, \] \[ \lim_{n\to\infty}\, n\{f- B_n[f]\}(z/n)= -z\log z+ ze^{-z} \sum^\infty_{k=1} {z^k\over k!}\log(k+ 1)\quad\text{for }0\leq z\leq 15. \]
    These estimates are then used to establish the optimality of an encoding rule for the following problem. Let \(A_0,\dots, A_m\) be an alphabet of \(m+ 1\) letters which are to be encoded, with code length \(\log 1/q_i\) for \(A_i\), where \(\sum q_i= 1\). The expected code length is then \(\sum p_i\log 1/q_i\), where \(A_i\) occurs with the unknown probability \(p_i\); it would be minimised by \(q_i= p_i\). The difference \(\sum p_i\log p_i/q_i\) between the expected code length and this minimum is the redundancy. A sample of \(n\) letters is available, and each \(q_i\) is to be chosen, according to the number \(k_i\) of times that \(A_i\) occurs in the sample, by a rule \(q_i= Q(k_i)\). In the case \(m= 1\) a rule \(Q\) is found such that, if \(F^*_n(p)\) is the expected redundancy, where \(p= p_1\), then \[ \lim_{n\to\infty}\,\sup_{0\leq p\leq 1}\, nF^*_n(p)= \textstyle{{1\over 2}}, \] which by a result of \textit{T. M. Cover} [IEEE Trans. Inf. Theory 18, 216--217 (1972; Zbl 0231.94001)] is optimal. The general case \(m> 1\) is reduced to the case \(m= 1\). The calculations involved are lengthy, intricate and delicate.
    The paper is a sequel to a paper [How to achieve minimax expected Kullback-Leibler distance from an unknown finite distribution, Berlin: Springer (2002; Zbl 1024.68047)] by the same authors and others; it improves a result of \textit{R. E. Krichevskiy} [IEEE Trans. Inf. Theory 44, No. 1, 296--303 (1998; Zbl 1053.94534)] and makes reference to \textit{J. Forster} [J. Comput. Syst. Sci. 65, No. 4, 612--625 (2002; Zbl 1058.68058)].
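    The first estimate can be checked numerically by evaluating \(B_n[f]\) directly from its definition \(B_n[f](x)=\sum_{k=0}^n f(k/n)\binom{n}{k}x^k(1-x)^{n-k}\). A minimal sketch in Python; the values \(n=100\) and \(x=0.3\) (which lies inside \([15/n,\,1-15/n]\)) are chosen purely for illustration:

```python
import math

def entropy(x: float) -> float:
    """Entropy function f(x) = -x log x - (1-x) log(1-x), natural logarithm."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def bernstein(f, n: int, x: float) -> float:
    """n-th Bernstein polynomial B_n[f](x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )

n, x = 100, 0.3  # x lies in [15/n, 1 - 15/n] = [0.15, 0.85]
diff = entropy(x) - bernstein(entropy, n, x)
lower = 1 / (2 * n) + 1 / (20 * n**2 * x * (1 - x)) - 1 / (12 * n**2)
print(diff, lower)  # diff should exceed the stated lower bound
```

    Since \(f\) is concave, \(B_n[f]\leq f\), and the leading term of the gap is \(1/(2n)\) by Voronovskaya's theorem; the paper's bound sharpens this with an explicit second-order term.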
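    For \(m=1\) the expected redundancy of any rule \(Q\) can also be evaluated numerically as \(F_n(p)=\mathbb{E}_K\bigl[p\log(p/Q(K))+(1-p)\log((1-p)/(1-Q(K)))\bigr]\) with \(K\sim\mathrm{Binomial}(n,p)\). The sketch below uses the classical add-\(1/2\) (Krichevsky-Trofimov) assignment \(Q(k)=(k+1/2)/(n+1)\) purely as an illustration; it is not the optimal rule constructed in the paper. For interior \(p\), \(nF_n(p)\) is close to \(1/2\):

```python
import math

def log_binom_pmf(n: int, k: int, p: float) -> float:
    """log of the Binomial(n, p) probability mass at k, via lgamma for stability."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def expected_redundancy(n: int, p: float, rule) -> float:
    """Expected redundancy F_n(p) of the rule q = rule(k, n), natural logarithm."""
    total = 0.0
    for k in range(n + 1):
        q = rule(k, n)
        w = math.exp(log_binom_pmf(n, k, p))
        total += w * (p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q)))
    return total

# Illustrative add-1/2 rule (not the paper's optimal rule).
kt_rule = lambda k, n: (k + 0.5) / (n + 1)

n, p = 500, 0.3
red = expected_redundancy(n, p, kt_rule)
print(n * red)  # close to 1/2 for this interior p
```

    The paper's point is stronger: its rule achieves \(\limsup\) redundancy \(1/2\) uniformly over \(0\leq p\leq 1\), which simple add-constant rules do not.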
    Bernstein polynomials
    entropy function
    learning theory
    optimal encoding