Advanced statistical steganalysis (Q983158)

It can be observed that the problem of hiding and detection of a covert payload within a cover object received much attention in the last 20 years, especially in the computer science, engineering and mathematics communities. These two fields: hiding and detection, i.e. steganography and steganalysis, became highly explored, and this is expressed (not only) by the number of new journals and articles dedicated to this field. But this new, emerging topic of interest sometimes seems to be quite difficult to browse, especially for those readers who are not necessarily active participants of the whole process. The present book seems to provide a good overview of the work done so far in the considered field and, moreover, provides an account of recent advances in steganalysis and steganography with emphasis on the role of empirical covers. It may seem that steganography is a new field of engineering and science, however, it is an ancient art and it was popularized in the computer science community by cryptographer Gustavus Simmons in 1983 in the case of endeavors to communicate in such a way that the existence of the hidden message cannot be detected. The information that steganography is an ancient art (even older than cryptography) seems to be a very interesting one and in the present book a short introduction is presented in Chapter 2.1.3, though we don't find a detailed exposition of this area. However, since the book is devoted to advanced statistical steganalysis, this gap can be excused. As an example dealing with this area in more detail, we refer to work by F.~A.~P. Petitcolas et al. [``Information hiding -- a survey'', in: Proceedings of the IEEE special issue on protection of multimedia content 87, No.~7, 1062--1078 (1999)], where the reader can find that one of the first known examples of this art is from about 440 B.C., and also the usage of ``invisible inks'' and other techniques that aren't mentioned in the present book. 
Steganography is also interesting because of its relation to cryptography. Cryptography, however, only hides the content of the information, and does so \textit{explicitly}: we can see that a secret message is transmitted, but we do not know its meaning. In the case of steganography, even the fact that communication takes place can be kept confidential. As a consequence, steganography is a rather empirical discipline, in contrast to cryptology, which is strongly based on information theory (more generally: on mathematical results). We now turn to the organization of the book. According to the Outline, the book is divided into three major parts. The first one (\(\approx\)100 pages) presents the general aspects and the theoretical framework of the considered problems. It consists of two chapters. Chapter 2 reviews the state of the art, giving the essential background for understanding the research results presented in the rest of the book, and introduces the necessary definitions and terms. 
Successive subsections give information about: the basic communication model (steganographic or stego system) with presupposed notations and conventions; basic criteria of steganographic systems measures (capacity -- as the maximum length of a secret message; security -- as the strength to defeat detection; and robustness -- as the difficulty of removing hidden information from a stego object) with the adversary model as the set of assumptions defining the goals and limiting the computational power and knowledge of the steganalysis; two important paradigms (approaches) to construct steganographic systems with a presentation of appropriate models -- the first (still dominant) approach assumes fewer and small changes in the cover that are less detectable, while the second one is based on the replacement of the cover as input to the embedding function with one that is computer-generated by the embedding function; ways and domains of embedding with special focus on JPEG and MP3 multimedia formats as the most popular and suitable ones; the most popular embedding operations (operations with LSB (replacement and matching), Mod-\(k\) replacement and matching, adaptive methods); architecture of stego systems with problems of key distribution, maximization of embedding efficiency, options of coding a message to minimize the original information distortion; specific techniques of detection for steganalysis (JPEG histograms, universal detectors); some estimators for determination of the secret message length and possibilities to cover secret messages. 
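The two LSB embedding operations named above can be illustrated with a minimal sketch. It assumes 8-bit samples (values 0 to 255) and one message bit per sample; the function names and the sample values are hypothetical and are not taken from the book.

```python
import random

def lsb_replace(sample: int, bit: int) -> int:
    """LSB replacement: overwrite the least significant bit with the message bit."""
    return (sample & ~1) | bit

def lsb_match(sample: int, bit: int, rng: random.Random,
              lo: int = 0, hi: int = 255) -> int:
    """LSB matching: if the LSB already equals the bit, keep the sample;
    otherwise add or subtract 1 at random (forced direction at the range limits)."""
    if (sample & 1) == bit:
        return sample
    if sample == lo:
        return sample + 1
    if sample == hi:
        return sample - 1
    return sample + rng.choice((-1, 1))

def extract(sample: int) -> int:
    """Both operations are read back the same way: take the LSB."""
    return sample & 1

rng = random.Random(0)                 # fixed seed, only for reproducibility
cover = [12, 255, 0, 77, 130, 131]     # illustrative 8-bit cover samples
message = [1, 0, 1, 0, 1, 1]           # illustrative message bits

stego_r = [lsb_replace(c, b) for c, b in zip(cover, message)]
stego_m = [lsb_match(c, b, rng) for c, b in zip(cover, message)]
```

In both cases each sample changes by at most 1 and the receiver recovers the message from the LSBs. The difference is that replacement only ever maps a value to its partner within the pair (2k, 2k+1), an asymmetry that quantitative detectors for LSB replacement can exploit, whereas matching may move in either direction.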
In this chapter the reader can also find the limitations on the use of steganography. Two important points must be noted: firstly, the cover media should consist of multimedia objects (images, audio files and video), since their transmission is inconspicuous and does not raise any doubts; secondly, the cover object must be large compared to the size of the secret message, because even with the best methods no more than 1\% of the cover size can be used for embedding. Although this part of the book focuses on basic aspects of the topic, the reader will not find many examples here to illustrate the presented subject matter. Chapter 3 reformulates the theory presented so far so that it becomes applicable to empirical covers. This is achieved by presenting an epistemological and theoretical framework that describes the role of knowledge about a cover in the construction of secure steganographic systems and effective detectors. The author thereby gives a better understanding of how specific advances in statistical steganalysis can be seen as instances of different types of refinements of a more general cover model. Going back to Simmons' proposal, we can see that messages exchanged between transmitter and receiver must be inconspicuous, i.e. they must use \textit{plausible} covers. This becomes a very important problem, because plausibility is an empirical criterion that is very difficult to define by formal methods. One might hope that it could be based on probability theory; however, in steganographic communication the communicating partners form rational expectations about what the warden might consider plausible or not. 
These expectations may deviate from the partners' own notion of plausibility, and a universal probability function seems hard to define. But there is a solution to this problem -- a common approach is to define plausibility in a probabilistic sense, i.e., likely messages are plausible \textit{by definition}, and this leads to the plausibility heuristic. This requires at least three simplifications that go along with the common plausibility heuristic:{\parindent4mm \begin{itemize}\item[1)] a universal notion of plausibility instead of fragmented and context-specific ones; \item[2)] simplification of reasoning and cognition by probability functions; \item[3)] ignorance of strategic interaction by anticipation of the notion of plausibility. \end{itemize}} Based on this theoretical framework and applying the plausibility heuristic, we can formulate a probability space \((\mathcal{\Omega}, \mathcal{P}_0)\), where \(\mathcal{\Omega}\) represents \(\mathcal{X}^n\) as the set of all covers of size \(n\) and \(\mathcal{P}_0\) stands for the probability distribution of the cover objects, which fulfils the probability axioms given by Kolmogorov. Taking these considerations and Bayes' theorem into account, one can define a set of probabilities that allow one to classify cover and stego objects. It is easy to write down these formal relations; however, for practical systems the equations are of limited use for computational and epistemological reasons since, although \(\mathcal{\Omega}\) is finite, \(\mathcal{P}_0\) is in practice difficult to define. This can lead to pessimistic conclusions, but the impossibility of finding \(\mathcal{P}_0\) does not prevent practitioners from developing embedding functions and detectors -- if the models (considered here as the cover generating process) are good enough, i.e., their mismatch with reality is not substantial, this approach is acceptable. 
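As an illustration of the Bayesian classification mentioned above, one such relation can be written out explicitly. The symbols \(\mathcal{P}_1\) (the distribution of stego objects) and \(\pi\) (the prior probability that an observed object carries a message) are assumed notation introduced here for the sketch; they are not taken from the book. For an observed object \(x \in \mathcal{\Omega}\),
\[
\Pr(\text{stego} \mid x) = \frac{\pi\,\mathcal{P}_1(x)}{\pi\,\mathcal{P}_1(x) + (1-\pi)\,\mathcal{P}_0(x)} .
\]
The formula makes the practical difficulty visible: every term on the right-hand side requires evaluating \(\mathcal{P}_0\) or \(\mathcal{P}_1\) at \(x\), so a detector can only be as good as the cover model that stands in for \(\mathcal{P}_0\).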
As we can see, even these briefly expressed doubts could be serious reasons for disregarding steganography, but on the other hand this cat-and-mouse race can also be a reason for finding better cover models. In reality this is what happens, not only through ad-hoc designed embedding functions, but also through different options for formulating cover models, among which one can find direct and indirect cover models, conditional models and stego models. Each of them has some advantages and disadvantages, and their development mostly follows a pattern known from TV-program scrambling: if a detector is able to uncover the hidden information, it also serves as a motivation for finding `better' models. This also leads to some improvements in cover modelling, because we can not only use homogeneous models, but also combine them into mixture cover models (heterogeneous ones). The considered section also gives theoretical limits for the definition of \(\varepsilon\)-secure steganography. This is based on Cachin's approach (with some modifications), where perfectly secure steganography is the special case \(\varepsilon=0\). This allows for the formulation of observability bounds (steganalysis is always based on incomplete information) and computational bounds (the steganalyst may have enough information to recover a hidden message, but doing so requires more computing cycles than are available). At the end it also presents a very important observation: purely deterministic covers break the secrecy of the message, and this implies breaking steganographic security; thus nondeterministic covers are required for secure steganography. Usually this is achieved by a composition of deterministic and nondeterministic parts in the cover -- the first one is necessary to ensure plausibility, the second one is needed for steganographic security. The second part presents the specific advances in steganalysis and is made up of four chapters. 
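The \(\varepsilon\)-security criterion mentioned above can be made concrete with a small sketch. It uses Cachin's original formulation, in which a stego system is \(\varepsilon\)-secure if the relative entropy (Kullback-Leibler divergence) between the cover and stego distributions is at most \(\varepsilon\); the modifications made in the book are not reproduced here. The function names and the toy two-symbol distributions are assumptions chosen for illustration.

```python
import math

def kl_divergence(p0: dict, p1: dict) -> float:
    """Relative entropy D(P0 || P1) in bits for discrete distributions
    given as {outcome: probability} dictionaries."""
    total = 0.0
    for x, p in p0.items():
        if p == 0.0:
            continue                  # terms with P0(x) = 0 contribute nothing
        q = p1.get(x, 0.0)
        if q == 0.0:
            return math.inf           # P0 puts mass where P1 has none
        total += p * math.log2(p / q)
    return total

def is_epsilon_secure(p_cover: dict, p_stego: dict, eps: float) -> bool:
    """Cachin-style check: D(P_cover || P_stego) <= eps."""
    return kl_divergence(p_cover, p_stego) <= eps

# Toy example: embedding slightly skews a uniform binary cover source.
p_cover = {0: 0.5, 1: 0.5}
p_stego = {0: 0.6, 1: 0.4}

d = kl_divergence(p_cover, p_stego)   # roughly 0.029 bits
```

With identical distributions the divergence is zero, which corresponds to the perfectly secure case \(\varepsilon=0\). In practice the true cover distribution is unknown, so such a check can only be run against a model of it, which is the limitation the review describes.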
Chapter 4 shows the results of research (and also their updates) that was completed in Spring 2004. It starts with the revision of ideas proposed by Phil Sallee for model-based steganography. This approach can be interpreted as an evolutionary combination of the decomposition idea and Wayner's mimic functions coupled with strong implications for the design of steganographic algorithms. As an example application of this approach the embedding function MB1 for JPEG images is explained -- MB1 embeds by modifying nonzero values of quantized coefficients of all AC DCT subbands; this ensures that the hidden message bits can always be extracted from the resulting JPEG file. For this method there is also a detector presented, for which a theoretical basis and an experimental validation is given. In Chapter 5 we don't have a presentation of a new improved method of detection, but we have considerations about a methodology to deal with sources of heterogeneous covers in the context of steganalysis. It is based on the observation that the success of steganalysis depends on: the embedding function, the detection method, the message, and the cover. The first two items of this list are at the centre of interest in academic papers. But studies of other influencing factors than message length in conjunction with cover size are pretty rare. Nevertheless, in the literature there is some evidence that the properties of covers can matter a lot; thus this chapter is mostly devoted to methodological questions in identifying and quantifying the influence of cover properties for existing detectors of the LSB replacement algorithm. In this chapter we have a detailed analysis of this problem and there are many interesting results presented in tables and figures. 
Each result is related to the theoretical framework, thus the reader gets a comprehensive overview of statistical models for exploring heterogeneity between covers for the class of quantitative detectors for LSB replacement steganography. As an important remark, it is worth noting that in this chapter's summary the reader can find the following sentence (page 151, Section 5.3): ``We have identified a fat-tailed behaviour of the between-image error distribution, in particular one where the second moment (variance) is not necessarily finite. This finding puts a cautionary warning on attempts to model heterogeneity between images with standard statistical tools, which almost always require finite variance. This already affects summary statistics of steganalysis results over a test set of images.'' The existence of fat-tailed distributions is well known not only in statistics but also in thermodynamics and other fields of science that use statistical methods (e.g. economics). For many years this fact was regarded rather as an \textit{accident during work}; nowadays, however, it is accepted that such distributions can have a proper foundation -- they can be derived from Tsallis' proposal of non-extensive entropy. Details can be found in the literature, but here it is noted that with this new definition of entropy it is possible to use \(q\)-statistics, which provides interesting tools that can handle the problem of infinite variance. The next chapter is concerned with Weighted Stego Image (WS) steganalysis, which is a quantitative detector for LSB replacement steganography -- it focuses on steganalysis for covers based on pictures. This chapter consists of two parts: the first one presents an improved model for WS and never-compressed covers, demonstrates its performance and identifies which cover sources lead to advantages of individual improvements. 
The proposed refined cover model for never-compressed files is based on assumptions on spatial correlations in natural images and empirical investigations of the distribution of better and less predictable areas therein. The second one focuses on problems due to heterogeneous covers -- a specialized WS cover model for JPEG pre-compressed covers is presented. For each part experimental results are presented. Chapter 7 is devoted to steganalysis of compressed audio streams, with special focus on the MP3 format because: this format has high popularity, the typical range of an MP3 file is between 2 and 4 MB and the nature of lossy compression is attractive for steganographic use. But the number of steganographic tools for MP3 is still quite limited. The framework of steganalysis for MP3 files is very specific. On the one hand we have a detector against MP3Stego methods that can distinguish MP3 files with and without steganographic content quite reliably, and this can be achieved for encoding engines based on 8hz-mp3; but other files are identified as false positives. The author presents a set of ten features and uses it to discriminate between 20 different MP3 encoders. The results indicate that the proposed approach is quite reliable, but one must be aware of its limitations -- it may need further development for example in the case of the range of supported bit rates or in the case of influence of stereo modes and other encoding options. Chapter 8, being the last chapter and also the third part of the book, called Synthesis, gives the summary of the whole work. In its first sentences the author writes that this book is ``the first comprehensive work focused on the role of cover in steganography and steganalysis, with an emphasis on the latter.'' And this seems to be the truth, because the author shows many different aspects of steganography and steganalysis. 
Most of them are based on empirical evidence, but the wide spectrum of research results presented in this book shows that there is a need to formulate cover models as hypotheses on the cover distribution, which can be tested against empirical observations. The book also has 7 appendices, which present some details about: covers used in the experiments, derivations and proofs for the WS Estimator, supplementary figures and tables, etc. As one can see, the organization of the book shows that its content is well thought-out -- its three major parts indicate that the author knew exactly how the book should look. If you visit his home page and see how many interesting publications about steganography and steganalysis (including his Ph.D. thesis) are in his portfolio, you will understand that this is not an accident. Rather, Böhme's work is a successful attempt to present some of the important key advances in steganalysis. Moreover, it enables the reader to improve his understanding of this new and very interesting area. So, in conclusion, I would like to state that my general opinion about the book is very positive. It presents many interesting achievements, shows the theoretical framework of the considerations, and for many it can be a very interesting item in their library. This book is warmly recommended despite its few shortcomings. The author has even written that the book offers ``the unprecedented presentation of the foundations of steganography and steganalysis''; in my honest opinion this presentation is very good rather than ``unprecedented'', but we can also agree that ``de gustibus non est disputandum''. Since there are not many books in this field (which seems a bit strange, since this topic is not \textit{that} new), the present book is a very interesting attempt to fill this gap. 
Obviously, one should be aware of some limitations that come with the book; the examples presented in its second part illustrate only selected aspects of the theory presented in Chapter 3, not all of them. This is stated directly, for example, in the summary, where the directions for further research are also presented, showing that there is a lot of work left to do. This book can be used by: researchers who work in information security and are looking for approaches to data hiding other than cryptography; students specializing in multimedia security; and practitioners, who can also find a lot of very interesting examples. I found some typos, but they do not influence the general opinion. They are listed here mainly to improve the second edition of this book. Page 81, Paragraph 3, we have the following part of a sentence: ``\dots the probability axioms,; hence'' -- it seems the semicolon is a typo. Page 102, Table 3.2: it was probably printed from a colored version in the original manuscript, and thus its left column is unreadable in many places. Page 145, Table 5.5, and page 150, Table 5.6: again we have two tables in which some values in rows were probably colored in the original manuscript and are not well readable in print. Page 146, Paragraph 3, we have the following sentence: ``This holds independently (see \(\langle 2\rangle\), \(\langle 3\rangle\) and Fig. 5.5 (a)--(b)) as well as jointly (\(\langle 4\rangle\))''. It seems that this sentence should read, for example: ``This holds independently (see specifications \(\langle 2\rangle\) and \(\langle 3\rangle\) and Fig. 5.5 (a)--(b)) as well as jointly (see specification \(\langle 4\rangle\))''. Pages 235--239, Tables G.1--G.8: here we have the same problem as mentioned above: some rows in the tables are filled with text that was probably given in colors in the source version of the manuscript and is thus hardly readable.

reviewed by

Dominik Strzałka

zbMATH Keywords

steganography

steganalysis

covert payload

embedding