
Aletheia – Machine Learning Image Steganalysis

Aletheia is a steganalysis tool for the detection of hidden messages in images.

The goal of steganalysis is to identify suspected packages, determine whether or not they have a payload encoded into them, and, if possible, recover that payload. Unlike cryptanalysis, steganalysis generally starts with a pile of suspect data files, but little information about which of the files, if any, contain a payload. The steganalyst is usually something of a forensic statistician, and must start by reducing this set of data files (which is often quite large; in many cases, it may be the entire set of files on a computer) to the subset most likely to have been altered.


Aletheia Install

First, clone the Git repository:

$ git clone

Inside the Aletheia directory you will find a requirements file for installing Python dependencies with pip:

$ sudo pip install -r requirements.txt

Aletheia uses Octave, so you need to install it along with some dependencies, which are listed in the octave-requirements.txt file. On Debian-based Linux distributions you can install them with the following command. On other distributions, install the equivalent packages with your package manager.

$ sudo apt-get install octave octave-image

After that, you can execute Aletheia with:

$ ./ 

./ <command>


  Attacks to LSB replacement:
  - spa:   Sample Pairs Analysis.
  - rs:    RS attack.

  Feature extractors:
  - srm:    Full Spatial Rich Models.
  - srmq1:  Spatial Rich Models with fixed quantization q=1c.

  Embedding simulators:
  - hugo-sim:       Embedding using HUGO simulator.
  - wow-sim:        Embedding using WOW simulator.
  - s-uniward-sim:  Embedding using S-UNIWARD simulator.
  - hill-sim:       Embedding using HILL simulator.

  Model training:
  - esvm:  Ensemble of Support Vector Machines.
  - e4s:   Ensemble Classifiers for Steganalysis.


Statistical attacks on LSB replacement

LSB replacement steganographic methods, that is, methods that hide information by replacing the least significant bit of each pixel, are flawed: the embedding only ever increases even values and decreases odd ones, which leaves a detectable statistical asymmetry. Aletheia implements two attacks on these methods: Sample Pairs Analysis (SPA) and the RS attack.
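
To make the idea concrete, here is a minimal sketch of LSB replacement on a flat list of 8-bit pixel values. It is an illustration only, not Aletheia's code; the function names embed_lsb_replacement and extract_lsb are hypothetical.

```python
def embed_lsb_replacement(pixels, bits):
    """Overwrite the least significant bit of each pixel with a message bit."""
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the LSB, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(pixels, count):
    """Read the message back from the first `count` pixels."""
    return [p & 1 for p in pixels[:count]]


cover = [100, 101, 102, 103, 254, 255, 0, 1]
message = [1, 0, 1, 1, 0, 0, 1, 0]

stego = embed_lsb_replacement(cover, message)
print(stego)  # [101, 100, 103, 103, 254, 254, 1, 0]

# Every pixel stays inside its original pair (2k, 2k+1).
print(all(c // 2 == s // 2 for c, s in zip(cover, stego)))  # True
```

A pixel never leaves its pair of values (2k, 2k+1), and this kind of regularity is what statistical attacks on LSB replacement rely on.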

To run the SPA attack on one of the included sample images that has data hidden with LSB replacement, use the following command:

$ ./ spa sample_images/lena_lsbr.png
Hiden data found in channel R 0.0930809062336
Hiden data found in channel G 0.0923858529528
Hiden data found in channel B 0.115466382367

The command used to perform the RS attack is similar:

$ ./ rs sample_images/lena_lsbr.png
Hiden data found in channel R 0.215602586771
Hiden data found in channel G 0.210351910548
Hiden data found in channel B 0.217878287806

In both cases the result is an estimate of the embedding rate for each channel.


Machine Learning based attacks

Most state-of-the-art steganographic methods use some kind of LSB matching. These methods are very difficult to detect, and simple statistical attacks are not enough. We need to use machine learning.
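
For contrast with LSB replacement, the following is a minimal sketch of plain, non-adaptive LSB matching (also known as ±1 embedding). It is an illustration under simplifying assumptions, not the code of any Aletheia simulator: the function name embed_lsb_matching is hypothetical, and adaptive schemes additionally choose where to embed, which this sketch does not model.

```python
import random


def embed_lsb_matching(pixels, bits, rng):
    """If a pixel's LSB already equals the message bit, leave it alone;
    otherwise add or subtract 1 at random (staying inside 0..255)."""
    stego = list(pixels)
    for i, bit in enumerate(bits):
        if stego[i] & 1 != bit:
            if stego[i] == 0:
                stego[i] = 1      # cannot go below 0
            elif stego[i] == 255:
                stego[i] = 254    # cannot go above 255
            else:
                stego[i] += rng.choice((-1, 1))
    return stego


rng = random.Random(0)  # fixed seed so the run is repeatable
cover = [100, 101, 0, 255, 7, 8]
message = [1, 1, 1, 0, 0, 1]

stego = embed_lsb_matching(cover, message, rng)
print(stego)
print([p & 1 for p in stego] == message)  # True
```

Because a value such as 100 may become either 99 or 101, pixels are no longer confined to their original pair of values, so the asymmetry left by LSB replacement is absent.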

To use machine learning, we first need a dataset to train our classifier on. For this example we will use a database of grayscale images called Bossbase.

$ wget
$ unzip

We are going to build a detector for the HILL algorithm with a payload of 0.40, so we need to prepare a set of images with data hidden using this algorithm. The following command embeds information into all the downloaded images:

$ ./ hill-sim bossbase 0.40 bossbase_hill040
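
To put the payload value in perspective, here is a small worked calculation. It rests on two assumptions not stated above: that the payload is expressed in bits per pixel, and that each image is 512x512 pixels. The helper name payload_bits is hypothetical.

```python
def payload_bits(width, height, bits_per_pixel):
    """Number of whole message bits for a given image size and payload rate.
    Assumes the payload is measured in bits per pixel."""
    return int(width * height * bits_per_pixel)


bits = payload_bits(512, 512, 0.40)  # assumed 512x512 grayscale image
print(bits, bits // 8)  # 104857 13107
```

Under these assumptions, each image would carry roughly 13 KB of hidden data.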

With all the images prepared, we need to extract features that can be processed by a machine learning algorithm. Aletheia provides different feature extractors; in this case we will use the well-known Rich Models. The following commands save the features into two files, one file for cover images and one file for stego images.

$ ./ srm bossbase bossbase.fea 
$ ./ srm bossbase_hill040 bossbase_hill040.fea

Now we can train the classifier. Aletheia provides different classifiers; in this case we will use Ensemble Classifiers:

$ ./ e4s bossbase.fea bossbase_hill040.fea hill040.model
Validation score: 73.0

As a result, we obtain a score computed on a validation set (a small subset not used during training). The output is the file “hill040.model”, which we can use for future classifications.
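
The idea of a validation score can be sketched as follows. This is not Aletheia's ensemble classifier: a simple nearest-centroid classifier stands in for it, the feature vectors are synthetic random numbers rather than Rich Model features, and all function names are hypothetical.

```python
import random


def split_holdout(samples, validation_fraction, rng):
    """Shuffle and set aside a fraction of the samples for validation."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(samples):
    """Stand-in model: one centroid per class (0 = cover, 1 = stego)."""
    cover = [f for f, label in samples if label == 0]
    stego = [f for f, label in samples if label == 1]
    return centroid(cover), centroid(stego)


def predict(model, features):
    cover_c, stego_c = model
    return 0 if sq_dist(features, cover_c) <= sq_dist(features, stego_c) else 1


def validation_score(model, validation):
    """Percentage of validation samples classified correctly."""
    hits = sum(predict(model, f) == label for f, label in validation)
    return 100.0 * hits / len(validation)


rng = random.Random(0)
# Synthetic features: the "stego" class is shifted slightly from "cover".
samples = [([rng.gauss(0.0, 1.0) for _ in range(4)], 0) for _ in range(200)]
samples += [([rng.gauss(0.5, 1.0) for _ in range(4)], 1) for _ in range(200)]

train_set, val_set = split_holdout(samples, 0.1, rng)
model = train(train_set)
score = validation_score(model, val_set)
print(f"Validation score: {score:.1f}")
```

The point of the split is that the score is measured only on samples the model never saw during training, which gives a more honest estimate than accuracy on the training data.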

Finally, we can classify an image:

$ ./ e4s-predict hill040.model srm my_test_image.png
Stego, probability: 0.81
