
Assemblyline – Distributed File Analysis Framework

Assemblyline is a scalable distributed file analysis framework. It is designed to process millions of files per day but can also be installed on a single box.

Canada’s electronic spy agency says it is taking the “unprecedented step” of releasing one of its own cyber defence tools to the public, in a bid to help companies and organizations better defend their computers and networks against malicious threats.

An Assemblyline cluster consists of 3 types of boxes: Core, Datastore and Worker.

 

Components


Assemblyline Core

The Assemblyline Core server runs all the required components to receive/dispatch tasks to the different workers. It hosts the following processes:

  • Redis (Queue/Messaging)
  • FTP (proftpd: File transfer)
  • Dispatcher (Worker tasking and job completion)
  • Ingester (High volume task ingestion)
  • Expiry (Data deletion)
  • Alerter (Creates alerts when score threshold is met)
  • UI/API (NGINX, UWSGI, Flask, AngularJS)
  • Websocket (NGINX, Gunicorn, GEvent)

 

Assemblyline Datastore

Assemblyline uses Riak as its persistent data store. Riak is a key/value datastore with Solr integration for search. It is fully distributed and horizontally scalable.

 

Assemblyline Workers

Workers are responsible for processing the files they are given. Each worker has a hostagent process that starts the services to be run on that worker and makes sure those services behave. The hostagent is also responsible for downloading and running virtual machines for services that must run inside a virtual machine or that only run on Windows.


 

Prerequisites:

  • Ubuntu 14.04.x Server x64 installation media.
  • The install machine (or VM) should have at least 8GB of RAM and 20GB of disk space.
  • Accessible Ubuntu APT repository
  • You should know the Assemblyline username and password that you will use for the primary account (we suggest ‘user’).
  • You should know the hostname that will be used for this node.
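
The RAM and disk minimums above can be checked before starting an install. The following is a minimal sketch, not part of Assemblyline itself: the function names are hypothetical, and local_resources assumes a Linux host where os.sysconf exposes the page size and page count.

```python
import os
import shutil

MIN_RAM_GB = 8    # minimum RAM from the prerequisites list
MIN_DISK_GB = 20  # minimum disk space from the prerequisites list


def meets_install_minimums(ram_gb: float, disk_gb: float) -> bool:
    """Return True when both values reach the stated minimums."""
    return ram_gb >= MIN_RAM_GB and disk_gb >= MIN_DISK_GB


def local_resources(path: str = "/") -> tuple[float, float]:
    """Read total RAM and total disk size (in GiB) for this Linux host.

    Hypothetical helper: relies on os.sysconf names available on Linux.
    """
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total
    gib = 1024 ** 3
    return ram_bytes / gib, disk_bytes / gib
```

Calling meets_install_minimums(*local_resources()) on the target machine gives a quick yes/no answer. A machine sold as "8GB" usually reports slightly less than 8 GiB to the operating system, so treat a near miss on RAM as a prompt to look closer rather than a hard failure.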

Single machine install:

  • Your appliance should have a minimum of 96GB of RAM, 1TB of disk space and 16 CPU threads.
  • You are on a network connected to the internet and can download files from Amazon S3.
  • Appliance installation instructions

Cluster Install:

  • You have at least 12 servers: 1 core, 1 support/logger, 5 Riak, 5 workers.
  • The core server needs at least 16 CPU threads, 96GB of RAM and 1TB of storage.
  • The support/logger server needs at least 8 CPU threads, 48GB of RAM and 1TB of storage.
  • Riak nodes need at least 16 CPU threads, 96GB of RAM and 1TB of storage.
  • Worker nodes can have any number of CPU threads (the more the better), with 4GB of RAM per CPU thread and 10GB of storage per CPU thread, with a minimum of 256GB.
  • Workers should not be virtualized if possible, so that they can run services that need to spin up virtual machines.
  • You are on a network connected to the internet and can download files from Amazon S3.
  • Cluster Installation
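
The cluster sizing rules above reduce to a little arithmetic. The sketch below is illustrative only and is not part of Assemblyline: the names are hypothetical, and it assumes the 256GB minimum for worker nodes applies to storage.

```python
# Minimum server count per role, taken from the cluster install list.
MIN_CLUSTER = {"core": 1, "support": 1, "riak": 5, "worker": 5}


def worker_requirements(cpu_threads: int) -> dict:
    """Return the RAM and storage (in GB) a worker needs for its thread count."""
    if cpu_threads < 1:
        raise ValueError("a worker needs at least one CPU thread")
    return {
        "ram_gb": 4 * cpu_threads,                  # 4GB of RAM per thread
        "storage_gb": max(10 * cpu_threads, 256),   # 10GB per thread, 256GB floor (assumed to be storage)
    }


def missing_servers(planned: dict) -> dict:
    """Return how many servers each role still lacks against the minimum."""
    return {
        role: needed - planned.get(role, 0)
        for role, needed in MIN_CLUSTER.items()
        if planned.get(role, 0) < needed
    }
```

For example, worker_requirements(32) gives 128GB of RAM and 320GB of storage, while a 16-thread worker stays at the 256GB floor because 10GB per thread would only come to 160GB.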

Download Assemblyline