How to build a secure AWS infrastructure

 

Every day, more businesses migrate away from their traditional IT infrastructure, and the pandemic has only accelerated the adoption of cloud technologies by remote workforces. Cloud services such as Amazon Web Services (AWS) are now widely accepted as a channel for cloud computing and for delivering software and applications to a global marketplace, cost-effectively and securely. However, cloud consumers tend to wash their hands of the responsibility of securing their own cloud infrastructure.

Cloud service providers and consumers share the responsibility of ensuring a safe and secure experience on the cloud. While service providers are responsible for the underlying infrastructure that enables the cloud, users are responsible for the data they put on the cloud and for who has access to it.

AWS cloud service

The AWS Well-Architected Framework is a whitepaper issued by Amazon that covers AWS key concepts, design principles, and architectural best practices. Security is one of the five pillars this Framework is built on, underscoring that protecting data and improving security is crucial for AWS users. This blog summarizes the whitepaper on the security pillar and discusses:

  • Design principles for AWS
  • A few use case scenarios, and
  • Recommended ways to implement a securely designed AWS infrastructure.

 

AWS provides a variety of cloud services for computation, storage, database management, and more. A good architecture commonly focuses on efficient methods for reaching peak performance, scalable design, and cost-saving techniques. Quite often, however, these design aspects are given more importance than the security dimension.

The security of the cloud infrastructure can be divided into five phases:

  1. Identity verification and access management for AWS resources.
  2. Detection of attacks, potential threats, and misconfigurations.
  3. Access control through defined trust boundaries and operational best practices.
  4. Classification of all data and protection of data at rest and in transit.
  5. Incident response: pre-defined mechanisms to respond to and mitigate any surfacing security incident.

 

The Shared Responsibility Model

As I mentioned earlier, it is the collective responsibility of the user and the AWS service provider to secure the cloud infrastructure. It is important to keep this in mind while we explore the different implementation details and design principles. 

AWS provides plenty of monitoring, protection, and threat identification tools to reduce the operational burden on its users, and it is very important to understand and choose the appropriate services to achieve a well-secured environment.

AWS offers multiple services of a different nature and for different use cases, such as EC2 and Lambda. Each of these services provides a different level of abstraction, enabling users to focus on the problem to be solved instead of its operation. The share of each party’s responsibility varies with that level of abstraction: the higher the abstraction, the more of the responsibility for security in the cloud shifts to the service provider (with some exceptions).

AWS – Shared Responsibility Model

 

Management and Separation of User Accounts to Organise Workload

Workloads vary based on the nature of the processes run on AWS and the sensitivity of the data they handle. They must be separated by logical boundaries and organised into multiple accounts to ensure that different environments are isolated. For instance, the production environment commonly has stricter policies and more compliance requirements, and must be isolated from the development and test environments.

It is important to note that the AWS root user account must not be used for common, day-to-day operations. AWS Organizations simplifies this: you can create multiple accounts under the same organisation, each with different access policies and roles. It is also ideal to enable Multi-Factor Authentication (MFA), especially on the root account.
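As a minimal sketch (assuming the boto3 SDK and credentials with AWS Organizations permissions; the account name and email below are placeholders), a separate member account for the production environment can be created programmatically:

# Hypothetical sketch: create an isolated member account for the
# production environment using AWS Organizations.
import boto3

org = boto3.client("organizations")

response = org.create_account(
    Email="aws-prod@example.com",   # placeholder root/billing email
    AccountName="production",       # placeholder account name
)
print(response["CreateAccountStatus"]["State"])  # e.g. IN_PROGRESS

Account creation is asynchronous, so the returned status can be polled later with describe_create_account_status.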

 

Managing Identity and Permissions

AWS resources can be accessed by humans (such as developers or app users) or machines (such as EC2 instances or Lambda functions). Setting up and managing an access control mechanism based on the identity of the requester is very important, as the parties seeking access may be internal or external to the organization.

Each account should be granted access to resources and actions using IAM (Identity and Access Management) roles, with policies defining the access control rules. Based on the identity of the account and the IAM policies attached to it, critical functionality can be restricted: for example, denying certain changes from all user accounts except the administrator’s, or preventing all users from deleting Amazon VPC flow logs. A minimal sketch of such a policy follows.
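For illustration, here is a minimal sketch of such a policy, expressed as a Python dictionary (the statement ID is arbitrary); it could be attached as a service control policy or IAM policy to block deletion of VPC flow logs:

# Hypothetical sketch: a deny statement that prevents deletion of
# Amazon VPC flow logs regardless of any other allowed permissions.
import json

deny_flow_log_deletion = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyVPCFlowLogDeletion",   # arbitrary statement ID
            "Effect": "Deny",
            "Action": ["ec2:DeleteFlowLogs"],
            "Resource": "*",
        }
    ],
}
print(json.dumps(deny_flow_log_deletion, indent=2))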

Each identity added to the AWS Organization should be given access only to the set of functions necessary to fulfil its tasks. This limits unintended access to functionality, and any unexpected behaviour arising from a single identity will only have a small impact.

 

Leveraging AWS Services to Monitor and Detect Security Issues

Regularly collecting and analysing the logs generated by each workload component is essential for detecting unexpected behaviour, misconfigurations, or potential threats. However, collection and analysis alone are not enough: the volume of incoming logs can be huge, so an alerting and reporting flow should be set up, along with an integrated ticketing system. AWS provides services such as the following to keep these processes automated and easy (a short sketch of querying GuardDuty findings follows the list):

  • CloudTrail: Provides the event history of AWS account activity, covering all AWS services, the Management Console, SDKs, CLI tools, etc.
  • Config: Enables automated assessment, auditing, and evaluation of the configuration of each AWS resource.
  • GuardDuty: A continuous security monitoring service that flags malicious activity within AWS environments by analysing log data for patterns that may indicate privilege escalation, exposed credentials, or connections to malicious IPs or domains.
  • Security Hub: Presents a comprehensive view of the security status of the AWS infrastructure by aggregating, prioritizing, and deduplicating security alerts from multiple AWS services and even third-party products.
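As a rough illustration (assuming boto3 and a GuardDuty detector already enabled in the region), recent findings can be pulled and triaged programmatically:

# Hypothetical sketch: list and print recent GuardDuty findings.
import boto3

gd = boto3.client("guardduty")

detector_id = gd.list_detectors()["DetectorIds"][0]
finding_ids = gd.list_findings(DetectorId=detector_id)["FindingIds"]

if finding_ids:
    findings = gd.get_findings(DetectorId=detector_id,
                               FindingIds=finding_ids[:10])["Findings"]
    for finding in findings:
        print(finding["Severity"], finding["Type"], finding["Title"])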

 

Protecting the Infrastructure: Networks and Compute

Obsolete software and outdated dependencies are not unusual, and it is essential to patch all systems in the infrastructure. This can be done manually by system administrators, but it is better to use AWS Systems Manager Patch Manager, which automates the process of applying patches to operating systems, applications, and code dependencies. A sketch of checking patch compliance follows.
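A minimal sketch of such a compliance check, assuming boto3 and an instance already managed by Systems Manager (the instance ID is a placeholder):

# Hypothetical sketch: report missing and failed patches for a
# managed instance via AWS Systems Manager Patch Manager.
import boto3

ssm = boto3.client("ssm")

states = ssm.describe_instance_patch_states(
    InstanceIds=["i-0123456789abcdef0"]   # placeholder instance ID
)["InstancePatchStates"]

for state in states:
    print(state["InstanceId"],
          "missing:", state["MissingCount"],
          "failed:", state["FailedCount"])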

It is crucial to set up AWS security groups correctly, especially while the infrastructure is growing quickly. Things often go wrong when unorganized, messy security groups are added to the infrastructure. Creating and assigning security groups should be done with caution, as even a slight oversight can expose critical assets and data stores to the internet. Security groups should clearly define both ingress (inbound) and egress (outbound) traffic rules, as in the sketch below.
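The sketch below (boto3; the VPC ID is a placeholder) creates a security group that allows only HTTPS traffic in and out, replacing the default allow-all egress rule with an explicit one:

# Hypothetical sketch: a security group restricted to HTTPS, with
# explicit ingress and egress rules.
import boto3

ec2 = boto3.client("ec2")

sg_id = ec2.create_security_group(
    GroupName="web-tier-restricted",
    Description="HTTPS only, in and out",
    VpcId="vpc-0123456789abcdef0",        # placeholder VPC ID
)["GroupId"]

ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)

# Remove the default allow-all egress rule, then add an explicit one.
ec2.revoke_security_group_egress(
    GroupId=sg_id,
    IpPermissions=[{"IpProtocol": "-1",
                    "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
)
ec2.authorize_security_group_egress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)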

If some assets must be exposed on the internet, make sure the network is protected against DDoS attacks. AWS services such as CloudFront, WAF, and Shield help enable DDoS protection at multiple layers.

 

Protecting the Data

Classifying all the data stored at the various locations inside the infrastructure is essential. Unless it is clear which data is most critical and which can be directly exposed on the internet, setting up protection mechanisms is difficult. Data resting in each of the different data stores must be classified in terms of sensitivity and criticality. If data is sensitive enough that direct user access should be prevented, policies and mechanisms for ‘action at a distance’ should be put in place.

AWS provides multiple data storage services, the most common being S3 buckets and EBS volumes. Application data can usually be found in data stores self-hosted on EBS volumes, and all sensitive data that goes into S3 buckets should be properly encrypted before upload. Better still, enable encryption by default on these buckets, as in the sketch below.
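For example, default server-side encryption can be enabled on a bucket with a short boto3 call (the bucket name is a placeholder; the KMS key line is optional):

# Hypothetical sketch: enable default encryption on an S3 bucket.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_encryption(
    Bucket="example-sensitive-data-bucket",   # placeholder bucket name
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                # "KMSMasterKeyID": "alias/my-key",  # optional CMK
            }
        }]
    },
)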

Protecting data in transit is equally important. It requires secure connections, which can be established using TLS; making sure data is only transferred over such channels should be enough. AWS Certificate Manager is a good tool for provisioning and managing SSL/TLS certificates, for example:
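As a small sketch (boto3; the domain is a placeholder), a public certificate can be requested with DNS validation:

# Hypothetical sketch: request a TLS certificate from AWS Certificate
# Manager, validated via DNS records.
import boto3

acm = boto3.client("acm")

cert_arn = acm.request_certificate(
    DomainName="app.example.com",   # placeholder domain
    ValidationMethod="DNS",
)["CertificateArn"]
print(cert_arn)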

 

Preparing and Responding to Security Incidents the Right Way

Once the automation has been set up and security controls are in place, designing incident response plans and playbooks becomes easier. A good plan must cover the response, communication, and recovery steps that follow any security incident. This is where logs, snapshots, backups, and GuardDuty findings play a critical role, making the task far more efficient. Overall, the aim should be to prepare for an incident before it happens, and to iterate on the plan and train the entire team to follow it thoroughly.

Combating data breaches caused by misconfigured apps

From the outset of the pandemic, we have seen a dramatic increase in the number of cyber attacks and data breaches, with threat actors successfully abusing the fear and panic these adverse conditions cause. As a result, there has been a sharp rise in COVID-themed trojans, ransomware attacks, scams, and phishing attacks across organisations and verticals. As more organizations shift to remote work with inadequate policies and strategies in place, they gamble with their employee and business data security and privileged controls. This has served as a catalyst for an increased number of data breaches across the globe.

This article delves into the various ways in which data breaches can occur, and the safety practices that ensure your organization is not impacted by:

  • Cloud misconfigurations
  • Elasticsearch exposures
  • Exposed internal APIs/ portals
  • Phishing attacks and credential disclosure
  • Insecure WiFi/ no VPN

Cloud Misconfigurations

Cloud misconfigurations have led to massive data breaches. For example, the Capital One and Imperva data breaches both stemmed from exposed AWS credentials.

Fugue’s survey shows that 84% of the 300 IT professionals surveyed believe that they are already victims of undiscovered cloud breaches.

 

Fugue Survey

As pointed out by the survey, the most common causes of cloud misconfigurations are: 

  • Lack of awareness of cloud security and related policies, 
  • Insufficient controls and lapse in supervision, 
  • Too many cloud APIs to adequately govern, and 
  • Negligent internal activities

Although cloud operations take a considerable load off developers and facilitate the smooth management and monitoring of multiple services, it becomes essential to enforce proper access control policies, user management, access key management, and API access control.

How to prevent cloud misconfiguration 

  • Understand and utilise the ‘shared responsibility’ security model.
  • Ensure multiple checks while shifting operations to the cloud, giving careful consideration to IAM roles, user account permissions, key rotation, test accounts, and storage bucket permissions.
  • Review inbound and outbound traffic rules for the VPC carefully. Security groups are also susceptible to misconfigurations. Therefore, enforce a zero-trust policy, and enable VPC flow logs and monitoring (a sketch of enabling flow logs follows this list).
  • Set up behavioural analysis and activity monitoring in addition to strict access policies.
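A minimal sketch of enabling VPC flow logs with boto3 (the VPC ID, log group name, and IAM role ARN are placeholders):

# Hypothetical sketch: turn on VPC flow logs delivered to CloudWatch Logs.
import boto3

ec2 = boto3.client("ec2")

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],   # placeholder VPC ID
    ResourceType="VPC",
    TrafficType="ALL",
    LogGroupName="vpc-flow-logs",            # placeholder log group
    DeliverLogsPermissionArn=(
        "arn:aws:iam::123456789012:role/flow-logs-role"  # placeholder role
    ),
)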

 

Elasticsearch Exposures

Elasticsearch is a search engine that indexes data in the form of documents. The volume of data it indexes is typically quite large and often comprises metadata, personal user information, emails, application logs, and more. The service runs on TCP port 9200 by default, and most Elasticsearch instances are self-hosted, free versions of the software.

CloudSEK XVigil’s Infrastructure Monitor has detected a significant increase in Elasticsearch instances running on the default port, and such exposures are not rare these days. Recently, a UK-based security firm accidentally exposed an Elasticsearch cluster, leaking more than 5 billion records from data breaches that occurred between 2012 and 2019.

How to secure Elasticsearch

  • Prevent access to Elasticsearch clusters from the internet. This is the best approach for most databases (see the security-group audit sketched after this list).
  • Practice ‘security by obscurity’ by not running installed services on their default ports. This does not fix the problem by itself, but it drastically reduces the chances of exploitation by unfocused, automated attacks.
  • Perform periodic assessments of vendors’/ partners’ networks and ensure that their security controls are set properly. The misconfiguration of privately owned infrastructure, as well as that of partners and vendors in possession of critical data, adversely impacts businesses.
  • Analyse and test every potential entry point to any critical data source/ functionality. This includes supplementary tools used to expand an application’s capabilities: most users install Kibana along with Elasticsearch to visualise the data Elasticsearch indexes, and Kibana dashboards are usually left unauthenticated, inadvertently granting anyone access to the indexed data.
  • Encrypt the stored data, to render the data useless to the attacker, even if it is accessible. 
  • Employ Elasticsearch’s security methods for authentication, including:
    • Active Directory user authentication
    • File-based user authentication
    • LDAP
    • SAML
    • PKI
    • Kerberos
  • Enforce role-based access control policy, for users who access the cluster.
  • Update Elasticsearch versions regularly, to safeguard the cluster from frequent exploits that affect the older versions. 
  • Back up the data stored in the production cluster.  This is as important as the security measures adopted. A recent attack campaign accessed as many as 15,000 Elasticsearch clusters, and their contents were wiped using an automated script. 
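As a rough self-audit sketch (boto3, run against your own AWS account), the snippet below lists any security group that exposes Elasticsearch's default port 9200 to the whole internet:

# Hypothetical sketch: find security groups that open port 9200 to 0.0.0.0/0.
import boto3

ec2 = boto3.client("ec2")

for sg in ec2.describe_security_groups()["SecurityGroups"]:
    for rule in sg.get("IpPermissions", []):
        covers_9200 = (
            rule.get("IpProtocol") == "-1"
            or rule.get("FromPort", -1) <= 9200 <= rule.get("ToPort", -1)
        )
        open_to_world = any(r.get("CidrIp") == "0.0.0.0/0"
                            for r in rule.get("IpRanges", []))
        if covers_9200 and open_to_world:
            print("Exposed:", sg["GroupId"], sg["GroupName"])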

 

Exposed Internal APIs/ Portals

Organizations deploy various applications for internal use, including HR management tools, attendance registration applications, file-sharing portals, etc. When the entire workforce shifts to remote work, as it has now, it becomes difficult to track the access and usage of these applications. To top it off, applications are increasingly allowed traffic from the internet instead of only local office networks. As a result, applications and APIs that lack authentication or use default credentials are increasingly surfacing on the internet.

In the past couple of weeks, a number of HR Portals, payroll applications, lead management dashboards, internal REST APIs, and shared FTP servers have surfaced on the internet. Most of the applications are self-hosted, and their default passwords can be used to access them. XVigil has detected multiple instances of directories that contain transaction reports, employee information documents, etc. being served without any authentication. 

How to prevent data disclosure through APIs/ portals

  • Security teams must test these applications thoroughly. 
  • Continuously monitor all internet facing servers. 

 

Phishing attacks and credential disclosures

With a remote workforce communicating primarily via text-based channels such as email, chat, and SMS, it has become much easier for phishing campaigns to take advantage of a distributed workforce. Consequently, the number of spear phishing attacks has surged. Barracuda researchers have observed three main types of phishing attacks over the last couple of months:

  • Scamming
  • Brand impersonation
  • Business Email Compromise (BEC)

Individuals fall prey to phishing attacks, especially during the pandemic, due to:

  • Lack of direct communication
  • Absence of processes and strategies for situations such as this
  • Lack of awareness 

Since emails that use the word COVID currently have higher click-rates, scammers are increasingly using them as lures to spread malicious attachments. Once the attachment is downloaded and the malware payload is dropped, threat actors can capture keystrokes, access files and the webcam, or install additional malware or ransomware. (Access CloudSEK’s threat intel on COVID-themed scams and attacks.)

 

Phishing mail (https://blog.f-secure.com/coronavirus-spam-update-watch-out-for-these-emails/)

How to prepare for phishing attacks

  • Be extremely cautious about any mail you receive.
  • Verify the source of the email, before clicking on any links or attachments. 
  • Even if a link looks legitimate, double-check before opening any files. For example, hovering over a link reveals the actual URL it points to.

 

Insecure WiFi/ No VPN

Today, remote workers connect through their personal devices and home networks, so the connectivity of these devices must be secured.

How to prevent attacks via WiFi

  • To avoid brute force attacks, set complex passwords for the router. If the router is an old model, it may use weak encryption for connections, which can be cracked in no time. 
  • Employees working from shared spaces, such as hostels, may also be connected to shared WiFi networks. To ensure that data is not tampered with on such insecure channels, set up a VPN. If your organization does not provide a business VPN, do not download free VPNs, which might log your traffic data.

How do threat actors discover and exploit vulnerabilities in the wild?

 

In the recent past, several security vulnerabilities have been discovered in widely used software products. Since these products are installed on a significant number of internet-connected devices, they entice threat actors looking to build botnets, steal sensitive data, and more.

In this article we explore:

  • Vulnerabilities detected in some popular products.
  • Target identification and exploitation techniques employed by intrusive threat actors.
  • Threat actors’ course of action in the event of identifying a flaw in widely used internet products/technology.

 

Popular Target Vulnerabilities and their Exploitation

 Ghostcat: Apache Tomcat Vulnerability

Ghostcat (CVE-2020-1938) leaves unpatched Apache Tomcat servers (versions 6.x through 9.x) vulnerable to local file inclusion and potential RCE. The issue resides in the AJP protocol, an optimised, binary version of the HTTP protocol; the years-old flaw stems from the AJP connector handling a request attribute improperly. The AJP protocol is enabled by default and listens on TCP port 8009. Multiple scanners, exploit scripts, and honeypots surfaced within days of the original disclosure by Apache.

Stats published by researchers indicate a large number of affected systems, the numbers being much greater than originally predicted.

Twitter post on the number of affected hosts

Citrix ADC, Citrix Gateway RCE, Directory Traversal

Recently, directory traversal and RCE vulnerabilities in Citrix ADC and Citrix Gateway products affected at least 80,000 systems. Shortly after the disclosure, multiple entities (ProjectZeroIndia, TrustedSec) publicly released PoC scripts, which triggered a slew of exploit attempts from multiple actors in the wild.

Stats on honeypot detections per hour for the exposed vulnerability: https://twitter.com/sans_isc/status/1216022602436808704

Jira Sensitive Data Exposure

A few months ago, researchers found Jira instances leaking sensitive information such as the names, roles, and email IDs of employees. Additionally, internal project details, such as milestones, current projects, and owner and subscriber details, were also accessible to anyone making a request to the following unauthenticated Jira endpoints:

 

https://jirahost/secure/popups/UserPickerBrowser.jspa

https://jirahost/secure/ManageFilters.jspa?filterView=popular

https://jirahost/secure/ConfigurePortalPages!default.jspa?view=popular

Companies affected due to the Jira vulnerability

Avinash Jain, from Grofers, tested the vulnerability on multiple targets and discovered a large number of vulnerable Jira instances revealing sensitive data belonging to various companies, such as NASA, Google, and Yahoo, and their employees.

 Spring Boot Data Leakage via Actuators

Spring Boot is an open-source, Java-based framework that enables developers to quickly set up routes to serve data over HTTP. Most apps using the Spring MVC framework now also use the Boot utility, which helps developers configure which components to add and set up the framework faster.

An added feature of the tool, called Actuator, enables developers to monitor and manage their applications/REST APIs by storing and serving request dumps, metrics, audit details, and environment settings.

In the event of a misconfiguration, these Actuators can become a back door to the servers, making exposed applications susceptible to breaches. In Spring Boot versions 1 to 1.4, Actuator endpoints were accessible without authentication. Although later versions secure these endpoints by default and allow access only after authentication, developers still tend to overlook the misconfiguration before deploying the application.

The following Actuator endpoints leak sensitive data or expose sensitive operations:

  • /dump: performs a thread dump and returns it
  • /trace: returns a dump of the HTTP requests received by the app
  • /logfile: returns the content logged by the app
  • /shutdown: commands the app to shut down gracefully
  • /mappings: returns a list of all the @RequestMapping paths
  • /env: exposes all of Spring’s ConfigurableEnvironment values
  • /health: returns the application’s health information

 

There are other such defective Actuator endpoints that expose information which can be used to do the following (a quick self-check for exposed endpoints is sketched after this list):

  • Gain system information
  • Send requests as authenticated users (by leveraging session values obtained from the request dumps)
  • Execute critical commands, etc.
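A quick self-check, sketched below with the Python requests library, probes your own deployment (the base URL is a placeholder) and warns if any sensitive Actuator endpoint is readable without authentication:

# Hypothetical sketch: verify that Actuator endpoints on your own
# Spring Boot service are not publicly readable.
import requests

BASE_URL = "https://app.example.com"   # placeholder: your own service
ENDPOINTS = ["/dump", "/trace", "/logfile", "/mappings", "/env"]

for path in ENDPOINTS:
    resp = requests.get(BASE_URL + path, timeout=10)
    if resp.status_code == 200:
        print(f"WARNING: {path} is readable without authentication")
    else:
        print(f"{path}: HTTP {resp.status_code}")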

Webmin RCE via backdoored functionality

Webmin is a popular web-based system configuration tool. A zero-day, pre-auth RCE vulnerability affects versions 1.882 through 1.921 and is exploitable when the remote password change functionality is enabled. The Webmin code repository on SourceForge had been backdoored with malicious code, allowing remote command execution (RCE) on affected endpoints.

The attacker sends commands piped into the password-change parameters of `password_change.cgi` on a vulnerable host running Webmin. If the Webmin app is hosted with root privileges, the adversary can execute malicious commands as an administrator.

Command execution payload

Why do threat actors exploit vulnerabilities?

  1. Breaching user/company data: exfiltration of sensitive/PII data
  2. Computing power: infecting systems to mine cryptocurrency
  3. Botnets and malware distribution: exploits targeted at adding more bots to a larger botnet or at serving malicious files
  4. Service disruption and, eventually, ransom: locking users out of their devices
  5. Political reasons, cyber war, disgruntled users, etc.

 

How do adversaries exploit vulnerabilities?

Upon the disclosure of such vulnerabilities, adversaries probe the internet for technical details and exploit code to launch attacks. RAND Corporation’s research and analysis of zero-day vulnerabilities states that, after a vulnerability disclosure, it takes 6 to 37 days, with a median of 22 days, to develop a fully functional exploit. When an exploit disclosure comes with a patch, developers and administrators can immediately patch the vulnerable software; auto-updates, regular security updates, and large-scale coverage of such disclosures help contain attacks. However, many systems continue to run unpatched versions of the software or application and become easy targets for such attacks.

Steps involved in vulnerability exploitation

Once a bad actor decides to exploit a vulnerability, they have to:

  • Obtain a working exploit or develop an exploit (in case of a zero-day vulnerability)
  • Utilize the Proof of Concept (PoC) attached to a bug report (in case of a bug disclosure)
  • Identify as many hosts as possible that are vulnerable to the exploit
  • Maximise the number of targets to maximise profits.

Target Hunting

Even though the respective vendors patch reported vulnerabilities, PoC scripts for the issues can be found by searching GitHub or looking up specific CVEs on ExploitDB. Usually a PoC script takes a host/ URL as input and reports whether the exploit or examination succeeded.

Adversaries identify vulnerable hosts through their signatures or behaviour, in order to generate a list of exploitable hosts. The following components carry signatures that help determine whether a host is vulnerable or not:

  • Port
  • Path
  • Subdomain
  • Indexed Content/ URL

Port

Most commonly used software has a specific default installation port. If a port is not configured explicitly, the software installs on that pre-set port, and in most cases installations keep the default. For example, most systems use the default port 3306 for MySQL and port 9200 for Elasticsearch. So, by curating a list of all servers with port 9200 open, a threat actor can shortlist systems likely running Elasticsearch; however, port 9200 can be used by other services/ software as well.

Using port scans to discover targets for the Webmin RCE vulnerability

  • Determine that Webmin, after installation, listens on port 10000 by default.
  • Get a working PoC for the Webmin exploit.
  • Execute a port scan for port 10000 on all hosts connected to the internet.
  • The result is a list of all possible Webmin installations that could be vulnerable to the exploit.

In addition, tools like Shodan make port-based target discovery effortless. At the same time, if Shodan does not index the target port, attackers leverage tools like MassScan, Zenmap and run an internet-wide scan. The latter approach hardly takes a day if the attacker has enough resources.

Similarly, an attacker looking for an easy way to find systems affected by Ghostcat will port scan the target IPs and narrow the list down to machines with port 8009 open.
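For defenders, a simple connectivity check can confirm whether one of your own hosts exposes such a default port. The sketch below uses only the Python standard library (the IP is a placeholder from the TEST-NET-3 range) and should only be run against systems you are authorised to test:

# Hypothetical sketch: check whether a single host you control exposes
# a given TCP port (e.g. Webmin's default 10000).
import socket

def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_is_open("203.0.113.10", 10000))   # placeholder IP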

Path

Software and services are commonly installed on distinct default paths, so the software can be fingerprinted by observing the signature path. For instance, WordPress installations can be identified if the path ‘wp-login.php’ is detected on the server. This makes it easy to locate the service when it is accessed through a web browser.

For example, the phpMyAdmin utility installs on the path ‘/phpmyadmin’ by default, and a user accesses the utility through this path. In this case, a port scan won’t help, because the utility isn’t installed on a specific port.

Using distinct paths to discover targets to exploit Spring Boot Data Leakage

  • Gather a list of hosts that run Spring Boot. Since default Spring Boot applications start on port 8080, it helps to have a list of hosts with this port open, which narrows down the candidates.
  • Hit specific endpoints like ‘/trace’, ‘/env’ on the hosts and check the response for sensitive content.

Web path scanners and web fuzzer tools such as Dirsearch or Ffuf facilitate this process.

Though responses may include false positives, actors can use techniques such as signature matching or static rule checks to narrow the list of vulnerable hosts. As this method operates over HTTP requests and responses, the process can be much slower than mass-scale port scans. Shodan can also fetch hosts from its index based on HTTP responses.

Subdomain

Software is commonly installed on a specific subdomain, since that is an easier, standard, and more convenient way to operate it.

For example, Jira is commonly found on a subdomain such as ‘jira.domain.com’ or ‘bug-jira.domain.com’. Even though there are no rules when it comes to subdomains, adversaries can identify certain patterns. Similar services usually installed on a subdomain include GitLab, FTP, webmail, Redmine, Jenkins, etc.

SecurityTrails, Circl.lu, and Rapid7 Open Data hold passive DNS records. Other services that maintain similar records are Crt.sh and Censys, which collect SSL certificate records regularly and support queries against them.

Indexed Content/Url

The content published by a service is generally unique. By employing search engines such as Google to find pages that match particular signatures or serve specific content, the results yield a list of URLs running a particular service. This is one of the easiest and most common techniques to hunt down targets, and it is commonly known as ‘Google dorking’. For instance, adversaries can quickly curate a short list of cPanel login pages using the following dork in Google Search: “site:cpanel.*.* intitle:”login” -site:forums.cpanel.net”. The Google Hacking Database contains numerous such dorks, and once the search mechanism is understood, it is easy to write such queries.

Observations

There have been multiple honeypot experiments to study mass-scale exploration and exploitation in the wild. Setting up honeypots is not only a good way of understanding attack patterns; it also helps identify the malicious actors trying to exploit systems in the wild. The identified IPs and networks that try to enumerate targets or exploit vulnerable systems end up in various public blacklists. Research efforts that set up diverse honeypots and studied the techniques used to gain access found that most attempts rely on default credentials and originate mainly from blacklisted IP addresses.

Another interesting observation is that most honeypot-detected traffic seems to originate from China. It is also very common to see honeypots specific to a zero-day surface on GitHub soon after the release of an exploit. The Citrix ADC vulnerability (CVE-2019-19781), for example, saw a few honeypots published on GitHub within a short time after the first exploit PoC was released.

Research carried out by Sophos, using honeypots, highlights the high rate of activity against exposed targets. As reported in the research paper, the first attack on an exposed target arrived anywhere from less than a minute to 2 hours after exposure. Therefore, if an accidental misconfiguration leaves a system exposed to the internet, even for a short period of time, it should not be assumed that the system was not exploited.