Zabbix – Jupiter Broadcasting
https://www.jupiterbroadcasting.com – Open Source Entertainment, on Demand.

Old School Outages | TechSNAP 407
https://original.jupiterbroadcasting.net/132681/old-school-outages-techsnap-407/
Wed, 10 Jul 2019 21:15:15 +0000

Show Notes: techsnap.systems/407

The post Old School Outages | TechSNAP 407 first appeared on Jupiter Broadcasting.

Kubernetes & Containers | Ask Noah Show 88
https://original.jupiterbroadcasting.net/127416/kubernetes-containers-ask-noah-show-88/
Wed, 03 Oct 2018 06:33:48 +0000

Show Notes: podcast.asknoahshow.com/88

The post Kubernetes & Containers | Ask Noah Show 88 first appeared on Jupiter Broadcasting.

The Unfixable Exploit | TechSNAP 365
https://original.jupiterbroadcasting.net/124377/the-unfixable-exploit-techsnap-365/
Wed, 25 Apr 2018 08:52:54 +0000

Show Notes: techsnap.systems/365

The post The Unfixable Exploit | TechSNAP 365 first appeared on Jupiter Broadcasting.

DHCP Attacks | TechSNAP 43
https://original.jupiterbroadcasting.net/16601/dhcp-attacks-techsnap-43/
Thu, 02 Feb 2012 20:29:53 +0000

Find out how a simple system update brought DreamHost down for nearly two days, and we answer frequently asked DNS questions!

The post DHCP Attacks | TechSNAP 43 first appeared on Jupiter Broadcasting.


Find out how a simple system update brought DreamHost down for nearly two days, and how the MS Updater Trojan works.

PLUS: We answer frequently asked DNS questions, and a war story you’ll never forget!

All that and more, on this week’s TechSNAP!

Thanks to:

GoDaddy.com: use code TechSNAP10 to save 10% at checkout, or TechSNAP20 to save 20% on hosting!

Super special savings for TechSNAP viewers only. Get a .co domain for only $7.99 (regular $29.99, previously $17.99). Use the GoDaddy Promo Code cofeb8 before February 29, 2012 to secure your own .co domain name for the same price as a .com.

Pick your code and save:
cofeb8: .co domain for $7.99
techsnap7: $7.99 .com
techsnap10: 10% off
techsnap20: 20% off 1, 2, 3 year hosting plans
techsnap40: $10 off $40
techsnap25: 25% off new Virtual DataCenter plans
Deluxe Hosting for the Price of Economy (12+ mo plans)
Code:  hostfeb8
Dates: Feb 1-29


Direct Download Links:


HD Video | Large Video | Mobile Video | MP3 Audio | OGG Audio | YouTube


Subscribe via RSS and iTunes:


Show Notes:

Ongoing targeted attacks against defense and aerospace industries

  • The research provides detailed analysis of the ‘MSUpdater Trojan’
  • The trojan was mostly spread using targeted spear phishing attacks, emailing people who would have access to sensitive information
  • The goal of the remote administration trojan was to steal sensitive or classified information about aerospace or defense designs
  • The trojan changed rapidly to avoid detection, and used a variety of methods to infect computers, including zero-day PDF exploits and fake conference invitations (usually targeted specifically to the recipient’s area of interest, including ISSNIP, the IEEE Aerospace Conference, and an Iraq Peace Conference)
  • Communications between the infected machines and the C&C servers often took the form of HTTP traffic using the URL structure of Microsoft Windows Update (which is where the trojan got its name) and Windows Error Reporting, likely to avoid detection by some IDSs and by manual traffic analysis. Other versions of the trojan used fake Google searches with encoded parameters
  • The dropped trojan was able to detect that it was being run in a virtual machine and, if so, would not attempt to infect the machine. This allowed it to go undetected for a longer period of time and, until discovered, hampered its analysis by researchers
  • Outline by Researchers
  • Research and Analysis of the Trojan
  • Research paper on detecting Virtual Machines
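The update-mimicry trick above is also what makes this family detectable: genuine Windows Update traffic only ever goes to Microsoft-owned hosts, so update-looking URLs on any other host stand out. Here is a minimal sketch of that heuristic; the host list and path pattern are illustrative assumptions, not the rules of any real IDS.

```python
import re

# Hypothetical allowlist of real update hosts; a production rule set would be
# far more complete. The path pattern is likewise only an approximation of
# "update-looking" URLs.
LEGIT_UPDATE_HOSTS = {"update.microsoft.com", "download.windowsupdate.com"}
UPDATE_PATH = re.compile(r"/windowsupdate/|/msdownload/update/", re.IGNORECASE)

def looks_like_fake_update(host: str, path: str) -> bool:
    """Flag update-style paths requested from non-Microsoft hosts."""
    return bool(UPDATE_PATH.search(path)) and host.lower() not in LEGIT_UPDATE_HOSTS

# A C&C server imitating Windows Update traffic is flagged:
print(looks_like_fake_update("evil.example.net", "/windowsupdate/v6/default.aspx"))  # True
# Real update traffic is not:
print(looks_like_fake_update("download.windowsupdate.com", "/msdownload/update/v3/static/"))  # False
```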

DreamHost suffers massive outage due to automated Debian package updating

  • DreamHost had a policy of automatically installing the latest packages from their repository on all of their machines, including VPS and dedicated servers rented to customers
  • Something in one or more of these packages caused some dependencies to be uninstalled, resulting in Apache, the FTP server and, in some instances, MySQL being uninstalled or unable to start properly
  • Because of the number of servers and domains it hosts, DreamHost is a very large attack target, so it must work diligently to ensure updates are applied, to prevent massive numbers of machines from becoming compromised
  • DreamHost had to manually resolve many of the dependencies and was unable to fix the issue in an automated fashion, requiring hands-on admin time on each individual server and VPS
  • DreamHost has now changed its update policy: all packages from Debian will be tested extensively before being pushed to customer servers
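A cheap safeguard against exactly this failure mode is a pre-flight check: simulate the upgrade (`apt-get -s upgrade` prints "Inst"/"Conf"/"Remv" action lines) and refuse to proceed if anything would be removed as a side effect. This sketch only parses that simulated output format; the sample text is made up for illustration.

```python
# Parse apt-get simulation output and collect packages that would be removed.
# An automated fleet updater could run this first and abort on a non-empty
# result instead of letting Apache or the FTP server get uninstalled.
def packages_removed_by_upgrade(simulated_output: str) -> list[str]:
    removed = []
    for line in simulated_output.splitlines():
        parts = line.split()
        if parts and parts[0] == "Remv":  # "Remv <package> [<version>]"
            removed.append(parts[1])
    return removed

sample = """\
Inst libfoo [1.0] (1.1 Debian:stable)
Remv apache2 [2.2.16-6]
Conf libfoo (1.1 Debian:stable)
Remv proftpd-basic [1.3.3a-6]
"""
print(packages_removed_by_upgrade(sample))  # ['apache2', 'proftpd-basic']
```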

Feedback

Q: Chris D asks about monitoring solutions

A: I personally use Nagios + NagiosGraph for my monitoring, although I have also experimented with Zabbix recently. We discussed a number of monitoring applications in TechSNAP 20 – Keeping it Up. Nagios configures each host/service from files, but supports extensive templating and host/service groups, allowing you to quickly configure servers that are nearly identical. Zabbix is powered by a database, which is both a pro and a con, but the main advantage I give to NagiosGraph is that the historical data is stored in RRD files rather than a database: old data is aged (consolidated into coarser samples) so it requires less space. Zabbix, by default, deletes old data outright to avoid accumulating massive amounts of it.
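The aging that keeps RRD files a fixed size can be shown in a few lines: older samples are consolidated (here, simply averaged in pairs) into a coarser archive instead of being kept forever. Real rrdtool uses configurable RRAs and consolidation functions; this is only a toy illustration of the idea.

```python
# Consolidate fine-grained samples into a coarser archive, RRD-style.
def consolidate(samples: list[float], factor: int = 2) -> list[float]:
    """Average every `factor` consecutive samples into one coarser point."""
    return [
        sum(samples[i:i + factor]) / len(samples[i:i + factor])
        for i in range(0, len(samples), factor)
    ]

minute_data = [10.0, 12.0, 11.0, 13.0, 20.0, 22.0]
print(consolidate(minute_data))  # [11.0, 12.0, 21.0]
```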

Chris uses: monitor.us (wants them to sponsor us)
Allan has monitoring included in his DNS Failover Service from DNS Made Easy

Q: Joshua asks about DNS A Records vs CNAME Records

A: If the CNAME is inside the same domain, the authoritative server will usually return the result along with the response for the CNAME. For example, if static.example.com is a CNAME to www.example.com, the A record for www.example.com will be included in the response. However, if the CNAME points to something like example.cdn.scaleengine.net, then a second lookup is required. To answer the second part of your question: it is not possible to do an HTTP redirect at the DNS level, so NGINX is the best place to do it; done correctly, this redirect can be cached by Varnish to avoid any additional latency. You could hard-code the redirect into Varnish as well. I applaud your use of a cookieless domain for your static content.
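The two CNAME cases in the answer can be sketched with a toy resolver: an in-zone CNAME whose target A record the authoritative server can hand back in the same response, versus an out-of-zone CNAME that forces a second lookup. The zone data below is invented for illustration.

```python
# Toy zone: (name, record type) -> value. cdn.example.com points outside
# the zone, so the authoritative server cannot answer it fully.
ZONE = {
    ("www.example.com", "A"): "192.0.2.10",
    ("static.example.com", "CNAME"): "www.example.com",
    ("cdn.example.com", "CNAME"): "example.cdn.scaleengine.net",
}

def resolve(name: str):
    """Follow CNAMEs within the zone; report when an external lookup is needed."""
    seen = set()
    while name not in seen:
        seen.add(name)
        if (name, "A") in ZONE:
            return ("A", ZONE[(name, "A")])
        if (name, "CNAME") in ZONE:
            name = ZONE[(name, "CNAME")]
        else:
            return ("NEEDS_EXTERNAL_LOOKUP", name)
    return ("LOOP", name)

print(resolve("static.example.com"))  # ('A', '192.0.2.10')
print(resolve("cdn.example.com"))     # ('NEEDS_EXTERNAL_LOOKUP', 'example.cdn.scaleengine.net')
```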


War Story

This week’s war story is sent in by Irish_Darkshadow (the other other Alan)


The Setting:

IBM has essentially two “faces”, one is the commercial side that deals with all of the clients and the other is a completely internal organisation called the IGA (IBM Global Account) that provides IT infrastructure and support to all parts of IBM engaged with commercial business.

The events described here took place in early 2005.

The Story:

There is an IBM location in Madrid, Spain which was staffed by about two thousand people at the time of this war story. The call centre in Dublin was tasked with supporting the users in that site, and every single one of them had been trained in what I called “Critical Situations – Connectivity Testing”. The training took about 4 hours to deliver and was followed up with some practical tests over the next two weeks to ensure the content was sinking in. There was also some random call recording done to detect the techniques being used on live calls too.

Early one morning a call came in to the Spanish support line from a user who had arrived to work late and was unable to get access to her email server. The agent immediately started to drill into the specifics of the problem and realised that the user simply had no network connectivity to her email. The next step in the training says to establish whether the user actually has partial connectivity or a complete loss. The agent began with a simple IPCONFIG /ALL and noticed right away that the user had a 192.168.x.x IP address. This is quite an unusual thing to get on a call from an internal IBM user and the agent didn’t know what to do next and started to get some empirical data before escalating the issue. The key question was – are you the only user affected? The user confirmed that everyone around her was working away with no issues.

The team leader for the Spanish support desk picked up on the call and decided to call my team for some troubleshooting tips. I dropped over to the call and started listening in (which was useless as it was all in frickin’ Spanish) in the hopes of catching something “weird” from the call. The 192 address piqued my curiosity so I had the agent check for a statically assigned IP address…the XP based computer the user was operating was set to use DHCP. Hmmmm…

While this call was really starting to gain my interest, I started hearing of other calls coming in from other users in the same building with the same problem. The agents on those calls were able to confirm to me that these users were on different floors than the original user. So I now had a building on my hands that was slowly losing connectivity to these 192 addresses, and the only possibility was a rogue DHCP server.

I suspected that the network topology and physical structure were about to play an important part in isolating the problem, so I called up the onsite technicians and managed to get one who knew the building and the network inside out. Each floor of this 20+ floor building had a comms room where 24 / 48 port switches were used to supply each area of the floor. The best part was that this guy actually had a map of which ports were patched to which desks for every floor.

Now that I was firmly into Sherlock Holmes mode I asked the onsite guy to arrange some teams for me. For each of the known affected floors I needed a tech in the comms room and another testing computers. We had hatched a plan to start from the original floor that was affected, unpatching one switch at a time from the building network and doing a release / renew on a PC in that newly unpatched section to see if we got a 169.254.x.x address. If that happened then we knew that the rogue DHCP server was not in that specific section (clever eh? what do you mean no? well screw you, you weren’t there man…it was a warzone!). We repeated this pattern for five floors with no success so we expanded one floor up and one floor down. Eventually one of the techs ran the test and the PC picked up a new 192.168.x.x lease…..we had the root of the problem within our grasp and it was time to close the net (too much? I’m trying to make this sound all actiony….in my head it has AWESOME danger music).

The onsite guys managed to check every PC in the suspect floor area and the rogue server was still not found. They yanked the cable from every PC in the area and while the rest of the building was recovering, we knew that if we repatched this section the problem would spread again. When all the PCs were disconnected, I asked the onsite guy to check the switch for activity and there was still one port showing traffic. Despite having all the PCs on the floor disconnected…the rogue was still operational. I questioned if there were any meeting rooms or offices on the floor and there was one. AHA! Upon closer inspection, the empty office had a laptop on the desk that was showing activity on the NIC lights. They yanked the cable and tested a PC on the floor…..169.254.x.x…SUCCESS. The switch was repatched to the building network and all of the PCs recovered. The technician I had called originally started to cackle maniacally over the phone. Perhaps it was better described as derisive laughter. Apparently the door to the office that housed the rogue DHCP laptop had a sign on it that read – IT Manager!!!

When we managed to get a full post mortem / lessons learned done, it turned out that the IT Manager had arrived at the building about an hour after most users start work and half an hour prior to the arrival of the original caller to the Dublin support centre. So every user who worked normal hours had arrived at work and gotten a valid IP lease. Then the IT Manager showed up, connected his laptop and buggered off to a meeting. 192.168.x.x addresses started getting issued. At that point the original user arrives at work, gets a bad IP and calls the support desk. It turned out that over the weekend the IT Manager had enabled Internet Connection Sharing so that his daughter could get online through the broadband on the laptop from her home PC. He hibernated the laptop, forgot all about the ICS being enabled and just connected it up at work that morning without even thinking about it.

Sometimes, late at night….I can still hear that derisive laughter and it makes me sad when I think of all those IT Managers out there who can do stupid shit like this and yet retain their positions!


It just goes to show that the methodical approach may not always be the fastest approach, but because it solves the problem every single time, it usually results in a faster resolution and a better understanding of what the issue was.


Round Up


Keeping it Up | TechSNAP 20
https://original.jupiterbroadcasting.net/11491/keeping-it-up-techsanp-20/
Thu, 25 Aug 2011 21:33:51 +0000

Find out how software like Nagios can take your setup to the next level, and why Apache and PHP's big security holes mean it's time to patch!

The post Keeping it Up | TechSNAP 20 first appeared on Jupiter Broadcasting.


Apache and PHP have hooked up at the fail party, and we’ll share all the details to motivate you to patch your box!

Then Microsoft takes a stab at AES and we wrap it all up with a complete run down of Nagios, and how this amazing tool can alert you to a potential disaster!

All that and more, on this week’s TechSNAP!

Direct Download Links:

HD Video | Large Video | Mobile Video | WebM Video | MP3 Audio | OGG Audio | YouTube

Subscribe via RSS and iTunes:


Show Notes:


All versions of the Apache web server are vulnerable to a resource-exhaustion DoS attack

  • A single attacker with even a slow internet connection can entirely cripple a massive Apache server
  • The attack uses the ‘Range’ header, requesting 1300 different segments of the file, causing the web server to create many separate memory allocations. The existing attack script defaults to running 50 concurrent threads of this attack, which will quickly exhaust all of the RAM on the server and drive the server load very high.
  • Apache 1.3 is past its end of life and will not receive an official patch
  • A different aspect of this bug (using it to exhaust bandwidth) was pointed out by a Google security engineer over 4 years ago
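The malicious header itself is trivial to construct, which is part of why this attack was so dangerous: one Range header naming about 1300 overlapping byte ranges, each of which Apache buffered separately. This sketch only builds the header string; it deliberately sends nothing, and the exact range values are illustrative.

```python
# Build a byte-range-DoS style Range header with many overlapping segments.
def byterange_attack_header(segments: int = 1300) -> str:
    ranges = ",".join(f"{i}-{i + 4}" for i in range(segments))
    return f"Range: bytes={ranges}"

header = byterange_attack_header()
print(header[:33])            # first few of the overlapping ranges
print(header.count(",") + 1)  # 1300 ranges packed into a single header
```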

PHP 5.3.7 contains a critical vulnerability in crypt()

  • Official Bug Report
  • The crypt() function used for hashing passwords received much attention in this latest version of PHP, and a bug was inadvertently introduced: when you hash a password with MD5, only the salt is returned. This means that when validating a login attempt, the hash of the attempt is compared to the stored hash and only the salt will match, resulting in a failed login. However, if the user changes their password, or a new user registers, the stored hash will consist only of the salt, and in that case any attempted password will result in a successful login.
  • PHP 5.3.7’s headline bug fix was an issue with the way Blowfish crypt() was implemented on Linux (it worked correctly on BSD). Some passwords that contained invalid UTF-8 would result in very weak hashes
  • It seems that this error was caught by the PHP unit testing framework, so the fact that it made it into a production release means that the unit testing was likely not properly completed before the release was made.
  • 5.3.7 was released on August 18th. The release was pulled on August 22nd, and 5.3.8 was released on August 23rd
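The failure mode is easy to see in a simulation: if the broken hash function returns only the salt, then any hash stored after the bug "matches" every later login attempt, because both sides of the comparison collapse to the salt. This is pure illustration, not PHP's actual crypt() implementation.

```python
# Simulate the PHP 5.3.7 MD5-crypt bug: the hash output is dropped and only
# the salt comes back.
def broken_md5_crypt(password: str, salt: str) -> str:
    return salt  # the bug: the MD5 digest never makes it into the result

def verify(attempt: str, stored_hash: str, salt: str) -> bool:
    """Standard check: re-hash the attempt and compare to the stored hash."""
    return broken_md5_crypt(attempt, salt) == stored_hash

salt = "$1$abcdefgh$"
stored = broken_md5_crypt("correct-horse", salt)  # user sets a password post-bug
print(verify("totally-wrong-guess", stored, salt))  # True: any password logs in
```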

Researchers have developed a new attack against AES

  • Researchers from a Belgian (Katholieke Universiteit Leuven) and a French (École Normale Supérieure) university, working with Microsoft Research, have developed a new attack against AES that allows an encryption key to be recovered 3 to 5 times faster than with all previous attacks
  • The attack would still take billions of years of CPU time with currently existing hardware
  • Full Paper with Details
  • Comments by Bruce Schneier
  • Additional Article

Feedback

Q: (DreamsVoid) I have a server setup, and I am wondering what it would take to set up a backup server that would automatically take over if the first server were to go down. What are some of the ways I could accomplish this?

A: This is a rather lengthy answer, so I will actually break it apart and give one possible answer each week for the next few weeks. This week’s solution is to use DNS failover. For this feature, I personally use a third-party DNS service called DNS Made Easy. Once you are hosting your DNS with them, you can enable Monitoring and DNS Failover. This allows you to enter the IPs of more than one server for a DNS entry such as www.mysite.com. Only one IP will be used at a time, so it is not the same as a ‘Round Robin’ setup. This simplifies problems with sessions and other data that would need to be shared between all of the servers if they were used at the same time. DNSMadeEasy will monitor the website every minute from locations all over the world, and if the site is unreachable, it will automatically update your DNS record to point traffic to the next server on your list. It will successively fail over to each server on the list until it finds one that is up. When the primary server comes back, it can automatically switch back. We use this for the front page of ScaleEngine.com; if the site were ever down, it would fail over to a backup server we have at a different hosting provider. This backup copy of the site is still reliant on a connection to our centralized CMS (which also uses DNS failover), and if that were down too, it fails over to a flat-HTML copy of our website that is updated once per day. This way, our website remains online even if both our primary and secondary hosting are offline, or if all 3 failover servers for the CMS are down as well.
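The failover logic described above boils down to one rule: walk an ordered server list and point the record at the first healthy entry, which also makes switching back automatic once the primary recovers. A minimal model, with health results faked and IPs drawn from documentation ranges:

```python
# Pick the DNS failover target: first healthy server in priority order.
def failover_target(servers, is_up):
    """Return the first healthy server, mirroring successive failover."""
    for ip in servers:
        if is_up.get(ip, False):
            return ip
    return None  # everything is down: nothing to point the record at

servers = ["203.0.113.1", "203.0.113.2", "198.51.100.9"]
# Primary down: traffic moves to the next server on the list.
print(failover_target(servers, {"203.0.113.1": False, "203.0.113.2": True}))  # '203.0.113.2'
# Primary recovered: the record switches back automatically.
print(failover_target(servers, {s: True for s in servers}))  # '203.0.113.1'
```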


Q: (Al Reid) Nagios seems to be a very good open source and widely used network monitoring software solution, is it possible that you guys could discuss the topic of network monitoring for services, hosts, router, switches and other uses?

A: Nagios is an open source network monitoring system that can be used to monitor a number of different aspects of both hosts (physical and virtual servers, routers) and the services on those hosts (programs like Apache, MySQL, etc). The most basic monitoring is just pinging the host and entering an alert state if the host does not respond, or if the latency or packet loss exceeds a specific threshold. However, the real power of a network monitoring system comes not only from alerting you (via email, text message, audible alarm) when something is down, but from actually monitoring and graphing performance over time. For example, with my MySQL servers, Nagios monitors not only that they are accessible, but graphs the number of queries per second and the number of concurrent connections. This way, if I notice higher than expected load on one of the servers, I can pull up the graph and see that, yes, a few hours ago the number of queries per second jumped by 30%, and that is obviously what is causing the additional load. A huge number of things can be monitored using a combination of the Nagios tools and the SNMP (Simple Network Management Protocol) interfaces exposed by many devices. For example, we monitor power utilization from our PDUs and traffic through each of our switch ports. Some of the main metrics we monitor on each server are: CPU load, load averages, CPU temperature, free memory, swap usage, number of running processes, uptime (alerts us when a device reboots unexpectedly), free disk space, etc. We also monitor our web servers closely, tracking the number of connections, requests per second, number of requests waiting on read or write, etc. Nagios monitoring can be taken even further: more advanced SNMP daemons on servers can list the packages that are installed, and a Nagios tool could be set up to alert you when a known vulnerable package is detected, prompting you to upgrade that package.
Nagios can also monitor your SSL certificates and domain names, and alert you when they are nearing their expiration dates (Chris should have this so he doesn’t forget to renew JupiterBroadcasting.com every year). Nagios supports two different methods of monitoring. The first is ‘active’, which is the most commonly used: Nagios connects to the server/service, checks that it is running, and gets the performance data, if any. However, Nagios also supports ‘passive’ data collection, where the server or service pushes performance data to Nagios, and Nagios can trigger an alert if an update is not received within a specific time frame. This can help solve a common issue we have discussed before, where the monitoring server is a weak point in the security of the network: a single host that is able to connect to even the most secure hosts in your network. With passive monitoring, you can have secure hosts or unroutable LAN hosts push their monitoring and performance data to Nagios from behind the firewall, even when Nagios cannot connect to that host. Other alternatives to Nagios are Zabbix, SpiceWorks or Cacti, but I have never used them.
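The freshness check behind passive monitoring can be sketched in a few lines: hosts push their results, and the monitor raises an alert for any host that has not reported within its window, so even machines the monitor cannot reach inbound are covered. The host names and timings below are invented for illustration.

```python
import time

# Flag hosts whose most recent passive update is older than max_age seconds.
def stale_hosts(last_seen, now, max_age=300.0):
    """Return the sorted list of hosts that have gone quiet."""
    return sorted(h for h, t in last_seen.items() if now - t > max_age)

now = time.time()
reports = {"db1": now - 60, "web1": now - 30, "fw1": now - 900}  # fw1 went quiet
print(stale_hosts(reports, now))  # ['fw1']
```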


Random SQL Injection Comic

Round Up:

Bitcoin Blaster:
