December 12, 2024

Internet-Wide Recon: Moving Past IP-Centric Approaches

Episode Summary

In this episode, we discuss the blind spots of IP-centric approaches to asset discovery and the importance of understanding the full attack surface of an organization.

We unpack the challenges posed by modern cloud architectures, load balancers, and WAFs, and how these can create blind spots in reconnaissance efforts. We also highlight the significance of subdomain data and passive DNS in uncovering hidden attack surfaces that traditional scanning methods might miss.

We talk about:
- The limitations of Internet Wide Scanning
- The importance of breadth and depth in attack surface mapping
- Real-world examples of blind spots in modern infrastructure
- The role of DNS and path-based routing in security assessments
- Insights into IPv6 and its implications for discovery

For more details about Assetnote's Attack Surface Management Platform, visit https://assetnote.io/

Transcript

MG: Hey, okay, another episode. So this week I think we want to follow on a little bit from the episode we did last week, talking about a bit more of a framework for recon. It's been a pretty popular episode and we've had a lot of very positive feedback on that, which has been really great. But one of the topics that we wanted to maybe dive a little bit deeper into this episode is something that came up when we were talking about that, which is the blind side of internet-wide scanning, and really diving a little bit deeper and exploring what we mean there and exploring that concept a little bit further, because I think that caught a lot of people's attention. And so in terms of the blind side of internet-wide scanning, maybe we should just recap quickly from the last discussion. What do we really mean by the blind side of internet-wide scanning?

Shubs: Yeah, I mean, as we discussed last time, the internet wide scanning approaches and techniques, while they were kind of revolutionary at the time when they came out, or kind of changed the way that we looked at discovery when they came out, as time goes on, we're really realizing that this isn't the best approach to covering an entire attack surface. And yeah, we discussed last time as well that it is a very supplemental approach, but not an absolute approach. And what I mean by that is there's just so much infrastructure on the internet that is using like WAFs, CDNs, any level of TLS-SNI routing, any level of V-host-based routing, internet-wide discovery really doesn't work that well. And that's not because, you know, internet-wide discovery is, you know, flawed in that sense. It's more that there are so many different ways that applications can be routed to on the internet. And looking at everything from an IP-centric approach is, in many cases, not a complete picture of an attack surface. So yeah, I think that generally the way I see it is internet wide scanning data is good to have, but it's not the only thing you need for asset discovery and discovery of applications.

MG: Yeah, we spoke last week about the more conceptual ideas of reconnaissance and discovery, and specifically, you know, there were sections where we spoke about breadth and depth, and I think one of the challenges with internet-wide scanning, right, is that there's an assumption of breadth that is maybe not there. And that's really the core point. And, you know, in our context, right, we're building an ASM product. And I know we focus very heavily on the security research that we do and the exposure monitoring elements of this. But a foundational layer of all of that is really the attack surface mapping and understanding what a customer's or an organization's attack surface is. And so, you know, being complete there is really important because, you know, from our perspective, right, you have the attack surface data that is really underpinning everything that we do. And it feeds very closely into the exposure monitoring, right? If you're missing attack surface, you're also missing exposure in that attack surface. So it is really important to get this right. And so we've spent a lot of time thinking about this sort of stuff. It's very important. And we don't necessarily make the assumption that internet-wide scanning, on its own at least, is enough to get comprehensive coverage, I guess, of an attack surface or a particular organization's attack surface. And, you know, that IP-centric approach, as you say, is broadly over time, I think, decreasing in effectiveness when it comes to thinking about those concepts of breadth and depth. And there's many reasons for that. One of the ideas that we talk about a lot at Assetnote is that asset discovery, or even security in general, doesn't exist in a vacuum. It's a reflection of the larger trends in IT. And I think this is what's driving some of this blind side, and people perhaps don't have the perspective to be able to cover that off.
And so let's discuss that a little bit more in terms of you know, what some of those challenges are and what some of the conditions are that are causing this sort of blind spot, so to speak, in internet-wide scanning, and then, you know, some of the ways that we think about, you know, addressing that and, you know, maintaining solid breadth when it comes to discovery.

Shubs: Yeah, I think, you know, we've always kind of considered subdomain data as like a higher quality source of data in terms of exposure scanning, especially at Assetnote. We've designed many things in that way, but it doesn't mean that we discard IP-based scanning or internet-wide discovery. It just means that if I had a choice between a subdomain asset to scan which is pointing to an IP or just the IP, I would definitely take the subdomain. And I've seen it time and time again where we've got thousands and thousands of subdomains that are pointing to the same IP. And the classic thing is every internet-wide scanner only knows of that single IP and potentially doesn't know of those thousand subdomains. That means that the host header doesn't have the correct value. That means that you're not being routed to the correct application, which means you're essentially missing attack surface. And that's really what I mean by the blind side. And there are obviously many other cases where attack surface is missed as a result of IP-based scanning, such as TLS-SNI as well. But what we're seeing more of, especially from my side as a practitioner, is there are more and more companies that are moving to a very modern cloud approach of doing things. And this often involves complex load balancers, WAFs, and CDNs. And in those cases, if there is not enough metadata available on the IP itself, and what I mean by that is there's no SSL certificate that tells you what the common names are, then in those cases, you're flying blind if you're just looking at the IP addresses. And without that extra crucial piece of information within the subdomain itself, the actual host name, then you're not going to be able to be routed to the correct application. That's how these things work. And just fundamentally, this is how applications are architected in 2024. This is how reverse proxies work. This is how load balancers work. This is how WAFs work and CDNs work.
And it's kind of annoying sometimes when I see all these really large IP-centric discovery sources saying that they are the best at discovery, because in many cases, they're missing out on a huge attack surface on the internet. And to be frank, I'm not saying that there's an easy fix for this. When going at the internet-wide scale and collecting data at the internet-wide scale, to be able to do this level of discovery that I'm talking about at an internet wide scale would probably blow out their requests in a very large proportion. And it's not feasible, I don't think. So many challenges. And yeah, it's definitely something that I think is a bit more related to the approach that they're taking with discovery less than, you know, just the technique of internet wide discovery.
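
To make the Host header point concrete, here is a minimal sketch of name-based routing on a single IP. It is a toy simulation, not any vendor's implementation: the hostnames, backend names, and the IP address are all hypothetical, and the `route` function only imitates what a reverse proxy or load balancer might do with the Host value.

```python
# Hypothetical routing table for ONE public IP fronted by a reverse proxy.
# Every hostname and backend name here is made up for illustration.
VHOSTS = {
    "shop.example.com": "storefront-app",
    "admin.example.com": "admin-panel",
    "legacy-api.example.com": "legacy-api",
}
DEFAULT_BACKEND = "default-404-page"


def route(host_header: str) -> str:
    """Return the backend a name-based proxy would pick for this Host value."""
    # Drop an optional port and normalise case before the lookup.
    host = host_header.split(":")[0].lower()
    return VHOSTS.get(host, DEFAULT_BACKEND)


def scan_by_ip(ip: str) -> set[str]:
    """An IP-centric scanner only knows the IP, so that is what it sends as Host."""
    return {route(ip)}


def scan_by_hostnames(hostnames: list[str]) -> set[str]:
    """A subdomain-aware scanner sends each known hostname as the Host value."""
    return {route(name) for name in hostnames}


if __name__ == "__main__":
    print(scan_by_ip("203.0.113.10"))
    print(sorted(scan_by_hostnames(list(VHOSTS))))
```

In this toy setup the IP-only scan reaches just the default page, while the hostname-driven scan reaches all three applications behind the same address.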

MG: Yeah, and I think this is kind of the point, and it's what we discussed last week, which is if you start to think about this more from a conceptual level rather than from a tools and tips and tricks kind of level, then you start to identify these gaps, right? You start to identify these areas where it's perhaps, you know, not covered as well. And I think what it does more fundamentally is it challenges the assumptions. And this is kind of what I'm trying to get to, you know, with this episode and the last episode, is the assumption that, oh, well, if I scan every IPv4 address on the internet, right, across the internet, then, you know, I've got complete coverage. You don't. You don't have anywhere near that level of coverage. And you know, bring that back to, you know, more of a, I guess, a customer level or a practitioner level, like a security team in an organization that's trying to understand that. So they're trying to understand, you know, the breadth of their attack surface so they can then implement, you know, processes and monitoring to understand, you know, what that is, but then also, you know, how they deal with issues with that, whether that's security exposures or otherwise. And if you don't have that sort of core basis of completeness, or as much breadth as you can possibly get, then you're missing that. And that means by extension, those processes are less effective. And so I think, you know, more of the point of this, for me at least, is to really challenge this assumption that, oh, if I just scan every IP on the internet, I know what the attack surface is for everybody that's exposing anything to the internet. And it's just not that simple in practice.
You know, I think maybe, you know, extending on this idea, you know, what are some of the approaches and some of the examples of, you know, more specific examples, let's say, of, you know, where technology practices and architecture practices in a modern sense are causing these blind spots, and what are some of the ways that we can approach understanding that a little bit better and, I guess, uncovering what's behind that?

Shubs: Yeah, I think that's a great question. I can think of a few examples that come off the top of my head that I have experienced in the last 30 days. One of the big examples is when you're dealing with Akamai's WAF. Akamai obviously has IPs all over the world. There's a huge network that they have, all globally located. What often happens with organizations is that they'll issue a wildcard certificate, and then they will have Akamai fronting all of their subdomains, basically. And what that means is we don't have enough information from an IP perspective to tell what exactly we can use to route to the specific applications. This is where, honestly, passive DNS data comes in really handy, because now you know, okay, there is a route to these applications via these subdomains. That's one example. The other example is really modern frameworks and cloud providers. So, for example, Vercel, or Heroku, even in the past: any of the cloud providers that allocate a subdomain underneath their main domain, and they just use this really gigantic load balancer to route you to the place you need to go to. Things like that, we're not going to discover through internet-wide scanning. And this is where subdomain data becomes very, very useful. And seeing where the internet is really going with this is that more and more developers are using these cloud platforms for all sorts of random development. And often, a lot of this is just kind of unknown to a lot of the IP-based internet-wide scanners. And I think that this is something that you can only really figure out through subdomains. Another example is IPv6, where we don't actually have a way to scan the entire internet's IPv6 attack surface in the same way that we do IPv4. So, yeah.
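
The wildcard certificate case can be sketched in a few lines. This is an illustration only, under stated assumptions: the passive DNS records, certificate names, and IP addresses below are invented, and the helper names are hypothetical rather than taken from any real passive DNS provider's API.

```python
from collections import defaultdict

# Hypothetical passive DNS observations: (hostname, IP it resolved to).
PASSIVE_DNS = [
    ("www.example.com", "198.51.100.7"),
    ("pay.example.com", "198.51.100.7"),
    ("staging-portal.example.com", "198.51.100.7"),
    ("blog.example.com", "198.51.100.22"),
]

# Hypothetical names an IP-only scan reads from the certificate on each edge IP.
CERT_NAMES = {
    "198.51.100.7": ["*.example.com"],
    "198.51.100.22": ["*.example.com"],
}


def hostnames_by_ip(records):
    """Group observed hostnames under the IP they pointed at."""
    grouped = defaultdict(set)
    for hostname, ip in records:
        grouped[ip].add(hostname)
    return grouped


def routable_names_from_cert(names):
    """A wildcard entry shows a pattern exists, not which hosts exist,
    so it yields no concrete hostname to send as Host or SNI."""
    return [name for name in names if not name.startswith("*.")]


if __name__ == "__main__":
    for ip, names in CERT_NAMES.items():
        from_cert = routable_names_from_cert(names)
        from_pdns = sorted(hostnames_by_ip(PASSIVE_DNS)[ip])
        print(ip, "cert:", from_cert, "passive DNS:", from_pdns)
```

With this made-up data the certificate contributes no routable hostname for either IP, while the passive DNS records supply the names needed to reach each application.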

MG: Well, it's kind of an interesting point that you mentioned IPv6, and so it's worth talking about it. First of all, the rise of internet-wide scanning came once there was enough, I guess, compute and ability to be able to scan the IPv4 address space, more specifically in a reasonable amount of time, right, or in a small amount of time. I don't know that we're there with the entire IPv6 address space. So on one level, there's a fundamental gap if you're taking an IP-centric approach, because you just can't cover off the IPv6 address space. But then I think there'll be a lot of people that say, well, everybody's been talking about IPv6 for many, many years. It's not necessarily going to sort of dominate and be the default. But I think while that's probably a sort of a valid point, there are other elements when it comes to IPv6 that I think are really valuable when we think about it from a discovery perspective as well.
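
The scale gap is easy to check with back-of-the-envelope arithmetic. The sketch below assumes one probe per address and a rate of one million probes per second; that rate is an assumption chosen for illustration, not a figure from the episode.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Assumed probe rate, chosen only to make the comparison concrete.
PROBES_PER_SECOND = 1_000_000


def seconds_to_scan(address_bits: int, probes_per_second: float) -> float:
    """Time to send one probe to every address in a space of 2**address_bits."""
    return (2 ** address_bits) / probes_per_second


# The whole IPv4 space is 2**32 addresses.
ipv4_minutes = seconds_to_scan(32, PROBES_PER_SECOND) / 60

# A single IPv6 /64 subnet already holds 2**64 addresses.
one_slash64_years = seconds_to_scan(64, PROBES_PER_SECOND) / SECONDS_PER_YEAR

if __name__ == "__main__":
    print(f"IPv4, all addresses: about {ipv4_minutes:.0f} minutes")
    print(f"One IPv6 /64 subnet: about {one_slash64_years:,.0f} years")
```

At that assumed rate, all of IPv4 takes a little over an hour, while a single /64 would take on the order of half a million years, and the full IPv6 space is 2**64 times larger again.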

Shubs: Yeah. And yeah, I think it's a bit of a meme at this point about when IPv6 will actually be adopted across the internet. But no, I mean, we see the graphs every year. There's internet-wide surveys and, you know, the adoption is increasing gradually. And, you know, we might see it in our lifetime, I don't know, maybe your kids will see it. Maybe we'll be gone by then. But we might see it in our lifetime that IPv6 is fully adopted and everyone's kind of moved away from IPv4. I think there's also a lot of money to be made in IPv4, so I think there's still a lot of people beating that drum quite hard. I think from a discovery perspective though, I find this pretty funny, because the solution that a lot of these IP-centric, internet-wide scanning tools have taken for discovering IPv6 is kind of a bit dodgy, to be completely honest. And they're like, OK, well, we can't scan the entire internet for IPv6. So what we're going to do is we're going to position our infrastructure in really critical parts of the internet, such as NTP servers, and we'll submit to these NTP server lists, and suddenly when you spin up a new Ubuntu box on AWS and it connects to the NTP server, we now know that IPv6 address and immediately we're going to scan that IPv6 address for as much information as we can. And that's a bit dodgy, and they're being called out for that, they're being caught, and I think that's still going on. I think that's well and alive and kicking. So if you're ever wondering how they got IPv6 addresses in your favorite internet-wide scanning tool, they most likely breached someone's privacy through something like that. So that's how they currently do it. But yeah, it's a tricky challenge. And I think that, honestly, passive DNS is going to come out probably as the leader in discovering IPv6 for infrastructure.
Now that's maybe a distinction to make, because we're not going to be able to discover all the IPv6 instances that are running like a random RDP or SSH or something like that, or client instances, but infrastructure in the sense of like web servers and things that may be responding from a web perspective, at least, that need a subdomain to be routed to. So yeah, there might still be some services there, but it's going to primarily be from that infrastructure perspective and less from like a, oh, this is a random machine inside an office inside this company, for example. So that's where I think that, you know, there's potentially value in some of the approaches some of the IP-centric internet-wide scanners are taking, which is basically inserting themselves in very critical parts of the internet to log.

MG: I mean, IPv6 at the end of the day... that's essentially what passive DNS is, if you really think about it, in a lot of ways.

Shubs: That's right, yeah.

MG: But there's also potential pitfalls, I think, in terms of the future when it comes to thinking about passive DNS as well, in terms of various sort of, you know, DNS over HTTPS, I think, and things like that. And, you know, I think there's also potential that that can be limited in a way. And this to me really comes back fundamentally to the idea that we discussed last week. It's, you know, how do you think about this stuff? If you just think about it solely from one dimension, IP-centric, subdomain-centric, and that's sort of it, then you're just missing out, right? Like, you need to understand where the gaps are and try and extend your techniques and extend your coverage as broadly as you can in the various areas to kind of like overlay, you know, broad coverage, you know, of an attack surface. And so, you know, I mean, while we're on the topic of like subdomains and DNS, right, I mean, there is a lot of, you know, hidden attack surface in DNS as well. That's kind of interesting. Is there anything that you wanted to sort of dive into there?

Shubs: Yeah, I mean, it's going to be something that we discuss quite heavily next week at BSides Canberra. But, you know, we've noticed a lot of really wacky things with DNS over the last six years, and this one is probably one of the wackiest of them all, where we've identified, you know, basically poisoning across a very large set of domains, over 30 million domains, potentially more. And, you know, it affects primarily Chinese-based infrastructure. And yeah, it is very interesting, but essentially, you know, we'll be discussing that next week, but it will open up a can of worms in the sense of, you know, there's not really any amazing fixes out there for that. And what we're really trying to do is raise awareness, because I think that there's a lot of infrastructure on the internet that's affected by this. And the hidden attack surface is, you know, if you're not in control of your DNS, and sometimes that means you have to be in control of the infrastructure that even serves the records at the end of the day, the name servers. But if you're not in control of that, then anything could happen along the way, including poisoning attacks, which is what we've seen for some of this infrastructure that we've discovered. And yeah, it's definitely very interesting to us from an attack surface perspective, because I don't think any of the companies on these 30 million domains are even aware that this is happening right under their nose. And this attack surface is definitely accessible and exploitable from the external internet by anyone, but there's not much information on the internet. They don't know about it and it's still present today.

MG: Yeah, and it is very hidden attack surface, and it's kind of interesting because that topic, if we want to tie it into what we were discussing last week, I'd say that kind of veers a lot more into depth, more so than breadth, in that particular case. It is genuine attack surface that is creating exposure, and it'll be an interesting talk. We're going to talk about it in detail next week, or at least next week at the time of recording this, in Canberra at BSides Canberra. So if anybody's going there, check it out. I'm sure it'll be published, and we'll publish a blog post and the slides and things like that, if anybody's interested, when we've presented that next week. But yeah, so is there anything else really, you know, to continue on this idea of this sort of blind side of internet-wide scanning or even just, you know, generally sort of blind spots in, you know, IP-centric attack surface mapping?

Shubs: Yeah, definitely. I think the big one is path-based routing. So this is where you have a number of different applications that are being routed to through specific paths. Now, I've seen this across so many large companies in the last couple of years, where what they'll do is they'll have one host set up, which is kind of acting as some sort of reverse proxy to a bunch of different internal hosts, but all of those internal hosts can only be routed to if you've got a specific directory provided inside that initial host. And what this means is, again, internet-wide scanning tools are going to be completely blind to this. They're not going to know what directories to hit, how to get there, and so on and so forth. But this is actually a really tricky challenge, because it is more a step up in obscurity than anything else. We don't have amazing data that's being logged anywhere about what these paths are going to be. I know there are some data sources like VirusTotal and a few other places that log things like paths, and obviously archive.org and things like that. At the same time, a lot of this information is quite obscure and you have to do a lot of brute forcing to discover these paths, or you have to do a lot of analysis of other elements and aspects, like we discussed in the depth section of last episode, which is the JavaScript and other things that we may notice on the attack surface. Even redirect locations could be useful in some cases as well. But yeah, this is definitely something that I think is happening more and more as people are developing more resilient web proxies to internal infrastructure. We're seeing the rise of technologies like Envoy and Kong and Nginx, some of the Nginx functionalities, where a lot of people are just proxying things at a very large scale through a really big load balancer to something that's internal in nature when provided a specific path. And this is a huge blind area for all internet-wide scanning tools.
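
Path-based routing can be sketched the same way as host-based routing. The following is a toy model under stated assumptions: the route table, internal hostnames, and candidate paths are all hypothetical, and `proxy_response` only imitates a longest-prefix match of the kind a reverse proxy might apply. It is not Envoy, Kong, or Nginx configuration.

```python
# Hypothetical path-prefix routes on a single public host.
# Internal upstream names are made up for illustration.
ROUTES = {
    "/billing/": "billing.internal:8080",
    "/billing/admin/": "billing-admin.internal:8081",
    "/hr-portal/": "hr.internal:8443",
}


def proxy_response(path: str):
    """Return (upstream, status) using a longest-prefix match on the path."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        # Nothing is mapped here, so the proxy itself answers with a 404.
        return (None, 404)
    return (ROUTES[max(matches, key=len)], 200)


def discover(candidate_paths):
    """Keep only the candidates that reach some upstream application."""
    return [p for p in candidate_paths if proxy_response(p)[1] != 404]


if __name__ == "__main__":
    # A scanner that only requests the index page sees nothing.
    print(proxy_response("/"))
    # Candidates might come from brute forcing, JavaScript, or redirect
    # locations; this list is invented for the example.
    print(discover(["/", "/admin/", "/billing/", "/hr-portal/login"]))
```

In this model a request for `/` returns a 404 even though three internal applications sit behind the host, which is the situation described above: without knowing the paths, a scanner has no way to reach them.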

MG: Yeah, and it's a tricky thing to deal with. And as you said, again, I mean, I hate to harp on about it, but it really extends on what we were talking about last week around trying to think about things from a different angle. Don't think about this sort of problem as, well, if I've got this one tip and trick, that will help you sort of uncover and unwrap all this path-based routing and all these other tools. It's not the right way to think about it. If you encounter this, you need to start thinking about how can I use other elements to expand the breadth, and that's looking at other concepts as well. It's like looking at the depth, understanding what's running, like you mentioned, like JavaScript can be very useful here, even redirect locations, like all this other sort of metadata that you can collect, but also context as well is very important, and understanding what the application is, you know, understanding, you know, if it is a service, if it is an open source tool, like how does that actually work more specifically, and are there angles within that to be able to then unlock the breadth that's been hidden away and blinded by these techniques. I think it's just reinforcing that idea. This is how, overall, you get better at discovering things and you get more completeness. I was a little bit disheartened to see some comments from last week that were like, yeah, but you know what would be really great? If you just gave us a list of tips and tricks that you use for recon. It's like, mate, you've missed the point entirely of what this is. So that's why we're trying to reinforce that a little bit. And I think that's what really, truly separates the folks who are a cut above when it comes to discovery and attack surface mapping versus those who aren't at that level, let's say. And it's because they think about it in a completely different way, and they're not reliant on just, you know, regurgitating techniques or copying techniques.
And that goes not just, you know, I'm not just talking about, say, bug hunters, I'm talking about, you know, products and tools as well, right? You know, it's beyond that. You know, there are different tiers of this. And I think it all stems from the approach that you take to this and how you think about this more than anything else, because that's how it's going to develop and improve. And so, yeah, I think we can wrap it up there. You know, as we mentioned, you know, we're going to be in Canberra next week for BSides. We've got a really interesting presentation. And so, you know, if you're in town, if you're heading to that conference, hit us up. You know, we'd be happy to meet you. Otherwise, you know, we'll be around next week for another.

Shubs: I got one last thing to add, and it's probably one of the funniest observations I've had in reconnaissance. Yeah. So there's this really common pattern that I see across anyone doing reconnaissance. And it's so funny because we as hackers are supposed to challenge things, right? So what I see is every time someone comes across an asset that presents, like, on the index page, a 404 or a 500 or a 502 or the IIS blue page, they just immediately say, oh, there's nothing there. Let me skip this asset. And it's like, no, there's actually probably a really good reason that that stuff is there, and there's probably something there. And it's one of the funniest things I've seen. And I think time and time again, if you challenge that perception of, no, there's nothing there, you will find something there eventually.

MG: It's simply thinking, right? And it's about trying to take it to another level.

Shubs: Yeah. I mean, if you just think about it a bit more, it's like, why would the index page give you a 502? Is it because it's trying to go to an internal host that's offline? Is there a specific path that may go to another internal host that's online? These are some of the things that I think about, and this is what I think you were talking about and what I'm talking about too. The two sessions we've had on this, the two podcasts we've had on this, are about trying to teach people to think in a more intelligent way about reconnaissance, not just use tools and tips and tricks and whatever. This is about a mindset and a methodology at the end of the day.

MG: Yeah. I mean, it's just like what you said, you know, somebody sees that and they think, Oh, there's nothing there. But what you see out of that and what we see out of that is like a thousand different questions of like, wow, this is, you know, this could be something right. Um, and it's worth exploring it. And it just comes from understanding things a little bit deeper and approaching it in a certain way. And that's, that's what we're trying to, I guess, shine a light on and share in terms of how we think about that at least. And hopefully it can extend on that.

Shubs: No one spins up infrastructure for no reason. No one spins up blank infrastructure. They don't just say, yeah, I was gonna spin up a web server with nothing on it.

MG: Yeah, it's not like, I'm gonna spend more money, or whatever, just to have more stuff.

Shubs: Exactly. There's usually always something there.

MG: But again, that comes back to context. It's like, you know, if you're looking at a company, it's like, that's exactly the case. It's like, nobody does things for no reason, you know? And, um, and so, you know, if you understand that you can maybe, you know, find things out a bit more, but you know, it's another good episode. Um, you know, we'll be back, uh, talking about more of this stuff, I guess, and keep recording more episodes. So, uh, yeah, I think we can wrap it up there. Thanks.

Shubs: Thanks everyone.
