Host Name Lookups with F#

Introduction

I recently found myself needing to use remote desktop on a machine which for some reason I:

  1. Could not access by name.
  2. Could not find the IP of.

Rather than facing the embarrassment of asking one of the network guys for help, I put together an F# script to find the IP of the machine I required. However, there were a few pitfalls along the way regarding performance, so I’d like to talk about how the script started and what it eventually evolved into.

A Synchronous Version

After a little research, I found that the static methods contained in System.Net.Dns could be used to do host name lookups. I knew the range of IPs used for our local network, so I came up with a script similar to the following. The script below includes a bit of diagnostics code in order to time the operation.

    open System.Net
    open System.Diagnostics
    open System.Threading

    // Runs the given function and returns the elapsed time in milliseconds.
    let time a =
        let s = Stopwatch()
        s.Start()
        a()
        s.Stop()
        s.ElapsedMilliseconds

    // Looks up each address from 192.168.0.0 to 192.168.0.255, one at a time.
    let getHostNamesSync() =
        Seq.init 256 (fun i -> IPAddress([| 192uy; 168uy; 0uy; i |> byte |]))
        // A failed lookup throws, so failures become None and are filtered out below.
        |> Seq.map (fun ip -> try Some(ip, Dns.GetHostEntry(ip)) with _ -> None)
        |> Seq.choose id
        |> Seq.iter (fun (ip, hn) -> printfn "%O – %s" ip hn.HostName)

    getHostNamesSync |> time |> printfn "took %i ms"

I ran that script on a network on which ~20% of those IPs are actually used, and away it went. The final few lines of the script were something along the lines of:

    192.168.0.146 – somemachine.domain.com
    192.168.0.149 – anothermachine
    192.168.0.150 – mymachine.domain.com
    took 1834333 ms

1834333ms – that’s more than half an hour! After a little thought, I figured that my machine was basically sat there waiting for lookups to time out for the vast majority of that time. This wasn’t really acceptable, even for a script that would only ever run on my development machine!
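
As a rough sanity check on that theory, the total can be divided by the number of addresses expected to time out. This is only back-of-the-envelope arithmetic: the 80% figure is an assumption derived from the ~20% usage mentioned above, and it treats the successful lookups as costing nothing.

```fsharp
// Back-of-the-envelope estimate of the cost of a single timed-out lookup.
let totalMs = 1834333.0          // measured total from the synchronous run
let addressCount = 256.0         // addresses scanned
let unusedFraction = 0.8         // assumption: ~20% of the addresses are in use

let unusedCount = addressCount * unusedFraction
let msPerTimeout = totalMs / unusedCount

printfn "~%.0f unused addresses, roughly %.1f seconds each" unusedCount (msPerTimeout / 1000.0)
```

That works out at roughly nine seconds per unused address, which fits the idea that nearly all of the run was spent waiting for timeouts.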

An Asynchronous Version

Luckily, Dns supports an asynchronous version of GetHostEntry, so my next iteration used that.

    let getHostNamesAsync() =
        Seq.init 256 (fun i -> IPAddress([| 192uy; 168uy; 0uy; i |> byte |]))
        // Start a lookup for each address; nothing runs until the sequence is enumerated.
        |> Seq.map (fun ip -> ip, Dns.BeginGetHostEntry(ip, null, null))
        // Force every Begin call to happen now, before any End call.
        |> Seq.toArray
        |> Seq.map (fun (ip, ias) -> try Some(ip, Dns.EndGetHostEntry(ias)) with _ -> None)
        |> Seq.choose id
        |> Seq.iter (fun (ip, hn) -> printfn "%O – %s" ip hn.HostName)

The changes here may require some extra explanation. Rather than calling Dns.GetHostEntry, we call Dns.BeginGetHostEntry which returns an IAsyncResult. The IAsyncResult is then passed to Dns.EndGetHostEntry when we’re ready to accept the result.
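
To make the Begin/End pairing concrete, here is a minimal sketch for a single address. The function name lookupHostName is my own, and IPAddress.Loopback is only a convenient address to try; what it resolves to, if anything, depends on the machine it runs on.

```fsharp
open System.Net

// Looks up a single address using the Begin/End pair.
// Returns None if the lookup fails (EndGetHostEntry rethrows the failure).
let lookupHostName (ip: IPAddress) : string option =
    // Starts the lookup and returns immediately; no callback or state object is supplied.
    let ias = Dns.BeginGetHostEntry(ip, null, null)
    // Other work could happen here while the lookup is in flight.
    try
        // Blocks until the lookup has finished, then returns its result.
        Some (Dns.EndGetHostEntry(ias).HostName)
    with _ -> None

// The loopback address is used purely as an example input.
match lookupHostName IPAddress.Loopback with
| Some name -> printfn "%O – %s" IPAddress.Loopback name
| None -> printfn "%O could not be resolved" IPAddress.Loopback
```

The gap between the two calls is what matters: anything done there, including starting more lookups, overlaps with the lookup already in flight.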

The key line here is the call to Seq.toArray. It forces every BeginGetHostEntry call to be evaluated at that point, which queues all of the work onto the thread pool and gives us an array pairing each IP with its IAsyncResult. In other words, all of the BeginGetHostEntry calls happen before any EndGetHostEntry call, i.e.

    Dns.BeginGetHostEntry
    Dns.BeginGetHostEntry
    Dns.BeginGetHostEntry
    Dns.EndGetHostEntry
    Dns.EndGetHostEntry
    Dns.EndGetHostEntry

Without that call to Seq.toArray, we’d just have a lazily evaluated sequence of IAsyncResult objects which would be evaluated one at a time. The calls to BeginGetHostEntry would occur as we iterated over each item, followed by a corresponding call to EndGetHostEntry, i.e. in this order:

    Dns.BeginGetHostEntry
    Dns.EndGetHostEntry
    Dns.BeginGetHostEntry
    Dns.EndGetHostEntry
    Dns.BeginGetHostEntry
    Dns.EndGetHostEntry

That’s essentially synchronous. We’d get no benefit whatsoever. However, with Seq.toArray causing evaluation of every BeginGetHostEntry up front, we get our desired order of evaluation.
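
The difference in ordering is easy to reproduce without touching the network. The sketch below swaps the Dns calls for two hypothetical stand-ins, beginOp and endOp, which do nothing except record that they were called; traceCalls is likewise a name invented for this illustration.

```fsharp
// Records the order of calls made by hypothetical stand-ins for
// Dns.BeginGetHostEntry and Dns.EndGetHostEntry. No network access is involved.
let traceCalls (forceWithToArray: bool) : string list =
    let log = System.Collections.Generic.List<string>()
    let beginOp (i: int) =
        log.Add(sprintf "Begin %d" i)
        i
    let endOp (i: int) =
        log.Add(sprintf "End %d" i)
    // Lazy: beginOp does not run until the sequence is enumerated.
    let started = Seq.init 3 id |> Seq.map beginOp
    let started =
        if forceWithToArray then started |> Seq.toArray |> Seq.ofArray
        else started
    started |> Seq.iter endOp
    List.ofSeq log

printfn "without Seq.toArray: %A" (traceCalls false)
printfn "with Seq.toArray:    %A" (traceCalls true)
```

Without Seq.toArray the log alternates Begin 0, End 0, Begin 1, End 1 and so on; with it, all three Begin entries appear before the first End.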

At this point, it’s worth noting that many operations will not benefit from having a couple of hundred similar operations started in parallel. Reading files is one obvious example, where disk I/O is the major bottleneck. In situations like that, this technique is likely to do more harm than good. However, for this particular operation, where the time is lost simply waiting for timeouts, we can gain a big performance boost. In fact, this particular workload could be described as embarrassingly parallel.

When timing the asynchronous function, the last few lines of output were:

    192.168.0.146 – somemachine.domain.com
    192.168.0.149 – anothermachine
    192.168.0.150 – mymachine.domain.com
    took 69096 ms

That’s much, much better. 26.5 times faster in fact! However, I couldn’t help but feel I could do even better.

More Improvement

My search for further improvement was based around this theory:

If each lookup started at exactly the same time and ran truly in parallel, the time taken to run this thing should be little more than the time taken to timeout on one request.
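
The idea behind that theory can be sketched with pretend lookups that do nothing but wait. This is an idealised illustration only: it uses F# async workflows and Async.Sleep rather than Dns and the thread pool, and fakeLookup and the 200 ms figure are made up for the example.

```fsharp
open System.Diagnostics

// Hypothetical stand-in for a lookup that times out: it only waits,
// then reports how long it waited.
let fakeLookup (waitMs: int) : Async<int> =
    async {
        do! Async.Sleep waitMs
        return waitMs
    }

// Ten pretend lookups of 200 ms each (values chosen purely for illustration).
let waits = List.replicate 10 200

let stopwatch = Stopwatch.StartNew()
let finished =
    waits
    |> List.map fakeLookup
    |> Async.Parallel
    |> Async.RunSynchronously
stopwatch.Stop()

printfn "one after another: %d ms of waiting; together: %d ms elapsed"
    (List.sum waits) stopwatch.ElapsedMilliseconds
```

When the waits genuinely overlap, the elapsed time should sit close to the longest single wait rather than the sum of all of them.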

I knew that each call to BeginGetHostEntry queued some item of work onto the thread pool. The thread pool starts out quite small, but will gradually expand to accommodate more threads if work is left waiting for long. I realised that many of my calls were probably going to start out sat in a queue, waiting for a thread to become available before they could execute.

If this was the case, increasing the initial number of threads available in the thread pool should allow my code to run more quickly. Now it’s not normally advisable to mess with thread pool settings unless you’re deity-appointed threading royalty, but as this was simply a lowly F# script, and not part of a larger system, I didn’t see any issue with it.

I added this line to the start of my script and re-ran (note, my machine is a quad core, so completion port threads remains at its default):

    ThreadPool.SetMinThreads(250, 4) |> ignore

The output read:

    192.168.0.146 – somemachine.domain.com
    192.168.0.149 – anothermachine
    192.168.0.150 – mymachine.domain.com
    took 9377 ms

That was over 195 times faster than my original synchronous script! A much more acceptable nine and a half seconds. To test my theory about the requests running in parallel, I did a single synchronous lookup for an IP I knew had no machine name. It took ~9100ms, only a couple of hundred milliseconds less than the asynchronous lookup for all 256 IPs. My script to look up 256 host names took very nearly as long as one which looked up a single non-present host name!
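
One small caveat on the thread pool change: SetMinThreads reports whether the new values were accepted, and the hard-coded 4 only matches the default on a quad core machine. A slightly more defensive sketch, assuming nothing beyond the standard ThreadPool API, reads the current minimums first and changes only the worker count:

```fsharp
open System.Threading

// Read the current minimums: worker threads and I/O completion port threads.
let workerMin, completionPortMin = ThreadPool.GetMinThreads()
printfn "minimums before: %d worker, %d completion port" workerMin completionPortMin

// Raise only the worker minimum, leaving the completion port minimum as it was.
// SetMinThreads returns false (and changes nothing) if the values are rejected.
let accepted = ThreadPool.SetMinThreads(250, completionPortMin)

let workerMinAfter, completionPortMinAfter = ThreadPool.GetMinThreads()
printfn "accepted: %b; minimums after: %d worker, %d completion port"
    accepted workerMinAfter completionPortMinAfter
```

The same warning applies as before: this is fine for a throwaway script, but not something to drop into a larger system without thought.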

Conclusion

I found the machine I was looking for, and after playing with its IP configuration for a little while, I was able to get it onto the network properly. For anyone that’s interested, the final version of the script I created is pasted below:

    open System.Net
    open System.Threading

    // Raise the worker thread minimum so the lookups are not left queuing for a thread.
    ThreadPool.SetMinThreads(250, 4) |> ignore

    let getHostNamesAsync : IPAddress seq -> (IPAddress * IPHostEntry) seq =
        Seq.map (fun ip -> ip, Dns.BeginGetHostEntry(ip, null, null))
        >> Seq.toArray
        >> Seq.map (fun (ip, ias) -> try Some(ip, Dns.EndGetHostEntry(ias)) with _ -> None)
        >> Seq.choose id

    let printResult (ip, hn:IPHostEntry) =
        printfn "%O – %s" ip hn.HostName

    Seq.init 256 (fun i -> IPAddress([| 192uy; 168uy; 0uy; i |> byte |]))
    |> getHostNamesAsync
    |> Seq.iter printResult