Unexpected performance problems, such as low throughput or high latency, often arise in distributed systems and are of particular concern in high-performance distributed systems. Finding the causes of these problems is challenging because spreading a system across many components multiplies the number of possible points of failure.
NetLogger is a methodology that enables the real-time diagnosis of performance problems in complex high-performance distributed systems. NetLogger includes tools for generating precision event logs that can be used to provide detailed end-to-end monitoring at both the application and system levels, and tools for visualizing log data to view the state of the distributed system in real time.
The talk will describe NetLogger as it presently stands, including its current problems and shortcomings, and will use this as a basis for discussing future development directions.
Snacks will be provided.
See Conundrum Talks for more information about this series.