
    ElastiCache exposes many metrics through CloudWatch. Blue Matador monitors some of these metrics for anomalies so you can correlate them with issues in your application.

     Evictions


    Blue Matador monitors each node in your cluster for an abnormal number of evictions. Evictions occur when the cache engine removes items from the cache to free memory, so a spike in evictions can mean your cluster is running out of memory. If you are experiencing eviction spikes, you may need to scale up your cluster.
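    If you want to look at eviction data yourself, the sketch below shows one way to do it. It is not Blue Matador's detection logic. It builds the arguments for a CloudWatch query on the `Evictions` metric in the `AWS/ElastiCache` namespace and applies a simple mean-plus-standard-deviation check. The function names, the node ID default `0001`, the five-minute period, and the three-sigma threshold are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, pstdev


def evictions_query(cluster_id, node_id="0001", hours=24):
    """Build keyword arguments for a CloudWatch get_metric_statistics call.

    The cluster and node IDs are placeholders; substitute your own.
    """
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElastiCache",
        "MetricName": "Evictions",
        "Dimensions": [
            {"Name": "CacheClusterId", "Value": cluster_id},
            {"Name": "CacheNodeId", "Value": node_id},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # five-minute buckets (assumed granularity)
        "Statistics": ["Sum"],
    }


def is_eviction_spike(history, latest, sigmas=3.0):
    """Return True when `latest` exceeds mean + sigmas * stddev of `history`.

    This is a deliberately simple stand-in for real anomaly detection.
    """
    if len(history) < 2:
        return False  # not enough data to define "normal"
    baseline = mean(history)
    # Floor the spread at 1.0 so an all-zero history does not flag
    # a single stray eviction as a spike.
    spread = max(pstdev(history), 1.0)
    return latest > baseline + sigmas * spread
```

    With boto3 installed and AWS credentials configured, the query could be run as `boto3.client("cloudwatch").get_metric_statistics(**evictions_query("my-cluster"))`, where `my-cluster` is a hypothetical cluster ID. The `Sum` values from the returned datapoints would then serve as the `history` passed to `is_eviction_spike`.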

     

    Current Connections


    ElastiCache reports the number of connections currently open to the cache. A healthy threshold for this metric depends on your application’s cache needs, so Blue Matador applies anomaly detection to determine what is normal for your application. When this metric spikes, investigate your application to see why it is connecting to the cache more frequently. The spike could be caused by a bug in your code or by a sharp increase in traffic to your application.
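    As a starting point for that investigation, the following sketch compares how much the connection count grew with how much application traffic grew over the same window. It is a rough heuristic, not part of Blue Matador or ElastiCache. The function name, the `tolerance` factor of 1.5, and the idea of using a request count as the traffic measure are assumptions; the connection counts would come from the `CurrConnections` CloudWatch metric, and the request counts from whatever traffic metric your application already records.

```python
def classify_connection_spike(conn_before, conn_after,
                              req_before, req_after, tolerance=1.5):
    """Guess whether a connection spike tracks traffic growth.

    Returns one of "unknown", "no-spike", "traffic", or "possible-bug".
    All thresholds here are illustrative assumptions.
    """
    if conn_before <= 0 or req_before <= 0:
        return "unknown"  # no usable baseline to compare against
    conn_growth = conn_after / conn_before
    req_growth = req_after / req_before
    if conn_growth < tolerance:
        return "no-spike"
    # Connections grew roughly in step with requests: likely real load.
    if conn_growth <= req_growth * tolerance:
        return "traffic"
    # Connections grew much faster than requests: look for a leak or a
    # client that reconnects instead of reusing pooled connections.
    return "possible-bug"
```

    For example, connections rising from 100 to 400 while requests rise from 1000 to 4200 is classified as `"traffic"`, whereas the same connection growth with requests nearly flat is classified as `"possible-bug"`.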

     

    Replication Lag


    If you’re running a Redis cluster in ElastiCache, Blue Matador also monitors your replicas for replication lag. A high replication lag value means your read replicas are overloaded and cannot keep themselves in sync with their primary nodes. If your replicas are experiencing replication lag, check whether your application is reading from the cache more than it should. If the load is legitimate, you will need to add read replicas to your cluster.
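    To separate sustained lag from a brief blip, a check like the one sketched below may help. It takes lag samples in seconds per replica, as reported by the `ReplicationLag` CloudWatch metric, and lists the replicas whose lag stayed above a threshold for several consecutive samples. The function name, the 5-second threshold, the three-sample streak, and the replica IDs in the example are assumptions for illustration only.

```python
def lagging_replicas(lag_by_replica, threshold_seconds=5.0, min_consecutive=3):
    """Return the sorted IDs of replicas with sustained replication lag.

    `lag_by_replica` maps a replica ID to its lag samples in seconds,
    ordered oldest to newest.
    """
    lagging = []
    for replica_id, samples in lag_by_replica.items():
        streak = 0
        for lag in samples:
            # Count consecutive samples above the threshold; any sample
            # at or below it resets the streak.
            streak = streak + 1 if lag > threshold_seconds else 0
            if streak >= min_consecutive:
                lagging.append(replica_id)
                break
    return sorted(lagging)


samples = {
    "replica-a": [0.2, 0.3, 0.1, 0.4],   # healthy
    "replica-b": [1.0, 6.0, 7.0, 9.0, 2.0],  # three high samples in a row
    "replica-c": [6.0, 1.0, 7.0, 1.0, 8.0],  # high, but never sustained
}
sustained = lagging_replicas(samples)
```

    Here `sustained` contains only `replica-b`. If most replicas appear in the result while read traffic is at expected levels, that points toward adding read replicas rather than toward an application problem.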

     

    Resources