Yesterday's Outage

Andrew · Apr 17, 2014

Yesterday around 1 PM our server became unreachable. The server itself was not having issues; the network it was attached to was. Unfortunately, this is the third such incident from our host since our server was moved out of the Dallas datacenter and into the Provo datacenter. It is now very apparent that our host is incapable of maintaining a network on a consistent basis or providing any type of redundancy. These are the types of mistakes that we can't live with.

We will be moving servers in the coming weeks and will keep you posted. We have a lot of custom code and configuration, and our site takes up a large amount of space, so the prep work ahead means this move won't be instant.

Andrew

Angry Ibis · Apr 17, 2014

Andrew said:
Yesterday around 1 PM our server became unreachable. The server itself was not having issues; the network it was attached to was. Unfortunately, this is the third such incident from our host since our server was moved out of the Dallas datacenter and into the Provo datacenter. It is now very apparent that our host is incapable of maintaining a network on a consistent basis or providing any type of redundancy. These are the types of mistakes that we can't live with.

We will be moving servers in the coming weeks and will keep you posted. We have a lot of custom code and configuration, and our site takes up a large amount of space, so the prep work ahead means this move won't be instant.

Andrew

Less talk. More wins.

rsa coral gables · Apr 17, 2014

how about the NAP of the Americas, right in downtown miami?

Cribby · Apr 17, 2014

I'm almost positive that Denofrio was behind this

JHallCanes · Apr 17, 2014

Sounds like Dorito blaming his players

We'retheBoss · Apr 17, 2014

Cribby said:
I'm almost positive that Denofrio was behind this

His D last year couldn't shut down anything, much less a network.

canesmang1 · Apr 17, 2014

Can the new server play DT?

Andrew · Apr 17, 2014

JHallCanes said:
Sounds like Dorito blaming his players

292d, 7h, 54m, 50s

That is the uptime of our actual server. Yesterday was not a server crash. This is the message we got from our host. Clearly they blame a firmware bug, but they make no mention of the redundancy that should be in place on their network.

Greetings,

While almost all services have been restored as of this time, I'd like to first note that we're still working towards tying up the final loose ends. These remaining issues are absolutely a priority for us currently. In the meantime we do want to provide some more information and answer everyone's questions with what details we do have available currently.

Q: What happened?
A: We experienced a degradation of network service in one of our data centers due to a firmware bug in one of our vendor’s hardware solutions. This was an undocumented bug and we worked with our partner to diagnose the issue and deployed a firmware update to the systems to remediate the problem. Only websites that were being served by this hardware were affected.

Q: Was this related to any previous outage?
A: No, this is unrelated to any previous outages.

Q: Have you identified the problem?
A: Yes, we have isolated the problem to this firmware failure and the downstream effects that resulted from it. We have reviewed our entire network to make sure this problem will not occur elsewhere.

Q: Why did it take so long to address the problem?
A: We started to address the problem as soon as we began to see performance issues. The root cause was complicated to diagnose because it was an undocumented bug in the software of a vendor’s hardware solution. Full service for some customers was restored immediately, but some servers were not visible on our network. We apologize for any downtime that you experienced. The servers continued to operate during this entire period, which means that at no point was your data at risk. The problem was access to the servers because of the firmware issue.

Q: What happened to any email that was sent to me while this firmware issue was affecting the network?
A: There is good and bad news. Unfortunately, any message that was sent to you while we were experiencing this issue would not have been delivered. However, the sender should receive a notice that their mail wasn't delivered, and most mail servers will keep trying to re-send that email at periodic intervals for anywhere from 2 to 7 days. While we cannot guarantee that any emails sent to you will be delivered, there is a very good chance that they will arrive...slightly delayed.

Q: How has Endurance's involvement with HostGator affected the situation?
A: Actually, this was not a result of Endurance. In fact, the team at our corporate headquarters was tremendously helpful in our recovery effort. They stayed with us throughout the entire incident. By committing the resources of the entire company, including technicians, customer service reps, and engineers, we were able to swarm the problem and address it as quickly as possible.

Q: Why did you leave SoftLayer?
A: We moved out of SoftLayer to be able to more fully control our server environment to provide a better customer experience. We work really hard to prevent issues like this from happening. We recognize that this transition has not been as smooth as either you or we would like and we take the issues that have occurred very seriously. We believe in the long run this is the best environment to deliver service to you.

Q: Do I have to worry about this happening again?
A: We would like to say that we will never have a network service outage again, but realistically that isn’t something we can promise. What we can assure you is that we are continually taking steps to audit and improve the performance of our infrastructure, and investing a large amount of capital and people to do this.

Last and certainly not least, I want to thank everyone for your extreme patience throughout this. We realize the situation is hugely frustrating, but we look forward to getting this resolved for you all and hopefully moving forward stronger.

RBhurricane87 · Apr 17, 2014

ACC Championship or get the **** out.

ThePorge · Apr 17, 2014

HostGator? Really? ...And you wonder why you've got problems.

cowboycane · Apr 17, 2014

Scheme?

Angry Ibis · Apr 17, 2014

Andrew said:
JHallCanes said:

Sounds like Dorito blaming his players


292d, 7h, 54m, 50s

That is the uptime of our actual server. Yesterday was not a server crash. This is the message we got from our host. Clearly they blame a firmware bug, but they make no mention of the redundancy that should be in place on their network.


Did not read.

What page of the Binder is that on?

All I see in the table of contents is Excuses: System Outages; see also Defense

Cane Dynasty · Apr 17, 2014

Can we sue them for deprivation of rights and mental anguish?

Angry Ibis · Apr 17, 2014

Cane Dynasty said:
Can we sue them for deprivation of rights and mental anguish?

For kicking out the WEZ?

CaneSwag8 · Apr 17, 2014

Andrew said:
Yesterday around 1 PM our server became unreachable. The server itself was not having issues; the network it was attached to was. Unfortunately, this is the third such incident from our host since our server was moved out of the Dallas datacenter and into the Provo datacenter. It is now very apparent that our host is incapable of maintaining a network on a consistent basis or providing any type of redundancy. These are the types of mistakes that we can't live with.

We will be moving servers in the coming weeks and will keep you posted. We have a lot of custom code and configuration, and our site takes up a large amount of space, so the prep work ahead means this move won't be instant.

Andrew

Trust the process...

U Know · Apr 17, 2014

Stay tune!!!

UMprodigy · Apr 17, 2014

Get ready to open up your wallet errrrone

cane305 · Apr 17, 2014

#backinblack
power #surge
where is the #juice

eliteproxy · Apr 17, 2014

this is what happens when you don't use all 12 pillars.

edge · Apr 17, 2014

No due diligence when choosing this hosting company to make sure redundant backbone connections are in place? Sorry had to ask.
