The 9 months of my life. PART 1

So it is finished. There are a lot of things I have been neglecting over the past 6-9 months. This blog is one of them.

About 2 years ago I saw the first boss I had in Tokyo in the lobby of the building I work in. He said, "You finally made it." "The only people who believed I could not do it were you and my wife," I replied. In some ways I regret that statement, even though it was true.

About my former boss: I owe him a lot. I was in a pretty messed up place when we met and he helped me out, a lot. When we met I had nothing. He recognized my talent but later did not exactly know how to apply it. Nonetheless he gave me a good salary and, in addition, set me up in a place in Tokyo. No small feat at that time, because just getting a place, a tiny 1 bedroom apartment, took almost 5000 American dollars and involved jumping through more hoops than a circus seal. He fronted it all for me and managed my paying him back out of my salary over 6-7 months.

I left that job after 9 months. I learned a lot during that period, mostly about Linux and Linux-based systems, but also about who I was and would become. My boss, I think, learned a lot as well, about me and about dealing with non-Japanese employees who were not afraid to lose everything and start again. I am sure that his whole office is better off for all the shit he and I went through together.

Flash forward to today. This week I migrated my last servers out of a VERY costly data center to a more cost-efficient situation closer to the office I work at. At one time there were 60+ servers: Solaris Unix from 2.6-5_10 with a mix of SPARC and i386 hardware. My company had bought a few smaller ISPs to improve its balance sheet 5 years before, and the ISP that I worked on was one of them. It took me about 3 trips to the data centers we had to realize the hardware was old, the OSes were old, and the ideas that ran them were old. It was an inefficient waste of space, energy, and money. We needed to do better.
After a few months of not knowing what to do or how to do it, I became closer friends with the senior engineer on our team. He helped me with the things I did not understand and showed me how to use some of the most powerful tools in Unix, like snoop/tcpdump and truss/strace. I will never EVER have the grasp that he has, but I at least have a clue now. "When you realize how little you truly understand, you are finally on the road to true learning." --Me.

One day he showed me a newish style of system which he envisioned could replace all the systems we currently held with one. It could reduce costs across the board: in hardware, software, and rack space. I believed in that. It seemed so simple and elegant at the time.

Last year around October one of the business managers held a meeting to say he wanted to find a way to move all the servers in one of the services I worked on to a less costly solution. He asked me to set up some servers to handle the duties of the servers he wanted to retire. I said "no." I remember his face. I do not think he had been told "No" directly to his face in a long, long, long time. I told him about my teammate's solution and how, if we used this service as a starting point, at least we would be starting with X number of users. People nodded and smiled. I felt sick and was sure I would be fired. In Japan the nail that sticks up the highest gets hit down the hardest. The meeting ended. I was not fired, which was kind of a surprise to the people in the meeting.

Then about 1 or 2 months later we were told that a manager had a bold new vision of how we would do things in the future. In other words, my teammate's idea would be "used" and we would begin the project. At first there was rejoicing and praise. That quickly turned into "Well, how will we ??????" The questions were never-ending. Some were well deserved, others were frivolous. Time was wasted, and at this time my teammate was under the most stress.
His plans for provisioning existing and new users came under almost no scrutiny, and he attempted to create a system to make user registration, payment, and services automatic. I tried to help out, but my efforts were largely ineffectual and served mostly as comic relief. If anyone we work with reads this, they might be offended, because I did not say "so and so did XYZ while this and that person did 123." The truth is 1 guy did 90% of the "getting it done."

Around March we started to get ready to actually move some users onto the system. We migrated the data, about 40 or 50GB, from 17 different servers to the new system. Finally, one day, after the users had been provisioned, all accounts created, and an initial rsync of the data finished, we cut over. We stopped all services. We rsynced all the final data. We moved the data into the correct directories. We changed the DNS. We held our collective breath at 2am and finally took a breath around 12 the next day. We were live on the new system.

It was an awesome feeling to pull that off: 17,000 user accounts moved from one system that mostly ran Solaris 2.6, on old hardware that was ready to die at any second, to a new one that used 1/2 the power and 1/3 of the space on new, easily replaceable and redundant hardware, with less than 6 hours of down time*. There were problems and little gotchas we did not see coming, but most were solved in the following 12 hours and the rest were solved in the next 48.

I sat down with one of the Customer Service Managers to talk about what went right and wrong. We did what we could to effect change for the next time around. There were problems, but they were very superficial compared to what we would face. Very superficial. There was no real after-action review and we simply accepted a slightly hollow victory. It was a huge monkey off my back though. The old servers caused more than one sleepless night... a week.

End Part 1. Coming Soon Part 2