1. Details about the next Hcoop Architecture
This page is intended to facilitate discussion of details relating to our next server architecture. Currently, the first draft of this page, written on Sat Mar 25 10:18:12 EST 2006 by JustinLeitgeb, is based upon discussions from the hcoop mailing list. Please feel free to contribute or change anything here!
1.1. Hcoop Future Network Overview
The architecture for the next hcoop.net network involves three physical servers:
- A fileserver, running AFS which is accessible via ssh only for administrative purposes
- A public login and http server, accessible by all members. User files are stored on the fileserver mentioned above. This will host all user pages, including dynamic content.
- A server for hcoop needs that most users won't need direct shell access to. This will run Cyrus IMAP, exim, and Apache primarily for hcoop administrative purposes.
Question: I don't remember deciding on using Cyrus IMAP. We currently use Courier, and I know that the formats the two use are incompatible. I also believe that Cyrus takes more overhead to get going, but that it scales better, especially in cases where most e-mail accounts aren't associated with UNIX usernames. Is that why you chose it? The other two daemons you listed are what we already run. --AdamChlipala
Comment: I don't have a reason for putting Cyrus in the list. I did it because I hadn't looked at our present servers to see that they've got Courier on them and I assumed it was Cyrus. I don't know much about IMAP servers in general, so I'm open to whatever the group thinks is best for us, long-term and in the present. --JustinLeitgeb
Additionally, we will need certain networking equipment:
- A gigabit switch that will be the initial backbone of the hcoop LAN.
- Perhaps (still not finalized in plans) a hardware firewall for the hcoop LAN. Ideas on this from members?
- A hardware firewall is really not very useful unless we are planning on transferring hundreds of gigabytes a day, and even then a machine running OpenBSD with PF would be faster per dollar than any dedicated hardware.
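For the record, a PF-based firewall on such a machine needs very little configuration. The fragment below is a hypothetical pf.conf sketch; the interface names, network, and port list are assumptions, not a finalized policy:

```
# Hypothetical pf.conf sketch for an OpenBSD box fronting the hcoop LAN.
# Interface names, the LAN range, and the port list are placeholders.
ext_if = "em0"          # public interface
int_if = "em1"          # hcoop LAN
lan    = "10.0.0.0/24"

set skip on lo
block in all                          # default deny inbound
pass out all keep state               # allow all outbound, statefully
pass in on $ext_if proto tcp to port { 22 25 80 143 443 } keep state
pass in on $int_if from $lan keep state
```

The stateful rules mean return traffic is allowed automatically, so the ruleset stays short even as services are added.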
We should also remember that all of our servers will most likely have at least two NICs. How can we best utilize these? Some sites have one NIC doing backups or logging, and another handling requests from the Internet. Perhaps we could segment our traffic onto two local area networks, one for services to the Internet and another for local file access (i.e., traffic between the two "public" servers and the file server).
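One way such a split could look on each server, sketched as a Debian-style /etc/network/interfaces fragment; the interface names and addresses are placeholders (203.0.113.0/24 is a reserved documentation range), not an actual allocation:

```
# Hypothetical two-NIC layout: eth0 faces the Internet, eth1 carries
# AFS and backup traffic on a private LAN. Addresses are placeholders.
auto eth0
iface eth0 inet static
    address 203.0.113.10
    netmask 255.255.255.0
    gateway 203.0.113.1

auto eth1
iface eth1 inet static
    address 10.0.0.10
    netmask 255.255.255.0
```

With this layout, file-server and backup traffic never competes with member-facing requests for bandwidth on the public interface.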
1.2. Hcoop Future Network Diagrams
The following is a preliminary version of a network plan that JustinLeitgeb created on March 25, 2006, after discussions on the hcoop.net mailing list. Included in the design is a hardware firewall, which was not finalized in previous discussions. Let's collect thoughts and alternate plans here as we work towards solidifying plans.
- [attachment:network_diagram_20060325.dia Network planning diagram in "dia" format for editing]
- [attachment:network_diagram_20060325.png Network planning diagram in PNG format for easier viewing]
1.3. Server Hardware
This may be a moot point, as we are looking for a shop that can give us hardware support, and this may require that we buy their supported machines. However, it seems that many colocation providers will try to push us into a deal where their support consists of a "remote hands" plan, in which they will fix any reasonably standard hardware that we send to them for an hourly rate. If that is the case, our discussions of possible server hardware on the list may still be valid. Generally, we have decided that what we need in terms of hardware is more or less as follows:
- Two web servers with at least 1GB of RAM each. Redundancy should include a RAID 1 configuration with two 73 GB drives, and dual power supplies.
- 1GB of RAM is OK for a machine that serves up static HTML, but the machine that runs the dynamic content should have a lot more memory.
- Dual power supplies are also really overkill.
- One file server with more storage space and room to grow. It doesn't need to be exceptionally fast because of AFS's caching mechanisms. Perhaps a small RAID 5 configuration of 3 x 500 GB SATA drives would be a good place to start. It should certainly be hardware-based RAID so that main CPU power is not needed for read and write operations. JustinLeitgeb suggested a [http://www.3ware.com/products/serial_ata2-9000.asp 3Ware Escalade Controller] in this machine.
- If we use AFS (we really should), then it doesn't matter so much whether the machine has room to grow, since adding more disk space is as easy as adding another machine.
The list also discussed hardware vendors. If this isn't a moot point based on our decision of a colo provider with specific needs, the following list of possibilities may still be relevant:
- Dell PowerEdge servers. JustinLeitgeb suggested [http://www1.us.dell.com/content/products/productdetails.aspx/pedge_1850?c=us&cs=04&l=en&s=bsd Dell 1850's] for the web servers, and a [http://www1.us.dell.com/content/products/productdetails.aspx/pedge_2850?c=us&cs=04&l=en&s=bsd 2850] for the fileserver. One drawback to this server line is that it uses Intel processors, which are currently less desirable than AMD. However, this server line has been in production for quite a while and has proven stable in many situations.
- Dell should be avoided simply because their build quality is suspect, and EMT64 Xeons are grossly overpriced compared to Opterons ($/performance)
- Sun Fire servers. These machines use AMD processors but are considerably more expensive than comparable Dell machines.
- Penguin Computing Altus servers
  - [http://www.penguincomputing.com/index.php?option=com_content&id=123&Itemid=184&task=view Altus 1400] for web serving
  - [http://www.penguincomputing.com/index.php?option=com_content&id=246&Itemid=346&task=view Altus 2100] for file serving (perhaps overkill; a 1400 would probably be fine given that it has four drive bays)
1.4. Networking Hardware
Here we should talk about the specific networking equipment that we need. Ideas on vendors or models for the gigabit switch? Thoughts on whether we should start with a hardware firewall device? It was also mentioned that we should invest in a serial console for remote access when a machine goes down. Thoughts on this?
1.5. Backup Configuration
All are in agreement that we need a robust backup plan in our new architecture. It seems that it will include the continued use of [http://www.rsnapshot.org rsnapshot], and that this utility will save even the front-end server data to the fileserver with RAID 5. Additionally, we should have data stored off-site in a manner that allows us to recover, even in the event that we are "rooted". We are looking for backup capabilities in colocation providers. Another option could be to have rsync-style backups to some administrator's connection over the Internet, but this might not be tenable given the amount of data, the need for quick restores, etc. Let's continue to edit this section!
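As a concrete illustration of why rsnapshot can keep many restore points cheaply: it rotates snapshots using hard links, so unchanged files are stored only once. The sketch below reproduces that rotation step with plain coreutils (rsnapshot itself combines `cp -al` with rsync); the directories are throwaway placeholders:

```shell
# Minimal sketch of rsnapshot-style rotation, assuming GNU coreutils.
# Temp directories stand in for the real source and backup targets.
SRC=$(mktemp -d)
DEST=$(mktemp -d)
echo "member data" > "$SRC/file.txt"

mkdir -p "$DEST/daily.0"
cp -a "$SRC/." "$DEST/daily.0/"        # initial full copy of the source

# Rotation: a hard-link copy of the newest snapshot costs almost no disk,
# because unchanged files share the same inode across snapshots.
cp -al "$DEST/daily.0" "$DEST/daily.1"
```

After the rotation, a fresh rsync over daily.0 rewrites only the files that changed; daily.1 preserves the previous state at near-zero storage cost.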
1.6. Scaling for Redundancy and Performance
The next configuration should be reasonably scalable, as we are expecting to grow rapidly. How should we scale our systems? Although the new configuration, which will start with only three servers, will have many single points of failure, a goal of the Hcoop, Inc. engineering team should be to reduce these to the point where downtime is close to 0%. Following are some ideas about how we can accomplish this:
1.6.1. Web Server Clustering
Many shops hide a cluster of web servers behind a virtual IP address for increased capacity and redundancy. Load balancing of this type may be accomplished through round-robin DNS with multiple A records (see [http://www.zytrax.com/books/dns/ch9/rr.html#services HOWTO - Configure Round Robin Load Balancing with DNS] for more info), or more elegantly with a single A record pointing to a virtual IP address. Web servers are hit in round-robin fashion, and any that become unavailable are dynamically marked as "down" in the router.
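The round-robin DNS variant is just a matter of publishing several A records for one name. A hypothetical BIND zone fragment (addresses are placeholders from the documentation range):

```
; Two A records for the same name; resolvers rotate the order of the
; answers, spreading clients across both web servers.
www    IN    A    203.0.113.11
www    IN    A    203.0.113.12
```

The catch is that DNS alone does not notice a dead server: clients keep receiving its address until the record is pulled and caches expire, which is one reason a virtual IP in front of the cluster is the more robust option.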
The following devices and programs may be able to do this for Hcoop:
- [http://www.apsis.ch/pound/ Pound] is an open-source load balancer that would take care of distributing requests and routing around failed web servers. According to the Pound home page, it has been tested with up to 5M hits per day.
- [http://www.linuxvirtualserver.org/ The Linux Virtual Server project] may be a viable alternative.
- [http://www.f5.com/ F5 Networks builds load-balancing routers that support virtual IP addressing]
- Other options?
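To make the Pound option concrete, a configuration might look roughly like the sketch below; the listener and backend addresses are placeholders, and the exact directives should be checked against the Pound documentation:

```
# Hypothetical Pound configuration: one public listener balancing
# across two backend web servers. All addresses are placeholders.
ListenHTTP
    Address 203.0.113.10
    Port    80
End

Service
    BackEnd
        Address 10.0.0.11
        Port    80
    End
    BackEnd
        Address 10.0.0.12
        Port    80
    End
End
```

Pound periodically probes its backends and stops sending traffic to any that fail, which covers the "dynamically marked as down" behavior described above without dedicated hardware.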
1.6.2. File Server Clustering
Since Hcoop, Inc. is using AFS, we should look at ways that we can use multiple file servers in a cluster. Perhaps a virtual IP will work here as well, so that the loss of a fileserver doesn't bring down the web servers. Would AFS caching be able to perform the same function for us?
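One piece of this already exists in OpenAFS: read-only volume replicas, which can live on multiple fileservers. The commands below are a hypothetical sketch; the server, partition, and volume names are made up:

```
# Hypothetical OpenAFS volume replication; names are placeholders.
vos addsite fs2.hcoop.net /vicepa user.web   # register a replica site on a second fileserver
vos release user.web                         # push a read-only copy out to all sites
```

Clients fail over between read-only replicas automatically if one fileserver is lost; the read-write volume, however, still lives on a single server, so this helps mostly for read-heavy data such as published web content.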
1.6.3. Database Clustering
[http://www.mysql.com/ MySQL] 5.5 supports clustering, which helps to achieve higher numbers of queries per second, as well as redundancy in the event that we lose a node. MySQL replication is another option. Web servers would point to a virtual IP in this scheme as well, so that the loss of a database cluster node is a no-downtime event (provided the load is not too much for the remaining nodes).
Both Postgres and Oracle have similar technologies that could be discussed here. Using one of these replication technologies, we would probably have to rely on a virtual IP in order to avoid complicating the applications running on our sites. [http://www.f5.com F5 Networks] offers one possible VIP option.
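To make the replication option concrete, a minimal MySQL master/slave setup needs little more than a unique server-id on each node and binary logging enabled on the master. The my.cnf fragments below are a hypothetical sketch; the IDs and log name are placeholders:

```
# --- master my.cnf fragment (placeholder values) ---
[mysqld]
server-id = 1
log-bin   = mysql-bin

# --- slave my.cnf fragment (placeholder values) ---
[mysqld]
server-id = 2
```

The slave is then pointed at the master with a CHANGE MASTER TO statement followed by START SLAVE; the virtual IP that the web servers use would float between nodes at a layer above MySQL itself.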
1.7. Related Pages
ColocationPlans is the main page for items related to the new architecture. ColocationPlansServiceProviders provides information about the service providers we are currently looking at.