Broadcasting Computer News / Reviews - Since 1989 - Now Hear it on the Web via the Internet
“Why External Drives Must Have an Automatic Defrag Solution To Maintain Peak Performance”
Complete Transcript of
Rick Cadruvi – Diskeeper Interview
on Let’s Talk Computers
Host Alan Ashendorf
July 12, 2008
Alan: Now, we know how important defragging our computer systems is; but what about external drives? How important is defragging them? Our guest today is Rick Cadruvi, Senior Software Engineer with Diskeeper Corporation. Welcome back to Let’s Talk Computers, Rick.

Rick: Well, thank you very much, Alan. It’s good to be on.

Alan: When we talk about internal drives, we know that defragging is very important because it makes a major difference to the CPU and to the kind of resources we steal from the computer. But what about external drives? How are they affected by not being defragged properly?

Rick: They might have an even worse problem with fragmentation, because not only do you have the general problem of the speed of getting the data to and from the disk because of fragments, but the file system on your local computer wants to pick up data according to the fragments themselves. So, if you have a lot of very small fragments, you end up generating a lot of requests just to retrieve the data.

Alan: Let’s take, for instance, a consumer’s computer with an external drive. In most cases they are going to be using what, a USB drive?

Rick: Particular interfaces like USB tend to be significantly slower than an internal drive that would be attached, let’s say, via a SATA bus. The interface itself tends to be slower. If you have to make numerous requests to get the data, then that also affects your processing speed and the amount of resources the system has to use on the buses just to get the data.

Alan: So, what you’re saying is that the more we can cut back on the number of times we need to access the hard drive, the faster our computer system seems to run?

Rick: It not only seems to run faster, it actually is running faster.
The fact that we don’t have to do as many operations means that the operating system itself has a lot less work to do in order to retrieve the data, and the data, once it starts coming from the drive, can come into memory all at one time, usually through some kind of direct memory access operation. Therefore, the system just plain runs faster. The data is retrieved more quickly, and so the user’s application runs a lot faster than it otherwise would.

Alan: Most people think that when you’re accessing a hard drive just for reading, you go out to the hard drive and the hard drive says, “Hello, here’s where my information is,” and it’s going to be fed up very quickly. It’s not that way. There’s a lot of handshaking involved, isn’t there?

Rick: Yes, there’s a lot of handshaking, and with traditional hard disk drives there’s also the fact that it’s a mechanical device and the heads have to be moved to the appropriate location on the disk. When you have fragmentation, those head locations can be all over the disk, and moving between them is a fairly slow operation. The heads have to be started and stopped. Then you have to wait for the disk itself to spin to the correct location. That’s something we call “rotational latency.” That’s why the faster the drive is in terms of RPMs, the faster it tends to be in terms of access: because of rotational latency, you are waiting for the platter itself to spin to where the head is so that it can actually retrieve the data.

Alan: In the best possible world, when we ask the hard drive for a file, it’s going to serve it up all as one piece, without stopping the read access; it’s going to start reading and just read until it gets to the end of the file, and all of that’s going to be streamed to us. But that doesn’t happen, does it?

Rick: No, for various reasons. Some of that has to do with fragmentation.
Even where the file is entirely contiguous, there are limits on how much data can be transferred from the hard drive at one time. So, there are multiple operations in play, and if the data on the drive is at a contiguous set of locations and the file doesn’t have to be served from fragmented places across the drive, then the opportunity for each of those transfers to happen quickly is a lot greater than if it’s a bunch of small transfers that have to be made from disparate places across the drive.

Alan: Talking about “contiguous,” that’s a word that a lot of people don’t really fully understand. Can you explain it to us?

Rick: When we talk about a file being contiguous, we mean that, as far as the disk drive is concerned, all of the data, from the very first piece of the file to the end of the file, is in blocks that progress sequentially from one block to the next to the next, and is not split up and scattered to different places across the disk. Think of it like a chessboard: if all of the chess pieces are right next to each other, that would effectively be a contiguous file. But if you have pieces that are put in one place and another and another, with lots of gaps in between them, that’s what fragmentation is like. You have to collect all those pieces. Take that same analogy and compare the distance from one piece to the next when they are standing next to each other on the chessboard with the distance when they are scattered all over the board. That’s the same kind of effect in terms of the distance and, in essence, the time and resources required to gather the data when it is non-contiguous.

Alan: But a lot of people have the misconception that if I organize my hard drive, where I can see the folders, and put the files that I need to access quickly into the same folder, then when the hard drive reads them, they’re all going to be read at one time. That really doesn’t work, does it?
Rick: No, because when people look at the folders on their drive, they think that that’s how the drive is laid out. They are sadly mistaken. All of that is managed by a layer of software in the operating system called a file system. The file system’s job is to make a raw disk drive that just has blocks of empty space, “magnetic storage,” into something that an end user can view as a layer of folders. All of that is done with a large number of data structures that allow the file system to say, “When I want file A, file A is actually located in blocks 100 through 200 on the disk.” In order to do that, file systems use things called “directories” or “folders,” which are just tables of information that tell the file system where the “metadata” is, that is, the data the file system needs to figure out where the file’s data is stored on disk. So, if I want file A out of folder B, the file system first has to go out and find folder B, read its data off the disk, and then find where file A is located so that it can read that. There’s no relationship between what it looks like on your screen in terms of folders and where it’s stored on the disk drive.

Alan: And a lot of these new hard drives have what they call “external buffers,” and that helps when data is being fed to the hard drive or when the hard drive is reading, but it has nothing to do with file placement, does it?

Rick: No, nothing to do with file placement at all. What that does help with, especially in terms of reads, is that the drive will do a certain level of caching, and if you’re reading the exact same data over and over again, then it will help some with speed. Those buffers are fairly small relative to the size of the drive. Look at a typical, say, 500 gig disk drive: it may have an 8 or a 16-megabyte cache. So, it’s a very small amount of data relative to the total disk drive. The file systems also tend to do a certain level of caching.
That does help when you’re reading the same data; but as soon as you want something that you are not currently looking at, that request has to go out to the hard drive. The likelihood that the data is going to be in any kind of buffer is almost zero.

Alan: First, when we were talking about external hard drives, we had USB 1.1. And then they came out with USB 2.0. How much difference in speed was that?

Rick: Oh, huge. It was at least an order or two orders of magnitude more speed for USB 2.0. Now, they’re looking at USB 3.0 for the future, which is yet another order of magnitude of speed. So, if you’re running USB 1.1, you’re definitely inhibited by the speed of the bus itself. USB 2.0 is fast in relative terms, but it’s nowhere near as fast as, say, SATA would be on an internal drive.

Alan: When you talk about SATA, or in this case, because it would be external, eSATA (external SATA), what is the difference in speed between, say, USB 2.0 and eSATA?

Rick: It’s easily an order of magnitude. If you’re talking SATA in general, SATA 2 can run at 300 megabytes per second. USB 2.0 is significantly less than that.

Alan: Well, didn’t they come out with a type of format called “FireWire”? That was supposed to replace USB 2.0. For a while it caught on and a lot of people turned to that, but then it kind of lost favor, didn’t it?

Rick: In fact, interestingly enough, even Apple, which is the company mostly behind FireWire, has started using it less and less. I think that it just didn’t catch on as fast, and it didn’t have the cost-effectiveness from a manufacturing viewpoint. And USB 2.0 is pretty close to the same speed as FireWire.

Alan: Well, even though eSATA drives are extremely fast, I mean, they are basically the same speed externally as they are internally, it still makes a major difference to have a cleanly defragged system, doesn’t it?
Rick: Yes, because you still have to get the data to and from the drive, and the file system does that according to where the file is laid out on disk. So, if it’s fragmented, the file system has to go retrieve a bunch of little pieces. If it’s defragmented, then the file system can just say, “Give me a whole bunch.” And because that disk data is being transferred in one big burst, it ends up using a lot fewer system resources in general, and the data becomes available much quicker.

Alan: When I’m setting up my external device, whether it’s USB 2.0 or eSATA, which is more efficient, an NTFS format or a FAT32 format, and why?

Rick: NTFS is much more efficient as a file system, and that has to do with its sophistication. FAT is an old architecture that is very simple. It’s a real simple way for you to retrieve data. It’s really easy; you can write fairly trivial programs to go out and actually find the raw data on a FAT device. But NTFS is much more sophisticated in its ability to manage the data on disk, and as a result it’s much more efficient from an overhead standpoint and in terms of the ability of the file system to manage large amounts of data. If you have a small amount of data, then FAT is not too bad. But if you have a very large amount of data, NTFS does a much better job of managing it.

Alan: But don’t we pay a price for using the NTFS system if it’s not organized properly?

Rick: Even FAT can suffer from fragmentation. You have that problem on file systems in general. Yes, you can pay a price if it’s not well organized. But NTFS is quite flexible, and as long as you defragment it, it’s very efficient. NTFS is better able to manage large amounts of data. Its directory structure is a tree structure and is therefore much better able to manage large directories. Metadata within NTFS is capable of handling files that are enormous in size and are scattered across very large disks.
You also don’t have to have nearly as large a cluster size. With FAT, you have a limited number of clusters on the disk, and clusters are how a file system views data. Data on a raw disk is viewed in blocks, which are actually called sectors. File systems tend to group sectors together and then use them as a unit. We call those groups “clusters.” That’s how we tend to access the disk. Because FAT is very limited in terms of the number of clusters on a disk, you also end up wasting a tremendous amount of space in FAT, whereas NTFS can take the cluster size down; its default is 8 sectors, but it can take it all the way down to a single sector.

Alan: Well, I think anybody who has a computer system has to have an automatic defrag solution, and if they want to get Diskeeper, where would they go?

Rick: They should go to http://www.diskeeper.com.

Alan: Rick, we’ve run out of time. We didn’t even get a chance to talk about other types of external storage that are becoming very popular: network-attached storage (NAS) and storage area networks (SAN). We will have to talk about those next time.

Rick: Well, thank you very much, Alan. It’s been a pleasure.
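
Rick’s point about cluster size and wasted space can be made concrete with a little arithmetic. The Python sketch below is an illustration only and is not from the interview. A file always occupies whole clusters, so the unused tail of its last cluster is lost. The sketch assumes 512-byte sectors and a made-up workload of 1,000 files of 1,000 bytes each. The function names are hypothetical, and the 64-sector case is only a stand-in for a large FAT cluster.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors


def allocated_bytes(file_bytes: int, sectors_per_cluster: int) -> int:
    """Space a file occupies when storage is handed out in whole clusters."""
    cluster_bytes = sectors_per_cluster * SECTOR_BYTES
    # Round up to a whole number of clusters using integer arithmetic.
    clusters = (file_bytes + cluster_bytes - 1) // cluster_bytes
    return clusters * cluster_bytes


def slack_bytes(file_sizes, sectors_per_cluster: int) -> int:
    """Total allocated-but-unused bytes across a set of files."""
    return sum(allocated_bytes(size, sectors_per_cluster) - size
               for size in file_sizes)


# Hypothetical workload: 1,000 small files of 1,000 bytes each.
files = [1_000] * 1_000

for spc in (1, 8, 64):  # 512-byte, 4 KB, and 32 KB clusters
    print(f"{spc:>2} sectors/cluster: {slack_bytes(files, spc):>10,} bytes wasted")
```

With these assumed numbers the same one million bytes of file data wastes 24,000 bytes at one sector per cluster, 3,096,000 bytes at 8 sectors per cluster, and 31,768,000 bytes at 64 sectors per cluster, which is the effect Rick describes for file systems that are forced into large clusters.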
Let's Talk Computers ® ranks as one of the longest running radio computer talk shows, distributing up-to-the-minute computer information since 1989. Produced in Nashville, Tennessee, USA, it is at present broadcast via radio in 7 states: Tennessee, Alabama, Kentucky, Illinois, Indiana, Texas and New Mexico, and on the World Wide Web via the Internet.
Hosts Alan Ashendorf and Sandra Ashendorf interview representatives from the computer industry about products and industry trends. Guests have included representatives from Adobe, Microsoft, Novell, IBM, Lotus, "PC World", Seagate, Citrix, Compuserve, Computer Associates, Corel, Symantec, "MarketWire", Ziff-Davis and a host of other companies. Our goal is to let you know what is happening in the computer industry.
Let's Talk Computers makes every effort to evaluate the products and services it showcases in real-world situations by applying solutions to real-life problems. Let's Talk Computers uses the expertise and facilities of Total Solutions, Inc. to assist in that regard. This helps to ensure that, unlike many other so-called reviewers, we have actually used the products that we talk about.
Listeners of Let's Talk Computers shows for this week and last week are eligible to enter postcard drawings to win valuable prizes. Listen to these shows to see how you can enter to win.
If you can't receive Let's Talk Computers in your area, contact the program manager, Jim McClurg, about broadcasting the show on a radio station in your area. He can be reached via email, or by paper mail at Let's Talk Computers, 488 Saddle Drive, Nashville, TN 37221.
Let's Talk Computers ® is a trademark of the Let's Talk Computers Tennessee partnership. Any use of this trademark without written authorization is strictly forbidden and a violation of state and federal law. All other brands or product names are trademarks or registered trademarks of their respective holders.