When content is growing exponentially, doubling every eighteen months, with massive redundancy as the norm; when application data silos and regulatory requirements continue to grow and change; when it is so difficult to locate a specific piece of information among billions of objects; and when "Enterprise Data Amnesia" results in the loss of corporate intellectual capital, how do you capitalize on your business information? How do you transform data into relevant, accessible information that creates business value? Many companies are turning to information lifecycle management (ILM) solutions to address these challenges. ILM spans all the technologies and business processes needed to optimize the information lifecycle and value chain: from information creation and capture, through its management, to delivery and ultimate deletion. It ensures that only the right people have access to the information, and that they receive it in a timely, secure, and auditable manner. A critical part of an ILM solution is an effective, scalable archive. This article describes considerations when planning an enterprise data archive for storing, searching, and retrieving business information.

The Right Information at the Right Time

Archiving is about persistence and accessibility: the long-term storage and timely, accurate retrieval of information that is critical to a business. Archiving, and the fast, efficient information retrieval it enables, plays a key role in regulatory compliance, legal proceedings, proper corporate governance, application performance optimization, and efforts to reduce the total cost of information management. Archiving must complement the ILM strategy, allowing organizations to capture and retain information, both structured and unstructured, in such a way that they can easily and quickly retrieve what is relevant. The objective: having the right information at the right time, in the right place, in a cost-effective, scalable way.

Positioning For Growth

The challenge in building any archive system is coping with millions, even billions, of unique objects. For example, if the average office worker sends or receives 80 emails per day, a 5,000-employee organization will accrue approximately 100 million messages per year (80 messages × 5,000 employees × roughly 250 working days). Each unique object in an archive needs to be indexed so that an end user can find any object, the proverbial needle in the haystack. Scaling an archive presents a further challenge. A system may be adequate in its first year, but as the number of objects grows exponentially in subsequent years, it may struggle to index and retrieve information fast enough. Ultimately, the archive must be able to search and retrieve billions of objects in seconds and scale to hundreds of terabytes of data without performance degradation. Furthermore, to ensure ongoing compliance, it must store all records as tamper-proof, digitally signed, and time-stamped objects managed according to defined business rules.
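To make that tamper-evidence requirement concrete, here is a minimal sketch, in Python, of how an archive might fingerprint, time-stamp, and sign each record so that later alteration of either the content or its metadata can be detected. The names (ingest, verify, ARCHIVE_KEY) are illustrative rather than drawn from any particular product, and the HMAC is only a stand-in: a production archive would typically use asymmetric digital signatures, a managed key store, and a trusted time source.

```python
# Minimal sketch of a tamper-evident archive record (illustrative only).
import hashlib
import hmac
import json
from datetime import datetime, timezone

ARCHIVE_KEY = b"replace-with-a-managed-signing-key"  # placeholder signing key

def ingest(payload: bytes, retention_days: int) -> dict:
    """Wrap raw content in a signed, time-stamped archive record."""
    record = {
        "sha256": hashlib.sha256(payload).hexdigest(),    # content fingerprint
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "retention_days": retention_days,                 # business rule
    }
    # Sign the metadata so neither the content hash nor the timestamp can be
    # changed without detection.
    canonical = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(ARCHIVE_KEY, canonical, hashlib.sha256).hexdigest()
    return record

def verify(payload: bytes, record: dict) -> bool:
    """Check that neither the content nor its metadata has been altered."""
    meta = {k: v for k, v in record.items() if k != "signature"}
    canonical = json.dumps(meta, sort_keys=True).encode()
    expected = hmac.new(ARCHIVE_KEY, canonical, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, record["signature"])
            and hashlib.sha256(payload).hexdigest() == record["sha256"])
```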
There are two basic options: independent servers and storage, or an integrated archiving appliance.

Independent servers, storage, and discrete applications require a Lego-like approach to building an archiving system. These implementations need application-aware middleware to determine which information in the application can be archived based on policy. A file system or database is required to store the archived data. To find the data and search it by content, an indexing engine is required, which typically uses yet another database and additional storage. The "Lego" solution also requires storage software and servers: hierarchical storage management (HSM) software to drive a storage hardware layer made up of traditional SAN, NAS, or direct-attached storage, or a content-addressable storage device. All of this adds to the cost of the total solution and demands different skills to manage the various pieces. The combined solution requires integration of many products and leaves the system vulnerable at several points. The indexing engines and databases themselves are potential points of weakness: the path to the information may become corrupted or lost. In addition, indexing, search, and retrieval functions can struggle to scale in performance as the number of objects increases.

Integrated, appliance-based archiving solutions consist of software packaged on standard servers to create an application-centric active-archiving platform. Archived data is turned into information through content-based indexing, while powerful search tools make that information readily accessible. Data is stored securely to mitigate risk and to help users comply with retention regulations. The resulting solution can be described as a network-attached "black box" for reference information. Linear scalability without loss of performance comes from intelligently distributing terabytes of content across a grid of storage smart cells. Each time one of these "black boxes" is added to the grid, both its storage and its processing power are recognized and integrated, which is key to achieving high-performance search and retrieval, the ultimate goal of an active-archiving storage platform. With a simplified, integrated storage infrastructure, quick data access, and enterprise-scale capacity, an archiving appliance can coexist with the current application infrastructure and should require no changes to existing architectures.

Regardless of which option you pursue, the following recommendations will improve your results.

What To Consider

To reduce cost and complexity, select an approach that streamlines storage management by automating repetitive functions such as migration, backups, and archiving. In this way, business and regulatory requirements can be aligned with data retention policies while the cost of storage management is reduced and storage space is optimized.

Storage costs can be further reduced by selecting an archive solution that offers single-copy instancing, which removes costly duplicate data objects while making search and retrieval more efficient. Duplicates arise in many situations, for example when an identical email and its attachments are stored in many users' mailboxes. Efficiently removing duplicate data requires advanced single-instancing and filtering techniques, as illustrated in the sketch below.
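The following minimal Python sketch shows the basic idea behind single-copy instancing: each object is keyed by a cryptographic hash of its content, so an attachment archived from hundreds of mailboxes is physically stored once and merely referenced thereafter. The class and method names (SingleInstanceStore, put, release) are hypothetical; real products layer filtering, sub-file deduplication, and retention-aware reference handling on top of this basic mechanism.

```python
# Minimal sketch of single-instance (deduplicating) storage.
import hashlib

class SingleInstanceStore:
    def __init__(self):
        self.blobs = {}       # content hash -> the single stored copy
        self.refcount = {}    # content hash -> number of references

    def put(self, payload: bytes) -> str:
        """Store a payload once; duplicates only add a reference."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = payload              # first (and only) copy
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest                                 # handle kept per mailbox

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]

    def release(self, digest: str) -> None:
        """Drop a reference; free the copy only when nothing points to it."""
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blobs[digest]

# Usage: the same attachment archived from 500 mailboxes occupies storage once.
store = SingleInstanceStore()
handles = [store.put(b"quarterly-report.pdf contents") for _ in range(500)]
assert len(store.blobs) == 1 and len(set(handles)) == 1
```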
The archiving solution should also provide a single view of the information even though there are multiple users. This is done by presenting each user with a logical repository of archived records. Access controls and retention periods can be set per repository, making the same record available to different users or groups that require different retention periods.

For ease of use and ease of management, provide full-content indexing and an interface users are already familiar with, such as a Web interface with Google-like search, for search, retrieval, and export of archived records. Also look for solutions that return the first part of the result set before the whole search is completed, to optimize response time; for example, a result list showing metadata about each object, with a preview pane through which users can view more detail.

For business continuity and availability in an increasingly "always on" world, make sure the archive can replicate multiple copies of the data both locally and remotely to ensure the highest levels of data availability. You want neither to lose valuable information nor to delay access to it.

Finally, don't forget new information access channels. Select an integrated solution that delivers the right information in the preferred format to the right destination, whether that is a cell phone or PDA, a desktop, a laptop, corporate TV, another application, the Web, or even a fax machine. Such a solution eliminates business disruption caused by failures in information dissemination. It should, however, include global administration, control, and visibility to ensure a secure, end-to-end delivery infrastructure.

The Information Archiving Future

That 80 percent of all information lies untapped in disorganized, unstructured content is a well-accepted, if somewhat overworn, statistic. Current business intelligence technologies focus mainly on the remaining 20 percent, the structured content in data marts and data warehouses. However, expect to see unified information access technology emerge to deliver converged access to both structured and unstructured information.

As we move to the next level of information demand, with applications such as online speech recognition, widespread streaming video, distributed scientific processing, and a whole host of services we cannot yet imagine, we will continue to need new ways of archiving, searching, and retrieving. We will need smart classification tools, extensible metadata management, and new storage technologies such as content chunking and routing, all of which are being actively researched.

Today we might paraphrase Coleridge's Ancient Mariner: data, data everywhere, yet each bit is hard to find. Certainly, this does not have to be the case. As we move toward a world where information is abundant, ever growing and changing, we must begin to build archive solutions to store, search, and retrieve information so we can drink just the data we need, when we need it, wherever we are.

Source: www.edocmagazine.com