I was asked by the editors of the quarterly genealogy journal to write a piece for them about the genesis and development of LeafSeek, and I'm pleased to say that it was published as the lead article in their spring issue. Here's the text of the article, and I put together too.
From AVOTAYNU Volume XXVIII, Number 1, Spring 2012
Introducing LeafSeek: A Free, Open Source Genealogical Search Engine in a Box
by Brooke Schreier Ganz
LeafSeek, developed by the author for the All Galicia Database, won second prize in the 2012 Developer Challenge, a programming contest sponsored by the RootsTech genealogy conference, held in February 2012 in Salt Lake City, Utah. This article will be especially useful to genealogists with some knowledge of how to build websites and web applications. --Ed.
The Problem
Is there ever too much of a genealogical good thing? For the non-profit Jewish genealogy special interest group (SIG) Gesher Galicia, the answer, for a short time, actually was "yes." Members of the group had worked hard for years to capture and collate any information about ancestors from the former Austrian province of Galicia, today southeastern Poland and southwestern Ukraine. After the Soviet Union fell and Eastern European archives opened to Western researchers, new information about Galician towns and families began to circulate. Suddenly, Gesher Galicia, through the work and generosity of its members, found itself the keeper of new spreadsheets full of data of all types, sizes and sources: birth records transcribed from the Central State Historical Archives of Ukraine in Lviv; old, out-of-copyright Polish telephone books from library collections; early 20th-century tax records copied from the private files of a Chabad rabbi in Ukraine; landsmanshaftn records obtained from the YIVO archives in New York; Holocaust files from Yad Vashem; and more. This influx of new information was wonderful, but almost overwhelming, like a downpour coming after a drought.
Clearly the new data sets had to be shared, preferably over the Internet, but how? The obvious answer was to turn the various spreadsheets into searchable online databases: a long-awaited All Galicia Database. For various reasons, some technical and some procedural, simply putting all the data on JewishGen did not work out initially, so the group decided that Gesher Galicia would assemble all the varied source materials as a new stand-alone database, hosted on its own web server. Because I had worked as a web developer and programmer for various major media companies, and am a genealogist with mostly Galitzianer ancestors, the job fell to me.
The solution eventually crafted has been online for almost a year now as the backbone behind the new All Galicia Database (AGD), where more than 190,000 records containing about half a million names are freely accessible to all. It was not enough, however, simply to make a database; I decided to make the underlying software available to all, so that others who wanted to create archival records databases would not face the same problems we had. I released the code as "open source," meaning that others can edit and change the software as much as they like, to suit their own needs, rather than being restricted by a typical copyright. I also eventually realized that I could not keep referring to the project as "that code thing I built for that database," and so I named the software "LeafSeek," as it allows one to search for a leaf on a family tree. I built a new website at www.leafseek.com providing a feature list, basic documentation and installation instructions, a link to download the software, and a blog about the project's progress.
Development Process
Gesher Galicia wanted to establish a new, stand-alone genealogy database, but could see no obvious way to do it. First we considered Steve Morse's easy-to-use One Step Search Tool creator, which could turn Gesher Galicia's individual spreadsheets of data into individual searchable databases. This tool would have been a good choice if only a few data sets existed, but the SIG had more than 40 different data sets, with many more coming steadily, especially once Gesher Galicia's multi-year Cadastral Map and Landowner Records Project began at the archives in Lviv. The SIG would have been left with many different and separate One Step databases to maintain. Furthermore, some of the individual data sets had multiple surname or given name fields, and the One Step Search Tool would have created databases in which each potential surname field would have to be searched individually, rather than all at once. For example, a single data set of marriage records for one town might include many different family names: the bride's maiden name, the groom's family name, the bride's mother's maiden name, the groom's mother's maiden name, the witnesses' surnames, and so on. The One Step Tool's limitations meant that a user would need to search each of these surname fields separately within each individual database, and then search all the other databases as well, one by one.
We needed one over-arching and comprehensive way to search all the data sets already in hand and all the data sets soon to come. In addition, the system would need to recognize what a surname field was, to understand that each individual data set within the database might have many surname fields, and to search them all simultaneously. Ideally, the system also would treat the surname field differently from a field that held less important data, such as an occupation or a note. The same was true for other key fields such as given names, towns or years. Each type of data needed a different treatment.
In addition, a single database system serving multiple data sets could crystallize another bit of information for a user: the fact that those data sets may or may not be connected. For example, a data set of Town X's death records and another, contemporaneous data set of Town X's cemetery records ought to be searchable and viewable simultaneously because, obviously, many of the people in one of the record sets also would appear in the other. Segregating each data set into its own little box would make it harder to see how the records might or might not interconnect across a community.
The problems seemed well-defined, but the solution was not. Surely, I thought, other people must have run into this problem when publishing their own archival databases over the years, but the more I searched the web, the clearer it became that no obvious solution existed. Plenty of products, tools and services are available for genealogists to publish personal family trees online, but none for genealogists to publish record collections (the forest, rather than the tree, one might say).
Enter Apache Solr
Eventually, I stumbled across Apache Solr. The Apache Software Foundation is a well-known, non-profit group that nurtures and distributes free, open source software of various types. It is most famous for the Apache HTTP Server, which has been the most popular web server on the Internet for more than 15 years, but it also hosts and cultivates almost 100 other software projects. One of these is an enterprise search platform, a system for building reliable business-grade search solutions, called Solr, which adds extra web-based functionality to an older Java-based Apache project called Lucene. Solr is used primarily by large businesses to search massively large data sets, usually of a company's intranet documents or consumer products. For example, the huge online shoe store Zappos uses Solr to index its holdings. I also strongly suspected that some of the world's largest genealogy websites (e.g., the Mormon/LDS Church's FamilySearch, Ancestry.com, and the British genealogy company brightSolid's new British Newspaper Archive) used Apache Solr or its major component, Lucene, as the backbone for their genealogy search engines, a hunch later verified when I attended the RootsTech conference in February 2012.
One of the nicest features of Apache Solr (in addition to the fact that it can handle large data sets with ease) is that it allows for something called "faceted search." This is a way of seeing what pieces of data are related to the results of a search query, and it allows a user to dig deeper within the results to find the exact bit of needed information. For example, on the Zappos site, when one initiates a general search for something like "brown shoes," the left side of the screen shows what types of data are connected to the search results thus far. From there, one may click on the "facets" on the left side of the page, such as "10.5" (a shoe size) or "casual" (type of shoe) or "$100 and under" (price) to narrow a search to exactly what the user wants. As one adds more "facets" to a search, they are added as values at the top of the screen and can be removed if desired. A buyer could choose to search for the long phrase "brown shoes size 10.5 casual under $100" in the search box at the very start, but by choosing a general search term instead and then narrowing the search facet by facet, the buyer obtains a much better picture of what is available. For example, certain brands may entirely disappear as options for purchase once the "10.5" facet is chosen, because most U.S. brands do not manufacture a women's size 10.5 (they either stop at 10 or else skip up to 11). So for this hypothetical customer and her specific search needs, being able to facet this massive database at "10.5" is a key feature. The fact that the Zappos website offers searches by size saves this customer considerable time and disappointment when shopping for her big feet, and Zappos gets more of her business as a result.
Similarly, genealogists want the ability to focus on search results exactly tailored to specific needs, especially unusual needs. This matters all the more because genealogists often find themselves working from a small amount of known data that may be incomplete. One researcher might want the names of every baby born in town X with a mother named Rivka from town Y, and yet not know basic information such as the baby's surname or year of birth. Perhaps a researcher wants to see a list of every surname that is somehow associated in available records with a different surname that already has been researched, in order to find interconnections between families. Or maybe someone just wants the records mentioning Rivka whose original record books are stored in a specific archive location.
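The facet-by-facet narrowing described above can be illustrated with a small sketch. This is not Solr or LeafSeek code: it is plain Python over a handful of invented records, and the field names (`given_name`, `record_type`, `town`) are assumptions made only for the example. It shows the two halves of faceting: filtering by the facets already chosen, and counting which values remain available.

```python
from collections import Counter

# Invented sample records; the field names are hypothetical, not LeafSeek's schema.
RECORDS = [
    {"given_name": "Rivka", "record_type": "birth", "town": "Town X"},
    {"given_name": "Rivka", "record_type": "marriage", "town": "Town X"},
    {"given_name": "Rivka", "record_type": "birth", "town": "Town Y"},
    {"given_name": "Leib", "record_type": "birth", "town": "Town X"},
]

def search(records, **filters):
    """Keep only records whose fields equal every chosen facet value."""
    return [r for r in records
            if all(r.get(field) == value for field, value in filters.items())]

def facet_counts(records, field):
    """Count how many of the given records carry each value of one field."""
    return Counter(r[field] for r in records if field in r)

# A general search first, then look at what the results contain.
hits = search(RECORDS, given_name="Rivka")
print(facet_counts(hits, "record_type"))

# Narrow by one facet and look again.
narrowed = search(RECORDS, given_name="Rivka", record_type="birth")
print(facet_counts(narrowed, "town"))
```

A real search platform computes these counts inside its index rather than by scanning a list, but the effect for the user is the same: each click removes records and updates the counts of what is left.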
Database designers cannot know what kind of "slice-and-dice data" a specific researcher might need. A database that could respond to these kinds of unusual problems would be more useful than one that was more traditional and more limited. Thus, Apache Solr looked like a good fit for the proposed new All Galicia Database.
The fact that it was free, open source software certainly did not hurt either. This was not just because "free" meant "the price is right" for a small, non-profit genealogy group, but because "free" in this case also means "free to edit, change, tinker and redistribute," the "open source" meaning of the term. No one may freely change or redistribute a program such as Microsoft Word, or database "solutions" from many traditional and proprietary software vendors. The danger in using one of these products lies not only in the lack of customization options, but also in the very real possibility that a vendor may go out of business, leaving a lovely new database without any hope of technical support, security updates or new features in the future. Using an open source solution like Solr, which has a vibrant developer community and releases new versions every few months, would help ensure that if I spent the time and effort to use Solr as a genealogy database solution, it would be a little less likely to crumble into uselessness.
Making Sense of the Solr Results
Solr on its own only holds and indexes data, and it spits out results that are not easy for human beings to read. (See the screenshot to the right for an example of what this not-very-user-friendly format looks like.) Gesher Galicia also needed software that could look at the Solr results and format them into a usable website. For this need, I decided on a new open source package called Solarium, which forms a bridge between Solr and the programming language PHP. Solarium takes the raw data held in Solr, turns its results into PHP values and formats them nicely for a person using a website.
This combination of Solr and Solarium made the new website workable, but it still looked extremely plain. While Solarium allows data to appear on the screen of a person using a web browser, it does so in one long list of plain text attributes for each record. No differentiation is made between the important aspects of the records (such as names and dates), the less important aspects (e.g., an occupation, a note, a school class name), and the finding aids for that record (page number, line number, microfilm number, book number, repository name, or other data). Human intervention was needed to define what was more or less important and to make that apparent in a visual medium such as a website.
Although I am not a graphic designer or user experience engineer, I knew that a database's usefulness is not measured merely by the kinds of records it can produce from its depths, but also by good design choices that help a user make sense of the relative importance of various bits of data. It would not be sufficient to build a new archival database with all sorts of bells and whistles and fancy search functionality if a naive user could not easily use it, or could not understand what to make of the records he or she received.
I decided that the most important aspect of each record must be the primary name or names, and, consequently, I prioritized that data, displaying it in a much larger font as the first line of each record, the first thing a user sees when he or she browses the list of search results. Next, I added the capability for the software to determine if a parent, two parents and/or a gender for each person was mentioned and, if so, to add a note such as "daughter of so-and-so" or "son of so-and-so and such-and-such," or else "child of so-and-so," if the gender was not specified in the record. I did the same for spouses, if the spouse's name was provided in a particular record.
Other key information to emphasize for each record was the primary year of the record, which could be listed instead as the year of the event if "year of event" was not a blank field in the database (for example, in a delayed birth record situation), and the type of record, such as birth, marriage, death, divorce, landsmanshaft, school and so forth. In the default page layout and design included in LeafSeek and also used in the AGD, these two bits of information appear with a colorful background, as little "labels" on each record.
At this point, we had the primary name(s) of the record, the primary year, and the type of the record, all visually highlighted as important through text size and color. Still, scrolling through all the data for the rest of each record's fields would have been difficult for a user trying to scan quickly for familiar names or other information. To address this situation, I hid most of the rest of the data within a "click to expand" system. Finally, at the very bottom of each record, visually separated out through the use of lighter grey text, is the text that lists any finding aid information, such as a page number or the repository name. I now had a back-end Solr web server and a front-end PHP system to display the eventual data.
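The parent-note rule described above is simple enough to sketch. LeafSeek's front end is written in PHP; the Python function below is only an illustration of the decision logic, and its name, its parameters and the "F"/"M" gender codes are all assumptions rather than anything taken from the actual code.

```python
def relationship_note(gender=None, father=None, mother=None):
    """Build a note such as 'daughter of A and B' from whatever a record provides.

    Returns an empty string when no parent is named, since there is nothing to say.
    The gender codes "F" and "M" are a hypothetical convention for this sketch.
    """
    parents = [name for name in (father, mother) if name]
    if not parents:
        return ""
    # Fall back to the neutral word when the record does not state a gender.
    role = {"F": "daughter", "M": "son"}.get(gender, "child")
    return f"{role} of {' and '.join(parents)}"

print(relationship_note("F", father="Leib"))
print(relationship_note(None, father="Leib", mother="Rivka"))
```

The same pattern (collect what is present, choose a word, join) would extend to the spouse note mentioned in the text.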
Now came the fun part: putting the genealogical data into Solr. By now, Gesher Galicia had several spreadsheets, each spreadsheet representing a transcribed record set, and each row of that spreadsheet representing a single record. One spreadsheet might hold a roster of students from a Galician school from the 1930s; another, the early 20th-century membership list of a landsmanshaft in New York for a Galician town whose 19th-century vital records had not survived. In other words, the files had not been created in a standardized format. They needed to be checked for consistent standards, formatted and turned into comma-separated value (CSV) files, one by one. Generally, Gesher Galicia's data normalization standards are much the same as those for submitting a spreadsheet to a database such as JewishGen, with one difference: the SIG encourages transcribers to add as many new columns as needed, if warranted, to capture the data source's actual data correctly and completely, rather than to try to fit all data into an existing standardized template. As a result, some of the data sets, such as a marriage data set, might have more than 20 columns of data, while others, such as a tax list, had perhaps 5 columns. Fortunately, Solr does not require a consistent data template from one data set to another.
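To make the "no consistent template" point concrete, here is a minimal sketch of turning differently shaped CSV files into per-record documents. It uses only Python's standard `csv` module; the column names, the sample rows and the `dataset` tag are invented for illustration, and the step of actually sending documents to Solr is deliberately left out.

```python
import csv
import io

def rows_to_documents(csv_text, dataset_name):
    """Turn each CSV row into a dict that keeps only the columns that row fills in."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip blank cells so a sparse row does not carry empty fields.
        doc = {col: val.strip() for col, val in row.items()
               if col and val and val.strip()}
        doc["dataset"] = dataset_name  # hypothetical tag naming the source data set
        documents.append(doc)
    return documents

# Two invented data sets with different columns.
MARRIAGES = "groom_surname,bride_surname,year,witness_surname\nKahan,Schreier,1885,\n"
TAX_LIST = "surname,year\nKahan,1910\n"

for doc in rows_to_documents(MARRIAGES, "marriages") + rows_to_documents(TAX_LIST, "tax list"):
    print(doc)
```

Each document carries only its own fields, so a 20-column marriage register and a 5-column tax list can sit side by side in one index.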
How LeafSeek Works; Major Features
Let us move from the very technical to the very broad: food. Imagine that a genealogy database is like a turkey sandwich. We have the important part of the database, the literal and metaphorical "meat," which is the actual data. We have a nice method of conveying that messy data to a user in a cleaner and prettier manner: the bread, or the user interface. But a turkey sandwich is rarely just turkey and bread. It really needs some condiments to make it tastier: some lettuce, a slice of tomato, a little mayonnaise and so forth. Similarly, a database is rarely just the raw data and the user interface; it needs some nice features to make the experience more pleasant and useful to the user.
LeafSeek's code, thanks to Apache Solr's power, is full of that lettuce, tomato and mayo. Some useful features in the All Galicia Database enhance the meal (er, the search). One feature is the ability to search through nicknames, alternate names, and even kinnui (secular names), a first, I think, for a Jewish genealogy website. Thanks to Solr's synonyms.txt file, a simple text file that is easy to edit, a search for a given name such as Rivka can also yield results for Rivca and Rebecca and Beckie. Better yet, a search for Yehudah can also bring up kinnui such as Loew or Leib; a search for Tzipporah may find the calque Feige; a search for Jenta can find not just Yente but also its diminutive form Yentl; and so on. The associations can be as broad or as narrow as the administrator editing the text file thinks appropriate. Furthermore, this synonyms.txt file can and should be edited to be relevant to the focus of a specific database. Thus, if LeafSeek is used with a database of Romanian Jewish names, perhaps add Marku and Mordechai as possible matches; for a Russian Jewish database use Motel instead; and with a Mediterranean Sephardic Jewish database, choose Mordocheo or Marco. For more information on Jewish given names and kinnui, see the online JewishGen InfoFile on the subject, written by Warren Blatt and based on two previous articles, one in AVOTAYNU, Vol. XIV, No. 3, Fall 1998, the other in the Avotaynu Guide to Jewish Genealogy (2004).
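A short sketch may help show what such a synonyms file does. In Solr the file is consumed by a synonym filter inside the search engine; the Python below only imitates the visible effect, assuming the simple layout in which each line is a comma-separated group of equivalent names and lines starting with `#` are comments. The three groups are taken from the examples in this article, and the function names are invented for the sketch.

```python
SYNONYMS_TXT = """\
# each line lists names to be treated as equivalent
Rivka, Rivca, Rebecca, Beckie
Yehudah, Loew, Leib
Jenta, Yente, Yentl
"""

def load_synonyms(text):
    """Map every name (lowercased) to the full set of names on its line."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        names = [n.strip().lower() for n in line.split(",") if n.strip()]
        for name in names:
            groups.setdefault(name, set()).update(names)
    return groups

def expand(name, groups):
    """Return every name a search for `name` should also match."""
    key = name.lower()
    return sorted(groups.get(key, {key}))

groups = load_synonyms(SYNONYMS_TXT)
print(expand("Rebecca", groups))
print(expand("Kahan", groups))
```

Because the whole mapping lives in one editable text file, an administrator can widen or narrow the associations for a particular community without touching any code.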
Other Languages
Another nice feature already built into the current version of LeafSeek is the ability to translate every piece of text in the user interface, meaning the actual part of the website with which people interact. Users will be able to read the words of the website, though not the raw data (yet), in the language of their choice. Why translate all the text pieces of a website, since a user already can use a free online translation program such as Google Translate? Two reasons come to mind. First, the "creative" translations those programs can return, especially if the website content is somewhat technical or specialized, can be mind-bogglingly bad, when not unintentionally funny. Second, how can any genealogical organization hope to communicate effectively with a worldwide population if it puts all the burden of communicating on the non-English-speaking user, who must not only somehow discover the website in the first place, probably by purposely using English language terms in a non-English language search engine, but then browse page after page of muddled auto-translated text?
For example, many Galitzianers immigrated to South America in the 19th and 20th centuries; today, in the 21st century, their descendants likely number in the many tens of thousands. To effectively build a bridge to them and to their heritage (and Gesher in Gesher Galicia does mean "bridge"), why not offer the AGD in a natively translated Spanish version? For that matter, why not Polish or Ukrainian for the many Poles and Ukrainians who are curious about the histories of their towns? Why not Hebrew for the many Israelis who visit the AGD from Israeli Internet service providers, between 10 and 20 percent of its visitors within the last four months? For these reasons, the AGD will launch a new fully translated multi-lingual interface in Summer 2012. It will start with five major languages, and expand, as needed, throughout 2012 and 2013.
Now that the code infrastructure is in place with LeafSeek, it will just be a matter of polishing up the text translation files in a few spots. English speakers probably will not notice anything different except, hopefully, more interest and activity on mailing lists from genealogists from a greater number of locations. Anyone who uses LeafSeek will be able to use this new ability to natively translate their site's interface into new languages by editing simple files. LeafSeek also can search for accented and non-Latin characters in records using plain unaccented text searches. A search for Jozef will find Józef, and vice versa. This allows record transcriptions to adhere as closely as possible to an original data source. By the end of 2012, LeafSeek plans to add a feature to index records originally recorded in the Hebrew or Cyrillic alphabets; users then can search for a name recorded as Шейндл with a Latin alphabet search term such as Sheindel. The searches will be approximate and phonetic, not exact, but they are one step closer to the ultimate goal of having a genealogy search engine system that, wherever possible, stays true to the original records without too much intermediary transliteration or translation.
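The accent-insensitive matching mentioned above (a search for Jozef finding Józef, and vice versa) can be approximated in a few lines. Solr does this with its own text analysis filters; the sketch below instead uses Python's standard `unicodedata` module, so it should be read as an illustration of the idea rather than a description of LeafSeek's configuration.

```python
import unicodedata

def fold(text):
    """Strip combining accents and ignore case so accented and plain spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()

def matches(query, recorded_name):
    """True when the query and the transcribed name agree after folding."""
    return fold(query) == fold(recorded_name)

print(matches("Jozef", "Józef"))
print(matches("Józef", "Jozef"))
```

One limitation of this simple approach: letters such as the Polish ł have no accent to strip (they are distinct letters, not a base letter plus a mark), so a fuller solution needs an explicit mapping for them. The transcription itself stays untouched; only the comparison is folded.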
Mapping and Geospatial Searches
One of my favorite features of LeafSeek is the built-in ability to make "geospatial searches," that is, to search only for records within a certain chosen radius of a specific town. Currently a search of the AGD for the name Kahan yields 535 individuals. Add an optional parameter specifying only records from locations within 50 kilometers of L'viv, and the list is whittled down to 60 records. The nice part about doing a geospatial search is that the researcher need not worry about searching for a record by its town name, nor know which spelling of that town name was in use at a given time, or which other little towns were nearby where an ancestor might have been hiding, or have to remember the latitude and longitude of the town. Instead, he or she just picks a center point from a drop-down list pre-populated with towns that are relevant to the database's area of focus, specifies the radius size, and hits enter. Apache Solr understands latitude and longitude and, therefore, does not need to know where a border was drawn or which sovereign controlled a town on a specific date.
Also relevant to the subject of mapping is the integration of LeafSeek with Google Maps. The maps work with a LeafSeek database in two different ways. First, each individual record can be "tagged" in Solr with a primary location, which can then display a link to a Google Map of that location, using its modern name. So, for example, records from the Żółkiew Jewish Death Records (1855-70) data set within the AGD will have a link that, when clicked, pops up a small Google Map displaying the modern name and location of that town, which is Zhovkva, L'vivs'ka oblast, Ukraine.
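As a rough illustration of the radius search described earlier in this section, the sketch below computes great-circle distances with the haversine formula and keeps the towns inside a chosen radius. Solr has its own spatial filtering and this is not its implementation; the town coordinates are approximate values supplied for the example, and the function names are invented.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Approximate coordinates, rounded to two decimals, for illustration only.
TOWNS = {
    "Lviv": (49.84, 24.03),
    "Zhovkva": (50.06, 23.97),
    "Krakow": (50.06, 19.94),
}

def towns_within(center, radius_km, towns):
    """Names of all towns no farther than radius_km from the chosen center town."""
    c_lat, c_lon = towns[center]
    return sorted(name for name, (lat, lon) in towns.items()
                  if haversine_km(c_lat, c_lon, lat, lon) <= radius_km)

print(towns_within("Lviv", 50, TOWNS))
```

Only coordinates enter the calculation, which is why such a search is indifferent to historical spellings and shifting borders.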
The pop-up map will not appear for every town mentioned in a record, such as a parent's birth town, only for the primary location of the record, but it provides a quick look at the location of the record the user is looking at, right from the actual search results, without having to open up another web browser tab for a mapping site. (With a little work, a user can switch this feature to another map provider like Microsoft Bing or OpenStreetMap, if desired.) LeafSeek also automatically generates a web page devoted to the representation of all the towns and locations in the database, which features a Google Map with pushpins automatically placed in towns that have records coverage. This allows database users to see easily what areas of the database have decent record availability, or conversely which areas of a territory need some more attention. For example, at the time of this writing, the AGD shows very little record coverage in the area southeast of Lviv, an area encompassing the Ukrainian towns of Berezhany, Burshtyn and Rohatyn. This helps the SIG know what areas need more targeted outreach so that they will be represented by at least some records in the AGD. In this instance, it means that when the SIG continues transcribing school yearbooks for Galician towns later in 2012, it will move two yearbooks from Berezhany to the top of the list, to work on filling in that hole in the map.
Finally, the most widely used features of LeafSeek (the mayonnaise and mustard on the turkey sandwich, one might say) are its wildcard feature and its Beider-Morse Phonetic Matching (BMPM) feature. Apache Solr allows developers to specify how generous to be with wildcard matching. For LeafSeek I decided to let users choose as many wildcards in as many different positions as they like. The wildcard designator is an asterisk, and it can match one or multiple characters.
Users can search for given names and/or surnames with wildcards in any position of the name, including at the very beginning, with no minimum number of letters required. Multiple wildcards may be used in the same search, such as s*w*z to match surnames as diverse as Szwarz and Schlomkowicz. Beider-Morse Phonetic Matching can be used to search for phonetically similar surnames. This is optional and can be enabled or disabled with a checkbox, but we recommend that it be left on. For example, my maiden name Schreier has only nine exact matches in the AGD, but it has 127 matches with BMPM enabled, including spelling variants such as Szreyer that even a wildcard search might not have found. Users are advised, however, to do either a wildcard search or a BMPM phonetic search, but not to combine the two techniques for the same surname, which can produce less-than-desirable results.
Beider-Morse Phonetic Matching was not even a feature of Apache Solr until a few months ago. Earlier versions of Solr offered the option of using rudimentary sound-alike name systems such as Soundex, Caverphone and Metaphone, but did not include systems that might be useful to Jewish genealogists, such as Daitch-Mokotoff Soundex or BMPM. Knowing that Gesher Galicia would want the ability to use a modern sound-alike method for its database search, and with the consent of BMPM creators Alexander Beider and Steve Morse, we underwrote the conversion of their original BMPM code into the Java computer language by hiring a British computer developer to do the work for us, thereby enabling this option for everyone else in the future who might want to use BMPM in a Java-based program of any kind. The Java version of BMPM was added to yet another Apache program called the Apache Commons-Codec pack, which will be included in Apache Solr software distributions starting with version 3.6.
We hope this will mean that many other genealogy platforms that already make use of Solr, such as the previously mentioned Ancestry.com or other commercial providers, might be able to start using BMPM in their systems in the future, should they desire to do so. To our knowledge, however, the AGD is the only Solr-based system currently using BMPM.
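The multi-position wildcard matching described earlier in this section can be mimicked with a regular expression. This sketch is not how Solr evaluates wildcards internally; it simply translates each asterisk into "any run of characters" and checks whole names, ignoring case. One assumption to note: here an asterisk may also match zero characters.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a pattern such as 's*w*z' into a compiled, case-insensitive regex."""
    # Escape the literal pieces so only the asterisk acts as a wildcard.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts), re.IGNORECASE)

def wildcard_search(pattern, names):
    """Return the names that match the whole pattern, in their original order."""
    regex = wildcard_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]

SURNAMES = ["Szwarz", "Schlomkowicz", "Schreier", "Kahan"]
print(wildcard_search("s*w*z", SURNAMES))
print(wildcard_search("*reier", SURNAMES))
```

A leading wildcard such as `*reier` works the same way as any other position, which matches the article's point that no minimum number of starting letters is required.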
The API: Bread-Free Web Results
The final key feature of LeafSeek is one that is the least commonly understood but perhaps the most important: the API. As noted earlier, Apache Solr serves up its content in fairly ugly JSON or XML format, and it is up to Solarium to translate that code into something nice and easy for the average user to understand. To reiterate the sandwich metaphor, Solr provides the data, which is the turkey; Solarium and PHP and all the rest of the front-end code provide the user experience, which is the bread; and the other features previously mentioned are all nice-to-have condiments. But what if you are afflicted with celiac disease or gluten intolerance and simply cannot eat the bread at all? What if you have to, or just want to, consume the data, or turkey, with a fork? In fact, this is very much how a computer thinks; a computer does not need a shiny web interface, it only cares about the data itself. The fork, then, is a metaphor for something called an API, or Application Programming Interface. It allows for the consumption or exchange of data between two computer systems without the need for a front end (bread) at all.
Here is why an API is such a nice feature. Consider a genealogy website that contains Galician records, called the All Galicia Database (AGD). Now consider a second genealogy website that contains scanned phonebooks and directories and other out-of-copyright historical documents, called Genealogy Indexer (GI), run by Logan Kleinwaks. With an API powered by LeafSeek, the AGD can share some of its search results on the Kleinwaks website, but without Kleinwaks actually hosting the AGD data. This means that a person searching the Genealogy Indexer for a certain surname from Galicia also can be notified at the bottom of the page of other records on the AGD that match the search, without ever having to leave the GI site at all.
Kleinwaks has built an API for his site that will be featured in the AGD search results, too, as a "You Might Also Like..." section at the bottom of the AGD search results pages. We hope to have these complementary APIs up and running on both our sites, allowing users to see results from one site on the other site (but clearly marked as such), perhaps even before this issue of AVOTAYNU has gone to press. Any website running LeafSeek to power its search engine will easily be able to share some of its results with third-party sites, without actually having to share the underlying data. For Gesher Galicia, this means that many other sites will soon have the ability to embed live search results from the AGD within their own site or as an adjunct to their own search results, simply by adding a few lines of code to the bottom of their web pages. In fact, the LeafSeek API is probably how data from the All Galicia Database will be shared with the main JewishGen website in the near future.
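To show what "eating the turkey with a fork" might look like on the receiving site, here is a minimal sketch that takes a JSON reply and turns it into lines for a "You Might Also Like..." box. The article does not specify the LeafSeek API's actual response format, so the sample below merely imitates the general shape of a Solr JSON response (a `response` object holding `numFound` and `docs`); the field names inside each record and the function name are assumptions.

```python
import json

# Invented sample reply; a live site would receive this text over HTTP instead.
SAMPLE_RESPONSE = """
{"response": {"numFound": 2, "start": 0, "docs": [
  {"surname": "Kahan", "year": "1885", "record_type": "marriage"},
  {"surname": "Kahan", "year": "1910", "record_type": "tax"}
]}}
"""

def also_like_lines(raw_json, source_label):
    """Format a partner site's results as plain lines, clearly marked with their source."""
    body = json.loads(raw_json)["response"]
    lines = [f"{body['numFound']} matching records from {source_label}:"]
    for doc in body["docs"]:
        # Use placeholders when a record lacks a field rather than failing.
        surname = doc.get("surname", "?")
        record_type = doc.get("record_type", "record")
        year = doc.get("year", "n.d.")
        lines.append(f"- {surname} ({record_type}, {year})")
    return lines

for line in also_like_lines(SAMPLE_RESPONSE, "All Galicia Database"):
    print(line)
```

The receiving site never stores the partner's records; it only formats whatever the API returns for the current search, which is what lets results be shared without handing over the underlying data.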
Summary
I have tried to explain LeafSeek, the totally free "genealogy database in a box" system that lets genealogists share genealogical or historical records online as searchable interconnected databases. What is next for LeafSeek? Since it is officially open source software, I welcome anyone who wants to edit it, add to it, change it, or fork it (no, this is not another sandwich metaphor, the proper term really is "fork") to check out the code online and start playing around with it. In a talk entitled "Building an Open Source Genealogical Search Engine with Apache Solr," I presented LeafSeek at the 2012 conference. Interested readers may view the slides from that talk on the LeafSeek blog. The last slides deal with reasons why I think the software is actually rather important, not merely cool new technology.
Amid the plethora of for-pay genealogy website vendors that would gladly take in genealogy groups' records behind their pay walls to enhance their own financial position, it is nice to have an option, at least, of keeping records within an open source and freely available framework. I think that what the genealogy community really needs now is more open and communally governed tools, not just hot new products constrained by copyright and software patents, or web companies more concerned about a past or future IPO than about open access to records. More than anything, what I would love is for people, or genealogy groups, or historical societies, or libraries, or students, or whatever and whomever, to use LeafSeek to turn their own collections of records into freely shared online databases. I hope the software will in time become more widely adopted so that archival groups of all types can open up their information to the world.
Brooke Schreier Ganz is a web developer who lives in Los Angeles, California. Prior to her recent career move to full-time mommyhood, she was the Senior Web Producer for the Bravo cable channel's websites, Lead Programmer for a well-known Warner Brothers entertainment website, and a web developer at Disney Consumer Products. She is vice president of Gesher Galicia, for which she designed and built the new All Galicia Database in early 2011.