As a personal project, I worked on the development of a system that searches for individual articles on a website. The system extracts the most important words that different writers use to describe a concept, and then automatically builds an ontology (an RDF graph) containing all of the concepts, together with the sentences and the articles in which they were found.
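The post does not show how the RDF graph was assembled. The following is a minimal stdlib-only sketch of the idea, assuming a hypothetical namespace (`http://example.org/onto#`), hypothetical predicates (`foundIn`, `text`, `partOf`) and a hypothetical `Occurrence` record. It links each concept to the sentence where it appeared, and each sentence to its text and source article, then serialises the result as N-Triples. A real system would more likely use a dedicated RDF library.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical vocabulary: the post does not say which namespace or predicates were used.
EX = "http://example.org/onto#"

@dataclass(frozen=True)
class Occurrence:
    concept: str      # a selected "important" word
    sentence: str     # sentence in which the word appeared
    article_url: str  # article that contains the sentence

def literal(text):
    """Escape a Python string as an N-Triples string literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def build_triples(occurrences):
    """Link each concept to a sentence node (one node per occurrence),
    and each sentence node to its text and its article."""
    triples = set()
    for i, occ in enumerate(occurrences):
        concept = f"<{EX}{quote(occ.concept, safe='')}>"
        sentence = f"<{EX}sentence{i}>"
        triples.add((concept, f"<{EX}foundIn>", sentence))
        triples.add((sentence, f"<{EX}text>", literal(occ.sentence)))
        triples.add((sentence, f"<{EX}partOf>", f"<{occ.article_url}>"))
    return triples

def to_ntriples(triples):
    """Serialise the graph as N-Triples, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))
```

For example, `to_ntriples(build_triples([Occurrence("ontology", "An ontology models concepts.", "http://example.org/a1")]))` yields three triples. Keeping sentences as separate nodes lets one concept point to many sentences across many articles.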
I developed the system in the Python programming language. To select the most important words that describe a concept, I used the Bag of Words (BoW) approach: I built a histogram of the words used across the different articles and the number of times each one was repeated. The histogram entries were then clustered with the k-means algorithm into three groups, "highly repeated", "normally repeated", and "not repeated", and only the words classified as "normally repeated" were kept.
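The word-selection step can be sketched as follows, assuming a simple lowercase tokenizer, a plain one-dimensional k-means (Lloyd's algorithm) over the raw counts with a deterministic initialisation, and k = 3. These choices, and the function names `bag_of_words`, `kmeans_1d` and `normally_repeated`, are illustrative assumptions rather than the original code. The cluster with the middle centroid is taken as the "normally repeated" group.

```python
import re
from collections import Counter

def bag_of_words(articles):
    """Histogram of word counts across all articles (simple lowercase tokenizer)."""
    counts = Counter()
    for text in articles:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

def assign(values, centroids):
    """Index of the nearest centroid for each value."""
    return [min(range(len(centroids)), key=lambda c: abs(v - centroids[c])) for v in values]

def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's k-means on scalar values (requires k >= 2); returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread initial centroids evenly over the sorted values.
    centroids = [float(ordered[(len(ordered) - 1) * j // (k - 1)]) for j in range(k)]
    for _ in range(iterations):
        labels = assign(values, centroids)
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, assign(values, centroids)

def normally_repeated(counts):
    """Keep only words in the cluster whose centroid is neither the lowest nor the highest."""
    words = list(counts)
    centroids, labels = kmeans_1d([counts[w] for w in words], k=3)
    middle = sorted(range(3), key=lambda c: centroids[c])[1]
    return sorted(w for w, lab in zip(words, labels) if lab == middle)

articles = ["the " * 10 + "ontology graph " * 3 + "banana",
            "the " * 10 + "ontology graph " * 2 + "paris"]
print(normally_repeated(bag_of_words(articles)))  # -> ['graph', 'ontology']
```

In this toy input, "the" (20 occurrences) falls in the most repeated cluster and "banana"/"paris" (1 each) in the least repeated one, so only the mid-frequency words survive. This mirrors the intent of discarding both function words and one-off terms.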
Paris, France, February 2013