Altair line chart


🤔 Graphing Kiva Data with Altair

Although making graphs with the turtle is a good way to learn a few basics of computer graphics, it is definitely not going to encourage you to make lots of graphs to explore your data. Exploring a new pile of data is something that every curious data scientist should want and need to do. There are many graphing options available for Python programmers, including:

  • Matplotlib

  • Seaborn

  • Plotly

  • Bokeh

  • Altair

Altair was designed to work in a browser and makes use of a very powerful concept, which we won’t go into here, called a grammar of graphics. It’s pretty easy to use once you understand a few of the basics. The two key ideas to understand are “marks” and “encodings”.

A mark essentially specifies the kind of chart we are going to create, such as a bar chart, a line graph, or a scatter graph. In our version of Altair we support three different kinds of marks:

  • mark_bar() – draw this chart using bars

  • mark_point() – draw this chart using points

  • mark_line() – draw this chart using lines

An encoding allows you to declare links between data columns and “visual encoding channels.” You specify an encoding for a particular mark by passing named parameters to the method. We will get into more detail on named parameters later in the book. For now you can just enjoy the fact that they act like any other parameter but are in some ways easier to deal with, because you don’t have to worry about the order: you can use them in any order by their name.

Some typical encoding channels include:

  • x - what data should be used for the x axis

  • y - what data should be used for the y axis

  • size - what data should be used to set the size of the mark (especially for mark_point)

  • color – what data should be used to color the mark

  • shape – what data should be used to set the shape. Good for when you want to plot several related columns of data together.

The mark and the encoding work together very well and allow us to create many different kinds of graphs without having to explicitly draw anything! In this way Altair is a declarative graphics package.

Altair assumes that you have data that you can organize into a table of rows and columns, where each row represents an observation and each column is something that you can observe or label. The data can be categorized according to several different types. These are a bit more high level than the types we have been talking about with Python, but you will probably recognize them. They are:

  • Quantitative – quantitative data is any numeric data. It might represent a temperature, a speed, or a GPA.

  • Ordinal – ordinal data is used when the numeric value tells you something about the order of choices, such as in a survey when you are asked to rank your satisfaction on a scale from 1 to 5.

  • Nominal – nominal data is typically used to name things.

  • Temporal – temporal data is data about time.

When you tell Altair what data you want to use for the x or y axes, you will often want to give it a hint as to what type the data is. It can often infer that for itself, but not always. When you do give it a hint, you just append a short string to the column name consisting of a colon and the first letter of the data type. For example, to tell it that the column category is nominal, you should specify the column as ‘category:N’.

The easiest way to learn Altair is through some examples. So let us look at a few.

Bar Charts

To make a chart in Altair you have to specify the data that you want to work with, how you want to mark the data, and how you should encode the columns of data with the kind of mark you have chosen.

Let’s make a bar chart. Here is a table of data:

customer    cakes    flavor
Alice       5        chocolate
Bob         9        vanilla
Clair       7        strawberry

Breaking this down line by line:

First we create a representation of the table for Altair. This is an example of using named parameters. Note that each parameter name becomes the name of a column. This example shows you the data printed in tabular form.

Back to the original code:

On line 4 of the program we make a Chart. The chart is the holder of the data that we will mark and encode. You can also give chart an optional parameter to tell it the title of the chart.

On line 5 we tell the chart that it is going to be a bar chart by calling the method mark_bar().

On line 6 we tell the mark the encodings to use. We tell it that the values for the x axis should come from the customer column and to treat them as nominal data. That is great for a bar chart, as the columns often do correspond to names. The values for the y axis will come from the cakes column. In a bar chart it’s natural to think of the values being proportional to the height of the bar, which is exactly what Altair does for a bar chart. We are also going to color the bars (this is optional) using the value from the flavor column.

On line 7 we tell Altair to display the chart.

You Try

  1. Change the values so they are all bigger by a factor of 10. Does the Chart automatically re-scale itself?

  2. Add more data to customer, cakes, and flavor to represent five more rows in the table and redo the graph.

  3. What happens if you change the columns for x and y ?

Did you notice anything interesting? Suppose the additional rows looked like this:

customer    cakes    flavor
Drake       10       chocolate
Emma        82       vanilla
Alice       70       strawberry
Emma        42       chocolate
Ginger      64       strawberry

Can you explain why the graph looks like it does? Just to be sure we are seeing the same thing, here’s the code for the two tables combined below.

Notice that it adds together all of the rows for each value on the x channel. This is just what you would want if you wanted to show a total for a particular category, such as graphing the total amount of money lent in each country. It also shows the distribution, by color, of another variable within that category. Sometimes this is called a stacked bar chart. Just think of the work you would need to do to replicate this if you had to compute it all yourself and then draw it with a turtle.

Line Graph

Let’s make a line graph. This is the kind of graph that you would typically see in a math book to graph a function. Let us first make some data to graph using the function $$y = x^2$$. We will graph it over the range of -10 to +10 for the x values.

What we are doing here is calculating the square of the integers from -10 to +10 and storing them in y_vals. You can even print y_vals to see that it’s just an ordinary list if you want.

On line 8 we make data, just like we did in our previous examples. You may want to add a line after line 8 to see the data for the chart in its tabular form.

On line 10 we tell the chart that the mark will be a line.

On line 11 we tell the chart to use the column named X as the x values and Y as the y values. Notice that we don’t need to tell it what kind of data the columns contain, as Altair will infer that both are Quantitative. It doesn’t hurt to add a ‘:Q’ at the end if you want.

You try

  1. Change the mark to mark_point() instead of mark_line().

  2. Change the mark to be a mark_bar(). Cool, right? It’s like 3 graphs for the price of one!

  3. Change the data so that the columns have names other than X and Y, and update the encoding to match.

  4. Add a color parameter to the encoding, using either X or Y to specify the color value. Don’t give it a type and see how it looks. Then specify that you want it to treat the column as nominal (‘:N’) and you will see how the color scheme changes. This gives you a lot of flexibility in how your chart gets colored.

  5. Choose a different function, such as sine or log.

  6. Plot $x(t) = 2\cos(t) + \sin(t)\cos(60t),\ y(t) = \sin(2t) + \sin(60t)$ over a range of t values.

  7. Look up the equation for the “Butterfly curve”. Can you generate the data and use Altair to plot it?

Scatter plot

Most often the mark_point option is used to make a scatter graph. Here is a well-known data set that contains the measured number of chirps per second along with the temperature at the time the chirps were measured. The theory goes that if you are without your thermometer or weather app, you can figure out the temperature by counting the number of times per second a nearby cricket chirps. Using the data given, make a scatter plot. This is really just like the line plot we did above, but using mark_point().

Histogram

Here are 100 IMDB movie ratings. Let’s figure out the distribution of these ratings by making a histogram. To do this we will introduce a couple of new ways to enhance the encoding of our different channels. Recall that to make a histogram we divide the data into bins and count the number of observations that fall in each bin. We can tell Altair that we want our axis to be a binned axis, but that requires more than just giving it the column name. So we have an Axis object that we can use to communicate this additional information. It takes a parameter to specify the name of the column in the table to use and an optional parameter that tells Altair to group the data into bins. Now what about the y axis? Since we want the y value to represent the number of things in each bin, we need to have Altair count them. Altair supports a number of aggregation functions to help summarize groups of data. In the case of binned data we make the y axis a string of ‘count()’. Technically we don’t need an Axis object to tell Altair to use count; we could simply pass the string ‘count()’ as the y encoding.

A few words of explanation for the bar chart may make it clearer what is going on here.

Hopefully everything up to line 8 will look pretty familiar to you, but on line 8 we have to get a bit more fancy with our encoding. We are telling Altair that our X axis is going to use the ratings data, and we are adding the ‘:Q’ to be sure it knows that it is quantitative. You can remove the :Q and it will still work fine. The key to making the histogram is to tell Altair that we are going to put the X data into bins, just like you did in the last project. You don’t have to calculate the bins this time; Altair will do the work.

On line 9 we tell Altair that the y values will be the count of the things that are in the bins. If you specify a function like count, you are telling it how to aggregate the values on the other axis.

You Try

  1. What happens if you remove the ?

  2. What happens if you switch the x and y axes?

Kiva Graphs Revisited

The final step for this lab is to recreate the three graphs we made with the turtle for the Kiva data. You should refer back to the examples we just worked through to help you figure out what to do.

Make a scatter plot of the number of donors versus the time to raise the money for the loan. Make the size of the circle correspond to the loan amount. Make the color correspond to the country.

Make a bar chart that shows the total amount of money loaned in each country.

Make a bar chart that shows the number of loans in each country.

Make a histogram that shows the distribution of the loan amounts.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Source: https://runestone.academy/runestone/books/published/fopp/Projects/graphing_with_altair.html


Hello, Happy New Year
I want to add a couple of vertical lines to my chart, but I don't know how. Could you please help me?
The vertical lines are for two different dates, e.g. Dec 09 and Dec 20.

How do I increase the resolution of the chart image?
When I download the chart images, they are low quality.

Code :

highlight = alt.selection(
    type='single', on='mouseover', fields=['DOY'], nearest=True)

base = alt.Chart(NO2).encode(
    x=alt.X('monthdate(Timestamp):Q', title='Date'),
    y=alt.Y('mean(T_NO2):Q', title='NO2 Khorasan (mol/m^2)'),
    color=alt.Color('Year:O', scale=alt.Scale(scheme='magma')))

points = base.mark_circle().encode(
    opacity=alt.value(0),
    tooltip=[
        alt.Tooltip('Month:O', title='Month'),
        alt.Tooltip('DOY:Q', title='DOY'),
        alt.Tooltip('T_NO2:Q', title='NO2')
    ]).add_selection(highlight)

lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(3), alt.value(3)))

rule1 = base.mark_rule(color='red').encode(
    x={'Timestamp': "2020-11-09"},
    size=alt.value(5)
)

rule2 = base.mark_rule(color='orange').encode(
    x={'Timestamp': "2020-11-20"},
    size=alt.value(5),
)

(points + lines).properties(width=600, height=350).interactive()


Source: https://github.com/altair-viz/altair/issues/2379

Create Stunning Visualizations with Altair

Have you ever gotten frustrated after looking at your visualization in Python? Have you ever thought that it can be done better with less effort and time? If so, this post is perfect for you because I would like to share about the Altair library, which will boost your productivity and make your visualisations more appealing.

I suppose you already know how vital visualization is for any analysis and how it helps convey and translate an idea to a wider audience. Also, visualizing data is one of the first steps to explore it and understand where to dig deeper. Therefore, I would like to focus on the basic grammar of Altair using a scatter plot and then share with you some examples of various graphs. Before that, let us talk about Altair and get to know why it is so powerful.

Altair is a declarative statistical visualization library, which uses Vega and Vega-Lite grammars that help to describe the visual appearance and interactive behaviour of a visualization in a JSON format.

The key idea behind Altair is that you declare links between data columns and visual encoding channels (e.g., x and y axes, colour, size, etc.) and the rest of the visualization process is handled by the library. Thus, it gives you more time to focus on data and analysis rather than explaining how to visualize data [1].

Altair's components

  1. Data: DataFrame used for visualization

  2. Mark: How would you like the data to be visualized (line, bar, tick, point)?

  3. Encoding: How will the data be represented (positions for x and y, colour, size)?

  4. Transform: How would you like to transform the data before applying visualization (aggregate, fold, filter, etc.)?

  5. Scale: Function for inputting and rendering data on the screen

  6. Guide: Visual aids such as legend, ticks on the x and y axes.

As for the mark component, you can use the following basic mark properties:

Let us get our hands dirty and learn Altair's grammar using a scatter plot.

Installation

$ pip install altair vega_datasets

The equivalent for conda is

$ conda install -c conda-forge altair vega_datasets

Data

I will be using the following Vega datasets:

  1. data.gapminder()
  2. data.stocks()
  3. data.movies()

Let's import packages and look at the data

import pandas as pd
import altair as alt
from vega_datasets import data

Step 1: Simple scatter plot

Chart() is a fundamental object in Altair, which accepts a single argument — a DataFrame. Let us look at a simple scatter plot using Chart(), mark_point() and encode() objects.

alt.Chart(df_gm_2005).mark_point().encode(
    alt.X('life_expect'),
    alt.Y('fertility'))

Step 2: Adding interactiveness

By calling interactive() on a scatter plot we can make it interactive. Also, let us define the size of the bubbles with alt.Size() to add more information to the plot.

alt.Chart(df_gm_2005).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop')
).interactive()

Step 3: Adding colour

We can change the colour of the bubbles by adding alt.Color() to encode(). It is great that we do not need to worry about each colour for each country because Altair does it for you.

alt.Chart(df_gm_2005).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7)
).interactive()

Step 4: Adding more information

We can add information to each dot by specifying Tooltip() in encode().

alt.Chart(df_gm_2005).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7),
    tooltip=[alt.Tooltip('country'),
             alt.Tooltip('fertility'),
             alt.Tooltip('life_expect'),
             alt.Tooltip('pop'),
             alt.Tooltip('year')]
).interactive()

Step 5: Making plot dynamic

This already looks amazing for the 2005 data. Let’s add a slider to change the year and make the plot dynamic.

# Altair 4 API; Altair 5 deprecates selection_single/add_selection
# in favor of selection_point/add_params.
select_year = alt.selection_single(
    name='Select', fields=['year'], init={'year': 1955},
    bind=alt.binding_range(min=1955, max=2005, step=5)
)

alt.Chart(df_gm).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7),
    tooltip=[alt.Tooltip('country'),
             alt.Tooltip('fertility'),
             alt.Tooltip('life_expect'),
             alt.Tooltip('pop'),
             alt.Tooltip('year')]
).add_selection(select_year).transform_filter(select_year).interactive()

Step 6: Changing the size and adding a title

Lastly, let us change the size of the plot and add a title

select_year = alt.selection_single(
    name='Select', fields=['year'], init={'year': 1955},
    bind=alt.binding_range(min=1955, max=2005, step=5)
)

scatter_plot = alt.Chart(df_gm).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7),
    tooltip=[alt.Tooltip('country'),
             alt.Tooltip('fertility'),
             alt.Tooltip('life_expect'),
             alt.Tooltip('pop'),
             alt.Tooltip('year')]
).properties(
    width=500,
    height=500,
    title="Relationship between fertility and life expectancy for various countries by year"
).add_selection(select_year).transform_filter(select_year).interactive()

scatter_plot.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray")

The final output looks great and we can derive various insights from such a sophisticated visualization.

Now, knowing the basics of Altair's grammar, let us look at some other plots.

Box plot

box_plot = alt.Chart(df_gm_2005).mark_boxplot(size=100, extent=0.5).encode(
    y=alt.Y('life_expect', scale=alt.Scale(zero=False))
).properties(
    width=400,
    height=400,
    title="Distribution of life expectancy for various countries in 2005 year"
).configure_axis(
    labelFontSize=14,
    titleFontSize=14
).configure_mark(
    opacity=0.6,
    color='darkmagenta'
)

box_plot.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray")

Histogram

histogram = alt.Chart(df_gm_2005).mark_bar().encode(
    alt.X("life_expect", bin=alt.Bin(extent=[0, 100], step=10)),
    y="count()"
).properties(
    width=400,
    height=300,
    # The binned column is life_expect, so the title names life expectancy.
    title="Distribution of life expectancy for various countries in 2005 year"
).configure_axis(
    labelFontSize=14,
    titleFontSize=14
).configure_mark(
    opacity=0.5,
    color='royalblue'
)

histogram.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray")

Bar chart

bar_chart = alt.Chart(df_gm_ir).mark_bar(
    color='seagreen',
    opacity=0.6
).encode(
    x='pop:Q',
    y="year:O"
).properties(
    width=400,
    height=400,
    title="Population of Ireland"
)

text = bar_chart.mark_text(
    align='left',
    baseline='middle',
    dx=3
).encode(
    text='pop:Q'
)

bar_chart + text

Line chart

line_chart = alt.Chart(df_stocks).mark_line().encode(
    x='date',
    y='price',
    color='symbol'
).properties(
    width=400,
    height=300,
    title="Daily closing stock prices"
)

line_chart.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray")

Multiple scatter plots

mult_scatter_plots = alt.Chart(df_movies).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='Major_Genre:N'
).properties(
    width=150,
    height=150
).repeat(
    row=['US_Gross', 'Worldwide_Gross', 'IMDB_Rating'],
    column=['US_Gross', 'Worldwide_Gross', 'IMDB_Rating']
).interactive()

mult_scatter_plots

Altair is a great tool to boost your productivity in visualizing data, where you only need to specify links between data and visual encoding channels. This allows you to put your thoughts directly to a plot without worrying about the time consuming "how" part.

For more details please find

Thanks for reading and please do comment below about your ideas on visualizing data with Altair. To see more posts from me, please subscribe to Medium and LinkedIn.

  1. Overview. Altair 4.1.0 documentation (n.d.). https://altair-viz.github.io/getting_started/overview.html
Source: https://towardsdatascience.com/create-stunning-visualizations-with-altair-f9af7ad5e9b

Data Visualization: A Walkthrough in Python with Altair

Overview

Teaching: 10 min
Exercises: 60 min

Questions
  • How can we take an existing visualization and build an improved version in Python?

  • Can we use the library Altair to design the figures?

  • Can we make our graphic interactive?

Objectives
  • Apply the visualization principles learned during the first half to a practical problem.

  • Familiarize yourself with altair

This notebook is a follow-up to the visualization walkthrough. Instead of matplotlib and seaborn, we are going to use the Python library altair.

Matplotlib has been enormously successful at making Python viable as a standard language for scientific computing and data analysis. In recent years, however, there have been new developments both in terms of computation and in the data visualization world, and alternatives have emerged.

As we’ve mentioned previously, d3.js is a very powerful library to develop interactive visualizations using JavaScript. Because D3.js is pretty labour-intensive (and requires you to know some JavaScript), groups have started developing alternatives and extensions, some based on D3, some not, to make interactive visualization design more accessible to non-experts. For an overview of the different options in Python, PyViz is a great resource to explore!

One important note is that many of the packages involved are still fairly young, and so the library and the syntax might change quite frequently. Some are focused specifically on dealing with very large data sets (e.g. DataShader), others don’t handle large data sets well at the moment.

So using any of these packages carries a bit of a risk: they might not be super well documented, or they might be missing features, or their interface might change over the course of a year or so. If you’re willing to take that risk, however, you can do pretty amazing things. It’s also fair to say that most of them are open-source projects, that is, they thrive around a community of volunteers that help improve them. That could be you! When you find a bug, file an issue. When there’s a feature missing that you really need, contact them (also via an issue or mailing list or whatever means of communication they use). Many of those communities of developers are friendly and very interested in your feedback!

Altair, Vega-Lite and Vega

In this version of the walkthrough, we’re going to use a library called Altair. Altair is also part of a larger eco-system of libraries, and based on two lower-level libraries called Vega and Vega-Lite. These two software packages, created at the Interactive Data Lab at the University of Washington, specify what is called a data visualization grammar. Like the grammar of a language, Vega allows you to describe the different components of visualization, as well as their relationships to one another and their relationships to the data you’re trying to visualize.

This type of visualization grammar, like others of its kind, has one important advantage over matplotlib: it’s much less confusing and much more clearly structured! matplotlib was originally designed to mimic the plotting behaviour of matlab. But it’s not written in matlab, it’s written in Python. So the developers essentially created two interfaces, one that looks like matlab, and one that follows more typically Pythonic structures. As a result, there are usually several ways to do the same thing, which don’t always play well together, and this can get pretty confusing!

One issue with Vega is that you have to specify everything: you have to tell it that there are two axes, one horizontal labelled “x”, one vertical labelled “y”, in which direction they point, what the scale for them is, how many tick marks it should have, etc. That gets very tedious very quickly if you just want to quickly make a scatter plot! On the other hand, if you’re trying to make a super specific custom visualization, that freedom can come in very handy!

So in order to make it easier for researchers to do standard things like bar charts and line charts, the developers created Vega-Lite, a much simpler interface on top of Vega that will automatically try to make intelligent choices for its axes.

So this is great, but both Vega and Vega-Lite require you to specify your chart in json. While json is great, it’s not the most readable of formats (and there are so many curly braces!!!). Altair is essentially a Python interface to Vega-Lite that allows you to specify Vega-Lite charts in Python. One cool thing is that you can always export both Vega and Vega-Lite json specifications generated from your Python code. This can come in super handy if you’re trying to customize your plot in a way that Altair doesn’t allow, but Vega does.

A Quick Note on Versions

Because Altair is a direct translation of Vega-Lite, it usually lags behind the most recent release of Vega-Lite by a few weeks. Keep in mind that when new features get added to Vega-Lite, they will not be immediately available in Altair, and so the two sets of documentation might be out of sync.

Altair in your Notebook

You can follow the installation instructions to get altair running in your notebook, JupyterLab or interface of choice.

Important: Because of the notebook-to-markdown conversion, the interactive plots in this notebook will not be interactive on the website. In order to check out the interactivity, please

Once you’ve done that, let’s give it a quick test using one of the standard data sets and the code snippet from the Altair website:


Ideally, this should display a scatter plot of the petal length of different species of Iris flowers.

We’re now ready to get started with our own chart.

As a reminder, here’s the original data visualization:

[Figure: original infographic]

Some internet business terminology to keep track of (or not):

  • “ROI” = “Return on Investment”, i.e. how much profit you get back after spending some money on advertising
  • “SEO” = “Search Engine Optimization”, the process of tweaking your website so that it appears far on top in search engines like Google or DuckDuckGo
  • “PPC” = “Pay-per-click” is an internet advertising model used to drive traffic to websites, in which an advertiser pays a publisher (typically a search engine, website owner, or a network of websites) when the ad is clicked (as per wikipedia)
  • “PR” = “Public Relations”, is a strategic communication process that builds mutually beneficial relationships between organisations and the public, as per this website
  • “Direct Mail”: good old-fashioned snail mail advertising you get in the post
  • “Online Media Buys”: refers to matching an advertisement to an intended audience, I think. See also wikipedia

Exercise: Write down short statements regarding the following questions

  • When you first looked at the figure, what did you first study: the visualization or the text?
  • What key point do you think the data visualization is meant to convey?
  • How well does the type of data visualization and its physical appearance (form, colours, contrast etc) convey the information given in the text?
  • What alternative forms might you choose to represent the data?

Note: This is a data visualization exercise, not one in internet marketing. If some of those terms don’t make sense to you, that’s totally okay. If you can’t figure out what the figure is trying to tell you (honestly, I’m not sure, either), that’s fine, too! For a data visualization exercise like this, you can totally make up a message you want to bring across (in fact, I’m going to do exactly that further down below) and run with it. In reality, however, we’re scientists, and we don’t make up stories in general. In your work, you might be faced with two situations:

  • Exploratory data analysis: Often, data visualization is a key part of exploratory data analysis, where you encounter a new data set and you don’t know yet what’s in there. For example, data from a new telescope might contain systematic effects that lead to funny-looking data. Visualizing the data sets helps you figure out what your data looks like, what biases might be in it.
  • Explaining a result with a visualization: In our scientific (or non-scientific!) publications, we often use visualizations to explain a scientific result. In these cases, we already know the story, our scientific result, so in this case our task is to make sure that our visualization (1) represents the data accurately (everything else would be lying), and (2) that it allows the viewer to understand your results and how you’ve arrived there.

Getting the Data

As with the walkthrough, we first need to get the data in a machine-readable format from the figure.

For that, we’re also going to need to import a couple of libraries:

Now let’s store our data in a DataFrame. We’ll do this a little differently from the previous walkthrough using matplotlib:

Here we’ve created two lists, one with the type of advertising, the other with the values for return on investment, and then passed both to a dictionary. We can now store these in a DataFrame:
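The code for this step did not survive the conversion of this page. A minimal reconstruction of what is described, using the values shown in the table that follows, might look like this; the variable names are assumptions based on the column names.

```python
import pandas as pd

# Two lists: the type of advertising and its return on investment.
adtype = ["SEO", "Email", "PPC", "PR", "Direct Mail", "OMB"]
roi = [68.9, 56.7, 52.4, 48.5, 37.4, 19.9]

# Pass both to a dictionary; the keys become the column names.
ad_data = {"adtype": adtype, "roi": roi}

# Store the dictionary in a DataFrame.
df = pd.DataFrame(ad_data)
print(df)
```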

   adtype       roi
0  SEO          68.9
1  Email        56.7
2  PPC          52.4
3  PR           48.5
4  Direct Mail  37.4
5  OMB          19.9

Altair can work with DataFrames, but you could also save your data to either a json file or a csv (comma-separated values) file, and then pass it the file name and location (or URL) of the file. Pandas can write out both, which can sometimes help keep the size of your notebook smaller.

But in this case, let’s use the DataFrame.

(Bar) Charts in Altair

All figures you generate in Altair follow a similar convention, and all are objects of class Chart. Chart usually takes your data as an input, and then you’ll use methods (functions that apply to a specific class of object) to specify what you actually want to plot.

Let’s give this a try:


What have we done? The syntax might look a little funny, but we’ve essentially called a number of methods on the Chart class to tell it what to do. One thing we’ve used is the mark_bar() method. This tells Altair that the plot it should produce is a bar chart. Then we’ve used the encode() method to actually tell it what values to plot.

You pretty much always have to call the encode() method somewhere: you could pass in a DataFrame with many more columns (for example a column called “cost”), and then Altair wouldn’t know which ones to use and which ones to leave out. In the case above, we’ve told it to put “adtype” on the x-axis and “roi” on the y-axis. The syntax :O and :Q after each column name tells Altair that “adtype” contains ordinal data (i.e. separate, distinct categories), and “roi” contains quantitative data (i.e. continuous numbers).

You might also notice that Altair has made some default choices about the dimensions of the chart, the units on the y-axis, the grid lines in the background, and so on. It also automatically adds axis labels when it has an idea of what labels to give it (it chose the column labels from our DataFrame in this case).

That chart looks great, but it’s a bit squashed. We can un-squash it by setting the size of the chart:

Looks better, doesn’t it? Unlike seaborn, Altair automatically gives the bars the same colour. You can specify the colour in two different ways. You can pass it directly to the mark_bar() method:

Any HTML colour name will work, along with any hexadecimal colour code.

Another cool thing you can do is conditional formatting. Let’s say we want to colour online methods of advertising and offline methods of advertising separately. Let’s add a column to our DataFrame to do exactly that:

   adtype        roi    online
0  SEO           68.9   yes
1  Email         56.7   yes
2  PPC           52.4   yes
3  PR            48.5   no
4  Direct Mail   37.4   no
5  OMB           19.9   yes

I’ve told it that all but PR and Direct Mail are online methods. Instead of passing “color” to the mark_bar() method, we can also pass it to the encode() method. Let’s see what happens when we do that:

It has now coloured the two groups of bars differently, and it has automatically added a legend!

There are other encoding channels you can choose. For example, in a scatter plot, you could use size as an encoding channel to vary the size of the points as a function of some property of each point. You can also encode opacity, i.e. how transparent a plot element is:


Here, the colour remains the same, but offline methods are less opaque than online methods. For a list of all the different possible encodings, and how to use them, you can look at the relevant Altair documentation.

Warning: Being able to encode many different properties on the same plot doesn’t mean it’s a good idea to do so! People are unlikely to really understand more than 2-3 different dimensions on a plot. Wherever possible, try to use different encodings to reinforce important data properties (for example, you could use a “color” and “size” encoding using the same data property, so that for example in your scatter plot marks that are bigger will also be blue, and marks that are smaller will also be green). This helps viewers understand the structure of your data better.

We might not like Altair’s defaults for colour and opacity, so let’s change them. We can do that by using the alt.condition function like so:

Here, we’ve told it to plot the online methods in orange, and the offline methods in blue. The alt.condition function takes as its first argument a condition, here alt.datum.online == "yes", which basically says: take the data points in column “online” and find all rows for which the value is “yes”. The next two arguments specify what it should do if this condition is true (here, use an orange colour) and what it should do if the condition is false (use a blue colour). The alt.condition function is pretty powerful and useful in Altair, and often used in interactive visualizations, so it’s worth understanding how it works! You can find more information in the Interaction section of the documentation.

Specifying Axes

What I currently don’t like about our plot are the axis labels. The column names we gave are pretty descriptive and short, so useful when you have to type them many times in a data analysis, but if you put them in your paper, few people would understand what they mean.

Let’s give our plot some more descriptive axis labels. For this, you’ll have to know that the encoding syntax we used above, where we wrote x="adtype:O", is a shorthand for a longer command. The shorthand is useful for quick plots where you don’t care about anything but the defaults, but when you want to specify more details, you might want to use a slightly more verbose syntax:

Here, we’ve used the alt.X and alt.Y classes to specify more detail of our x- and y-axes, in this case by giving them the keyword title with more descriptive titles.

Adding Text

In the previous walkthrough, we wanted to highlight the “Email” bar and give it text. We can do this in Altair, too, using a combination of the alt.condition function we’ve seen before and the mark_text mark.

For this, you should also know that it is possible to layer charts on top of each other. In our case, we’re going to layer a mark_bar element and a mark_text element on top of each other in the same chart.

To do this, we’re going to save our marks in variables. So far, we’ve just typed the commands directly into the command field, and the notebook has automatically rendered the result. However, we can also save it in a variable of whatever name we choose, and then have the notebook render it later when we need it to.

So first, let’s generate our old bar chart, and let’s highlight “Email” using our conditional:

Executing that cell did not plot anything, because so far, we’ve only saved the chart specification in a variable, without telling the notebook to actually render it.

Let’s now use mark_text to draw our numbers:


That almost looks like our matplotlib plot!

As a last step, we’d like to sort the bar chart by height in descending order. We can do this by using the “sort” property on alt.X:

All right, that looks pretty similar to our matplotlib version.

However, one of the great things about Altair is that it’s pretty easy to include interactivity. For a simple zooming functionality, all you need to do is add .interactive():

Granted, zooming around in a bar chart isn’t particularly satisfying, so let’s do something more fun. For example, perhaps we don’t want to highlight the bar labelled “Email”, but highlight a bar whenever someone clicks on it.

You can implement that using the alt.selection_single function (renamed alt.selection_point in Altair 5). We’ll pass the resulting selection to alt.condition to determine the colour:


Clicking on each bar should highlight that bar in red.

We can also do an interval selection, where you drag a window and it will mark all bars within that window:

This should allow you to drag your mouse across the chart and highlight bars. The encodings keyword in the alt.selection_interval function binds the rectangle to the x-axis (that is, you can only select along the x-axis, while you always select all of the y-axis).

Exercise: Try leaving out the encodings keyword or changing it to ["y"]. What happens?

We can also have it highlight a bar when we just run our mouse over it, without clicking. For that, we’re going to use our single selection again:


Maybe a useful thing would be if the plot also displayed some information every time you mouse over a bar. You can do this by adding a tooltip attribute to your encode method:


In the last step, let’s make a plot with two panels!

For this, we’re going to invent another data set: for each of our types of advertising, we’re going to invent a cost in millions of dollars for a hypothetical company. Presumably, even though something has a high ROI, it might still not be feasible if it costs more than a company can afford.

Let’s come up with some values:
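The cost figures below are placeholders invented for this sketch (they are not the tutorial’s original numbers); they only follow the story told later, with Direct Mail the most expensive and SEO and Email cheap:

```python
import pandas as pd

df = pd.DataFrame({
    "adtype": ["SEO", "Email", "PPC", "PR", "Direct Mail", "OMB"],
    "roi": [68.9, 56.7, 52.4, 48.5, 37.4, 19.9],
})

# Hypothetical costs in millions of dollars, one per advertising type
df["cost"] = [1.0, 0.5, 3.0, 4.5, 12.0, 2.5]
```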

We are now going to make two bar charts, and then link them together. Let’s do this first without all the fancy formatting for clarity:

Making multi-panel plots is as simple as doing something like chart1 | chart2: the | symbol tells Altair that it should make a two-panel plot with the panels side by side. For a vertically stacked multi-panel chart, you can use the & symbol.

We now want to add our properties and selections back in, and we want to make sure the same advertising type is selected on both sides:


Now when you click on any of the bars on either side, it’ll highlight the corresponding bar on the other side. It becomes pretty easy to see that Direct Mail is a terrible idea (very expensive, because you have to actually produce and send physical letters), and that SEO and e-mail are cheap and effective (of course, we’ve just made that up!).

And that’s it for this tutorial! I very much encourage you to look at the ever-growing example gallery on the Altair website, and the Altair documentation more generally, which is great. Have fun exploring!

Key Points

  • altair is a powerful library for generating (interactive) visualizations

  • Matching the type of visualization to your type of data can drastically improve readability

  • Choosing an informative and high-contrast colour palette can help make the figure viewable to a wide range of viewers

Source: https://huppenkothen.org/data-visualization-tutorial/13-walkthrough-altair/index.html

Line chart altair

Making Interactive Line Plots with Python Pandas and Altair

Line plots are an essential part of data analysis. They give us an overview of how a quantity changes over sequential measurements, and they are especially important when working with time series.

Trend, seasonality, and correlation are some features that can be observed on carefully generated line plots. In this article, we will create interactive line plots using two Python libraries: Pandas and Altair.

Pandas provides the data and Altair makes beautiful and informative line plots. Although Pandas is also able to plot data, it is not an explicit data visualization library. Besides, we will make the plots interactive which cannot be accomplished with Pandas.

Let’s start with generating the data. A typical use case of line plots is analyzing stock prices. One of the simplest ways to get stock price data is the pandas-datareader library. We first need to import it along with Pandas (already installed in Google Colab).

import pandas as pd
from pandas_datareader import data

We will get the prices of 3 different stocks for a period of 1 year. The start date, end date, and the source need to be specified.

start = '2020-1-1'
end = '2020-12-31'
source = 'yahoo'

One more piece of information is required: the ticker symbol of the stock.

# keep only the date and the closing price for each ticker
apple = data.DataReader("AAPL", start=start, end=end, data_source=source).reset_index()[["Date", "Close"]]
ibm = data.DataReader("IBM", start=start, end=end, data_source=source).reset_index()[["Date", "Close"]]
microsoft = data.DataReader("MSFT", start=start, end=end, data_source=source).reset_index()[["Date", "Close"]]

We now have stock prices of Apple, IBM, and Microsoft in 2020. It is better to have them in a single data frame. Before combining, we need to add a column that indicates which stock a particular price belongs to.

The following code block adds the relevant columns and then combines the data frames by using the concat function.

apple["Stock"] = "apple"
ibm["Stock"] = "ibm"
microsoft["Stock"] = "msft"

# combine first, then derive the month from the combined Date column
stocks = pd.concat([apple, ibm, microsoft])
stocks["Month"] = stocks.Date.dt.month

We have also added the month information which might be useful for analysis. We can now start on creating the plots.

Altair

Altair is a statistical visualization library for Python. Its syntax is clean and easy to understand as we will see in the examples. It is also very simple to create interactive visualizations with Altair.

I will briefly explain the structure of Altair and then focus on creating interactive line plots.

Here is a simple line plot that does not possess any interactivity.

import altair as alt

alt.Chart(stocks).mark_line().encode(
    x="Date",
    y="Close",
    color="Stock"
).properties(
    height=300, width=500
)

The basic structure starts with a top-level Chart object. The data can be in the form of a Pandas data frame or a URL string pointing to a json or csv file. Then the type of visualization (e.g. mark_line, mark_circle, and so on) is specified.

The encode function tells Altair what to plot in the given data frame. Thus, anything we write in the encode function must be linked to the data. The color parameter distinguishes the different stock names. It is the same as the hue parameter of Seaborn. Finally, we specify certain properties of the plot using the properties function.

One method for adding interactivity to a plot is through selections. A selection in Altair captures interactions from the user.

# selection_multi and add_selection are the Altair 4 names;
# Altair 5 deprecates them in favour of selection_point and add_params
selection = alt.selection_multi(fields=["Stock"], bind="legend")

alt.Chart(stocks).mark_line().encode(
    x="Date",
    y="Close",
    color="Stock",
    opacity=alt.condition(selection, alt.value(1), alt.value(0.1))
).properties(
    height=300, width=500
).add_selection(
    selection
)

The selection object above is based on the Stock column, which contains the names of the stocks. It is bound to the legend. We pass it to the opacity parameter so the opacity of a line changes according to the selected stock name.

We also need to add the selection to the plot using the add_selection function. To use it, we just need to click on the stock name in the legend. Then, the plot is updated accordingly.

Altair provides other options to capture user interactions. For instance, we can create an interactive line plot that updates as you hover your mouse over it.

The following code creates a selection object that performs the selection we have just described.

hover = alt.selection(
    type="single", on="mouseover", fields=["Stock"], nearest=True
)

We will use the selection object to capture the nearest point on the plot and then highlight the line this point belongs to.

There are 3 components in the following code. The first one creates the line plot. The second one is a scatter plot drawn on the line plot and it is used for identifying the nearest point. We adjust the opacity so that the scatter plot is not visible.

The third one is responsible for highlighting the line that contains the captured point in the second plot.

# line plot
lineplot = alt.Chart(stocks).mark_line().encode(
    x="Date:T",
    y="Close:Q",
    color="Stock:N",
)

# nearest point (invisible, only used to capture the hover)
point = lineplot.mark_circle().encode(
    opacity=alt.value(0)
).add_selection(hover)

# highlight: thin line when not selected, thick line when selected
singleline = lineplot.mark_line().encode(
    size=alt.condition(~hover, alt.value(0.5), alt.value(3))
)

The interactive line plot can now be generated by combining the second and third plots.

point + singleline

The first image shows the original or raw plot. The second figure shows the updated version as I hover on the plot.

Conclusion

Altair is quite flexible in terms of the ways to add interactive components to the visualization. Once you have a comprehensive understanding of the elements of interactivity, you can enrich your visualizations.

Thank you for reading. Please let me know if you have any feedback.

Source: https://towardsdatascience.com/making-interactive-line-plots-with-python-pandas-and-altair-7ee1d109e3dd

Area Chart with Altair in Python

Prerequisite: Introduction to Altair in Python

An Area Graph shows the change in a quantitative variable with respect to some other variable. It is simply a line chart where the area under the curve is colored/shaded. It is best used to visualize trends over a period of time, where you want to see how the value of one variable changes over time or with respect to another variable and do not care about the exact data values. Some modifications of the area chart are the stacked area chart and the streamgraph.


Area Graph is readily available in Altair and can be applied using the mark_area() function.



Creating an Area Chart

To make an area chart, simply select suitable variables from the dataset and map them to the x and y encodings. The quantitative variable is typically mapped to the y encoding, and the variable it changes with (often time) to the x encoding.

The dataset used in this article is from the Vega_datasets library.

Output: Simple Area Chart using Altair

Customizing the Area Chart

The following simple customizations can be done on an area chart: 

  • Area Color: You can change the default color of the area by setting the color parameter of the mark_area() method.
  • Opacity: You can change the default opacity of the area by setting the opacity parameter of the mark_area() method. It ranges from 0 to 1.
  • Line Color: You can also change the color of the actual line plot by specifying the value of the color key in the line dictionary parameter of the mark_area() method.

Output: Customized Area Chart using Altair

 




Source: https://www.geeksforgeeks.org/area-chart-with-altair-in-python/
